PerlMonks  

regular expression- match a string

by hweefarn (Acolyte)
on Dec 12, 2003 at 12:13 UTC ( [id://314283] : perlquestion )

hweefarn has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone :)
I have a small problem here. I am trying to extract something from a text input file, but I can't get exactly what I want.

This is an example of the input file:

From: hweefarn@yahoo.com
Subject: forward to me
To: hweefarn@hotmail.com
Message-id:
<20031212113224.26750.qmail@web10409.mail.yahoo.com>
*******************************************************************************

This is my code:

if (/^Subject:(\s+)(\S+).*$/)
{
$subject = "$2";
}
******************************************************************************

After running this code, the output is only "forward".

How can I get the whole subject, "forward to me"?

Thank you very much.

regards,
hweefarn

Replies are listed 'Best First'.
Re: regular expression- match a string
by Abigail-II (Bishop) on Dec 12, 2003 at 12:17 UTC
    if (/^Subject:\s+(.*)/) {$subject = $1}

    Abigail

Re: regular expressions - match the string
by b10m (Vicar) on Dec 12, 2003 at 12:20 UTC
    You get only "forward" because that is exactly what your regex asks for ;)
    if (/^Subject:(\s+)(\S+).*$/)
    This means: capture the whitespace between "Subject:" and the next non-whitespace character as $1, then capture every non-whitespace character up to the next whitespace as $2. The trailing .* does match the rest of the line, but it sits outside the parentheses, so it is never captured.

    This will solve it:
    if (/^Subject:(\s+)(.*)$/) { $subject = $2; }
    --
    b10m

    Update: Spelling ... Me no speaky Engrish ...
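
To see the difference between the two patterns side by side, here is a minimal sketch that runs both against the Subject line from the question. The variable names ($line, $first_word, $whole) are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Sample header line taken from the question.
my $line = "Subject: forward to me";

# Original pattern: the second group is (\S+), so it stops at the first whitespace.
my (undef, $first_word) = $line =~ /^Subject:(\s+)(\S+).*$/;

# Revised pattern: the second group is (.*), so it runs to the end of the line.
my (undef, $whole) = $line =~ /^Subject:(\s+)(.*)$/;

print "original: '$first_word'\n";   # original: 'forward'
print "revised:  '$whole'\n";        # revised:  'forward to me'
```

In list context a successful match returns the captured groups, which is why both results can be assigned directly without touching $1 and $2.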
Re: regular expression- match a string
by davido (Cardinal) on Dec 12, 2003 at 16:45 UTC
    Since (\S+) means to capture all contiguous non-whitespace, the part that gets captured must end when the RE engine encounters a space after "forward".

    You're successfully capturing a space into $1, and "forward" into $2.

    The RE you probably seek is:

    /^Subject:\s+(\S.*)$/
    That way you capture beginning with the first non-space character after "Subject: ", and continue your match, including whitespace between words, through the end of the line. $1 will contain your subject.


    Dave
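
One practical difference between /^Subject:\s+(.*)/ and /^Subject:\s+(\S.*)$/ shows up when the subject is blank: the first still matches and captures an empty string, while the second does not match at all. The sketch below wraps the (\S.*) form in a helper to illustrate this. The name extract_subject is hypothetical, and returning undef for "no subject" is an assumption about what the caller wants, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the subject text,
# or undef when the line is not a Subject header or the subject is blank.
sub extract_subject {
    my ($line) = @_;
    return $line =~ /^Subject:\s+(\S.*)$/ ? $1 : undef;
}

for my $line ('Subject: forward to me', 'Subject:   ', 'To: hweefarn@hotmail.com') {
    my $subject = extract_subject($line);
    printf "%-28s => %s\n", "'$line'",
        defined $subject ? "'$subject'" : 'no match';
}
```

Note that (\S.*) still keeps any trailing whitespace on the line, so a caller that cares about that would need to trim the result separately.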