
Sorry to sound dumb, but...

...it doesn't work. The two values are empty. What exactly is going on in this regex? If somebody could explain it bit by bit, I could probably figure it out.

Thanks!
~evan

PS. What are the +'s about? Are they supposed to be there?

Re: Re: Re: html page search/parse
by BrowserUk (Patriarch) on Jun 19, 2003 at 00:09 UTC

    Sorry, my bad. Try this.

    #! perl -slw
    use strict;

    my $re = qr[
        <!--QBlastInfoBegin # Match the start of comment
        \s+                 # 1 or more whitespace including newlines
        RID                 # 'RID' literal
        \s+                 # One or more whitespace
        =                   # '='
        \s+                 # more whitespace
        (                   # start capture group 1
            [\d-]+          # 1 or more '0-9' or '-'
        )                   # end capture
        \s+                 # yet more whitespace
        RTOE                # 'RTOE' literal
        \s+                 # And more whitespace
        =                   # '=' literal
        \s+                 # more
        (                   # start capture group 2
            \d+             # 1 or more digits
        )                   # end capture
        \s+                 # more whitespace
        QBlastInfoEnd       # the end token
        \s+                 # final whitespace (including newlines)
        -->                 # The end comment card
    ]x; # Ignore incidental spacing and comments in regex.

    my $html = do{ local $/; <DATA> }; # Grab the data from <DATA> into a string

    my( $RID, $RTOE ) = $html =~ $re;  # Execute the regex and assign the captures to variables.

    print "RID:$RID RTOE:$RTOE";       # Print the results.

    __DATA__
    <!--QBlastInfoBegin
        RID = 1055976860-01972-17207
        RTOE = 7
    QBlastInfoEnd
    -->
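    If the two values still come back empty, it helps to check whether the match succeeded at all before printing anything. A minimal sketch, assuming the same pattern and sample block as the script above; the sample text is held in a heredoc instead of __DATA__ so the snippet stands alone, and the die message is only illustrative:

```perl
use strict;
use warnings;

# Same pattern as above, written compactly under /x.
my $re = qr[ <!--QBlastInfoBegin \s+ RID \s+ = \s+ ( [\d-]+ ) \s+
             RTOE \s+ = \s+ ( \d+ ) \s+ QBlastInfoEnd \s+ --> ]x;

# Sample page text: the same comment block used in the __DATA__ section.
my $html = <<'HTML';
<!--QBlastInfoBegin
    RID = 1055976860-01972-17207
    RTOE = 7
QBlastInfoEnd
-->
HTML

# A failed match in list context returns an empty list, so the list
# assignment evaluates to 0 and the die fires instead of printing blanks.
my ( $RID, $RTOE ) = $html =~ $re
    or die "No QBlastInfo block matched; check the whitespace around '='\n";

print "RID:$RID RTOE:$RTOE\n";
```

    This should print "RID:1055976860-01972-17207 RTOE:7". If it dies instead, the pattern and the real page text disagree somewhere, usually in the whitespace.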

    Without the verbose comments, the (now tested and working) regex looks like this:

    my $re = qr[ <!--QBlastInfoBegin \s+ RID \s+ = \s+ ( [\d-]+ ) \s+ RTOE \s+ = \s+ ( \d+ ) \s+ QBlastInfoEnd \s+ --> ]x;
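    The trailing x on qr[...]x is what allows both versions to contain spaces (and, in the long one, # comments): under /x, unescaped whitespace inside the pattern is ignored, so any whitespace that must be matched in the text has to be written explicitly, e.g. as \s+. A small sketch comparing the modes; the string "RID = 42" is made up for illustration:

```perl
use strict;
use warnings;

my $text = "RID = 42";

# Without /x, the spaces in the pattern are literal and must appear in the text.
my $plain = ( $text =~ /RID = (\d+)/ ) ? 1 : 0;

# With /x, the spaces in the pattern are ignored, so this pattern really
# means "RID=" followed by digits, which is not in the text.
my $x_no_ws = ( $text =~ /RID = (\d+)/x ) ? 1 : 0;

# With /x, whitespace in the text has to be matched explicitly.
my $x_ws = ( $text =~ /RID \s+ = \s+ (\d+)/x ) ? 1 : 0;

print "plain:$plain x_no_ws:$x_no_ws x_ws:$x_ws\n";
```

    This prints "plain:1 x_no_ws:0 x_ws:1", which is why every gap in the /x version is spelled out as \s+.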

    The +'s mean "match 1 or more of the preceding element". See perlre and perlretut for more.
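    To see what the + does in isolation, here is a small sketch (the sample string is made up) comparing [\d-]+, a bare \d, and \d+ against the same RID line:

```perl
use strict;
use warnings;

my $s = "RID = 1055976860-01972-17207";

# [\d-]+ : one or more characters that are digits or '-'.
# Matching is greedy, so it takes the whole hyphenated ID.
my ($id) = $s =~ /([\d-]+)/;

# \d with no quantifier matches exactly one digit.
my ($one) = $s =~ /(\d)/;

# \d+ stops at the first '-', because '-' is not a digit.
my ($digits) = $s =~ /(\d+)/;

print "id:$id one:$one digits:$digits\n";
```

    This prints "id:1055976860-01972-17207 one:1 digits:1055976860", showing why the character class needs both the '-' and the +.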


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: Re: html page search/parse
by The Mad Hatter (Priest) on Jun 18, 2003 at 23:45 UTC
    He didn't allow for optional whitespace around the two equals signs. Try this version:
    my $re = qr[<!--QBlastInfoBegin \s+ RID \s* = \s* ([\d-]+) \s+ RTOE \s* = \s* (\d+) \s+ QBlastInfoEnd \s+ -->]x;
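    Because \s* means zero or more whitespace characters, this version accepts "RID = ..." as well as "RID=...". A quick sketch checking both spellings against this pattern and against the \s+ variant; the two one-line sample strings are invented for the check:

```perl
use strict;
use warnings;

# \s+ around '=': at least one whitespace character is required.
my $strict = qr[ <!--QBlastInfoBegin \s+ RID \s+ = \s+ ( [\d-]+ ) \s+
                 RTOE \s+ = \s+ ( \d+ ) \s+ QBlastInfoEnd \s+ --> ]x;

# \s* around '=': whitespace there is optional.
my $loose = qr[ <!--QBlastInfoBegin \s+ RID \s* = \s* ( [\d-]+ ) \s+
                RTOE \s* = \s* ( \d+ ) \s+ QBlastInfoEnd \s+ --> ]x;

my %samples = (
    spaced   => "<!--QBlastInfoBegin RID = 1055976860-01972-17207 RTOE = 7 QBlastInfoEnd -->",
    unspaced => "<!--QBlastInfoBegin RID=1055976860-01972-17207 RTOE=7 QBlastInfoEnd -->",
);

my %result;
for my $name ( sort keys %samples ) {
    # Record 1 if the pattern matched the sample, 0 otherwise.
    $result{"strict_$name"} = $samples{$name} =~ $strict ? 1 : 0;
    $result{"loose_$name"}  = $samples{$name} =~ $loose  ? 1 : 0;
}

print "$_: $result{$_}\n" for sort keys %result;
```

    Only the strict pattern rejects the unspaced sample; the \s* version matches both.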
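    As for the pluses, they are quantifiers meaning "one or more of the preceding item": \s+ matches one or more whitespace characters, and [\d-]+ matches one or more digits or hyphens. The * in \s* is the "zero or more" quantifier. See perldoc perlre for more info.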