in reply to Re: Help with getting everything from X until Y
in thread Help with getting everything from X until Y

Thanks gents - these are all quite useful. I toned down the HTML a bit in my original post; the real thing is actually a bit uglier. Here are my results with the 3 examples. #1:
local $/ = "<br>"; while ( <DATA> ) { print $1 if /Title:(.*?)<br>/s; }
This works nicely, but I got a lot of whitespace. Also, for some reason, I was getting an extra </b> in addition to the STRING_I_WANT. Easy to take out, but see my question at the bottom if you could. #2:
$whole_file =~ /Title:.*?<\/b>(.*?)<br>/ms;
I actually assumed this would be the easiest way. I tried:
undef $/; chomp($whole_file = <IN>); $whole_file =~ s/.*?Title:.*?<\/b>(.*?)<br>/$1/ ; print "$whole_file";
The only thing this got me was the whole file printed out. :(

#3
while ( <DATA> ) { print if /<b>/ .. /<br>/; }
This worked well. It prints the whole line whenever the line falls inside the range. But I'm curious about one thing with this and the first example: how would I put the value into a variable when using the following?
print if /<b>/ .. /<br>/;
Thanks for your help. I'm making some progress but I'm still "in the books", so there are quite a few tricks I've yet to learn.

Re: Re: Re: Help with getting everything from X until Y
by rob_au (Abbot) on May 29, 2003 at 05:21 UTC
    When the while loop is written in this manner, each line that is read is assigned to Perl's default variable, $_. When subsequent matching and printing are performed without naming a variable or string to act upon, $_ is assumed, and it is $_ that gets matched and printed.

    See the perlvar and perlop man pages for further detail.
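
    To get the value into a variable rather than printing it, one option is to push $_ onto an array inside the same range test. This is only a sketch: the helper name lines_in_range and the sample lines are made up for illustration, and it assumes every range that opens with <b> is later closed by <br>.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the lines that fall inside a
    # /<b>/ .. /<br>/ range instead of printing them.
    sub lines_in_range {
        my @lines = @_;
        my @kept;
        for (@lines) {    # each element is aliased to $_ in turn
            push @kept, $_ if /<b>/ .. /<br>/;
        }
        return @kept;
    }

    # Made-up sample input, shaped like the data discussed in this thread.
    my @input = (
        "worthless stuff\n",
        "<b>Title: </b>\n",
        "STRING_I_WANT<br>\n",
        "more worthless stuff\n",
    );

    my @wanted = lines_in_range(@input);
    my $joined = join '', @wanted;    # one scalar holding the whole range
    print $joined;
    ```

    With a filehandle the same idea applies inside while ( <DATA> ) { ... }: the push sees the same $_ that print would have used.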

     

    perl -le 'print+unpack"N",pack"B32","00000000000000000000001001100010"'

Re: Re: Re: Help with getting everything from X until Y
by perrin (Chancellor) on May 29, 2003 at 13:18 UTC
    Modifiers matter: without /s the dot will not match a newline, and without /g you only get the first match. And didn't you say there was more than one occurrence in this file? Here's an untested example:

    undef $/; chomp($whole_file = <IN>); while ($whole_file =~ /Title:.*?<\/b>(.*?)<br>/sg) { print $1 . "\n"; }
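
    As a rough illustration of why the /s modifier matters for the pattern above, the sketch below runs it with and without /s on a made-up string. The sample text and the helper name titles are assumptions for the demonstration, not the real file.

    ```perl
    use strict;
    use warnings;

    # Made-up sample: a newline sits between </b> and the first string.
    my $html = "junk <b>Title: </b>\nSTRING_I_WANT<br>\n"
             . "junk <b>Title: </b>ANOTHER STRING_I_WANT<br>\n";

    # Hypothetical helper: list-context match, with or without /s.
    sub titles {
        my ($text, $dot_matches_newline) = @_;
        return $dot_matches_newline
            ? $text =~ /Title:.*?<\/b>(.*?)<br>/sg
            : $text =~ /Title:.*?<\/b>(.*?)<br>/g;
    }

    my @without_s = titles($html, 0);
    my @with_s    = titles($html, 1);

    print scalar(@without_s), " match(es) without /s\n";
    print scalar(@with_s),    " match(es) with /s\n";
    ```

    On this sample the first run finds only the second title, because the dot cannot cross the newline after the first </b>; the second run finds both.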

      Just for the sake of scientific experiment I tested it and it works ... sort of.

      As there is an end-of-line in front of the first 'STRING_I_WANT' and no end-of-line characters after it or around the 'ANOTHER STRING_I_WANT', the result looks rather odd, difficult to read and (almost) impossible to use in a meaningful way.

      If you want to further use the result, it seems better to put some 'start' and 'stop' markers around the results:

      use strict;
      use warnings;

      undef $/;
      chomp(my $whole_file = <DATA>);
      print "***\n$whole_file\n***\n";
      $whole_file =~ s/.*?Title:.*?<\/b>\s?(.*?)<br>/<WANTED>$1<\/WANTED>\n/gs;
      print "$whole_file";

      __DATA__
      a whole lot of worthless stuff <b>Title: </b>
      STRING_I_WANT<br> more worthless meaningless stuff or sometimes <b>Title: </b>ANOTHER STRING_I_WANT<br>
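
      If the strings are needed for further processing rather than for display, another possibility is to collect the captures in an array and let \s* swallow the stray whitespace. This is only a sketch: extract_titles and the sample string are invented for illustration, and the pattern assumes each title ends at the next <br>.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: returns each title with leading and
      # trailing whitespace left outside the capture.
      sub extract_titles {
          my ($text) = @_;
          my @titles;
          while ($text =~ /Title:.*?<\/b>\s*(.*?)\s*<br>/sg) {
              push @titles, $1;
          }
          return @titles;
      }

      # Made-up sample, shaped like the data in this thread.
      my $sample = "stuff <b>Title: </b>\nSTRING_I_WANT<br>\n"
                 . "more stuff <b>Title: </b>ANOTHER STRING_I_WANT<br>\n";

      print "[$_]\n" for extract_titles($sample);
      ```

      The brackets in the output are there only to make any leftover whitespace visible.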

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law