Re: Pattern Search on HTML source.

Although you have correctly made your regex non-greedy, it "can't get a match" because 1nd (in the regex) does not match 1st (in the data):

my $output = "<!-- 1st table -->What I want 1<!-- /1st table -->more stuff...<!-- 2st movie -->What I want 2<!-- /2st movie -->...more stuff...<!-- 3st movie -->What I want 3<!-- /3st movie -->...more stuff";

if ($output =~ /<!-- 1st table -->(.*?)<!-- \/1st table -->/g) {
    print  $1;
}
else{
    print "Nothing Here!";
}

cheerfully spits out

perl 23.pl
What I want 1

However

Your /g isn't doing what you think. You've tried to specify a single set of tags. In a loop or in list context, /g will find the content between them each time they're repeated, but it won't find "2st [sic] movie".
Your pseudo-html makes no sense: tables without rows or data cells?
Using LWP or similar, if you're not already, could save you the trouble of saving the source data as a text file.
It's a tad peculiar to name the input FH in your code as "OUTPUT".
And, if you're going to parse HTML, use a module. There are just too many ways to go wrong while rolling your own.
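
For what it's worth, capturing every pair (rather than one hard-coded set of tags) can be sketched with a backreference and a while loop. This is only a sketch under assumptions: it assumes the closing comment repeats the opening label exactly, preceded by a slash, and that pairs don't nest. extract_sections is a made-up helper name, not something from a module. For real HTML a parser module is still the safer route.

```perl
use strict;
use warnings;

# Sample data copied from the post above (labels kept as written, "2st" and all).
my $output = '<!-- 1st table -->What I want 1<!-- /1st table -->more stuff...'
           . '<!-- 2st movie -->What I want 2<!-- /2st movie -->...more stuff...'
           . '<!-- 3st movie -->What I want 3<!-- /3st movie -->...more stuff';

# Hypothetical helper: returns a list of [label, content] pairs,
# one per matched opening/closing comment pair.
sub extract_sections {
    my ($html) = @_;
    my @found;
    # $1 is the label in the opening comment; \1 demands the same label
    # in the closing comment. /g in a while loop resumes after each match,
    # and /s lets . match newlines inside the content.
    while ($html =~ m{<!-- (.+?) -->(.*?)<!-- /\1 -->}gs) {
        push @found, [$1, $2];
    }
    return @found;
}

print "$_->[0]: $_->[1]\n" for extract_sections($output);
```

The while loop matters here: inside a plain if, a /g match is evaluated once in scalar context, so only the first pair is ever seen.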

Re^2: Pattern Search on HTML source. by Anonymous Monk on Dec 31, 2007 at 19:05 UTC
The problem is that the tags have something like: `my $output = "<!-- 1st table --> What I want 1<!-- /1st table -->more stuff...<!-- 2st movie --> What I want 2<!-- /2st movie -->...more stuff...<!-- 3st movie -->What I want 3<!-- /3st movie -->...more stuff";` With a carriage return or something like that in there, I can't get it to match.
Re^3: Pattern Search on HTML source. by ww (Archbishop) on Dec 31, 2007 at 19:20 UTC
Anonymous Monk: use the `download` link beneath the code to capture it rather than copy-pasting, or remove the newlines from what you copy-pasted until you have the $output as a single line in your editor. And, updating my previous reply: I realized, belatedly, that you appear to want to capture the contents of all the tag pairs, rather than just the first. Sorry, the code I posted captures only the first, and so far I haven't worked out a simple (aka "elegant") and understandable way to do them all with a regex. Cf. the advice to use an HTML parser or (new suggestion) a module designed to deal with matching pairs. Perhaps wiser monks will offer more particular suggestions.
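
If the newlines are genuinely part of the data rather than a copy-paste accident, an alternative to stripping them by hand is the /s modifier, which lets . match a newline. A minimal sketch, assuming the line breaks fall between the opening and closing comments (the sample string below is made up for illustration):

```perl
use strict;
use warnings;

# Assumed data: the first tag pair from the thread, with newlines inside it.
my $output = "<!-- 1st table -->\nWhat I want 1\n<!-- /1st table -->more stuff...";

# Same pattern twice; the only difference is the /s modifier.
my $re_plain = qr/<!-- 1st table -->(.*?)<!-- \/1st table -->/;
my $re_s     = qr/<!-- 1st table -->(.*?)<!-- \/1st table -->/s;

# Without /s, . refuses to cross the newlines, so the match fails.
my $plain_matched = $output =~ $re_plain ? 1 : 0;

# With /s the match succeeds; trim the surrounding whitespace afterwards.
my $content;
if ($output =~ $re_s) {
    ($content = $1) =~ s/^\s+|\s+$//g;
}

print "without /s: ", ($plain_matched ? "match" : "no match"), "\n";
print "with /s: $content\n" if defined $content;
```

This does not help if the line break landed inside one of the comment tags themselves; in that case the literal text of the tag no longer matches and the data needs fixing first.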