Re^2: PERL scripts-Yahoo Groups Download-i get an error

The actual part of the code that causes problems looks like this:

my ($cells) = $content =~ /<!-- start content include -->\s+(.+?)\s+<!-- end content include -->/s;

while ($cells =~ /<tr>.+?<span class="title">\s+<a href="(.+?)">(.+?)<\/a>\s+<\/span>.+?<\/tr>/sg)

The $cells variable is uninitialized when it reaches the while loop.

Why does this happen? The $content variable contains the HTML of the Yahoo login page (I checked it by writing its contents to an HTML file).

Also, what does this mean?

"/\s+(.+?)\s+/s"

I understand only simple m// patterns, so I can't figure out what string it searches for.

Please help.

Re^3: PERL scripts-Yahoo Groups Download-i get an error by pemungkah (Priest) on Mar 22, 2009 at 19:29 UTC
`$cells` is uninitialized because the "find stuff between these two comments" match is failing. The pattern looks for the shortest string it can find between the start and end comments for the content include, throwing away any leading whitespace. Here's how it breaks down:

    \s+     # one or more whitespace characters - don't capture these
    (       # start capturing
    .+?     # one or more anythings, shortest match to what follows
    )       # end capture

Check the text of the page in `$content`; my guess is that the Groups folks have reformatted their pages, and the comments that this match is looking for have changed.
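As a rough sketch of the breakdown above, the same pattern can be written with the `/x` modifier so the comments sit inside the regex, and the result can be checked with `defined` before it is used. The sub name `extract_cells` and both sample pages are made up for illustration; the real Yahoo markup is not shown in this thread and may look different.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the two marker comments,
# or undef when the markers are not present in the page.
sub extract_cells {
    my ($content) = @_;
    my ($cells) = $content =~ m{
        <!--\ start\ content\ include\ -->   # opening comment (spaces are escaped because /x ignores them)
        \s+                                   # one or more whitespace characters, not captured
        (.+?)                                 # capture the shortest run of characters that lets the rest match
        \s+                                   # whitespace before the closing comment, not captured
        <!--\ end\ content\ include\ -->     # closing comment
    }xs;
    return $cells;
}

# Made-up page that contains the markers.
my $page = <<'HTML';
<html><body>
<!-- start content include -->
  <table><tr><td>row one</td></tr></table>
<!-- end content include -->
</body></html>
HTML

# Made-up page without the markers, standing in for a login form.
my $login_page = "<html><body><form>login here</form></body></html>\n";

for my $html ($page, $login_page) {
    my $cells = extract_cells($html);
    if (defined $cells) {
        print "captured: $cells\n";
    }
    else {
        print "markers not found, so the result is undef\n";
    }
}
```

A match in list context returns an empty list when it fails, which is why the variable on the left ends up undefined rather than empty.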
Re^4: PERL scripts-Yahoo Groups Download-i get an error by Anonymous Monk on Mar 22, 2009 at 21:18 UTC
So it copies into $cells every string in $content that sits between "( )"? $content is full of strings between "( )": links, functions, arguments, etc.
Re^5: PERL scripts-Yahoo Groups Download-i get an error by pemungkah (Priest) on Mar 24, 2009 at 06:14 UTC
Sorry - I assumed a little too much about what you know about regular expressions. In Perl regular expressions, unescaped parentheses signal that we're going to capture anything that matches the pattern inside the parens. There are no parens expected to be matched in the text we're analyzing; they're just a shorthand for "See this pattern here, between these parens? I want to save the stuff that this part of the pattern matches".

Here's an example. If we had this:

    my ($meta_thing) = ($string =~ /(foo|bar|baz)/);

we'd be saying, "please look for a 'foo', a 'bar' or a 'baz' string in the variable `$string`. If you find any of those matches, please save the thing you matched into `$meta_thing`".

The pattern you have in the code you're looking at essentially says "please find an HTML comment that looks like this (don't save that); then skip however many whitespace characters you find (don't save those either); then start capturing every character you see up to the first time you find the other HTML comment (and save those); when you see the second comment, quit capturing, and save all the stuff you captured into `$cells`. Don't save the second comment either." The code after that is trying to break out table rows and cells from the stuff that was captured.

Does that help clear it up a bit more? I highly recommend reading `perldoc perlre` if you have no Perl books. I also recommend getting hold of "Learning Perl" if you want to learn more about this stuff.
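To make the capturing idea concrete, here is a small sketch that can be run on its own. The sub name `first_meta`, the sample strings, and the table markup are all invented for illustration; only the row pattern is taken from the code in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first 'foo', 'bar' or 'baz' found,
# or undef when the string contains none of them.
sub first_meta {
    my ($string) = @_;
    my ($meta_thing) = ($string =~ /(foo|bar|baz)/);
    return $meta_thing;
}

for my $string ('xx bar yy', 'nothing here') {
    my $found = first_meta($string);
    print defined $found ? "'$string' captured '$found'\n"
                         : "'$string' did not match\n";
}

# Made-up table markup, shaped to fit the row pattern from the question.
my $cells = <<'HTML';
<tr><td><span class="title">
  <a href="/group/files/a.txt">a.txt</a>
</span></td></tr>
<tr><td><span class="title">
  <a href="/group/files/b.txt">b.txt</a>
</span></td></tr>
HTML

# With /g in a while condition, each pass resumes where the previous match
# ended; $1 and $2 hold the two captured groups for the current row.
my @files;
while ($cells =~ /<tr>.+?<span class="title">\s+<a href="(.+?)">(.+?)<\/a>\s+<\/span>.+?<\/tr>/sg) {
    push @files, [$1, $2];
}

print "$_->[1] => $_->[0]\n" for @files;
```

None of the sample text contains a literal parenthesis; the parens in the patterns only mark which part of each match gets saved.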