Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The snippet below isn't parsing meta tags the way it should. I'm using LWP::Simple, and the page source is in $content (that part works).

This needs to match ANY meta tag in the format shown in the script. Yes, I know I shouldn't do it this way, but the question isn't about using a different module; it's how to fix this problem. The problem is that meta tags aren't always on separate lines; they can all be bunched together like a paragraph, and that's where this must be messing up.

Nothing prints at all. When I use an array that's split on /\n/, it doesn't work because not all meta tags are separated by newlines.

Where is the error in this, anyone know?

my @meta_results;
my $count = 0;
my @lines = split /\n/, $content;
while(<$content>) {
    if (/<meta name=\"(.+?)\" content=\"(.+?)\">/gi) {
        $count++;
        $meta_results[$count] = "$1::$2";
    }
}
foreach (@meta_results) {print "$_\n";}

Replies are listed 'Best First'.
Re: meta parsing problems
by Joost (Canon) on May 23, 2004 at 21:59 UTC
Re: meta parsing problems
by exussum0 (Vicar) on May 23, 2004 at 20:27 UTC
      So what's the solution for the string problem? The array didn't work so I was forced to use a scalar, which obviously doesn't work either.

      I'm prepared for the reversed meta tags. I thought about that already, but that's to be worked on after the meta tags work the first time around with this.

      thanks

Re: meta parsing problems
by Ctrl-z (Friar) on May 23, 2004 at 22:54 UTC
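    This isn't foolproof, but it'll probably do what you want...
    # /s lets . cross newlines, /g returns every <meta ...> tag, /i ignores case
    foreach my $attrs ( $content =~ m#<meta\s+(.*?)>#sgi ) {
        # list-context capture: a failed match leaves the variable undef
        # instead of reusing a stale $1
        my ($name) = $attrs =~ m#\bname\s*=\s*["'](.*?)["']#si;
        my ($cont) = $attrs =~ m#\bcontent\s*=\s*["'](.*?)["']#si;
    }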
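
    For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It matches against the whole $content string rather than reading it with <$content>, which treats the string as a filehandle name and so never enters the loop. The sample HTML and the skip-on-missing-attribute rule are my own assumptions, not something from the original post, and the quote handling is still as loose as above.

```perl
use strict;
use warnings;

# Hypothetical sample input; in the thread, $content comes from LWP::Simple.
my $content = <<'HTML';
<html><head>
<meta name="description" content="A test page"><meta name="keywords"
  content="perl, regex">
<META NAME='author' CONTENT='Anonymous Monk'>
</head></html>
HTML

my @meta_results;

# Match against the whole string: /s lets . cross newlines, /g returns
# every tag's attribute text in list context, /i ignores case.
for my $attrs ( $content =~ m{<meta\s+(.*?)>}sgi ) {
    my ($name) = $attrs =~ m{\bname\s*=\s*["'](.*?)["']}si;
    my ($cont) = $attrs =~ m{\bcontent\s*=\s*["'](.*?)["']}si;

    # Assumption: skip tags that lack either attribute (e.g. http-equiv).
    next unless defined $name && defined $cont;

    # Braces keep Perl from reading "$name::" as a package-qualified name.
    push @meta_results, "${name}::${cont}";
}

print "$_\n" for @meta_results;
```

    Because each attribute is looked up separately, this also copes with tags where content comes before name. It will still trip over a value that contains the other kind of quote, or a ">" inside an attribute value.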



    time was, I could move my arms like a bird and...
      Even though the thread's a bit old... There are problems with this approach. You should really consider using HTML::TreeBuilder; it's as easy as
      use HTML::TreeBuilder;
      my %kWords;
      # new_from_content parses the whole string and finishes the tree
      my $tree = HTML::TreeBuilder->new_from_content($data);
      for my $tag ( $tree->look_down( _tag => "meta" ) ) {
          my $name = $tag->attr("name");
          next unless defined $name;    # e.g. http-equiv metas have no name
          $kWords{$name} = $tag->attr("content");
      }
      The above code takes care of spaces, line breaks and so on. And it's fast and widely used. Just my 5 cents. FJ
Re: meta parsing problems
by mrpeabody (Friar) on May 24, 2004 at 23:37 UTC
    This needs to match ANY meta tag in the format shown in the script. Yes, I know I shouldn't do it this way, but the question isn't about using a different module; it's how to fix this problem. The problem is that meta tags aren't always on separate lines; they can all be bunched together like a paragraph, and that's where this must be messing up.
    You don't want to hear about using a module, but then you complain about having to solve precisely the nontrivial problem that the module is designed to solve. "I'm having trouble mowing my lawn with these scissors. I don't want to use my lawnmower, so don't suggest that. How can I make these scissors work better?"

    You can either use the module and be done with a robust solution in ten minutes, or spend hours re-implementing the module and have a weak, brittle solution. Your choice.