Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

What I am doing is content migration from .html files to .asp files. I'm only grabbing everything that's within the body tags and copying it over to the new .asp file in the same directory/folder. I also need to capture any of the meta information found in the meta tags, such as:
<meta http-equiv="Keywords" content="Sun, Sun Bank, Bank, banking">
This content can sometimes span multiple lines. I am able to grab the content when it is all on one line with the following code:
#if the meta is only on one line
if (/\<meta(.*)\>/i) {
    # grab the meta tag lines
    print OUTFILE $_ . "\n";
}
I am having trouble when I need to grab the content from multiple lines. The code I am using to try to do this is:
#grabbing and printing everything between the meta tags if it is on multiple lines
if (/<meta.*?>/i ... /.*?>/i){
    # this is a title line
    # extract the title
    $meta_temp = $_;
    $meta_temp =~ s/(.*?)\<meta\>(.*?)\>/$2/i;
    chomp($meta_temp);
    $meta = "$meta_temp" ;
    # Write the meta to the output file
    print OUTFILE $meta . "\n";
}
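
A note on why the single-line test misses wrapped tags: when the file is read line by line, each pattern only ever sees one line, so a tag whose closing > sits on a later line never matches as a whole. A minimal sketch of one way around that, assuming the whole file fits in memory, is to read it into a single string and match with a character class that also crosses newlines. The sub name extract_meta_tags, the sample HTML, and the INFILE handle mentioned in the comment are illustrative and not from the original post. A pattern like this is still fragile (a > inside an attribute value would cut the tag short), which is the weakness the replies in this thread point out.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <meta ...> tag found in an HTML string,
# including tags that wrap across several lines.
sub extract_meta_tags {
    my ($html) = @_;
    # [^>]* matches any character except '>', newlines included,
    # so a tag split over several lines is still captured whole.
    my @tags = $html =~ /(<meta\b[^>]*>)/gi;
    # Collapse internal line breaks so each tag prints on one line.
    s/\s*\n\s*/ /g for @tags;
    return @tags;
}

# Illustrative input. In the real script this would be the slurped file,
# for example: my $html = do { local $/; <INFILE> };
my $html = <<'HTML';
<html><head>
<meta http-equiv="Keywords"
      content="Sun, Sun Bank, Bank, banking">
<meta name="description" content="One line">
</head><body>...</body></html>
HTML

print "$_\n" for extract_meta_tags($html);
```

Setting $/ to undef (via local) is what makes the read return the whole file at once instead of one line per read.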

Replies are listed 'Best First'.
Re: grabbing and printing text
by merlyn (Sage) on May 09, 2001 at 19:31 UTC
      Merlyn, thanks for the tip, but I have no idea how to use an HTML parser. I just started using Perl about a week ago. If you have any tips for how to do this without a parser, I would really appreciate it.
        You are hand-writing a parser. You'll always be "using a parser". It will take you less time to learn to use HTML::HeadParser than it will to study the HTML specifications and learn to write regular expressions that match arbitrary meta-equiv strings correctly. Trust me on that.

        -- Randal L. Schwartz, Perl hacker

        It's a pretty straightforward module to use--start with the docs from perldoc.com, and if you can't figure it out from there, post what you're having trouble with (though you might first want to read this note, to make sure you're not overlooking something obvious).

        Note that you will have to install the whole HTML::Parser distribution to make any of this work. But it'll still be easier than writing a regex-based parser yourself.
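
To give a feel for how little code the module approach needs, here is a minimal sketch using HTML::HeadParser, assuming the HTML::Parser distribution is installed. The sample HTML and the variable names are illustrative, not taken from the original poster's files. The parser handles the tag wrapping across lines without any special treatment.

```perl
use strict;
use warnings;
use HTML::HeadParser;   # ships in the HTML::Parser distribution

# Illustrative input; the meta tag deliberately wraps across two lines.
my $html = <<'HTML';
<html><head>
<title>Sun Bank</title>
<meta http-equiv="Keywords"
      content="Sun, Sun Bank, Bank, banking">
</head><body><p>Body text</p></body></html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# <meta http-equiv="X" content="..."> is exposed as a header named X;
# <meta name="X" content="..."> would appear as header X-Meta-X instead.
my $keywords = $parser->header('Keywords');
my $title    = $parser->header('Title');

print "Keywords: $keywords\n" if defined $keywords;
print "Title: $title\n"       if defined $title;
```

For files on disk, $parser->parse_file($path) can replace the parse call; the values returned by header could then be written to the output file in whatever form the .asp pages need.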



        If God had meant us to fly, he would *never* have given us the railroads.
            --Michael Flanders