grabbing and printing text

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am migrating content from .html files to .asp files. I'm only grabbing everything that's within the body tags and copying it over to a new .asp file in the same directory. I also need to capture any meta information found in the meta tags, like:

<meta http-equiv="Keywords" content="Sun, Sun Bank, Bank, banking">

A meta tag can sometimes span multiple lines. When the whole tag is on one line, I can grab it with the following code:

#if the meta is only on one line
        if (/\<meta(.*)\>/i) {
         # grab the meta tag lines
         print OUTFILE $_ . "\n";
        }

I am having trouble when the tag spans multiple lines. The code I am using to try to handle that case is:


        #grabbing and printing everything between the meta tags if it is on multiple lines
        if (/<meta.*?>/i ... /.*?>/i){
         # this is a title line
          # extract the title
        
        
         $meta_temp = $_;
         $meta_temp =~ s/(.*?)\<meta\>(.*?)\>/$2/i;
         chomp($meta_temp);

         $meta = "$meta_temp" ;
        
         # Write the meta to the output file
         print OUTFILE $meta . "\n";
        }
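For context on why the multi-line attempt above does not work: the first pattern of the range, /<meta.*?>/i, needs the closing > on the same line as <meta, so the range never starts for a tag that is split across lines, and the substitution looks for the literal text <meta>, which never occurs. A minimal sketch of one regex-only alternative follows, assuming the whole file has been read into a single string and that no attribute value contains a > character. The helper name extract_meta_tags and the sample HTML are made up for illustration, and this remains a fragile approach compared with a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <meta ...> tag found in a string of
# HTML, with any internal line breaks collapsed to single spaces.
sub extract_meta_tags {
    my ($html) = @_;
    my @tags;
    # [^>]* also matches newlines, so a tag may span several lines.
    while ($html =~ /(<meta\b[^>]*>)/gi) {
        my $tag = $1;
        $tag =~ s/\s+/ /g;    # collapse line breaks and indentation
        push @tags, $tag;
    }
    return @tags;
}

# Illustrative input with a meta tag split over two lines.
my $html = <<'HTML';
<html><head>
<meta http-equiv="Keywords"
      content="Sun, Sun Bank, Bank, banking">
<title>Sun Bank</title>
</head><body>Hello</body></html>
HTML

print "$_\n" for extract_meta_tags($html);
```

With the sample input this prints the tag as a single line: <meta http-equiv="Keywords" content="Sun, Sun Bank, Bank, banking">. To use it on a real file, the file would first have to be read into one string (for example by localizing $/ to undef before reading) instead of being processed line by line.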

Re: grabbing and printing text by merlyn (Sage) on May 09, 2001 at 19:31 UTC
HTML::HeadParser is great at grabbing metadata. Just say no to hand-rolled regex! -- Randal L. Schwartz, Perl hacker
Re: Re: grabbing and printing text by Anonymous Monk on May 09, 2001 at 19:36 UTC
Merlyn, thanks for the tip, but I have no idea how to use an HTML parser; I just started using Perl about a week ago. If you have any tips for how to do this without a parser, I would really appreciate it.
Re: Re: Re: grabbing and printing text by merlyn (Sage) on May 09, 2001 at 19:37 UTC
You are hand-writing a parser. You'll always be "using a parser". It will take you less time to learn to use HTML::HeadParser than it will to study the HTML specifications and learn to write regular expressions that match arbitrary meta-equiv strings correctly. Trust me on that. -- Randal L. Schwartz, Perl hacker
Re: Re: Re: grabbing and printing text by ChemBoy (Priest) on May 09, 2001 at 20:12 UTC
It's a pretty straightforward module to use: start with the docs from perldoc.com, and if you can't figure it out from there, post what you're having trouble with (though you might first want to read this note, to make sure you're not overlooking something obvious). Note that you will have to install the whole HTML::Parser distribution to make any of this work. But it'll still be easier than writing a regex-based parser yourself. If God had meant us to fly, he would never have given us the railroads. --Michael Flanders
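To make the suggestion in this thread concrete, here is a minimal sketch of HTML::HeadParser in use. It assumes the HTML::Parser distribution is installed; the sample HTML and the variable names are made up for illustration. As the module's documentation describes, a <meta http-equiv="Keywords"> tag becomes a header named Keywords, a <meta name="description"> tag becomes a header named X-Meta-Description, and the page title is available as Title.

```perl
use strict;
use warnings;
use HTML::HeadParser;    # from the HTML::Parser distribution, not core Perl

# Illustrative input; note the first meta tag spans two lines.
my $html = <<'HTML';
<html><head>
<title>Sun Bank</title>
<meta http-equiv="Keywords"
      content="Sun, Sun Bank, Bank, banking">
<meta name="description" content="Online banking">
</head><body>Hello</body></html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);    # stops collecting once the body is reached

# Each piece of head metadata is exposed as a header value.
my $title       = $parser->header('Title');
my $keywords    = $parser->header('Keywords');
my $description = $parser->header('X-Meta-Description');

print "Title: $title\n";
print "Keywords: $keywords\n";
print "Description: $description\n";
```

Because the parser works on the document as a whole, it makes no difference whether a tag sits on one line or several, which is the case the original question was stuck on.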