How to extract text present in 3 lines within the HTML tags

Saket has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to extract text present in 3 lines within the HTML tags by Nikhil Jain (Monk) on May 17, 2011 at 07:24 UTC
It is good practice to use an HTML parser such as HTML::Parser when parsing HTML. perlfaq6 makes the same recommendation under "How do I match XML, HTML, or other nasty, ugly things with a regex?". If you really want a regex for it, then try:

```perl
my $str = "<title> This is a very good site for regular Expressions. Very helpful. Thanks. </title>";
#print "$str\n";
# /s lets . match across newlines; /i ignores the case of the tag name
if ($str =~ m#<title>(.*?)</title>#is) {
    my $matched_output = $1;    # only trust $1 after a successful match
    #print "\n$matched_output\n";
}
```
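For the record, here is a minimal sketch of the regex route applied to a title that really does span three lines, as in the question. `extract_title` is a hypothetical helper name, not something from perlfaq6 or any module, and the sample markup assumes the line breaks fall at the sentence boundaries. It only covers the simple case of one well-formed title element; a parser remains the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text of the first
# <title> element with whitespace runs collapsed, or nothing if there is none.
sub extract_title {
    my ($html) = @_;
    # /s lets . match newlines, /i ignores tag case, [^>]* tolerates attributes
    my ($title) = $html =~ m{<title\b[^>]*>(.*?)</title\s*>}is
        or return;
    $title =~ s/\s+/ /g;      # fold newlines and repeated spaces into one space
    $title =~ s/^ | $//g;     # drop the single leading/trailing space left over
    return $title;
}

# Assumed sample input: the title text spread over three lines
my $html = <<'HTML';
<title>
This is a very good site for regular Expressions.
Very helpful.
Thanks.
</title>
HTML

print extract_title($html), "\n";
```

Collapsing the whitespace after the match keeps the pattern itself simple and gives a single-line result regardless of where the line breaks were.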
Re: How to extract text present in 3 lines within the HTML tags by moritz (Cardinal) on May 17, 2011 at 07:28 UTC
Let somebody else write the regexes for you:

```perl
use Mojo::DOM;
print Mojo::DOM->new->parse($yourtext)->at('title')->text;
```

Requires Mojolicious to be installed.
Re^2: How to extract text present in 3 lines within the HTML tags by Anonymous Monk on May 17, 2011 at 07:33 UTC
Or URI::Title, or Web::Scraper, or WWW::Mechanize's `$mech->title()`.
Re^3: How to extract text present in 3 lines within the HTML tags by jellisii2 (Hermit) on May 17, 2011 at 12:41 UTC
And since that snippet is valid XML, you can use XML::Twig.
Re^2: How to extract text present in 3 lines within the HTML tags by raybies (Chaplain) on May 17, 2011 at 12:41 UTC
moritz wrote: "Let somebody else write the regexes for you"

He kinda did, by asking perlmonks... :)
Re: How to extract text present in 3 lines within the HTML tags by Anonymous Monk on May 17, 2011 at 07:13 UTC
Read perlintro, perlrequick, perlretut; see also "regex html" and "html regex problem".
Re: How to extract text present in 3 lines within the HTML tags by ambrus (Abbot) on May 17, 2011 at 15:04 UTC
Come on, just use a real full HTML parser.

```perl
use warnings;
use XML::Twig;

our $doc = q(
<title>
This is a very good site for regular Expressions.
Very helpful.
Thanks.
</title>
<p>
Some other text we don't want to extract.
);

my $twig = XML::Twig->new;
# parse_html converts the HTML to XML via HTML::TreeBuilder, which must be installed
$twig->parse_html($doc);
my ($title_elt) = $twig->findnodes("//title");
my $title = $title_elt->trimmed_text;
print "$title\n";
__END__
```
Re: How to extract text present in 3 lines within the HTML tags by wind (Priest) on May 17, 2011 at 15:58 UTC
HTML::Parser comes with an example specifically for that: htitle. However, I'd probably go with HTML::TreeBuilder::XPath:

```perl
use HTML::TreeBuilder::XPath;
use strict;
use warnings;

my $data = do { local $/; <DATA> };    # slurp everything after __DATA__

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($data);
$tree->eof;    # signal end of input so the tree is finalized

print $tree->findvalue('//title');

__DATA__
<html>
<head>
<title>
This is a very good site for regular Expressions.
Very helpful.
Thanks.
</title>
</head>
<body>
<p>Hello world</p>
</body>
</html>
```