Re: How to extract text between two tags?

You should normally use an XML parser. However, since the page does not even have a </body> tag, here is a one-liner I use often:
cat foo.html |perl -ne 'print if /Paper ID/ .. /\/SELECT/'

You also need to unescape entities such as &lt; back into plain text (<); see "Unescape characters from XML::Twig".

curl -s http://forum.vingrad.ru/act-Print/client/printer/f-5/t-326992.html |
  perl -pe 's{<br />}{\n}g' |
  perl -ne 'print if /Paper ID/ .. /\/SELECT/'

I am surprised the browser can handle and display that webpage...

Replies are listed 'Best First'.
Re^2: How to extract text between two tags? by Anonymous Monk on May 28, 2015 at 22:31 UTC
Many will complain that you should use an XML parser. However, you don't need an XML parser to parse HTML; HTML::TreeBuilder will do just fine.
Re^3: How to extract text between two tags? by bitingduck (Deacon) on May 28, 2015 at 22:46 UTC
Not only will HTML::TreeBuilder do fine, but if it's an HTML file, an XML parser is likely to die quickly on it. XML parsers are required to fail on XML that is not well-formed, while HTML parsers are allowed to be more forgiving (e.g. HTML::TreeBuilder defaults to inserting the implicit end tags whose absence would cause an XML parser to quit).
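
A minimal sketch of that forgiving behaviour, assuming HTML::TreeBuilder is installed from CPAN. The markup and the `list_paragraphs` name are made up for illustration; note that the input has no </p>, </body> or </html> tags.

```shell
# Hypothetical tag soup with every end tag missing.
html='<html><body><p>Paper ID 17<p>Paper ID 23'

list_paragraphs() {
  perl -MHTML::TreeBuilder -e '
    my $tree = HTML::TreeBuilder->new_from_content(join "", <STDIN>);
    # look_down returns every element matching the criteria
    print $_->as_text, "\n" for $tree->look_down(_tag => "p");
    $tree->delete;   # free the tree explicitly
  '
}

# Only run the demo when the module is actually available.
if perl -MHTML::TreeBuilder -e 1 2>/dev/null; then
  printf '%s' "$html" | list_paragraphs
else
  echo 'HTML::TreeBuilder is not installed' >&2
fi
```

Each paragraph comes back as its own element even though the source never closed the first one.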
Re^4: How to extract text between two tags? by Anonymous Monk on May 28, 2015 at 23:19 UTC
:) And then there is XML::LibXML, which can `load_html` just fine; see xpather.pl for an example.
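
For readers without xpather.pl at hand, a rough sketch of the load_html route, assuming XML::LibXML is installed. The markup and the `list_paragraphs_xpath` name are hypothetical, and this is not the code from xpather.pl.

```shell
# Hypothetical tag soup with every end tag missing.
html='<html><body><p>Paper ID 17<p>Paper ID 23'

list_paragraphs_xpath() {
  perl -MXML::LibXML -e '
    my $html = join "", <STDIN>;
    # recover keeps parsing past markup errors; suppress_* silences parser messages
    my $doc = XML::LibXML->load_html(
        string            => $html,
        recover           => 1,
        suppress_errors   => 1,
        suppress_warnings => 1,
    );
    # Query the repaired tree with XPath
    print $_->textContent, "\n" for $doc->findnodes("//p");
  '
}

# Only run the demo when the module is actually available.
if perl -MXML::LibXML -e 1 2>/dev/null; then
  printf '%s' "$html" | list_paragraphs_xpath
else
  echo 'XML::LibXML is not installed' >&2
fi
```

The appeal of this route is that once the HTML is loaded, the full XPath query language is available.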