Re: Problem with <> and regex

It seems you are trying to handle HTML with regexes, which is a painful way to go. Instead, take a look at a real parser such as HTML::TreeBuilder or XML::LibXML.

For example, in XML::XSH2, a wrapper around XML::LibXML, you can simply write:

open :F html file.html ;
my $words = //span[@itemprop="author"]/text() ;
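
If you would rather stay in plain Perl, roughly the same extraction can be sketched with XML::LibXML directly. This is only a minimal sketch: the sample markup, the names in it, and the variable names are assumptions made up for illustration, not taken from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup (an assumption; the original only mentions file.html).
# Note the extra attribute and the single quotes, which tend to trip up naive regexes.
my $html = <<'HTML';
<html><body>
  <p>Posted by <span class="name" itemprop='author'>Jane Doe</span></p>
  <p>Edited by <span itemprop="editor">John Roe</span></p>
</body></html>
HTML

# load_html uses libxml2's HTML parser; recover => 1 lets it tolerate sloppy markup.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# Same XPath expression as in the XSH2 snippet.
my @authors = map { $_->nodeValue }
    $doc->findnodes('//span[@itemprop="author"]/text()');

print "$_\n" for @authors;
```

To read from a file as in the XSH2 example, pass location => 'file.html' instead of string => $html. The XPath does not care about attribute order or quoting style, which is exactly the kind of variation a regex has to be patched for.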

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^2: Problem with <> and regex by AnomalousMonk (Archbishop) on Mar 11, 2014 at 22:57 UTC
People often object that using a full-blown HTML/XML parser on "just a simple string" is overkill: it's "too much code". The reply to this is that a "simple string" all too often becomes complicated (*ML is, after all, a complicated spec), and then the overhead of maintaining a regex-based solution can explode. Do you know of a tutorial or discussion on this or any site along the lines of Dominus's Why it's stupid to `use a variable as a variable name' that addresses "Why It's Stupid to Parse HTML/XML With Regexes"?
Re^3: Problem with <> and regex by choroba (Cardinal) on Mar 11, 2014 at 23:09 UTC
I usually link to this question on StackOverflow. Its top answer is quite funny, but some of the other answers are more informative.