in reply to Re^2: Help extracting text from XML data
in thread Help extracting text from XML data

Yes.

$xml = '<string>Everyone knows that 1 &lt; 2</string>';;
print $xml =~ m[>([^<]+)</string>]sm;;
Everyone knows that 1 &lt; 2

From the spec:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.


Re^4: Help extracting text from XML data
by Jenda (Abbot) on Oct 21, 2008 at 10:51 UTC

    No.

    use XML::Simple;
    my $data = XMLin('<string>Everyone knows that 1 &lt; 2</string>');
    print $data;
    # ==> Everyone knows that 1 < 2

    Your code did not unescape the string. And before you attempt to add that, keep in mind that the input might have been <string><![CDATA[Everyone knows that 1 < 2]]></string>. Or the encoding specified by the <?xml ...?> declaration might have been different, and there might have been some accented characters that need to be converted. Or. Or. Or. If you do know your files will never contain anything like that, go ahead. But don't say your script processes XML then, because it doesn't.
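
    To make the two objections concrete, here is a minimal sketch that runs the regex from the parent post against both the escaped form and the CDATA form. The sub name extract_with_regex and the %samples hash are made up for this illustration; only the pattern itself comes from the thread. It uses core Perl only, so it shows where the regex falls short rather than how a parser would handle the input.

```perl
use strict;
use warnings;

# The pattern from the parent post, wrapped in a sub.
# The name extract_with_regex is hypothetical, chosen for this sketch.
sub extract_with_regex {
    my ($xml) = @_;
    my ($text) = $xml =~ m[>([^<]+)</string>]sm;
    return $text;    # undef when the pattern does not match
}

# Two ways of writing the same character data in XML.
my %samples = (
    escaped => '<string>Everyone knows that 1 &lt; 2</string>',
    cdata   => '<string><![CDATA[Everyone knows that 1 < 2]]></string>',
);

for my $name (sort keys %samples) {
    my $got = extract_with_regex($samples{$name});
    printf "%-8s %s\n", $name, defined $got ? "'$got'" : '(no match)';
}
```

    The escaped sample comes back with the &lt; still in it, and the CDATA sample does not match at all, because every > in that string is immediately followed by a <.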

      To be fair, BrowserUK certainly didn't claim that his regex processes XML, only that it does the job as reliably as the other possibilities. Since 'the job' was rather under-specified ("extract someresult from the following string …", which could of course be done by perl -e 'print "someresult\n"'), I think it's difficult to say that BrowserUK's solution doesn't (or does, for that matter) do it.

        I understand the sentiment behind the first link, though now that I've found my way of handling XML I no longer share the need. The thread in the second link looks enormous, and the first 20 or so messages do not seem to lead anywhere. And I have no idea what you meant by the last link. Sorry.