Would it?

<string>Everyone knows that 1 &lt; 2</string>

Re^3: Help extracting text from XML data
by BrowserUk (Patriarch) on Oct 21, 2008 at 00:16 UTC

    Yes.

    my $xml = '<string>Everyone knows that 1 &lt; 2</string>';
    print $xml =~ m[>([^<]+)</string>]sm;    # prints: Everyone knows that 1 &lt; 2

    From the XML 1.0 spec:

    The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
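    To make the quoted rule concrete, here is a minimal core-Perl sketch. The helper `xml_escape_text` is hypothetical, written only for this illustration, and escapes just `&` and `<`, the two characters the rule covers. Because properly escaped character data never contains a literal `<`, the `[^<]+` class in the regex above runs right up to the closing tag; what it captures, however, is the escaped form rather than the original text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): escape text for use as XML
# character data. '&' is replaced first so that the '&' introduced by
# '&lt;' is not escaped a second time.
sub xml_escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    return $text;
}

my $raw = 'Everyone knows that 1 < 2 & 2 < 3';
my $xml = '<string>' . xml_escape_text($raw) . '</string>';
print "$xml\n";
# prints: <string>Everyone knows that 1 &lt; 2 &amp; 2 &lt; 3</string>

# The escaped text has no literal '<', so [^<]+ stops exactly at </string>.
my ($captured) = $xml =~ m[>([^<]+)</string>];
print "$captured\n";
# prints: Everyone knows that 1 &lt; 2 &amp; 2 &lt; 3

# ...but the captured string is still the escaped form, not $raw.
my $verdict = $captured eq $raw ? 'same as original' : 'differs from original';
print "$verdict\n";
# prints: differs from original
```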


      No.

      use XML::Simple;
      my $data = XMLin('<string>Everyone knows that 1 &lt; 2</string>');
      print $data;    # ==> Everyone knows that 1 < 2

      Your code did not unescape the string. And before you attempt to add that, keep in mind that there might have been <string><![CDATA[Everyone knows that 1 < 2]]></string>. Or the encoding specified in the <?xml ...?> declaration might have been different, and there might have been some accented characters that need to be converted. Or. Or. Or. If you do know your files will never contain anything like that, go ahead. But don't say your script processes XML then, because it doesn't.
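      As a rough illustration of what "adding that" would involve, here is a minimal core-Perl sketch. The helper `extract_string_text` is hypothetical and deliberately not an XML parser: it assumes a single `<string>` element, keeps CDATA sections verbatim, and decodes the five predefined entities plus numeric character references. It still ignores the encoding declaration, comments, processing instructions, nested elements and DTD-declared entities, each of which would need yet another special case.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT an XML parser. It assumes the input holds a single
# <string>...</string> element and ignores the encoding declaration, comments,
# processing instructions, nested elements and DTD-declared entities.
sub extract_string_text {
    my ($xml) = @_;
    my ($content) = $xml =~ m{<string>(.*?)</string>}s
        or return;

    my %predefined = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");
    my $text = '';

    # split with a capturing group keeps the CDATA sections as separate items.
    for my $part (split /(<!\[CDATA\[.*?\]\]>)/s, $content) {
        if ($part =~ /\A<!\[CDATA\[(.*)\]\]>\z/s) {
            $text .= $1;    # CDATA content is literal: no unescaping
            next;
        }
        # Decode numeric references (hex and decimal) and the five
        # predefined entities in a single pass.
        (my $decoded = $part) =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(lt|gt|amp|quot|apos));}
            { defined $1 ? chr(hex $1) : defined $2 ? chr($2) : $predefined{$3} }ge;
        $text .= $decoded;
    }
    return $text;
}

for my $xml (
    '<string>Everyone knows that 1 &lt; 2</string>',
    '<string><![CDATA[Everyone knows that 1 < 2]]></string>',
    '<string>Everyone knows that 1 &#60; 2</string>',
) {
    my $result = extract_string_text($xml);
    print "$result\n";    # each prints: Everyone knows that 1 < 2
}
```

      Decoding in a single `s///ge` pass means `&amp;lt;` correctly becomes the literal text `&lt;` instead of being decoded twice into `<`.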

        To be fair, BrowserUk certainly didn't claim that his regex processes XML, only that it does the job as reliably as the other possibilities. Since 'the job' was rather under-specified ("extract someresult from the following string …", which could of course be done by perl -e 'print "someresult\n"'), I think it's difficult to say that BrowserUk's solution doesn't (or, for that matter, does) do it.