Zapano has asked for the wisdom of the Perl Monks concerning the following question:

Hi, what does this do:

    $xml =~ m|<$fieldname>(.*?)</$fieldname>|i

Thanx, Zapano.

Re: regex and XML
by CountZero (Bishop) on Dec 13, 2010 at 07:39 UTC
    use Modern::Perl;
    use YAPE::Regex::Explain;
    my $fieldname = 'SomeTag';
    say YAPE::Regex::Explain->new(qr|<$fieldname>(.*?)</$fieldname>|i)->explain();
    gives you:
    The regular expression:

    (?i-msx:<SomeTag>(.*?)</SomeTag>)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?i-msx:                 group, but do not capture (case-insensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      <SomeTag>                '<SomeTag>'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      </SomeTag>               '</SomeTag>'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    But it says nothing of how inappropriate the naive approach of parsing XML with a simple regular expression is.

    Using an XML parser is advised.
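To see the explanation above in action, here is a minimal sketch of what the match from the question returns. The sample strings and the variables $content, $multiline and $missed are made up for illustration; only $xml, $fieldname and the regex itself come from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data; $fieldname and $xml are the names used in the question.
my $fieldname = 'title';
my $xml       = '<book><TITLE>Programming Perl</TITLE></book>';

# /i makes the tag names case-insensitive; (.*?) captures as little as possible.
# In list context the match returns the captured group.
my ($content) = $xml =~ m|<$fieldname>(.*?)</$fieldname>|i;
print defined $content ? "$content\n" : "no match\n";

# Without the /s modifier '.' does not match a newline, so an element
# whose text spans several lines is not found at all.
my $multiline = "<title>Programming\nPerl</title>";
my ($missed)  = $multiline =~ m|<$fieldname>(.*?)</$fieldname>|i;
print defined $missed ? "$missed\n" : "no match\n";
```

The first match succeeds even though the document uses upper-case tags; the second fails only because of the line break inside the element, which is one of the ways this approach goes wrong on real XML.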

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: regex and XML
by Anonymous Monk on Dec 13, 2010 at 07:16 UTC
    Probably not what you want it to. The variable probably doesn't have metacharacters quoted, the end tag you'll find probably won't be the right one, and you should probably be using an XML parser to parse XML.
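The first two problems can be illustrated with a short sketch. All sample strings and variable names here ($loose, $strict, $nested, $got) are hypothetical; \Q...\E is Perl's standard way of quoting metacharacters in an interpolated variable.

```perl
use strict;
use warnings;

# Pitfall 1: unquoted metacharacters. The '.' in the tag name matches any
# character, so the wrong element is picked up first.
my $fieldname = 'price.net';
my $xml       = '<priceXnet>1</priceXnet><price.net>2</price.net>';

my ($loose)  = $xml =~ m|<$fieldname>(.*?)</$fieldname>|i;
my ($strict) = $xml =~ m|<\Q$fieldname\E>(.*?)</\Q$fieldname\E>|i;
print "unquoted: $loose, quoted: $strict\n";

# Pitfall 2: the wrong end tag. With nested elements of the same name the
# non-greedy match stops at the first closing tag, not the matching one.
my $nested = '<item>outer <item>inner</item> tail</item>';
my ($got)  = $nested =~ m|<item>(.*?)</item>|i;
print "captured: $got\n";
```

Quoting fixes the first problem, but no amount of quoting fixes the second; the regex would also miss an opening tag that carries attributes, such as <item id="1">.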

    But if you're asking what regular expressions do, perhaps you should look at the documentation.

Re: regex and XML
by chrestomanci (Priest) on Dec 13, 2010 at 09:23 UTC

    Another bit of the puzzle is that in the example you have given, the regular expression is delimited by pipe symbols (|) instead of the usual forward slashes. (Perl allows almost any non-alphanumeric, non-whitespace character to be used as the delimiter, provided the match starts with an explicit m.)

    As others have said, this is an unreliable way to parse XML, for a number of reasons. It is much better to use an existing XML parsing library from CPAN.

      Another bit of the puzzle is that in the example you have given, the regular expression is being quoted using pipe symbols (|) instead of the usual forward slash.

      Whoever wrote the regex probably did this to avoid having to escape the / in the closing tag. Personally, I always use /, so I would have written $xml =~ /<$fieldname>(.*?)<\/$fieldname>/i
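As a quick check that the choice of delimiter changes nothing about what is matched, here is a small sketch comparing three spellings of the same regex. The sample data and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my $fieldname = 'name';
my $xml       = '<name>Zapano</name>';

# Pipes as delimiters: the / in the closing tag needs no escaping.
my ($with_pipes)   = $xml =~ m|<$fieldname>(.*?)</$fieldname>|i;

# Default slashes: the / in the closing tag must be written as \/.
my ($with_slashes) = $xml =~ /<$fieldname>(.*?)<\/$fieldname>/i;

# Braces are another common choice and also leave / unescaped.
my ($with_braces)  = $xml =~ m{<$fieldname>(.*?)</$fieldname>}i;

print "$with_pipes $with_slashes $with_braces\n";
```

One thing to keep in mind with pipes is that | then has to be escaped inside the pattern, where it would no longer mean alternation, so braces are often the more convenient alternative.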

      Talking about parsing XML, what easy-to-use XML parser would the monks recommend? From the quick look I have taken at the documentation of a couple of XML parsers, it would take a week for me to figure out how to do something like "print out the content of every XYZ tag in the document, each one on a new line".
        If you just want to extract certain elements, I'd say XPath is the way to go. There is XML::XPath and XML::XPath::Simple.
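For the "print the content of every XYZ tag" task, a minimal sketch with XML::XPath might look like the following. It assumes XML::XPath is installed from CPAN; the sample document and the variable names are hypothetical.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document with XYZ elements at different depths.
my $xml = '<doc><XYZ>first</XYZ><other><XYZ>second</XYZ></other></doc>';

# Parse from a string; XML::XPath->new(filename => ...) reads from a file instead.
my $xp = XML::XPath->new(xml => $xml);

# '//XYZ' selects every XYZ element anywhere in the document.
# In list context findnodes returns the matching nodes.
my @contents = map { $_->string_value } $xp->findnodes('//XYZ');

# Print each element's text content on its own line.
print "$_\n" for @contents;
```

Unlike the regex, the XPath expression copes with nesting, attributes and line breaks, because the parser has already worked out the structure of the document.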

        CountZero
