biswanath_c has asked for the wisdom of the Perl Monks concerning the following question:


Hi

I have this scenario: I have XML documents that are just one-liners; the whole doc is on a single line.

I'd like to find the first and last occurrence of an XML node in an XML doc.

I used something like this, which works very well if I want to find the first occurrence of an XML node named, say, a:b:

if ( $doc =~ m/a:b>(.*?)</) { $data = $1; }

This captures the value of the first occurrence of the node into $data, and it works perfectly fine. Now I'd like to get the value of the last occurrence of the node. How can I do it? I do not want to use XML parsing; I want to use a regex.



Replies are listed 'Best First'.
Re: How to find the value of the first and last occurence of an XML node?
by Your Mother (Archbishop) on Feb 19, 2010 at 01:15 UTC

    The fact that something is one line is not relevant. Consider this document as a single line-

    $the_oxford_english_dictionary =~ s/\r?\n/ /g;

    It wouldn't make it any easier to parse. In your case, I'm guessing you're resisting a parser because your XML is invalid (aka, garbage). A parser *is* the way to go and if you provide real sample XML you'll probably get help with it. That said, this might do what you want without a parser-

    my $doc = '<?xml version="1.0"?><root><a:b>one</a:b><a:b>two</a:b></root>';
    my $last = [ $doc =~ /a:b>([^<]+)/g ]->[-1];
    print $last, $/;

    It won't be reliable or flexible though. :|
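For what it's worth, the same list-context trick can return both ends at once. This is only a sketch: the sub name first_and_last is made up for illustration, and it assumes the a:b elements carry no attributes, no nested markup, no CDATA, and are never self-closing. Anything outside those assumptions will break it, which is the usual argument for a parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (first, last) text of every <a:b>...</a:b>
# element, or an empty list when nothing matches.
sub first_and_last {
    my ($doc) = @_;

    # In list context, m//g returns the captured group of every match.
    my @values = $doc =~ m{<a:b>([^<]*)</a:b>}g;

    return @values ? @values[0, -1] : ();
}

my $doc = '<?xml version="1.0"?><root><a:b>one</a:b><a:b>two</a:b><a:b>three</a:b></root>';

if ( my ($first, $last) = first_and_last($doc) ) {
    print "first: $first\n";
    print "last:  $last\n";
}
else {
    print "no a:b elements found\n";
}
```

With a single matching element, the first and last values are the same string.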

Re: How to find the value of the first and last occurence of an XML node?
by Corion (Patriarch) on Feb 19, 2010 at 08:14 UTC

    If you use an XPath capable parser/query engine, this is very easy, as getting the first and last node is just:

    my $doc = ...; # read/parse XML file
    my @first = $doc->findNodes('./*[1]');
    my @last = $doc->findNodes('./*[last()]');

    This is one of the huge advantages that XML (and XPath) buy you - your code gets much clearer and quicker to write.
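As a more concrete sketch of the XPath approach, here is one way it might look with XML::LibXML. The choice of module is an assumption, since the reply above does not name one; note that XML::LibXML spells the method findnodes, in lowercase. The sample document and the namespace URI urn:example:a are invented for illustration, because a namespace-aware parser needs the a: prefix to be declared somewhere.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Invented sample document; the prefix "a" must be declared for a
# namespace-aware parser to accept it.
my $xml = '<?xml version="1.0"?>'
        . '<root xmlns:a="urn:example:a">'
        . '<a:b>one</a:b><a:b>two</a:b><a:b>three</a:b>'
        . '</root>';

my $dom = XML::LibXML->load_xml( string => $xml );

# Bind the prefix used in the XPath expressions to the namespace URI.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs( a => 'urn:example:a' );

# The parentheses matter: (//a:b)[1] is the first a:b in the whole
# document, whereas //a:b[1] is every a:b that is the first among its siblings.
my ($first_node) = $xpc->findnodes('(//a:b)[1]');
my ($last_node)  = $xpc->findnodes('(//a:b)[last()]');

my $first_text = defined $first_node ? $first_node->textContent : undef;
my $last_text  = defined $last_node  ? $last_node->textContent  : undef;

print "first: $first_text\n" if defined $first_text;
print "last:  $last_text\n"  if defined $last_text;
```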

Re: How to find the value of the first and last occurence of an XML node?
by ww (Archbishop) on Feb 18, 2010 at 23:12 UTC
    Your statement that you "do not want to use XML parsing" reminds me of the observation that "the lawyer representing himself has a fool for a client."

    If you reject the best advice here, you're representing yourself.

    There's good reason to avoid using regexen for your task: By and large, the mature XML modules are well tested, reliable, and consistent. They reflect the developer's (or developers') extensive knowledge of XML and its potential edge-case gotchas... and the community's.

    Your regex isn't going to have those advantages.

    Afterthought: You'll want to use &lt; and &gt; when you need less-than and greater-than symbols in narrative text... and foreswear the use of </br> (it isn't among the PM-approved markup tags.) Markup in the Monastery may help.

Re: How to find the value of the first and last occurence of an XML node?
by Anonymous Monk on Feb 18, 2010 at 22:54 UTC
    Use an XML parser; it's not a joke.
Re: How to find the value of the first and last occurence of an XML node?
by zeni (Beadle) on Feb 19, 2010 at 07:19 UTC

    It is relatively easy if you use the Expat parser to parse the XML file and extract the contents by tag. You may need just two handler functions, start and end.
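A rough sketch of that handler-based approach, assuming XML::Parser (the usual Perl interface to Expat; the reply does not say which module was meant). The sub name first_and_last_text is made up, and a Char handler is added alongside Start and End because the text itself arrives there. It also assumes the target element is never nested inside itself.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns (first, last) text content of elements
# named $tag, or an empty list if there are none.
sub first_and_last_text {
    my ($xml, $tag) = @_;

    my @values;        # text of each matching element, in document order
    my $inside = 0;    # true while the parser is inside a matching element

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element) = @_;
                if ($element eq $tag) {
                    push @values, '';
                    $inside = 1;
                }
            },
            End => sub {
                my ($expat, $element) = @_;
                $inside = 0 if $element eq $tag;
            },
            # Character data may arrive in several chunks, so append.
            Char => sub {
                my ($expat, $text) = @_;
                $values[-1] .= $text if $inside;
            },
        },
    );

    $parser->parse($xml);
    return @values ? @values[0, -1] : ();
}

my $doc = '<?xml version="1.0"?><root><a:b>one</a:b><a:b>two</a:b><a:b>three</a:b></root>';
my ($first, $last) = first_and_last_text($doc, 'a:b');
print "first: $first\n" if defined $first;
print "last:  $last\n"  if defined $last;
```

Namespace processing is off by default in XML::Parser, so the element name is compared as the literal string a:b here.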