ForeverLearning has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a quick way to change the date format within an XML file from the HTTP-style format (e.g. Fri, 18 Mar 2016 03:41:43 GMT) to an XPath-compliant format so that I can use XPath date queries.

All the dates I need to change are nicely contained within <Last-Modified></Last-Modified> tags.

I have tried this... perl -pe 's/<Last-Modified>(.*)<\/Last-Modified>/`date -d \"$1\" \"+%s\"`/ge&&s/\n//' mydoc.xml

But despite the presence of the global flag, it only seems to change the first instance and not all instances within the doc?

What am I missing ? ;-(

P.S. Yes, I know epoch time is not an XPath format, but that was just for my test purposes.

Re: Global matching not working as expected
by Corion (Patriarch) on Mar 19, 2016 at 09:35 UTC

    Depending on how line-oriented your XML is, that regular expression might find none, one or all matches.

    See perlre on "greedy" - .* will match as much as possible, including an intervening </Last-Modified> tag if possible.
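The difference between greedy and non-greedy matching can be seen in a minimal sketch. The one-line sample string below is made up for illustration; it only assumes that two elements can end up on the same line.

```perl
use strict;
use warnings;

# Two elements on one line, as they might appear in compact XML.
my $line = '<Last-Modified>A</Last-Modified><Last-Modified>B</Last-Modified>';

# Greedy: .* runs to the LAST closing tag, so there is only one match.
my @greedy = $line =~ m{<Last-Modified>(.*)</Last-Modified>}g;

# Non-greedy: .*? stops at the FIRST closing tag, so each element matches.
my @lazy = $line =~ m{<Last-Modified>(.*?)</Last-Modified>}g;

print scalar(@greedy), " greedy match: $greedy[0]\n";
print scalar(@lazy), " non-greedy matches: @lazy\n";
```

The greedy pattern captures everything from A through B, including the tags in between, while the non-greedy pattern captures A and B separately.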

    There are two approaches to fixing this. The easy approach is to read the whole file into memory and do the replacement directly:

    perl -0777 -pe 's!<Last-Modified>(.*?)</Last-Modified>!"<Last-Modified>" . `date ...` . "</Last-Modified>"!ge' mydoc.xml

    The saner way is to use one of the XML parsing modules, for example XML::Twig.

    Also, I would look at Time::Piece or Time::Local for doing the time conversion instead of shelling out to the date program for every replacement.
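As a rough sketch of the Time::Piece route: the helper name http_date_to_iso is made up here, and the target format is an assumption (ISO 8601, the form used by xs:dateTime), since the question does not say exactly which format is wanted.

```perl
use strict;
use warnings;
use Time::Piece;    # core module since Perl 5.10

# Convert a date such as "Fri, 18 Mar 2016 03:41:43 GMT" into ISO 8601.
# The literal "GMT" in the format assumes every input ends that way.
sub http_date_to_iso {
    my ($http_date) = @_;
    my $t = Time::Piece->strptime($http_date, '%a, %d %b %Y %H:%M:%S GMT');
    return $t->strftime('%Y-%m-%dT%H:%M:%SZ');
}

print http_date_to_iso('Fri, 18 Mar 2016 03:41:43 GMT'), "\n";
```

This prints 2016-03-18T03:41:43Z and never starts an external process. If epoch seconds are wanted instead, the same object provides them through its epoch method.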

      Even Corion's regex will fail if there is a newline anywhere between the tags. This can easily be fixed with the /s modifier, but there are probably more special cases yet to be found. Use a proven module.
      Bill
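The newline case is easy to reproduce. In this small sketch the sample string is invented; it just puts line breaks between the opening and closing tags.

```perl
use strict;
use warnings;

# An element whose content is wrapped across lines.
my $xml = "<Last-Modified>\nFri, 18 Mar 2016 03:41:43 GMT\n</Last-Modified>";

# Without /s, "." does not match "\n", so the pattern fails.
my $without_s = $xml =~ m{<Last-Modified>(.*?)</Last-Modified>}  ? 1 : 0;

# With /s, "." matches any character, including "\n".
my $with_s    = $xml =~ m{<Last-Modified>(.*?)</Last-Modified>}s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

Only the /s version matches. Attributes on the tag, CDATA sections, and comments would still defeat both patterns, which is the argument for a real parser.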
      Yes, you're right. I decided to go the saner route and use XML::Twig & Time::Piece. ;-)
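For readers who want to try the same combination, here is a minimal sketch, not the poster's actual script. It assumes XML::Twig is installed from CPAN, that every date ends in a literal "GMT", and that ISO 8601 is the wanted output; the doc and item element names in the sample are made up.

```perl
use strict;
use warnings;
use XML::Twig;      # from CPAN, not part of core Perl
use Time::Piece;    # core module

# Hypothetical sample document; only <Last-Modified> comes from the question.
my $xml = <<'XML';
<doc>
  <item><Last-Modified>Fri, 18 Mar 2016 03:41:43 GMT</Last-Modified></item>
  <item><Last-Modified>Sat, 19 Mar 2016 09:35:00 GMT</Last-Modified></item>
</doc>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for every <Last-Modified> element, wherever it occurs.
        'Last-Modified' => sub {
            my ($t, $elt) = @_;
            my $when = Time::Piece->strptime($elt->text,
                                             '%a, %d %b %Y %H:%M:%S GMT');
            $elt->set_text($when->strftime('%Y-%m-%dT%H:%M:%SZ'));
        },
    },
);

$twig->parse($xml);
my $converted = $twig->sprint;   # the whole document as a string
print $converted;
```

For a real file, parsefile('mydoc.xml') followed by print would replace the parse and sprint calls. Because the parser locates the elements, line breaks and multiple elements per line no longer matter.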
Re: Global matching not working as expected
by afoken (Chancellor) on Mar 19, 2016 at 10:56 UTC
    perl -pe 's/<Last-Modified>(.*)<\/Last-Modified>/`date -d \"$1\" \"+%s\"`/ge&&s/\n//' mydoc.xml

    Let's hope that mydoc.xml never contains something like this:

    <Last-Modified>hehe"; rm -rf / ; true "</Last-Modified>

    NEVER pass unverified user input to a shell (i.e. ``, qx(), system $string, exec $string), and NEVER pass anything to a shell without quoting problematic characters. And don't hope that all shells have the same quoting rules. They differ even among the common Unix shells (see e.g. http://www.in-ulm.de/~mascheck/various/), and it gets very much worse as soon as you leave Unix. CMD.EXE and especially COMMAND.COM just cause mental illness. If you want to stay sane, try to avoid the shell. Stay with Perl modules, or at least use the list forms of system and exec. perlipc has some examples for safely replacing `` and qx().
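One way to replace backticks without involving a shell is the list form of a pipe open. The sketch below assumes a Unix-like system with an external printf program; printf stands in for date so that the example behaves the same everywhere, and the hostile string is a harmless variant of the one above.

```perl
use strict;
use warnings;

# Hostile-looking value; harmless here because no shell ever parses it.
my $untrusted = 'hehe"; echo INJECTED; true "';

# List-form pipe open: Perl runs the program directly and hands over each
# list element as exactly one argument, so quoting rules never apply.
open(my $fh, '-|', 'printf', '%s', $untrusted)
    or die "cannot run printf: $!";
my $output = do { local $/; <$fh> };   # slurp everything the child printed
close $fh;

print "child saw: $output\n";
```

The child receives the string unchanged as a single argument, and the embedded echo never runs. The command from the question would take the same shape, with 'date', '-d', $value, '+%s' as the list.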

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)