FloydATC has asked for the wisdom of the Perl Monks concerning the following question:

Yesterday I needed to do a quick fix of an XML file; all occurrences of a field containing 11-digit numbers needed to be replaced with the MD5 hash of said number. The structure of the XML file is very simple so there was no need to parse the actual structure to solve the problem at hand.

So, I threw together this little script to get the job done:

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

while (my $xml = <STDIN>) {
    $xml =~ s/(\d{11})/md5_hex($1)/eg;
    print $xml;
}

This did the job, because there were no other 11+ digit numbers in this particular XML file. Problem solved.

But what if there were? What if I only wanted to change 11 digit numbers enclosed in a particular tag? Let's say I wanted to inject more than just a single sub?

$xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/$1md5_hex($2)$3/eg;

This of course doesn't work, because there is no such variable as "$1md5_hex". And if I start throwing quotes and dots in there, I'm just telling Perl to insert those characters into my output.

Can someone point me in the right direction here? What if I really needed to substitute with a combination of static text and a sub? I can imagine ways of doing it manually, of course, so that's beside the point. Can it be done using a simple regex substitution?

Or is the best/only solution in this case to write a "wrapper" sub to create the exact string to substitute with?

$xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/transformed($1,$2,$3)/eg;
...
sub transformed {
    my ($prefix, $number, $suffix) = @_;
    return $prefix . md5_hex($number) . $suffix;
}

-- FloydATC

Time flies when you don't know what you're doing

Re: Regex substitute with both a sub and other data
by tobyink (Canon) on Aug 23, 2013 at 10:35 UTC

    When you use /e, the second part of s/// needs to be valid Perl code. $1md5_hex($2)$3 is not valid Perl code; you need to add some concatenation operators (.) in there...

    $xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/$1.md5_hex($2).$3/eg;

    Or, prettier:

    $xml =~ s{ (<tag>) (\d{11}) (</tag>) }{ $1 . md5_hex($2) . $3 }xeg;

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

      ... or even prettier (but untested):
          $xml =~ s{ <tag> \K (\d{11}) (?= </tag>) }{ md5_hex($1) }xmseg;
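          Since the \K variant above is marked as untested, here is a small sketch that checks it against the concatenation variant. The sample line and the <id> element are made up for illustration; \K needs Perl 5.10 or later.

          ```perl
          use strict;
          use warnings;
          use Digest::MD5 qw( md5_hex );

          # Hypothetical sample line; the number inside <id> must be left alone.
          my $line = '<tag>12345678901</tag> <id>98765432109</id>';

          # Variant 1: capture the tags and glue them back with the . operator.
          (my $concat = $line) =~ s{ (<tag>) (\d{11}) (</tag>) }{ $1 . md5_hex($2) . $3 }xeg;

          # Variant 2: \K drops <tag> from the matched text and the look-ahead
          # never consumes </tag>, so only the digits are replaced.
          (my $keep = $line) =~ s{ <tag> \K (\d{11}) (?= </tag>) }{ md5_hex($1) }xeg;

          print $concat eq $keep ? "same result\n" : "results differ\n";
          print "$keep\n";
          ```

          With this input both variants should produce the same string, with only the number inside <tag> hashed.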

Re: Regex substitute with both a sub and other data
by choroba (Cardinal) on Aug 23, 2013 at 10:36 UTC
    Just put a normal Perl expression in the replacement part when using /e.
    $xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/ $1 . md5_hex($2) . $3 /eg;

    Update: Or, use a proper XML handling tool:

    {
        package XML::XSH2::Map;
        use Digest::MD5 qw( md5_hex );
    }
    use XML::XSH2;
    xsh << '__XSH__';
        open 1.xml;
        for my $n in //num[xsh:matches(., '^[0-9]{11}$')]/text() {
            my $s = string($n);
            insert text { md5_hex($s) } replace $n ;
        }
        ls / ;
    __XSH__

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      So simple. *smack*... Thanks both of you :-D

      -- FloydATC


Re: Regex substitute with both a sub and other data
by Laurent_R (Canon) on Aug 23, 2013 at 18:49 UTC

    I would probably stick with the concatenation solutions already suggested, but you could also use zero-width positive look-behind and look-ahead assertions to check that the tags surround your 11-digit number without consuming them in the substitution. Something like (?<= pattern) and (?= pattern), with pattern describing your start and end tags.
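
    A minimal sketch of the look-around idea, assuming the element is literally called <tag> as in the question. The sample string is invented for illustration. Keep in mind that Perl's look-behind traditionally needs a fixed-length pattern, which a literal start tag satisfies.

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 qw( md5_hex );

    # Hypothetical sample input; <other> shows that untagged numbers are skipped.
    my $xml = '<tag>12345678901</tag><other>12345678901</other>';

    # The look-behind and look-ahead only assert that the tags are present.
    # They are not part of the match, so only the digits get replaced and
    # the replacement needs no concatenation.
    (my $out = $xml) =~ s{(?<=<tag>)(\d{11})(?=</tag>)}{ md5_hex($1) }eg;

    print "$out\n";
    ```

    The number inside <other> passes through unchanged because the look-behind fails there.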