Yesterday I needed to do a quick fix of an XML file: every 11-digit number in a particular field had to be replaced with the MD5 hash of that number. The structure of the XML file is very simple, so there was no need to actually parse it to solve the problem at hand.

So, I threw together this little script to get the job done:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw( md5_hex );

    # Replace each run of 11 digits on every input line with its MD5 hex digest.
    while (my $xml = <STDIN>) {
        $xml =~ s/(\d{11})/md5_hex($1)/eg;
        print $xml;
    }
This did the job, because there were no other 11+ digit numbers in this particular XML file. Problem solved.
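That assumption matters: \d{11} on its own also matches the first 11 digits of any longer number. A minimal sketch, assuming a target is any run of exactly 11 digits that is not adjacent to another digit, uses lookaround assertions; the sample $xml string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical sample input: one 11-digit number and one 12-digit number.
my $xml = "<id>12345678901</id><ref>123456789012</ref>\n";

# (?<!\d) and (?!\d) ensure the 11 digits are not part of a longer run,
# so the 12-digit number is left untouched.
(my $out = $xml) =~ s/(?<!\d)(\d{11})(?!\d)/md5_hex($1)/eg;

print $out;
```
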

But what if there were? What if I only wanted to change 11-digit numbers enclosed in a particular tag? In other words, what if I wanted the replacement to contain more than just the result of a single sub call?

$xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/$1md5_hex($2)$3/eg;
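This of course doesn't work: with the /e flag the replacement is compiled as Perl code, and "$1md5_hex($2)$3" is not a valid expression (Perl sees "$1" immediately followed by the bareword "md5_hex"). And if I start throwing "s and .s in there, I'm just telling Perl to insert those characters into my output.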
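Because /e makes Perl evaluate the replacement side as an ordinary Perl expression, the usual way to mix captured text with a function call there is the concatenation operator; quotes and dots only end up in the output literally when /e is absent. A minimal sketch on a made-up sample line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical sample input: only the number inside <tag> should change.
my $xml = "<tag>12345678901</tag> <other>98765432109</other>\n";

# With /e the right-hand side is an expression: concatenate the captured
# opening tag, the hash of the number, and the captured closing tag.
(my $out = $xml) =~ s/(<tag>)(\d{11})(<\/tag>)/$1 . md5_hex($2) . $3/eg;

print $out;
```
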

Can someone point me in the right direction here? What if I really needed to substitute with a combination of static text and the return value of a sub? I can of course imagine ways of doing it manually, so that's beside the point. Can it be done with a simple regex substitution?
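Two other single-substitution approaches, sketched on a made-up sample string: lookbehind/lookahead assertions keep the tags out of the match entirely, so the replacement is just the sub call; and without /e, the @{[ ... ]} idiom interpolates the result of an expression into an ordinary double-quoted replacement alongside static text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical sample input.
my $xml = "<tag>12345678901</tag>\n";

# 1) Zero-width assertions: only the digits are matched and replaced,
#    so the surrounding tags never need to be captured or re-emitted.
(my $via_lookaround = $xml) =~ s/(?<=<tag>)(\d{11})(?=<\/tag>)/md5_hex($1)/eg;

# 2) No /e: the replacement is a double-quoted string, and @{[ ... ]}
#    interpolates the return value of md5_hex() between static text.
(my $via_interp = $xml) =~ s/<tag>(\d{11})<\/tag>/<tag>@{[ md5_hex($1) ]}<\/tag>/g;

print $via_lookaround, $via_interp;
```

The lookaround form is usually the simplest when the surrounding text should stay exactly as it is; the @{[ ... ]} form is handy when the static parts of the replacement differ from what was matched.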

Or is the best/only solution in this case to write a "wrapper" sub to create the exact string to substitute with?

    $xml =~ s/(\<tag\>)(\d{11})(\<\/tag\>)/transformed($1, $2, $3)/eg;
    ...
    # Rebuild the replacement: opening tag, hashed number, closing tag.
    sub transformed {
        my ($prefix, $number, $suffix) = @_;
        return $prefix . md5_hex($number) . $suffix;
    }
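A named wrapper sub is not strictly required: under /e the replacement side can be any Perl expression, including a do { } block with several statements. A minimal sketch on a made-up sample string, using s{}{} delimiters so the closing tag needs no escaping:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical sample input.
my $xml = "<tag>12345678901</tag>\n";

# The do { } block is evaluated once per match; its last expression
# becomes the replacement text.
(my $out = $xml) =~ s{(<tag>)(\d{11})(</tag>)}{
    do {
        my ($prefix, $number, $suffix) = ($1, $2, $3);
        $prefix . md5_hex($number) . $suffix;
    }
}eg;

print $out;
```

A separate sub still reads better once the logic grows or is reused elsewhere.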
-- FloydATC

Time flies when you don't know what you're doing


In reply to Regex substitute with both a sub and other data by FloydATC
