Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a buffer of data containing some strings I want to replace, and I want to wrap the text on either side of each "bad" string in new tags. Let's say the input data has the line
<entry>22<?Pub _hardspace?>AWG or larger</entry>
I want the output to be
<entry><xx>22 AWG</xx> or larger</entry>
(Yes, it is XML, and no, I am not using XML::Simple; I am fighting a different battle trying to install that!) So I am using this relatively simple search and replace:
perl -pi -e 's/(\S+)<\?Pub \_hardspace\?\>(\S+)/<xx\>$1 $2<\/xx\>/gs' test
but the result comes out slightly wrong:
<xx><entry>22 AWG</xx> or larger</entry>
i.e. the <xx> starts too early. I'm matching a word with \S, so maybe that is matching '<entry>22' rather than just 22? Any ideas? The character immediately preceding could be anything, not just '>'...
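The guess in the question can be checked directly: \S matches any non-whitespace character, including '<' and '>', and Perl reports the leftmost match, so the first group starts at the '<' of <entry>. A minimal sketch that prints what each group captures (the sub name hardspace_groups is just for illustration; the needless escapes of '_' and '>' are dropped):

```perl
use strict;
use warnings;

# Returns the two groups captured by the pattern from the question.
sub hardspace_groups {
    my ($line) = @_;
    my @groups = $line =~ /(\S+)<\?Pub _hardspace\?>(\S+)/;
    return @groups;
}

my ($before, $after) =
    hardspace_groups('<entry>22<?Pub _hardspace?>AWG or larger</entry>');

# \S+ matches any run of non-whitespace, and the leftmost match wins,
# so the first group begins at the '<' of <entry>, not at '22'.
print "\$1 = '$before'\n";    # $1 = '<entry>22'
print "\$2 = '$after'\n";     # $2 = 'AWG'
```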

Re: reg expression needs a tweek
by jethro (Monsignor) on Mar 15, 2010 at 17:38 UTC
    The character immediately preceding could be anything

    If that were true, how could anyone tell where the real string begins? Surely some characters are not allowed there; the string itself probably consists of digits and/or a-z.
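Following that idea, a minimal sketch that whitelists the characters allowed in the words, assuming (as the reply guesses) they are only ASCII letters and digits. The sub name wrap_alnum is hypothetical:

```perl
use strict;
use warnings;

# Wrap the words on either side of the hardspace PI in <xx>...</xx>,
# ASSUMING those words contain only ASCII letters and digits.
sub wrap_alnum {
    my ($text) = @_;
    $text =~ s{([0-9A-Za-z]+)<\?Pub _hardspace\?>([0-9A-Za-z]+)}{<xx>$1 $2</xx>}g;
    return $text;
}

print wrap_alnum('<entry>22<?Pub _hardspace?>AWG or larger</entry>'), "\n";
# <entry><xx>22 AWG</xx> or larger</entry>
```

Anything outside the class stops the match, so a value such as 1.5 would only have its 5 wrapped; the class has to list every character that can legitimately appear in the value.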

    If you have problems installing XML::Simple, maybe try XML::Twig or some other XML parser.

Re: reg expression needs a tweek
by JavaFan (Canon) on Mar 15, 2010 at 17:30 UTC
    You may be able to get away with replacing \S with [^\s>] (untested). But are you sure you want \S? It means that:
    <entry>2 2<?Pub _hardspace?>AWG or larger</entry>
    turns into
    <entry>2 <xx>2 AWG</xx> or larger</entry>
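A minimal sketch of that suggestion. It is a slight variation on the reply: the class here is [^\s<>], excluding '<' as well, so the trailing word cannot run on into a closing tag such as </entry> (with [^\s>] alone, the trailing group in "AWG</entry>" would capture "AWG</entry"). The sub name wrap_hardspace is hypothetical:

```perl
use strict;
use warnings;

# Neither captured run may contain whitespace or an angle bracket, so the
# leading word cannot reach back over '<entry>' and the trailing word
# cannot run on into '</entry>'.
sub wrap_hardspace {
    my ($text) = @_;
    $text =~ s{([^\s<>]+)<\?Pub _hardspace\?>([^\s<>]+)}{<xx>$1 $2</xx>}g;
    return $text;
}

for my $line (
    '<entry>22<?Pub _hardspace?>AWG or larger</entry>',
    '<entry>22<?Pub _hardspace?>AWG</entry>',
    '<entry>2 2<?Pub _hardspace?>AWG or larger</entry>',
) {
    print wrap_hardspace($line), "\n";
}
# <entry><xx>22 AWG</xx> or larger</entry>
# <entry><xx>22 AWG</xx></entry>
# <entry>2 <xx>2 AWG</xx> or larger</entry>
```

The third line shows the caveat from the reply. As an in-place one-liner, the same substitution would be: perl -pi -e 's/([^\s<>]+)<\?Pub _hardspace\?>([^\s<>]+)/<xx>$1 $2<\/xx>/g' test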
Re: reg expression needs a tweek
by Anonymous Monk on Mar 15, 2010 at 17:28 UTC
    If you can't write it yourself, use a proper XML parser :D
Re: reg expression needs a tweek
by Anonymous Monk on Mar 16, 2010 at 07:58 UTC
    You need some logical delimiter that marks where the string starts; we cannot do it by magic. So decide what you want it to be: a preceding angle bracket, a space, or something else.
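One way to make that decision explicit is a lookbehind that requires one of the chosen delimiter characters immediately before the leading word. A minimal sketch; the helper wrap_after_delim and its second argument (the body of a character class listing the allowed delimiters) are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: $delims is the body of a character class listing the
# characters allowed right before the leading word (e.g. '>\s'). The word is
# the run of non-delimiter, non-'<' characters up to the hardspace PI.
sub wrap_after_delim {
    my ($text, $delims) = @_;
    $text =~ s{(?<=[$delims])([^$delims<]+)<\?Pub _hardspace\?>([^\s<]+)}{<xx>$1 $2</xx>}g;
    return $text;
}

my $line = '<entry>2 2<?Pub _hardspace?>AWG or larger</entry>';
print wrap_after_delim($line, '>\s'), "\n";  # <entry>2 <xx>2 AWG</xx> or larger</entry>
print wrap_after_delim($line, '>'),   "\n";  # <entry><xx>2 2 AWG</xx> or larger</entry>
```

With '>' or whitespace as delimiters, the "2 2" input from JavaFan's reply wraps only the last 2; with '>' alone, the whole "2 2" is wrapped. The delimiter string is interpolated straight into a character class, so it must be valid there (no unescaped ']', for example).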