pop18 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a string like

<citref idrefs="cit">5c,5d</citref>

and want to get the output as

<citref idrefs="cit5c cit5d">5<it>c</it>,5<it>d</it></citref>

Please help!
Thanks,
POP

Re: Capture a string
by olus (Curate) on Feb 28, 2008 at 12:12 UTC
    use strict;
    use warnings;

    my $s1 = '<citref idrefs="cit">5c,5d,5Es</citref>';
    my $s2 = '<qwerty alrefs="xad">5c,5d,5Es</qwerty>';

    print parse($s1)."\n";
    print parse($s2)."\n";

    sub parse {
        my $orig = shift;
        my ($cit, $idrefs, $values, @values);
        $orig =~ m/.*?>(.*)<\//;
        $values = $1;
        @values = split ',', $values;
        $orig =~ m/.*?"(.*)">/;
        $cit = $1;
        $idrefs .= "$cit$_ " for @values;
        chop $idrefs;
        $orig =~ s/(.*?=").*(">.*)/$1$idrefs$2/;
        @values = map{ s/([a-z]+)/<it>$1<\/it>/i; $_;} @values;
        $values = join ',', @values;
        $orig =~ s/>.*?</'>'.$values.'<'/es;
        return $orig;
    }
    And the output:

    <citref idrefs="cit5c cit5d cit5Es">5<it>c</it>,5<it>d</it>,5<it>Es</it></citref>
    <qwerty alrefs="xad5c xad5d xad5Es">5<it>c</it>,5<it>d</it>,5<it>Es</it></qwerty>
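olus's parse() treats the whole string as a single element. If one line may hold several citref elements, a sketch along these lines rewrites each one in place with a single global substitution and the /e modifier. It assumes the attribute is written exactly as idrefs="..." and that the element content contains no nested tags; rewrite_citref is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each comma-separated item with the idrefs
# value and wrap the first run of letters in each item (the "c" in "5c")
# in <it>...</it>. Assumes the content holds no nested tags.
sub rewrite_citref {
    my ($prefix, $content) = @_;
    my @items = split /,/, $content;
    my $ids   = join ' ', map { "$prefix$_" } @items;
    my $text  = join ',', map {
        my $t = $_;                          # copy so @items is not modified
        $t =~ s/([a-z]+)/<it>$1<\/it>/i;
        $t
    } @items;
    return qq{<citref idrefs="$ids">$text</citref>};
}

my $line = '<citref idrefs="cit">5c,5d</citref> and <citref idrefs="cit">7a</citref>';

# /g visits every citref element on the line; /e runs the replacement as code.
$line =~ s{<citref idrefs="([^"]*)">([^<]*)</citref>}{rewrite_citref($1, $2)}ge;
print "$line\n";
```

With the sample line above it prints `<citref idrefs="cit5c cit5d">5<it>c</it>,5<it>d</it></citref> and <citref idrefs="cit7a">7<it>a</it></citref>`.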
Re: Capture a string
by Punitha (Priest) on Feb 28, 2008 at 11:53 UTC

    Hi,

    I tried to reproduce your exact output from the same input and came up with this code:

    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        s/(<citref idrefs=\")([^"]*)(\">)((?:(?!<\/citref>).)*)(<\/citref>)/$1.idgen($2,$4).$3.citeref($4).$5/sgie;
        print "$_\n";
    }

    sub idgen {
        my ($id, $idcon) = @_;
        if ($idcon =~ /,/) {
            $idcon =~ s/([^,]+)(?=,|$)/$id$1/gi;
            $idcon =~ s/,/ /gi;
        }
        else {
            $idcon = $id.$idcon;
        }
        return ($idcon);
    }

    sub citeref {
        my ($con) = @_;
        if ($con =~ /,/) {
            my (@con) = split /,/, $con;
            # modify each element in place (instead of map in void context)
            s/[a-z]+/<it>$&<\/it>/i for @con;
            $con = join(',', @con);
        }
        else {
            $con =~ s/[a-z]+/<it>$&<\/it>/gi;
        }
        return ($con);
    }

    __DATA__
    <citref idrefs="cit">5c,5d</citref>
    <citref idrefs="cit">5d</citref>

    To give you a better solution, could you please clarify the following:

      1. Will the content of the 'citref' tag always contain a comma, or can it hold a single value?
      2. Should the generated idrefs be prefixed with the idrefs value from the input, or will the prefix always be 'cit'?
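On both questions: split returns the whole string as its only element when there is no comma, so separate comma and no-comma branches are not strictly needed, and the prefix can be passed in from the input attribute rather than hard-coded as 'cit'. A small sketch, where ids_for is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: build the idrefs value from the input prefix and
# the element content. split on a string without a comma returns that
# string as a single element, so '5d' needs no special case.
sub ids_for {
    my ($prefix, $content) = @_;
    return join ' ', map { "$prefix$_" } split /,/, $content;
}

print ids_for('cit', '5c,5d'), "\n";   # cit5c cit5d
print ids_for('cit', '5d'),    "\n";   # cit5d
print ids_for('ref', '5c,5d'), "\n";   # ref5c ref5d
```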

    Punitha

Re: Capture a string
by grizzley (Chaplain) on Feb 28, 2008 at 11:58 UTC
    Do you want a single regexp, or a piece of code that uses regexps? What exactly is the format of the idrefs attribute (only letters a-z)? What is the format of the value inside the tags? Is it <digit><letter>,<digit><letter> (exactly two items), <digit><letter>,<digit><letter>,... (two or more), or maybe <any char><any char>,...?
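The format question matters because the replies above italicise the first run of letters in every item, whatever the item looks like. If the items are known to be digits followed by letters, a stricter match can leave anything else untouched. A sketch assuming that <digits><letters> format; format_item is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: italicise the letters only when the item matches
# the assumed <digits><letters> format; any other item is returned as-is.
sub format_item {
    my ($item) = @_;
    return $item =~ /^(\d+)([a-z]+)$/i ? "$1<it>$2</it>" : $item;
}

print join(',', map { format_item($_) } split /,/, '5c,5d,5Es,12,x'), "\n";
# 5<it>c</it>,5<it>d</it>,5<it>Es</it>,12,x
```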