wx27 has asked for the wisdom of the Perl Monks concerning the following question:

I have a similar problem to http://www.perlmonks.org/?node_id=647133 but can't get the \G solution to work. I have data coming in that has a variable number of pipe-delimited data pairs. Each pair should end up as <name>foo</name><value>bar</value>.

The current feed comes in as:

<name>test</name><value>431|alpha|123|bravo|542|charlie|412</value>
so I am transforming this via:
perl -pe "do{s/(\<value\>.+?)\|(.+?)\|(.*?)(?=\<\/value\>)/$1\<\/value\>\<name\>$2\<\/name\>\<value\>$3/gi;} while /\|/;"
which gives me the correct output of
<name>test</name><value>431</value><name>alpha</name><value>123</value><name>bravo</name><value>542</value><name>charlie</name><value>412</value>
When I try to recode using Oha's solution with \G, it seems to only process the line once:
perl -pe "s/(\<value\>|\G)(.+?)\|(.+?)\|(.*?)(?=\<\/value\>)/$1$2\<\/value\>\<name\>$3\<\/name\>\<value\>$4/g"
<name>test</name><value>431</value><name>alpha</name><value>123|bravo|542|charlie|412</value>

What am I doing wrong here?

Re: Recursive substitution difficulties
by ikegami (Patriarch) on Mar 15, 2010 at 20:49 UTC
    Your pattern with fewer leaning toothpicks (needless escapes of "<" and ">" removed, and an alternate delimiter used to avoid escaping "/"):
    s{(<value>|\G)(.+?)\|(.+?)\|(.*?)(?=</value>)} {$1$2</value><name>$3</name><value>$4}sg;

    What it should match for each pass of /g:

    Pass 1: <name>(test)</name><value>(431)|
    Pass 2: (alpha)|(123)|
    Pass 3: (bravo)|(542)|
    Pass 4: (charlie)|(412)</value>

    What it does match for each pass of /g:

    Pass 1: (<name>test</name><value>431)|(alpha)|(123|bravo|542|charlie|412)
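
To see the single pass for yourself, here is a minimal sketch that runs the \G pattern against the sample line and prints what each group captured. The sub names first_pass_captures and count_passes are made up for illustration. Since \G already matches at the very start of the string, the first (.+?) begins there, and the final (.*?) has to stretch all the way to </value>, so no pipe is left for a second pass.

```perl
use strict;
use warnings;

# Hypothetical helper: match the \G pattern once and return $1..$4.
sub first_pass_captures {
    my ($line) = @_;
    return $line =~ m{(<value>|\G)(.+?)\|(.+?)\|(.*?)(?=</value>)}s;
}

# Hypothetical helper: count how many substitutions /g actually performs.
sub count_passes {
    my ($line) = @_;
    my $n = $line =~ s{(<value>|\G)(.+?)\|(.+?)\|(.*?)(?=</value>)}
                      {$1$2</value><name>$3</name><value>$4}sg;
    return $n || 0;
}

my $line = '<name>test</name><value>431|alpha|123|bravo|542|charlie|412</value>';
my @c = first_pass_captures($line);
printf "capture %d = '%s'\n", $_, $c[$_ - 1] for 1 .. @c;
printf "substitutions made by /g: %d\n", count_passes($line);
```

The last capture holds everything from 123 up to 412, which is why the rest of the pairs never get split.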

    Fix:

    s{(?:<name>(.*?)</name><value>|\G([^|]*?)\|)([^|]*?)(?:\||</value>)}{ "<name>" . (defined($1)?$1:$2) . "</name><value>$3</value>" }seg;

    A much more robust fix:

    s{<name>(.*?)</name><value>(.*?)</value>}{ my $s = "|$1|$2"; $s =~ s{\|([^|]*)\|([^|]*)}{<name>$1</name><value>$2</value>}sg; $s }seg;
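
For anyone who wants to try the more robust version outside a one-liner, here is a small sketch that wraps it in a sub and runs it on the sample line from the question. The sub name expand_pairs is only an illustration, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two-step substitution.
# Step 1: glue name and value into one uniform "|name|value|name|value" string.
# Step 2: turn every "|name|value" pair into a <name>/<value> element pair.
sub expand_pairs {
    my ($line) = @_;
    $line =~ s{<name>(.*?)</name><value>(.*?)</value>}{
        my $s = "|$1|$2";
        $s =~ s{\|([^|]*)\|([^|]*)}{<name>$1</name><value>$2</value>}sg;
        $s;
    }seg;
    return $line;
}

my $in = '<name>test</name><value>431|alpha|123|bravo|542|charlie|412</value>';
print expand_pairs($in), "\n";
```

Because the inner substitution only ever sees the pipe-joined string, it does not need \G or a lookahead at all.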
Re: Recursive substitution difficulties
by ikegami (Patriarch) on Mar 15, 2010 at 20:27 UTC
    s{(<name>.*?</name>)<value>(.*?)</value>}{ $1 . join '', map "<value>$_</value>", split /\|/, $2 }seg;

    Better version for 5.10+:

    s{<name>.*?</name>\K<value>(.*?)</value>}{ join '', map "<value>$_</value>", split /\|/, $1 }seg;

    Update: Oops, didn't pay enough attn to the desired output.
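
A split-based variant that does produce name/value pairs could look roughly like the sketch below. It is an illustration only: the sub name expand_with_split is made up, and giving an unpaired trailing item an empty value is an assumption, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical split-based expansion: the original name plus the
# pipe-separated items are consumed two at a time as (name, value).
sub expand_with_split {
    my ($line) = @_;
    $line =~ s{<name>(.*?)</name><value>(.*?)</value>}{
        my ($first, $rest) = ($1, $2);
        my @items = ($first, split /\|/, $rest);
        my $out = '';
        while (@items) {
            my ($name, $value) = splice @items, 0, 2;
            $value = '' unless defined $value;
            $out .= "<name>$name</name><value>$value</value>";
        }
        $out;
    }seg;
    return $line;
}

print expand_with_split('<name>test</name><value>431|alpha|123|bravo|542|charlie|412</value>'), "\n";
```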

Re: Recursive substitution difficulties
by wx27 (Initiate) on Mar 15, 2010 at 20:52 UTC

    If it helps, the original data is coming in from a tab-delimited file, and I have control over getting that data from

    [tab]test|431|alpha|123|bravo|542|charlie|412[tab]
    to any intermediate step necessary.

    Right now, the field above is being dumped into my input XML file as:

    <name>test</name><value>431|alpha|123|bravo|542|charlie|412</value>

      My earlier solution can easily be adapted:
      s{\t([^\t]+)\t}{ my $s = "|$1"; $s =~ s{\|([^|]*)\|([^|]*)}{<name>$1</name><value>$2</value>}sg; $s }eg;
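
As a quick check of the tab-delimited adaptation, here is a sketch that applies it to the sample field from the post. The sub name expand_tab_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper: expand one tab-delimited, pipe-separated field
# into <name>/<value> pairs. The surrounding tabs are consumed by the match.
sub expand_tab_field {
    my ($line) = @_;
    $line =~ s{\t([^\t]+)\t}{
        my $s = "|$1";
        $s =~ s{\|([^|]*)\|([^|]*)}{<name>$1</name><value>$2</value>}sg;
        $s;
    }eg;
    return $line;
}

print expand_tab_field("\ttest|431|alpha|123|bravo|542|charlie|412\t"), "\n";
```

One thing to watch: the pattern eats both the leading and the trailing tab, so if a line holds several such fields back to back, the field right after a match no longer has a leading tab to match on.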

        Thanks. That looks pretty straightforward. Is it possible to use that all on a command line invocation? Right now I have something like:

        perl -pe "s/^(.*?)\t(.*?)\|(.*?)\t(.*?)(?=\n)/<wrapper1>$1</wrapper1><wrapper2>$2\|$3</wrapper2><wrapper3>$4</wrapper3>/g;" inputfile
        and then a follow-up to expand wrapper2 and the delimited fields within.