phi95aly has asked for the wisdom of the Perl Monks concerning the following question:

I've got a string inherited from an outside source that looks like this:

    $data = "five[5]=12345&ten[10]=12345=678=90&six[6]=123456";

Now, the goal is to turn this into a name/value hash. I could do it like this:

    my ($name, $value);
    my @vars = split "&", $data;
    foreach (@vars) {
        ($name, $value) = split "=";
        $hash->{$name} = $value;
    }
The problem is that hash values can contain both ampersands (&) and equal signs (=), which breaks this approach. As you can see, each name is followed by the length of its value in square brackets. I'm now trying to write a regular expression that splits the string apart based on this length index:

    while ($data =~ /(.*?)\[(\d+)\]=(.{$2})/g) {
        $hash->{$1} = $3;
    }
However, this doesn't work because I can't figure out how to reference the length index (captured with $2) inside the regex itself. The example above, where I simply used $2 as the repeat count, doesn't work. Also, every value except the last is still followed by an ampersand (&). If I included that in the regex, it wouldn't match the last value, which isn't followed by one. Does anybody have an idea how to fix this, or a more elegant way to do it? Thank you.

Replies are listed 'Best First'.
Re: splitting string based on length index
by ikegami (Patriarch) on Nov 05, 2004 at 22:18 UTC

    Why does ten[10] have 12 characters after the =? If I change ten[10] to ten[12], the following works.

    #$data = "five[5]=12345&ten[10]=12345=678=90&six[6]=123456";
    $data = "five[5]=12345&ten[12]=12345=678=90&six[6]=123456";

    for (;;) {
        $data =~ s/(.+?)\[(\d+)\]=// or last;
        $hash{$1} = substr($data, 0, $2, '');
        $data =~ s/^&// or last;
    }

    die("Bad input\n") if length $data;

    use Data::Dumper;
    print Dumper(\%hash);

    This works too, but it's probably slower:

    #$data = "five[5]=12345&ten[10]=12345=678=90&six[6]=123456";
    $data = "five[5]=12345&ten[12]=12345=678=90&six[6]=123456";

    for (;;) {
        $data =~ s/(.+?)\[(\d+)\]=((??{".{$2}"}))// or last;
        $hash{$1} = $3;
        $data =~ s/^&// or last;
    }

    die("Bad input\n") if length $data;

    use Data::Dumper;
    print Dumper(\%hash);
      May I say you are a genius, ikegami. This is the second time in days you have helped enormously. This works splendidly. And yes, you're right, the 10 should be 10 characters.
        $h{$1} = substr $str, 0, $2, '' while $str =~ s/^&?([^\[]+)\[(\d+)\]=//;

        cheers

        tachyon

Re: splitting string based on length index
by diotalevi (Canon) on Nov 05, 2004 at 22:31 UTC

    This isn't a job for a regular expression. You have to decide in Perl code whether you have read enough, and you can't feed that decision back to the regex engine without using experimental features. This is what I was able to throw together in a moment.

    my %hash;
    my $data = "five[5]=12345&ten[10]=12345=678=90&six[6]=123456";
    my $reading_key = 1;
    my $key;
    my $expected_length;
    my $value_length;
    my $value;

    while ( $reading_key
            ? $data =~ /([^=]*)(=)/g
            : $data =~ /([^=&]*)([&=]|$)/g ) {
        my $chunk     = $1;
        my $separator = $2;

        if ( $reading_key ) {
            # I expect the chunk to be complete and to
            # contain a number in square brackets to indicate how many
            # non-&/= characters the value contains.
            ( $key, $expected_length ) = ( $chunk =~ /^([^\[]+)\[(\d+)\]/ );
            $reading_key = 0;
            $value = '';
            $value_length = 0;
        }
        else {
            $value_length += length $chunk;
            $value .= $chunk;

            if ( $value_length >= $expected_length ) {
                # The value is complete; the separator that follows it
                # marks the record boundary and is not part of the value.
                $hash{ $key } = $value;
                $reading_key = 1;
                $value = $value_length = $expected_length = undef;
            }
            else {
                # The separator sits inside the value, so keep it.
                $value .= $separator;
            }
        }
    }
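
For comparison, here is a minimal sketch of a parser that uses no regular expressions at all, only index and substr. It assumes the other reading of the length index discussed in this thread, namely that the number in brackets is the exact character count of the value (so ten[12] rather than ten[10]). The name parse_length_prefixed and its error messages are hypothetical, chosen for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "name[len]=value" records joined by "&",
# treating len as the exact number of characters in value.
sub parse_length_prefixed {
    my ($data) = @_;
    my %hash;
    my $pos = 0;
    my $end = length $data;

    while ($pos < $end) {
        # The key runs up to the next "[".
        my $open = index($data, '[', $pos);
        die "Missing '[' at offset $pos\n" if $open < 0;
        my $key = substr($data, $pos, $open - $pos);

        # The length runs up to the following "]=".
        my $close = index($data, ']=', $open);
        die "Missing ']=' after key '$key'\n" if $close < 0;
        my $len = substr($data, $open + 1, $close - $open - 1);

        # tr/0-9//c counts the non-digit characters without changing $len.
        die "Bad length '$len' for key '$key'\n"
            if $len eq '' || $len =~ tr/0-9//c;

        # Take exactly $len characters, whatever they contain.
        my $start = $close + 2;
        die "Value for '$key' is truncated\n" if $start + $len > $end;
        $hash{$key} = substr($data, $start, $len);
        $pos = $start + $len;

        # Records are separated by a single "&".
        if ($pos < $end) {
            die "Expected '&' at offset $pos\n"
                unless substr($data, $pos, 1) eq '&';
            $pos++;
        }
    }
    return \%hash;
}

my $parsed = parse_length_prefixed(
    "five[5]=12345&ten[12]=12345=678=90&six[6]=123456");
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

Because the value is taken by position, any "&" or "=" inside it is copied through untouched, and malformed input dies instead of being silently skipped.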
Re: splitting string based on length index
by BrowserUk (Patriarch) on Nov 05, 2004 at 23:16 UTC

    This seems too simple, which makes me suspicious that I've missed something?

    $data = "five[5]=12345&ten[12]=12345=678=90&six[6]=123456";

    $hash{ $1 } = $2 while $data =~ m/
        ( [a-z]+ ) \[ \d+ \] =
        (.*?)
        (?= (?: & [a-z]+ \[ ) | $ )
    /gx;

    print Dumper \%hash;

    $VAR1 = {
              'six' => '123456',
              'five' => '12345',
              'ten' => '12345=678=90'
            };

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: splitting string based on length index
by TedPride (Priest) on Nov 06, 2004 at 13:20 UTC
    $hash{$1} = substr($data, pos($data)+1, $2) while ($data =~ /(\w+)\[(\d+)\]/g);
    If keys can contain characters other than \w, you might want to change that. Otherwise, this works just fine.
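
One caveat with an unanchored /g match is that the search resumes inside the value, so a value that itself looks like name[n] would be picked up as another key. The following is a rough sketch of a non-destructive variant that anchors each match with \G and moves pos() past the value by hand. It assumes the bracketed number is the exact character count of the value; the name parse_anchored and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: \G-anchored, non-destructive parser for
# "name[len]=value" records joined by "&".
sub parse_anchored {
    my ($data) = @_;
    my %hash;

    pos($data) = 0;
    # /c keeps pos() where it was when the match finally fails,
    # so the leftover check after the loop can use it.
    while ($data =~ /\G&?([^\[]+)\[(\d+)\]=/gc) {
        my ($key, $len) = ($1, $2);
        my $start = pos($data);
        die "Value for '$key' is truncated\n"
            if $start + $len > length $data;

        $hash{$key} = substr($data, $start, $len);

        # Jump over the value so the next match starts after it
        # and never scans inside it.
        pos($data) = $start + $len;
    }

    die "Bad input at offset " . pos($data) . "\n"
        if pos($data) < length $data;

    return \%hash;
}

# The value of "a" contains text that looks like another key.
my $parsed = parse_anchored("a[6]=b[1]=c&d[1]=e");
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

With this sample, the hash ends up with only the keys a and d, and the value of a is the full six characters b[1]=c.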