nutshell has asked for the wisdom of the Perl Monks concerning the following question:

I need some regex powerful enough to parse the following text into $1 (key) and $2 (value):
key => 44,
"key" => "fun. \"yes\" 'fun'",
'key' => 'yahoo\' "fun"'
Keep in mind that the very last line won't have a comma at the end.

I'm currently using $text =~ s/'?(\w+)'?\s+=>\s+'?([^']+)'?,?\n/$Data{$1} = $2;/seg; but as you can see it's too limited.

--nutshell

Re: Regexing a hash line thing
by zigdon (Deacon) on Oct 21, 2002 at 14:35 UTC
    Is '=>' a valid key? Otherwise, I'd just use split:
    my ($key, $val) = split /\s*=>\s*/, $text, 2;
    $Data{$key} = $val;
    And then massage $key and $val to remove quotes and escapes, if needed.
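The split-and-massage idea above can be sketched roughly as follows. This is only an illustration, not zigdon's code: the `unquote` helper and the `@lines` sample array are made up for the example, and it assumes one pair per line and that '=>' never appears inside a key. It also treats every backslash escape the same way, which is looser than Perl's own quoting rules.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one layer of matching quotes, then
# turn \" into ", \' into ' and \\ into \ (a simplification).
sub unquote {
    my ($s) = @_;
    if ($s =~ /\A(['"])(.*)\1\z/s) {
        $s = $2;
        $s =~ s/\\(.)/$1/gs;
    }
    return $s;
}

my %Data;

# Sample input taken from the thread, one pair per line.
my @lines = (
    q{key => 44,},
    q{"key2" => "fun. \"yes\" 'fun'",},
    q{'key3' => 'yahoo\' "fun"'},
);

for my $line (@lines) {
    (my $text = $line) =~ s/\s*,?\s*\z//;   # drop the optional trailing comma
    $text =~ s/\A\s+//;                     # drop leading whitespace
    my ($key, $val) = split /\s*=>\s*/, $text, 2;
    next unless defined $val;               # skip lines without '=>'
    $Data{ unquote($key) } = unquote($val);
}

print "$_ => $Data{$_}\n" for sort keys %Data;
```

The limit of 2 on split keeps any '=>' inside the value intact, which is why only the key side is a concern.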

    -- Dan

      Ah, yes... very good idea!

      let me play with it some :)

      --nutshell

(tye)Re: Regexing a hash line thing
by tye (Sage) on Oct 21, 2002 at 15:15 UTC

    Perhaps this:

    while( <DATA> ) {
        my @m= m{
            ^\s*
            (   \w+                     # bare word
            |   '(?:\\.|[^\\']+)*'      # single-quoted string
            |   "(?:\\.|[^\\"]+)*"      # double-quoted string
            )
            \s* => \s*
            (   \w+
            |   '(?:\\.|[^\\']+)*'
            |   "(?:\\.|[^\\"]+)*"
            )
            \s*,?\s*$
        }x;
        print "(", join("|",@m), ")", $/;
    }
    __END__
    key => 44,
    "key" => "fun. \"yes\" 'fun'",
    'key' => 'yahoo\' "fun"'
    will work for you.

            - tye => "Tye"
Re: (nrd) Regexing a hash line thing
by newrisedesigns (Curate) on Oct 21, 2002 at 14:41 UTC

    This might be a stretch...

    Why exactly are you doing this? It looks like you are trying to pull from the output of Data::Dumper's Dumper() function. I believe you could simply eval() this whole thing and pull out the items using foreach(keys %hash).

    I might be off base here, but I'd like to know what you plan on doing with the data you have at hand because there might be an easier solution than regexping.

    John J Reiser
    newrisedesigns.com

Re: Regexing a hash line thing
by broquaint (Abbot) on Oct 21, 2002 at 15:22 UTC
    You could take advantage of the super Regexp::Common:
    use Regexp::Common;

    my $re = qr/
        ( \w+ | $RE{quoted} )
        \s+ => \s+
        ( \w+ | $RE{quoted} )
        ,?
    /x;

    /$re/ and print "$1: $2\n" while <DATA>;

    __DATA__
    key => 44,
    "key" => "fun. \"yes\" 'fun'",
    'key' => 'yahoo\' "fun"'
    This leaves the quotes around both keys and values, though, which may or may not be desirable, and it won't work for anything more complicated than simple constants. So, as zigdon has mentioned, a simple split should suffice.
    HTH

    _________
    broquaint

Re: Regexing a hash line thing
by nutshell (Beadle) on Oct 21, 2002 at 15:04 UTC
    This is what I did:
    # Build an array of chunks that look like parms.
    my (@parms) = $source =~ m/=\s+(?:new\s+)?FooBar(?:->new\s*?)?\((.*?)\);/gcs;
    # Process the array and build a hash from the chunks.
    eval "\$Data = {" . join('', @parms) . ' };';
    Works great! It takes the source for something like my $obj = new FooBar( 'key' => 'that' ); and puts the parms into the hash referenced by $Data.

    Thanks for making me think about using split!! :)

    --nutshell

Re: Regexing a hash line thing
by sauoq (Abbot) on Oct 21, 2002 at 19:59 UTC

    Since that looks an awful lot like Perl, you might consider just eval'ing it.

    Yes, this would allow arbitrary Perl code in your data, which may be a bad thing. It could be downright stupid if you don't trust the source of the data. You can mitigate the hazards somewhat by using the Safe module, but it is likely that the task would not be worth the extra work. In many cases, however, security isn't an issue, and using Perl to parse a Perl-like structure saves a lot of time and reduces code complexity. This thread didn't seem complete without the suggestion. Oops, newrisedesigns did mention it.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    undef $/;
    my $text = <DATA>;
    my $hash = eval '{' . $text . '}';
    print Dumper $hash;

    __DATA__
    key0 => 44,
    "key1" => "fun. \"yes\" 'fun'",
    'key2' => 'yahoo\' "fun"'

    Check $@ and make use of $SIG{__WARN__} to catch corrupt data.
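One way the $@ and $SIG{__WARN__} checks might look is sketched below. It is an illustration under assumptions, not sauoq's code: `parse_pairs` is a made-up helper name, and treating any warning as fatal is a design choice for the example. The leading `+` is added so Perl reads the braces as an anonymous hash rather than a block. As noted above, eval still runs whatever code is in the data, so this only makes sense for trusted input.

```perl
use strict;
use warnings;

# Hypothetical helper: eval a "key => value, ..." chunk as an anonymous
# hash and treat both errors and warnings as signs of corrupt data.
sub parse_pairs {
    my ($text) = @_;
    my @warnings;

    # Collect warnings raised while the data is being evaluated.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # The newline before the closing brace guards against a trailing comment.
    my $hash = eval '+{' . $text . "\n}";
    my $err  = $@;

    die "corrupt data: $err"       if $err;        # syntax or runtime error
    die "corrupt data: @warnings"  if @warnings;   # e.g. odd number of elements
    die "corrupt data: not a hash\n" unless ref $hash eq 'HASH';
    return $hash;
}

# Usage with the sample data from the thread.
my $data = parse_pairs(q{
key0 => 44,
"key1" => "fun. \"yes\" 'fun'",
'key2' => 'yahoo\' "fun"'
});
print "$_ => $data->{$_}\n" for sort keys %$data;
```

A chunk with a missing value, such as `key0 => 44, "odd"`, triggers Perl's "Odd number of elements" warning and is rejected here, while an unterminated string is caught through $@.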

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Regexing a hash line thing
by nutshell (Beadle) on Oct 21, 2002 at 14:30 UTC
    You'd probably not want to use the above but rather the following to test it--
    key => 44,
    "key2" => "fun. \"yes\" 'fun'",
    'key3' => 'yahoo\' "fun"'

    Thanks,
    --nutshell