trippledubs has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I'm not sure how to parse this structured text using a regular expression, or maybe multiple regular expressions. What are some better approaches, or approaches that actually work? Thanks for the help.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

sub parse {
    my $text = shift;
    my $values;
    my $reg = qr/
        ([A-Z]+)        #Capital letters
        \n?             #Possible new line
        \|              #Literal |
        (?:             #Begin key value pairs
            \|?(.*?)    #Key
            =
            (.*?)       #Value
            \|          #Begins new kv pair
        )+
    /xs;
    while ( $text =~ m/$reg/g ) {
        print "Match!\n";
        print $text;
        push @{$values->{$1}}, { $2 => $3 };
    }
    return $values;
}

my $text = join '', <DATA>;
my $values = parse($text);
print Dumper $values; #not what I want

__DATA__
CAR
MODEL|name=Mustang|Quality=A|Speed=
MODEL|Chevy=Cavalier|MPG=20
MODEL
|Color=blue|type=hatchback|crashrating=spectacular
|explodes=true|speed=|storage=none,little

Replies are listed 'Best First'.
Re: Parse structured text using regex
by kcott (Archbishop) on Mar 17, 2014 at 07:26 UTC

    You're reading your data twice: once into $text and again with a regex. I assume your data is in a file and you've only shown a small portion of it.
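
    A further point about the regex in the original post, offered as a sketch rather than a full diagnosis: when capturing groups sit inside a repeated group such as `(?: ... )+`, Perl keeps only the text from the final repetition, so `$2` and `$3` can never hold more than one key/value pair per match. The sample string and the simplified patterns below are made up for illustration and are not the poster's exact regex or data.

```perl
use strict;
use warnings;

# Hypothetical one-line sample, not the poster's full data.
my $sample = 'MODEL|name=Mustang|Quality=A|Speed=|';

# Captures inside a repeated group: only the last repetition is kept.
my @last_only;
if ( $sample =~ /^([A-Z]+)\|(?:([^=|]+)=([^|]*)\|)+/ ) {
    @last_only = ( $1, $2, $3 );    # $2 and $3 come from the final pair only
}

# One alternative: match the leader once, then collect every pair
# with a global match in list context.
my ( $leader, $rest ) = $sample =~ /^([A-Z]+)\|(.*)$/;
my %pairs = $rest =~ /([^=|]+)=([^|]*)/g;

print "repeated group: @last_only\n";
print "$_ => '$pairs{$_}'\n" for sort keys %pairs;
```

    The first match sees all three pairs go by but reports only the last one; the global match returns each key and value in turn, which is why a split or `/g` step tends to be needed somewhere.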

    Now that you've shown the data structure you're trying to create (in "Re^2: Parse structured text using regex"), I'd probably parse it line-by-line.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $parsed = parse(\*DATA);

    use Data::Dump;
    dd $parsed;

    sub parse {
        my ($fh) = @_;
        my ($parsed, $car_key, $model_key);
        my $last_record_only_has_key = 0;
        my $new_model_array = 1;

        while (<$fh>) {
            my ($leader, $info) = /^([A-Z]*)[|]?(\S*)\s*$/;

            if ($leader) {
                if (! $info) {
                    $car_key = $model_key if $last_record_only_has_key;
                    $car_key = $leader unless $car_key;
                    $last_record_only_has_key = 1;
                }
                $model_key = $leader;
                $new_model_array = 1;
                next unless $info;
            }

            $last_record_only_has_key = 0;

            for (split /[|]/ => $info) {
                my ($key, $value) = split /=/;
                if ($new_model_array) {
                    push @{$parsed->{$car_key}}, {$model_key => { $key => $value }};
                    $new_model_array = 0;
                }
                else {
                    $parsed->{$car_key}[-1]{$model_key}{$key} = $value;
                }
            }
        }

        return $parsed;
    }

    __DATA__
    CAR
    MODEL|name=Mustang|Quality=A|Speed=
    MODEL|Chevy=Cavalier|MPG=20
    MODEL
    |Color=blue|type=hatchback|crashrating=spectacular
    |explodes=true|speed=|storage=none,little

    Output:

    {
      CAR => [
        { MODEL => { name => "Mustang", Quality => "A", Speed => "" } },
        { MODEL => { Chevy => "Cavalier", MPG => 20 } },
        {
          MODEL => {
            Color => "blue",
            crashrating => "spectacular",
            explodes => "true",
            speed => "",
            storage => "none,little",
            type => "hatchback",
          },
        },
      ],
    }
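
    A side note on the empty values in that output: `Speed` and `speed` come out as `""` rather than `undef` because `split` in a list assignment is given an implicit LIMIT (one more than the number of variables), which keeps trailing empty fields. If `undef` is preferred for empty values, as in the desired structure posted elsewhere in this thread, one possible tweak is sketched below; `empty_to_undef` is a hypothetical helper name, not part of the code above.

```perl
use strict;
use warnings;

# In a two-variable list assignment, split gets an implicit LIMIT of 3,
# so the trailing empty field survives and $value is "" rather than undef.
my ( $key, $value ) = split /=/, 'Speed=';

# Assigned to an array (no implicit limit), trailing empty fields are dropped.
my @fields = split /=/, 'Speed=';    # a single element, "Speed"

# Hypothetical helper: turn an empty or missing value into undef.
sub empty_to_undef {
    my ($v) = @_;
    return ( defined $v && length $v ) ? $v : undef;
}

my %record = ( $key => empty_to_undef($value) );
```

    Wrapping `$value` in a helper like this at the two places where it is stored would be one way to get `undef` for empty values without changing the rest of the loop.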

    -- Ken

      Thanks!!
Re: Parse structured text using regex
by Anonymous Monk on Mar 16, 2014 at 20:52 UTC
      Thanks for help. I see that split needs to be in it somewhere. What I got from that is:
      [
        [["global_model", "CAR"]],
        [
          [
            "pairs",
            "\nMODEL",
            "Make=Mustang",
            "Quality=A",
            "Speed=\nMODEL",
            "Chevy=Cavalier\nMODEL\n",
            "Color=blue",
            "type=hatchback",
            "crashrating=spectacular\n",
            "explodes=true",
            "speed=",
            "storage=none,little\n",
          ],
        ],
      ]
      What I am looking for is this:
      {
        CAR => [
          { MODEL => { Make => "Mustang", Quality => "A", Speed => undef } },
          { MODEL => { Chevy => "Cavalier" } },
          {
            MODEL => {
              Color => "Blue",
              crashrating => "spectacular",
              explodes => "true",
              speed => undef,
              storage => "none,little",
              type => "hatchback",
            },
          },
        ],
      }
      Maybe I am missing something...?
        "What I am looking for is this: { ... }"

        That should have appeared in your OP, not here.

        You've been here almost two years and, by now, should have read "How do I post a question effectively?". Please follow its guidelines in any future postings.

        -- Ken