in reply to Parsing to a hash suggestions

With the exception of the token "physical" becoming "phy", which I assume to be a typo, the code below seems to do a good job of matching your requirements. The output at the top is from my program; at the bottom is your 'desired output', reformatted to match:

C:\test>874313
{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { value => "physical" },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
      Version => { value => "1.0" },
    },
    value => "\"foo\"",
  },
}

my $hash = {
  File => {
    Header => {
      Contents => { Flags => {}, Dictionary => {}, Properties => {} },
      Key => { value => "phy", },
      Precision => { Units => { value => "mil" }, Dec => { value => 1 } },
      Revision => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
      Version => { value => 1.0, },
    },
    value => "foo",
  }
};

Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), becomes a hash containing the key 'value' with that single element as its value: Key => { value => "physical" }, rather than Key => { physical => {} }.

Whereas a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
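For what it's worth, here is that fix-up logic (the final nested if block from the parser) pulled out into a standalone sub, just to show how the two cases differ; the name fixup is mine, for illustration only:

```perl
use strict;
use warnings;

## Sketch of the 'single, single' fix-up: a hash with exactly one key
## whose value is an empty hash is rewritten as { value => $key };
## anything else is left alone.
sub fixup {
    my $ref = shift;
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }
    return $ref;
}

## ( Key ( physical ) ) -- single, single-element value, gets rewritten
my $single = fixup( { physical => {} } );   ## { value => "physical" }

## ( Contents ( Flags ) ( Dictionary ) ( Properties ) ) -- left alone
my $multi = fixup( { Flags => {}, Dictionary => {}, Properties => {} } );
```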

Without that anomaly, the parse sub could be simplified by removing the final nested if block. It would then produce:

{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { physical => {} },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
      Version => { "1.0" => {} },
    },
    value => "\"foo\"",
  },
}

which is more consistent, and I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs.

The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
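If you ever need to accept input where the parens are not white-space delimited, e.g. (File "foo"(Header ...), a pre-processing substitution along these lines should cope; this is a sketch I have not tested against your real data:

```perl
use strict;
use warnings;

## Hypothetical pre-processing step for input whose parens are NOT
## white-space delimited: pad every paren with spaces, then collapse
## runs of white-space, exactly as the main program already does.
my $input = '(File "foo"(Header(Key(physical))))';

$input =~ s/([()])/ $1 /g;   ## make every paren a token of its own
$input =~ s/\s+/ /g;         ## collapse white-space, as in the main program

## $input is now ' ( File "foo" ( Header ( Key ( physical ) ) ) ) '
```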

The code:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

$|++;

sub seeNextToken {
    my( $next ) = $_[0] =~ m[\s*(\S+)];
    return $next;
}

sub getNextToken {
    $_[0] =~ s[\s*(\S+)\s+][] or die;
    return $1;
}

#my $depth = 0;

sub parse {
    local $^W;

    ## alias rather than copy the input, so that we can modify it
    our $in; local *in = \$_[0];

    my $ref = {};

    my $token = getNextToken( $in );
    die 'No open paren' unless $token eq '(';

    my $name = getNextToken( $in );

    my $value;
    if( seeNextToken( $in ) !~ '[()]' ) {
        $value = getNextToken( $in );
    }
    $ref->{ value } = $value if defined $value;

#    printf "%s n:$name v:$value (next:%s) in:$in\n", ' .' x $depth++, seeNextToken( $in );

    while( seeNextToken( $in ) eq '(' ) {
        my( $name, $value ) = parse( $in );
        $ref->{ $name } = $value;
    }
    die 'Missing close paren' unless getNextToken( $in ) eq ')';

    ## fix up the single, single anomaly
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }

#    --$depth;
    return $name, $ref;
}

my $input = do{ local $/; <DATA> };
$input =~ s[\s+][ ]gsm;

my $ref = { parse( $input ) };

pp $ref;

__DATA__
( File "foo"
    ( Header
        ( Key ( physical ) )
        ( Version ( 1.0 ) )
        ( Revision
            ( log 0 )
            ( phy 0 )
            ( oth 0 )
        )
        ( Contents
            ( Flags )
            ( Dictionary )
            ( Properties )
        )
        ( Precision
            ( Units mil )
            ( Dec 1 )
        )
    )
)


Re^2: Parsing to a hash suggestions
by chasdavies (Initiate) on Dec 01, 2010 at 20:59 UTC
    Thank you very much. This is exactly what I needed. I have tested it on a more robust sample of data and it worked flawlessly, albeit I did need to tweak it a bit. Many, many thanks, and I hope you have a great day.

    Best regards,
    Charlie