in reply to Parsing to a hash suggestions
With the exception of converting the token "physical" into "phy", which I assume to be a typo, the code below seems to do a good job of matching your requirements. The output at the top is from my program; at the bottom is your 'desired output', reformatted to match:
    C:\test>874313
    {
      File => {
        Header => {
          Contents => { Dictionary => {}, Flags => {}, Properties => {} },
          Key => { value => "physical" },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
          Version => { value => "1.0" },
        },
        value => "\"foo\"",
      },
    }

    my $hash = {
      File => {
        Header => {
          Contents => { Flags => {}, Dictionary => {}, Properties => {} },
          Key => { value => "phy", },
          Precision => { Units => { value => "mil" }, Dec => { value => 1 } }
          Revision => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
          Version => { value => 1.0, },
        }
        value => "foo",
      }
    };
Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), become a hash containing the key 'value' with the single element as its value: Key => { value => "physical", } rather than Key => { physical => {} }.
Whereas a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
Without that anomaly, the parse sub could be simplified by the removal of the final nested if block. It would then produce:
    {
      File => {
        Header => {
          Contents => { Dictionary => {}, Flags => {}, Properties => {} },
          Key => { physical => {} },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
          Version => { "1.0" => {} },
        },
        value => "\"foo\"",
      },
    }
which is more consistent, and I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs.
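To illustrate why the uniform shape may be easier to consume, here is a small sketch. The two hash literals are fragments copied by hand from the outputs above rather than produced by parse(), and the helper name leaf_names is hypothetical:

```perl
use strict;
use warnings;

# Fragments hand-copied from the two outputs above (assumption: they
# are representative of what the two variants of parse() return).
my $with_fixup = {
    Key      => { value => "physical" },
    Contents => { Flags => {}, Dictionary => {}, Properties => {} },
};
my $without_fixup = {
    Key      => { physical => {} },
    Contents => { Flags => {}, Dictionary => {}, Properties => {} },
};

# Hypothetical helper: return the bare tokens under a node, i.e. the
# keys whose values are empty hashes, in sorted order.
sub leaf_names {
    my( $node ) = @_;
    my @names = grep {
        ref $node->{ $_ } eq 'HASH' and not %{ $node->{ $_ } }
    } keys %$node;
    return sort @names;
}

print join( ',', leaf_names( $without_fixup->{ Key } ) ), "\n";
print join( ',', leaf_names( $with_fixup->{ Key } ) ), "\n";
```

Without the fix-up, the same helper handles Key and Contents alike. With the fix-up, it finds nothing under Key, so the caller would need a separate check for the 'value' key.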
The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
The code:
    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    $|++;

    sub seeNextToken {
        my( $next ) = $_[0] =~ m[\s*(\S+)];
        return $next;
    }

    sub getNextToken {
        $_[0] =~ s[\s*(\S+)\s+][] or die;
        return $1;
    }

    #my $depth = 0;
    sub parse {
        local $^W;
        ## alias rather than copy the input, so that we can modify it
        our $in; local *in = \$_[0];
        my $ref = {};

        my $token = getNextToken( $in );
        die 'No open paren' unless $token eq '(';

        my $name = getNextToken( $in );
        my $value;
        if( seeNextToken( $in ) !~ '[()]' ) {
            $value = getNextToken( $in );
        }
        $ref->{ value } = $value if defined $value;

    #    printf "%s n:$name v:$value (next:%s) in:$in\n", ' .' x $depth++, seeNextToken( $in );

        while( seeNextToken( $in ) eq '(' ) {
            my( $name, $value ) = parse( $in );
            $ref->{ $name } = $value;
        }
        die 'Missing close paren' unless getNextToken( $in ) eq ')';

        ## fix up the single, single anomaly
        if( keys( %$ref ) == 1 ) {
            my( $key, $value ) = each %$ref;
            if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
                delete $ref->{ $key };
                $ref->{ value } = $key;
            }
        }
    #    --$depth;
        return $name, $ref;
    }

    my $input = do{ local $/; <DATA> };
    $input =~ s[\s+][ ]gsm;
    my $ref = { parse( $input ) };
    pp $ref;

    __DATA__
    ( File "foo"
      ( Header
        ( Key ( physical ) )
        ( Version ( 1.0 ) )
        ( Revision ( log 0 ) ( phy 0 ) ( oth 0 ) )
        ( Contents ( Flags ) ( Dictionary ) ( Properties ) )
        ( Precision ( Units mil ) ( Dec 1 ) )
      )
    )
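Since the parser needs every token, parentheses included, to be white-space delimited, input such as (Key (physical)) would have to be padded before parsing. A minimal sketch of one way to do that, assuming no quoted string in the input contains a parenthesis (the sub name padParens is hypothetical and not part of the code above):

```perl
use strict;
use warnings;

# Hypothetical helper: put a space on each side of every paren, then
# squeeze runs of white-space down to single spaces.
# Assumption: parens never occur inside quoted strings.
sub padParens {
    my( $text ) = @_;
    $text =~ s/([()])/ $1 /g;
    $text =~ s/\s+/ /g;
    return $text;
}

print padParens( '(Key (physical))' ), "\n";
```

The trailing space that this leaves after the last close paren is deliberate: getNextToken() only matches a token that is followed by white-space.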
Re^2: Parsing to a hash suggestions
by chasdavies (Initiate) on Dec 01, 2010 at 20:59 UTC