in reply to Parsing to a hash suggestions

With the exception of the token "physical" becoming "phy", which I assume to be a typo, the code below seems to do a good job of matching your requirements. The output at the top is from my program; at the bottom is your 'desired output', reformatted to match:

C:\test>874313
{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { value => "physical" },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
      Version => { value => "1.0" },
    },
    value => "\"foo\"",
  },
}

my $hash = {
  File => {
    Header => {
      Contents => { Flags => {}, Dictionary => {}, Properties => {} },
      Key => { value => "phy", },
      Precision => { Units => { value => "mil" }, Dec => { value => 1 } },
      Revision => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
      Version => { value => 1.0, },
    },
    value => "foo",
  }
};

Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), becomes a hash containing the key 'value' with that single element as its value: Key => { value => "physical" }, rather than Key => { physical => {} }.

Whereas a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
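For what it's worth, here is that fix-up logic (the final nested if block from the parser) pulled out into a standalone sub, just to show how the two cases differ; the name fixup is mine, for illustration only:

```perl
use strict;
use warnings;

## Sketch of the 'single, single' fix-up: a hash with exactly one key
## whose value is an empty hash is rewritten as { value => $key };
## anything else is left alone.
sub fixup {
    my $ref = shift;
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }
    return $ref;
}

## ( Key ( physical ) ) -- single, single-element value, gets rewritten
my $single = fixup( { physical => {} } );   ## { value => "physical" }

## ( Contents ( Flags ) ( Dictionary ) ( Properties ) ) -- left alone
my $multi = fixup( { Flags => {}, Dictionary => {}, Properties => {} } );
```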

Without that anomaly, the parse sub could be simplified by removing the final nested if block. It would then produce:

{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { physical => {} },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
      Version => { "1.0" => {} },
    },
    value => "\"foo\"",
  },
}

which is more consistent, and I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs.

The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
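If you ever need to accept input where the parens are not white-space delimited, e.g. (File "foo"(Header ...), a pre-processing substitution along these lines should cope; this is a sketch I have not tested against your real data:

```perl
use strict;
use warnings;

## Hypothetical pre-processing step for input whose parens are NOT
## white-space delimited: pad every paren with spaces, then collapse
## runs of white-space, exactly as the main program already does.
my $input = '(File "foo"(Header(Key(physical))))';

$input =~ s/([()])/ $1 /g;   ## make every paren a token of its own
$input =~ s/\s+/ /g;         ## collapse white-space, as in the main program

## $input is now ' ( File "foo" ( Header ( Key ( physical ) ) ) ) '
```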

The code:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

$|++;

sub seeNextToken {
    my( $next ) = $_[0] =~ m[\s*(\S+)];
    return $next;
}

sub getNextToken {
    $_[0] =~ s[\s*(\S+)\s+][] or die;
    return $1;
}

#my $depth = 0;

sub parse {
    local $^W;

    ## alias rather than copy the input, so that we can modify it
    our $in; local *in = \$_[0];

    my $ref = {};

    my $token = getNextToken( $in );
    die 'No open paren' unless $token eq '(';

    my $name = getNextToken( $in );

    my $value;
    if( seeNextToken( $in ) !~ '[()]' ) {
        $value = getNextToken( $in );
    }
    $ref->{ value } = $value if defined $value;

#    printf "%s n:$name v:$value (next:%s) in:$in\n", ' .' x $depth++, seeNextToken( $in );

    while( seeNextToken( $in ) eq '(' ) {
        my( $name, $value ) = parse( $in );
        $ref->{ $name } = $value;
    }
    die 'Missing close paren' unless getNextToken( $in ) eq ')';

    ## fix up the single, single anomaly
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }

#    --$depth;
    return $name, $ref;
}

my $input = do{ local $/; <DATA> };
$input =~ s[\s+][ ]gsm;

my $ref = { parse( $input ) };

pp $ref;

__DATA__
( File "foo"
    ( Header
        ( Key ( physical ) )
        ( Version ( 1.0 ) )
        ( Revision
            ( log 0 )
            ( phy 0 )
            ( oth 0 )
        )
        ( Contents
            ( Flags )
            ( Dictionary )
            ( Properties )
        )
        ( Precision
            ( Units mil )
            ( Dec 1 )
        )
    )
)


Re^2: Parsing to a hash suggestions
by chasdavies (Initiate) on Dec 01, 2010 at 20:59 UTC
    Thank you very much. This is exactly what I needed. I have tested it on a more robust sample of data and it worked flawlessly, albeit I did need to tweak it a bit. Many, many thanks, and I hope you have a great day.

    Best regards,
    Charlie