PerlMonks  

Parse a file and store it in hash of hashes

by Sonali (Novice)
on Jan 16, 2017 at 04:59 UTC ( [id://1179628] : perlquestion )

Sonali has asked for the wisdom of the Perl Monks concerning the following question:

I want to parse a file in the format shown below and store its contents in a hash of hashes. Obviously a newbie.

    [CELL_NAME1]
    COMMENT = "Perl parsing"
    FIRST = "TEST1"
    SECOND = "ID1"
    THIRD = 123
    FOURTH = "THREE"
    FIFTH = 12345
    SIXTH = 6789
    SEVENTH = QWERTY
    [CELL_NAME2]
    COMMENT = "Tester"
    FIRST = "TEST2"
    SECOND = "ID2"
    THIRD = 1234
    FOURTH = "FOUR"
    FIFTH = 12345
    SIXTH = BOARD
    SEVENTH = MOUSE
    [CELL_NAME3]
    COMMENT = "Parser"
    FIRST = "TEST3"
    SECOND = "ID3"
    THIRD = 12345
    FOURTH = "FIVE"
    FIFTH = 12345
    SIXTH = PAD
    SEVENTH = KEY

My code goes like this

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    my $filename = 'tester.txt';
    my %HoH;
    my $key;
    my $value;
    open(my $fh, '<:encoding(UTF-8)', $filename) or die "Could not open file '$filename' $!";
    while ( <$fh> ) {
        next unless s/^\[(.*?)\]\s*//;
        my $rec = $1;
        for my $field ( split /\n/) {
            ($key, $value) = split /\s*=\s*/, $field;
            $HoH{$rec}{$key} = $value;
        }
    }
    print Dumper \%HoH;

Replies are listed 'Best First'.
Re: Parse a file and store it in hash of hashes
by afoken (Chancellor) on Jan 16, 2017 at 07:12 UTC

    Hi. You are aware that perlmonks is neither a code writing service nor a job exchange, aren't you?

    Show what you tried so far, and we'll help you with the remaining problems.

    A great part of using perl is using CPAN. The file format looks very much like a Windows INI file, and that's a solved problem. Go to http://search.cpan.org and search for "INI". You will find many modules that can handle those files. Follow the links to the module documentation and find the one that fits best. Then open a command prompt, and type cpan install Your::Favorite::INI::Module.

    If you insist on reinventing the wheel, but have no code yet, look up the documentation of strict, warnings, open, readline, split, autodie, and at least try to write a piece of code that reads the file line by line and splits the data line into key and value.

    As with most other "simple" computer problems: explain the problem in plain English, as you would for a very stupid human. As in: "Open the file foobar.ini. If that fails, stop. Else, read a line ...". From there, translating English to any computer language is quite easy.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      This is my code snippet. When I run this program there is no output at all. I am not able to figure out why.

          #!/usr/local/bin/perl
          use strict;
          use warnings;
          use Data::Dumper;
          my $filename = 'tester.txt';
          my %HoH;
          my $key;
          my $value;
          open(my $fh, '<:encoding(UTF-8)', $filename) or die "Could not open file '$filename' $!";
          while ( <$fh> ) {
              next unless s/^\[(.*?)\]\s*//;
              $rec = $1;
              for my $field ( split /\n/) {
                  ($key, $value) = split /\s*=\s*/, $field;
                  $HoH{$rec}{$key} = $value;
              }
          }
          print Dumper %HoH;

        I find this highly unlikely. When I run your code, I get the following output:

            Global symbol "$rec" requires explicit package name at q:\tmp.pl line 12.
            Global symbol "$rec" requires explicit package name at q:\tmp.pl line 15.
            Execution of q:\tmp.pl aborted due to compilation errors.

        If I declare $rec as a lexical variable and create an empty file named tester.txt, I get no output. This is because you're not using Data::Dumper properly:

        print Dumper %HoH; # should be print Dumper \%HoH;
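To make the difference visible, here is a small sketch (the hash contents are made up for illustration). Dumper takes a list of values and dumps each one separately, so a hash passed directly is flattened into its keys and values, and an empty hash becomes an empty list that produces no output at all:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order, easier to compare runs

my %empty;
my %HoH = ( CELL_NAME1 => { FIRST => '"TEST1"' } );    # made-up sample data

# An empty hash flattens to an empty list: Dumper returns an empty string.
my $flat_empty = Dumper %empty;

# A reference is a single value, so even an empty hash is shown.
my $ref_empty = Dumper \%empty;

# A non-empty hash passed directly is dumped as separate $VAR1, $VAR2, ...
my $flat = Dumper %HoH;

# Passing a reference keeps the whole structure together under one $VAR1.
my $ref = Dumper \%HoH;

print "flat, empty: [$flat_empty]\n";
print "ref, empty:  $ref_empty";
print "flat:\n$flat";
print "ref:\n$ref";
```

So "no output" with Dumper %HoH only tells you that the hash is empty; it does not tell you why.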

        Please post the actual code you are using.

        Also, look at Config::IniFiles, which does all of what you're doing already.

        Hello Sonali and welcome to the monastery and to the wonderful world of Perl!

        First of all follow the wise suggestions of the precise monk afoken.

        That said, with the code you posted, and in particular $rec = $1, I get the error Global symbol "$rec" requires explicit package name at pm16012017.pl line 12. That is probably a typo, though.

        In addition, I think you just need a hash, not a hash of hashes.

        Now about your code: if next unless s/^\[(.*?)\]\s*//; is intended to skip the first line, it should probably be: next if s/^\[(.*?)\]\s*//;

        Even with this you get warnings about undefined values (Use of uninitialized value in hash element at inifile16012017.pl line 15) for each line of data, and the following data structure:

            $VAR1 = '';
            $VAR2 = {
                      'FIFTH' => '12345',
                      'COMMENT' => '"Perl parsing"',
                      'SEVENTH' => 'QWERTY',
                      'FOURTH' => '"RANDOM"',
                      'SECOND' => '"ID"',
                      'FIRST' => '"TEST"',
                      'THIRD' => '123',
                      'SIXTH' => '6789'
                    };

        If you intended to have CELL_NAME as the root element, you must not skip the line that contains it, and $rec has to be declared outside the loop so that it stays available for the data lines that follow:

            my $rec;
            while ( <$fh> ) {
                if (s/^\[(.*?)\]\s*//){$rec = $1}

        The resulting data structure (dumped with the prettier dd function from Data::Dump) will be:

            (
              "CELL_NAME",
              {
                COMMENT => "\"Perl parsing\"",
                FIFTH   => 12345,
                FIRST   => "\"TEST\"",
                FOURTH  => "\"RANDOM\"",
                SECOND  => "\"ID\"",
                SEVENTH => "QWERTY",
                SIXTH   => 6789,
                THIRD   => 123,
              },
            )
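Putting these pieces together, one possible complete version is sketched below. It is not the only way to do it. To keep it runnable as-is, it reads from an in-memory string rather than tester.txt, and the shortened sample data is an assumption made for illustration:

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample data held in a string so the sketch is self-contained;
# the real script would open tester.txt instead.
my $data = <<'END';
[CELL_NAME1]
COMMENT = "Perl parsing"
THIRD = 123

[CELL_NAME2]
COMMENT = "Tester"
SIXTH = BOARD
END

open( my $fh, '<', \$data ) or die "Could not open in-memory file: $!";

my %HoH;
my $rec;    # declared outside the loop so it survives from line to line
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^\[(.*?)\]/ ) {    # section header: remember its name
        $rec = $1;
        next;
    }
    next unless defined $rec;    # ignore anything before the first header
    next unless $line =~ /=/;    # skip blank or malformed lines
    # Limit of 2 keeps any "=" inside the value intact.
    my ( $key, $value ) = split /\s*=\s*/, $line, 2;
    $HoH{$rec}{$key} = $value;
}
close $fh;

$Data::Dumper::Sortkeys = 1;    # predictable output order
print Dumper \%HoH;
```

The important changes from the original are that header lines set $rec instead of being the only lines processed, and that Dumper receives a reference.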

        L*

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Parse a file and store it in hash of hashes
by 1nickt (Canon) on Jan 16, 2017 at 14:09 UTC

    Hello Sonali,

    I appreciate that you are trying to learn how to do some basics in Perl, and you want to understand how things work. But one of the very best reasons to use a CPAN module for a common task is that it has probably considered all the "edge cases" that you might encounter in your data. Reputable CPAN modules come with a test suite that demonstrates this. So the risk of writing your own solution is that you may miss a special case, and you won't have a test for it to reveal your error.
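As a rough illustration of what such tests could look like for a hand-rolled parser, here is a minimal sketch using the core Test::More module. The parse_cells function is hypothetical, written only so there is something to test, and the edge cases are examples rather than a complete list:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical hand-rolled parser, written only to have something to test.
sub parse_cells {
    my ($text) = @_;
    my ( %HoH, $rec );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\[(.*?)\]/ ) {
            $rec = $1;    # section header starts a new record
        }
        elsif ( defined $rec and $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/ ) {
            $HoH{$rec}{$1} = $2;    # key = value inside the current record
        }
    }
    return \%HoH;
}

# Each test pins down one edge case that is easy to overlook.
is_deeply( parse_cells(qq{[A]\nK = "x = y"\n}), { A => { K => '"x = y"' } },
    'value containing an equals sign is kept whole' );
is_deeply( parse_cells("K = 1\n[A]\nK = 2\n"), { A => { K => '2' } },
    'lines before the first header are ignored' );
is_deeply( parse_cells("[A]\n\nK = 1  \n"), { A => { K => '1' } },
    'blank lines and trailing spaces are tolerated' );
is_deeply( parse_cells(''), {}, 'empty input gives an empty hash' );

done_testing();
```

A CPAN module's test suite plays the same role, only with many more cases than most of us would think of on a first attempt.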

    At the least you should compare the results you get with the results from another processor. Here is a solution using Config::Tiny::Ordered:

        use strict; use warnings;
        use Config::Tiny::Ordered;

        my $file = '1179628.txt';
        my $config = Config::Tiny::Ordered->read( $file );

        foreach my $section_name( sort keys %{ $config } ) {
            print "SECTION: $section_name\n";
            foreach my $item( @{ $config->{ $section_name } } ) {
                printf ( " %7s : %s \n", $item->{'key'}, $item->{'value'} );
            }
            print "\n";
        }
        __END__
    Output:
        $ perl 1179628.pl
        SECTION: CELL_NAME1
         COMMENT : "Perl parsing"
           FIRST : "TEST1"
          SECOND : "ID1"
           THIRD : 123
          FOURTH : "THREE"
           FIFTH : 12345
           SIXTH : 6789
         SEVENTH : QWERTY

        SECTION: CELL_NAME2
         COMMENT : "Tester"
           FIRST : "TEST2"
          SECOND : "ID2"
           THIRD : 1234
          FOURTH : "FOUR"
           FIFTH : 12345
           SIXTH : BOARD
         SEVENTH : MOUSE

        SECTION: CELL_NAME3
         COMMENT : "Parser"
           FIRST : "TEST3"
          SECOND : "ID3"
           THIRD : 12345
          FOURTH : "FIVE"
           FIFTH : 12345
           SIXTH : PAD
         SEVENTH : KEY
    Hope this helps!


    The way forward always starts with a minimal test.

      Yes I tried it and it is way easier. Thank you!

Re: Parse a file and store it in hash of hashes
by rahulruns (Scribe) on Jan 16, 2017 at 10:13 UTC

    A few ideas: your data would be better organized as XML than as a plain file. You could then parse the XML and store each element name as a key and that element's content as its value. If you want to stay with a plain file, you could split each line on the = sign and store the parts in a hash, something like:

        # Read each line and split it into key and value at the "=" sign
        while ( my $line = <$fh> ) {
            my ($key, $value) = split /=/, $line;
            $hash{$key} = $value;
        }
    Remember this is not complete code; it is just to give you a hint.

      No, I don't want it in XML format. Thanks for the hint!

Re: Parse a file and store it in hash of hashes
by tybalt89 (Monsignor) on Jan 16, 2017 at 20:17 UTC

    I find YAML is usually easier to read than Data::Dumper for debugging structures.

        #!/usr/bin/perl

        # http://perlmonks.org/?node_id=1179628

        use strict;
        use warnings;

        my $rec;
        my %HoH;
        while(<DATA>) {
            if( /^\[(.*?)\]/ ) {
                $rec = $1;
            }
            elsif( defined $rec and /(\S+)\s*=\s*(".*"|\S+)/ ) {
                $HoH{$rec}{$1} = $2;
            }
        }

        use YAML;
        print Dump \%HoH;

        __DATA__
        [CELL_NAME1]
        COMMENT = "Perl parsing"
        FIRST = "TEST1"
        SECOND = "ID1"
        THIRD = 123
        FOURTH = "THREE"
        FIFTH = 12345
        SIXTH = 6789
        SEVENTH = QWERTY
        [CELL_NAME2]
        COMMENT = "Tester"
        FIRST = "TEST2"
        SECOND = "ID2"
        THIRD = 1234
        FOURTH = "FOUR"
        FIFTH = 12345
        SIXTH = BOARD
        SEVENTH = MOUSE
        [CELL_NAME3]
        COMMENT = "Parser"
        FIRST = "TEST3"
        SECOND = "ID3"
        THIRD = 12345
        FOURTH = "FIVE"
        FIFTH = 12345
        SIXTH = PAD
        SEVENTH = KEY