Parse data representing hash

peterp has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parse data representing hash by AppleFritter (Vicar) on Jun 28, 2014 at 22:22 UTC
Some advice, as requested: Keep track of the current "chain of keys" as you progress through the lines. For each line, use the number of tabs to determine how many of the previously-saved keys you need to keep. Truncate your chain, use it to add a new entry to your hash at the right spot with the newly-read key, then add that key to your chain. So your script would work like this: Read line 1. `@keychain` is empty. Zero tabs, so you truncate it to zero entries (a no-op, coincidentally). Progress through `$hash` along your `@keychain`; since it's empty, you're still at the root. Add an entry for 'one' to the hash, add 'one' to `@keychain`. Read line 2. `@keychain` contains 'one'. One tab, so you truncate it to one entry (another no-op). Progress through `$hash` along your `@keychain`; you'll end up at `$hash->{'one'}`. Add an entry for 'two' to the hash, add 'two' to `@keychain`. Read line 3. `@keychain` contains 'one' and 'two'. One tab, so you truncate it to one entry (NOT a no-op this time). Progress through `$hash` along your `@keychain`; you'll end up at `$hash->{'one'}` again. Add an entry for 'three' to the hash, add 'three' to `@keychain`. Read line 4. `@keychain` contains 'one' and 'three'. And so on... You get the idea - this is how I'd approach this.
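A minimal sketch of the steps described above, not code from the thread. It assumes one leading tab per nesting level and well-formed input (depth never jumps by more than one level); the `@lines` array is a hypothetical stand-in for the real input.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample input: one leading tab per nesting level.
my @lines = ("one", "\ttwo", "\tthree", "\t\tfour", "five");

my %hash;
my @keychain;    # keys leading from the root to the most recently added entry

for my $line (@lines) {
    # Split the line into its leading tabs and the key; skip empty lines.
    my ($tabs, $key) = $line =~ /^(\t*)(.+)$/ or next;
    my $depth = length $tabs;

    # Keep only as many saved keys as this line has tabs.
    splice @keychain, $depth;

    # Walk from the root along the remaining chain to reach the parent,
    # turning an undef leaf into a hashref when it gains its first child.
    my $node = \%hash;
    $node = $node->{$_} //= {} for @keychain;

    $node->{$key} = undef;    # new entries start out as leaves
    push @keychain, $key;
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%hash;
```

Walking from the root on every line keeps the state down to a plain list of keys, at the cost of one traversal per input line.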
Re^2: Parse data representing hash by peterp (Sexton) on Jun 28, 2014 at 22:44 UTC
Thank you for your advice; it agrees with what others have said about maintaining state in @keychain. When truncating, I suppose the best approach would be something along the lines of `splice @keychain, $tabs_count;` (update: never mind this question, the example provided by choroba covers this)
Re^3: Parse data representing hash by AppleFritter (Vicar) on Jun 29, 2014 at 09:40 UTC
Yes, `splice` is likely the best approach. From its documentation: "Removes the elements designated by OFFSET and LENGTH from an array [...] If LENGTH is omitted, removes everything from OFFSET onward." So with zero-based array indexing, it really is as simple as `splice @keychain, $tabs_count;`, yes. You could also use an array slice, BTW, e.g. `@keychain = @keychain[0 .. ($tabs_count - 1)];`, but that's less elegant and idiomatic.
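For comparison, a small sketch of the truncation idioms side by side. The sample arrays and the value of `$tabs_count` are made up for illustration; the third form, assigning to `$#array`, is the one used in hdb's reply elsewhere in this thread.

```perl
use strict;
use warnings;

my $tabs_count = 1;    # hypothetical depth of the current line

# splice with only an OFFSET removes everything from that index onward.
my @by_splice = qw(one two three);
splice @by_splice, $tabs_count;

# An array slice keeps indices 0 .. $tabs_count - 1 and reassigns them.
my @by_slice = qw(one two three);
@by_slice = @by_slice[0 .. $tabs_count - 1];

# Assigning to $#array sets the last valid index directly.
my @by_last_index = qw(one two three);
$#by_last_index = $tabs_count - 1;

print "@by_splice | @by_slice | @by_last_index\n";
```

All three leave only the first `$tabs_count` elements in place; `splice` and `$#array` modify the array in place, while the slice builds a new list and assigns it back.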
Re^4: Parse data representing hash by peterp (Sexton) on Jun 29, 2014 at 18:51 UTC
Re^5: Parse data representing hash by AppleFritter (Vicar) on Jun 29, 2014 at 20:08 UTC
Re: Parse data representing hash by choroba (Cardinal) on Jun 28, 2014 at 22:05 UTC
Instead of recursion, you can reach for Data::Diver and its `DiveRef`. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Parse data representing hash by peterp (Sexton) on Jun 28, 2014 at 22:28 UTC
Hi, Thank you very much for your suggestion. I'm unable to test right now since I'm on holiday using an online compiler, but it looks like exactly what I needed. My data is actually slightly more complicated than the example I provided, since each row has additional information, so I was planning on eventually using the following design: `$hash->{one}->{children}->{two}...`, `$hash->{one}->{data}->{url} = 'example.com'` etc. to parse the row "one|url=>example.com". I will read the documentation carefully to see if this is possible. Thanks again
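A rough sketch of how that `children`/`data` layout might be populated. The row format is an assumption: a key followed by zero or more `|name=>value` fields, based only on the single example row above. `parse_row` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split "key|name=>value|..." into the key and a hashref
# of its fields. The field syntax is assumed from one example row.
sub parse_row {
    my ($row) = @_;
    my ($key, @fields) = split /\|/, $row;
    my %data = map {
        my ($name, $value) = split /=>/, $_, 2;
        ($name => $value);
    } @fields;
    return ($key, \%data);
}

my $hash = {};

# Each node keeps its own attributes under {data} and its sub-entries under {children}.
my ($key, $data) = parse_row('one|url=>example.com');
$hash->{$key} = { data => $data, children => {} };

# A child row without extra fields gets an empty {data} hash.
my ($child_key, $child_data) = parse_row('two');
$hash->{$key}{children}{$child_key} = { data => $child_data, children => {} };

print "$hash->{one}{data}{url}\n";
```

Keeping `data` and `children` in separate sub-hashes means a child key can never collide with an attribute name.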
Re^2: Parse data representing hash by Anonymous Monk on Jun 28, 2014 at 22:30 UTC
To keep it a hash of hashes even when the data contains lines that are numbers (which `DiveRef` would otherwise treat as array indices), pass references to the keys: `DiveRef($hash, \(@path) );`
Re: Parse data representing hash by LanX (Saint) on Jun 28, 2014 at 22:00 UTC
No need for recursion. Your HoHoH... always has hashrefs or undef as values. You need one state var: an @path array holding the refs up to the last value so far. Whenever you parse the indentation, you get the index of the next entry in @path, and the "parent" entry points to the hash you need to extend. If the index is smaller, you need to shorten @path; if it is bigger, you have to extend @path and transform the last undef into a hashref. I hope you get the idea... I'm mobile so no chance for tested code, but I'm sure the archive has many examples. Cheers Rolf (addicted to the Perl Programming Language)
Re^2: Parse data representing hash by peterp (Sexton) on Jun 28, 2014 at 22:21 UTC
Thank you, This information is very useful. If I understand correctly, your suggested design is much like what choroba has suggested below, which appears to build the state into an array and pass this to the DiveRef function. Regards, Chris
Re^3: Parse data representing hash by LanX (Saint) on Jun 28, 2014 at 22:45 UTC
Similar but not identical from what I see. I'd rather keep the values° in @path; choroba keeps the keys. This way I don't need any dive function: the ref of the hash to be extended is already available (or must be autovivified if undef). Update: the only complication is that you prefer undef instead of an empty hash for the leaves of your HoH tree, which leads to a test condition in edge cases. Cheers Rolf (addicted to the Perl Programming Language) °) or even the refs of the values
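One way this could look, as an untested-by-the-author sketch rather than LanX's actual code. It keeps references to the value slots in `@path`, so an undef leaf can be promoted to a hashref when its first child arrives. It assumes four spaces per level and that depth never increases by more than one level at a time; `@lines` is hypothetical sample input.

```perl
use strict;
use warnings;

# Hypothetical sample input: four spaces per nesting level.
my @lines = ("one", "    two", "    three", "        four", "five");

my $root;                # becomes the top-level hashref
my @path = (\$root);     # $path[$d] references the slot holding the hash at depth $d

for my $line (@lines) {
    my ($indent, $key) = $line =~ /^( *)(\S.*)$/ or next;
    my $depth = length($indent) / 4;

    $#path = $depth;             # shorten @path when moving back up the tree
    my $slot = $path[$depth];    # reference to the parent's value
    $$slot //= {};               # promote an undef leaf to a hashref on demand
    $$slot->{$key} = undef;      # new entries start out as undef leaves

    # Remember where this entry's children would have to be attached.
    $path[$depth + 1] = \$$slot->{$key};
}

print join(',', sort keys %$root), "\n";
```

Because `@path` stores references to the slots rather than the hashes themselves, the "undef for leaves" preference only costs the single `//=` test.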
Re^4: Parse data representing hash by peterp (Sexton) on Jun 28, 2014 at 23:21 UTC
Re: Parse data representing hash by hdb (Monsignor) on Jun 29, 2014 at 17:23 UTC
Here is my humble attempt to solve your interesting puzzle: `use strict; use warnings; use Data::Dumper; my %hash; my @row; while(<DATA>){ my ($level, $word) = /^(\s*)(\w+)$/; $level = length($level)/4; $row[$level] = $word; $#row = $level; my $last = \%hash; for (0..$#row) { $last->{$row[$_]} = $_ == $#row ? undef : {} if not defined $last->{$row[$_]}; $last = $last->{$row[$_]}; } } print Dumper \%hash; __DATA__ one two three four five six`
Re^2: Parse data representing hash by peterp (Sexton) on Jun 29, 2014 at 18:47 UTC
Thank you very much for providing your attempt. It's very similar to my alternative version to using Data::Diver, but has provided some insight into ways I can improve. Notably, the `$#array = $index` syntax is new to me and is a nice alternative to using splice or a slice. Also, I prefer your for loop over my recursive function to construct the resultant hash. Regards.
Re: Parse data representing hash by ww (Archbishop) on Jun 28, 2014 at 21:47 UTC
"I understand its common courtesy to provide what I have tried so far...." Well, 'courtesy' is only part of it. The more important part grows out of the Monastery's reason for existence: to help you learn. It's hard to know where your issues are, i.e. what you may need to know/learn more about, if you don't show us your attempt(s) (boiled down to a concise failure case) and the exact warning, error and/or (output from your code + explanation of how that's not what you intended). No downvote from me, in this case, but please heed this advice. *Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by: code; verbatim error and/or warning messages; a coherent explanation of what "doesn't work" actually means.* check Ln42!
Re^2: Parse data representing hash by peterp (Sexton) on Jun 28, 2014 at 22:13 UTC
Hi, This is my code as it stands. I know it doesn't work and I know why it doesn't work. It basically represents the foundations of my best attempt. Chris `use strict; use warnings; use Data::Dumper; my $rows = [ ]; while ( <DATA> ) { chomp; my $depth = $_ =~ s/\s{4}//g; $depth ||= 0; push @$rows, { depth => $depth, key => $_ }; } my $ref = process( { }, $rows ); print Dumper $ref; sub process { my ( $ref, $rows ) = @_; my $current_row = shift @$rows; my $next_row = $rows->[0] // return $ref; my $current_key = $current_row->{key}; my $current_depth = $current_row->{depth}; my $next_key = $next_row->{key}; my $next_depth = $next_row->{depth}; print "$current_key, $current_depth, $next_key, $next_depth\n"; $ref = $ref->{$current_key} = { }; if ( $current_depth > $next_depth ) { return $ref; } elsif ( $current_depth < $next_depth ) { $ref = process( $ref, $rows ); } return $ref; } __DATA__ one two three four five six`
Re: Parse data representing hash by remiah (Hermit) on Jun 30, 2014 at 00:39 UTC
I was thinking of another attempt to solve this puzzle: making hash notation text and eval'ing it. This is nothing better than hdb's one, maybe ... `use strict; use warnings; use Data::Dumper; sub proc{ my($pre, $cnt_cur, $end_brackets)=@_; my $buff; $buff = $pre->{data} ; if ($cnt_cur > $pre->{cnt}){ #next is children $buff .=' => {'; push(@$end_brackets, '}'); }elsif ( $cnt_cur == $pre->{cnt} ){ #next is brothers $buff .= ' => undef ,'; } else { $buff .= ' => undef ,'; for ( 1 .. ($pre->{cnt} - $cnt_cur) ){ #output right bracket till that depth $buff .= pop(@$end_brackets) . ","; } } return $buff; } my ($pre, $cnt, $ret, @end_brackets); $ret='{'; while(<DATA>){ $cnt = $_ =~ s/\s{4}//g; $cnt = $cnt || 0; if ( $pre ){ $ret .= proc($pre, $cnt, \@end_brackets); } $pre={cnt => $cnt, data=>$_}; } $ret .= proc($pre, 0, \@end_brackets); $ret .= '}'; print Dumper eval($ret); __DATA__ one two three four three_another four_another five six seven eight nine ten 11 12 13 14 15 16 17 18 19 20 21 22 23 24`
Re^2: Parse data representing hash by peterp (Sexton) on Jun 30, 2014 at 06:42 UTC
Thanks for providing your solution to the problem. I found it very interesting, particularly as you have taken a different approach from the others by building a string and then evaluating it into the data structure it represents. It works absolutely fine for the example data I provided, although my real-world keys contain special characters such as { and }, so I had to adjust to `$buff = "'$pre->{data}'";`. As a side effect of doing this, I also had to chomp each row. The only other difference from the other solutions I noted when running with my real-world data is that the data isn't entirely regular, i.e. some rows unfortunately have extra spacing (a stupid mistake in the code that generated the data), e.g. `one two three`. Other solutions unwittingly accounted for this by creating an empty '' key on an intermediate level, although they output "Use of uninitialized value in hash element" warnings, which I suppressed by ensuring the undefined key defaulted to ''. I haven't yet understood your code well enough to explain why, but under this scenario your code breaks and generates an invalid data structure. To fix this issue for all solutions, I think I will have to adjust the way depth is calculated: perhaps compare the current row to the previous one and check whether the gap is bigger or smaller (the complexity is in how much smaller), as opposed to assuming there will be a change of either +4, 0 or a multiple of -4 spaces, or at least keep track of where the extra spacing occurs and account for it. My real-world data isn't irregular enough to make it impossible to decipher which parent level to return to; the extra spacing consistently occurs at particular levels. Perhaps it will be best simply to clean up the data programmatically before processing! Thanks again
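One possible way to compute depth by comparing each row with the rows before it, as a sketch only. It keeps a stack of the indentation widths of the currently open ancestors instead of dividing by four. `make_depth_tracker` and the sample `@lines` are hypothetical, and whether this rule matches the real data's irregularities is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure mapping a raw indentation width to a depth.
sub make_depth_tracker {
    my @widths;    # indentation widths of the currently open ancestors
    return sub {
        my ($width) = @_;
        # Close every level that is indented at least as far as this line.
        pop @widths while @widths && $widths[-1] >= $width;
        push @widths, $width;
        return $#widths;    # 0 for top-level lines
    };
}

my $depth_of = make_depth_tracker();

# Hypothetical irregular input: some rows carry one extra space.
my @lines = ("one", "     two", "     three", "         four", "    five", "six");

my @depths;
for my $line (@lines) {
    my ($indent) = $line =~ /^( *)/;
    push @depths, $depth_of->(length $indent);
}

print "@depths\n";
```

With this rule a row becomes a child of the nearest preceding row that is indented less, so the exact number of spaces per level no longer matters. Cleaning the data up front, as suggested above, remains the simpler option if the irregularities are few.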
