Perl parse text file using hash

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Perl parse text file using hash by hv (Prior) on Dec 20, 2022 at 18:09 UTC
I'm not sure why some are reacting negatively; I reckon you've made a pretty good first post by showing your code and the example input. Any expert will see immediately that it won't compile, so I'm not worried that you didn't tell us that. The code is also written very cleanly and mostly has the right ideas.

The important thing you didn't tell us was what output you actually want. It appears you want some output for each fruit, but what output do you want for "apple", which appears twice? And is the intention here to solve a specific problem, or to have a framework that you can use to get other types of output that may be filtered by the name or the path?

If you are looking for a general framework, then I suggest creating a data structure for each line containing all the data for that line, then linking to that in various ways. That might look something like this:

```perl
use strict;
use warnings;

my $file = "/path/to/config/config.txt";
my(%by_name, %by_fruit, %by_path);
for my $data (parse_file($file)) {
    # Create lookups for each data item returned.
    # We assume any data can appear more than once, so we
    # store an array(ref) of data structures for each match.
    push @{ $by_name{$data->{name}} }, $data;
    push @{ $by_fruit{$data->{fruit}} }, $data;
    push @{ $by_path{$data->{path}} }, $data;
}

# now let us show all the paths by fruit:
for my $fruit (sort keys %by_fruit) {
    print "Fruit $fruit:\n";
    # $by_fruit{$fruit} is now an arrayref of data structures
    # matching 'fruit eq $fruit'
    for my $data (@{ $by_fruit{$fruit} }) {
        print "  $data->{path}\n";
    }
}

# Return a list of data structures, one for each line of
# the specified file that contains data.
# Blank lines and comment lines are ignored.
sub parse_file {
    my $file = shift;
    open(my $fh, "<", $file)
        or die "Can't open < $file: $!";
    return map {
        if (/^#/ || /^\s*$/) {
            ();    # no data on this line
        } else {
            my($name, $fruit, $path) = split ' ', $_;
            # Return a data structure as a hash reference
            {
                name => $name,
                fruit => $fruit,
                path => $path,
            };
        }
    } <$fh>;
    # Note that the filehandle $fh is automatically closed when the
    # variable goes out of scope, on return from this function.
}
```

Note that if your needs become more complex, this data structure could easily be upgraded to an object as part of an object-oriented solution.
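The idea of one record per line, indexed in several ways, can be tried without a config file. The following is only a minimal sketch: the in-memory `$text`, the `@records` array and `@apple_owners` are illustrative names, and the sample records are the ones quoted elsewhere in this thread rather than the original poster's actual file. It shows why each lookup holds an array reference: "apple" matches two records.

```perl
use strict;
use warnings;

# Hypothetical sample input (records as quoted in this thread).
my $text = <<'EOF';
Albert apple /path/to/somewhere/a
# a comment line
Jack pineapple /path/to/somewhere/b

Jack apple /path/to/somewhere/c
Dex jackfruit /path/to/somewhere/d
EOF

# One hash reference per data line; comments and blank lines are skipped.
my @records;
for my $line (split /\n/, $text) {
    next if $line =~ /^#/ || $line =~ /^\s*$/;
    my ($name, $fruit, $path) = split ' ', $line;
    push @records, { name => $name, fruit => $fruit, path => $path };
}

# Index the same records by fruit; each value is an array reference.
my %by_fruit;
push @{ $by_fruit{ $_->{fruit} } }, $_ for @records;

# "apple" appears twice, so its lookup holds two records.
my @apple_owners = map { $_->{name} } @{ $by_fruit{apple} };
print "apple: @apple_owners\n";
```

Because every index points at the same hash references, changing a record through one lookup is visible through the others.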
Re: Perl parse text file using hash by kcott (Archbishop) on Dec 21, 2022 at 03:46 UTC
There are some elements of your code which suggest that you haven't fully grasped Perl data structures. The main ones that struck me originally were:

```perl
$info{$name}{$fruit}{path} = $p;
$path=$info{$name}{$fruit}{path};
$name=$info{name}{$fruit}{$path};
```

Notice that in some places you have `'name'` instead of `$name`, and `'path'` instead of `$path`. Take a look at the "Data extracted:", from my output below, to gain a bit more insight into this. Also, I recommend that you read "perldsc - Perl Data Structures Cookbook".

Your input data is idealised. It contains nothing to exercise the code that skips comments and blank lines. I've added some additional records:

```
$ cat test_input.txt
Albert apple /path/to/somewhere/a
# next line is <TAB><TAB><NL>

Jack pineapple /path/to/somewhere/b
# next line is only space characters

Jack apple /path/to/somewhere/c
# Comments only work when # is first character!
Dex jackfruit /path/to/somewhere/d
# next line is blank

```

And, to reveal the types of whitespace (`^I` is a tab; `$` is a newline):

```
$ cat -vet test_input.txt
Albert apple /path/to/somewhere/a$
# next line is <TAB><TAB><NL>$
^I^I$
Jack pineapple /path/to/somewhere/b$
# next line is only space characters$
   $
Jack apple /path/to/somewhere/c$
# Comments only work when # is first character!$
Dex jackfruit /path/to/somewhere/d$
# next line is blank$
$
```

This would have alerted you to a problem in your regex (`\s+` should be `\s*`).

When you've got the basic code working, you should add new records with the wrong number of fields, and validation code to deal with such. There may be any number of other checks you may wish to implement; for instance, using pathname as an example: is the format valid? is it a real file? can it be read? is it the right type of file? and so on.

There's another issue with your input file format. Real names and real fruits can contain spaces. How do you deal with that? A tab-separated CSV file, or similar, might be a better option than plain text.

You haven't shown any expected output, so we don't know what you want. I've already pointed out problems with "`$path=...`" and "`$name=...`". Also, trying to access that data using a list of fruits seems very strange. Knowing what output you wanted would put us in a better position to steer you towards a solution.

Here's some code to get you started:

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $input_file = 'test_input.txt';

my $info = parse_input_file($input_file);

#get $address for each line in input file
#   Assume "path", not "address".
#   Will only be for each "processed" line:
#   no comments or blanks.
#get $name for each line in input file
#   Again, will only be for each "processed" line:
#   no comments or blanks.
# You also seemed to want a list of fruits.

my (@names, @fruits, @paths, %seen);

for my $name (keys %$info) {
    push @names, $name;
    FRUIT: for my $fruit (keys %{$info->{$name}}) {
        push @paths, $info->{$name}{$fruit};
        next FRUIT if $seen{$fruit}++;
        push @fruits, $fruit;
    }
}

# TODO - for demo only; remove for production
use Data::Dump;
print "Data extracted:\n";
dd $info;
print "Names:\n";
dd \@names;
print "Fruits:\n";
dd \@fruits;
print "Paths:\n";
dd \@paths;

sub parse_input_file {
    my ($file) = @_;

    my $info = {};

    {
        open my $fh, '<', $file;

        while (<$fh>) {
            next if /^(?:#|\s*$)/;
            my ($name, $fruit, $path) = split;
            $info->{$name}{$fruit} = $path;
        }
    }

    return $info;
}
```

Here's the output from a sample run:

```
Data extracted:
{
  Albert => { apple => "/path/to/somewhere/a" },
  Dex => { jackfruit => "/path/to/somewhere/d" },
  Jack => {
    apple => "/path/to/somewhere/c",
    pineapple => "/path/to/somewhere/b",
  },
}
Names:
["Jack", "Dex", "Albert"]
Fruits:
["apple", "pineapple", "jackfruit"]
Paths:
[
  "/path/to/somewhere/c",
  "/path/to/somewhere/b",
  "/path/to/somewhere/d",
  "/path/to/somewhere/a",
]
```

Also, consider registering a username. There are currently over 100,000 posts by "Anonymous Monk"; for now, you're easily lost in a very large crowd. There are other benefits, such as being able to edit your posts. It's very simple: see "Create A New User".

-- Ken
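The suggestion to validate the number of fields could be sketched as follows. This is an illustration under stated assumptions, not Ken's code: `parse_lines` is a hypothetical helper, the three-field rule is inferred from the sample records, and the error message wording is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of input lines and returns
# (\%info, \@errors). Lines without exactly 3 fields are reported
# and skipped rather than stored.
sub parse_lines {
    my @lines = @_;
    my (%info, @errors);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        next if $line =~ /^(?:#|\s*$)/;    # skip comments and blank lines
        my @fields = split ' ', $line;
        if (@fields != 3) {
            push @errors,
                "line $line_no: expected 3 fields, got " . scalar @fields;
            next;
        }
        my ($name, $fruit, $path) = @fields;
        $info{$name}{$fruit} = $path;
    }
    return (\%info, \@errors);
}

my ($info, $errors) = parse_lines(
    "Albert apple /path/to/somewhere/a\n",
    "# comment\n",
    "Jack pineapple\n",                        # too few fields
    "Dex jack fruit /path/to/somewhere/d\n",   # space inside the fruit name
);
print "$_\n" for @$errors;
```

The last sample line also illustrates the point about spaces in names: with whitespace as the separator, "jack fruit" is indistinguishable from an extra field.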
Re: Perl parse text file using hash by Tux (Canon) on Dec 20, 2022 at 16:37 UTC
Looking at the original code, a few things stand out:

- Re-use of variable `$file`. Not wrong, but very confusing, certainly because you do use the (undeclared) `@fruits` filled in `parse`
- In the `foreach` loop, you use variables `$name` and `$path` that are never set (in that context)
- The first argument to `split` is a regex unless you want paragraph mode (and you don't)

Rewriting your program to show the data you got:

```perl
#!/usr/bin/perl

use 5.014002;
use warnings;

my @fruits;
my %info = parse ("test.txt");
# %info now holds:
# { Albert => {
#       apple => {
#           path => '/path/to/somewhere/a'
#       }
#   },
#   Dex => {
#       jackfruit => {
#           path => '/path/to/somewhere/d'
#       }
#   },
#   Jack => {
#       apple => {
#           path => '/path/to/somewhere/c'
#       },
#       pineapple => {
#           path => '/path/to/somewhere/b'
#       }
#   }
# }

foreach my $name (sort keys %info) {
    foreach my $fruit (sort @fruits) {
        printf "%-7s %-12s %s\n", $name, $fruit, $info{$name}{$fruit}{path} // "-";
    }
}

sub parse {
    my $file = shift;
    my %info;

    -e $file or return;

    open my $fh, "<", $file or die "Can't open $file: $!\n";
    say "-I-: Reading from config file: $file";
    my %seen;
    while (<$fh>) {
        m/^\s*(?:#|\s*$)/ and next;
        my @fields = split m/\s+/ => $_;
        my ($name, $fruit, $p) = @fields;
        $seen{$fruit}++ or push @fruits => $fruit;
        $info{$name}{$fruit}{path} = $p;
    }
    close $fh;
    return %info;
} # parse
```

Will - with your test data - result in:

```
-I-: Reading from config file: test.txt
Albert  apple        /path/to/somewhere/a
Albert  jackfruit    -
Albert  pineapple    -
Dex     apple        -
Dex     jackfruit    /path/to/somewhere/d
Dex     pineapple    -
Jack    apple        /path/to/somewhere/c
Jack    jackfruit    -
Jack    pineapple    /path/to/somewhere/b
```

Enjoy, Have FUN! H.Merijn
Re^2: Perl parse text file using hash by hv (Prior) on Dec 20, 2022 at 18:17 UTC
> The first argument to split is a regex unless you want paragraph mode (and you don't)

That isn't what is happening here; this is a different special case documented in `perldoc -f split`:

```
As another special case, "split" emulates the default behavior of the
command line tool awk when the PATTERN is either omitted or a string
composed of a single space character (such as ' ' or "\x20", but not
e.g. "/ /"). In this case, any leading whitespace in EXPR is removed
before splitting occurs, and the PATTERN is instead treated as if it
were "/\s+/"; in particular, this means that any contiguous whitespace
(not just a single space character) is used as a separator. However,
this special treatment can be avoided by specifying the pattern "/ /"
instead of the string " ", thereby allowing only a single space
character to be a separator. In earlier Perls this special case was
restricted to the use of a plain " " as the pattern argument to split;
in Perl 5.18.0 and later this special case is triggered by any
expression which evaluates to the simple string " ".

If omitted, PATTERN defaults to a single space, " ", triggering the
previously described awk emulation.
```

The references to awk are probably not very helpful these days, and probably discourage people from reading the rest and using this useful construct.
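The difference between the two forms described in that documentation can be seen on a single line. This is a small sketch; the sample `$line`, with its leading spaces, repeated spaces and a tab, is made up for the demonstration.

```perl
use strict;
use warnings;

# Made-up line with leading spaces, a run of spaces, and a tab.
my $line = "  Jack   pineapple\t/path/to/somewhere/b\n";

# Special case: the single-space string splits on runs of any
# whitespace and discards leading whitespace.
my @awk_style = split ' ', $line;

# A regex matching one space splits at every single space character,
# so leading and repeated spaces produce empty fields, and the tab
# is not treated as a separator at all.
my @single_space = split / /, $line;

print scalar(@awk_style),    " fields with ' '\n";
print scalar(@single_space), " fields with / /\n";
```

With the string form the three expected fields come back cleanly, which is why it suits whitespace-separated config lines like the ones in this thread.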
Re: Perl parse text file using hash by Marshall (Canon) on Dec 20, 2022 at 16:46 UTC
If you insist on using a Hash of Hashes (HoH) so that you get the "$info{}{}{}" style of access, then this is one possibility. HoH is sometimes awkward because to print it, you need a loop for each level as shown.

```perl
use strict;
use warnings;

my %info;

while (<DATA>) {
    chomp;
    my ($name,$fruit, $path) = split ' ',$_,3;  # allow space in path name(s)
    $info{$name}{$fruit}=$path;
}

foreach my $name (sort keys %info) {
    print "$name\n";
    foreach my $fruit (sort keys %{$info{$name}} ) {
        print " $fruit \t$info{$name}{$fruit}\n";
    }
}

=PRINTS:
Albert
 apple  /path/to/somewhere/a
Dex
 jackfruit  /path/to/some where/d
Jack
 apple  /path/to/somewhere/c
 pineapple  /path/to/some where/b
=cut

__DATA__
Albert apple /path/to/somewhere/a
Jack pineapple /path/to/some where/b
Jack apple /path/to/somewhere/c
Dex jackfruit /path/to/some where/d
```
Re^2: Perl parse text file using hash by Fletch (Bishop) on Dec 21, 2022 at 11:43 UTC
Or delegate the awkward and use YAML::XS or Cpanel::JSON::XS or Data::Dumper to show things (especially for a debugging context).

Edit: Derp, extra n in panel fixed; thx pryrt and Anomalous.

The cake is a lie. The cake is a lie. The cake is a lie.
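As a rough sketch of that suggestion using Data::Dumper (which ships with Perl), the nested hash can be shown without writing a loop per level. The `%info` contents here are sample values from this thread, and the `Sortkeys` and `Indent` settings are just one reasonable choice for stable, compact debugging output.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample hash of hashes, using records quoted in this thread.
my %info = (
    Albert => { apple => '/path/to/somewhere/a' },
    Jack   => {
        apple     => '/path/to/somewhere/c',
        pineapple => '/path/to/somewhere/b',
    },
);

$Data::Dumper::Sortkeys = 1;    # stable key order between runs
$Data::Dumper::Indent   = 1;    # compact fixed indentation

my $dump = Dumper(\%info);
print $dump;
```

Passing a reference (`\%info`) keeps the whole structure in a single `$VAR1`; passing the hash itself would dump each key and value as a separate variable.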
Re: Perl parse text file using hash by Anonymous Monk on Dec 20, 2022 at 15:35 UTC
Unfortunately, "not working properly" is not a very helpful problem statement. All I can tell for sure is that your program fails to compile. And yes, the blast of compiler messages can be daunting.

You are actually doing a number of things right:

- Enabling strictures (`use strict;`);
- Enabling warnings (`use warnings;`);
- Using lexical file handles (`open my $fh, ...` instead of `open FH, ...`);
- Using three-argument open (`open my $fh, '<', $file` instead of `open $fh, "<$file"`);
- Checking whether the open succeeded (`open ... or die ...`).

All the compile errors I get are due to undeclared variables. In fact, they end in the text `(did you forget to declare "my ..."?)`. In some cases the fix is easy: just stick a "my" in front of the first assignment to the variable. In others you need to actually declare the variable somewhere. For example, you need a `my @fruits;` somewhere above the call to `parse()`. And in some you need to think about where the value is to come from.
Re: Perl parse text file using hash by BillKSmith (Monsignor) on Dec 20, 2022 at 14:40 UTC
I recommend that you combine name and fruit into one key. I assume that the character '!' will never appear in the name of any person or fruit.

```perl
use strict;
use warnings;
use autodie;

my $file = \<<'EOF';
Albert apple /path/to/somewhere/a
Jack pineapple /path/to/somewhere/b
Jack apple /path/to/somewhere/c
Dex jackfruit /path/to/somewhere/d
EOF

open( my $fh, '<', $file);
my %info;
while (<$fh>) {
    my @fields = split;
    my $key = "$fields[0]!$fields[1]";
    my $path = $fields[2];
    $info{$key} = $path;
}
close $fh;

my $name = 'Jack';
my $fruit = 'pineapple';
my $path = $info{"$name!$fruit"};
```

Bill
Re^2: Perl parse text file using hash by LanX (Saint) on Dec 20, 2022 at 16:34 UTC
Hi Bill

> I assume that the character `!` will never appear in the name of any person or fruit.

Actually you are reinventing the wheel here. Perl has a built-in mechanism for multidimensional keys.

`$foo{$x,$y,$z}`

... is equivalent to ...

`$foo{join($;, $x, $y, $z)}`

See `$;` aka `$SUBSCRIPT_SEPARATOR` in `perlvar`.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
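A short sketch of that equivalence, using two of the sample records from this thread. The variable names (`$same`, `$sep`, `@lines`) are illustrative only. Note that this syntax relies on the `multidimensional` feature, which is on by default but is switched off by `use v5.36` and later feature bundles, and that it only works while no key part contains the separator character itself.

```perl
use strict;
use warnings;

my %info;
$info{'Jack', 'pineapple'} = '/path/to/somewhere/b';
$info{'Jack', 'apple'}     = '/path/to/somewhere/c';

# The same element, addressed with an explicit join on $;
my $same = $info{ join($;, 'Jack', 'pineapple') };

# Recover the parts of each composite key by splitting on $;
# (quotemeta guards against a separator with regex meaning).
my $sep = quotemeta $;;
my @lines;
for my $key (sort keys %info) {
    my ($name, $fruit) = split /$sep/, $key;
    push @lines, "$name $fruit $info{$key}";
}
print "$_\n" for @lines;
```

Compared with a hand-picked character such as `!`, the default separator (`"\034"`) is far less likely to occur in real names, though the same caveat applies in principle.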
Re^3: Perl parse text file using hash by BillKSmith (Monsignor) on Dec 20, 2022 at 18:46 UTC
Rolf,

That is a clever use of the Multi-Dimensional Array Emulation syntax even though that is not exactly what we are doing. In the spirit of 'not inventing a wheel', perl can also do the input, looping and splitting for us if we use the -n and -a command switches (see perlrun).

```perl
#!perl -na
use strict;
use warnings;
our %info;
$info{$F[0],$F[1]} = $F[2];
END{
    my $name = 'Jack';
    my $fruit = 'pineapple';
    my $path = $info{$name,$fruit};
    print "$name $fruit $path\n";
}
```

OUTPUT:

```
>perl 1114899a.pl paths.csv
Jack pineapple /path/to/somewhere/b
```

Bill