Filtering text file made into hash

reebee3 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Filtering text file made into hash by afoken (Chancellor) on Oct 27, 2015 at 09:24 UTC
Some hints:
- Change the `open` line to include `or die "Could not open $file1: $!";`. Alternatively, add `use autodie qw( open close );` to the head of your script. This will replace the core `open` and `close` functions with wrappers that always check for errors.
- The `for` loop writing the output should run after the `while` loop, not inside it. You don't want to write intermediate results after each line of input read. You want the total result after having read all input lines.
- The `sort` expression in the `for` loop should compare `$codes{$a}` and `$codes{$b}`. There is no variable named `%name`.
- There is no bareword file handle named `OUT`, so `print OUT` does not make sense. Comparing the result of the `print` function with `results.txt` also does not make sense. It seems you want to write to a file named `results.txt`. For that to work, open a second file handle in write mode, then use `print $outputhandle "$name\n";`.
- `foreach ($line)` makes no sense. There is only one line in each round of the `while` loop.
- The first two lines inside the `foreach` loop don't make sense. You don't assign anything to `$length`, and you assign to `$name` only after having used it. You want to split each line into name and length. See `split`.
- You wrote two conditions above your code. One is: "print the names that ... have a length greater than or equal to 50." The `for` loop with the `sort` will sort and print anything in `%codes`. If you don't add lines with a length value less than 50 to `%codes`, they won't appear in the output. `next if $length<50;` will skip all lines where the length is less than 50.
- The other condition is: "print the names that appear at least 4 times in the text file." A common trick is to use a hash as the counter. As you aren't really interested in the length at this point, you can use the `%codes` hash for that. Don't assign anything to `$codes{$name}`, but increment its value: `$codes{$name}++`. After the `while` loop, `%codes` will contain the counts of every name with a length of at least 50. You only want those with a count of at least 4. Combine `sort` with `grep`: `for my $name (sort { ... } grep { $codes{$_}>=4 } keys %codes)`.
- Limit the scope of the variables to the minimum. `$name` and `$length` are only required inside the `while` loop, so declare them there, not outside the loop.
- Close the files you use as soon as you no longer need them. You don't need the input file after having read all lines in the `while` loop, so close it after the `while` loop. The output handle is no longer needed after the `for` loop. Don't forget to check for errors.
Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
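Put together, the hints above might look roughly like the sketch below. It is not the original poster's script, which is not shown in this thread. It assumes comma-separated `name,length` lines (as in the sample data quoted further down), takes the input and output file names from the command line, and sorts by descending count with ties broken by name. The sub names `count_names`, `select_names`, and `main` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each name occurs on lines whose length is at least
# $min_length. Assumes "name,length" lines (an assumption, see above).
sub count_names {
    my ($in, $min_length) = @_;
    my %codes;
    while (my $line = <$in>) {
        chomp $line;
        # $name and $length are only needed inside the loop
        my ($name, $length) = split /,/, $line, 2;
        next unless defined $length;      # skip empty or malformed lines
        next if $length < $min_length;    # skip short entries
        $codes{$name}++;                  # the hash is the counter
    }
    return \%codes;
}

# Names counted at least $min_count times, most frequent first.
# The sort direction and the tie-break by name are assumptions.
sub select_names {
    my ($codes, $min_count) = @_;
    return sort { $codes->{$b} <=> $codes->{$a} or $a cmp $b }
           grep { $codes->{$_} >= $min_count } keys %$codes;
}

sub main {
    my ($infile, $outfile) = @_;

    open my $in, '<', $infile or die "Could not open $infile: $!";
    my $codes = count_names($in, 50);
    # The input is no longer needed once all lines have been read
    close $in or die "Could not close $infile: $!";

    open my $out, '>', $outfile or die "Could not open $outfile: $!";
    print $out "$_\n" for select_names($codes, 4);
    close $out or die "Could not close $outfile: $!";
}

# Only run when both file names are given on the command line
main(@ARGV) if @ARGV == 2;
```

Saved under a hypothetical name such as `filter.pl`, it would be called as `perl filter.pl input.txt results.txt`. Keeping the counting and the selection in separate subs means the thresholds 50 and 4 each appear in exactly one place.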
Re: Filtering text file made into hash by AppleFritter (Vicar) on Oct 27, 2015 at 10:13 UTC
You're almost there! But as my anonymous brother pointed out, you'll have to think about what you want to achieve and how you want to do it before actually coding it up.
You want to print names if they appear at least four times in the file. Just the names, or their associated lengths as well? If the latter, do you want to print all lengths, so long as at least one is greater than or equal to 50? And how often do you want to print them? For instance, take your sample data. Here's a variety of possible outputs I could think of:
- "CA", once, because that name appears at least four times and has at least one length >= 50.
- "CA", four times, since that name appears four times and has an associated length >= 50 four times.
- "CA", four times, since that name appears four times and has an associated length >= 50 at least once.
- "CA", followed by all the associated lengths, regardless of whether they're >= 50.
- "CA", followed by only the associated lengths that are >= 50.
- ...
I usually go for a general solution that allows me to tweak the output as desired later on without having to change the crunching (much). Which brings me to another important remark - the overall structure for this sort of program is generally:
1. Read input into the right kind of data structure
2. Process
3. Output
This may seem trivial, especially since there's no processing going on here (I don't count selecting what to output as processing, really), but it helps to separate these steps. For instance, what kind of data structure is right? You're dealing with key/value pairs here, so it'll likely be something involving a hash. Without knowing exactly what data you need to preserve, I'd suggest simply saving ALL the lengths for each name, in the order they appear in the input file. (As an added benefit, counting the number of lengths for a given name will then also tell you how often that name appeared in your input.) So, use a hash of arrays. perldsc, the Perl Data Structures Cookbook, may be of help there.
Further observe that you can use `split` to break up your input line along commas to separate names from lengths, and use `any` from the List::Util core module to test if any length for a given name is >= 50, and here's my starting point for a solution:
```perl
#!/usr/bin/perl
use Modern::Perl '2014';

# core modules
use List::Util qw/any/;

# this will hold the data read
my %names = ();

# 1. read input data
while(<DATA>) {
    chomp;
    # split $_ along commas, returning at most 2 pieces
    my ($name, $length) = split /,/, $_, 2;
    # save $length for $name
    push @{ $names{$name} }, $length;
}

# 2. processing - none

# 3. select what to output
foreach my $name (sort keys %names) {
    # did $name appear at least four times?
    if(scalar @{ $names{$name} } >= 4) {
        # is at least one of the associated lengths >= 50?
        if(any { $_ >= 50 } @{ $names{$name} }) {
            say "$name: ", join ",", @{ $names{$name} };
        }
    }
}

__DATA__
CA,57
MO,22
CA,88
CA,99
NC,34
CA,104
```
This outputs:
```
$ perl 1146088.pl
CA: 57,88,99,104
$
```
BTW - I left out the file-handling code on purpose here and read from `__DATA__` (see Special Literals for more on that) instead to focus on the important bits. You'll know what to do. :) (In general it's perhaps simpler/better to read from `<<>>` anyway and use the shell to redirect input and output as desired.)
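To illustrate the point about tweaking the output without touching the crunching, here is a small sketch of two of the output variants listed above, both driven by the same hash of arrays. The data is made up for the example (one CA length is deliberately below 50 so that the variants differ), and the helper names `qualifying_names`, `format_all`, and `format_filtered` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any);

# Made-up hash of arrays: name => [lengths in input order]
my %names = (
    CA => [57, 12, 88, 99, 104],
    MO => [22],
    NC => [34],
);

# Names with at least $min_count entries, at least one of them >= $min_length
sub qualifying_names {
    my ($names, $min_count, $min_length) = @_;
    return grep {
        my $lengths = $names->{$_};
        @$lengths >= $min_count && any { $_ >= $min_length } @$lengths;
    } sort keys %$names;
}

# Variant: the name followed by all of its lengths
sub format_all {
    my ($names, $name) = @_;
    return "$name: " . join(",", @{ $names->{$name} });
}

# Variant: the name followed only by the lengths >= $min_length
sub format_filtered {
    my ($names, $name, $min_length) = @_;
    return "$name: " . join(",", grep { $_ >= $min_length } @{ $names->{$name} });
}

for my $name (qualifying_names(\%names, 4, 50)) {
    print format_all(\%names, $name), "\n";           # CA: 57,12,88,99,104
    print format_filtered(\%names, $name, 50), "\n";  # CA: 57,88,99,104
}
```

Only the formatting subs differ between the variants. The reading step and the selection stay the same.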
Re^2: Filtering text file made into hash by GrandFather (Saint) on Oct 27, 2015 at 22:36 UTC
`my %names = ();` is better as: `my %names;` Hashes and arrays are made fresh (and empty) when they are declared, so you don't need to clutter code with redundant initialization.
Avoid nested blocks. Your for loop is better written:
```perl
for my $name (sort keys %names) {
    # Skip if fewer than 4 occurrences of name or none of at least 50
    next if @{$names{$name}} < 4 || !any {$_ >= 50} @{$names{$name}};
    say "$name: ", join ",", @{$names{$name}};
}
```
Use early exits to avoid nesting in loops and subs. Nested blocks make logic flow much harder to analyze. Using early exits allows a simple-to-understand list of test/handle/bail steps.
Premature optimization is the root of all job security
Re: Filtering text file made into hash by Anonymous Monk on Oct 27, 2015 at 07:47 UTC
"Here is what I have so far. code" That's great :) But now is the time to step away from the code and go back to paper and pencil :) Make a bullet point list of the checks you need to make and the order you need to make them in. Don't try writing code yet; just write, in simple English, the steps needed to solve the problem.
Re: Filtering text file made into hash by GotToBTru (Prior) on Oct 27, 2015 at 12:47 UTC
Did your instructor tell you what the correct results should be? Your description and your example data are ambiguous (as others have already said). Dum Spiro Spero