reebee3 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a text file that is set up...

name,length

For example:

CA,57
MO,22
CA,88
CA,99
NC,34
CA,104

I am trying to sort the file and print the names that appear at least 4 times in the text file and have a length greater than or equal to 50. Here is what I have so far.

#!/usr/bin/perl
use strict;
use warnings;

my $file1 = shift
open my $f1, '<', $file1;
my %codes;
my $name;
my $length;

while (my $line = <$f1>) {
    chomp $line;
    foreach ($line) {
        $codes{$name} = $length if defined $name;
        $name = $1; #when name is present at least 4 times-- not sure how to represent this yet
        $length >= 50 ; #when length is greater than or equal to 50
    }
    {
    for my $name (sort { $name{$a} <=> $name{$b}} keys %codes) {
        print OUT > results.txt ;
    }
}

Thank you

Replies are listed 'Best First'.
Re: Filtering text file made into hash
by afoken (Chancellor) on Oct 27, 2015 at 09:24 UTC

    Some hints:

    • Change the open line to include or die "Could not open $file1: $!";. Alternatively, add use autodie qw( open close ); to the head of your script. This will replace the core open and close functions with wrappers that always check for errors.
    • The for loop writing the output should run after the while loop, not inside. You don't want to write intermediate results after each line of input read. You want the total result after having read all input lines.
    • The sort expression in the for loop should compare $codes{$a} and $codes{$b}. There is no variable named %name.
    • There is no bareword file handle named OUT, so print OUT does not make sense.
    • Comparing the result of the print function with results.txt also does not make sense. It seems you want to write to a file named results.txt. For that to work, open a second file handle in write mode, then use print $outputhandle "$name\n";.
    • foreach ($line) makes no sense. There is only one line in each round of the while loop.
    • The first two lines inside the foreach loop don't make sense. You don't assign anything to $length, and you assign to $name only after having used it. You want to split each line into name and length. See split.
    • You wrote two conditions above your code. One is: print the names that ... have a length greater than or equal to 50. The for loop with the sort will sort and print anything in %codes. If you don't add lines with a length value less than 50 to %codes, they won't appear in the output. next if $length<50; will skip all lines where the length is less than 50.
    • The other condition is: print the names that appear at least 4 times in the text file. A common trick is to use a hash as the counter. As you aren't really interested in the length at this point, you can use the %codes hash for that. Don't assign anything to $codes{$name}, but increment its value: $codes{$name}++. After the while loop, %codes will contain the counts of every name with a length of at least 50. You only want those with a count of at least 4. Combine sort with grep: for my $name (sort { ... } grep { $codes{$_}>=4 } keys %codes).
    • Limit the scope of the variables to the minimum. $name and $length are only required inside the while loop, so declare them there, not outside the loop.
    • Close the files you use as soon as you no longer need them. You don't need the input file after having read all lines in the while loop, so close it after the while loop. The output handle is no longer needed after the for loop. Don't forget to check for errors.
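
    A minimal sketch of how these hints might fit together, assuming the desired output is each qualifying name once, sorted by its count. The sub name frequent_names, its parameters, and the in-memory sample data are illustrative assumptions and not part of the original post; a real script would open $file1 for reading and results.txt for writing instead, checking both calls with or die.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count how often each name occurs with a length of at least $min_length,
    # then return the names counted at least $min_count times, sorted by count.
    # (Hypothetical helper, named here only for illustration.)
    sub frequent_names {
        my ($in, $min_length, $min_count) = @_;
        my %codes;
        while (my $line = <$in>) {
            chomp $line;
            # $name and $length are only needed inside the loop
            my ($name, $length) = split /,/, $line, 2;
            next unless defined $name && defined $length;   # skip blank lines
            next if $length < $min_length;                  # too short: not counted
            $codes{$name}++;                                # hash used as a counter
        }
        return sort { $codes{$a} <=> $codes{$b} || $a cmp $b }
               grep { $codes{$_} >= $min_count } keys %codes;
    }

    # Sample data from the question, read through an in-memory file handle
    my $sample = "CA,57\nMO,22\nCA,88\nCA,99\nNC,34\nCA,104\n";
    open my $in, '<', \$sample or die "Could not open sample data: $!";
    my @names = frequent_names($in, 50, 4);
    close $in or die "Could not close sample data: $!";

    print "$_\n" for @names;
    ```

    With the sample data, only CA is counted four times with a length of at least 50, so it is the only name printed. Note that this approach counts only the lines that pass the length check; whether that is the intended reading of the two conditions is an assumption.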

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Filtering text file made into hash
by AppleFritter (Vicar) on Oct 27, 2015 at 10:13 UTC

    You're almost there! But as my anonymous brother pointed out, you'll have to think about what you want to achieve and how you want to do it before actually coding it up.

    You want to print names if they appear at least four times in the file. Just the names, or their associated lengths as well? If the latter, do you want to print all lengths, so long as at least one is greater than or equal to 50? And how often do you want to print them?

    For instance, take your sample data. Here's a variety of possible outputs I could think of:

    • "CA", once, because that name appears at least four times and has at least one length >= 50.
    • "CA", four times, since that name appears four times and has an associated length >= 50 four times.
    • "CA", four times, since that name appears four times and has an associated length >= 50 at least once.
    • "CA", followed by all the associated lengths, regardless of whether they're >= 50.
    • "CA", followed by only the associated lengths that are >= 50.
    • ...

    I usually go for a general solution that allows me to tweak the output as desired later on without having to change the crunching (much). Which brings me to another important remark - the overall structure for this sort of program is generally:

    1. Read input into the right kind of data structure
    2. Process
    3. Output

    This may seem trivial, especially since there's no processing going on here (I don't count selecting what to output as processing, really), but it helps to separate these steps. For instance, what kind of data structure is right?

    You're dealing with key/value pairs here, so it'll likely be something involving a hash. Without knowing exactly what data you need to preserve, I'd suggest simply saving ALL the lengths for each name, in the order they appear in the input file. (As an added benefit, counting the number of lengths for a given name will then also tell you how often that name appeared in your input.) So, use a hash of arrays. perldsc, the Perl Data Structures Cookbook, may be of help there.
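
    As a rough illustration of the hash-of-arrays idea, here is a small sketch, assuming the records are already available as "name,length" strings. The names @records and %lengths_for are made up for this example. It shows how both the occurrence count and a filtered list of lengths fall out of the same structure:

    ```perl
    use strict;
    use warnings;

    # Sample records from the question, as "name,length" strings
    my @records = ('CA,57', 'MO,22', 'CA,88', 'CA,99', 'NC,34', 'CA,104');

    # Hash of arrays: each name maps to the list of its lengths, in input order
    my %lengths_for;
    for my $record (@records) {
        my ($name, $length) = split /,/, $record, 2;
        push @{ $lengths_for{$name} }, $length;
    }

    for my $name (sort keys %lengths_for) {
        my @all  = @{ $lengths_for{$name} };
        my @long = grep { $_ >= 50 } @all;    # only the lengths that are >= 50
        # the number of stored lengths is also the number of times the name appeared
        printf "%s: seen %d time(s), lengths >= 50: %s\n",
            $name, scalar(@all), @long ? join(',', @long) : 'none';
    }
    ```

    Because every length is kept, any of the output variants listed above can be produced later by changing only the final loop.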

    Further observe that you can use split to break up your input line along commas to separate names from lengths, and use any from the List::Util core module to test if any length for a given name is >= 50, and here's my starting point for a solution:

    #!/usr/bin/perl
    use Modern::Perl '2014';

    # core modules
    use List::Util qw/any/;

    # this will hold the data read
    my %names = ();

    # 1. read input data
    while(<DATA>) {
        chomp;
        # split $_ along commas, returning at most 2 pieces
        my ($name, $length) = split /,/, $_, 2;
        # save $length for $name
        push @{ $names{$name} }, $length;
    }

    # 2. processing - none

    # 3. select what to output
    foreach my $name (sort keys %names) {
        # did $name appear at least four times?
        if(scalar @{ $names{$name} } >= 4) {
            # is at least one of the associated lengths >= 50?
            if(any { $_ >= 50 } @{ $names{$name} }) {
                say "$name: ", join ",", @{ $names{$name} };
            }
        }
    }

    __DATA__
    CA,57
    MO,22
    CA,88
    CA,99
    NC,34
    CA,104

    This outputs:

    $ perl 1146088.pl
    CA: 57,88,99,104
    $

    BTW - I left out the file-handling code on purpose here and read from __DATA__ (see Special Literals for more on that) instead to focus on the important bits. You'll know what to do. :) (In general it's perhaps simpler/better to read from <<>> anyway and use the shell to redirect input and output as desired.)

      my %names = ();

      is better as:

      my %names;

      Hashes and arrays are made fresh (and empty) when they are declared so you don't need to clutter code with redundant initialization.

      Avoid nested blocks. Your for loop is better written:

      for my $name (sort keys %names) {
          # Skip if fewer than 4 occurrences of name or no length of at least 50
          next if @{$names{$name}} < 4 || !any {$_ >= 50} @{$names{$name}};
          say "$name: ", join ",", @{$names{$name}};
      }

      Use early exits to avoid nesting in loops and subs. Nested blocks make the logic flow much harder to analyze. Early exits allow a simple-to-understand list of test/handle/bail steps.

      Premature optimization is the root of all job security
Re: Filtering text file made into hash
by Anonymous Monk on Oct 27, 2015 at 07:47 UTC

    Here is what I have so far. *code*

    That's great :)

    But now is the time to step away from the code and go back to paper and pencil :)

    Make a bullet-point list of the checks you need to make and the order in which you need to make them. Don't try writing code yet; just write, in plain English, the steps needed to solve the problem.

Re: Filtering text file made into hash
by GotToBTru (Prior) on Oct 27, 2015 at 12:47 UTC

    Did your instructor tell you what the correct results should be? Your description and your example data are ambiguous (as others have already said).

    Dum Spiro Spero