Cian has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone! I'm new to Perl and to programming in general (a week!), as this question shows. I have BAM files and I want to see how often strings are repeated in them, but I'm trying to get it working on a text file first. The thing is, I don't really want to do it as shown below. Is there a way to code it so I don't have to specify what I want counted, so it just tells me the frequency of every string in the tab-delimited file? Also, printing to another text file is preferable to STDOUT.

#!/usr/bin/perl -w
print "Enter the name of your file, ie myfile.txt:\n";
my $val = <STDIN>;
chomp ($val);
my $cnt=0;
open (HNDL, "$val") || die "wrong filename";
while ($val = <HNDL>) {
    while ($val =~ /\bchr1\b/ig) {
        ++$cnt;
    }
}
print "Number of instances of 'chr1' found: $cnt\n\n";

Thanks for your time!

Re: Counting frequency of strings in files
by toolic (Bishop) on Apr 23, 2012 at 17:17 UTC
    You could accumulate words into a hash:
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my %words;
while (<DATA>) {
    $words{$_}++ for split;
}
print Dumper(\%words);

__DATA__
the the the and me ok big dog me

    Prints:

$VAR1 = {
          'and' => 1,
          'big' => 1,
          'dog' => 1,
          'me' => 2,
          'ok' => 1,
          'the' => 3
        };
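    What makes toolic's loop work is that $words{$_}++ is safe the first time a word is seen: ++ on a hash entry that does not exist yet starts from 0 and does not trigger an "uninitialized" warning. A minimal sketch of the same idea, assuming an in-memory filehandle in place of __DATA__ (the sample text is toolic's) and a plain word-TAB-count layout instead of Dumper:

```perl
use strict;
use warnings;

# toolic's sample text; an in-memory filehandle stands in for a real file
my $text = "the the the and me ok big dog me\n";
open my $in, '<', \$text or die "Can't open in-memory string: $!";

my %words;
while (<$in>) {
    # ++ on a missing key starts from 0 without an "uninitialized" warning
    $words{$_}++ for split;
}
close $in;

# One "word<TAB>count" line per distinct word, in alphabetical order
print "$_\t$words{$_}\n" for sort keys %words;
```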


Re: Counting frequency of strings in files
by GrandFather (Saint) on Apr 24, 2012 at 05:27 UTC

    Others have already pointed you toward the usual Perl way to solve the problem (use a hash) and have shown you some good coding habits along the way. However, it's worth being a little more explicit about some of those habits. Consider:

use strict;
use warnings;

my $filename = 'delme.txt';

# Create a sample file
open my $fOut, '>', $filename or die "Can't create $filename: $!\n";
print $fOut <<SAMPLE;
the the the and me ok big dog me
SAMPLE
close $fOut;

my %words = (' word ' => 'count');

open my $fIn, '<', $filename or die "Can't open $filename: $!\n";
while (<$fIn>) {
    $words{$_}++ for split;
}
close $fIn;

printf "%-10s %3s\n", $_, $words{$_} for sort keys %words;

    Prints:

 word      count
and          1
big          1
dog          1
me           2
ok           1
the          3

    First, note the use of strict: always use strictures (use strict; use warnings; see The strictures, according to Seuss).

    Next, note the use of the three-parameter version of open. In particular, note the '<', which makes the file open mode explicit. That makes the code clearer and safer. Also note the use of lexical file handles (declared using my), which also makes the code safer.
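    One concrete reason the explicit mode is safer: with the two-argument form, the mode is taken from the file name string itself, so a name that happens to begin with '>' would open (and truncate) that file for writing instead of reading it. A minimal sketch, using a hypothetical helper read_lines and a temporary sample file from the core File::Temp module, of the three-argument form with a lexical handle that Perl closes automatically when it goes out of scope:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return all lines of $path, without their newlines
sub read_lines {
    my ($path) = @_;
    # Explicit '<' mode: $path is treated purely as a file name,
    # even if it starts with '>' or '|'
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    chomp(my @lines = <$fh>);
    # $fh is lexical, so the file is closed automatically when the sub returns
    return @lines;
}

# Build a small sample file in a temporary location (deleted at program exit)
my ($out, $sample) = tempfile(UNLINK => 1);
print $out "chr1\tchr2\n", "chr1\tchr3\n";
close $out or die "Can't close $sample: $!\n";

my @lines = read_lines($sample);
print scalar(@lines), " lines read\n";
```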

    The split looks like absolute magic, but it is simply using the defaults for all its parameters: with no arguments it splits $_ on runs of whitespace and ignores leading whitespace. Read the split documentation until you understand how it works. Note that while (<$fIn>) sets the default variable ($_) and does a little other magic (it also checks that the line read is defined), so you may want to read the while documentation too, to understand what's going on there.
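    To make the defaults concrete: a bare split is shorthand for split ' ', $_, where the single-space string is a special case meaning "runs of whitespace, ignoring any leading whitespace". An explicit /\s+/ regex behaves differently on an indented line. A small comparison on a made-up line:

```perl
use strict;
use warnings;

$_ = "  chr1\tchr2  chr1\n";          # made-up line: leading spaces, a tab, trailing newline

my @default  = split;                 # same as: split ' ', $_
my @explicit = split ' ', $_;
my @regex    = split /\s+/, $_;       # a real regex: keeps a leading empty field

print scalar(@default), " fields: @default\n";         # 3 fields: chr1 chr2 chr1
print scalar(@regex), " fields (first is empty)\n";    # 4 fields (first is empty)
```

    Trailing empty fields are dropped by split in both cases, which is why the final newline never shows up as a field of its own.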

    Most of the built-in functions don't need parentheses, and I tend to skip them to reduce clutter, but that means you need to use the low-precedence or instead of || so that the die does the right thing.
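    Here is what goes wrong with || when the parentheses are left off: || binds more tightly than the list operator open, so the die attaches to the file name rather than to the result of open. A minimal sketch, assuming a hypothetical file name that does not exist in the current directory, with each variant wrapped in eval so both can run:

```perl
use strict;
use warnings;

# Hypothetical name, assumed not to exist in the current directory
my $missing = 'no_such_file.txt';

# With ||, this parses as open(my $fh, '<', ($missing || die ...)).
# $missing is a true string, so die never runs and the failed open goes unnoticed.
my $pipe_died = (eval { open my $fh, '<', $missing || die "unreachable\n"; 1 }) ? 'no' : 'yes';

# With low-precedence or, die is applied to the return value of open, as intended.
my $or_died = (eval { open my $fh, '<', $missing or die "Can't open $missing: $!\n"; 1 }) ? 'no' : 'yes';

print "|| version died: $pipe_died\n";
print "or version died: $or_died\n";
```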

    Notice that the die message gives both the file name and the system error (that's what the $! special variable is about) to make it easier to diagnose file errors.
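    A small sketch of what $! carries after a failed open, assuming a hypothetical file name that does not exist. Instead of dying, it builds the same kind of message and also checks the %! hash (loaded on demand from the core Errno module), which lets code test for one specific error such as ENOENT:

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory
my $filename = 'missing_input.txt';

my $opened = open my $fh, '<', $filename;

# The same kind of message GrandFather passes to die: file name plus $!
my $message = $opened ? '' : "Can't open $filename: $!\n";

# %! lets code test for a specific error instead of parsing the message text
my $not_found = !$opened && $!{ENOENT};

print $message;    # e.g. "Can't open missing_input.txt: No such file or directory"
print "the file does not exist\n" if $not_found;
```

    The exact wording of $! comes from the operating system, so it can differ between platforms.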

    The last line uses for as a statement modifier to compactly print the contents of the words hash. Note that the header is generated by priming the words hash in a sneaky fashion: the spaces in the key guarantee there are no conflicts with words from the text, and the leading space makes the header line sort first, so it gets printed first (that's just a trick, not a "coding habit", of course).
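    If you would rather not rely on the header trick, here is a sketch that prints the header explicitly, sorts by count (most frequent first, ties broken alphabetically) and writes the table to a file, as the original question asked. The output name word_counts.txt is a hypothetical choice, and the input is GrandFather's sample text held in an in-memory string:

```perl
use strict;
use warnings;

my $text     = "the the the and me ok big dog me\n";   # GrandFather's sample text
my $out_name = 'word_counts.txt';                      # hypothetical output file

my %words;
open my $fIn, '<', \$text or die "Can't open in-memory text: $!\n";
while (<$fIn>) {
    $words{$_}++ for split;
}
close $fIn;

# Most frequent first; equal counts fall back to alphabetical order
my @sorted = sort { $words{$b} <=> $words{$a} or $a cmp $b } keys %words;

open my $fOut, '>', $out_name or die "Can't create $out_name: $!\n";
printf $fOut "%-10s %5s\n", 'word', 'count';            # header printed explicitly
printf $fOut "%-10s %5d\n", $_, $words{$_} for @sorted;
close $fOut or die "Can't close $out_name: $!\n";
```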

    True laziness is hard work
      You people are awesome! Such good replies, I got it working thanks to you!
Re: Counting frequency of strings in files
by 2teez (Vicar) on Apr 23, 2012 at 22:06 UTC

    Welcome to Perl and to programming!

    A hash is the natural tool for this problem. The code below opens the file to read from and a new file to write to (freq_file.txt), then prints each value found and its frequency to the new file. See below:

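#!/usr/bin/perl
use warnings;
use strict;

print "Enter the name of your file, ie myfile.txt: ";
chomp( my $file = <STDIN> );

my %found;
# Open the input first, so a bad file name doesn't leave an empty output file
open my $fh2, '<', $file or die "can't open $file: $!";
open my $fh, '>', 'freq_file.txt' or die "can't open freq_file.txt: $!";

while ( my $line = <$fh2> ) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace,
    # so no empty strings get counted as values
    $found{$_}++ foreach split ' ', $line;
}

# Column order matches the data lines below: value first, then its count
print $fh "Value Found\tFrequency", $/;
print $fh $_, "\t\t", $found{$_}, $/ foreach sort keys %found;

close $fh2 or die "can't close $file: $!";
close $fh or die "can't close freq_file.txt: $!";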

    I hope this helps
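    A note on choosing the split pattern: a non-greedy /\s+?/ matches exactly one whitespace character, so two adjacent separators (a double tab around an empty column, or a double space) leave an empty string between them, which then gets counted as a "value". split ' ' treats any run of whitespace as one separator. A quick comparison on a made-up line:

```perl
use strict;
use warnings;

my $line = "chr1\t\tchr2  chr1\n";    # made-up line: a double tab and a double space

my @lazy    = split /\s+?/, $line;    # splits on one whitespace character at a time
my @default = split ' ', $line;       # splits on runs of whitespace

my $empties = grep { $_ eq '' } @lazy;   # grep in scalar context returns a count
print "lazy: ", scalar(@lazy), " fields, $empties empty\n";
print "default: ", scalar(@default), " fields\n";
```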

Re: Counting frequency of strings in files
by Anonymous Monk on Apr 23, 2012 at 17:28 UTC
#!/usr/bin/perl -w
use strict;

print "Enter the name of your file, ie myfile.txt:\n";
chomp(my $val = <STDIN>);

my %seen;
open my $fh, '<', $val or die "wrong filename: $!";
while (defined(my $line = readline $fh)){
    chomp $line;    # otherwise the last field on each line keeps its newline
    my @list = split "\t", $line;
    @seen{@list} = map{$seen{$_}||0+1} @list ;
}

#while(my($string, $count) = each %seen){
foreach my $key(sort {$seen{$b} <=> $seen{$a}} keys %seen){
    my $string = $key;
    my $count = $seen{$key};
    print "Number of instances of '${string}' found: $count\n";
}
      Your script is close, but it seems to give a count of 1 for everything, even though I know most strings appear more than once. I also want the script to create an output text file and put the results there.
        Sorry, that line should be:
        @seen{@list} = map{($seen{$_}||0)+1} @list;
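        Two separate things are going on in that line. In $seen{$_}||0+1, + binds more tightly than ||, so the expression is $seen{$_} || 1 and a count never grows past 1; the parentheses fix that. The corrected slice assignment still has a subtler limitation: every value on the right-hand side is computed from the counts as they stood before the line was processed, so a string that occurs twice on the same line gains only 1. Incrementing element by element, as the other replies do, counts every occurrence. A small comparison, processing one made-up line twice:

```perl
use strict;
use warnings;

my @list = ('chr1', 'chr2', 'chr1');   # made-up fields: 'chr1' appears twice on the line

my (%buggy, %slice, %seen);
for my $pass (1 .. 2) {                # process the same line twice
    # Original: parses as $buggy{$_} || (0+1), so a count never exceeds 1
    @buggy{@list} = map { $buggy{$_} || 0+1 } @list;

    # Fixed precedence, but the slice is built from the old counts
    @slice{@list} = map { ($slice{$_} || 0) + 1 } @list;

    # Per-element increment: every occurrence is counted
    $seen{$_}++ for @list;
}

print "chr1: buggy=$buggy{chr1} slice=$slice{chr1} per-element=$seen{chr1}\n";
```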

        And to print the output to a file, just say:
open my $output_fh, '>', "filename.txt" or die $!;
#while(my($string, $count) = each %seen){
foreach my $string(sort {$seen{$b} <=> $seen{$a}} keys %seen){
    my $count = $seen{$string};
    printf $output_fh "%-70s%d\n", $string, $count;
}
close $output_fh;
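        One more detail when splitting tab-delimited lines with split "\t": unless the line is chomped first, the newline stays attached to the last field, so a string at the end of a line is counted separately from the same string elsewhere. A sketch on made-up data, counting with and without chomp:

```perl
use strict;
use warnings;

my $text = "chr1\tchr2\nchr2\tchr1\n";   # made-up tab-delimited lines
open my $fh, '<', \$text or die "Can't open in-memory text: $!\n";

my (%raw, %clean);
while (my $line = <$fh>) {
    $raw{$_}++ for split "\t", $line;    # last field still ends in "\n"
    chomp $line;                         # strip the trailing newline first
    $clean{$_}++ for split "\t", $line;
}
close $fh;

printf "without chomp: %d distinct strings\n", scalar keys %raw;    # 4
printf "with chomp:    %d distinct strings\n", scalar keys %clean;  # 2
```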