batcater98 has asked for the wisdom of the Perl Monks concerning the following question:

I have a flat file; an example is below. I need to parse through it looking for the numbers from 4000 to 6200, count how many times each of them appears, and write the results to another file. So, how many times did it find 4001, or 4348, or 5774? Each line in the file looks like the example below. I only want to count the four-digit numbers that have bn in front of them, and I don't want to count digits that happen to match after the word bytes.
Data Example:

    base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
    base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
    base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
    base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
    base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
    base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
    base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383

Output would be:

    6085 - 2
    5600 - 2
    6042 - 1
    4370 - 1
    4512 - 1
    .....

And if it did not find a number:

    #### - 0
What would be the best and most efficient way to do this? The file I will be parsing is rather large. Thanks, Ad.

Re: Parsing a Flat File and counting occurances of numbers?
by jettero (Monsignor) on Dec 22, 2006 at 13:43 UTC

    Hrm, looking at your post history, it seems all your questions so far relate to lines like these. Are you making progress? I hope so. The following code demonstrates the basic ideas you'll probably employ, but you'll need to make a few changes too, I think.

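    # Slurp the whole file into one string (fine for modest files).
    my $entire_file = do { local $/; <> };
    my %range_counts = ( '4kset' => 0 );
    my %counts       = ();
    while ( $entire_file =~ m/(\d+)bn(\d+), bytes/g ) {
        $range_counts{'4kset'}++ if $2 >= 4000 and $2 <= 6200;
        $counts{$2}++            if $2 >= 4000 and $2 <= 6200;
    }
    print "there were $range_counts{'4kset'} lines in the 4000-6200 range\n";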

    I suspect you'd also benefit from loading the results into a database of some kind. Then you could run queries like SELECT sum(something_column), avg(something_column) FROM data WHERE something > something AND something < something, and report on all kinds of interesting details.

    -Paul

      Hi jettero:

      Indeed, it looks like batcater98 is making progress. To my eye, the output file shown above could be a straight dump of the output from my or imp's recent attempts to help batcater98 solve their problem.

      Oh, how nice it would have been to have the full spec up front, so that imp or I could have included this simple addition! Instead, it seems poor batcater98 has taken one of those referenced programs, run it to produce an output file, and is now trying to write another program to produce these (relatively trivial to add) additional results.

      Your simple loop code above would definitely cut the mustard for the spec presented above, but perhaps it would be a Better solution overall to shoehorn something that would fit the spec into the previous solution batcater98 has (apparently without much gratitude) decided to use.

      Update: I just want to make clear that, although I was replying to jettero, I did not intend to berate him, as I feared the phrase I updated might suggest.



      --chargrill
      s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
        I do appreciate everyone's help on my learning curve with Perl. I am familiar with the language, but I do not have the depth of knowledge that you and others here have. If I am tapping the knowledge base in the wrong fashion, I apologize. I am working to get better and hope someday to return the favor to others here, as they have helped me in the beginning. Again, thanks for the help. batcater98
Re: Parsing a Flat File and counting occurances of numbers?
by Fletch (Bishop) on Dec 22, 2006 at 13:43 UTC
    • Open the input file
    • Read the file line by line
    • For each line, use a regex to pull off the four digits that follow bn
    • use a hash to keep track of how many times that number has occurred
    • if the regex failed to find a number, increment a counter $missing_count
    • at the end of the file, sort the list of keys in the hash by their value and print out your results table

    If you need more than that, show what you've tried so far.
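The steps above could be sketched in Perl roughly as follows. This is a minimal sketch, not Fletch's code, and it rests on a few assumptions:

- The sub name count_bn_numbers and the output file name counts.txt are hypothetical.
- Only the four digits after "bn" are examined, so the bytes field is ignored.
- Numbers outside 4000 to 6200 are skipped.
- $missing_count counts lines that contain no bn number at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_bn_numbers (hypothetical name): reads lines from $in_fh and returns
# a hashref of number => count, plus the number of lines with no bn number.
sub count_bn_numbers {
    my ($in_fh) = @_;
    my %count;
    my $missing_count = 0;

    # Line by line, so the whole file never has to sit in memory.
    while ( my $line = <$in_fh> ) {
        if ( $line =~ /bn(\d{4}),/ ) {
            # Only the digits after "bn" are looked at, never the bytes field.
            $count{$1}++ if $1 >= 4000 && $1 <= 6200;
        }
        else {
            $missing_count++;
        }
    }
    return ( \%count, $missing_count );
}

# The main part runs only when an input file is named: perl count.pl data.txt
if (@ARGV) {
    my $in_file  = shift @ARGV;
    my $out_file = 'counts.txt';    # hypothetical output file name

    open my $in_fh, '<', $in_file or die "open $in_file: $!\n";
    my ( $count, $missing_count ) = count_bn_numbers($in_fh);
    close $in_fh;

    open my $out_fh, '>', $out_file or die "open $out_file: $!\n";
    # Sort by count, highest first; equal counts fall back to numeric order.
    for my $n ( sort { $count->{$b} <=> $count->{$a} || $a <=> $b } keys %$count ) {
        print {$out_fh} "$n - $count->{$n}\n";
    }
    print {$out_fh} "lines with no bn number: $missing_count\n";
    close $out_fh or die "close $out_file: $!\n";
}
```

Reading one line at a time keeps memory use flat however large the input grows. The hash never holds more than 2,201 keys (4000 through 6200).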

Re: Parsing a Flat File and counting occurances of numbers?
by BrowserUk (Patriarch) on Dec 22, 2006 at 13:48 UTC

    The following one-liner would do it. (Wrapped for posting. Quotes are shell dependent.)

    perl -nwle"/bn(\d{4})/ && $1 >= 4000 && $1 <=6200 && ++$c{ $1 };"
         -e"END{ print qq[$_ => $c{ $_ }] for sort keys %c}" temp.dat

    4370 => 1
    4512 => 1
    5600 => 2
    6042 => 1
    6085 => 2

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Parsing a Flat File and counting occurances of numbers?
by johngg (Canon) on Dec 22, 2006 at 14:24 UTC
    Rather than looking for a four-digit number in the right place and then testing whether it is in range, I would construct a regular expression that only matches numbers in the right range. I would do this by holding the range in an array, setting the LIST_SEPARATOR variable to the pipe symbol (regular-expression alternation), and interpolating the array inside qr{...}, which joins the elements with pipes.
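To see what the LIST_SEPARATOR trick actually produces, here is a tiny sketch. It uses the punctuation variable $", which English.pm aliases as $LIST_SEPARATOR, and a three-number range chosen only so the generated pattern stays readable. The sketch prints the compiled pattern, whose exact stringified form varies between perl versions, and then tries it on two fragments of the sample data.

```perl
use strict;
use warnings;

# A deliberately tiny range so the generated pattern is easy to read.
my @range = ( 6084 .. 6086 );

my $rx;
{
    local $" = '|';            # join interpolated array elements with "|"
    $rx = qr{bn(@range),};     # equivalent to qr{bn(6084|6085|6086),}
}                              # $" reverts to a single space here

print "pattern: $rx\n";
for my $line ( ',65bn6085, bytes 88373', ',66bn6042, bytes 388377' ) {
    if   ( $line =~ $rx ) { print "$line => matched $1\n" }
    else                  { print "$line => no match\n" }
}
```

Localising $" inside a bare block matters. It keeps the pipe separator from leaking into every later "@array" interpolation in the program.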

    The script below also caters for your "#### - 0" requirement when a number is never found. I used a cut-down range to keep the output short. Because the output loop walks the range array, the output is already sorted.

    use strict;
    use warnings;
    use English q{-no_match_vars};

    my @range = (6030 .. 6090);
    #my @range = (4000 .. 6200);

    # Build bn(6030|6031|...|6090), by interpolating @range with "|"
    # as the list separator.
    my $rxExtract;
    {
        local $LIST_SEPARATOR = q{|};
        $rxExtract = qr{bn(@range),};
    }

    # Count each in-range number that follows "bn".
    my %frequencies = ();
    while (<DATA>) {
        next unless m{$rxExtract};
        $frequencies{$1} ++;
    }

    # Write one "number - count" line per number in the range, 0 if unseen.
    my $outFile = q{freq.out};
    open my $outFH, q{>}, $outFile
        or die qq{open: $outFile: $!\n};
    print $outFH qq{$_ - },
        exists $frequencies{$_}
            ? qq{$frequencies{$_}\n}
            : qq{0\n}
        for @range;
    close $outFH
        or die qq{close: $outFile: $!\n};

    __END__
    base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
    base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
    base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
    base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
    base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
    base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
    base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383

    Here is the output

    When I get some free time I will benchmark how my long alternation regular expression stacks up against the \d{4} and test approach. I hope this is of use.
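One way to run that comparison is with cmpthese from the core Benchmark module. The sketch below is only a harness, not a result. It uses stand-in data (an assumption): the seven sample lines from the question, repeated 1,000 times, and real files will behave differently. It builds the full 4000..6200 alternation and times it against a /bn(\d{4}),/ match followed by a numeric range test. As a hedged aside, perl 5.10 and later compile an alternation of plain literal strings into a trie, so the long alternation may be more competitive than its size suggests.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in data: the sample lines from the question, repeated.
my @sample = (
    'base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584',
    'base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383',
    'base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373',
    'base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377',
    'base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003',
    'base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783',
    'base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383',
);
my @lines = (@sample) x 1000;

# The long alternation: bn(4000|4001|...|6200),
my $rx_alt;
{
    my @range = ( 4000 .. 6200 );
    local $" = '|';
    $rx_alt = qr{bn(@range),};
}

# Approach 1: only in-range numbers can match at all.
sub count_alternation {
    my %freq;
    for (@lines) { $freq{$1}++ if $_ =~ $rx_alt }
    return \%freq;
}

# Approach 2: grab any four digits after "bn", then test the range.
sub count_digits_then_test {
    my %freq;
    for (@lines) {
        $freq{$1}++ if /bn(\d{4}),/ && $1 >= 4000 && $1 <= 6200;
    }
    return \%freq;
}

# Run each sub for at least one CPU second and print a rate table.
cmpthese( -1, {
    alternation     => \&count_alternation,
    digits_and_test => \&count_digits_then_test,
} );
```

Both subs return their hashes, so you can confirm they agree before trusting any timing.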

    Cheers,

    JohnGG

Re: Parsing a Flat File and counting occurances of numbers?
by druud (Sexton) on Dec 22, 2006 at 15:57 UTC
    Just a start:
    perl -wne ' $h{$1}++ if /bn(\d{4}),/ }{$,="\t"; print %h, "\n" ' datafile
    To see the actual code, use:
    perl -MO=Deparse -wne ' $h{$1}++ if /bn(\d{4}),/ }{$,="\t"; print %h, "\n" '
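The }{ in druud's one-liner works because -n wraps your code in LINE: while (<>) { ... }, as perlrun documents. The }{ therefore closes that loop early and opens a bare block that runs once, after the last line has been read. The sketch below spells out the same shape in an ordinary script. Two changes are my own assumptions rather than druud's:

- The loop reads from a filehandle passed to a hypothetical sub, tally_bn.
- The 4000 to 6200 range test from the question is added.

```perl
use strict;
use warnings;

# perl -wne 'BODY }{ AFTER' datafile  is compiled roughly as
#   LINE: while (<>) { BODY }{ AFTER }
# i.e. a loop over the input followed by a block that runs once.
sub tally_bn {
    my ($fh) = @_;
    my %h;
    LINE: while (<$fh>) {
        # druud's loop body, plus the range test from the question
        $h{$1}++ if /bn(\d{4}),/ && $1 >= 4000 && $1 <= 6200;
    }
    return \%h;
}

# Runs only when a file is named: perl tally.pl datafile
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "open $file: $!\n";
    my $h = tally_bn($fh);
    close $fh;
    {
        # The "after" block: the same $, = "\t" trick, but sorted and one
        # pair per line instead of print %h's unordered single line.
        local $, = "\t";
        print $_, $h->{$_} . "\n" for sort { $a <=> $b } keys %$h;
    }
}
```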

    -- 
    Ruud