Parsing a Flat File and counting occurrences of numbers?

batcater98 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parsing a Flat File and counting occurrences of numbers? by jettero (Monsignor) on Dec 22, 2006 at 13:43 UTC
Hrm, looking at your post history, it would seem all your questions so far are somehow related to lines like these. You're making progress, I hope? The following code demonstrates the basic ideas you'll probably employ, but I think you'll need to make a few changes too.

```perl
# $entire_file is assumed to hold the whole input file, read in one go
my %range_counts = ();
my %counts       = ();

while ( $entire_file =~ m/(\d+)bn(\d+), bytes/sg ) {
    $range_counts{'4kset'}++ if $2 > 4000 and $2 < 6000;  # total in range
    $counts{$2}++            if $2 > 4000 and $2 < 6000;  # per-number tally
}

print "there were $range_counts{'4kset'} 4000-6000 lines\n";
```

I suspect you'd also benefit from pouring the results into a database of some kind. Then you could run queries like `SELECT sum(something_column), avg(something_column) FROM data WHERE something > something AND something < something` and report on all kinds of interesting details.

-Paul
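jettero's SQL suggestion presupposes a database. As a rough plain-Perl stand-in for that kind of `sum`/`avg` query, the sketch below counts, sums and averages the byte counts for lines whose `bn` number falls strictly between 4000 and 6000. It assumes the byte count after `bytes` is the column of interest and that lines look like the sample data in johngg's reply; `summarize_range` is a hypothetical helper name, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count, sum and average the byte counts on lines
# whose "bn" number lies strictly between $lo and $hi, mirroring the
# range test in jettero's loop. Line layout assumed from johngg's sample
# data: "...,62bn6085, bytes 608584".
sub summarize_range {
    my ($lines, $lo, $hi) = @_;
    my ($count, $sum) = (0, 0);
    for my $line (@$lines) {
        next unless $line =~ m/bn(\d+), bytes (\d+)/;
        my ($id, $bytes) = ($1, $2);
        next unless $id > $lo and $id < $hi;
        $count++;
        $sum += $bytes;
    }
    my $avg = $count ? $sum / $count : 0;    # avoid dividing by zero
    return ($count, $sum, $avg);
}

# Sample lines copied from johngg's reply in this thread.
my @sample = (
    'base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584',
    'base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383',
    'base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373',
    'base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377',
    'base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003',
    'base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783',
    'base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383',
);

my ($count, $sum, $avg) = summarize_range(\@sample, 4000, 6000);
print "count=$count sum=$sum avg=$avg\n";    # count=4 sum=9118552 avg=2279638
```

Each new question means another helper like this, which is exactly where a database with ad hoc queries starts to pay off.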
Re^2: Parsing a Flat File and counting occurrences of numbers? by chargrill (Parson) on Dec 22, 2006 at 14:00 UTC
Hi jettero: Indeed, it looks like batcater98 is making progress. To my eye, the output file shown above looks like it could be a straight dump of the output from either my or imp's recent attempts to help batcater98 solve their problem. Oh, how nice it would have been to have the full spec up front, so that imp or I could have included this simple addition! Instead, it seems poor batcater98 has taken one of those referenced programs, run it to produce an output file, and is now trying to write another program to produce these (relatively trivial to add) additional results. Your ~~simple loop~~ code above would definitely cut the mustard for the spec presented above, but it might be a Better™ solution overall to shoehorn something that fits the spec into the previous solution batcater98 has (apparently without much gratitude) decided to use.

Update: Just to make it clear: although I was replying to jettero, I did not intend to berate him, as I feared the phrase I updated might suggest.

--chargrill
Re^3: Parsing a Flat File and counting occurrences of numbers? by batcater98 (Acolyte) on Dec 22, 2006 at 14:12 UTC
I do appreciate everyone's help as I work my way up the Perl learning curve. I am familiar with the language, but I don't have the depth of knowledge that you and others here have. If I am tapping the knowledge base in the wrong way, I apologize. I am working to get better, and I hope someday to return the favor to others here, as they have helped me while I'm starting out. Again, thanks for the help.

batcater98
Re: Parsing a Flat File and counting occurrences of numbers? by Fletch (Bishop) on Dec 22, 2006 at 13:43 UTC
1. Open the input file.
2. Read the file line by line.
3. For each line:
   - use a regex to pull off one or more digits at the end of the line;
   - use a hash to keep track of how many times that number has occurred;
   - if the regex failed to find a number, increment a counter, `$missing_count`.
4. At the end of the file, sort the list of keys in the hash by their value and print out your results table.

If you need more than that, show what you've tried so far.
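A minimal Perl sketch of Fletch's outline might look like this. The pattern `bn(\d+),` is an assumption based on the sample lines in johngg's reply (the outline itself takes the digits at the end of the line, which in those samples would be the byte count), `tally_numbers` is a hypothetical name, and an in-memory string stands in for the real input file.

```perl
use strict;
use warnings;

# Steps 2-3 of the outline. tally_numbers is a hypothetical name; the
# pattern bn(\d+), is an assumption taken from johngg's sample lines.
sub tally_numbers {
    my ($fh) = @_;
    my %count;
    my $missing_count = 0;
    while (my $line = <$fh>) {          # read the file line by line
        if ($line =~ m/bn(\d+),/) {
            $count{$1}++;               # how many times this number occurred
        }
        else {
            $missing_count++;           # the regex found no number
        }
    }
    return (\%count, $missing_count);
}

# Step 1: open the input. An in-memory copy of johngg's sample lines stands
# in for the real file; use open my $fh, '<', $path for an actual file.
my $sample = <<'END_SAMPLE';
base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383
END_SAMPLE
open my $fh, '<', \$sample or die "open: $!\n";
my ($count, $missing) = tally_numbers($fh);
close $fh;

# Step 4: sort the keys by their value (highest count first, ties broken
# numerically) and print the results table.
for my $n (sort { $count->{$b} <=> $count->{$a} || $a <=> $b } keys %$count) {
    print "$n - $count->{$n}\n";
}
print "no number found - $missing\n";
```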
Re: Parsing a Flat File and counting occurrences of numbers? by BrowserUk (Patriarch) on Dec 22, 2006 at 13:48 UTC
The following one-liner would do it. (Wrapped for posting. Quotes are shell-dependent.)

```
perl -nwle"/bn(\d{4})/ && $1 >= 4000 && $1 <=6200 && ++$c{ $1 };"
     -e"END{ print qq[$_ => $c{ $_ }] for sort keys %c}" temp.dat
4370 => 1
4512 => 1
5600 => 2
6042 => 1
6085 => 2
```
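For anyone who finds the shell quoting awkward, here is one way to write the same logic as a standalone script. It is a sketch: `count_in_range` is a hypothetical helper, an in-memory copy of johngg's sample lines from this thread stands in for `temp.dat`, and the explicit `while` loop and final `print` take the place of `-n` and the `END` block. On that sample it should print the same five lines shown above.

```perl
use strict;
use warnings;

# count_in_range is a hypothetical helper holding the body of the one-liner.
sub count_in_range {
    my ($fh, $lo, $hi) = @_;
    my %c;
    while (my $line = <$fh>) {
        # A 4-digit number after "bn", kept only if it lies in [$lo, $hi].
        if ($line =~ /bn(\d{4})/ and $1 >= $lo and $1 <= $hi) {
            $c{$1}++;
        }
    }
    return \%c;
}

# In-memory stand-in for temp.dat (johngg's sample lines).
my $sample = <<'END_SAMPLE';
base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383
END_SAMPLE
open my $fh, '<', \$sample or die "open: $!\n";
my $c = count_in_range($fh, 4000, 6200);
close $fh;

# The END block's job: print each number and its count, in sorted order.
print "$_ => $c->{$_}\n" for sort keys %$c;
```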
Re: Parsing a Flat File and counting occurrences of numbers? by johngg (Canon) on Dec 22, 2006 at 14:24 UTC
Rather than looking for a four-digit number in the right place and then testing whether it is in the right range, I would construct a regular expression that only pulls out numbers in the right range. I would do this by holding the range in an array, setting the `LIST_SEPARATOR` variable (`$"`) to the pipe symbol (regular expression alternation), and using the array inside `qr{...}`, which interpolates each element separated by the pipe. The script below also caters for your `#### - 0` requirement if no occurrence of a number was found. I used a cut-down range to keep the output short, and because I use the range array, the output is sorted.

```perl
use strict;
use warnings;
use English q{-no_match_vars};

my @range = (6030 .. 6090);
#my @range = (4000 .. 6200);

# Join the range with "|" so the regex becomes bn(6030|6031|...|6090),
my $rxExtract;
{
    local $LIST_SEPARATOR = q{|};
    $rxExtract = qr{bn(@range),};
}

my %frequencies = ();
while (<DATA>) {
    next unless m{$rxExtract};
    $frequencies{$1} ++;
}

# One "number - count" line per number in the range, 0 if never seen
my $outFile = q{freq.out};
open my $outFH, q{>}, $outFile
    or die qq{open: $outFile: $!\n};
print $outFH qq{$_ - },
    exists $frequencies{$_}
        ? qq{$frequencies{$_}\n}
        : qq{0\n}
    for @range;
close $outFH
    or die qq{close: $outFile: $!\n};

__END__
base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383
```

When I get some free time, I will benchmark how my long alternation regular expression stacks up against the `\d{4}`-and-test approach.

I hope this is of use.

Cheers,

JohnGG
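One way to run the benchmark johngg mentions is with the core `Benchmark` module's `cmpthese`. The sketch below is an illustration, not his code: the sub names are hypothetical, it first checks that both approaches produce the same counts on the seven sample lines above, then times them. Seven lines is far too small a sample to say much; timings depend on the perl version, the machine and the real data, and perl 5.10 and later compile alternations of literal strings into a trie, which may narrow the gap.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my ($lo, $hi) = (4000, 6200);
my @range = ($lo .. $hi);

# Approach 1: one long alternation, built the way johngg builds it.
my $rx_alt = do { local $" = '|'; qr{bn(@range),} };
# Approach 2: any four digits, then a numeric range test.
my $rx_any = qr{bn(\d{4}),};

# johngg's sample lines.
my @lines = (
    'base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584',
    'base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383',
    'base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373',
    'base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377',
    'base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003',
    'base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783',
    'base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383',
);

sub count_alt {
    my %f;
    for (@lines) { $f{$1}++ if m{$rx_alt} }
    return \%f;
}

sub count_test {
    my %f;
    for (@lines) { $f{$1}++ if m{$rx_any} and $1 >= $lo and $1 <= $hi }
    return \%f;
}

# Flatten a frequency hash into "key=count,..." so results can be compared.
sub flatten {
    my ($h) = @_;
    return join ',', map { "$_=$h->{$_}" } sort keys %$h;
}

die "approaches disagree\n" unless flatten(count_alt()) eq flatten(count_test());

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { alternation => \&count_alt, digits_and_test => \&count_test });
```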
Re: Parsing a Flat File and counting occurrences of numbers? by druud (Sexton) on Dec 22, 2006 at 15:57 UTC
Just a start:

```
perl -wne ' $h{$1}++ if /bn(\d{4}),/ }{$,="\t"; print %h, "\n" ' datafile
```

To see the actual code, use:

```
perl -MO=Deparse -wne ' $h{$1}++ if /bn(\d{4}),/ }{$,="\t"; print %h, "\n" '
```

-- Ruud
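With `-n`, perl wraps the code in a `while (<>) { ... }` loop, so the `}{` in druud's one-liner closes that loop early and opens a bare block that runs once after the input is exhausted. The script below is a hand-written approximation of that structure, not the `Deparse` output: it reads an in-memory copy of johngg's sample lines instead of `datafile`, and sorts the keys so the output order is predictable (the one-liner prints the hash in perl's internal order).

```perl
use strict;
use warnings;

# In-memory stand-in for "datafile" (johngg's sample lines).
my $sample = <<'END_SAMPLE';
base1,Thu 21Dec06 08:00:02 ,62bn6085, bytes 608584
base1,Thu 21Dec06 08:00:07 ,63bn5600, bytes 77383
base2,Thu 21Dec06 08:00:18 ,65bn6085, bytes 88373
base5,Thu 21Dec06 08:00:19 ,66bn6042, bytes 388377
base4,Thu 21Dec06 08:03:44 ,81bn4370, bytes 8956003
base6,Thu 21Dec06 08:03:57 ,82bn4512, bytes 7783
base3,Thu 21Dec06 08:01:03 ,06bn5600, bytes 77383
END_SAMPLE
open my $fh, '<', \$sample or die "open: $!\n";

my %h;
# -n supplies this loop (reading <> rather than $fh); the code before }{
# becomes its body ...
while (<$fh>) {
    $h{$1}++ if /bn(\d{4}),/;
}
# ... and }{ closes the loop early, so the code after it runs once, here.
{
    local $, = "\t";    # output field separator, as set in the one-liner
    my @pairs = map { ($_, $h{$_}) } sort keys %h;
    print @pairs, "\n";
}
close $fh;
```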