in reply to Grep Speeds

This is a nice one. I would take another approach. The 65,000 lines are not really a problem with decent memory, so I would just read it all in one swoop:
    use SuperSplit;

    # The separators are treated as regexes, so the literal pipe needs escaping.
    my $AoA = supersplit_open( '\|', '\n', $filename );

    my $ecl_hash;
    for my $line ( @$AoA ) {
        # First three fields are the keys; the rest is the payload.
        $ecl_hash->{ $line->[0] }{ $line->[1] }{ $line->[2] }
            = [ @{$line}[ 3 .. $#$line ] ];
    }

    my $sub_ecl = $ecl_hash->{$NETID}{$date};
    some_function( $sub_ecl->{$_} ) for keys %$sub_ecl;
This saves you from grepping every item of your time array, which indeed takes quite some time. The above returns something easy to process, but if you want more speed, the following is better and doesn't use SuperSplit:
    open DATA, '<', $filename or die "can't open $filename: $!";
    my %ecl_hash;
    my $str = "$NETID|$month/$date/$year";
    while ( <DATA> ) {
        next if index( $_, $str ) < 0;   # cheap substring filter
        chomp;
        my $item = ( split /\|/ )[2];    # third field: the time slot
        push @{ $ecl_hash{$item} }, $_;
    }
    close DATA;
I use index here because it's faster than a regex match.
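If you want to check that claim on your own data, here's a quick Benchmark sketch (the sample line and NETID are made up, but the shape follows the NETID|date|time|... format used above):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Made-up sample line in the NETID|date|time|data... format.
    my $line = "DXX023|02/06/2001|09:30|12|34|56";
    my $str  = "DXX023|02/06/2001";

    # Compare a plain substring search against a regex match.
    cmpthese( -3, {
        index => sub { my $hit = index( $line, $str ) >= 0 },
        regex => sub { my $hit = $line =~ /\Q$str\E/ },
    });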

The second snippet returns something similar to the first, except the fields haven't been split out into arrays. But maybe you don't even need that.
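If you do, splitting on demand is cheap enough; a minimal sketch against the %ecl_hash built above:

    # Split a stored line into its fields only when a calculation needs them.
    for my $time ( keys %ecl_hash ) {
        for my $rec ( @{ $ecl_hash{$time} } ) {
            my @fields = split /\|/, $rec;
            # ... feed @fields to the calculations ...
        }
    }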

SuperSplit can be found on CPAN.

Cheers,

Jeroen
"We are not alone"(FZ)

Update: I ignored the fact that you want to search on the date/ID as well. Fixed in the last code block. Will be much faster, of course. What do you *really* want? Do you want to add an extra line to a report, with some summary? Do you want to make some graphs? Depending on the answer, you can decide whether you want the whole thing read in, or just a little piece, or whether you'd better go and put everything in a database, as merlyn suggested...
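For the database route, here is a minimal sketch using DBI; my choice of DBD::SQLite, the table and column names, and the placeholder values are all just assumptions, to be adjusted to the real field layout:

    use strict;
    use warnings;
    use DBI;

    my ( $filename, $NETID, $date ) = ( 'data.txt', 'DXX023', '02/06/2001' );  # placeholders

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=ecl.db', '', '',
                            { RaiseError => 1, AutoCommit => 0 } );

    # Hypothetical schema: first three fields identify the row, the rest is payload.
    $dbh->do( 'CREATE TABLE IF NOT EXISTS ecl
                   (netid TEXT, date TEXT, time TEXT, data TEXT)' );

    my $ins = $dbh->prepare('INSERT INTO ecl VALUES (?, ?, ?, ?)');
    open my $fh, '<', $filename or die "can't open $filename: $!";
    while (<$fh>) {
        chomp;
        my ( $netid, $d, $t, @rest ) = split /\|/;
        $ins->execute( $netid, $d, $t, join( '|', @rest ) );
    }
    $dbh->commit;
    $dbh->do('CREATE INDEX IF NOT EXISTS ecl_idx ON ecl (netid, date)');

    # One indexed query replaces a grep over the whole array:
    my $rows = $dbh->selectall_arrayref(
        'SELECT time, data FROM ecl WHERE netid = ? AND date = ?',
        undef, $NETID, $date );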

Update 2: after quite some CB discussion, I think ImpalaSS had better use the SuperSplit method and use the resulting hash to access his data.

Re: Re: Grep Speeds
by ImpalaSS (Monk) on Feb 06, 2001 at 21:06 UTC
    Hey,
    Well, to answer your question: what the program does is, for each site, per date, per half hour, grab data from 2 different files. It then takes this data and performs a lot of calculations to print numbers such as dropped calls and total traffic. As of yet there are no graphs or charts; all the data is just dumped into the arrays, and then a subroutine takes the data, performs the calculations, and prints the results.

    Dipul
      As long as you don't want to do recalculations (as in deviations from the mean, or percentages of the max), you can stick with a database-less solution.

      If you already know that you want the last lines of your data file, why don't you just use 'open DATA, "tail -n $number $filename |";'? That would speed things up, as Perl doesn't have to work through the whole file.
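      For reference, a minimal sketch of that piped open (assuming a Unix tail(1) is available; $filename and $number are placeholders):

          use strict;
          use warnings;

          my ( $filename, $number ) = ( 'data.txt', 1000 );    # placeholders

          # Let tail(1) do the seeking, so Perl only sees the last $number lines.
          open my $tail, '-|', 'tail', '-n', $number, $filename
              or die "can't run tail: $!";
          while ( my $line = <$tail> ) {
              # ... same index()/split() filtering as in the code above ...
          }
          close $tail;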

      Jeroen
      "We are not alone"(FZ)

        Hey,
        Well, I need the whole file, because all the data is somehow used within the calculations. The first 3 columns define the data, and the rest actually contains the data which is used in the calculations.

        Dipul