Searching Files

by rchou2 (Novice)
on Jul 24, 2002 at 18:56 UTC (#184980)

rchou2 has asked for the wisdom of the Perl Monks concerning the following question:

I currently have a file named file.200207 with data such as
john 2 4
chris 3 5
lisa 1 3
I am trying to search the files from the three previous months (file.200204, file.200205, and file.200206) and keep a count of the total number of times each name in file.200207 shows up in those three files. Any suggestions?

Replies are listed 'Best First'.
break up the problem
by cebrown (Pilgrim) on Jul 24, 2002 at 19:21 UTC
    Hi -- in the interest of helping you help yourself I would suggest you break up the problem into subproblems.

    1. Read in this month's file and use the split function to find the first word on each line, then update a hash, using that first word as the key.
    2. Determine the name of the file for each of the previous three months (sort of tricky -- think about February 2003, when you need to get 200211, 200212, and 200301).
    3. For each of those files:
      • create a new hash, let's call it the "counter" hash
      • read in each record and split to find the first word.
      • if the word is in the "this month" hash, increment your counter hash.
      • when you're done reading a given file, print out the counter hash.
    Each of these subproblems is pretty simple on its own, so start there and report back if you're struggling.
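The year rollover in step 2 can be sketched as follows. This is only one possible approach, not cebrown's own code: the helper name `previous_months` and its YYYYMM string argument are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the $n months
# before $yyyymm as YYYYMM strings, oldest first.
sub previous_months {
    my ($yyyymm, $n) = @_;
    my ($year, $month) = $yyyymm =~ /^(\d{4})(\d{2})$/
        or die "Expected YYYYMM, got '$yyyymm'";
    my @months;
    for (1 .. $n) {
        $month--;
        if ($month < 1) {    # step back across the year boundary
            $month = 12;
            $year--;
        }
        unshift @months, sprintf('%04d%02d', $year, $month);
    }
    return @months;
}

# File names for the three months before July 2002 and February 2003.
print join(' ', map { "file.$_" } previous_months('200207', 3)), "\n";
print join(' ', map { "file.$_" } previous_months('200302', 3)), "\n";
```

Working on the year and month as separate numbers avoids the trap of simply subtracting 1 from 200301, which would give 200300 rather than 200212.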
      cebrown, I tried doing this. I had already created a hash, %dataname, keyed on the names:
      foreach $Name (sort keys %dataname) {
          for (APRIL, MAY, JUNE) {
              if ($count{$Name}) {
                  $count{$Name}++;
              }
          }
          printf OUTFILE "%-50s %d\n", $Name, $count{$Interface_Name};
      }
      but it just seems to be adding 1 to the count after each loop. Any suggestions?
        Are you sure your code is doing what you think it's doing? When I read it, it's saying
        1. For every name in this list, do the following:
          1. If I have a non-zero number in the location $count{$Name}, then I need to add 1 to the location specified by $count{$Name}
        2. Print stuff out to somewhere.

        That doesn't seem like what you want to do. In fact, it's doing exactly what you're complaining it's doing. :-)

        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        Well, "APRIL, MAY, JUNE" is just a list of three strings. Perl is iterating across each element in that list and doing the stuff inside the for loop exactly three times, once for each string.

        Elsewhere in the program I assume you did something like open APRIL, "<200204";. If that's the case, you need to use the special angle bracket operator to iterate through each record in a file, like while(<APRIL>) {do stuff}. Note also that while you are looping through each file you still need to perform the split on each record in order to find the name that's on a given record.
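A minimal sketch of that read-and-split loop, under some loud assumptions: the sample records are made up, and in-memory filehandles stand in for the real files so the example runs on its own. With real files you would pass the file name to open instead of a reference to a string.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real files (assumption).
my %files = (
    'file.200204' => "john 1 2\nlisa 4 4\n",
    'file.200205' => "john 3 1\nbob 2 2\n",
    'file.200206' => "chris 5 5\njohn 2 2\n",
);
my $current = "john 2 4\nchris 3 5\nlisa 1 3\n";

# Remember every name in this month's data, starting each count at zero.
my %count;
open(my $now, '<', \$current) or die "Cannot open current data: $!";
while (my $line = <$now>) {
    my ($name) = split ' ', $line;    # first word on the record
    $count{$name} = 0 if defined $name;
}
close $now;

# Read each earlier file record by record, not just once per file.
for my $file (sort keys %files) {
    open(my $fh, '<', \$files{$file}) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        my ($name) = split ' ', $line;
        next unless defined $name && exists $count{$name};
        $count{$name}++;
    }
    close $fh;
}

printf "%-10s %d\n", $_, $count{$_} for sort keys %count;
```

The increment now happens once per matching record, which is the difference from looping over the three bareword strings.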

Re: Searching Files
by fuzzyping (Chaplain) on Jul 24, 2002 at 19:29 UTC
    Increment a hash value for each hash key (where the key is your date or filename in a HoH). Note that the use of Data::Dumper is just to show you the structure of the Hash of Hashes. It's up to you to figure out what to do with the data.

    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my %count;
    for my $i (4..6) {                        # the three previous months
        my $filename = "file.20020$i";
        open(DATA, $filename) or die "Cannot open $filename: $!";
        while (<DATA>) {
            my ($name) = split(/\s+/, $_);    # first word on the line
            $count{$filename}{$name}++;
        }
    }
    print Dumper %count;

    $VAR1 = 'file.200206';
    $VAR2 = {
              'john' => '1',
              'chris' => '1',
              'lisa' => '1'
            };
    Update: Added the note about Data::Dumper, and removed parentheses from around split().
Re: Searching Files
by dimmesdale (Friar) on Jul 24, 2002 at 19:14 UTC
    open(FH,'file.200207') or die 'hlpfl err msg';
    while(<FH>) {
        # get name
        $name = (split /\s+/, $_)[0];
        # init count
        $info->{$name} = 0;
    }
    for('file.200204','file.200205','file.200206') {
        open(FH, $_) or die 'nther hlpfl err msg';
        while(<FH>) {
            # get name
            $name = (split /\s+/, $_)[0];
            # update count
            if(defined $info->{$name}) { $info->{$name}++ }
        }
    }

    Just a suggestion. Haven't tested it, so use with caution.

    update: Hmm... I see fuzzyping has posted a very valid solution to your problem too ... only he keeps track of the name appearing in individual files, and I keep the entire sum. I didn't even think about that-- I don't know what you meant in your original post (it looks like it could go either way from re-reading it), but there's yet another disclaimer :)
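If it turns out you want the overall sum but already have per-file counts, one shape can be folded into the other. A rough sketch, assuming a hash laid out as $count{$filename}{$name} like fuzzyping's; the numbers below are made up for illustration.

```perl
use strict;
use warnings;

# Made-up per-file counts in the shape $count{$filename}{$name} (assumption).
my %count = (
    'file.200204' => { john => 1, lisa  => 1 },
    'file.200205' => { john => 1 },
    'file.200206' => { john => 1, chris => 1 },
);

# Collapse the per-file counts into a single total per name.
my %total;
for my $file (keys %count) {
    for my $name (keys %{ $count{$file} }) {
        $total{$name} += $count{$file}{$name};
    }
}

printf "%-10s %d\n", $_, $total{$_} for sort keys %total;
```

Going the other way is not possible, so keeping the per-file breakdown loses nothing if you are unsure which one is wanted.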

Re: Searching Files
by PhiRatE (Monk) on Jul 24, 2002 at 22:31 UTC
    Other people have already posted reasonable solutions so I'll concentrate on the meta-problem here.

    You keep posting questions like this, "I have file of format x with data y and I need to sort/count/total/modify it in way z". I think what you really need to do is likely to involve getting a book on SQL and installing a database. All these problems could have been solved in much more effective ways using simple SQL statements.

    In this particular instance, what the poor monks responding to your question took 10+ lines to do could be done in one:

    select,count(*) from datastore as n1, datastore as n2 where n2 and n2.yearmonth=200207 and n1.yearmonth<200207 group by;

    Databases really come into their own with tabular data like this, and if you're working with it a lot I *highly* recommend getting the necessary books, reading the online tutorials, and installing one of the freely available RDBMSs (MySQL, PostgreSQL). That way the only thing you'll have to get Perl to do is load the data in; from there you can play to your heart's content.
