Dr.Avocado has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,
I am relatively inexperienced in the ways of Perl, and I have a task that I need help with.
I have a file that contains data in columns headed with: "Score" "Points" "Time" "Points" "Record" "Size" "Points" "Age" "Points" "Difficulty" "Size" "Points" "Name" (in that order)
Each column is populated with integer values, except for "Name".
I need to write a script that can search for the values of the first "Points" column after "Time" and the first "Points" column after "Difficulty" for rows containing 'intrepid' and 'triumph' in the Name column (may not be the full name). There could be hundreds of entries in the file I'm searching, but only one will have a name containing triumph/victory. If it's any help, the "names" containing 'triumph' and 'intrepid' will always come directly before a row with 'Total' in the "Name" field.
As the end result, I want an output file that looks like:

team_triumph (this was the value in the "Name" column)
Time: 58 Difficulty: 23 (Values from first "Points" columns after "Time" and "Difficulty")

team_intrepid
Time: 6 Difficulty: 344

I'm not very experienced with Perl, so I'm not even sure how hard of a job this is. Could anyone give me a hand? I'd appreciate it greatly.

Replies are listed 'Best First'.
Re: Searching for Certain Values
by saintly (Scribe) on Jul 30, 2007 at 17:57 UTC
    Well, you're in luck. This is the kind of thing that Perl excels at. I don't know what separates the data in the columns, but assuming it's some sort of spacing (or tabs), you could use something like:
    # Open the file containing data or abort with error message
    open(my $fh, "<", "some_file.txt") || die "Can't open file: $!";
    # Run through all lines of the file, one by one
    while(my $line = <$fh>) {
        chomp $line;
        # Break up the line on whitespace, assign the 13 columns to vars
        # ($diffPoints is the first "Points" column after "Difficulty")
        my( $score,$scorePoints, $time,$timePoints, $record,
            $size,$sizePoints, $age,$agePoints,
            $diff,$size2,$diffPoints, $name ) = split(' ',$line,13);
        # Check to see if name matches
        if($name =~ /(intrepid|triumph)/) {
            print "$name\n", "Time: $timePoints, Difficulty: $diffPoints\n\n";
        }
    }
    That is, break up the line on spaces, assign each of the columns to variables, then print something if the data matches a test. Since your data is very regular, the code doesn't have to be complicated. You can make some modifications for simplicity:
    # Assign only the columns you're interested in (0-based fields 3, 11 and 12)
    my ($timePoints,$diffPoints,$name) = (split(/\s+/,$line,13))[3,11,12];
    Or to eliminate possibly-bogus data:
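    # Ensure the line consists of 12 integers + something
    # (a repeated capture group keeps only its last match, so test first, then split)
    next unless $line =~ /^\s*(?:\d+\s+){12}\S/;
    my ( ... ) = split(' ',$line,13);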
    Or for speed (don't bother splitting lines unless they have intrepid/triumph on them somewhere):
    next unless $line =~ /(intrepid|triumph)/;
    my ( ... ) = ...;
    print ....;
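For what it's worth, here is a minimal sketch of the list-slice variant, assuming whitespace-separated data and the 13-column layout from the question (so the name is field 12, counting from 0). The sample line is made up for illustration. It also shows why a quantified capture group cannot hand back all twelve integers: the group keeps only its final repetition.

```perl
use strict;
use warnings;

# Hypothetical sample line: 12 integers followed by a name.
my $line = "6 18 363 58 39 71 35 75 46 2 4 23 team_triumph\n";
chomp $line;

# List slice: keep only fields 3, 11 and 12 (0-based) of the 13 fields.
my ($timePoints, $diffPoints, $name) = (split ' ', $line, 13)[3, 11, 12];
print "$name\nTime: $timePoints Difficulty: $diffPoints\n";

# A quantified capture group remembers only its final repetition,
# so this match returns two captures (the 12th integer and the rest), not 13.
my @caps = $line =~ /^\s*(?:(\d+)\s*){12}(.*)/;
print scalar(@caps), " captures: @caps\n";
```

Using split ' ' rather than split /\s+/ skips leading whitespace, so an indented line does not produce an empty first field.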
      if($name =~ /(intrepid|triumph)/) {

      Captures are needlessly slow, and you're not actually checking for equality. Better:

      if($name =~ /^(?:intrepid|triumph)\z/) {

      And if you're so worried about speed, I think doing string comparisons would be even faster.

      if($name eq 'intrepid' || $name eq 'triumph') {
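One thing to watch: the question says the Name field "may not be the full name", so an anchored pattern or an eq test would skip a row such as team_triumph. A small sketch of the three kinds of test follows; the helper names are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
sub matches_exact {
    my ($n) = @_;
    return ($n eq 'intrepid' || $n eq 'triumph') ? 1 : 0;
}

sub matches_substring {
    my ($n) = @_;
    # index() returns -1 when the substring is absent.
    return (index($n, 'intrepid') >= 0 || index($n, 'triumph') >= 0) ? 1 : 0;
}

sub matches_regex {
    my ($n) = @_;
    # Unanchored, no capture group: true if either word appears anywhere.
    return $n =~ /intrepid|triumph/ ? 1 : 0;
}

for my $name ('triumph', 'team_triumph', 'team A') {
    printf "%-13s exact=%d substring=%d regex=%d\n", $name,
        matches_exact($name), matches_substring($name), matches_regex($name);
}
```

Only the substring and unanchored-regex tests accept team_triumph; the exact comparison accepts the bare word alone.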
      Thanks for the help. I'll try it out and get back to you. It looks like it'll work.

      And the columns are separated in the form " Score | Points | Time | Points | etc." with both spaces and |. What do I have to change to factor in the |?
        Sorry, I wrote that last comment, but I forgot to log in at the time.
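On the | question, one possible approach is to split on the pipe together with any spaces around it, instead of on whitespace alone. This is only a sketch: split_row is a hypothetical helper, and the sample row assumes the " a | b | c " layout described above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "a | b | c" style row into trimmed fields.
sub split_row {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;           # trim both ends of the line
    return split /\s*\|\s*/, $line;    # a pipe plus any surrounding spaces
}

my @f = split_row(" 6 |18 |363 |58 |39 |71 |35 |75 |46 |2 |4 |23 |team_triumph\n");
print "$f[12]\nTime: $f[3] Difficulty: $f[11]\n";
```

Because the separator is the pipe rather than the space, a name such as "team A" stays in one field without needing a limit argument to split.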
Re: Searching for Certain Values
by Dr.Avocado (Novice) on Jul 30, 2007 at 21:49 UTC
    I seem to be running into a problem. Whenever I try to execute your script, I get the error "Too many arguments for open at datasearch.pl line 3, near ""data.txt") "
    What am I doing wrong?

    My current code is pretty much what Saintly gave me:
    #!/usr/local/bin/perl

    open(my $fh, "<", "data.txt") || die "Can't open file: $!";
    # Run through all lines of the file, one by one
    while(my $line = <$fh>) {
        # Break up the line on whitespace, assign columns to vars
        my( $score,$scorePoints, $time,$timePoints, $record,$recordPoints,
            $size,$sizePoints, $age,$agePoints, $diff,$diffPoints,
            $size2,$size2Points, $name ) = split(/\s+/,$line,13);
        # Check to see if name matches
        if($name =~ /(intrepid|triumph)/) {
            print "$name\n", "Time: $timePoints, Difficulty: $diffPoints\n\n";
        }
    }
      You are probably running an elderly version of Perl. What do you get when you run /usr/local/bin/perl -v on the command line? According to perl56delta, the three-argument form of open was introduced in Perl 5.6, as were lexical filehandles (the my $fh). If you are running an earlier version then change

      open(my $fh, "<", "data.txt") || die "Can't open file: $!";

      to

      open (FH, '<data.txt') || die "Can't open file: $!";

      and

      while(my $line = <$fh>) {

      to

      while(my $line = <FH>) {

      You may want to consider upgrading your version of Perl as 5.005 is positively ancient.

      Cheers,

      JohnGG

        That ought to do it, as I am running the ancient Perl v. 5.005. Thanks! I'll get an update ASAP.
      BTW, here is a sample of a file I would need to search:
      Score | Points | Time | Points | Record | Size | Points | Age | Points | Difficulty | Size | Points | Name
      4     | 15     | 356  | 17     | 45     | 14   | 45     | 24  | 12     | 3          | 1    | 34     | team A
      6     | 24     | 354  | 45     | 345    | 53   | 25     | 47  | 34     | 3          | 3    | 45     | team B
      3     | 18     | 303  | 34     | 234    | 32   | 48     | 67  | 32     | 23         | 4    | 22     | team C
      7     | 13     | 322  | 26     | 33     | 56   | 57     | 46  | 23     | 3          | 1    | 14     | team D
      5     | 10     | 353  | 24     | 58     | 82   | 35     | 33  | 12     | 5          | 2    | 35     | team E
      5     | 30     | 264  | 48     | 26     | 23   | 23     | 73  | 23     | 5          | 2    | 65     | team F
      6     | 18     | 363  | 58*    | 39     | 71   | 35     | 75  | 46     | 2          | 4    | 23*    | team_triumph
      -----------------------------------------------------------------------------------------------------------
      x     | x      | x    | x      | x      | x    | x      | x   | x      | x          | x    | x      | Total
      -----------------------------------------------------------------------------------------------------------
      Score | Points | Time | Points | Record | Size | Points | Age | Points | Difficulty | Size | Points | Name
      2     | 32     | 443  | 34     | 464    | 38   | 89     | 9   | 43     | 3          | 4    | 353    | Team C
      5     | 24     | 343  | 543    | 923    | 478  | 0      | 35  | 3      | 3          | 2    | 39     | Team B
      6     | 5      | 263  | 232    | 92     | 43   | 48     | 96  | 46     | 4          | 52   | 78     | team_victory
      -----------------------------------------------------------------------------------------------------------
      x     | x      | x    | x      | x      | x    | x      | x   | x      | x          | x    | x      | Total
      -----------------------------------------------------------------------------------------------------------
      Score | Points | Time | Points | Record | Size | Points | Age | Points | Difficulty | Size | Points | Name
      5     | 76     | 366  | 37     | 593    | 453  | 34     | 68  | 65     | 35         | 4    | 54     | Team D
      3     | 34     | 235  | 102    | 967    | 290  | 2      | 54  | 2      | 3          | 6    | 3      | Team C
      2     | 643    | 643  | 34     | 291    | 10   | 2      | 43  | 53     | 3          | 7    | 46     | Team F
      5     | 43     | 362  | 2      | 152    | 35   | 35     | 24  | 5      | 2          | 43   | 7      | Team G
      6     | 7      | 643  | 6*     | 45     | 0    | 97     | 75  | 883    | 1          | 2    | 344*   | team_intrepid
      -----------------------------------------------------------------------------------------------------------
      x     | x      | x    | x      | x      | x    | x      | x   | x      | x          | x    | x      | Total
      -----------------------------------------------------------------------------------------------------------
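Putting the pieces of this thread together, a sketch against a few rows of the sample above might look like the following. It rests on several assumptions: find_rows is a hypothetical helper, the columns are pipe-separated, and the '*' characters are treated as markers to strip rather than as data. It also uses lexical variables and other features that need a Perl newer than 5.005.

```perl
use strict;
use warnings;

# Hypothetical helper: return [name, time_points, difficulty_points] for each
# row whose Name field contains 'intrepid' or 'triumph'.
sub find_rows {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /intrepid|triumph/;   # cheap pre-filter
        $line =~ s/^\s+|\s+$//g;                    # trim both ends
        my @f = split /\s*\|\s*/, $line;
        next unless @f == 13;                       # expect 13 columns
        my ($time_pts, $diff_pts, $name) = @f[3, 11, 12];
        tr/*//d for $time_pts, $diff_pts;           # assumption: '*' is only a marker
        push @found, [$name, $time_pts, $diff_pts];
    }
    return @found;
}

# A few rows copied from the sample, split into one element per line.
my @sample = split /^/, <<'END';
Score | Points | Time | Points | Record | Size | Points | Age | Points | Difficulty | Size | Points | Name
5 |30 |264 |48 |26 |23 |23 |73 |23 |5 |2 |65 |team F
6 |18 |363 |58* |39 |71 |35 |75 |46 |2 |4 |23* |team_triumph
x |x |x |x |x |x |x |x |x |x |x |x |Total
6 |5 |263 |232 |92 |43 |48 |96 |46 |4 |52 |78 |team_victory
6 |7 |643 |6* |45 |0 |97 |75 |883 |1 |2 |344* |team_intrepid
END

for my $row (find_rows(@sample)) {
    my ($name, $time_pts, $diff_pts) = @$row;
    print "$name\nTime: $time_pts Difficulty: $diff_pts\n\n";
}
```

To run it on a real file, read the lines from a filehandle instead of the here-document and print to an output filehandle. Header, separator, and Total rows fall out naturally because they fail the name test.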