Elbarto has asked for the wisdom of the Perl Monks concerning the following question:

Hello, wise Monks! My problem is the following: I have a file with several serial numbers, e.g. serials.txt. I also have about 700MB worth of database extracts covering the last 7 years. What I want to do is this: open serials.txt, read its contents into an array, iterate over that array, and for every serial in the file run this:
exec("grep $serial logs* > $serial.log");
where logs* are the files I want to grep through. This works fine when executed with a single serial, but as soon as I try to do it in a loop, it doesn't do anything. It looks like this:
open(SERIALS, "serials.txt") or die("can't open serials.txt");
@serials = <SERIALS>;
for my $serial (@serials) {
    print "now processing $serial";
    exec("grep $serial logs* > $serial.log");
}
I fear that the solution is painfully obvious, but I've been stuck here for a good while now. Oh, and by the way, Super Search didn't help me, I really tried :)

greetings, Elbarto

Replies are listed 'Best First'.
Re: System Call for various serials in a for loop
by roboticus (Chancellor) on Jun 28, 2007 at 10:56 UTC

    Elbarto:

    You ought to read the man page for grep. Not every problem is a perl problem:

    grep -F -f serials.txt logs* >serials.log

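    If you want to use perl, then you don't really need grep, either:

    # Code to load serial numbers into @serials goes here...
    LINE: while (<>) {
        for my $serial (@serials) {
            # index() returns -1 when $serial is absent, so >= 0 means "found"
            if (index($_, $serial) >= 0) {
                print $_;
                next LINE;    # print each line once, then read the next one
            }
        }
    }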

    ...roboticus

      If I could ++ 10 times that mention of grep -F -f..., I would.

      I didn't know about that grep option.

      For the perl approach, depending on the number of "serials", Regex::PreSuf might be able to turn @serials into one regex, which would likely be very ugly, but likely faster.

      However, it's likely to be impossible to beat doing the grep -F -f and then post-processing the results file to split into separate serial files - for example, if the serial column is easily found, the sort program could sort the result by serial #, making the split very simple.

      For bulk applications, /usr/bin/grep and /usr/bin/sort can sometimes beat perl's speed by large margins, so it's worth trying them.


      Mike
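The post-processing step suggested above (one grep pass, then split the combined result per serial) could be sketched roughly as follows. This is only an illustration: split_by_serial and its arguments are hypothetical names, and it assumes a serial can be found with a plain substring match, so a serial that is a prefix of another one would land in both files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read the combined grep output
# from $in_fh and append each line to "$out_dir/<serial>.log" for every
# serial it contains. Returns a hashref of serial => lines written.
sub split_by_serial {
    my ($in_fh, $serials, $out_dir) = @_;
    my %out;      # serial => output filehandle, opened on first match
    my %count;    # serial => number of lines written

    while (my $line = <$in_fh>) {
        for my $serial (@$serials) {
            next if index($line, $serial) < 0;    # -1 means "not found"
            unless ($out{$serial}) {
                open($out{$serial}, '>', "$out_dir/$serial.log")
                    or die "can't write $out_dir/$serial.log: $!";
            }
            print { $out{$serial} } $line;
            $count{$serial}++;
        }
    }
    for my $fh (values %out) {
        close($fh) or die "close failed: $!";
    }
    return \%count;
}
```

With the result of grep -F -f serials.txt logs* saved in serials.log, a caller would open that file for reading and pass the handle, the chomped serial list, and an output directory to the helper.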
Re: System Call for various serials in a for loop
by lima1 (Curate) on Jun 28, 2007 at 09:18 UTC
    Use system instead of exec. And you probably want to remove the newline from the serial:
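    #!/usr/bin/perl
    use strict;
    use warnings;

    open(SERIALS, "serials.txt") or die("can't open serials.txt");
    my @serials = <SERIALS>;
    for my $serial (@serials) {
        chomp $serial;    # strip the trailing newline read from the file
        print "now processing $serial";
        # 'echo' only writes the command text to $serial.log;
        # remove it to actually run grep
        system("echo grep $serial logs* > $serial.log");
    }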
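As a variation on the system call above, grep can also be started without a shell, so that characters in a serial (such as '#' or spaces) are never interpreted by the shell. The following is a minimal sketch, not the poster's code: grep_serial is a hypothetical name, it assumes grep is on the PATH, and it swaps in grep -F, which treats the serial as a fixed string rather than a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the thread): run grep for one
# serial without a shell and write the matches to "$out_dir/<serial>.log".
# Returns true if at least one line matched.
sub grep_serial {
    my ($serial, $out_dir, @logs) = @_;
    chomp $serial;    # strip the trailing newline left by <SERIALS>
    my $out_file = "$out_dir/$serial.log";

    # List-form pipe open: the arguments reach grep exactly as given.
    open(my $grep, '-|', 'grep', '-F', '--', $serial, @logs)
        or die "can't start grep: $!";
    open(my $out, '>', $out_file) or die "can't write $out_file: $!";
    while (my $line = <$grep>) {
        print {$out} $line;
    }
    close($out) or die "can't close $out_file: $!";

    close($grep);     # waits for grep and sets $?
    die "grep was killed by a signal" if $? & 127;
    my $exit = $? >> 8;    # grep: 0 = match, 1 = no match, 2 or more = error
    die "grep failed for $serial (exit $exit)" if $exit > 1;
    return $exit == 0;
}
```

Because no shell is involved, the caller has to expand the file list itself, for example grep_serial($serial, '.', glob('logs*')).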
Re: System Call for various serials in a for loop
by citromatik (Curate) on Jun 28, 2007 at 09:19 UTC

    from perldoc -f exec:

    exec LIST
    exec PROGRAM LIST
        The "exec" function executes a system command and never returns--
        use "system" instead of "exec" if you want it to return.
        [...]
        Since it's a common mistake to use exec instead of system, Perl warns
        you if there is a following statement which isn't die, warn, or exit
        (if -w is set - but you always do that).

    You should always turn on the -w flag or "use warnings" (all the more so when you are trying to debug your code!)

    perl -e 'use warnings;exec ("ls");print "Hello\n"'
    Statement unlikely to be reached at -e line 1.
            (Maybe you meant system() when you said exec()?)

    citromatik
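To make the difference described above concrete, here is a small sketch of how system hands control back to the caller together with the child's exit status. run_and_wait is a hypothetical helper name; the only thing it relies on is the documented behaviour of system and $?.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with system() and return its exit code.
# Unlike exec(), system() waits for the child and then returns, so a
# surrounding loop keeps running.
sub run_and_wait {
    my @cmd = @_;
    my $rc = system(@cmd);
    die "could not start @cmd: $!" if $rc == -1;
    return $? >> 8;    # the exit code lives in the high byte of $?
}
```

For instance, run_and_wait($^X, '-e', 'exit 3') starts a second perl ($^X is the running interpreter) and returns once it has exited, whereas exec with the same arguments would replace the current process and never reach the next statement.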

Re: System Call for various serials in a for loop
by Elbarto (Initiate) on Jun 28, 2007 at 12:15 UTC
    Thanks for all the Comments!

    First, the chomping did its part: my program now at least works with each serial I put in the text file instead of just the first.

    Second, using system instead of exec: well, I started with system, then read that it didn't return from execution (or at least I thought I read that) and switched to exec. Seemingly, I was wrong :)

    Third, I tried the -F -f option with grep, but this did not achieve my goal.

    My new problem is this: when i execute the program, the output looks like:
    Now Processing #1 .log: No such file or directory
    Now Processing #2 .log: No such file or directory
    Now Processing #3 .log: No such file or directory
    and so on. The rogue line is this one (clearly):
    system("echo grep $serial logs* > $serial.log")
    This line used to work when I used exec, but with system it doesn't. Any hints on how to solve this problem?

    And by the way: the speed you guys show is amazing :) I never expected so much as simple replies until tomorrow, but you came up with real solutions almost instantly. Respect!

    greetings, Elbarto
      Elbarto,

      Those system errors are strange. Are you running under use warnings and use strict?

      My guess is that in your real code, $serial is typo'd, perhaps?


      Mike
        Mike,

        2xYes, 1xNo.

        These errors indeed ARE strange.

        I run under use warnings and use strict, and

        NO, I just triple-checked, $serial is not typo'd (I use vim with syntax completion after all, just to avoid these errors (and for comfort reasons :))).