dannyjmh has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, good monks! I'm writing a small Perl script that creates 1000 files ("test_set_$i" below), each holding 400 gene IDs (one per line). I then have to search for those IDs in four other files (stored in @results). I'd rather use Unix grep -f than do lots of Perl open/close and hash building and lookups, which would be much slower. The problem appears after iteration #204, from which point the backticks call no longer puts any output in $tot_length and $? is set to -1. Any idea why, please? I'm using Perl v5.14.2 on 64-bit Ubuntu 14.04. Here's just the relevant part of the code:

for my $i (0..999) {

    my $tot_length = `grep -f test_set_$i $ARGV[2] | awk '{cnt += \$2} END {printf "%d", cnt}'`;

    `grep -f test_set_$i $results[0] | awk '{print \$1}' | sed 's/>>//' | sort | uniq -c | awk '{print \$1*1000/$tot_length,\$2}' > out`;
    tie @{$files{'first'}[$i]}, 'Tie::File', "out";

    `grep -f test_set_$i $results[1] | awk -F" : " '{print \$2}' | sort | uniq -c | awk '{print \$1*1000/$tot_length,\$2}' > out`;
    tie @{$files{'second'}[$i]}, 'Tie::File', "out";

    `grep -f test_set_$i $results[2] | sed 's/ targets sites//' | sed 's/.*>//' | awk -F": " '{cnt[\$1]+=\$2} END {for (x in cnt){print cnt[x]*1000/$tot_length,x}}' > out`;
    tie @{$files{'third'}[$i]}, 'Tie::File', "out";

    `grep -f test_set_$i $results[3] | awk '{print \$2}' | sort | uniq -c | awk '{print \$1*1000/$tot_length,\$2}' > out`;
    tie @{$files{'fourth'}[$i]}, 'Tie::File', "out";

    `grep -f test_set_$i $results[4] | awk '{print \$2}' | sort | uniq -c | awk '{print \$1*1000/$tot_length,\$2}' > out`;
    tie @{$files{'fifth'}[$i]}, 'Tie::File', "out";
}

Thanks a lot!

Re: Perl backticks not returning output
by Corion (Patriarch) on Jul 05, 2015 at 18:04 UTC

    You never do anything with the data returned from the backticks. What did you expect to happen?

    Also, I'm not sure what tie-ing to Tie::File is supposed to do.

Re: Perl backticks not returning output
by 1nickt (Canon) on Jul 05, 2015 at 18:23 UTC

    One of the problems with using backticks to run system executables is that the errors are not returned to your Perl program. You would have to combine the stderr of grep, or awk, or whatever is throwing the error, with its stdout, and then parse what you get back.
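
    A minimal sketch of that merged-stderr approach, reusing the first grep call from the post (the 2>&1 redirection and the status checks after the call are the additions here):

        # Merge grep's stderr into its stdout so any error text comes back
        # through the backticks, then inspect $? right after the call.
        my $out = `grep -f test_set_$i $ARGV[2] 2>&1`;
        if ($? == -1) {
            # the command could not be started at all, e.g. the fork failed
            # because the process ran out of file descriptors
            warn "could not run grep: $!";
        } elsif ($? >> 8 > 1) {
            # for grep, exit status 1 just means "no matches";
            # anything above 1 is a real error
            warn "grep failed with status ", $? >> 8, ": $out";
        }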

    1,000 files times 400 lines times four files that each line could appear in is really not that much data. It sure seems like you could read it all into memory and do the search and match neatly with Perl's built-in tools, and it wouldn't be unacceptably slow.
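
    Untested, but a sketch of that pure-Perl shape, with the test_set_$i and @results names taken from the post and one assumption (the file formats aren't shown) that the gene ID is the first whitespace-separated field of each results line:

        use strict;
        use warnings;

        my @results = @ARGV;                     # the results files to search
        for my $i (0 .. 999) {
            open my $ts, '<', "test_set_$i" or die "test_set_$i: $!";
            chomp(my @ids = <$ts>);
            close $ts;
            my %want = map { $_ => 1 } @ids;     # 400 IDs -> hash for O(1) lookups

            for my $file (@results) {
                open my $fh, '<', $file or die "$file: $!";
                my %count;
                while (my $line = <$fh>) {
                    my ($id) = split ' ', $line; # assumption: ID is the first field
                    $count{$id}++ if defined $id and $want{$id};
                }
                close $fh;                       # handle closed on every pass
                # ... normalise the counts and store them for test set $i here
            }
        }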

    I would definitely try that and benchmark it first, before having a Perl script run a bunch of shell commands that pipe into each other.

    Remember: Ne dederis in spiritu molere illegitimi! (mock Latin: "Don't let the bastards grind you down.")

      Hey there! You guys are right. I just replaced the tie-ing with opening the "out" files and storing their contents in arrays. Just as fast, cleaner, and no more "too many open files" problem (each Tie::File keeps its file handle open, so five ties per iteration ran into the per-process descriptor limit, 1024 by default on Ubuntu, right around iteration #204). Thanks!
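
      In case it helps anyone else, the replacement looks roughly like this (a sketch; the %files hash-of-arrays layout is from the code above):

          # Read the temporary "out" file into a plain array and keep a
          # reference to it; the handle is closed right away, unlike with
          # Tie::File, which holds one open file handle per tied array.
          open my $fh, '<', 'out' or die "out: $!";
          chomp(my @lines = <$fh>);
          close $fh;
          $files{'first'}[$i] = \@lines;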