hepcat72 has asked for the wisdom of the Perl Monks concerning the following question:
I became disenchanted with Perl's glob function quite some time ago and started using bsd_glob instead. It seems to handle the strings it's given better and more comprehensively, but I encountered some unexpected results yesterday. I implemented a workaround, but I'd like a simpler implementation.
I debugged an issue where, when the script was called with a long glob pattern, bsd_glob truncated the pattern to just the directory preceding the '{...}' group. So I wrote a preprocessing routine that pre-expands any pattern containing '{...}', hoping to shorten the string received on the command line before calling bsd_glob on it.
As far as I understand this issue (and I could be slightly off), the POSIX flag GLOB_LIMIT (or ARG_MAX?) sets a limit that is too low to handle a string the script itself received successfully.
By preprocessing the string that comes from the command line, I was able to break up anything containing a '{...}' pattern into shorter pieces that I then spoon-feed to bsd_glob a bite at a time.
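A minimal sketch of that pre-expansion workaround, assuming the only goal is to keep each string handed to bsd_glob short. `expand_braces` and `glob_in_pieces` are hypothetical helper names, not part of File::Glob, and this is not the poster's actual routine. The expander does not handle backslash-escaped braces or the literal `{}` that csh leaves alone.

```perl
use strict;
use warnings;
use File::Glob qw(:bsd_glob);

# Hypothetical helper (not part of File::Glob): expand csh-style
# {a,b,...} groups ourselves, so each string later handed to bsd_glob
# contains no braces and stays short. Backslash-escaped braces and the
# literal "{}" are NOT handled in this sketch.
sub expand_braces {
    my ($pattern) = @_;
    my $open = index $pattern, '{';
    return ($pattern) if $open < 0;

    # Locate the '}' that matches the first '{', allowing nested groups.
    my ($depth, $close) = (0, -1);
    for my $i ($open .. length($pattern) - 1) {
        my $c = substr $pattern, $i, 1;
        $depth++ if $c eq '{';
        $depth-- if $c eq '}';
        if ($depth == 0) { $close = $i; last }
    }
    return ($pattern) if $close < 0;    # unbalanced: pass it through as-is

    my $prefix = substr $pattern, 0, $open;
    my $body   = substr $pattern, $open + 1, $close - $open - 1;
    my $suffix = substr $pattern, $close + 1;

    # Split the group body on commas that are not inside a nested group.
    my @alts = ('');
    $depth = 0;
    for my $c (split //, $body) {
        if ($c eq ',' && $depth == 0) { push @alts, ''; next }
        $depth++ if $c eq '{';
        $depth-- if $c eq '}';
        $alts[-1] .= $c;
    }

    # Recurse so nested groups and any later groups are expanded too.
    return map { expand_braces($prefix . $_ . $suffix) } @alts;
}

# Hypothetical wrapper: one short bsd_glob call per expanded piece.
# bsd_glob still handles any *, ? or [...] left in a piece.
sub glob_in_pieces {
    my ($pattern) = @_;
    return map { bsd_glob($_) } expand_braces($pattern);
}
```

Used as `my @files = glob_in_pieces($ARGV[0]);`, each bsd_glob call only ever sees one expanded name, so its length is bounded by the longest single file path rather than by the length of the whole brace list.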
I feel like there has to be a better solution. If the script receives the whole string from the command line, shouldn't bsd_glob's limits be the same as those of the surrounding shell? Why can't it handle as long a string as the surrounding shell script can give it?
Here's a toy example: the first call fails, but remove one element and the second call works:
>perl -e '$x="cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-142596.splice.fastq,A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq,A3DWE.1.Solexa-142621.splice.fastq}.drp.fna.lib";use File::Glob ":glob";@y=bsd_glob($x);print(join("\n",@y),"\n");'
cff_updated/1_lib/

>perl -e '$x="cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-142596.splice.fastq,A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq}.drp.fna.lib";use File::Glob ":glob";@y=bsd_glob($x);print(join("\n",@y),"\n");'
cff_updated/1_lib/A3DWE.1.Solexa-142587.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142588.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142589.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142590.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142594.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142595.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142596.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142597.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142598.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142599.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142600.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142602.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142603.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142605.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142606.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142607.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142608.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142609.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142610.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142611.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142612.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142613.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142614.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142615.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142616.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142617.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142618.splice.fastq.drp.fna.lib
cff_updated/1_lib/A3DWE.1.Solexa-142619.splice.fastq.drp.fna.lib
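To check the GLOB_LIMIT/ARG_MAX guess against the toy example, a small diagnostic can rebuild both patterns and print their lengths next to the system's ARG_MAX as reported by POSIX::sysconf. This is a comparison sketch only: `toy_pattern` is a hypothetical helper, and the script does not identify which limit inside bsd_glob is actually hit. By the arithmetic (34-character names, commas, and the fixed prefix and suffix), the failing pattern is 1046 characters and the working one 1011. If the cutoff is length-based, it therefore falls between those two values, well below the POSIX minimum of 4096 bytes for ARG_MAX.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_ARG_MAX);

# Numeric IDs from the toy example (note the gaps, e.g. no 142591-142593).
my @ids = (142587 .. 142590, 142594 .. 142600, 142602, 142603,
           142605 .. 142619, 142621);

# Hypothetical helper: rebuild a pattern of the same shape as the one
# the toy one-liner passes to bsd_glob.
sub toy_pattern {
    return 'cff_updated/1_lib/{'
         . join(',', map { "A3DWE.1.Solexa-$_.splice.fastq" } @_)
         . '}.drp.fna.lib';
}

my $failing = toy_pattern(@ids);                  # all 29 names
my $working = toy_pattern(@ids[0 .. $#ids - 1]);  # last name dropped
my $arg_max = sysconf(_SC_ARG_MAX);               # undef if indeterminate

printf "failing pattern: %d characters\n", length $failing;
printf "working pattern: %d characters\n", length $working;
print 'ARG_MAX: ', (defined $arg_max ? $arg_max : 'indeterminate'), "\n";
```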
Perhaps bsd_glob is behaving as it should and the design of my surrounding script is what needs to be fixed...
The surrounding shell script is part of a distributed package: a pipeline of analysis commands, each of which is a Perl script. I had decided to list the input files explicitly, instead of supplying a glob pattern involving '*', so that users could analyze a subset of their data files. Each step in the pipeline adds an extension to each file, which is why I chose the '{...}' glob pattern, so that I could do this: -d "{$STUBS}.extension". I wrapped it in double quotes so that the shell wouldn't expand it into a space-separated list, keeping the whole pattern as a single argument to the preceding flag. Like this:
script.pl -i "dir1/{$STUBS}.drp.fna.lib.n0s.cands" -d "dir2/{$STUBS}.drp.fna.lib"

Assuming that I might eventually encounter a command that exceeds an arbitrary command-line length limit, what would be a better way to submit a series of files, each with the extension added by the previous step?
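Because `{$STUBS}.extension` contains no wildcards, the same file list could be built with plain string operations in Perl, and the stubs could come from a file instead of the command line. This is one possible design, not the poster's: `files_for_step` and `read_stubs` are hypothetical helpers, and the sketch assumes stubs are either comma-separated (as in `$STUBS`) or listed one per line in a file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a comma-separated stub list (what $STUBS
# holds) into one path per stub, with no globbing at all.
sub files_for_step {
    my ($dir, $stubs, $ext) = @_;
    return map { "$dir/$_$ext" } grep { length } split /,/, $stubs;
}

# Hypothetical helper: read stubs one per line from a filehandle, so the
# list never has to fit on the command line.
sub read_stubs {
    my ($fh) = @_;
    my @stubs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        push @stubs, $line;
    }
    return @stubs;
}
```

With a stub file, each step script would only need that one filename on its command line (passed through some option of the script's own choosing), then build its inputs as, for example, `map { "dir2/$_.drp.fna.lib" } read_stubs($fh)`.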
I await the wisdom of the perl monks.
Re: Reliable glob?
by Loops (Curate) on Oct 21, 2014 at 23:48 UTC
by hepcat72 (Sexton) on Oct 27, 2014 at 19:14 UTC
by Loops (Curate) on Oct 28, 2014 at 01:22 UTC
by hepcat72 (Sexton) on Oct 28, 2014 at 15:02 UTC

Re: Reliable glob?
by Anonymous Monk on Oct 21, 2014 at 20:44 UTC
by hepcat72 (Sexton) on Oct 27, 2014 at 18:19 UTC