in reply to Re: Reliable glob?
in thread Reliable glob?

Regarding my perl version:

perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level

I expect that a good number of my users will be running on Macs, but probably also a lot on Linux. Actually, the problem was reported by a Linux user (Ubuntu, I think), though the string was much longer when he ran it. While debugging, I just kept chopping values out of the '{}' pattern until it started to work.

I tried printing the error as you proposed:

perl -e '$x="cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-14A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq,A3DWE.1.Solexa-142621.splice.fastq}.drp.fna.lib";use File::Glob ":bsd_glob";@y=bsd_glob($x,GLOB_LIMIT | GLOB_CSH);print(join("\n",@y),"\n");print "Error: $\!\n" if &File::Glob::GLOB_ERROR;' cff_updated/1_lib/


but like you, I didn't get an error. I even tried: "bsd_glob($x,GLOB_LIMIT | GLOB_CSH | GLOB_ERR)".

My main problem with the built-in glob is that it splits on spaces even if they are escaped, and last I tried, it didn't do anything with glob characters like '?' or '{}' or maybe even character classes. I don't remember what version of perl I was running at the time, but it had to have been at least 5.6.
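For what it's worth, the space-splitting can be seen side by side with a small sketch. The directory name "my dir" and the file names are made up for illustration; nothing needs to exist on disk, since a brace pattern without wildcards should simply come back expanded. Only the function is imported here because, as I read the File::Glob docs for 5.16 and later, the ':bsd_glob' tag also replaces the built-in glob in the calling package with one that no longer splits on spaces.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # function only, so the built-in glob is left alone

# Hypothetical pattern with a space in the directory name
my $pattern = 'my dir/{a,b}.txt';

# The built-in glob treats whitespace as a separator between patterns,
# so 'my' and 'dir/{a,b}.txt' are expanded separately
my @core = glob($pattern);

# bsd_glob treats the whole string, space included, as a single pattern
my @bsd = bsd_glob($pattern);

print "glob:     $_\n" for @core;
print "bsd_glob: $_\n" for @bsd;
```

If the ':bsd_glob' tag is acceptable for the Perl versions you need to support, importing it may be the least intrusive way to get space-safe behavior from plain glob calls.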

Ultimately, it seems like using Perl code to expand the '{}' patterns is the only way to mitigate this truncation issue. Basically, I did it like this. Does anyone have suggestions to streamline it or make it more comprehensive?

#Keep updating an array to be the expansion of a file pattern to
#separate files
my @expanded = ($nospace_string);

#If there exists a '{X,Y,...}' pattern in the string
if($nospace_string =~ /\{[^\{\}]+\}/) {
    #While the first element still has a '{X,Y,...}' pattern
    #(assuming everything else has the same pattern structure)
    while($expanded[0] =~ /\{[^\{\}]+\}/) {
        #Accumulate replaced file patterns in @buffer
        my @buffer = ();
        foreach my $str (@expanded) {
            #If there's a '{X,Y,...}' pattern, split on ','
            if($str =~ /\{([^\{\}]+)\}/) {
                my $substr = $1;
                my $before = $`;
                my $after  = $';
                my @expansions = split(/,/,$substr);
                push(@buffer,map {$before . $_ . $after} @expansions);
            }
            #Otherwise, push on the whole string
            else {push(@buffer,$str)}
        }

        #Reset @expanded with the newly expanded file strings so that we
        #can handle additional '{X,Y,...}' patterns
        @expanded = @buffer;
    }
}

#Pass the newly expanded file strings through
return(wantarray ? @expanded : [@expanded]);


Rob

Re^3: Reliable glob?
by Loops (Curate) on Oct 28, 2014 at 01:22 UTC

    Hi Rob,

    I took a look at the C source for bsd_glob and it does indeed truncate the input pattern in all cases, regardless of what options you use. That said, you're hitting an unusually short maximum buffer size. But that size is compiled into the C code and is not changeable at runtime. So as you suggest above you'll have to find some way to work around this if you're trying to support such platforms.
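    Since the truncation is silent, one cheap guard is to compare how many alternatives the pattern lists with how many strings come back. The sketch below assumes a single flat '{X,Y,...}' group and no wildcards, so each alternative should yield exactly one result whether or not the file exists; if the pattern was cut short, the count would presumably come up short as well. The helper name brace_counts and the sample pattern are made up for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_CSH GLOB_LIMIT);

# Returns (expected, got) for a pattern holding one flat '{X,Y,...}' group.
# brace_counts is a hypothetical helper, not part of File::Glob.
sub brace_counts {
    my ($pattern) = @_;

    # Count the comma-separated alternatives inside the first group
    my ($group)  = $pattern =~ /\{([^{}]*)\}/;
    my $expected = defined $group ? 1 + ($group =~ tr/,//) : 1;

    # Same flags as in the one-liner from the original post
    my @got = bsd_glob($pattern, GLOB_LIMIT | GLOB_CSH);

    return ($expected, scalar @got);
}

my ($expected, $got) = brace_counts('some_dir/{a.fastq,b.fastq,c.fastq}.lib');
warn "pattern may have been truncated: expected $expected, got $got\n"
    if $got != $expected;
```

    This only detects the problem; it does not work around it, and it would need more care for nested groups or patterns containing '*', '?' or '['.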

    If you don't mind a CPAN dependency, there are several Perl-only glob implementations on CPAN you could explore. I tried out Text::Glob::Expand and it handled your input string without a problem. Even added a second trailing braces expansion to make it longer, and it was still okay:

    use Text::Glob::Expand;

    my $x = 'cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-142596.splice.fastq,A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq,A3DWE.1.Solexa-142621.splice.fastq}{.drp,.fna,.lib}';

    my @y = map { $_->text } @{Text::Glob::Expand->parse($x)->explode};
    print "Number of items: ", scalar @y, $/, join($/,@y);

    In any case, hope you find a relatively painless way to deal with this. Cheers.

      Cool. Thanks for the C-code lookup! Glad I'm not crazy. I have been trying to limit dependencies, but I could look at that module to see how it handles the '{}' expressions. My code should handle multiple occurrences, and assuming there are no spaces and no nested expressions, I think it should theoretically work in every case. I'm not 100% sure of that, though. Well, it doesn't handle escaped curlies, but I'm not even sure a filename could have those... Whoops, yes, it can. I just renamed a file to "tmpdelete{test}.txt", dragged it to my terminal, and it was pasted with escape characters. I guess I should make a minor edit to my code:

      #Keep updating an array to be the expansion of a file pattern to
      #separate files
      my @expanded = ($nospace_string);

      #If there exists an unescaped '{X,Y,...}' pattern in the string
      if($nospace_string =~ /(?<!\\)\{.+?(?<!\\)\}/) {
          #While the first element still has a '{X,Y,...}' pattern
          #(assuming everything else has the same pattern structure)
          while($expanded[0] =~ /(?<!\\)\{.+?(?<!\\)\}/) {
              #Accumulate replaced file patterns in @buffer
              my @buffer = ();
              foreach my $str (@expanded) {
                  #If there's a '{X,Y,...}' pattern, split on ','
                  if($str =~ /(?<!\\)\{(.+?)(?<!\\)\}/) {
                      my $substr = $1;
                      my $before = $`;
                      my $after  = $';
                      my @expansions = split(/,/,$substr);
                      push(@buffer,map {$before . $_ . $after} @expansions);
                  }
                  #Otherwise, push on the whole string
                  else {push(@buffer,$str)}
              }

              #Reset @expanded with the newly expanded file strings so that we
              #can handle additional '{X,Y,...}' patterns
              @expanded = @buffer;
          }
      }

      #Pass the newly expanded file strings through
      return(wantarray ? @expanded : [@expanded]);


      Although, I just tested that nested expressions are possible too, so I would definitely like to check out that module.
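
      For anyone who wants to stay dependency-free, here is a minimal sketch of a nesting-aware expander in plain Perl. It is not how Text::Glob::Expand or bsd_glob do it; it is just one way to track brace depth by hand. The name expand_braces and the sample patterns are made up. Assumptions: a backslash protects the character after it and is left in the output, unbalanced braces leave the string untouched, and a group without commas simply loses its braces (which is also what the loop above does).

```perl
use strict;
use warnings;

# Expand '{X,Y,...}' groups, including nested ones, without touching the
# filesystem. expand_braces is a hypothetical helper name.
sub expand_braces {
    my ($pattern) = @_;
    my ($open, $depth, @commas) = (undef, 0);

    for (my $i = 0; $i < length($pattern); $i++) {
        my $c = substr($pattern, $i, 1);
        if ($c eq '\\') {
            $i++;                          # skip the escaped character
        }
        elsif ($c eq '{') {
            $open = $i if $depth == 0;     # remember the outermost opener
            $depth++;
        }
        elsif ($c eq ',' && $depth == 1) {
            push @commas, $i;              # only top-level commas split the group
        }
        elsif ($c eq '}' && $depth > 0) {
            $depth--;
            next if $depth > 0;            # closed a nested group, keep scanning

            # Outermost group is complete: cut it into its alternatives
            my $prefix = substr($pattern, 0, $open);
            my $suffix = substr($pattern, $i + 1);
            my ($start, @alts) = ($open + 1);
            for my $end (@commas, $i) {
                push @alts, substr($pattern, $start, $end - $start);
                $start = $end + 1;
            }

            # Recurse so nested and later groups get expanded as well
            return map { expand_braces($prefix . $_ . $suffix) } @alts;
        }
    }

    return ($pattern);                     # nothing (left) to expand
}

print "$_\n" for expand_braces('lib/{A,B{1,2}}.fastq{.drp,.fna}');
```

      Because the expansion is done on the string alone, its length is limited only by memory, and each resulting name can still be handed to bsd_glob separately if wildcards need resolving.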