in reply to Reliable glob?

For what it's worth, both examples you give work here on a Fedora box with Perl v5.18. The issue has nothing to do with the GLOB_LIMIT flag, since it is off by default. So it seems it's a problem with your system or perhaps an older version of Perl?

What's interesting is that the same results can be obtained just by dropping the trailing curly brace in a glob of any length:

use File::Glob ":bsd_glob";
my $x = 'cff_updated/1_lib/{a,b,c';
my @y = bsd_glob($x);
print "Error: $!\n" if &File::Glob::GLOB_ERROR;
print(join("\n",@y),"\n");

Which prints the same thing as in your first example, without any error value returned:

cff_updated/1_lib/

You could add the diagnostics shown in the example above (they are also described in the File::Glob documentation) to your test and see whether an error is reported. Also note that the File::Glob documentation says the ":glob" tag is now discouraged and that you should use ":bsd_glob" instead.

So while it doesn't happen on Fedora, you seem to be hitting some limit on the length of the input glob: the pattern is being truncated, which drops the trailing curly brace. Of course, you may actually get an error return value if you try the example above, which would point to a different issue.

As an aside, since you mention an aversion to the built-in glob, the File::Glob documentation also mentions:

Since v5.6.0, Perl's CORE::glob() is implemented in terms of bsd_glob(). Note that they don't share the same prototype--CORE::glob() only accepts a single argument. Due to historical reasons, CORE::glob() will also split its argument on whitespace, treating it as multiple patterns, whereas bsd_glob() considers them as one pattern.
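
To make that last point concrete, here is a minimal sketch of the whitespace difference. The file name 'my file.txt' is made up for illustration and nothing needs to exist on disk, since the pattern contains no wildcards. It imports bsd_glob by name rather than through the ":bsd_glob" tag because, as far as I understand the documentation, in newer Perls that tag also replaces glob() in the calling package with a version that does not split on spaces.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # import by name only, so glob() itself is left alone

# Hypothetical file name containing a space; no wildcards, so no files are needed.
my $pattern = 'my file.txt';

my @core = CORE::glob($pattern);   # built-in: splits the argument on whitespace
my @bsd  = bsd_glob($pattern);     # bsd_glob: treats the argument as one pattern

printf "CORE::glob returned %d item(s): %s\n", scalar @core, join(' | ', @core);
printf "bsd_glob   returned %d item(s): %s\n", scalar @bsd,  join(' | ', @bsd);
```

On my understanding of the default flags, the built-in should hand back two items and bsd_glob one, which is the behaviour you were complaining about with names that contain spaces.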

Re^2: Reliable glob?
by hepcat72 (Sexton) on Oct 27, 2014 at 19:14 UTC
    Regarding my perl version:

    perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level

    I expect that a good number of my users will be running on Macs, but probably also a lot on Linux. Actually, the problem was reported by a Linux user (Ubuntu, I think) - though it was a much longer string when he used it. When I started debugging the issue, I simply chopped values off the '{}' pattern until it started to work.

    I tried printing the error as you proposed:

    perl -e '$x="cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-14A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq,A3DWE.1.Solexa-142621.splice.fastq}.drp.fna.lib";use File::Glob ":bsd_glob";@y=bsd_glob($x,GLOB_LIMIT | GLOB_CSH);print(join("\n",@y),"\n");print "Error: $\!\n" if &File::Glob::GLOB_ERROR;'
    cff_updated/1_lib/


    but like you, I didn't get an error. I even tried: "bsd_glob($x,GLOB_LIMIT | GLOB_CSH | GLOB_ERR)".

    My main problem with the built-in glob is that it splits on spaces even if they are escaped, and last I tried, it didn't do anything with glob characters like '?' or '{}' or maybe even character classes. I don't remember what version of perl I was running at the time, but it had to have been at least 5.6.

    Ultimately, it seems like using Perl code to expand the '{}' patterns is the only way to mitigate this truncation issue. Basically, I did it like this. Does anyone have suggestions for streamlining it or making it more comprehensive?

    #Keep updating an array to be the expansion of a file pattern to
    #separate files
    my @expanded = ($nospace_string);

    #If there exists a '{X,Y,...}' pattern in the string
    if($nospace_string =~ /\{[^\{\}]+\}/)
      {
        #While the first element still has a '{X,Y,...}' pattern
        #(assuming everything else has the same pattern structure)
        while($expanded[0] =~ /\{[^\{\}]+\}/)
          {
            #Accumulate replaced file patterns in @buffer
            my @buffer = ();
            foreach my $str (@expanded)
              {
                #If there's a '{X,Y,...}' pattern, split on ','
                if($str =~ /\{([^\{\}]+)\}/)
                  {
                    my $substr     = $1;
                    my $before     = $`;
                    my $after      = $';
                    my @expansions = split(/,/,$substr);
                    push(@buffer,map {$before . $_ . $after} @expansions);
                  }
                #Otherwise, push on the whole string
                else
                  {push(@buffer,$str)}
              }

            #Reset @expanded with the newly expanded file strings so that we
            #can handle additional '{X,Y,...}' patterns
            @expanded = @buffer;
          }
      }

    #Pass the newly expanded file strings through
    return(wantarray ? @expanded : [@expanded]);


    Rob

      Hi Rob,

      I took a look at the C source for bsd_glob and it does indeed truncate the input pattern in all cases, regardless of what options you use. That said, you're hitting an unusually short maximum buffer size. But that size is compiled into the C code and is not changeable at runtime. So as you suggest above you'll have to find some way to work around this if you're trying to support such platforms.

      If you don't mind a CPAN dependency, there are several Perl-only glob implementations on CPAN you could explore. I tried out Text::Glob::Expand and it handled your input string without a problem. I even added a second, trailing brace expansion to make it longer, and it was still okay:

      use Text::Glob::Expand;
      my $x = 'cff_updated/1_lib/{A3DWE.1.Solexa-142587.splice.fastq,A3DWE.1.Solexa-142588.splice.fastq,A3DWE.1.Solexa-142589.splice.fastq,A3DWE.1.Solexa-142590.splice.fastq,A3DWE.1.Solexa-142594.splice.fastq,A3DWE.1.Solexa-142595.splice.fastq,A3DWE.1.Solexa-142596.splice.fastq,A3DWE.1.Solexa-142597.splice.fastq,A3DWE.1.Solexa-142598.splice.fastq,A3DWE.1.Solexa-142599.splice.fastq,A3DWE.1.Solexa-142600.splice.fastq,A3DWE.1.Solexa-142602.splice.fastq,A3DWE.1.Solexa-142603.splice.fastq,A3DWE.1.Solexa-142605.splice.fastq,A3DWE.1.Solexa-142606.splice.fastq,A3DWE.1.Solexa-142607.splice.fastq,A3DWE.1.Solexa-142608.splice.fastq,A3DWE.1.Solexa-142609.splice.fastq,A3DWE.1.Solexa-142610.splice.fastq,A3DWE.1.Solexa-142611.splice.fastq,A3DWE.1.Solexa-142612.splice.fastq,A3DWE.1.Solexa-142613.splice.fastq,A3DWE.1.Solexa-142614.splice.fastq,A3DWE.1.Solexa-142615.splice.fastq,A3DWE.1.Solexa-142616.splice.fastq,A3DWE.1.Solexa-142617.splice.fastq,A3DWE.1.Solexa-142618.splice.fastq,A3DWE.1.Solexa-142619.splice.fastq,A3DWE.1.Solexa-142621.splice.fastq}{.drp,.fna,.lib}';
      my @y = map { $_->text } @{Text::Glob::Expand->parse($x)->explode};
      print "Number of items: ", scalar @y, $/, join($/,@y);

      In any case, hope you find a relatively painless way to deal with this. Cheers.

        Cool. Thanks for the C-code lookup! Glad I'm not crazy. I have been trying to limit dependencies, but I could look at that module to see how they handle the '{}' expressions. My code should handle multiple occurrences, and assuming that there are no spaces and no nested expressions, I think it should theoretically work in every case. I'm not 100% sure of that, though. Well, it doesn't handle escaped curlies, but I'm not even sure a filename could have those... Whoops, yes it can. I just renamed a file to "tmpdelete{test}.txt", dragged it to my terminal, and it was pasted with escape characters. I guess I should make a minor edit to my code:

        #Keep updating an array to be the expansion of a file pattern to
        #separate files
        my @expanded = ($nospace_string);

        #If there exists an unescaped '{X,Y,...}' pattern in the string
        if($nospace_string =~ /(?<!\\)\{.+?(?<!\\)\}/)
          {
            #While the first element still has a '{X,Y,...}' pattern
            #(assuming everything else has the same pattern structure)
            while($expanded[0] =~ /(?<!\\)\{.+?(?<!\\)\}/)
              {
                #Accumulate replaced file patterns in @buffer
                my @buffer = ();
                foreach my $str (@expanded)
                  {
                    #If there's a '{X,Y,...}' pattern, split on ','
                    if($str =~ /(?<!\\)\{(.+?)(?<!\\)\}/)
                      {
                        my $substr     = $1;
                        my $before     = $`;
                        my $after      = $';
                        my @expansions = split(/,/,$substr);
                        push(@buffer,map {$before . $_ . $after} @expansions);
                      }
                    #Otherwise, push on the whole string
                    else
                      {push(@buffer,$str)}
                  }

                #Reset @expanded with the newly expanded file strings so that we
                #can handle additional '{X,Y,...}' patterns
                @expanded = @buffer;
              }
          }

        #Pass the newly expanded file strings through
        return(wantarray ? @expanded : [@expanded]);


        Although, I just tested that nested expressions are possible too, so I would definitely like to check out that module.
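
        For what it's worth, nested expressions could probably be handled by scanning for matching braces instead of using a regex. Below is a rough, untested-in-production sketch of that idea, not what Text::Glob::Expand or bsd_glob actually do. The name expand_braces is made up, a backslash is assumed to escape the next character, and edge cases such as '{}' or an unclosed '{' are not guaranteed to match csh behaviour (an unclosed group is simply returned unchanged).

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: expand '{a,b,...}' groups, including nested ones.
        # Backslash-escaped characters (e.g. '\{', '\}', '\,') are left untouched.
        sub expand_braces {
            my ($pattern) = @_;
            my ($open, $depth, @commas) = (undef, 0);

            for (my $i = 0; $i < length $pattern; $i++) {
                my $c = substr($pattern, $i, 1);
                if ($c eq '\\') { $i++; next }        # skip the escaped character
                if ($c eq '{') {
                    $open = $i unless defined $open;  # remember the outermost '{'
                    $depth++;
                }
                elsif ($c eq ',' && $depth == 1) {
                    push @commas, $i;                 # separator of the outer group
                }
                elsif ($c eq '}' && $depth > 0 && --$depth == 0) {
                    # Outer group is complete: cut it into its alternatives.
                    my $before = substr($pattern, 0, $open);
                    my $after  = substr($pattern, $i + 1);
                    my ($start, @alts) = ($open + 1);
                    for my $pos (@commas, $i) {
                        push @alts, substr($pattern, $start, $pos - $start);
                        $start = $pos + 1;
                    }
                    # Recurse so nested and later groups get expanded too.
                    return map { expand_braces($before . $_ . $after) } @alts;
                }
            }
            return ($pattern);   # no complete unescaped group left
        }

        print join("\n", expand_braces('lib/{a,b{1,2}}{.drp,.fna}')), "\n";
        ```

        The escapes are kept in the output strings, so each result can still be handed to bsd_glob afterwards for the '*' and '?' handling.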