bangers has asked for the wisdom of the Perl Monks concerning the following question:

I appreciate this is a little off topic, but a colleague and I have googled for most of this morning with no joy. This is my last port of call, so any help would be greatly appreciated.

I have a process running on Debian which uses glob to scan for incoming files of a certain type. Currently we use a file spec something like *_{process,read}_*. This works great on files with names like:
abc_read_today.dat

The trouble is that, for operational reasons, the format is going to change so that a file could now be called:
abc_leave_today+abc_read_today.dat

We only want to process a file if it has 'read' or 'process' before the first '+'. Does anyone have any ideas on this?

We have looked into doing a glob on the file spec, then splitting the file name on the '+' and doing a Perl regex match derived from the file spec. This is the plan of last resort, as I am not 100% happy that we can reliably convert the file specs to Perl regexes.

As I said, sorry that this isn’t strictly Perl, but it is in relation to a Perl process.

Re: Glob filespec
by blazar (Canon) on May 03, 2006 at 11:16 UTC

    glob plainly emulates shell globbing, and does not work with regexen. You can either use File::Find (or its relatives File::Find::Rule and File::Finder) even if you do not need to recurse, or just opendir, readdir and grep on filenames yourself.
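The opendir/readdir/grep route could look roughly like the sketch below. It is only an illustration, not the poster's code: the helper name wanted_files is made up here, and the regex hard-codes the two keywords from the question rather than deriving them from a file spec.

```perl
use strict;
use warnings;

# Hypothetical helper: list the plain files in $dir whose name contains
# "_read_" or "_process_" before the first '+' (or anywhere in the name
# if it has no '+' at all).
sub wanted_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep {
        /^[^+]*_(?:process|read)_/    # [^+]* cannot cross a '+'
            && -f "$dir/$_"           # plain files only
    } readdir $dh;
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}
```

Something like print "$_\n" for wanted_files('.'); would then list the matching names in the current directory.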

    Or else, now that I think of it, shouldn't *_{process,read}[+_]* work? Well, not exactly, because it would give false positives if "+" were not the first one. Maybe it's enough for you, anyway...
      Thanks for your suggestions. Unfortunately File::Find etc. won’t work, as we don’t want to change several thousand file specs (sorry if some of the restrictions seem arbitrary, but there are good reasons for them).

      In the end we decided to use the file spec to pull back a superset of what we wanted. We then converted any '*' into '(.*?)' and did a regex match. If $1 contains a '+', we exclude the file, e.g.:
      my $spec = ‘*_{process,read}_*’;
      my $reg = $spec;
      $reg =~ s/\*/(.*?)/g;
      my @use;
      for my $file ( glob $spec ) {
          $file = m/$reg/;
          push @use, $file unless $1 =~ /\+/;
      }
      Note: that’s a simplification of our real code, which does work. I haven’t tested or run the code above; it’s just for illustration here.
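For readers who want to try the "convert the spec to a regex" idea, here is a minimal sketch of one way it might be done. It is not the poster's production code. The helper names spec_to_regex and head_matches are hypothetical, only *, ? and a single level of {a,b,c} are translated, and it assumes the spec is meant to match the part of the name before the first '+' on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple shell-style file spec into an anchored
# Perl regex. Only *, ? and one level of {a,b,c} are translated; every other
# character is taken literally. Character classes, backslash escapes and
# nested braces are NOT handled in this sketch.
sub spec_to_regex {
    my ($spec) = @_;
    my $re = '';
    for my $tok ($spec =~ /(\{[^{}]*\}|.)/gs) {
        if    ($tok eq '*') { $re .= '.*' }
        elsif ($tok eq '?') { $re .= '.'  }
        elsif (my ($alts) = $tok =~ /^\{(.*)\}\z/s) {
            # {process,read} becomes (?:process|read)
            $re .= '(?:' . join('|', map { quotemeta($_) } split /,/, $alts, -1) . ')';
        }
        else { $re .= quotemeta($tok) }
    }
    return qr/\A$re\z/s;
}

# Hypothetical helper: does the part of $name before the first '+'
# match $spec? Returns 1 or 0.
sub head_matches {
    my ($name, $spec) = @_;
    my ($head) = split /\+/, $name, 2;
    return defined($head) && $head =~ spec_to_regex($spec) ? 1 : 0;
}
```

With the spec from the question, head_matches('abc_read_today.dat', '*_{process,read}_*') is true and head_matches('abc_leave_today+abc_read_today.dat', '*_{process,read}_*') is false. A spec that ends in a fixed suffix such as .dat would not match a bare head, so real specs may need a different split. The CPAN module Text::Glob offers a glob_to_regex function that may be a better fit than a hand-rolled converter.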

      I suppose in the end it was a PERL question after all.
        Thanks for your suggestions. Unfortunately File::Find etc. won’t work, as we don’t want to change several thousand file specs (sorry if some of the restrictions seem arbitrary, but there are good reasons for them).

        To be fair, I don't understand your concerns, since I don't have the slightest idea what you mean by "to change several 1,000 file specs". I suspect that you, in turn, misunderstood the suggestion about File::Find.

        my $spec = ‘*_{process,read}_*’;

        Please use real single quotes: what are you using as an editor?!?

        my @use;
        for my $file ( glob $spec ) {
            $file = m/$reg/;
            push @use, $file unless $1 =~ /\+/;
        }

        This won't work, since {process,read} does not do what you seem to think it does in a regex. You probably want

        my @use = grep /^[^+]*_(?:process|read)_/, glob $spec;

        But then you should be aware that you're duplicating your efforts, performing two very similar pattern matches one after the other. Although I'm a big advocate of glob, and often see people do unnecessary opendirs and readdirs, in this case I'd suggest you follow that path...
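A filter of this kind, anchored at the start of the name, can be checked against the file names from the question without touching the filesystem. The snippet below is just such a check. The third name is a made-up example, added to show a keyword on both sides of the '+'.

```perl
use strict;
use warnings;

# Two sample names from the question, plus one hypothetical name with a
# keyword on both sides of the '+'.
my @names = (
    'abc_read_today.dat',
    'abc_leave_today+abc_read_today.dat',
    'abc_process_today+abc_read_today.dat',
);

# [^+]* anchored at the start cannot cross a '+', so the keyword has to
# appear in the first '+'-separated part of the name.
my @use = grep { /^[^+]*_(?:process|read)_/ } @names;

print "$_\n" for @use;
```

This keeps the first and third names and drops the second, whose only 'read' comes after the '+'.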

        I suppose in the end it was a PERL question after all.

        No, it was not a "PERL" question, since there's no such thing. Check

        perldoc -q 'difference between "perl" and "Perl"'

        and while you're there, see "PERL as shibboleth and the Perl community".