songahji has asked for the wisdom of the Perl Monks concerning the following question:

Let's say we need to process particular files from @files. The particular files are *foo3.txt, *foo4.txt, *bar1.txt and *bar2.txt.
foreach (@files) { process_file ($_) if /(?:(?:foo[34])|(?:bar[12]))\.txt/io }
Please let me know if there is a better way to do it!

Cheers,
Hanny J

Replies are listed 'Best First'.
Re: Regex Tuning
by Roy Johnson (Monsignor) on Apr 28, 2005 at 14:59 UTC
    Anchor the end. And you don't need the inner sets of parens.
    /(?:foo[34]|bar[12])\.txt$/io

    Caution: Contents may have been coded under pressure.
Re: Regex Tuning
by salva (Canon) on Apr 28, 2005 at 15:12 UTC
    this smells like premature optimization...
    foreach (@files) { process_file($_) if (/foo[34]\.txt$/i or /bar[12]\.txt$/i) }
    it's easier to read and to maintain, and I am sure you will not notice any speed loss, especially if you are going to access the files afterward.
Re: Regex Tuning /(?:(?:foo[34])|(?:bar[12]))\.txt/io
by demerphq (Chancellor) on Apr 28, 2005 at 17:14 UTC

    For perls prior to 5.9.2 your regex is pretty much as good as it will get. Splitting it up as some folks suggest won't improve the situation IMO, as there is a fixed string at the end that will have to be matched regardless. Actually there is a small problem there: you should anchor the string on the right-hand side with $ or \z, or you might match a filename like "foo3.txt.bak", which probably isn't what you want.
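
    A minimal sketch of the anchoring point, using made-up file names (none of them come from the original question). It compares the unanchored pattern with the $ and \z variants; note that $ also matches just before a trailing newline, while \z matches only at the very end of the string.

    ```perl
    use strict;
    use warnings;

    # Hypothetical file names, chosen only to illustrate the anchoring issue.
    my @names = ('foo3.txt', 'foo3.txt.bak', "bar1.txt\n", 'xbar2.TXT');

    my $unanchored = qr/(?:foo[34]|bar[12])\.txt/i;
    my $dollar     = qr/(?:foo[34]|bar[12])\.txt$/i;
    my $strict     = qr/(?:foo[34]|bar[12])\.txt\z/i;

    for my $name (@names) {
        (my $shown = $name) =~ s/\n/\\n/g;   # make a trailing newline visible
        printf "%-14s unanchored=%d  \$=%d  \\z=%d\n",
            $shown,
            ($name =~ $unanchored ? 1 : 0),
            ($name =~ $dollar     ? 1 : 0),
            ($name =~ $strict     ? 1 : 0);
    }
    ```

    Only the unanchored pattern accepts "foo3.txt.bak"; the name with a trailing newline is accepted by $ but rejected by \z.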

    With Perl 5.9.2 and later you will see better performance with

    process_file ($_) if /(?:foo3|foo4|bar1|bar2)\.txt$/i;

    Note the addition of the $ anchor and the removal of the /o modifier (which doesn't do anything for this regex anyway, since the pattern interpolates no variables).
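
    If you want to check this on your own perl, something like the following rough sketch could be used. It first confirms that the character-class form and the literal-alternation form select the same names, then times both with the core Benchmark module. The sample names are invented for illustration, and the timings will vary with perl version and data, so no particular result is implied.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Hypothetical sample names; real timings depend on your data and perl version.
    my @names = map { ("foo$_.txt", "bar$_.txt", "baz$_.txt", "foo$_.txt.bak") } 1 .. 5;

    my $char_class  = qr/(?:foo[34]|bar[12])\.txt$/i;
    my $alternation = qr/(?:foo3|foo4|bar1|bar2)\.txt$/i;

    # Both forms should select exactly the same names.
    my @by_class = grep { $_ =~ $char_class } @names;
    my @by_alt   = grep { $_ =~ $alternation } @names;
    print "same selection: ", ("@by_class" eq "@by_alt" ? "yes" : "no"), "\n";

    # Run each form for at least one CPU second and print a comparison table.
    cmpthese(-1, {
        char_class  => sub { my $n = grep { $_ =~ $char_class } @names },
        alternation => sub { my $n = grep { $_ =~ $alternation } @names },
    });
    ```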

    ---
    demerphq

Re: Regex Tuning
by Jaap (Curate) on Apr 28, 2005 at 14:56 UTC
    I don't see what's so wrong with your code but you could also grep it. Something like (untested):
    my @result = map { process_file($_) } grep (/(?:foo[34]|bar[12])\.txt/i, @files);
      .oO( will grep be ok on handling 10000 files?)
        It will probably be as ok as @files is. It's no worse than a temporary copy of @files (and potentially much smaller, depending on how many match). The questionable thing is the use of map: there's no indication that the OP wants to collect output from process_file (one would think that output would be handled at the time of the call, rather than in bulk afterward).
        process_file($_) for grep /(?:foo[34]|bar[12])\.txt$/io, @files;
        should be fine.
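
        A self-contained sketch of that grep-then-for form. Here process_file is a hypothetical stand-in that only records the names it was given, and the file list is invented for illustration.

        ```perl
        use strict;
        use warnings;

        # Hypothetical stand-in for the OP's process_file; it only records the name.
        my @processed;
        sub process_file {
            my ($file) = @_;
            push @processed, $file;
            return;
        }

        # Hypothetical file list.
        my @files = qw(afoo3.txt foo5.txt xbar2.txt bar1.txt.bak FOO4.TXT);

        # grep builds a temporary list of matching names; the statement-modifier
        # "for" then calls process_file once per match and discards return values.
        process_file($_) for grep { /(?:foo[34]|bar[12])\.txt$/i } @files;

        print "$_\n" for @processed;
        ```

        With this list, only afoo3.txt, xbar2.txt and FOO4.TXT reach process_file.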

        Caution: Contents may have been coded under pressure.
Re: Regex Tuning /(?:(?:foo[34])|(?:bar[12]))\.txt/io
by TedPride (Priest) on Apr 28, 2005 at 17:30 UTC
    The vast majority of processing time is going to come from reading the files, not from running the script itself. I'd personally just keep things simple and do the tests individually, since there are only 4 of them and you can more easily edit the code later if the tests are separate.
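
    One possible reading of "separate tests", sketched with a list of small patterns. The helper name wanted_file and the sample names are assumptions made for illustration, not something from the original post.

    ```perl
    use strict;
    use warnings;

    # One small pattern per wanted file name; add or remove entries as needs change.
    my @wanted = (
        qr/foo3\.txt$/i,
        qr/foo4\.txt$/i,
        qr/bar1\.txt$/i,
        qr/bar2\.txt$/i,
    );

    # Hypothetical helper: true if the name matches any of the patterns above.
    sub wanted_file {
        my ($name) = @_;
        for my $re (@wanted) {
            return 1 if $name =~ $re;
        }
        return 0;
    }

    # Hypothetical sample names.
    for my $name (qw(xfoo3.txt foo5.txt bar2.txt bar2.txt.bak)) {
        printf "%-14s %s\n", $name, (wanted_file($name) ? 'process' : 'skip');
    }
    ```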