Re: Multiple Regex, it works but it aint clever

Hey alexiskb,

I may be approaching this more from a mind-set of how to make it fast rather than how to make it elegant, but this is what I would try.

First, I would use the quoting REx operator (qr//) to precompile the RExs outside of the function. I also switched your parens to the non-capturing format, (?:...), since you aren't using $1 in any of your examples. Lastly, this is a case where using study might significantly speed up your program. Here's what I came up with. Update 2: wait - why use parens at all? Silly me...

#! /usr/local/bin/perl -w

use strict;

my $REs = [
    qr/(?: EUR1)[^,]+/,
    qr/(?: EUR2)[^,]+/,
    qr/(?: EUR3)[^,]+/,
    qr/(?: EUR4)[^,]+/,
    qr/(?: EUR8)[^,]+/,
    qr/(?: EUR0\.)[^,]+/,
    qr/(?: CHF10)[^,]+/,
    qr/(?: Y5)[^,]+/,
    qr/(?: NV )[^,]+/,
    qr/(?:NON-CUM)[^,]+/,
    qr/(?: LTD)[^,]+/,
    qr/(?: FIN )[^,]+/,
    qr/(?: INTL)[^,]+/,
    qr/(?:\$)[^,]+/,
    qr/(?:\s+$)/
];

my $string = "a INTL , b Y5c, NV , d e f... & & CO FIN    ";
my $return = &format($REs, $string);
print ">$string<\n";
print ">$return<\n";
exit;

sub format
{
    my $REs    = shift;
    my $string = shift;

    # Note: study is a no-op as of Perl 5.16; it can only help on older perls.
    study $string;

    $string =~ s/(?: & CO )[^,]+/ AND CO/;
    $string =~ s/&/AND/g;
    for (@{$REs})
    {
        $string =~ s/$_//;
    }

    return $string;
}

Without a sample of your data and output it is hard to be sure that this does what you want. It should, however, be faster, and the format subroutine is a little cleaner to look at.
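Following up on Update 2: since nothing inside the (?:...) groups is alternated or quantified, the groups can simply be dropped. Here is a rough sketch that checks the two forms behave the same on one made-up string. The input, the three patterns picked from the list, and the apply_all helper are only my assumptions for illustration, not your real data.

```perl
use strict;
use warnings;

# Made-up test string; the real data was not posted.
my $input = "a INTL , b Y5c, NV , d e f";

# Three of the patterns, with the non-capturing groups...
my @grouped = (qr/(?: INTL)[^,]+/, qr/(?: Y5)[^,]+/, qr/(?: NV )[^,]+/);
# ...and the same three written without any parens.
my @plain   = (qr/ INTL[^,]+/, qr/ Y5[^,]+/, qr/ NV [^,]+/);

# Hypothetical helper: apply each substitution once, in order.
sub apply_all {
    my ($res, $s) = @_;
    $s =~ s/$_// for @$res;
    return $s;
}

my $with_groups    = apply_all(\@grouped, $input);
my $without_groups = apply_all(\@plain,   $input);

print $with_groups eq $without_groups ? "same result\n" : "results differ\n";
```

If the two agree on your real data as well, the paren-free list is a little easier to read.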

Hope this helps you. If I have time I may run this through some benchmarks and see if it does in fact speed things up, and see which part helps the most.
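For anyone who wants to try the benchmark themselves, something along these lines should do as a starting point. It uses the core Benchmark module to compare patterns kept as plain strings against the same patterns precompiled with qr//. The sample string, the subset of patterns, and the strip_with helper are assumptions on my part, so treat any numbers as a rough indication only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input; swap in real data for meaningful numbers.
my $sample = "a INTL , b Y5c, NV , d e f... & & CO FIN    ";

# A few of the patterns, kept as plain strings...
my @raw = (' EUR1[^,]+', ' Y5[^,]+', ' NV [^,]+', ' INTL[^,]+', '\s+$');
# ...and the same patterns precompiled once with qr//.
my @compiled = map { qr/$_/ } @raw;

# Hypothetical helper: apply each substitution once to a copy of the string.
sub strip_with {
    my ($patterns, $string) = @_;
    $string =~ s/$_// for @$patterns;
    return $string;
}

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, {
    strings  => sub { strip_with(\@raw,      $sample) },
    compiled => sub { strip_with(\@compiled, $sample) },
});
```

cmpthese prints a small comparison table, which makes it easy to see whether the precompiled version actually wins on a given perl.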

Update: Oops - typos.

Good luck,
{NULE}
--
http://www.nule.org
