Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have a regex problem I need some assistance with.

I have an array that contains data an example of which looks like this
@array=('GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator');
I need to be able to create a new list from this that looks like this
@newarray=('GAP_SPAN09,GAP_SPAN03,POS_WLI02');
So the elements I require are those which appear before a " - " (space, hyphen, space), and these need to be put into a comma-delimited list as in the example.
My attempts so far to figure this out with my limited Perl knowledge have been wildly off the mark.
Any help would be most welcome.

Replies are listed 'Best First'.
Re: regex assistance
by toolic (Bishop) on Oct 20, 2010 at 12:51 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    my @array = ('GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator');
    my @newarray;
    while ($array[0] =~ /(\w+) - /g) {
        push @newarray, $1;
    }
    print Dumper(\@newarray);

    __END__
    $VAR1 = [
              'GAP_SPAN09',
              'GAP_SPAN03',
              'POS_WLI02'
            ];
Re: regex assistance
by johngg (Canon) on Oct 20, 2010 at 13:16 UTC

    Using your updated data which is now an array:-

    knoppix@Microknoppix:~$ perl -Mstrict -wE '
    > my @array = (
    >     q{GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j))},
    >     q{GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f))},
    >     q{OS_WLI02 - POS_WLI02 Web Logic Integrator},
    > );
    > my @newArray =
    >     map m{^(.*?)\s-\s},
    >     @array;
    > say for @newArray;'
    GAP_SPAN09
    GAP_SPAN03
    OS_WLI02
    knoppix@Microknoppix:~$

    I hope this is helpful.

    Cheers,

    JohnGG

Re: regex assistance
by hbm (Hermit) on Oct 20, 2010 at 12:50 UTC

    You realize @array has only one element, the long single-quoted string?

    Perhaps this:

    use strict;

    my @array=('GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator');
    my @newarray = $array[0] =~ /(\S+) - /g;
    print "$_\n" for @newarray;
      Apologies, the example given should have looked like this:
      my @array=('GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j))','GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f))','POS_WLI02 - POS_WLI02 Web Logic Integrator');
      So there are 3 elements. This was purely an example, though, since there could be up to 20 elements in the original array.
        Ahh, the question becomes clearer, though the logic is largely similar:

        use strict;
        use warnings;

        my @records=('GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j))',
                     'GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f))',
                     'POS_WLI02 - POS_WLI02 Web Logic Integrator)');
        my @record_ids;
        for my $record (@records){
            $record=~m/^(\S+)(?= - )/g;
            push @record_ids , $1;
        }
        for my $record_id (@record_ids){
            print "$record_id\n";
        }

        __END__
        GAP_SPAN09
        GAP_SPAN03
        POS_WLI02
        print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
        my @newarray = map /^(\S+) +- /, @array;
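One property of the map form is worth spelling out: in list context a failed match returns the empty list, so any record without a leading ID and " - " separator silently drops out of the result. A minimal sketch, assuming a hypothetical @records list with one malformed entry (this data is invented for illustration and is not from the original question):

```perl
use strict;
use warnings;

# Hypothetical sample data: the second record has no " - " separator.
my @records = (
    'GAP_SPAN09 - GAP SPAN base',
    'malformed record without separator',
    'POS_WLI02 - POS_WLI02 Web Logic Integrator',
);

# In list context a successful match returns its captures,
# and a failed match returns the empty list.
my @ids = map { /^(\S+) +- / } @records;

print scalar(@ids), " of ", scalar(@records), " records matched\n";
print "$_\n" for @ids;
```

If a dropped record should be an error rather than a silent skip, comparing the two counts as above is one simple way to notice it.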
Re: regex assistance
by CountZero (Bishop) on Oct 20, 2010 at 15:27 UTC
    Or using a split rather than a regexp:
    use strict;
    use warnings;
    use 5.012;

    my @array = (
        'GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j))',
        'GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f))',
        'OS_WLI02 - POS_WLI02 Web Logic Integrator',
    );
    my @results = map{(split / - /)[0]} @array;
    say for @results;
    Output:
    GAP_SPAN09
    GAP_SPAN03
    OS_WLI02

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
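A possible refinement of the split approach, offered only as a sketch: passing a LIMIT of 2 as the third argument tells split to stop after the first " - ", so the rest of each record is not broken up needlessly. The variable names here are illustrative; the data is the three-element array from the thread.

```perl
use strict;
use warnings;

my @array = (
    'GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j))',
    'GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f))',
    'POS_WLI02 - POS_WLI02 Web Logic Integrator',
);

# LIMIT of 2: split produces at most two fields, the ID and the remainder.
my @ids = map { ( split / - /, $_, 2 )[0] } @array;

print "$_\n" for @ids;
```

Note that a record containing no " - " at all comes back whole as field 0, so this form does not filter out malformed records.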

      Thanks for all the responses. It just goes to show how versatile Perl is!
Re: regex assistance
by locked_user sundialsvc4 (Abbot) on Oct 20, 2010 at 13:24 UTC

    Here are some tips that might help you on your way:

    1. In perldoc perlre, carefully review the //g, //c, //s and //m modifiers. Also consider the difference between using a regex in scalar vs. list context.
    2. Yes, there is such a thing as “regex golf,” which is the amusing geek pastime of trying to Name That Tune In One Note. But clarity usually wins the race, and if that means writing several regexes and breaking down the string in several stages ... do so. After all, once you have written the thing, you will also be maintaining it.
    3. The join function is very handy for constructing “comma-delimited strings.”
    4. There are gobs of “regular expression test sites” on the Internet. Many regex engines are largely “Perl compatible,” though details differ between them.
    5. (Just in case...) Far from being “dismissive,” we are seriously endeavoring to help you. Attempt to solve the problem and show us your attempts. Instead of “handing you a fish,” we’d like to “teach you to fish.” This kind of task comes up extremely often and is one of the key reasons for all the fuss about Perl. String-mangling is one of the reasons why Perl is referred to as a Swiss Army Knife.®
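Tips 1 and 3 can be illustrated together. The following is a minimal sketch rather than a definitive solution; it assumes an abbreviated version of the poster's string (shortened here for readability) and shows //g in list context, //g in scalar context, and join:

```perl
use strict;
use warnings;

# Abbreviated, hypothetical version of the string from the question.
my $string = 'GAP_SPAN09 - GAP SPAN base GAP_SPAN03 - GAP SPAN base POS_WLI02 - Web Logic Integrator';

# List context: //g returns every capture at once.
my @ids = $string =~ /(\S+) - /g;

# Scalar context: //g finds one match per call,
# resuming where the previous match ended.
my $count = 0;
$count++ while $string =~ /(\S+) - /g;

# join turns the list into a single comma-delimited string.
my $csv = join ',', @ids;
print "$count matches: $csv\n";
```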

Re: regex assistance
by Utilitarian (Vicar) on Oct 20, 2010 at 12:59 UTC
    You have a string that looks like:
    $string='GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator';

    Thus @array=$string=~/PATTERN/g is the syntax you need.

    The PATTERN is to capture a series of non space characters if they are followed by the ' - ' pattern.

    If you need more help than that:

    ~/$ perl -e '$string=q(GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator);@array=$string=~m/(\S+)(?= - )/g;for $record_title (@array){print "$record_title\n";}'
    GAP_SPAN09
    GAP_SPAN03
    POS_WLI02
Re: regex assistance
by umasuresh (Hermit) on Oct 20, 2010 at 13:00 UTC
    Please show what you have tried so far:
    use strict;
    use warnings;

    my $string = "GAP_SPAN09 - GAP SPAN base (Scratch Testing [TSMC11] : tsmc11_wld(sxfatd12j)) GAP_SPAN03 - GAP SPAN base (DFD E2E Testing [TSPAN04] : tspan04-dfdint-wld(sxfamd6f)) POS_WLI02 - POS_WLI02 Web Logic Integrator";
    my @array;
    # Require whitespace on both sides of the hyphen so that the hyphens
    # inside "tspan04-dfdint-wld" are not treated as separators.
    while ($string =~ m/(\w+)\s-\s/g) {
        push @array, $1;
    }
    print join("\t", @array), "\n";
    Your array has only one element. I think you meant a string as input!
    A must-have book for regex challenges: Mastering Regular Expressions
    UPDATE: Other monks have already posted answers while I was drafting this!
Re: regex assistance
by raybies (Chaplain) on Oct 20, 2010 at 14:26 UTC
    My first instinct was to split the longer string into an array on ' - ' and then use a regex on each element in the array.
    my @newarray;
    foreach (@array) {
        my @pieces = split / - /;
        pop @pieces;    # the text after the last ' - ' holds no ID
        foreach (@pieces) {
            # then just grab the last full word
            if (/(\w+)$/) {
                push @newarray, $1;
            }
        }
    }