convenientstore has asked for the wisdom of the Perl Monks concerning the following question:

My program works, but I don't think this is the most efficient way to use the regular expression in the middle.
Can someone comment on how it can be shortened?
use strict;

my $count = 1;
while (<>) {
    if ( m/^\^status/ ) {
        s/([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+)
          ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+)
          ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+)
          ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+) ([^ ]+)(?:\s+)
          ([^ ]+)(?:\s+)
          (\d{1,3})(?:\s+) (\d+)$
         /$1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16 $17 ${count} $19/x;
        $count++;
    }
    print;
}

Replies are listed 'Best First'.
Re: seeking improvement in my simple program using regular expression
by grinder (Bishop) on Aug 05, 2007 at 12:01 UTC

    The split technique, as pointed out above, is certainly much more compact, but you now have to check explicitly that the last two elements are numeric:

    if (my @match = split /\s+/) {
        if ($match[-2] =~ /^\d{1,3}$/ and $match[-1] !~ /\D/) {
            $match[-2] = $count++;
        }
        $_ = "@match\n";
    }

    (Note that it's easier to say that the last element doesn't match a non-digit...). Also, with an array, it doesn't matter if you capture one more or less; the code will adapt to the change.
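
    To see the check in action, here is a minimal self-contained sketch. The sub name renumber_lines and the sample lines are made up for illustration, and it uses split ' ' (awk-style splitting, which skips leading whitespace) rather than split /\s+/.

```perl
use strict;
use warnings;

# Hypothetical helper: for each line, replace the second-to-last field
# with a running counter, but only when the last two fields look numeric.
sub renumber_lines {
    my @lines = @_;
    my $count = 1;
    my @out;
    for my $line (@lines) {
        my @match = split ' ', $line;
        if (@match >= 2 and $match[-2] =~ /^\d{1,3}$/ and $match[-1] !~ /\D/) {
            $match[-2] = $count++;
        }
        # Rejoining with single spaces normalizes the original spacing.
        push @out, "@match\n";
    }
    return @out;
}

print renumber_lines(
    "^status a b 123 456\n",    # becomes "^status a b 1 456"
    "^status a b foo 456\n",    # left alone: second-to-last field is not numeric
    "^status a b 9 10\n",       # becomes "^status a b 2 10"
);
```

    Note that the counter only advances on lines that pass the numeric check, and that any run of whitespace in the input comes out as a single space.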

    Otherwise, if for some reason you needed to stay with the regular expression as posted (for instance, because you really didn't want to capture the 12th element or something), the following pattern is equivalent (repetitive stuff in the middle omitted) to yours, and less noisy:

    /([^ ]+)\s+([^ ]+)\s+([^ ]+)...\s+(\d{1,3})\s+(\d+)$/

    that is, you don't need to group \s+. And I suspect you may want \S+ rather than [^ ]+:

    /(\S+)\s+(\S+)\s+(\S+)\s+...\s+(\d{1,3})\s+(\d+)$/

    You could also have Perl build the pattern for you, rather than mess with the fiddly details:

    my $pat = join(
        '\\s+',
        ('(\\S+)') x 17,    # parens on the left of x repeat the list: 17 field groups, as in the original post
        '(\\d{1,3})',
        '(\\d+)',
    );
    $pat = qr/\A$pat\z/;
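
    One way to sanity-check a generated pattern is to try it on a sample line and count what it captures. The following is only a sketch: build_pattern is a hypothetical helper, the sample line is invented, and the field count of 17 is taken from the regular expression in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for $n free-form fields followed by
# a 1-3 digit field and a final all-digit field.
sub build_pattern {
    my ($n) = @_;
    # The parentheses around the left operand make x repeat the list,
    # so join sees $n separate '(\S+)' elements.
    my $pat = join '\\s+', ('(\\S+)') x $n, '(\\d{1,3})', '(\\d+)';
    return qr/\A$pat\z/;
}

my $re   = build_pattern(17);
my $line = join ' ', (map "f$_", 1 .. 17), 123, 456;

# \z does not match before a trailing newline, so chomp real input first.
if (my @fields = $line =~ $re) {
    print scalar(@fields), " fields captured\n";    # 19 fields captured
}
```

    Without the parentheses, 'x' repeats the string instead, and the result is one long element with no \s+ between the groups.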

    • another intruder with the mooring in the heart of the Perl

Re: seeking improvement in my simple program using regular expression
by FunkyMonk (Bishop) on Aug 05, 2007 at 08:34 UTC
    Personally, I'd use split for something like this...

    my $count = 1;
    while ( <DATA> ) {
        if ( m/^\^status/ ) {
            my @f = split;
            $f[-2] = $count++;    # second-to-last field, whatever the field count
            $_ = join( ' ', @f ) . "\n";
        }
        print;
    }
    __DATA__
    ^status f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 123 123
    ^status f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 345 123

    But you could use a regexp...

    s/\d+\s+(\d+)$/$count $1/;

Re: seeking improvement in my simple program using regular expression
by bruceb3 (Pilgrim) on Aug 05, 2007 at 08:31 UTC
    At a guess, something like this ???
    #!/usr/bin/perl -s
    use strict;

    my $count = 1;
    while (<>) {
        my @fields = split;
        print "@fields[0..16] ", $count++, " $fields[18]\n";
    }
      Hmm.
      My code works, but I thought writing out ([^ ]+)(?:\s+) 18 times was stupid.
      I thought there might be a way to do it like this:
      ([^ ]+\s+){17}(\d{1,3})(?:\s+)
      But clearly that is not correct.
      Your split code is neat, though. Let me look into that as well.
        s/((?:\S+\s+){17})\d+(\s+\d+)$/$1$count$2/;
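
        The reason a quantified capture group does not give you seventeen separate captures is that it keeps only its last repetition. Wrapping the whole repetition in one outer capture, as in the substitution above, keeps the entire prefix with its spacing. A sketch of that idea follows; the sub name replace_count, the ^ anchor, the \d{1,3} restriction, and the sample data are assumptions added for illustration.

```perl
use strict;
use warnings;

# A quantified capture group remembers only its final repetition.
if ("a b c " =~ /(\S+\s+){3}/) {
    print "last repetition only: '$1'\n";    # last repetition only: 'c '
}

# Hypothetical helper: put $count in place of the 18th of 19 fields,
# leaving all original whitespace untouched.
sub replace_count {
    my ($line, $count) = @_;
    # $1 = the first 17 fields, each with its trailing whitespace
    # $2 = the whitespace and digits that make up the final field
    $line =~ s/^((?:\S+\s+){17})\d{1,3}(\s+\d+)$/$1$count$2/;
    return $line;
}

my $line = join('  ', map("f$_", 1 .. 17)) . "   123 \t 456\n";
print replace_count($line, 1);
```

        Lines that do not have the expected shape are returned unchanged, so a caller that wants to increment the counter only on success would need to check the return value of s/// itself.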

Re: seeking improvement in my simple program using regular expression
by perl-diddler (Chaplain) on Aug 05, 2007 at 20:47 UTC
    Not that it is exactly answering your question, but is there a reason you use
    ([^ ]+)(?:\s+)
    instead of
    (\S+)\s+
    Or are you really wanting anything other than "space" (and not just any non-white)? Curious...
      I have to go back and check my solution to see if it actually fixed the problem.
      I originally wanted to leave the original file intact (in terms of format).
      I thought that by using [^ ] I could also match "" and ' '; perhaps I am wrong?

      The problem with split was that the fields are separated by a variable amount of whitespace, which I wanted to preserve.
      But like I said, looking back at my code,
      I wonder how it did that, so let me rerun it and get back to you guys.
        Umm, so I went back and realized that my program wasn't
        preserving the spaces. I went back and fixed it, but now the program is too messy. Isn't there some way to do ([^ ]+)(\s+){17} ?
        use strict;

        my $count = 1;
        while (<>) {
            if ( m/^\^sip/ ) {
                s/([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+)
                  ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+)
                  ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+)
                  ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+) ([^ ]+)(\s+)
                  ([^ ]+)(\s+)
                  (\d{1,3})(\s+) (\d+)$
                 /$1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26$27$28$29$30$31$32$33$34${count}$36$37/x;
                $count++;
            }
            print;
        }
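
        If split is appealing but the spacing has to survive, one option is to put a capture group in the split pattern, which makes split return the separators along with the fields. This is a different technique from the substitution above, shown only as a sketch: renumber_keep_spacing and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the second-to-last field with $count while
# keeping every run of whitespace exactly as it was.
sub renumber_keep_spacing {
    my ($line, $count) = @_;
    # The capture group makes split return the whitespace runs too, and the
    # -1 limit keeps trailing empty pieces, so join '' rebuilds the line.
    my @parts = split /(\s+)/, $line, -1;
    # Indices of the real fields (the pieces containing non-whitespace).
    my @idx = grep { $parts[$_] =~ /\S/ } 0 .. $#parts;
    if (@idx >= 2
        and $parts[ $idx[-2] ] =~ /^\d{1,3}$/
        and $parts[ $idx[-1] ] =~ /^\d+$/)
    {
        $parts[ $idx[-2] ] = $count;
    }
    return join '', @parts;
}

my $count = 1;
for my $line ("^sip a   b  12 \t 34\n", "^sip c d    7  8\n") {
    print renumber_keep_spacing($line, $count++);
}
```

        Because the fields are located from the end of the list, this does not depend on the line having exactly 19 fields.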