
I have to go back and check my solution to see if it actually fixed the problem.
I originally wanted to leave the original file intact (in terms of format).
I thought that by using [^ ] I could also match "" and ' ', but perhaps I am wrong?

The problem with split was that the elements are separated by a variable amount of whitespace, which I wanted to preserve,
but like I said, looking back at my code,
I wonder how it did that. So let me rerun it and get back to you guys.
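
For what it's worth, split can keep the original spacing if the separator pattern is wrapped in a capturing group. The following is only a minimal sketch: replace_field is a hypothetical helper name, and it assumes the line does not start with whitespace and that the requested field exists.

```perl
use strict;
use warnings;

# Hypothetical helper: replace field number $index (0-based) with $value
# while leaving every run of whitespace exactly as it was.
sub replace_field {
    my ( $line, $index, $value ) = @_;

    # A capturing group in the split pattern returns the separators too,
    # so the list alternates: field, whitespace, field, whitespace, ...
    my @parts = split /(\s+)/, $line;

    # Fields therefore sit at the even positions of the list.
    $parts[ 2 * $index ] = $value;

    return join '', @parts;
}

print replace_field( "a  b   c\n", 1, "X" );    # prints "a  X   c"
```

Joining the pieces with an empty string puts the line back together with its spacing untouched.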
Re^3: seeking improvement in my smiple program using regular expression
by convenientstore (Pilgrim) on Aug 06, 2007 at 02:11 UTC
    Umm, so I went back and realized that my program wasn't
    preserving the spaces. So I fixed it, but now the program is too messy. Isn't there some way to do ([^ ]+)(\s+){17}?
    use strict;
    my $count = 1;
    while (<>) {
        if ( m/^\^sip/ ) {
            s/([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              ([^ ]+)(\s+)
              (\d{1,3})(\s+)
              (\d+)$
             /$1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26$27$28$29$30$31$32$33$34${count}$36$37/x;
            $count++;
        }
        print ;
    }

      You don't need to capture everything if all you want to do is replace one field.

      Assuming all lines that begin with '^sip' have the appropriate number of fields, what your regex above translates to is: replace the second-to-last set of digits with a running count.

      If you can isolate and match just the bit you want to replace, you don't have to capture everything else. By using zero-length assertions to bracket the bit you want to replace, you don't need explicit captures and the replacement consists only of the count:

      use strict;
      my $count = 1;
      while (<>) {
          if ( m/^\^sip/ ) {
              s[                      ## No need to explicitly capture anything
                  (?<=\s)             ## 2nd last num field cannot be 4 digits
                  \d{1,3}             ## only this is replaced.
                  (?= \s+ \d+ $ )     ## Ensure there's one more before the EOS
              ][$count]x;
              $count++;
          }
          print ;
      }

      If you really need to verify that modified lines contain exactly 17 space delimited fields preceding the replacement, then just capture the whole thing as a single entity:

      use strict;
      my $count = 1;
      while (<>) {
          if ( m/^\^sip/ ) {
              s[
                  ( (?: \S+ \s+ ){17} )   ## Capture the 17/34 fields to $1
                  \d{1,3}                 ## No need to capture
                  (?= \s+ \d+ $ )         ## Ensure there's one more before the EOS
              ][$1$count]x;
              $count++;
          }
          print ;
      }

      Also, are you sure you want to increment the count even if the replacement doesn't happen?
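
If the count should only advance when a line was actually changed, the return value of s/// can drive the increment. This is only a sketch under that assumption: renumber_line is a hypothetical helper, not part of the original program, and it reuses the lookaround pattern shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: renumber one line and hand back the next count.
sub renumber_line {
    my ( $line, $count ) = @_;

    if ( $line =~ m/^\^sip/ ) {
        # s/// returns true only when it replaced something, so the
        # counter moves on only for lines that were really modified.
        $count++ if $line =~ s[ (?<=\s) \d{1,3} (?= \s+ \d+ $ ) ][$count]x;
    }

    return ( $line, $count );
}

my $count = 1;
while ( my $line = <STDIN> ) {
    ( $line, $count ) = renumber_line( $line, $count );
    print $line;
}
```

A '^sip' line whose second-to-last field is missing or has four or more digits passes through unchanged and does not use up a number.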


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      the following may do the trick:

      s{ ((?: [^ ]+ \s+ ){17}) \d{1,3} (\s+ \d+) $} {$1$count$2}x;

      just a few points, some of which have been made before...

      • the complemented character set [^ ] is not the same as \S (that's a capital-S) because it includes the tab, newline, etc., characters -- in fact, all characters (including whitespace characters) except space;
      • because the pattern is anchored at the end of the string, the (?: [^ ]+ \s+ ){17} repetitions may be preceded by more (?: [^ ]+ \s+ ) repetitions or by anything else;
      • therefore, what you are really saying is that you want the \d{1,3} you're replacing to be preceded by at least 17 repetitions of (?: [^ ]+ \s+ ), then by anything else;
      • it may be worth considering if this ``17 repetitions'' business is really what you need;
      • if it isn't, the regex may simplify further to s{ \d{1,3} (?= \s+ \d+ $)} {$count}x;

      all these regexes are untested.
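
      To illustrate the first point, here is a small sketch of how [^ ] and \S split the same string differently. The helper names runs_not_space and runs_non_ws and the sample string are made up for the illustration.

```perl
use strict;
use warnings;

# Runs of anything except the space character (tabs and newlines count).
sub runs_not_space {
    my ($text) = @_;
    return $text =~ /([^ ]+)/g;
}

# Runs of non-whitespace characters.
sub runs_non_ws {
    my ($text) = @_;
    return $text =~ /(\S+)/g;
}

my $text = "alpha\tbeta gamma";    # one tab, then one space

my @loose  = runs_not_space($text);    # "alpha\tbeta", "gamma"
my @strict = runs_non_ws($text);       # "alpha", "beta", "gamma"

printf "[^ ]+ found %d runs, \\S+ found %d runs\n", scalar @loose, scalar @strict;
```

      With tab-separated input, [^ ]+ would swallow several fields at once, which changes what "17 repetitions" actually counts.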

        Repeated captures do not work. Only the last repetition actually captures anything. Contrast:

        my @captures = 'abcde' =~ m[(.){3}]; print "@captures";;
        c

        with

        my @captures = 'abcde' =~ m[(.)(.)(.)];; print "@captures";;
        a b c

        and

        'abcde' =~ m[(.){3}] and print "$1,$2,$3";;
        Use of uninitialized value in concatenation (.) or string at ...
        Use of uninitialized value in concatenation (.) or string at ...
        c,,

        'abcde' =~ m[(.)(.)(.)] and print "$1,$2,$3";;
        a,b,c
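
        When the individual repetitions are needed, one workaround is to put a single capture around the whole repeated group and split that capture afterwards. A minimal sketch, assuming whitespace-delimited fields; first_fields is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n whitespace-delimited fields.
sub first_fields {
    my ( $line, $n ) = @_;

    # One capture around the whole repetition keeps everything it matched;
    # a capture inside the repetition would keep only the last field.
    my ($chunk) = $line =~ /((?:\S+\s+){$n})/ or return;

    # split ' ' breaks the captured text on runs of whitespace.
    return split ' ', $chunk;
}

my @fields = first_fields( "a b  c d e", 3 );
print "@fields\n";    # prints "a b c"
```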
        all these regexes are untested.

        You might consider revising that policy.

