Re^2: seeking improvement in my smiple program using regular expression

Re^3: seeking improvement in my smiple program using regular expression by convenientstore (Pilgrim) on Aug 06, 2007 at 02:11 UTC
Umm, so I went back and realized that my program wasn't preserving the spaces. I fixed that, but now the program is way too messy. Isn't there some way to do something like `([^ ]+)(\s+){17}`?

```perl
use strict;
my $count = 1;
while (<>) {
    if ( m/^\^sip/ ) {
        s/([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          ([^ ]+)(\s+)
          (\d{1,3})(\s+)
          (\d+)$
         /$1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26$27$28$29$30$31$32$33$34${count}$36$37/x;
        $count++;
    }
    print;
}
```
Re^4: seeking improvement in my smiple program using regular expression by BrowserUk (Patriarch) on Aug 06, 2007 at 06:36 UTC
You don't need to capture everything if all you want to do is replace one field. Assuming all lines that begin with 'sip' have the appropriate number of fields, what your regex above translates to is: replace the second-to-last set of digits with a running count.

If you can isolate and match just the bit you want to replace, you don't have to capture everything else. By using zero-length assertions to bracket the bit you want to replace, you don't need explicit captures, and the replacement consists only of the count:

```perl
use strict;
my $count = 1;
while (<>) {
    if ( m/^\^sip/ ) {
        s[                     ## No need to explicitly capture anything
            (?<=\s)            ## 2nd last num field cannot be 4 digits
            \d{1,3}            ## only this is replaced.
            (?= \s+ \d+ $ )    ## Ensure there's one more before the EOS
        ][$count]x;
        $count++;
    }
    print;
}
```

If you really need to verify that modified lines contain exactly 17 space-delimited fields preceding the replacement, then just capture the whole thing as a single entity:

```perl
use strict;
my $count = 1;
while (<>) {
    if ( m/^\^sip/ ) {
        s[
            ( (?: \S+ \s+ ){17} )  ## Capture the 17/34 fields to $1
            \d{1,3}                ## No need to capture
            (?= \s+ \d+ $ )        ## Ensure there's one more before the EOS
        ][$1$count]x;
        $count++;
    }
    print;
}
```

Also, are you sure you want to increment the count even if the replacement doesn't happen?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
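On that last question: if the count should only advance when a line was actually changed, one option is to use the return value of `s///`, which is true only when a replacement was made. The following is a minimal sketch, not the original poster's program; the sub name `renumber_lines` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): renumber the second-to-last
# numeric field of every line that starts with a literal "^sip", and
# advance the counter only when the substitution really happened.
sub renumber_lines {
    my @lines = @_;          # work on copies of the input lines
    my $count = 1;
    for my $line (@lines) {
        next unless $line =~ m/^\^sip/;
        # s/// returns a true value only if it replaced something
        $count++ if $line =~ s[ (?<=\s) \d{1,3} (?= \s+ \d+ $ ) ][$count]x;
    }
    return @lines;
}

# Made-up sample input: the second line has no numeric fields to replace.
print renumber_lines(
    "^sip a b 7 42\n",
    "^sip no digits here\n",
    "^sip x 999 5\n",
);
```

With this input the third line receives 2 rather than 3, because the second line did not consume a number.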
Re^4: seeking improvement in my smiple program using regular expression by Anonymous Monk on Aug 06, 2007 at 05:34 UTC
The following may do the trick:

`s{ ((?: [^ ]+ \s+ ){17}) \d{1,3} (\s+ \d+) $} {$1$count$2}x;`

Just a few points, some of which have been made before:

- The complemented character set `[^ ]` is not the same as `\S` (that's a capital S), because it includes the tab, newline, etc., characters; in fact, it includes all characters (including whitespace characters) except space.
- Because the pattern is anchored only at the end of the string, the `(?: [^ ]+ \s+ ){17}` repetitions may be preceded by more `(?: [^ ]+ \s+ )` repetitions or by anything else. So what you are really saying is that you want the `\d{1,3}` you're replacing to be preceded by at least 17 repetitions of `(?: [^ ]+ \s+ )`, and those in turn by anything else.
- It may be worth considering whether this "17 repetitions" business is really what you need. If it isn't, the regex may simplify further to `s{ \d{1,3} (?= \s+ \d+ $)} {$count}x;`

All these regexes are untested.
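The first two points can be checked with a short script. This is only a sketch: the helper `replace_after_fields`, its parameters, and the six-field sample line are assumptions made for illustration, and it uses 3 fields instead of 17 to keep the sample readable.

```perl
use strict;
use warnings;

# Point 1: [^ ] excludes only the space character, so it matches a tab;
# \S excludes all whitespace, so it does not.
my $neg_class_matches_tab = ("\t" =~ /^[^ ]$/) ? 1 : 0;   # 1
my $big_s_matches_tab     = ("\t" =~ /^\S$/)   ? 1 : 0;   # 0

# Point 2: hypothetical helper that replaces the 1-3 digit field following
# $n whitespace-separated fields. Without a leading ^ the pattern means
# "at least $n fields"; with ^ it means "exactly $n fields".
sub replace_after_fields {
    my ($line, $n, $count, $anchored) = @_;
    my $re = $anchored
        ? qr{ ^ ( (?: \S+ \s+ ){$n} ) \d{1,3} (?= \s+ \d+ $ ) }x
        : qr{   ( (?: \S+ \s+ ){$n} ) \d{1,3} (?= \s+ \d+ $ ) }x;
    $line =~ s/$re/$1$count/;
    return $line;
}

my $line = "a b c d 7 9";    # four fields precede the 7, not three

print "[^ ] matches tab: $neg_class_matches_tab\n";
print "\\S matches tab:   $big_s_matches_tab\n";
print replace_after_fields($line, 3, 1, 0), "\n";   # a b c d 1 9 (still matches)
print replace_after_fields($line, 3, 1, 1), "\n";   # a b c d 7 9 (left alone)
```

Whether the start anchor is wanted depends on whether the field count is meant as a validity check or just as a way of locating the number.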
Re^5: seeking improvement in my smiple program using regular expression by BrowserUk (Patriarch) on Aug 06, 2007 at 06:06 UTC
Repeated captures do not work. Only the last repetition actually captures anything. Contrast:

```
my @captures = 'abcde' =~ m[(.){3}]; print "@captures";;
c
```

with

```
my @captures = 'abcde' =~ m[(.)(.)(.)];; print "@captures";;
a b c
```

and

```
'abcde' =~ m[(.){3}] and print "$1,$2,$3";;
Use of uninitialized value in concatenation (.) or string at ...
Use of uninitialized value in concatenation (.) or string at ...
c,,

'abcde' =~ m[(.)(.)(.)] and print "$1,$2,$3";;
a,b,c
```

"all these regexes are untested."

You might consider revising that policy.
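For comparison, here is a small sketch of two common ways around the last-repetition behaviour. The variable names are made up for illustration. Wrapping the quantified group in an outer capture keeps the whole repeated run as one string, and a global match in list context returns each repetition separately.

```perl
use strict;
use warnings;

my $s = 'abcde';

# A quantified capture group keeps only its final iteration.
my @last_only = $s =~ m[(.){3}];           # ('c')

# An outer capture around a non-capturing, quantified group keeps the run.
my @whole_run = $s =~ m[((?:.){3})];       # ('abc')

# A global match in list context yields every repetition; take the first 3.
my @first3 = ($s =~ m[(.)]g)[0 .. 2];      # ('a', 'b', 'c')

print "@last_only\n";    # c
print "@whole_run\n";    # abc
print "@first3\n";       # a b c
```

The outer-capture form is the one that applies when the repeated fields only need to be copied through unchanged as a single `$1`.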