Re: Howto strip 42 from comma separated list using s///
by japhy (Canon) on Nov 03, 2005 at 13:22 UTC
|
I'd look for commas on both sides and only replace one of them if they were both there:
# updated, thanks to PerlMouse -- not using \b, though!
$str =~ s/(^|,)42(,|$)/$1 && $2/e
|
|
You ought to guard the 42 with \b's on both sides, or it will turn 130,1423,999 into 130,13,999.
|
|
Hi,
sorry, but just for me to understand:
$1 && $2
means: if both are true, return the latter, in this case $2?
Like this:
perl -e'$a=1;$b=2;print ($a && $b)'
Thanks,
svenXY
|
|
It's a trick, I'll admit. In Perl, X && Y returns X if X is false, and Y if X is true (regardless of Y's truth value!).
Thus, in my regex, a comma is only inserted if both $1 and $2 are true. $1 && $2 is only true when $1 and $2 have commas in them (otherwise, at least one of them is empty, and thus the empty string is returned).
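To make the trick concrete, here is a minimal sketch of what && hands back and how that feeds the substitution. The wrapper name strip_42 is made up for illustration and is not part of the original post.

```perl
use strict;
use warnings;

# && returns one of its operands, not a plain boolean.
for my $pair ( [ ',', ',' ], [ '', ',' ], [ ',', '' ] ) {
    my ( $left, $right ) = @$pair;
    my $result = $left && $right;
    printf "'%s' && '%s' gives '%s'\n", $left, $right, $result;
}

# Hypothetical wrapper around the substitution, for illustration only.
sub strip_42 {
    my ($str) = @_;
    # $1 and $2 are each either ',' or the empty string, so the
    # replacement is ',' only when 42 had a comma on both sides.
    $str =~ s/(^|,)42(,|$)/$1 && $2/e;
    return $str;
}

print strip_42($_), "\n" for '42,43', '41,42', '41,42,43', '130,1423,999';
```

Note that without /g only the first 42 in the list is removed.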
|
|
I'm currently doing this in .NET C# (which you couldn't know), where I can't eval like that, so a regex substitution /$1$2/ would eat both commas, like
"41,42,43" -> "4143"
Is there any plain vanilla regex to do it?
/allan
Re: Howto strip 42 from comma separated list using s///
by jbware (Chaplain) on Nov 03, 2005 at 13:23 UTC
|
Is this maybe the regex that you're looking for? There may be more elegant ways, but this seems a lot easier to decipher.
s/(,42\b|\b42,|^42$)//g;
Update: Added the case for a single element of just "42" per Not_a_Number's and Perl Mouse's suggestions. Good catch; that's what a quick solution gets me, I didn't test all cases. I should note too, the position of 42 at the end of the list is important, so the regex engine doesn't try and grab that without looking for commas first.
Update: Perl Mouse makes a good point here: 505335 on boundary checking too, so I made another mod to the regex to take that into account as well.
Update: Ok, this is it; no more updates. One catch w/ boundary checking is the odd case of "41,I like 42 things, 43", where the "42" would be replaced but shouldn't be. I think in the context of this question this really isn't an issue and \b works fine. To combat that though, I just hit what really was being addressed, "^42$", where it's the only element.
Update: lol. I give up; one more. Perl Mouse pointed out partial number matches, so the boundary checking is back in a modified form. He has this below too. I only mod it here so in case someone takes a quick glance, they can get my best version (with the help of mightier monks than I) instead of my regex version from like 4 iterations ago.
-jbWare
|
|
s/(,42|42,|\b42\b)//g;
Tricky, isn't it? Unfortunately, you still haven't covered all cases correctly:
$ perl -wle '$_ = "142,143"; s/(,42|42,|\b42\b)//g; print'
1143
You need to anchor all cases:
s/,42\b|\b42,|^42$//; # Or
s/\b(?:,42|42,|42)\b//; # Or
s/\b42,|,?\b42\b//;
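A quick way to check anchored patterns like these is to run them over the edge cases raised in this thread. The sketch below does that for the first two patterns; the helper name strip_with and the list of inputs are my own choices for illustration.

```perl
use strict;
use warnings;

# Two of the anchored patterns from above, precompiled.
my %pattern = (
    'list all cases' => qr/,42\b|\b42,|^42$/,
    'grouped'        => qr/\b(?:,42|42,|42)\b/,
);

# Hypothetical helper: apply one pattern once, without /g, as above.
sub strip_with {
    my ( $re, $str ) = @_;
    $str =~ s/$re//;
    return $str;
}

for my $name ( sort keys %pattern ) {
    for my $in ( '42', '42,43', '41,42', '41,42,43', '142,143', '41,423' ) {
        printf "%-15s %-10s => '%s'\n", $name, $in,
            strip_with( $pattern{$name}, $in );
    }
}
```

Inputs such as 142,143 and 41,423 should come back unchanged, which is the point of anchoring every alternative.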
|
|
Yeah, and painful. Although it can be done in a regex, it just goes to show why I would much rather iterate through a list and process each element individually instead of handling all the element special cases in a regex. From a processing efficiency standpoint I'm not sure which is better, but regarding a programmer's efficiency and future understanding, I'd lean toward looping through a list and handling each element on its own as the better decision. It just seems cleaner to me (as fun as regex exercises are); but maybe that's just me.
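For what it's worth, a minimal sketch of the looping approach might look like this. The sub name remove_element and its two-argument interface are assumptions of mine, not something posted elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every element equal to $unwanted.
# Elements are compared as strings, so non-numeric items are safe.
sub remove_element {
    my ( $list, $unwanted ) = @_;
    my @kept;
    for my $item ( split /,/, $list ) {
        push @kept, $item unless $item eq $unwanted;
    }
    return join ',', @kept;
}

print remove_element( $_, '42' ), "\n"
    for '42', '42,43', '41,42,43', '142,143';
```

No special cases for first, last, or only element are needed, because the commas are rebuilt by join.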
-jbWare
|
|
That wouldn't work on a list containing just '42'. It will leave the string unmodified instead of ending up with the empty string.
Re: Howto strip 42 from comma separated list using s///
by Perl Mouse (Chaplain) on Nov 03, 2005 at 13:35 UTC
|
I wouldn't do it with a single substitution, but either with two:
s/\b42\b,?// && s/,$//;
or none:
join ",", grep {$_ != 42} split /,/, $string
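One thing to keep in mind with the grep version: != compares numerically, so elements such as 042 or 42.0 are also dropped, and non-numeric elements trigger warnings under -w. A small sketch contrasting it with a string comparison follows; both sub names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: numeric comparison, as in the one-liner above.
sub strip_numeric {
    my ($string) = @_;
    no warnings 'numeric';    # non-numeric elements would otherwise warn
    return join ",", grep { $_ != 42 } split /,/, $string;
}

# Hypothetical helper: string comparison, removes only the exact text "42".
sub strip_string {
    my ($string) = @_;
    return join ",", grep { $_ ne '42' } split /,/, $string;
}

for my $in ( '41,42,43', '41,042,43', '41,42.0,43' ) {
    printf "%-12s numeric: %-8s string: %s\n",
        $in, strip_numeric($in), strip_string($in);
}
```

Which behaviour is right depends on whether the list holds numbers or arbitrary strings.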
|
|
I'd much prefer one plain vanilla regex (w/o Perl /e magic), so does such a beast exist...?
/allan
|
|
Just by listing all cases.
s/,42\b|\b42,|^42$//
But next time, ask VB questions on a VB forum.
Re: Howto strip 42 from comma separated list using s///
by reasonablekeith (Deacon) on Nov 03, 2005 at 14:22 UTC
|
How about this one then? No tricks, pure regex. I can't spot any holes. Anyone beg to differ ;)
for ("42,43", "41,42", "41,42,43", "41,142,423") {
    print "$_ \t=> ";
    my $string = $_;
    $string =~ s/(
        ((?=,)|^)42(,|$)
        |
        (,|^)42((?=,)|$)
    )
    //xg;
    print "$string\n";
}
__OUTPUT__
42,43 => 43
41,42 => 41
41,42,43 => 41,43
41,142,423 => 41,142,423
---
my name's not Keith, and I'm not reasonable.
Re: Howto strip 42 from comma separated list using s///
by Roy Johnson (Monsignor) on Nov 03, 2005 at 15:26 UTC
|
If it's surrounded by commas, leave the second comma (by using lookahead to find it); otherwise, remove it and whichever comma (if any) is adjacent.
s/,42(?=,)|,?\b42\b,?//g;
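Here is a small sketch that exercises this substitution on the cases discussed in the thread. The wrapper name strip_all_42 is hypothetical and only there to make the checks easy to read.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub strip_all_42 {
    my ($str) = @_;
    # First alternative: 42 between two commas, keep the second comma.
    # Second alternative: 42 at either end, or alone, with its one comma.
    $str =~ s/,42(?=,)|,?\b42\b,?//g;
    return $str;
}

for my $in ( '42', '42,43', '41,42', '41,42,43', '142,143', '42,15,42,173,42' ) {
    printf "%-16s => '%s'\n", $in, strip_all_42($in);
}
```

Because of /g, every standalone 42 is removed in one pass, while 142 and 143 are left alone.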
Caution: Contents may have been coded under pressure.
Re: Howto strip 42 from comma separated list using s///
by ady (Deacon) on Nov 03, 2005 at 18:02 UTC
|
Thanks guys, for all the input, suggestions and comments
I ended up with this regex:
((?=,*\s*42))(,\s*42|42\s*,|\s*42\s*)((?<=42\s*,*))
Apologies for not specifying all constraints of the problem up front. This specific regex has to be 'generic' in the sense that it must work in the .NET/C# environment, and these kinds of constraints often pop up (as anyone who has read "Mastering Regular Expressions" can testify). And looking at the problem from several angles is always educational and, well: fun.
Thanks
/ allan
|
|
((?=,*\s*42))(,\s*42|42\s*,|\s*42\s*)((?<=42\s*,*))
Being left with a result that is not applicable in Perl isn't satisfying either. (There is no variable-length lookbehind in Perl.)
I tried to understand your regex without being able to test it (as given) in Perl. But either I did not get it or it is still incomplete: I have the impression that a string like before,42XX,after will be reduced to beforeXX,after after applying your regex and replacing the match with the empty string?
Anyway, I would suggest using the regex provided by reasonablekeith earlier, because it seems to cover all cases correctly.
Stripping enclosing whitespace is easily achieved with this too, by adjusting 42 to \s*42\s* (twice) in that regex.
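As a rough sketch of that adjustment, this is reasonablekeith's regex with 42 replaced by \s*42\s* in both alternatives. The sub name strip_42_ws is made up here, and the inputs are only examples.

```perl
use strict;
use warnings;

# Hypothetical wrapper: reasonablekeith's regex with \s*42\s* in place of 42.
sub strip_42_ws {
    my ($string) = @_;
    $string =~ s/(
        ((?=,)|^)\s*42\s*(,|$)
        |
        (,|^)\s*42\s*((?=,)|$)
    )
    //xg;
    return $string;
}

for my $in ( '41,42,43', '41, 42 ,43', ' 42,43', '41, 42', '41,142,423' ) {
    printf "%-12s => '%s'\n", "'$in'", strip_42_ws($in);
}
```

Whitespace around other elements is left as it is; only the blanks belonging to the removed 42 go away.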
|
|
Well the regex did do the job in .NET (using $1$3 as replacement) on lists of the indicated type, like:
"42,43" -> "43"
"41,42" -> "41"
"41,42,43" -> "41,43"
That is: "42,15,42,173,42" will be reduced to "15,173", which was basically what I wanted. And yes, "before,42XX,after" will be reduced to "beforeXX,after", but that was not the kind of data I needed to manipulate in this case.
You can try out .NET regexes by installing the runtime plus a tool like RegexDesigner.
Update
And you're right: the regex by reasonablekeith above does the same, but will leave "before,42XX,after" intact (if that's what you want). His regex is shorter tho', and could be preferred for that reason, but it is not forgiving with respect to whitespace around the numbers, i.e. "23,42 ,45, 56,42" will be reduced to "23,42 ,45, 56".
Best regards,
Allan Dystrup
|
|
Complicated. And I'm surprised - it seems that .NET/C# implements variable width look behind, something that Perl doesn't do.
Re: Howto strip 42 from comma separated list using s///
by ikegami (Patriarch) on Nov 03, 2005 at 15:09 UTC
|
You could use split instead:
$_ = join ',', grep { $_ ne '42' } split ',';