Re: A regex that does this, but not that?

Re: Re: A regex that does this, but not that? by bradcathey (Prior) on Nov 15, 2003 at 01:36 UTC
Abigail-II, I spent quite a bit of time crafting my example carefully, so that if there was a regex solution that returned the result I specified, I'd have my answer. pg got it perfectly. But in case you're still interested (I know you're one of the regex gurus around the monastery, and I have always appreciated your thoroughness):

1. I want to delete any words that start with "t" and end with "t" but contain no other "t"s in between, except for the word "test".
2. The result should contain only the word "test" and any other words that are not of that "t...t" form. "1 2 3" was just an example.
3. The order of words, the number of words, and the content of any words that are not "t\w+t" should not be a factor.

I'd still love to hear your thoughts, as I am trying to really ramp up my coding skills. Thanks.

Brad

"A little yeast leavens the whole dough."
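The three requirements above can be sketched without a substitution at all, which may help pin down what any regex answer has to do. This is only a sketch under stated assumptions: words are whitespace-separated, matching is case-sensitive, "tt" counts as a deletable word, and the helper names `is_deletable` and `filter_words` are hypothetical, not anything posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a word should be deleted under the stated rules.
sub is_deletable {
    my ($word) = @_;
    return 0 if $word eq 'test';              # the one stated exception
    # A leading t, then word characters that are not t, then a final t.
    return $word =~ /\At[^t\W]*t\z/ ? 1 : 0;
}

# Hypothetical helper: drop deletable words and keep the rest in order.
# Assumption: splitting on whitespace is an acceptable definition of "word".
sub filter_words {
    my ($text) = @_;
    return join ' ', grep { !is_deletable($_) } split ' ', $text;
}

print filter_words('thought test tot 1 2 3 tesset'), "\n";   # prints: test 1 2 3
```

Because the words are rejoined with single spaces, this version also avoids the leftover gaps that an in-place deletion leaves behind.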
Re: Re: Re: A regex that does this, but not that? by danger (Priest) on Nov 15, 2003 at 04:57 UTC
Well, pg's solution works for the limited input provided, and you haven't given any further particulars regarding input. That solution breaks if you just change the first word from "thought" to "though": `my $var = "though test tot 1 2 3 tesset"; $var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge; print $var; # prints: esoesset`

But now you mention a further constraint, that the words to be deleted may not contain any 't's inside, which cannot be inferred from your earlier posts at all. Providing a good specification is much more than providing a sample case (but providing test cases is important). Anyway, here's a go at your new specs: `my $var = <<TT; target blah foo test thought 123 though tempest testament though tightest treatment thermostat tantamount taboo TT $var =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g; print $var; __END__ ## Result: blah foo test 123 though testament though tightest treatment thermostat tantamount taboo`

(The heredoc holds two lines of input: the first ends at "tempest" and the second starts at "testament".) So, all the 't.*t' words on the second line remain because they contain a 't' character within. All the 't.*t' words on the first line get deleted except for 'test'.
Re: Re: Re: A regex that does this, but not that? by Cody Pendant (Prior) on Nov 15, 2003 at 05:17 UTC
> any words that start with "t", end with "t", but do not contain any other "t"s within

OK, so that's `\bt[^t]+t\b`: a word boundary, then a t, then one or more characters that are not a t, then a t, then a word boundary. Apart from the abbreviation "tt", this should be fine.

So "tent", "tesseract", "tot", "tort" and "test" itself will match this pattern. However, "testament" will fail it because of the "t" in the middle.

Then you need a special case for "test" itself, which you can do with the /e modifier and the ternary operator, as in pg's example above. So something like this: `#!/usr/bin/perl -w use strict; my $words='test Buffy testament Anya tot Willow tesseract Faith tent'; $words =~ s/\b(t[^t]+t)\b/$1 eq "test" ? $1 : ''/ge; print $words; # prints the words test, Buffy, testament, Anya, Willow, Faith (the spaces around each deleted word are left behind)`

Where the regex means "Find words matching t, something-not-t, then t at the end. Replace them with nothing, unless they're the word test, in which case, replace them with themselves".

You could replace the ternary with this more long-winded version if you liked: `$words =~ s/\b(t[^t]+t)\b/ my $temp = $1; if($temp eq 'test'){ $temp }else{ '' }/xge;`

`($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print`
Re: Re: Re: Re: A regex that does this, but not that? by Anonymous Monk on Nov 15, 2003 at 06:38 UTC
Your character class `[^t]` can itself cross word boundaries, so strings like "this will be a problem right?" will be a problem, right?
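To make the objection concrete, here is a small sketch comparing the parent post's pattern with a variant whose character class also excludes non-word characters. The variant `[^t\W]` is the class used elsewhere in this thread; the names `$loose`, `$strict` and `first_match` are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my $loose  = qr/\bt[^t]+t\b/;      # pattern from the parent post
my $strict = qr/\bt[^t\W]+t\b/;    # assumption: forbid non-word characters too

my $text = 'this will be a problem right?';

# Hypothetical helper: return the first substring matched by $re, or undef.
sub first_match {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

for my $pair ([loose => $loose], [strict => $strict]) {
    my ($name, $re) = @$pair;
    my $hit = first_match($re, $text);
    printf "%-6s %s\n", $name, defined $hit ? "matched <$hit>" : 'no match';
}
```

With `[^t]` the match runs from the "t" of "this" through the final "t" of "right", swallowing the spaces in between; with `[^t\W]` nothing in that sentence matches, since no single word starts and ends with "t".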
Re: Re: Re: Re: Re: A regex that does this, but not that? by Cody Pendant (Prior) on Nov 15, 2003 at 07:32 UTC
Re: A regex that does this, but not that? by Abigail-II (Bishop) on Nov 16, 2003 at 01:51 UTC
> I spent quite a bit of time trying to craft my example carefully, so that if there was a regex solution to return the result I specified, I'd have my answer.

But the problem is that you left it at the example. I could have given you a couple of regexes that solved your example but would probably have failed to do what you wanted on the second example you tried.

> pg got it perfectly.

Then you and he got lucky. If he had come up with a different regexp that solved your one example but did something else on other sentences, he would have wasted time formulating a useless answer.

However, is it really true that pg's answer got it right? Your requirements say:

> I want to delete any words that start with "t", end with "t", but do not contain any other "t"s within, except for the word "test".

and pg's regex is: `s/(t.*?t)/($1 ne "test") ? "" : $1/ge;`

Now, to me that regex just deletes strings starting with a t and ending with the next t, with the exception of the word "test". So, let's try it on another example: `$_ = "this is the wristwatch"; s/(t.*?t)/($1 ne "test") ? "" : $1/ge; print; __END__ he wrisch`

Now, that might be exactly what you had in mind, but it doesn't meet the requirements.

Abigail
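The point about examples versus requirements can be sketched as a tiny table of test cases run against more than one candidate. The two substitutions below are the ones posted in this thread, written with the `*` quantifiers that the printed outputs imply; the expected strings are my reading of the stated requirements, and the names `%candidate`, `@cases` and `normalize` are hypothetical.

```perl
use strict;
use warnings;

# Candidate substitutions from the thread, each applied to a copy of the input.
my %candidate = (
    pg     => sub { my $s = shift; $s =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge; $s },
    danger => sub { my $s = shift; $s =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g; $s },
);

# Each case: [ input, expected words under the stated requirements ].
my @cases = (
    [ 'though test tot 1 2 3 tesset', 'though test 1 2 3' ],
    [ 'this is the wristwatch',       'this is the wristwatch' ],
);

# Hypothetical helper: collapse runs of whitespace so leftover gaps do not matter.
sub normalize { return join ' ', split ' ', $_[0] }

for my $name (sort keys %candidate) {
    for my $case (@cases) {
        my ($input, $expected) = @$case;
        my $got = normalize($candidate{$name}->($input));
        printf "%-6s %-4s %s\n", $name, ($got eq $expected ? 'ok' : 'FAIL'), $input;
    }
}
```

On these two inputs the pg substitution fails both cases and the danger substitution passes both, which is the kind of difference a single sample sentence cannot reveal.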
Re^3: A regex that does this, but not that? by Aristotle (Chancellor) on Nov 22, 2003 at 09:13 UTC
`s/\s*\bt(?!est)[^t\W]*t\b//g;`

Makeshifts last the longest.
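A quick way to see what this one-liner does is to wrap it in a function and run it on the sample lines from earlier in the thread. This sketch reads the pattern with `*` quantifiers on both `\s` and `[^t\W]`, which is an assumption about the intended regex; the name `strip_t_words` is hypothetical.

```perl
use strict;
use warnings;

# Assumption: both quantifiers are `*`, i.e. \s* and [^t\W]*.
sub strip_t_words {
    my ($text) = @_;
    # Optional leading whitespace, a word-initial t not followed by "est",
    # word characters other than t, then a word-final t.
    $text =~ s/\s*\bt(?!est)[^t\W]*t\b//g;
    return $text;
}

# Brackets make any leftover leading or trailing whitespace visible.
print '[', strip_t_words('target blah foo test thought 123 though tempest'), "]\n";
print '[', strip_t_words('testament tightest treatment taboo'), "]\n";
```

Two things seem worth noting. The `\s*` removes the space before each deleted word, so gaps only remain when the deleted word starts the string. And `(?!est)` needs no trailing `\b`: any word that begins with "test" has its second "t" in fourth position, so the only such word the rest of the pattern could match is "test" itself.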