match nth occurrence

arindamm has asked for the wisdom of the Perl Monks concerning the following question:

Re: match nth occurrence by Corion (Patriarch) on Mar 31, 2002 at 20:33 UTC
Let's start with a simple case, n = 1 and the character `a`. We want to match as many non-`a` characters as possible from the beginning of the string, and we know that we must stop only when we encounter an `a`:

`$foo =~ /^([^a]*)a/;`

Now let's look at how we could capture everything up to the second `a`. We can't use `.*` because we would then lose count. We could use `.*?`, but it won't help much. Instead, we match as many non-`a` characters as possible before the first `a`, then the first `a`, then again as many non-`a` characters as possible, and then there must be the second `a`:

`$foo =~ /^([^a]*a[^a]*)a/;`

For three `a`s, the RE looks like this:

`$foo =~ /^([^a]*a[^a]*a[^a]*)a/;`

If we now look closely, we see a pattern `[^a]*a` which we can reuse with the Perl RE engine, as we must repeat that pattern `n-1` times:

`$m = $n - 1; $foo =~ /^(([^a]*a){$m}[^a]*)a/;`

Of course, as this pattern has to be recompiled every time we use it, we could as well use the above, unlooped pattern to match.

Update: 20020409: Fixed small but important typo in the last line of code. `$foo =~ /^(([^a]*a){m}[^a]*)a/;` obviously won't match `$m` times...
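The looped pattern above can be wrapped in a small helper. This is only a sketch: the name `before_nth`, the use of `quotemeta` to escape the character, and the choice to return `undef` when there are fewer than `$n` occurrences are assumptions, not part of the post. The inner group is non-capturing, as RMGir suggests further down the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return everything before the
# $n-th occurrence of the single character $char, or undef if the string
# holds fewer than $n occurrences.
sub before_nth {
    my ($str, $char, $n) = @_;
    return undef if $n < 1;

    my $c  = quotemeta $char;   # escape characters that are special in [...]
    my $m  = $n - 1;            # the "[^a]*a" group repeats n-1 times
    my $re = qr/^((?:[^$c]*$c){$m}[^$c]*)$c/;

    return $str =~ $re ? $1 : undef;
}

print before_nth("banana", "a", 2), "\n";   # prints "ban"
```

Building the pattern with `qr//` inside the helper keeps the interpolation of `$m` in one place; whether that is worth it over the unrolled pattern depends on how often `$n` changes.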
Re: match nth occurrence by Chmrr (Vicar) on Mar 31, 2002 at 20:34 UTC
Use split to hack the string into pieces, then just use the ones you want:

`sub pre_nth { my ($str, $char, $n) = @_; return join("", (split /(?=$char)/, $str)[0 .. $n-1]); } print pre_nth("this q is the q of the q which quickly quoth he.", q => 3);`

Update: Oops. Looks like I misread the question slightly (my solution ain't just one regex), but it may still be of some use. For example, this solution scales to when you want to deal with phrases instead of characters.
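To illustrate the point about phrases, here is a minimal sketch. It swaps in a slightly different technique from the post: it splits on the phrase itself with a LIMIT and rejoins with the phrase, instead of splitting on a lookahead, so that a string which starts with the phrase is still counted correctly. The name `pre_nth_phrase` and the empty return for too few occurrences are assumptions.

```perl
use strict;
use warnings;

# Hypothetical variant of pre_nth for multi-character phrases.
sub pre_nth_phrase {
    my ($str, $phrase, $n) = @_;

    # A LIMIT of $n + 1 stops splitting after the $n-th separator, so the
    # first $n fields are exactly the text around the first $n - 1 phrases.
    my @parts = split /\Q$phrase\E/, $str, $n + 1;

    return if @parts <= $n;                     # fewer than $n occurrences
    return join $phrase, @parts[0 .. $n - 1];   # put the separators back
}

print pre_nth_phrase("the cat and the dog and the bird", "the", 3), "\n";
# prints "the cat and the dog and "
```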
Re: match nth occurrence by smgfc (Monk) on Mar 31, 2002 at 20:48 UTC
This isn't pure regex, but it is the way I would have solved the problem. It matches `$char`, assigns the prematch to `$match`, and when the counter equals `$n` it exits the loop and prints `$match`. Kinda ugly, but works well!

`` $n = 5; $char = 'l'; $string = 'hello my name is william gobbel-dy-gook liam'; $count = 0; while ($string =~ /[$char]/g) { $match = $`; $count++; last if $count == $n; } print $match; ``
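The same counting loop can be sketched as a subroutine. Two things here are assumptions rather than part of the post: it takes the prematch from the match offset in `@-` via `substr` instead of the prematch variable (which could slow down other matches on older Perls), and it returns nothing when the string has fewer than `$n` matches, where the loop above would print the prematch of the last match found. The name `prematch_of_nth` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the counting loop as a subroutine.
sub prematch_of_nth {
    my ($string, $char, $n) = @_;
    my $count = 0;

    # In scalar context, /g resumes after the previous match on each pass.
    while ($string =~ /\Q$char\E/g) {
        # $-[0] is the offset at which the current match starts.
        return substr($string, 0, $-[0]) if ++$count == $n;
    }
    return;   # fewer than $n matches
}

print prematch_of_nth('hello my name is william gobbel-dy-gook liam', 'l', 5), "\n";
```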
Re: match nth occurrence by mdillon (Priest) on Mar 31, 2002 at 21:01 UTC
I wouldn't use a regular expression, but something like this, probably:

`sub before_nth_char { my ($str, $c, $n) = @_; my $pos = -1; { $pos = index $str, $c, $pos + 1; return if $pos == -1; redo unless --$n == 0; return substr $str, 0, $pos; } } print before_nth_char("asdfasdfasdf", "a", 2), $/;`
Re: match nth occurrence by RMGir (Prior) on Mar 31, 2002 at 21:19 UTC
With character c, and n-1 replaced with the correct value, I think this would work:

`/^((?:[^c]*c){n-1}[^c]*)c/`

-- Mike

(Edit: Corion winds up with the same solution up above, but this is slightly more efficient since the inner () don't need to be capturing.)