arindamm has asked for the wisdom of the Perl Monks concerning the following question:

How do I match everything preceding the nth occurrence of a character in a string using regular expressions?

Re: match nth occurrence
by Corion (Patriarch) on Mar 31, 2002 at 20:33 UTC

    Let's start with a simple case: n = 1 and the character a. We want to match, from the beginning of the string, as many non-a characters as possible, and we know that we must stop only when we encounter an a:

    $foo =~ /^([^a]*)a/;

    Now let's look at how we could do this for everything before the second a. We can't use .* because we would then lose count. We could use .*?, but it won't help much. Instead, we match as many non-a characters as possible before the first a, then the first a, then again as many non-a characters as possible, and then there must be the second a:

    $foo =~ /^([^a]*a[^a]*)a/;

    For the third a, the RE looks like this:

    $foo =~ /^([^a]*a[^a]*a[^a]*)a/;

    If we now look closely, we see a repeating pattern, [^a]*a, which we can let the Perl RE engine repeat for us with a quantifier, as it must occur n-1 times:

    $m = $n - 1;
    $foo =~ /^(([^a]*a){$m}[^a]*)a/;
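
    For reference, here is a minimal, self-contained sketch of that general pattern wrapped in a sub. The name before_nth is hypothetical, and it assumes $char is a single character and $n >= 1. quotemeta is used so characters such as . or ] are matched literally, and the repeated group is non-capturing so $1 holds just the prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything before the $n-th occurrence of
# the single character $char in $str, or undef if there are fewer than $n.
sub before_nth {
    my ($str, $char, $n) = @_;
    return undef if $n < 1;
    my $c = quotemeta $char;    # so '.', '|', ']' etc. are matched literally
    my $m = $n - 1;             # number of "[^c]*c" groups to skip over
    return $str =~ /^((?:[^$c]*$c){$m}[^$c]*)$c/ ? $1 : undef;
}

print before_nth("banana band", "a", 3), "\n";   # prints "banan"
```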

    Of course, since this pattern has to be recompiled every time we use it, we could just as well use the unrolled pattern above.
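
    If the recompilation cost does matter, one option (not something proposed in this thread) is to compile each pattern once with qr// and cache it per character and count. This is only a sketch: before_nth_cached and %pattern_cache are hypothetical names, and it makes the same single-character, $n >= 1 assumptions.

```perl
use strict;
use warnings;

my %pattern_cache;    # hypothetical cache: "$char\0$n" => compiled qr//

# Return everything before the $n-th $char in $str (undef if not found),
# compiling the pattern for a given ($char, $n) pair only once.
sub before_nth_cached {
    my ($str, $char, $n) = @_;
    return undef if $n < 1;
    my $key = join "\0", $char, $n;
    my $re = $pattern_cache{$key} ||= do {
        my $c = quotemeta $char;
        my $m = $n - 1;
        qr/^((?:[^$c]*$c){$m}[^$c]*)$c/;
    };
    return $str =~ $re ? $1 : undef;
}

print before_nth_cached("one,two,three,four", ",", 2), "\n";   # prints "one,two"
```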

    Update (20020409): Fixed a small but important typo in the last line of code. The old version, $foo =~ /^(([^a]*a){m}[^a]*)a/;, obviously won't match $m times...

Re: match nth occurrence
by Chmrr (Vicar) on Mar 31, 2002 at 20:34 UTC

    Use split to hack the string into pieces, then just use the ones you want:

    sub pre_nth {
        my ($str, $char, $n) = @_;
        # split before each $char; \Q...\E treats $char literally
        return join("", (split /(?=\Q$char\E)/, $str)[0 .. $n-1]);
    }
    print pre_nth("this q is the q of the q which quickly quoth he.", q => 3);

    Update: Oops. Looks like I misread the question slightly (my solution ain't just one regex), but it may still be of some use. For example, this solution scales to the case where you want to deal with phrases instead of characters.
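
    Since split accepts any pattern, the same idea extends to phrases. The sketch below is a variation, not Chmrr's code: pre_nth_phrase is a hypothetical name, the phrase is matched literally via \Q...\E, and it returns undef when there are fewer than $n occurrences. It splits on a capturing group with a limit of -1 instead of a lookahead, because split never produces an empty leading field for a zero-width match at the start of the string, which would throw the count off when the string begins with the phrase.

```perl
use strict;
use warnings;

# Hypothetical helper: everything before the $n-th occurrence of the literal
# string $phrase in $str, or undef if $phrase occurs fewer than $n times.
sub pre_nth_phrase {
    my ($str, $phrase, $n) = @_;
    return undef if $n < 1 || $phrase eq '';
    # The capturing group keeps each separator in the result list, and a
    # positive-width match at position 0 yields an empty leading field, so
    # @parts alternates: field, phrase, field, phrase, ..., field.
    # The -1 limit keeps trailing empty fields so the count stays right.
    my @parts = split /(\Q$phrase\E)/, $str, -1;
    my $occurrences = (@parts - 1) / 2;
    return undef if $n > $occurrences;
    return join '', @parts[0 .. 2 * $n - 2];
}

print pre_nth_phrase("the cat and the hat and the bat", "the", 3), "\n";
# prints "the cat and the hat and "
```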


Re: match nth occurrence
by smgfc (Monk) on Mar 31, 2002 at 20:48 UTC
    This isn't pure regex, but it is the way I would have solved the problem. It matches $char, assigns the prematch ($`) to $match, and when the counter reaches $n it exits the loop and prints $match. Kinda ugly, but works well!
    $n = 5; $char = 'l';
    $string = 'hello my name is william gobbel-dy-gook liam';
    $count = 0;
    while ($string =~ /[$char]/g) {
        $match = $`;    # prematch: everything before this occurrence
        $count++;
        last if $count == $n;
    }
    print $match;
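
    A variation on the same loop, offered as a sketch: before_nth_match is a hypothetical name. It reads the match offset from @- instead of using $`, which in Perls before 5.20 slowed down every successful match in the program once it appeared anywhere, and it returns undef rather than a partial prefix when there are fewer than $n matches. \Q...\E replaces the [$char] class so that characters like ] or ^ are also taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the matches with //g and stop at the $n-th one.
sub before_nth_match {
    my ($string, $char, $n) = @_;
    my $count = 0;
    while ($string =~ /\Q$char\E/g) {
        # $-[0] is the offset where the current match starts
        return substr($string, 0, $-[0]) if ++$count == $n;
    }
    return undef;    # fewer than $n occurrences
}

print before_nth_match('hello my name is william gobbel-dy-gook liam', 'l', 5), "\n";
# prints "hello my name is william gobbe"
```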
Re: match nth occurrence
by mdillon (Priest) on Mar 31, 2002 at 21:01 UTC
    I wouldn't use a regular expression, but something like this, probably:
    sub before_nth_char {
        my ($str, $c, $n) = @_;
        my $pos = -1;
        {   # bare block: redo restarts only this block
            $pos = index $str, $c, $pos + 1;
            return if $pos == -1;        # fewer than $n occurrences
            redo unless --$n == 0;
            return substr $str, 0, $pos;
        }
    }
    print before_nth_char("asdfasdfasdf", "a", 2), $/;
Re: match nth occurrence
by RMGir (Prior) on Mar 31, 2002 at 21:19 UTC
    With the character c, and n-1 replaced with the actual value, I think this would work:
    /^((?:[^c]*c){n-1}[^c]*)c/

    --
    Mike

    (Edit: Corion ends up with the same solution above, but this one is slightly more efficient since the inner () doesn't need to capture.)
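
    A small sketch to check that the two forms agree: Corion's capturing version and the non-capturing one above should leave the same prefix in $1. The sub name prefixes_agree and the sample string are made up for illustration. Measuring the efficiency difference would take an actual benchmark (for example with the core Benchmark module).

```perl
use strict;
use warnings;

# Hypothetical check: run both patterns on $str for the $n-th 'c' and
# compare what ends up in $1 ('<none>' when a pattern does not match).
sub prefixes_agree {
    my ($str, $n) = @_;
    my $m = $n - 1;
    my $capturing     = $str =~ /^(([^c]*c){$m}[^c]*)c/   ? $1 : undef;
    my $non_capturing = $str =~ /^((?:[^c]*c){$m}[^c]*)c/ ? $1 : undef;
    return ($capturing // '<none>') eq ($non_capturing // '<none>');
}

for my $n (1 .. 4) {
    printf "n=%d: %s\n", $n, prefixes_agree("abcdecfgc", $n) ? "same" : "different";
}
```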