in reply to Regex help

See I know what I mean. Why don't you?
I'm trying to match an 8-character word comprising small letters that has the pattern:

Small letters? Do you mean small as in low ASCII code, or lowercase?

ABCDECFG

Well, /[A-G]{8}/ should yield true.

The only symbol (letter) that is used twice in the word is "C", which is found only in positions 3 and 6.

That is true for the word you presented. Now what? ... guessing ... maybe you want

    $c = substr($_,2,1);
    $pat = "\[^$c\]{2}$c\[^$c\]{2}$c\[^$c\]{2}";
    print "yup" if /$pat/

? That would match 'corporal' successfully. But then, it has two 'o's. Should that fail then?

Try again giving us a spec.

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^2: Regex help
by Anonymous Monk on Jun 22, 2007 at 15:05 UTC
    Hi shmem,

    Thanks for trying.

    I should have said "lowercase".

    I'm looking for a word in a large text that has the following pattern:

    abcdecgf

    There are 6 unique letters, one is repeated at positions 3 and 6.

    I don't know what the word is, so the letter "c" is only an example.

      abcdecgf
      has 7 unique letters.

      A single regexp would be too convoluted, I guess (I say that only because such a regexp is beyond my skills :-). Update: or laziness :-)

      #!/usr/bin/perl -nl
      if (length == 8) {
          $c = substr ($_, 2, 1);
          if (substr ($_, 5, 1) eq $c) {
              my %h;
              @h{split//,$_} = (1) x 8;
              print if keys %h == 7;
          }
      }
      __END__

      perl match.pl /usr/share/dict/words
      Abednego
      abscised
      Acadians
      Acadia's
      Acalia's
      acerbest
      Adaiha's
      Adalia's
      Adelbert
      ...
      whirling
      whisking
      Wieche's
      wielders
      Winton's
      worker's
      writhing
      Yahweh's
      Yeargain
      Ygerne's
      Yorker's
      Zarger's
      Ziegfeld

      update: changed to -nl to apply on /usr/share/dict/words

      --shmem

        Thanks again.

        I tried your code on a dict file of words with length 8. Here're some of the results:

        ...
        abetters (not ok, t is repeated too)
        abigails (not ok)
        abillity (not ok, l is repeated too)
        abscises (not ok, s repeated more than twice)
        abscisin (not ok, i is repeated too)
        abscisse (not ok)
        acaudate (not ok)
        ...
        The result list is about 3300 words. I scanned through about 200 and couldn't find one that fits the regex... Maybe the word doesn't exist in the list...
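        One way to double-check candidates like these is to reduce each word to its "letter shape" and compare that with the shape of abcdecgf. This is only a sketch. It assumes "fits" means eight lowercase ASCII letters, seven of them distinct, with the third repeated at position six. shape_of is a hypothetical helper, not something from the code above, and the //= operator needs Perl 5.10 or later.

        ```perl
        use strict;
        use warnings;

        # Map a word to its letter shape: the first distinct letter becomes "a",
        # the second distinct letter "b", and so on. Two words with the same
        # repetition structure produce the same shape string.
        sub shape_of {
            my ($word) = @_;
            my %seen;
            my $next = 'a';    # magic string increment: 'a', 'b', 'c', ...
            return join '', map { $seen{$_} //= $next++ } split //, $word;
        }

        my $want = shape_of('abcdecgf');

        for my $word (qw(whirling abetters abscises corporal)) {
            # Require eight lowercase letters, then compare shapes.
            my $ok = $word =~ /\A[a-z]{8}\z/ && shape_of($word) eq $want;
            printf "%-8s %s\n", $word, $ok ? 'fits' : 'does not fit';
        }
        ```

        Comparing shapes checks every position at once, so a second repeated letter (as in abetters) or a letter used three times (as in abscises) changes the shape and is rejected.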

      abcdecgf

      There are 6 unique letters, one is repeated at positions 3 and 6.

      There must be 7! (No, not seven factorial...)

      I don't know what the word is, so the letter "c" is only an example.

      Are the positions fixed too? I'm assuming they are, since the problem is slightly more complex like that. Of course there are tons of ways to do it. And as shmem wrote, probably not best done with a single regex - although it may be possible, perhaps by means of one of those funky extensions still marked as "experimental". One possible way that springs to my mind is:

      #!/usr/bin/perl

      use strict;
      use warnings;

      $_=<<'.';
      bEjhMELGUaL
      smtMDEYSxyDvuQiUfAbJfYMPnfJAqaPnKL
      VWZWSdfYRSaSGlXOyPfxusC
      dtRAHabcdecgf
      taNdvtKdBlJcnFryVXObEDvawRyviWO
      hwlKiBpDWYeBPYhlpKFvrSeQ
      ksWmkXqQdLQPIzvKFE
      Jqrclq
      mPqQbMvkAx
      LtVuFMehKirSATuqlFzqwRknocsrcKXAE
      FNbOivdvkRonEkg
      apuPyHpTlssvVs
      BbwiHBvhfrSFwVkhwHkvoYjaGgntzFbEvPCIttD
      IAlYqoLUjtxsYvbwUBHIoMYmPJbGeXymuwERkHwSyKbE
      XMCcFgsYPzmJbVUsOwfDTgUiJ
      .

      while (/(?=([a-zA-Z]{8}))/g) {
          my @l = (my $found=$1) =~ /./g;
          print $found, "\n" if $l[2] eq $l[5] and do {
              my %h;
              @h{@l}=();
              7 == keys %h;
          }
      }
      __END__

      Update: or, in a slightly more agile way:

      for (/(?=([a-zA-Z]{8}))/g) {
          my @l=/./g;
          print $_, "\n" if $l[2] eq $l[5] and do {
              my %h;
              @h{@l}=();
              7 == keys %h;
          }
      }
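
      For what it's worth, a single regex may not need the experimental extensions at all: ordinary backreferences plus negative lookaheads seem to be enough. The following is only a sketch, assuming lowercase ASCII words and the fixed positions 3 and 6. The names $shape and fits_shape are made up for illustration.

      ```perl
      use strict;
      use warnings;

      # Seven distinct lowercase letters in the shape abcdecgf: the letter at
      # position 3 comes back at position 6 and nothing else repeats.
      # Each negative lookahead rejects a letter that was already captured.
      my $shape = qr{
          \b
                                 ([a-z])   # position 1
          (?!\1)                 ([a-z])   # position 2
          (?!\1|\2)              ([a-z])   # position 3, the repeated letter
          (?!\1|\2|\3)           ([a-z])   # position 4
          (?!\1|\2|\3|\4)        ([a-z])   # position 5
          \3                               # position 6, same as position 3
          (?!\1|\2|\3|\4|\5)     ([a-z])   # position 7
          (?!\1|\2|\3|\4|\5|\6)  ([a-z])   # position 8
          \b
      }x;

      # True if the whole string is one word of that shape.
      sub fits_shape { return $_[0] =~ /\A$shape\z/ ? 1 : 0 }

      # prints abcdecgf and whirling, one per line
      print "$_\n" for grep { fits_shape($_) } qw(abcdecgf whirling abetters corporal);
      ```

      Because of the \b anchors, the same $shape can also be used unanchored with /g to pull matching words out of running text. It is certainly more convoluted than the hash-counting versions above, which is probably the better argument for those.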