in reply to Re: Regex help
in thread Regex help

Hi shmem,

Thanks for trying.

I should have said "lowercase".

I'm looking for a word in a large text that has the following pattern:

abcdecgf

There are 6 unique letters, one is repeated at positions 3 and 6.

I don't know what the word is, so the letter "c" is only an example.

Re^3: Regex help
by shmem (Chancellor) on Jun 22, 2007 at 15:22 UTC
    abcdecgf
    has 7 unique letters.

    A single regexp would be too convoluted, I guess (I say that only because such a regexp is beyond my skills :-). Update: or laziness :-)

    #!/usr/bin/perl -nl
    if (length == 8) {
        $c = substr ($_, 2, 1);
        if (substr ($_, 5, 1) eq $c) {
            my %h;
            @h{split//,$_} = (1) x 8;
            print if keys %h == 7;
        }
    }
    __END__

    perl match.pl /usr/share/dict/words
    Abednego
    abscised
    Acadians
    Acadia's
    Acalia's
    acerbest
    Adaiha's
    Adalia's
    Adelbert
    ...
    whirling
    whisking
    Wieche's
    wielders
    Winton's
    worker's
    writhing
    Yahweh's
    Yeargain
    Ygerne's
    Yorker's
    Zarger's
    Ziegfeld

    update: changed to -nl to apply on /usr/share/dict/words

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Thanks again.

      I tried your code on a dict file of words with length 8. Here are some of the results:

      ...
      abetters (not ok, t is repeated too)
      abigails (not ok)
      abillity (not ok, l is repeated too)
      abscises (not ok, s repeated more than twice)
      abscisin (not ok, i is repeated too)
      abscisse (not ok)
      acaudate (not ok)
      ...
      The result list is about 3300 words. I scanned through about 200 and couldn't find one that fits the pattern. Maybe the word doesn't exist in the list...
        Did you run exactly the code I posted? You must have some cut'n'paste error. My result list is 776 words and none of those you report are included.

        --shmem
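
One way to check where the two runs diverge is to apply the same three tests (length 8, same character at positions 3 and 6, exactly 7 distinct characters) to the disputed words directly. The sketch below does that; `fits_pattern` is a made-up helper name, not something from the posted script. Every word reported as "not ok" does have matching characters at positions 3 and 6 but only 6 distinct letters, which would be consistent with the distinct-letter test having been lost in the copy, though that is only a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread).
# Returns 1 if $word is 8 characters long, has the same character at
# positions 3 and 6, and contains exactly 7 distinct characters.
sub fits_pattern {
    my ($word) = @_;
    return 0 unless length($word) == 8;
    return 0 unless substr($word, 2, 1) eq substr($word, 5, 1);
    my %seen;
    @seen{ split //, $word } = ();    # hash slice: one key per distinct char
    return keys(%seen) == 7 ? 1 : 0;
}

# Four words reported as false positives, then two from the posted output.
for my $word (qw(abetters abigails abscises acaudate abscised whirling)) {
    printf "%-8s %s\n", $word, fits_pattern($word) ? 'ok' : 'rejected';
}
```

With these inputs the first four words are rejected and the last two pass.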

Re^3: Regex help
by blazar (Canon) on Jun 22, 2007 at 17:57 UTC

    abcdecgf

    There are 6 unique letters, one is repeated at positions 3 and 6.

    There must be 7! (No, not seven factorial...)

    I don't know what the word is, so the letter "c" is only an example.

    Are the positions fixed too? I'm assuming they are, since the problem is slightly more complex like that. Of course there are tons of ways to do it. And as shmem wrote, probably not best done with a single regex - although it may be possible, perhaps by means of one of those funky extensions still marked as "experimental". One possible way that springs to my mind is:

    #!/usr/bin/perl

    use strict;
    use warnings;

    $_=<<'.';
    bEjhMELGUaL smtMDEYSxyDvuQiUfAbJfYMPnfJAqaPnKL VWZWSdfYRSaSGlXOyPfxusC
    dtRAHabcdecgf taNdvtKdBlJcnFryVXObEDvawRyviWO hwlKiBpDWYeBPYhlpKFvrSeQ
    ksWmkXqQdLQPIzvKFE Jqrclq mPqQbMvkAx LtVuFMehKirSATuqlFzqwRknocsrcKXAE
    FNbOivdvkRonEkg apuPyHpTlssvVs BbwiHBvhfrSFwVkhwHkvoYjaGgntzFbEvPCIttD
    IAlYqoLUjtxsYvbwUBHIoMYmPJbGeXymuwERkHwSyKbE XMCcFgsYPzmJbVUsOwfDTgUiJ
    .

    while (/(?=([a-zA-Z]{8}))/g) {
        my @l = (my $found=$1) =~ /./g;
        print $found, "\n" if $l[2] eq $l[5] and do {
            my %h;
            @h{@l}=();
            7 == keys %h;
        }
    }

    __END__

    Update: or, in a slightly more agile way:

    for (/(?=([a-zA-Z]{8}))/g) {
        my @l=/./g;
        print $_, "\n" if $l[2] eq $l[5] and do {
            my %h;
            @h{@l}=();
            7 == keys %h;
        }
    }
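
Both replies doubt that a single regex is practical. For a fixed pattern such as abcdecgf it can be written with ordinary backreferences and negative lookaheads, without any experimental extensions, although it grows quickly with the word length. The following is a minimal sketch, restricted to lowercase letters as the original poster asked; `$pattern` and `fits_regex` are names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each negative lookahead forbids the next letter from repeating any letter
# captured so far; the bare \3 forces position 6 to equal position 3.
my $pattern = qr{
    ^
                             ([a-z])      # position 1
    (?!\1)                   ([a-z])      # position 2
    (?!\1|\2)                ([a-z])      # position 3, the repeated letter
    (?!\1|\2|\3)             ([a-z])      # position 4
    (?!\1|\2|\3|\4)          ([a-z])      # position 5
    \3                                    # position 6 repeats position 3
    (?!\1|\2|\3|\4|\5)       ([a-z])      # position 7
    (?!\1|\2|\3|\4|\5|\6)    ([a-z])      # position 8
    \z
}x;

# Hypothetical wrapper (name is an assumption): 1 if $word fits, else 0.
sub fits_regex {
    my ($word) = @_;
    return $word =~ $pattern ? 1 : 0;
}

for my $word (qw(abcdecgf abscised whirling abetters Abednego)) {
    printf "%-8s %s\n", $word, fits_regex($word) ? 'ok' : 'rejected';
}
```

Because every group matches exactly one character, the regex never backtracks, so it should be cheap to run as a filter over a word list, e.g. `print if fits_regex($_)` inside a `perl -nl` loop. Capitalized or apostrophized entries such as Abednego are rejected by the `[a-z]` classes.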