Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to match an 8-character word comprising small letters that has the pattern:
ABCDECFG
The only symbol (letter) that is used twice in the word is "C", which is found only in positions 3 and 6.
What does the regex look like?
Replies are listed 'Best First'.
Re: Regex help
by BrowserUk (Patriarch) on Jun 23, 2007 at 02:46 UTC
Update: It does work when I get the syntax of the negative lookahead assertion correct. (?!...) not (?!=...)! Try this:
Expanded out that regex is:
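As an illustration only (not necessarily the regex originally posted here), a hand-expanded pattern of this kind might look like the sketch below. It assumes the rule "eight lowercase letters, position 6 repeats position 3, and every other letter is unique". The variable name `$re` is just a placeholder.

```perl
use strict;
use warnings;

# Sketch: each new letter is guarded by a negative lookahead that
# rejects every letter captured so far; position 6 reuses capture 3.
my $re = qr/
    \A
                              ([a-z])   # 1: A
    (?! \1 )                  ([a-z])   # 2: B, differs from A
    (?! \1|\2 )               ([a-z])   # 3: C
    (?! \1|\2|\3 )            ([a-z])   # 4: D
    (?! \1|\2|\3|\4 )         ([a-z])   # 5: E
    \3                                  # position 6: C again
    (?! \1|\2|\3|\4|\5 )      ([a-z])   # 6: F
    (?! \1|\2|\3|\4|\5|\6 )   ([a-z])   # 7: G
    \z
/x;
```

Note that the lookahead is written `(?!...)`; with `(?!=...)` the `=` would be taken as a literal character to reject.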
That begs to be generated from some kind of shorthand spec. and actually, you used such a spec in your question. 'ABCDECFG' is a perfect spec. once you think of those letters as placeholders rather than literal characters. Update: And here is a generator:
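A minimal sketch of such a generator follows, assuming the spec is read as placeholders: the first use of a symbol captures a new letter that must differ from all earlier captures, and a repeated symbol becomes a backreference. This is not the code originally posted. The sub name `spec_to_regex` is hypothetical, and the `\g{N}` backreference form needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex from a placeholder spec like 'ABCDECFG'.
sub spec_to_regex {
    my ($spec) = @_;
    my (%group, @parts);
    my $n = 0;                                   # capture groups opened so far
    for my $sym (split //, $spec) {
        if (my $g = $group{$sym}) {
            push @parts, "\\g{$g}";              # seen before: must repeat that letter
        }
        else {
            # new symbol: must differ from every letter captured so far
            my $not = join '|', map { "\\g{$_}" } 1 .. $n;
            push @parts, ($n ? "(?!$not)" : '') . '([a-z])';
            $group{$sym} = ++$n;
        }
    }
    my $body = join '', @parts;
    return qr/\A$body\z/;
}
```

Calling `spec_to_regex('ABCDECFG')` returns a compiled pattern that can be applied to each chomped word; other specs such as 'ABBA' work the same way.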
I wonder if the golfers could reduce that to a one-liner?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by Anonymous Monk on Jun 24, 2007 at 03:01 UTC
Thank you :)
by graff (Chancellor) on Jun 29, 2007 at 03:53 UTC
(Admittedly, it's still a bit of a mind-bender.) Of course, having that handy regex generator makes the regex formatting and commentary a moot point -- so much nicer to allow people to use the simple "rhyme-scheme" alphabetics for specifying the target pattern, and keep the actual regex syntax a purely "script-internal" detail, hidden from human eyes.
by BrowserUk (Patriarch) on Jun 29, 2007 at 05:35 UTC
... , but in terms of presenting a human-readable regex, ...
I had 3 attempts at documenting it. The one I presented was the least bad to me, as the programmer that constructed it. Once you see the pattern, it's pretty clear how it works and how to extend it to other situations. What I attempted to do was make the pattern clear.
But that relates back to a long-held belief that the guy that writes the code is the worst possible person to document it.
I was fortunate enough to work with the services of a s***-hot technical author (actually 3 over the years, one guy and two women) for a long period. His skill was as much in being able to 'forget' his (self-described, limited) programming skills, and so ask questions from the perspective of someone with no knowledge, as it was in his writing. His very capable writing skills, and ability to phrase things clearly and concisely, were just the icing on the cake. His innate ability to tease out the detail that mattered and ignore what I (as programmer, designer or architect) thought was important (today, this minute, because I just solved the problem) was far more valuable.
I highly commend and recommend the idea of adding a competent technical author to any team of more than 5 programmers, if you want your documentation to be produced on time, on budget and in a usable and useful manner. The salary cost of an English Lit. major with a CS minor and a couple of years of exposure to development environments and technical documentation is approximately the same as a CS grad with one year of post-grad experience--but the time they will save you, and the quality they will add to your development processes, are worth several times that.
Pick the right person, with the right mix of 'people skills' and (metaphoric) balls to not take s*** from developers and managers who think that their part of the process is more important than the TA's. And endow them with sufficient authority from the get-go to allow them to 'pull rank' on deadlines when the nicely, nicely reminders approach fails--and they will be a valuable asset far exceeding their cost.
In a small team that might struggle to find budget for a dedicated TA, you can often find one who will also use their usually strong organisational and documentary skills to organise and perform a lot of the day-to-day housekeeping chores--scheduling, timesheet keeping, minute taking, meeting organisation, checkpoint noting, chasing and documenting; even cardboard programmer(ing?:) when the need arises. In that way, they can allow developers to spend more of their time developing, and less time doing non-programming chores they hate doing and so usually put off until absolutely forced to do them--and then do them badly.
Overall, they can be a huge time and money saver. As always with personnel issues, getting the right man or woman for the job is essential.
Re: Regex help
by shmem (Chancellor) on Jun 22, 2007 at 14:47 UTC
I'm trying to match an 8-character word comprising small letters that has the pattern:
Small letters? Do you mean small as in low ASCII code, or lowercase?
ABCDECFG
Well, /[A-G]{8}/ should yield true.
The only symbol (letter) that is used twice in the word is "C", which is found only in positions 3 and 6.
That is true for the word you presented. Now what? ... guessing ... maybe you want ? That would match 'corporal' successfully. But then, it has two 'o's. Should that fail then? Try again, giving us a spec.
--shmem
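To make the 'corporal' point concrete, here is a sketch of a loose pattern of the kind being guessed at (an assumption; the snippet from the post itself is not shown here). A backreference ties position 6 to position 3, but nothing stops the other letters from repeating. The name `$loose` is just a placeholder.

```perl
use strict;
use warnings;

# Loose sketch: only requires that positions 3 and 6 hold the same letter.
my $loose = qr/\A[a-z]{2}([a-z])[a-z]{2}\1[a-z]{2}\z/;

# 'corporal' passes (r at positions 3 and 6) even though 'o' also repeats.
print "corporal matches\n" if 'corporal' =~ $loose;
```

Whether such a word should be rejected is exactly what the spec has to settle.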
by Anonymous Monk on Jun 22, 2007 at 15:05 UTC
Thanks for trying. I should have said "lowercase". I'm looking for a word in a large text that has the following pattern:
abcdecgf
There are 6 unique letters, one is repeated at positions 3 and 6. I don't know what the word is, so the letter "c" is only an example.
by shmem (Chancellor) on Jun 22, 2007 at 15:22 UTC
abcdecgf has 7 unique letters. A single regexp would be too convoluted, I guess (I say that only because such a regexp is beyond my skills :-) update: or laziness :-)
update: changed to -nl to apply on /usr/share/dict/words
--shmem
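For comparison, a check that avoids one big regex could be sketched as below. It assumes the rule is "eight lowercase letters, positions 3 and 6 equal, seven distinct letters in total", and it is not the code from this post. The sub name `fits_pattern` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word fits the 'abcdecfg' shape.
sub fits_pattern {
    my ($word) = @_;
    return 0 unless $word =~ /\A[a-z]{8}\z/;   # eight lowercase letters
    my @c = split //, $word;
    return 0 unless $c[2] eq $c[5];            # positions 3 and 6 agree
    my %seen;
    $seen{$_}++ for @c;
    # 8 letters with 7 distinct values means exactly one pair repeats,
    # and the check above pins that pair to positions 3 and 6.
    return keys(%seen) == 7 ? 1 : 0;
}
```

Applied to each chomped line of a word list such as /usr/share/dict/words, it prints nothing for 'corporal' because of the second 'o'.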
by Anonymous Monk on Jun 23, 2007 at 02:13 UTC
by shmem (Chancellor) on Jun 23, 2007 at 06:42 UTC
by blazar (Canon) on Jun 22, 2007 at 17:57 UTC
There must be 7! (No, not seven factorial...)
I don't know what the word is, so the letter "c" is only an example.
Are the positions fixed too? I'm assuming they are, since the problem is slightly more complex like that. Of course there are tons of ways to do it. And as shmem wrote, probably not best done with a single regex - although it may be possible, perhaps by means of one of those funky extensions still marked as "experimental". One possible way that springs to my mind is:
Update: or, in a slightly more agile way:
Re: Regex help
by Roy Johnson (Monsignor) on Jun 22, 2007 at 15:08 UTC
(update: changed ?: to ?=)
Caution: Contents may have been coded under pressure.
by Anonymous Monk on Jun 23, 2007 at 02:16 UTC
I ran your code against the dict file containing words of length 8. I didn't get any results - I suspect the word required by the regex doesn't exist.
Re: Regex help
by archfool (Monk) on Jun 22, 2007 at 15:01 UTC
-ArchFool
by Anonymous Monk on Jun 23, 2007 at 02:17 UTC
I had something like yours. It produced results with repeated symbols (letters) other than the ones at positions 3 and 6.
Re: Regex help
by Fletch (Bishop) on Jun 22, 2007 at 14:37 UTC
Erm . . . /ABCDECFG/ Try reading How (Not) To Ask A Question and see if you can't ask a clearer question.