This is part 2 of a 3 part series (part 1 is given here). You may want to refer to part 1 for some ideas to help here; in addition, part 2's answer will help with part 3 which will not be posted for a few more days.

Some definitions now: ciphertext will be the text that makes up the encoded message, while normaltext will be the decoded message. All characters from both ciphertext and normaltext, apart from punctuation, will be within the same alphabet, represented by the text string $alphabet; all characters in this alphabet are in the 7-bit ASCII set, but need not strictly be a-z, for example. There is no notion of capital letters in either ciphertext or normaltext; that is, if 'a' and 'A' are both used in the same message, they represent two completely different letters, as opposed to the normal way of considering them as two cases of the same letter. A cipher-pattern is what was previously described in part 1; e.g., a possible cipher-pattern for the word 'people' is 'abcadb'. Also, for the purposes of this problem, assume the only punctuation is the space (' '); all other punctuation has been stripped or is otherwise part of the alphabet.
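As an illustration of the cipher-pattern definition, here is a minimal, ungolfed sketch. The name cipher_pattern is hypothetical and not taken from part 1, and assigning pattern characters upward from 'a' is an assumption chosen to reproduce the 'abcadb' example; any consistent labelling would serve.

```perl
use strict;
use warnings;

# cipher_pattern is a hypothetical helper name, not the sub from part 1.
# Each distinct letter is replaced by the next unused pattern character,
# starting from 'a', in order of first appearance.
sub cipher_pattern {
    my ($word) = @_;
    my %code;            # letter => pattern character already assigned to it
    my $pattern = '';
    for my $ch (split //, $word) {
        unless (exists $code{$ch}) {
            my $seen = keys %code;                 # distinct letters so far
            $code{$ch} = chr(ord('a') + $seen);    # 'a', 'b', 'c', ...
        }
        $pattern .= $code{$ch};
    }
    return $pattern;
}

print cipher_pattern('people'), "\n";   # abcadb
```

Two words can stand for each other under a substitution cipher only if their patterns are equal, which is the test the rest of this problem relies on.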

Given: a ciphertext phrase as a string, the alphabet string used for both normaltext and ciphertext, and a dictionary, also in that alphabet, as an array (assume this has been previously loaded by a call such as @dict = <DICT>; where DICT is a filehandle to /usr/dict/words or such). Also given are two numbers $min and $max, where 0 < $min < $max.

Find a Perl golf solution that returns a hash of arrays; the keys of this hash are the ciphertext words from the phrase that contain at least $min characters but no more than $max characters. The values are arrays of words from the dictionary that have the same cipher-pattern as the associated hash key.

The solution should count the number of characters between the brackets of the subroutine described above; if you use another subroutine, such as the one from part 1, you need to include the minimum 7 characters for a sub ("sub x{}") in your character count as well as the text of that sub.

As examples of how the sub should be used, I give the following:

my $alphabet = join '', ('a'..'z');
my $phrase = "abcd efgdhij kijl hecmin";
my ( $min, $max ) = (5, 15);
my @dict = <DICT>; # or similar
my %hash = t( $phrase, $alphabet, $min, $max, \@dict ); # see update 2 below!!
# Hash will look like:
# %hash = ( efgdhij => [ "another", "fathers", ... ],
#           hecmin  => [ "hacker", "strong", ... ] );
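For reference, an ungolfed sketch of what t() is expected to compute might look like the following. It is not a golf entry. The helper name cipher_pattern and the six-word dictionary are hypothetical, the dictionary entries are assumed to be chomped already, and $alphabet is accepted but not used by this particular approach.

```perl
use strict;
use warnings;

# Hypothetical helper: label letters 'a', 'b', ... in order of first appearance.
sub cipher_pattern {
    my ($word) = @_;
    my (%code, $pattern);
    $pattern = '';
    for my $ch (split //, $word) {
        unless (exists $code{$ch}) {
            my $seen = keys %code;
            $code{$ch} = chr(ord('a') + $seen);
        }
        $pattern .= $code{$ch};
    }
    return $pattern;
}

# Ungolfed reference for t(); the dictionary arrives as an array reference.
sub t {
    my ($phrase, $alphabet, $min, $max, $dict) = @_;   # $alphabet unused here
    my %matches;
    for my $word (split / /, $phrase) {                 # the space is the only punctuation
        my $len = length $word;
        next if $len < $min || $len > $max;
        my $want = cipher_pattern($word);
        $matches{$word} = [ grep { cipher_pattern($_) eq $want } @$dict ];
    }
    return %matches;
}

# Tiny made-up dictionary, assumed to be chomped.
my @dict = qw(another fathers hacker strong people apple);
my %hash = t("abcd efgdhij kijl hecmin", join('', 'a'..'z'), 5, 15, \@dict);
print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

With this dictionary, only 'efgdhij' and 'hecmin' fall inside the 5 to 15 character range, so they are the only keys.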

(Note that this one should be rather simple despite the length of the setup for it...)

update: fixed links, qualified "punctuation" a bit

update 2: On second thought, it's probably saner to pass the dictionary array as a reference rather than value. I'll note that in what I expect to be the average solution, this only adds an extra character for the one dereferencing that you have to do in the subroutine.


Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Re: (Golf) Cryptographer's Tool #2
by chipmunk (Parson) on Jun 26, 2001 at 01:38 UTC
    It's been five hours and no one has posted a solution yet... I wonder if this one isn't as simple as Masem would have us believe... :)

    I've been working on this for a while, and I think I've gotten my solution as short as I can. (Then again, I thought the same thing 16 characters ago...) 143 characters:

    sub cypher_matches {
    sub c{my%h;($j=$_)=~s/./substr"a$a",$h{$&}||=keys%h,1/gse;$j}($_,$a,$n,$m,@d)=@_;map{y///c<$n|y///c>$m?():do{$c=c;$_,[grep{c eq$c}@d]}}/[^ ]+/g
    }
Re: (Golf) Cryptographer's Tool #2
by petral (Curate) on Jun 26, 2001 at 02:02 UTC
    Maybe this will help. I took the approach that it doesn't matter what the "cypher pattern" is as long as it's the same for both, but I'm probably missing something.
    sub e{$a=$_;eval"\$a=~y/\Q$_\E/\0-~/";$a}
    sub c{
      ($_,$m,$n,$a,$d)=@_;
      map{$x=e;
          $_,[grep{e eq$x}@$d]
      }/\b[^ ]{$m,$n}\b/g
    }
    107 chars w/o the linebreaks and indentation.
    update: changed \S to [^ ] (as per chipmunk) to match tabs and newlines in encoded strings: 109 chars.
    (but note that using $_[0..4] would get back a couple of chars.)
    Actually, the really correct way to do it seems to be something along the lines of:
    map{$x=e;$_,[grep{e eq$x}@{$_[4]}]} " $_[0] "=~/ ([^ ]{$_[1],$_[2]})(?= )/g
    which makes it 115 chars.

    update2: Same length but more straightforward:
    map{$x=e;$_,[grep{e eq$x}@{$_[4]}]} grep{/^.{$_[2],$_[3]}$/s}$_[0]=~/[^ ]+/g
    Well, it would be the same length, except that `grep/.../,' parses but `grep/.../s,' doesn't seem to.
    Oh, and this last has the args ordered right. As to how I got it "working with the parameter errors"; simple, I was calling it with the args in the wrong order.

      p

      I tried a few different approaches, but to no avail. So instead, I'm going to post a variation on petral's that has the added advantage of actually working. I'm just being funny because I don't have anything better.
      sub e{$a=$_;eval"\$a=~y/\Q$_\E/\0-¦/";$a}
      sub c{
        ($_,$m,$n,$a,*d)=@_;
        map{$x=e;$_,[grep{e eq$x}@d]}
        /\b[^ ]{$n,$a}\b/g
      }
      Two quick fixes. Instead of using @$d, just use @d, but declare a glob in the parameter extraction phase, *d. A silly trick, I know, but I've got to use it while I can before Perl 6 comes and takes away all my toys. Second, for whatever reason, petral wanted to use the second parameter $m (the "alphabet") as the minimum, instead of $n. Thus, the range should be {$n,$a}, not {$m,$n}.
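The glob trick described above can be shown outside of golf with a small sketch. The function name count_long is hypothetical, and the sketch uses local and strict-safe declarations that a golfed entry would leave out.

```perl
use strict;
use warnings;

our @d;   # package array that the glob assignment will alias

# count_long is a hypothetical example function: it counts the words in the
# referenced array that are at least $min characters long.
sub count_long {
    my ($min, $ref) = @_;
    local *d = $ref;     # @d is now another name for @$ref; nothing is copied
    return scalar(grep { length($_) >= $min } @d);
}

my @words = qw(a hello worlds);
print count_long(5, \@words), "\n";   # 2
```

Assigning an array reference to a glob replaces only the array slot of that glob, which is why @d can then be used without the @$d dereference.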

      A change for the detail oriented required using '¦' (ASCII 127) instead of '~' (ASCII 126) in the pattern. It looks like the vertical pipe character, but isn't. The input specification said 7 bit (0-127), so this was a pseudo-off-by-one error.

      Not sure how petral managed to get that thing working without noticing the parameter errors. Good show!

      I'll just go on to post a few observations.

      1. The tr// operator can be fed "garbage" input and still make sense of it. This allows you to include duplicate characters in the first part, such as "tr/froggy/abcdef/", which when run on "froggy" will give you "abcddf" even though 'g' has two possibilities. The first is taken, subsequent ones are ignored.
      2. You can always "overshoot" the second part of a tr by as much as you want. The extra stuff is ignored.
      3. The \b characters do not match newlines or tabs when the [^ ] (not space) element is there to pick up the slack (greedy).
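The first two observations can be checked with a short sketch. The name pattern_tr is hypothetical, and the replacement list a-z is an assumption made here for readable output (the golfed versions use \0 upward instead), so this sketch is only meaningful for words of at most 26 characters.

```perl
use strict;
use warnings;

# pattern_tr is a hypothetical name for the tr-based pattern function.
sub pattern_tr {
    my ($word) = @_;
    my $out = $word;
    # tr lists are fixed at compile time, so a string eval is needed to use
    # the word itself as the search list. The replacement list is longer than
    # the search list; Perl may warn about the excess, so silence that here.
    no warnings 'misc';
    eval "\$out =~ tr/\Q$word\E/a-z/; 1" or die $@;
    return $out;
}

print pattern_tr('froggy'), "\n";   # abcddf
print pattern_tr('people'), "\n";   # abcaeb
```

Note that 'people' comes out as 'abcaeb' rather than 'abcadb': a repeated letter still uses up a slot in the replacement list. That does not matter for matching, because two words with the same repetition structure skip the same slots.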
        Two quick comments... :)

        Remember that one 7-bit character, the space, is being used as punctuation, so there really are at most 126 characters in the alphabet.

        The use of word-boundaries (\b) to match the words is rather strange here, because we're using words that don't necessarily consist of "word characters". For example, if $n and $a were 2 and 4, that regex would match ('this', '-is!', 'one+', 'word') in 'this-is!one+word?'.

        Negative look-ahead/-behind assertions are necessary for this approach, each one asserting that the neighboring character is not a non-space: /(?<![^ ])[^ ]{$n,$a}(?![^ ])/g
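To compare the two ways of picking out words, here is a small sketch. The function names are hypothetical, and the look-around version is written with (?<![^ ]) and (?![^ ]), that is, "not preceded by a non-space" and "not followed by a non-space", so that a word must be bounded by spaces or by the ends of the string.

```perl
use strict;
use warnings;

# Hypothetical names for the two word-extraction approaches.
sub words_by_boundary {
    my ($text, $min, $max) = @_;
    return $text =~ /\b[^ ]{$min,$max}\b/g;
}

sub words_by_space {
    my ($text, $min, $max) = @_;
    return $text =~ /(?<![^ ])[^ ]{$min,$max}(?![^ ])/g;
}

my $text = 'this-is!one+word?';
print join('|', words_by_boundary($text, 2, 4)), "\n";   # this|-is!|one+|word
print join('|', words_by_space($text, 2, 4)), "\n";      # empty: one 17-character word
print join('|', words_by_space('ab cdefgh ij', 2, 4)), "\n";   # ab|ij
```

The \b version splits a single space-delimited word into pieces, while the look-around version rejects it outright because it is longer than the maximum.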