Actually, my understanding has always been that ROT-N is just notation for a Caesar cipher with a specified N. ROT13 is a Caesar cipher rotated by 13d characters, which, with the 26d-character (2d*13d) Latin alphabet, means that rot(rot(x, 13d), 13d)=x. For any other N, deciphering means rotating by (26d-N), i.e. rot(rot(x, N), (26d-N))=x. Extending this further, given a ROT-N of an M-character alphabet, this becomes rot(rot(x, N), (M-N))=x. If N is larger than M, the encoding can be simplified to (N % M) and the decoding to (M - (N % M)) (thus if M=26d, ROT-53d simplifies to ROT-1d, decoded by ROT-25d). When N is a multiple of M, both reduce to a rotation of 0d.
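To make the rot/decode relationship concrete, here is a minimal Python sketch. It assumes the alphabet is the 26 uppercase Latin letters; the names `rot` and `unrot` and the `alphabet` parameter are made up for illustration and do not come from any library.

```python
import string

def rot(text, n, alphabet=string.ascii_uppercase):
    """Rotate every character of `text` found in `alphabet` by n positions.

    Characters outside the alphabet pass through unchanged.
    """
    m = len(alphabet)
    shift = n % m  # e.g. ROT-53 on a 26-letter alphabet reduces to ROT-1
    index = {ch: i for i, ch in enumerate(alphabet)}
    return "".join(
        alphabet[(index[ch] + shift) % m] if ch in index else ch
        for ch in text
    )

def unrot(text, n, alphabet=string.ascii_uppercase):
    """Undo rot(text, n) by rotating a further M - (N % M) positions."""
    m = len(alphabet)
    return rot(text, m - (n % m), alphabet)

print(rot("HELLO", 13))             # URYYB
print(rot(rot("HELLO", 13), 13))    # HELLO
print(unrot(rot("HELLO", 53), 53))  # HELLO
```

Passing a different `alphabet` string gives the general M-character case with no other changes.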
I have never heard of ROT-N notation being in anything but decimal (but that may also be my lack of exposure). As for the most common encodings (UTF-8, UTF-16, and UTF-32), all of them can represent the 1_112_064d valid Unicode code points (the 1_114_112d-entry code space minus the 2_048d surrogates). Thus an N value of 556_032d (hex: 0x8_7C00), half of that count, should behave on those code points the way ROT-13d behaves on the 26d-character Latin alphabet (i.e., as a self-decoding function).
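A rough Python sketch of that self-decoding rotation over the whole code space follows. It assumes the "alphabet" is every code point except the surrogates U+D800..U+DFFF, numbered consecutively; `rot_unicode` and the constant names are hypothetical, chosen only for this example.

```python
SURROGATE_START = 0xD800
SURROGATE_END = 0xDFFF
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2_048
M = 0x110000 - SURROGATE_COUNT                         # 1_112_064
HALF = M // 2                                          # 556_032 == 0x8_7C00

def rot_unicode(text, n=HALF):
    """Rotate each non-surrogate code point by n within the M-entry alphabet."""
    out = []
    for ch in text:
        cp = ord(ch)
        if SURROGATE_START <= cp <= SURROGATE_END:
            out.append(ch)  # lone surrogate: not part of the alphabet, keep it
            continue
        # Close the surrogate gap so the alphabet is numbered 0..M-1.
        idx = cp if cp < SURROGATE_START else cp - SURROGATE_COUNT
        idx = (idx + n) % M
        # Reopen the gap when converting back to a code point.
        cp = idx if idx < SURROGATE_START else idx + SURROGATE_COUNT
        out.append(chr(cp))
    return "".join(out)

print(rot_unicode(rot_unicode("Hello")) == "Hello")  # True
```

Note that the rotated text is generally not printable: "A", for instance, lands far outside the planes that have assigned characters.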
Below are the encoding and decoding rotations for 26d-, 256d-, and 1_112_064d-character "alphabets" for various N. Note that a rotation of 0x8000 (32_768d) on a 256-character alphabet is the equivalent of "double ROT-13d encoding" on a 26-character alphabet (it leaves the text unchanged), and that a rotation equal to the number of code points (1_112_064d) has the same effect on both a 256-character and a 1_112_064-character alphabet.
(If you find an error in my logic or values, please advise, so I can correct my understanding and/or data, as appropriate.)
| Rotations | 26d-char encoding | 26d-char decoding | 256d-char encoding | 256d-char decoding | 1_112_064d-char encoding | 1_112_064d-char decoding |
|---|---|---|---|---|---|---|
| 13d (0x0D) | 13d (0x0D) | 13d (0x0D) | 13d (0x0D) | 243d (0xF3) | 13d (0x0D) | 1_112_051d (0x10_F7F3) |
| 26d (0x1A) | 0d (0x00) | 0d (0x00) | 26d (0x1A) | 230d (0xE6) | 26d (0x1A) | 1_112_038d (0x10_F7E6) |
| 128d (0x80) | 24d (0x18) | 2d (0x02) | 128d (0x80) | 128d (0x80) | 128d (0x80) | 1_111_936d (0x10_F780) |
| 256d (0x100) | 22d (0x016) | 4d (0x004) | 0d (0x000) | 0d (0x000) | 256d (0x100) | 1_111_808d (0x10_F700) |
| 8000d (0x1F40) | 18d (0x12) | 8d (0x08) | 64d (0x40) | 192d (0xC0) | 8000d (0x1F40) | 1_104_064d (0x10_D8C0) |
| 32_768d (0x8000) | 8d (0x08) | 18d (0x12) | 0d (0x00) | 0d (0x00) | 32_768d (0x8000) | 1_079_296d (0x10_7800) |
| 556_032d (0x8_7C00) | 22d (0x0_0016) | 4d (0x0_0004) | 0d (0x0_0000) | 0d (0x0_0000) | 556_032d (0x8_7C00) | 556_032d (0x8_7C00) |
| 1_112_064d (0x10_F800) | 18d (0x00_0012) | 8d (0x00_0008) | 0d (0x00_0000) | 0d (0x00_0000) | 0d (0x00_0000) | 0d (0x00_0000) |
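For anyone who wants to double-check the table, the values can be recomputed with a few lines of Python. This is only a sketch of the arithmetic described above; the helper name `rotations` is made up for this example.

```python
def rotations(n, m):
    """Return (encoding, decoding) rotations for ROT-n on an m-character alphabet."""
    enc = n % m
    dec = (m - enc) % m  # the outer % m turns a decoding of m into 0
    return enc, dec

ALPHABET_SIZES = (26, 256, 1_112_064)
N_VALUES = (13, 26, 128, 256, 8000, 32_768, 556_032, 1_112_064)

for n in N_VALUES:
    cells = [f"{n:_}d"]
    for m in ALPHABET_SIZES:
        enc, dec = rotations(n, m)
        cells.append(f"{enc:_}d / {dec:_}d")
    print(" | ".join(cells))

print(rotations(8000, 256))  # (64, 192)
```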
Hope that helps.
As I said, it's a misnomer.
There is no formula with a fixed N here, because it works from a lookup table with 2^16 entries so that non-printable characters are avoided in both directions.
Edit
So the actual N(X) for a mapping Y = R(X), with N(X) = N(Y) = Y - X, will vary around approximately 2^15 - 20 ± 8 (?).*
And it ignores anything >= 2^16 like emojis, similar to ROT13 ignoring any ASCII outside the alphabet.
*) Actually, the author's description is outdated and doesn't match his code. He's excluding 2100 characters.
Hnngngng ... it's a misnomer
AFAIS
- it's only operating on UCS-2 and ignoring the planes above
- it's ignoring whitespace, by its own definition of whitespace
- it's avoiding some control characters
So not just a simple rotate by 0x8000 = 2^15!
One needs to
- build an ordered list of all allowed characters
- split that list into two halves
- map the halves onto each other in a lookup hash with 2^16 entries
- let the ignored characters map to themselves
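The steps above could be sketched like this in Python. To be clear, this is not the rot8000 code: the filter `is_rotated` is a hypothetical stand-in for the real exclusion list (here it simply skips surrogates, whitespace, control characters and unassigned code points), so the resulting mapping will differ from the real tool's.

```python
import unicodedata

def is_rotated(cp):
    """Hypothetical filter: True if code point `cp` takes part in the rotation."""
    if 0xD800 <= cp <= 0xDFFF:  # surrogates are not characters
        return False
    ch = chr(cp)
    if ch.isspace():
        return False
    # Cc = control character, Cn = unassigned
    return unicodedata.category(ch) not in ("Cc", "Cn")

def build_table(limit=0x10000):
    """Build a self-inverse lookup table covering code points below `limit`."""
    allowed = [cp for cp in range(limit) if is_rotated(cp)]
    if len(allowed) % 2:
        allowed.pop()  # odd count: leave the last one unmapped so halves match
    half = len(allowed) // 2
    table = {cp: cp for cp in range(limit)}  # ignored ones map to themselves
    for low, high in zip(allowed[:half], allowed[half:]):
        table[low] = high
        table[high] = low
    return table

TABLE = build_table()

def rot_lookup(text):
    """Apply the table; anything at or above 2^16 passes through unchanged."""
    return "".join(chr(TABLE.get(ord(ch), ord(ch))) for ch in text)

message = "Hello, World!"
print(rot_lookup(rot_lookup(message)) == message)  # True
```

Because the excluded characters leave gaps in the ordered list, the offset `ord(rot_lookup(c)) - ord(c)` is not constant from one character to the next.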
This JS implementation should be a good start for a Perl port
https://github.com/rottytooth/rot8000/blob/main/rot8000.js
You can use ord and chr to convert to codepoint and back
HTH!