in thread Non-Formula based Text Encoding - with Compression

This node falls below the community's minimum standard of quality and will not be displayed.

Re^3: Non-Formula based Text Encoding - with Compression
by jeffa (Bishop) on Apr 16, 2015 at 18:25 UTC

    "You all need to read the comments ..."

    To be fair, you really could have made the "instructions" clearer, but I did take your suggestion and re-read your comments and came up with the following code:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use List::Util qw( shuffle );

    my $str = shift || 'The impatient fox jumped over the lazy camel with hubris.';

    my( @codes, %encode );
    my @base = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);
    for my $i (@base) {
        push @codes, $i;
        for my $j (@base) {
            push @codes, "$i$j";
            for my $k (@base) {
                push @codes, "$i$j$k";
            }
        }
    }
    @codes = shuffle @codes;

    for (split /\s+/, $str) {
        $encode{$_} = shift @codes;
    }

    $str =~ s/([A-Za-z0-9.]+)/$encode{$1}/eg;
    print $str,$/;

    my %decodes = reverse %encode;
    $str =~ s/(\w+)/$decodes{$1}/eg;
    print $str,$/;

    And yes, this is kind of cool. Why you didn't provide an end-to-end complete example is probably the root of your down-votes on the OP.

    "It is obvious to any Perl developer worth his salt what the code is doing and what the purpose is."

    Just remember, the more time you spend ensuring that your documentation is unambiguous, clear and complete, the less time you'll spend defending your position, which essentially boils down to an inability to communicate effectively. Next time ... take the time to make your presentation more presentable -- in other words, implement the purpose as well.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      I've been trying to figure this out since jeffa posted this, as I find the topic interesting. I couldn't see how this accomplished its compression, but it seemed to work with a very long word. I had to puzzle over it quite a bit and add print statements to figure out how it worked.

      $ perl code3.pl
      string is Rumpelstiltskinrumpelstiltskindisestablishmentarianism Camel Camel The the T t Tt he h e he e
      code is Gzv
      code is rQ
      code is Z9z
      code is FMp
      code is Urm
      code is kCd
      code is oC2
      code is 71G
      code is Xyy
      code is 912
      code is 4Ja
      code is BHX
      code is Ats
      code is syv
      code is iRP
      code is Lwn
      default is Rumpelstiltskinrumpelstiltskindisestablishmentarianism
      default is Camel
      default is Camel
      default is The
      default is the
      default is T
      default is t
      default is Tt
      default is he
      default is h
      default is e
      default is he
      default is e
      Gzv Z9z Z9z FMp Urm kCd oC2 71G BHX 912 Ats BHX Ats
      Rumpelstiltskinrumpelstiltskindisestablishmentarianism Camel Camel The the T t Tt he h e he e
      $VAR1 = {
                'e' => 'Ats',
                'the' => 'Urm',
                'Camel' => 'Z9z',
                'T' => 'kCd',
                'he' => 'BHX',
                'Rumpelstiltskinrumpelstiltskindisestablishmentarianism' => 'Gzv',
                'h' => '912',
                'The' => 'FMp',
                't' => 'oC2',
                'Tt' => '71G'
              };
      $

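      One sees that the first random values in the @codes array become the values of the %encode hash, one per word of input; a repeated word such as Camel takes a fresh code each time and keeps only the last one. I was hoping to actually accomplish some of the functionality that OP had hinted at, so I created some apparatus for saving to file as well: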

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Data::Dumper;
      use List::Util qw( shuffle );
      use v5.12;
      use File::Slurp;
      use Path::Class;

      my $str = shift || 'Rumpelstiltskinrumpelstiltskindisestablishmentarianism Camel Camel The the T t Tt he h e he e';
      say "string is $str";

      my( @codes, %encode );
      my @base = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);
      for my $i (@base) {
          push @codes, $i;
          for my $j (@base) {
              push @codes, "$i$j";
              for my $k (@base) {
                  push @codes, "$i$j$k";
              }
          }
      }
      @codes = shuffle @codes;

      # write to file
      my $path = '/home/twain/Desktop';
      my $file = 'code1.txt';
      my $code_sheet = file($path,$file);
      write_file( $code_sheet, "@codes \n" ) ;

      for my $i (0..15) {
          say "code is $codes[$i]";
      }

      for (split /\s+/, $str) {
          say "default is $_";
          $encode{$_} = shift @codes;
      }

      $str =~ s/([A-Za-z0-9.]+)/$encode{$1}/eg;
      print $str,$/;

      my %decodes = reverse %encode;
      $str =~ s/(\w+)/$decodes{$1}/eg;
      print $str,$/;

      my $refhash = \%encode;
      print Dumper $refhash;

      __END__

      =head1 NAME

      non-formula-based-text-encoder-with-compression.pl

      =head1 DESCRIPTION

      =over 4

      =item * Create 242,234 unique codes, 1-3 characters in length, from the characters {a-zA-Z0-9}.

      =item * 62 (1 char codes) + 3844 (2 char codes) + 238,328 (3 char codes) = 242,234 unique codes.

      =item * (Somewhat like MIMEbase64 encoding, but my encoding is non-formula based.)

      =item * codesheet to be saved for look-up

      =back

      =cut
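
In the run shown earlier, Camel is assigned rQ and then Z9z, so a repeated word uses up a code on every occurrence. Here is a minimal sketch of one way around that, assuming the goal is one code per distinct word. The helper name build_codebook is hypothetical and is not part of jeffa's script, and only two-character codes are generated to keep it short.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper: give each distinct word exactly one code.
# Takes the text and an array ref of unused codes (consumed in place).
sub build_codebook {
    my ($text, $codes) = @_;
    my %encode;
    for my $word (split ' ', $text) {
        # //= assigns only if the word has no code yet (needs Perl 5.10+)
        $encode{$word} //= shift @$codes;
    }
    return \%encode;
}

my @base = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);
my @codes;
for my $i (@base) {
    for my $j (@base) {
        push @codes, "$i$j";    # two-character codes only, for brevity
    }
}
@codes = shuffle @codes;

my $text   = 'Camel Camel The the he e he e';
my $encode = build_codebook($text, \@codes);
my %decode = reverse %$encode;

my $encoded = join ' ', map { $encode->{$_} } split ' ', $text;
my $decoded = join ' ', map { $decode{$_} }   split ' ', $encoded;

print "$encoded\n$decoded\n";
```

With this change the code sheet holds five entries for the eight input words, and decoding still restores the original text because each code maps back to exactly one word.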

      I see this at a dead end now. The compression I thought I was seeing was actually just that the randomly generated codes (the hash values) were shorter than the words (the hash keys) they replaced. It was fun to play with, and I thank jeffa for writing it. I finally figured out how to display the program description:

      sudo apt-get install perl-doc
      perldoc code3.pl

      Q1) What are good Perl tools for cryptography of one's sensitive data?

Re^3: Non-Formula based Text Encoding - with Compression
by Your Mother (Archbishop) on Apr 16, 2015 at 16:54 UTC
    You all are just showing everyone you really don't understand Perl that well.
    split / */

    Cough… cough…

    For my part, I liked jeffa’s version. Comments are always better as Pod. I’m not quite sure what “non-formula” means in this context. Is it a technical term related to encryption (not my forte)?

      My guess is this is "non-formula" in that there is a lookup instead of evaluating a function that uses the plaintext contents. Excessively simple example of Caesar cipher:

      $plaintext = 'Attack at dawn!';

      # formula - very simple encoding function
      foreach $chr (split //,$plaintext) {
        $ciphertext .= chr(ord($chr)+1)
      }
      print $ciphertext,"\n";

      $ciphertext = '';

      # non-formula
      # only need to do this next step once, then store hash somewhere to use when encoding
      foreach $letter ('A'..'Z','a'..'z',' ','!') {
        $shifted{$letter} = chr(ord($letter)+1)
      }

      foreach $chr(split //,$plaintext) {
        $ciphertext .= $shifted{$chr}
      }
      print $ciphertext, "\n";

      Dum Spiro Spero

      How do you get the comments to display when they are in POD format?

Re^3: Non-Formula based Text Encoding - with Compression
by GotToBTru (Prior) on Apr 16, 2015 at 17:31 UTC

    "The text document can be rewritten in encoded/compressed fashion this way, by mapping words to codes, then decoded at a later time using a persistent Perl SDBM database tied hash table holding the word-to-code mappings for a particular text document."

    It's a hash of word to code mappings, right?

    $hash{$word} = $code
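
The persistent tied hash mentioned in the quote might look roughly like the following. This is only a sketch using the core SDBM_File module: the file name, the temporary directory, and the sample words and codes are all assumptions for illustration, not the OP's actual setup.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw( tempdir );

# Assumed location: a throwaway directory that is removed on exit.
my $dir  = tempdir( CLEANUP => 1 );
my $base = "$dir/codebook";     # SDBM creates codebook.pag and codebook.dir

# First session: store some word-to-code mappings.
{
    tie( my %hash, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666 )
        or die "Cannot tie $base: $!";
    $hash{camel}  = 'Z9z';
    $hash{hubris} = 'rQ';
    untie %hash;
}

# Later session: reopen the same files read-only and look a word up.
tie( my %hash, 'SDBM_File', $base, O_RDONLY, 0666 )
    or die "Cannot tie $base: $!";
my $code = $hash{camel};
print "camel => $code\n";
untie %hash;
```

Once tied, the hash is used like any other, so the encoding code does not need to know the mappings live on disk.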

    You put in one key, you get one value. One to one. You suggest there are many ways to accomplish this mapping, and perhaps you are thinking of using more than one of your codes to correspond to one word:

    $hash{$code1} = $word; $hash{$code2} = $word;

    This means encoding will require scanning the entire hash value list and then picking at random one of the possible code keys. It would work, of course; it just looks IMHO kind of clunky. You could make it as difficult to break as you needed, but in the end it would always be breakable.
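
One way to avoid scanning the value list on every lookup, sketched under the assumption that several codes may decode to the same word: invert the hash once into a word-to-codes index and pick from that. The code sheet contents and the sub names here are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical code sheet: more than one code can decode to the same word.
my %word_for = (
    aB  => 'attack',
    x9  => 'attack',
    Q   => 'at',
    k3z => 'dawn',
    m7  => 'dawn',
);

# Invert once: word => [ codes ], so encoding is a single hash lookup.
my %codes_for;
while ( my ($code, $word) = each %word_for ) {
    push @{ $codes_for{$word} }, $code;
}

# Pick one of the word's codes at random.
sub encode_word {
    my ($word) = @_;
    my $choices = $codes_for{$word} or die "No code for '$word'";
    return $choices->[ int rand @$choices ];
}

# Decoding stays one key in, one value out.
sub decode_code {
    my ($code) = @_;
    my $word = $word_for{$code};
    die "Unknown code '$code'" unless defined $word;
    return $word;
}

my @plain   = qw( attack at dawn );
my @encoded = map { encode_word($_) } @plain;
my @decoded = map { decode_code($_) } @encoded;

print "@encoded\n@decoded\n";
```

The encoded line varies from run to run, while the decoded line is always the original words.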

    Since you already have in place a secure method for exchanging your database files, why not exchange random pads instead?
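
For comparison, a random pad can be applied with Perl's string XOR operator. This is a rough sketch only: make_pad and xor_with_pad are hypothetical names, and rand() stands in for a proper random source purely for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustration only: rand() is NOT cryptographically secure.
# A real pad needs a secure random source and must never be reused.
sub make_pad {
    my ($length) = @_;
    return join '', map { chr( int( rand(256) ) ) } 1 .. $length;
}

# XOR is its own inverse, so the same sub both encrypts and decrypts.
sub xor_with_pad {
    my ($text, $pad) = @_;
    die "Pad is shorter than the message" if length($pad) < length($text);
    # ^ on two strings XORs them byte by byte
    return $text ^ substr( $pad, 0, length($text) );
}

my $message = 'Attack at dawn!';
my $pad     = make_pad( length($message) );
my $cipher  = xor_with_pad( $message, $pad );
my $plain   = xor_with_pad( $cipher,  $pad );

print $plain eq $message ? "round trip ok\n" : "round trip failed\n";
```

Both sides need the same pad, which is where the existing secure exchange channel would come in.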

    Dum Spiro Spero