BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

You have a string containing equal numbers of the letters 'a' .. 'j'; randomly ordered:

chhbfbjjjcgaiiaachbaefeabcbjffdgaggbgccaihgjddedfheejdigcdeiacebdehfig +egifibdabhfdihbdhifcgjfjhcaejj

You want to map those 10 letters to the four digits '1'..'4', such that you end up with equal numbers of those four digits in the result.

This is just a random thought arising; but I'm stuck for an approach?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

Replies are listed 'Best First'.
Re: mapping 10 -> 4 equally.
by davies (Monsignor) on Jun 20, 2015 at 21:20 UTC

    I may be misunderstanding, but if you're after something where, say, a, b & c = 1, d, e & f = 2, g, h & i = 3 and j = 4, then the approach I would use is to count them first and then apply a linear programming tool to get the closest match, bearing in mind that it may not be possible to match perfectly. However, if you have a more complicated system, where you want to map to hundreds of IDs, you run into the travelling salesman problem. Theoretically, LP can do it, but not before the heat death of the universe.

    Update: Seeing your reply above, I had misunderstood. I, too, thought you wanted one number per letter. But if multiple numbers per letter are allowed, why not just allow the first 25% of all letters to be 1, the second to be 2 etc? I suspect you have stricter rules in mind. Could you be clearer about them, please?/update

    Regards,

    John Davies

Re: mapping 10 -> 4 equally.
by Anonymous Monk on Jun 20, 2015 at 21:00 UTC

    That's not possible for your example string, is it? (there's 10 x 10 characters)

    Maybe I'm overthinking... bin packing?

    use Algorithm::Bucketizer; while (my $in = <>) { chomp($in); my %counts; $counts{$_}++ for split //, $in; my $b = Algorithm::Bucketizer->new(bucketsize=>length($in)/4); $b->add_item($_, $counts{$_}) for keys %counts; $b->optimize(maxrounds=>100); print $_->serial, " => ", join(", ", map {"$_ ($counts{$_})"} $_->items ), "\n" for $b->buckets; }

    e.g.

    chhbfbjjjcgaiiaachbaefeabcbjffdgaggbgccaihgjddedfheejdigcdeiacebdehfig +egifibdabhfdihbdhifcgjfjhcaejj 1 => f (10), g (10) 2 => b (10), h (10) 3 => i (10), j (10) 4 => e (10), d (10) 5 => c (10), a (10) abbbbbbbccddddddeeefffffgggghhhh 1 => e (3), f (5) 2 => b (7), a (1) 3 => c (2), d (6) 4 => g (4), h (4)
      That's not possible for your example string, is it?

      I don't see why not?

      If a->1, b->2, c->3, d->4, e->1, f->2, g->3, h->4, every other i->1|2, every other j->3|4; that'd would 25 each of 1, 2, 3 & 4.

      Which I guess indicates a laborious way to do it:

      chhbfbjjjcgaiiaachbaefeabcbjffdgaggbgccaihgjddedfheejdigcdeiacebdehfig +egifibdabhfdihbdhifcgjfjhcaejj 3442223433311211342112112324224313323331143344142411442334111312414223 +131222412424142442233324431134 11 11 11 11 1 11 1 11 111 1 1 +1 1 1 1 11 25 222 2 2 2 2 2 22 2 2 2 2 22 + 222 2 2 2 22 2 25 3 3 333 3 3 3 33 333 33 33 3 3 + 3 333 3 3 25 44 4 4 4 4 4 44 4 4 44 4 4 4 + 4 4 4 4 44 44 4 25

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

        Ah I see, I thought each letter could only be mapped to one number.

Re: mapping 10 -> 4 equally.
by marinersk (Priest) on Jun 21, 2015 at 16:09 UTC

    This should do it -- not using any advanced algorithms, just a simple, two-pass distribution:

    #!/usr/bin/perl use strict; use List::Util; # -----[ Constants ]-------------------------------------------------- +---------- my @ORGLTR = qw /a b c d e f g h i j/; my @REQNUM = qw /1 2 3 4/; my $LTRREP = 10; # Number of times we repeat ea +ch letter # -----[ Generate unrandomized list ]--------------------------------- +---------- my @basgen = (); foreach my $curltr (@ORGLTR) { for (my $currep = 0; $currep < $LTRREP; $currep++) { push @basgen, $curltr; } } my $basdsp = join '', @basgen; # -----[ Randomize ]-------------------------------------------------- +---------- my @rndstr = &List::Util::shuffle(@basgen); # -----[ Break it down ]---------------------------------------------- +---------- my $ltrcnt = @ORGLTR; # How many distinct letters in + original set? my $numcnt = @REQNUM; # How many numbers in requeste +d results? my $rescnt = @rndstr; # How many letters in full set +? my $basdst = int($ltrcnt / $numcnt); # Base integral distribution my $remltr = $ltrcnt % $numcnt; # What isn't covered by the ba +se my $supdst = $numcnt / $remltr; # Supplemental distribution if ($numcnt % $remltr) { print "Sorry -- $LTRREP x $ltrcnt letters cannot be redistributed +equally as $numcnt numbers.\n"; } else { my @wrkltr = @ORGLTR; # Letters to work on my $rndstr = join '', @rndstr; # The randomized string print "Original String: $rndstr\n"; # Perform base distribution { foreach my $curdst (1 .. $basdst) { foreach my $curnum (@REQNUM) { my $curltr = shift @wrkltr; $rndstr =~ s/$curltr/$curnum/g; } } } # Perform supplemental distribution { my $curnum = 0; my $supcnt = $ltrcnt * $remltr; foreach my $curdst (1 .. $supcnt) { foreach my $curltr (@wrkltr) { $curnum++; if ($curnum > $numcnt) { $curnum = 1; } $rndstr =~ s/$curltr/$curnum/; } } } # Display result print " Final string: $rndstr\n"; } # -----[ Cleanup ]---------------------------------------------------- +---------- exit; __END__ # -----[ Fini ]------------------------------------------------------- +----------

    Results:

    D:\PerlMonks>remap-nodebug.pl Original String: ibcdhjfifhiabahgejedgihgacchdijaedgihcifjejfghecgdae +cjdaibcgacaegjcbegghfeibbfdhhjbjbdadbacfdffbijfe Final string: 1234422324112143141433431334412114334312412234133411 +344132331311323213342112224444222414213242223421 D:\PerlMonks>

    Update: Clarified reason for apology message.

      FWIW: this is what I came up with yesterday after anonymonk prompted me into getting my mind around what I was trying to describe in the OP:

      $x = join'', shuffle map{ ($_)x10 } 'a'..'j'; print $x;; ahgbjgbcjjgbbfdiadeehgdhcijjebaaegehccdfdbejcgdhehcbehdgcihiffgefjijab +fbjadfaaiiiahfegihdgdaccicffjb $y = $x; $y =~ tr[abcdefgh][12341234]; print $y;; 1432j323jj32224i141143443ijj121113143342421j3344143214433i4i22312jij12 +22j14211iii14213i4434133i322j2 $i=0; $j=0; $y =~ s[([ij])]{ $1 eq 'i' ? ( ($i^=1) ? '1' : '2' ) : ( ($j^=1) ? '3' + : '4') }ge; print $y;; 1432332343322241141143443243121113143342421433441432144331422231231412 +223142112121421314434133232242

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!