Given a set S of strings, each of fixed length l, over the alphabet {a,b,c}, for example:
aaaaaaaaabababacbbbbbbbaccaabc aaaaaaaaaaaaaaaaacbbbbbbbaaaca aaaaaaaaabaaaaabbaaaaabaaccccc
What transforms can I apply to homogenize them, for example by:

a) grouping same characters together within each string
b) grouping same characters together across all strings (e.g. most of the a's in one string, the b's in another, etc.). Sorting does not count here: it leaves the number of occurrences of each character in each string unchanged.
c) transforming (mapping) characters, e.g. a -> b
d) something else

The conditions are:
a) the transformed data must not be larger than the original input, which means any index has to be built implicitly into the data (the way BWT does it).
b) the query that needs to be supported is:
What is the character at the i-th position of the original string, where 0 <= i < l?
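The query in b) can be answered from a BWT alone by walking the LF-mapping backwards from the row that starts with the sentinel. A minimal sketch, assuming `$` as the sentinel (it must not occur in the data and must sort before a, b and c); `bwt` and `char_at` are illustrative names, and the naive rotation sort is only meant for short strings like these.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations of s + sentinel.

    O(n^2 log n): fine for short strings, real tools use suffix arrays.
    """
    t = s + sentinel
    rotations = sorted(t[k:] + t[:k] for k in range(len(t)))
    return "".join(r[-1] for r in rotations)


def char_at(b: str, i: int) -> str:
    """Return s[i] using only b = bwt(s), via backward LF-mapping steps.

    Nothing besides b is kept between queries: the C table and the ranks
    are rebuilt on every call, trading O(n) time for no stored index.
    """
    n = len(b) - 1                       # length of the original string
    if not 0 <= i < n:
        raise IndexError(i)
    # C[c] = number of symbols in b that sort before c
    C, total = {}, 0
    for ch in sorted(set(b)):
        C[ch] = total
        total += b.count(ch)
    # rank[r] = occurrences of b[r] in b[:r]
    seen, rank = {}, []
    for ch in b:
        rank.append(seen.get(ch, 0))
        seen[ch] = rank[-1] + 1
    # Row 0 is the rotation "$" + s, so b[0] = s[n-1]; each LF step
    # moves one position to the left in s.
    row = 0
    for _ in range(n - 1 - i):
        row = C[b[row]] + rank[row]
    return b[row]


if __name__ == "__main__":
    s = "aaaaaaaaabababacbbbbbbbaccaabc"
    b = bwt(s)
    assert "".join(char_at(b, i) for i in range(len(s))) == s
```

The output carries one extra symbol (the sentinel), and each query costs O(l) time; FM-index implementations usually store sampled rank tables to speed this up, which costs extra space. The same code covers "BWT using all strings": because all strings share the length l, they can be concatenated without separators, and character i of string k (counting from 0) is `char_at(b, k*l + i)` on the BWT of the concatenation.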

Some of the obvious solutions are:

a) BWT
b) BWT using all strings
c) replace a with b and record the original positions of the a's, either as bitstrings or as integers (but this violates the size condition unless there is a smart way to record the positions within the strings themselves)
d) something else
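Option c) is easy to make concrete, and doing so shows where it clashes with the size condition. A minimal sketch, assuming the positions are kept as a plain bitvector with one bit per character and that sizes are counted with fixed-width codes and no further compression; `encode_c`, `char_at` and `fixed_width_bits` are hypothetical names.

```python
import math


def encode_c(s: str) -> tuple[str, list[int]]:
    """Option c): map 'a' -> 'b' and keep a bitvector marking where the a's were."""
    bits = [1 if ch == "a" else 0 for ch in s]
    return s.replace("a", "b"), bits


def char_at(mapped: str, bits: list[int], i: int) -> str:
    """Answer the s[i] query: a set bit means the original character was 'a'."""
    return "a" if bits[i] else mapped[i]


def fixed_width_bits(n_chars: int, alphabet_size: int) -> float:
    """Bits for n_chars symbols at log2(alphabet_size) bits each (no compression)."""
    return n_chars * math.log2(alphabet_size)


if __name__ == "__main__":
    s = "aaaaaaaaabaaaaabbaaaaabaaccccc"
    mapped, bits = encode_c(s)
    assert "".join(char_at(mapped, bits, i) for i in range(len(s))) == s
    before = fixed_width_bits(len(s), 3)                  # alphabet {a, b, c}
    after = fixed_width_bits(len(mapped), 2) + len(bits)  # {b, c} plus 1 bit/char
    print(f"before: about {before:.1f} bits, after: {after:.0f} bits")
```

Under those assumptions the transform costs 2 bits per character (1 for the {b, c} string, 1 for the bitvector) against log2 3, about 1.585 bits, for the original, so it only pays off if the bitvector itself can be stored compactly, for instance because the a's form a few long runs.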

Has anyone encountered this problem before? How did you solve it?

Thanks


In reply to Problems with strings by baxy77bax
