Oh. Yes, it's probably better to decline that offer...

Hm. I think I may have given a wrong impression here.

Think of the description lines in FASTA files. They can contain anything useful to the researcher, and often contain stuff that only makes sense to the originator, so they were often written in a local code page. Each individual string makes sense in the context of its file and origin.

Now take a bunch of legacy FASTA files that originate from all over the world, bring them together into a central DB, and index them by their descriptions. Then try to merge that index of legacy descriptions with more modern files whose descriptions are in Unicode, and sort everything together to provide a single index.

That's pretty close to the problem.
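
To make that concrete, here is a minimal sketch (Python, purely for illustration) of the transcoding half of the problem. It assumes the originating code page of every legacy file is known, which is exactly what often isn't true in practice. The sample records, the encoding labels, and the helper names are all hypothetical, and the sort key is a crude normalise-and-casefold ordering rather than a proper locale-aware collation.

```python
import unicodedata

# Hypothetical sample records: (raw description bytes, code page of the source file).
# The encoding labels are assumptions; legacy files rarely declare their code page.
RECORDS = [
    ("белок дрожжей".encode("cp1251"), "cp1251"),       # Cyrillic, Windows-1251
    ("protéine de levure".encode("cp1252"), "cp1252"),  # French, Windows-1252
    ("yeast protein".encode("utf-8"), "utf-8"),         # modern Unicode file
    ("Ärger-Gen".encode("utf-8"), "utf-8"),             # modern Unicode file
]

def to_unicode(raw, encoding):
    """Decode legacy bytes using the code page the file is assumed to use."""
    return raw.decode(encoding)

def sort_key(text):
    """Crude, locale-independent key: decompose accents, then case-fold."""
    return unicodedata.normalize("NFKD", text).casefold()

def build_index(records):
    """Decode every description to Unicode, then sort them into one index."""
    return sorted((to_unicode(raw, enc) for raw, enc in records), key=sort_key)

def wrong_guess(raw):
    """What you get if a Windows-1251 description is read as Windows-1252."""
    return raw.decode("cp1252")

if __name__ == "__main__":
    for description in build_index(RECORDS):
        print(description)
    # The same bytes decoded with the wrong code page are unrecognisable.
    print(wrong_guess(RECORDS[0][0]))
```

Decoding only gets every string into one character set. It does nothing about the descriptions being written in different languages and private abbreviations, and a wrong guess at the code page silently produces garbage rather than an error.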

Ideally, the descriptions would all be converted into Unicode; but that requires a huge effort, with a bunch of translators working in many different languages to translate technical terms, abbreviations, and anything else the originating researchers felt it important to put there in their own languages. Basically an impossible task.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^8: Mixed Unicode and ANSI string comparisons? by BrowserUk
in thread Mixed Unicode and ANSI string comparisons? by BrowserUk
