ramish has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to emulate an EBCDIC sort in Perl, without success. I am new to Perl. I have seen answers for alphanumeric sorts, but not for a mixed alphabetic and numeric sort. In the EBCDIC collating sequence, letters are sorted first, in ascending order, followed by digits. Here is what I want to achieve:

Unsorted
ased12
frtg
Frth
zdcs
123
12h
awq
------------------
Sorted
ased12
awq
frtg
Frth
zdcs
123
12h

Another, unrelated question: how do I get LC_COLLATE=EBCDIC working in UNIX sort?

Re: EBCDIC sort
by BrowserUk (Patriarch) on May 21, 2007 at 18:36 UTC

    I assume you got the last two strings in your sorted example the wrong way around, i.e. '12h' should sort before '123' according to the logic of the rest of the example? (Ignore this if you didn't, but explain better :)

    Use an ST (Schwartzian Transform): transliterate the strings to map the characters to the desired collating sequence, sort, and then discard the transliterated values:

    @a = qw[ ased12 frtg Frth zdcs 123 12h awq ];
    print for
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  {
            (my $t = $_) =~ tr[aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0-9][\x00-\x3e];
            [ $_, $t ]
        } @a;

    ased12
    awq
    frtg
    Frth
    zdcs
    12h
    123

    Update: Actually, skip the ST. Just transliterating on the way in and back again on the way out works out much (65%) quicker:

    @a = qw[ ased12 frtg Frth zdcs 123 12h awq ];
    print for
        map  { tr[\x00-\x3e][aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0-9]; $_; }
        sort
        map  { tr[aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0-9][\x00-\x3e]; $_; } @a;

    ased12
    awq
    frtg
    Frth
    zdcs
    12h
    123

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
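A caveat on the transliterate-and-back update in the reply above: it only round-trips strings made of the 62 mapped characters. Anything else passes through the first tr unchanged, and if its code point falls inside the rank range it is rewritten on the way out (a space, \x20, comes back as 'q'). The map blocks also modify @a through aliasing. The following is a minimal sketch of a guarded variant, not BrowserUk's code: round_trip_sort() is a hypothetical helper name, it works on copies, it rejects unsupported characters, and it uses \x00-\x3d (exactly 62 ranks) rather than \x00-\x3e.

```perl
use strict;
use warnings;

# round_trip_sort() is a hypothetical helper name, not from the post above.
sub round_trip_sort {
    my @copy = @_;    # copies, so the caller's strings are never modified

    # Refuse input that the two tr tables cannot map back faithfully.
    for my $s (@copy) {
        die "unsupported character in '$s'\n" if $s =~ /[^A-Za-z0-9]/;
    }

    # Map each character to its rank in the interleaved order aA bB ... zZ 0-9.
    tr[aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0-9][\x00-\x3d] for @copy;

    my @sorted = sort @copy;    # plain string sort on the ranks

    # Map the ranks back to the original characters.
    tr[\x00-\x3d][aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0-9] for @sorted;

    return @sorted;
}

print "$_\n" for round_trip_sort(qw[ ased12 frtg Frth zdcs 123 12h awq ]);
```

For data that can contain spaces or punctuation, the ST version is the safer of the two, since it prints the untouched originals; the sort position of unmapped characters is still arbitrary there, though.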
Re: EBCDIC sort
by Errto (Vicar) on May 21, 2007 at 18:39 UTC
    I do not have experience with this but a quick reading of perlebcdic suggests that perhaps this could be accomplished with Encode. Given @strings which contains a set of ordinary character strings on a non-EBCDIC platform, the following should sort them in EBCDIC order:
    use Encode qw(encode decode);
    ...
    my @sorted = map { decode('cp1047', $_) }
                 sort (map { encode('cp1047', $_) } @strings);
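As a quick check, here is the Encode approach applied to the sample data from the question. This is a sketch under the assumption that code page 1047 is the EBCDIC variant wanted; the variable names are illustrative. In cp1047 all lowercase letters sort before all uppercase letters, which sort before the digits, so 'Frth' lands after 'zdcs' rather than next to 'frtg' as in the question's expected output.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my @strings = qw(ased12 frtg Frth zdcs 123 12h awq);

# Encode to EBCDIC bytes, sort the byte strings, then decode back.
my @ebcdic = map { encode('cp1047', $_) } @strings;
my @sorted = map { decode('cp1047', $_) } sort @ebcdic;

print join(' ', @sorted), "\n";
# ased12 awq frtg zdcs Frth 12h 123
```

If the interleaved order (aA bB ... zZ, then digits) is what is really needed, the transliteration approach fits better than a true EBCDIC byte sort.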
      Esteemed Monks,

      Thanks for the suggestions. Sorry for not formatting the question properly and not listing my code.

      The code works but I am facing another problem.

      I am sorting a file that is more than 400MB in size, using multiple keys. While executing, the sort fails with a core dump and sometimes an illegal instruction message.

      I did some investigation and found out that it is due to insufficient memory.

      I am running it on AIX server and when I did ulimit -a it gave memory as 65536 bytes. I have unlimited file size permission. I reduced the size of the input file to about 60 K and the sort worked. I have checked and I can't use malloc() or reset it to be used during runtime.

      use strict;
      use Encode qw(encode decode);

      ### Define the sort key here ###
      # Sorts in ascending order.
      sub key1 {
          ( substr( $a, 3, 17 )) cmp ( substr( $b, 3, 17 ));
      }
      # Sorts descending order.
      sub key2 {
          ( substr( $b, 20, 2 )) cmp ( substr( $a, 20, 2 ));
      }
      #
      ### Sort processing starts ###
      my @infile = <>; # Reads file
      ### Multiple sort keys can be defined and sorted in the order of the key
      my @sorted = map { decode('cp1047', $_) }
                   sort { key1 || key2 }
                   (map { encode('cp1047', $_) } @infile);
      print @sorted;
      Is there a more efficient way to reduce memory usage?

        Do a GRT (Guttman-Rosler Transform) through /bin/sort.

        # ...
        use IPC::Open2 'open2';
        my( $chaos, $sorted );
        my $pid= open2( $sorted, $chaos, 'sort' );
        while( <> ) {
            my $ascii= encode( 'cp1047', $_ );
            print $chaos substr($ascii,3,17), substr($ascii,20,2), $_;
        }
        close $chaos;
        while( <$sorted> ) {
            print substr( $_, 19 );
        }

        or thereabouts. It looks like your EBCDIC file is using ASCII newlines, which simplifies things.

        - tye        
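The pipeline above sorts both fields ascending, whereas key2 in the original script is descending. One way to fold that into a single GRT key is sketched below. It assumes records of at least 22 characters with the keys at offsets 3 (17 characters) and 20 (2 characters), as in the script; grt_key() and the sample records are hypothetical. The second field is byte-complemented so that ascending order on the key means descending order on the field, and the whole key is hex-encoded so it is fixed-width (38 characters) and cannot contain newline or NUL bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# grt_key() is a hypothetical helper: returns a 38-character hex sort key.
sub grt_key {
    my ($line) = @_;
    my $k1 = encode('cp1047', substr($line, 3, 17));    # ascending field
    my $k2 = encode('cp1047', substr($line, 20, 2));    # descending field

    # Complement every byte so that a plain ascending sort reverses this field.
    $k2 = pack 'C*', map { 255 - $_ } unpack 'C*', $k2;

    # Hex keeps byte order (0-9 sort before a-f) and avoids control bytes.
    return unpack 'H*', $k1 . $k2;
}

# Hypothetical sample records: 3-character id, 17-character key1, 2-character key2.
my @records = map { sprintf '%-3s%-17s%-2s', @$_ } (
    [ 'r1', 'frtg', 'aa' ],
    [ 'r2', 'Frth', 'aa' ],
    [ 'r3', 'frtg', 'zz' ],
    [ 'r4', '123',  'aa' ],
);

# Prefix each record with its key, sort as plain strings, strip the key again.
my @sorted = map { substr $_, 38 } sort map { grt_key($_) . $_ } @records;

print substr($_, 0, 2), "\n" for @sorted;    # r3 r1 r2 r4, one per line
```

To keep memory low, the same keyed lines could be written to an external sort instead of Perl's sort and the first 38 characters stripped on the way back; running that sort under LC_ALL=C is assumed here, so that it compares bytes rather than applying a locale's collation.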

Re: EBCDIC sort
by talexb (Chancellor) on May 21, 2007 at 18:56 UTC
      I have been trying to emulate EBCDIC sort in perl but without any success.

    The Wikipedia entry for EBCDIC has a code conversion chart; if all else fails, you could use that.

    I also recommend looking at perldoc perlebcdic.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: EBCDIC sort
by educated_foo (Vicar) on May 21, 2007 at 19:52 UTC
    Since no one else has mentioned it... You're asking for a donation of others' time, with which people on this site are usually more than generous. It is more polite if you (1) take the time to format your question to make it easy for them to read, and (2) demonstrate that you have spent some of your own time on the question, by showing what you've tried so far.

    EDIT: alright, downvote away if you like, but I was trying to be helpful here, not rude. There have been a lot of these "omg pls help kthxbye!" posts recently. I thought it might be a cultural thing -- IRC and most message boards have different standards...

Re: EBCDIC sort
by Anonymous Monk on May 21, 2007 at 20:01 UTC
    Beware that EBCDIC comes in multiple versions, which do not necessarily include the same characters.