You said:
In Unix I cannot configure the input record separator, so if the data comes with embedded \n characters in the middle, this breaks the sort.

Is it essential that you preserve these "embedded \n" characters? Either way, if you can get the Unix sort process to produce exactly the ordering you want, you could use Perl just to "normalize" the records so that they are all single-line and well-behaved when they go through the Unix sort. (But unless you're using GNU sort, you might still have a problem if some of the records end up being too long; I think some flavors of Unix sort may still have a limit of 1024 bytes per line.)

(Update: For some reason, I feared that Solaris might be one such limited flavor, but I was wrong: I could pump lines of more than 8200 bytes through /usr/bin/sort with no loss of data. Still, if you're not on Linux or Solaris 8 or later, test it first.)
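To follow the "test it first" advice, a quick check along these lines may help. It is only a sketch: the helper name `sort_keeps_long_lines` and the 10000-byte test length are my own choices, and it assumes perl is on your PATH (which the rest of this approach needs anyway).

```shell
#!/bin/sh
# Sketch: check whether the local sort(1) keeps long lines intact.
# The 10000-byte default is an arbitrary test size (assumption), well
# above the 1024-byte limit some older sorts were suspected of having.

# sort_keeps_long_lines LEN: exit status 0 if sort passes LEN-byte lines
# through unchanged, non-zero otherwise. (Hypothetical helper name.)
sort_keeps_long_lines() {
    len=$1
    tmp=${TMPDIR:-/tmp}/sortcheck.$$
    # Input: one line of LEN 'x' characters, followed by the short line "b".
    perl -e 'print "x" x $ARGV[0], "\n", "b\n"' "$len" > "$tmp.in"
    # Expected result: "b" sorts before "xxx...", so the two lines swap.
    perl -e 'print "b\n", "x" x $ARGV[0], "\n"' "$len" > "$tmp.want"
    sort "$tmp.in" > "$tmp.got"
    # cmp -s is silent and returns 0 only if the files are identical.
    cmp -s "$tmp.want" "$tmp.got"
    status=$?
    rm -f "$tmp.in" "$tmp.want" "$tmp.got"
    return $status
}

if sort_keeps_long_lines 10000; then
    echo "sort kept 10000-byte lines intact"
else
    echo "sort damaged 10000-byte lines; keep records shorter or use GNU sort"
fi
```

If your longest normalized record is bigger than 10000 bytes, call the function with that length instead.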

You are already handling record-based input by setting $/ in Perl, so why not try a pipeline like this one? (I'm not sure whether your reference to "164" was a decimal or octal value; it's best to use hex and not worry about the ambiguity. I'll guess that you meant decimal 164, which is \xa4; octal 164 would be \x74, the letter "t".)

perl -pe 'BEGIN {$/="\xa4\n"} s/(?<!\xa4)\n/ /g' | sort ...
Or, if you want to preserve these "extra" line feeds, replace them with some character or string that doesn't occur naturally in the data; then, after the sort is done, run another one-liner to convert them back to "\n".

In reply to Re: perl sort versus Unix sort by graff
in thread perl sort versus Unix sort by Anonymous Monk
