in reply to Re^2: Emulating 'sort' command through a Perl
in thread Emulating 'sort' command through a Perl

I think I have found 1 bug in the code.

Suppose I am sorting a text file containing the following rows by column 3 (i.e. index 2):

FT9mWp<SPACE>d4fgMB<SPACE>gvZRJU<SPACE>XRRu0N

4ewejk<SPACE>pFnjd4<SPACE>ie0hZF<SPACE>pPipQJ

4ewejk<SPACE>4sqprx<SPACE>ie0hZF<SPACE>cqtexi

Fo1OKn<SPACE>qhZPvb<SPACE>qWZPrt<SPACE>ruBObs

The code should give the output:

FT9mWp<SPACE>d4fgMB<SPACE>gvZRJU<SPACE>XRRu0N

4ewejk<SPACE>4sqprx<SPACE>ie0hZF<SPACE>cqtexi

4ewejk<SPACE>pFnjd4<SPACE>ie0hZF<SPACE>pPipQJ

Fo1OKn<SPACE>qhZPvb<SPACE>qWZPrt<SPACE>ruBObs

But it is giving:

FT9mWp<SPACE>d4fgMB<SPACE>gvZRJU<SPACE>XRRu0N

4ewejk<SPACE>pFnjd4<SPACE>ie0hZF<SPACE>pPipQJ

4ewejk<SPACE>4sqprx<SPACE>ie0hZF<SPACE>cqtexi

Fo1OKn<SPACE>qhZPvb<SPACE>qWZPrt<SPACE>ruBObs

The difference is in the 2nd and 3rd rows: compare their values in the 2nd column (i.e. index 1) in the two outputs.

Re^4: Emulating 'sort' command through a Perl
by Marshall (Canon) on Oct 30, 2009 at 18:08 UTC
    I think you are confused about column numbering. In Perl, the first column is number 0, the second is number 1, etc. The output you show is what the above program does (it sorts on the 3rd column). If you want to sort by a different column, adjust the number accordingly. Remember that the sort order follows ASCII values; the Unix utilities and Excel will also sort alphanumeric fields this same way.

    In short, check your "column math"; the program works according to my testing.
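
    A minimal sketch of the zero-based numbering, assuming the fields are separated by whitespace. The variable names are only for illustration and are not taken from the program discussed above.

```perl
use strict;
use warnings;

# One sample row from the thread, with real spaces between the fields.
my $line = 'FT9mWp d4fgMB gvZRJU XRRu0N';

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @fields = split ' ', $line;

# The 3rd column lives at index 2 because Perl arrays start at 0.
my $third_column = $fields[2];

print "$third_column\n";    # prints "gvZRJU"
```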

      Yes, I know the column here is the 3rd. Let me explain the bug with an example.

      Suppose two rows have the same value in the 3rd column (in this case, ie0hZF).

      In that case it should sort those two rows by their first-column values.

      Now what if those two values are also the same (in this case, 4ewejk)?

      Then it should compare the respective 2nd-column values. If one of them starts with a digit (in this case, 4sqprx) and the other with a letter (in this case, pFnjd4), the one starting with a digit should come first because its ASCII value is lower.

      That is not happening here...

        If you want to compare by more than one field, this is done like this:
        (extra line feeds added to prevent word wrap on the Monk page which is shorter than my standard 80 chars).

        The sort{} function can even be a separate subroutine rather than the anonymous one shown here. The main point is that the "less than, equal, greater than" value is the final return statement of this anon sub and "tie breaker" is implemented as a series of logical statements. Zero (equal) is false and if that is the case, then the logical statement keeps evaluating until it runs out of "tie breaker" stuff or reaches a conclusion.

        @input_lines = sort {
            my ($a_name_last, $a_name_first, $a_city) = split(/\s+/, $a);
            my ($b_name_last, $b_name_first, $b_city) = split(/\s+/, $b);

            $a_name_last cmp $b_name_last
                or
            $a_name_first cmp $b_name_first
                or
            $a_city cmp $b_city
        } @input_lines;
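
        As a rough sketch, the same "or" chain can be applied to the four rows from earlier in the thread. It assumes the wanted key order is column 3, then column 1, then column 2 (indexes 2, 0, 1), and that the fields are separated by real spaces. The names @rows, @sorted, @fa and @fb are made up for this example.

```perl
use strict;
use warnings;

# The four sample rows from the thread, with real spaces.
my @rows = (
    'FT9mWp d4fgMB gvZRJU XRRu0N',
    '4ewejk pFnjd4 ie0hZF pPipQJ',
    '4ewejk 4sqprx ie0hZF cqtexi',
    'Fo1OKn qhZPvb qWZPrt ruBObs',
);

my @sorted = sort {
    my @fa = split ' ', $a;
    my @fb = split ' ', $b;

    $fa[2] cmp $fb[2]       # primary key: 3rd column
        or
    $fa[0] cmp $fb[0]       # first tie breaker: 1st column
        or
    $fa[1] cmp $fb[1]       # second tie breaker: 2nd column
} @rows;

print "$_\n" for @sorted;
```

        With this key order the row containing 4sqprx lands before the row containing pFnjd4, because "4" has a lower ASCII value than "p".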
        Note that it is possible to intermix "cmp" comparisons (string) with the "spaceship" operator <=>, which is numeric only. The standard return values of these comparison operators are -1, 0 and +1. Perl's sort only looks at whether the returned value is negative, zero or positive, but I would stick with just -1, 0, +1 if you write your own wild comparison function. You can literally make a compare function where B sorts before A if you want to do that.
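
        A small sketch of mixing the two operators, using invented "name age" records (the data and variable names are hypothetical, not from the thread). The name is compared as a string and the age as a number, so 9 sorts before 42.

```perl
use strict;
use warnings;

# Hypothetical records: a name followed by a numeric age.
my @records = ('Smith 42', 'Jones 7', 'Smith 9');

my @by_name_then_age = sort {
    my ($a_name, $a_age) = split ' ', $a;
    my ($b_name, $b_age) = split ' ', $b;

    $a_name cmp $b_name     # string comparison on the name
        or
    $a_age <=> $b_age       # numeric tie breaker on the age
} @records;

print "$_\n" for @by_name_then_age;
```

        Using cmp for the age instead would put "Smith 42" ahead of "Smith 9", since the string "42" sorts before the string "9".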

        Update: If the context of what is being compared is known, I strongly recommend putting that application context into the code. Like above "Smith Bob Chicago" will come before "Smith Bob Houston" and is easy to see and understand.