in reply to Bug in Sort::Fields?
in thread Split(), Initial Spaces, & a limit?

# Initial spaces in column 1 don't sort the same as...

It's impossible for a column to have initial spaces when whitespace is your delimiter: the leading whitespace is itself treated as a delimiter. As a result, the first field of most of @data is "".
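A minimal sketch of that effect, using core Perl's split with a /\s+/ pattern rather than Sort::Fields itself. The helper name split_on_whitespace and the sample lines are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line on runs of whitespace, as a /\s+/ delimiter does.
sub split_on_whitespace {
    my ($line) = @_;
    return split /\s+/, $line;
}

# The first line starts with whitespace, the second does not.
for my $line ("  56 1752.eps", "8720 trace.exe") {
    my @fields = split_on_whitespace($line);
    # The leading whitespace matches the delimiter, so the first line
    # gets an empty leading field and its number lands in field 2.
    print join(", ", map { qq("$_") } @fields), "\n";
}
```

The first line yields three fields (the first one empty), while the second yields two, so "field 1" means different things for the two rows.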

    use strict;
    use warnings;
    use Sort::Fields;
    use Data::Dumper;

    my @data = (
        " 56 1752.eps",
        " 56 2613.eps",
        " 56 3469.eps",
        " 8 INPUT000",
        " 16 INPUT001",
        " 16 INPUT002",
        " 96 MTA.1.ps",
        " 96 MTA.6.ps",
        " 80 MTA.7.ps",
        " 32 head.eps",
        " 8 labs",
        " 0 lib",
        " 8 mkexe.bat",
        " 112 out",
        " 0 screenshots",
        "8720 trace.exe",
        " 16 trace.pl",
        " 8 tracehosts",
        "1160 trace.041409.exe",
        "1160 trace.orig.exe",
    );

    # Strip leading whitespace so the numeric key is always in field 1.
    s/^\s+// for @data;

    my @sorted = fieldsort(['1n'], @data);
    print(Dumper(\@sorted));

By the way, you were using grep as map, and you were clobbering @data in the process.
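The code being referred to is not quoted in this reply, so the following is only a sketch of the general pitfall, on made-up sample data: grep aliases $_ to each element, so a substitution inside the block edits the source array, and grep returns only the elements for which the block was true.

```perl
use strict;
use warnings;

my @data = ("  56 1752.eps", "8720 trace.exe");

# s/// inside grep edits @data in place (via the $_ alias), and grep
# keeps only the elements where the substitution actually matched.
my @kept = grep { s/^\s+// } @data;

print scalar(@kept), " element(s) returned by grep\n";
print qq("$_"\n) for @data;    # @data itself has lost its leading spaces

# A non-destructive alternative: copy each element, trim the copy, return it.
my @source  = ("  56 1752.eps", "8720 trace.exe");
my @trimmed = map { (my $copy = $_) =~ s/^\s+//; $copy } @source;

print scalar(@trimmed), " element(s) returned by map\n";
print qq("$_"\n) for @source;  # @source is unchanged
```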

Re^2: Bug in Sort::Fields?
by cmv (Chaplain) on Jul 20, 2010 at 18:39 UTC
    ikegami-

    I'm sorry, but I don't believe I understand your point. It seems that all you did to fix the problem was to remove the initial spaces in the original data.

    In my opinion Sort::Fields should sort the data the same way, regardless of where the data is (field 1 or field 2). If you try to numerically sort the output of an 'ls -s' command, you can see the problem clearly:

        use strict;
        use warnings;
        use Sort::Fields;
        use Data::Dumper;

        my @data = `ls -s`;
        chomp(@data);

        my @sorted = fieldsort(['1n'], @data);
        print(Dumper(\@sorted));

    This doesn't do what is intended, which is why I reported it to the author. I'm sure I could remove the initial spaces before calling Sort::Fields, then put them back after it's done, but that doesn't seem right to me.

    -Craig

      regardless of where the data is (field 1 or field 2).

      The key must be either in field 1 or in field 2. It can't vary by row. You're providing

                        Field 1      Field 2      Field 3
                        -----------  -----------  -----------
        56 1752.eps     "",          "56",        "1752.eps"   key in 2
      1160 trace.exe    "1160",      "trace.exe"               key in 1
      123 foo bar.pl    "123",       "foo",       "bar.pl"     key in 1

      You need to normalize your fields so that they are the same for each row. I did it by removing the extraneous delimiter in the front of some lines.

                        Field 1      Field 2      Field 3
                        -----------  -----------  -----------
      56 1752.eps       "56",        "1752.eps"                key in 1
      1160 trace.exe    "1160",      "trace.exe"               key in 1
      123 foo bar.pl    "123",       "foo",       "bar.pl"     key in 1
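If the concern is that the original lines (with their leading spaces) should come back untouched, one option is to extract the key separately and sort on it. This sketch uses Perl's built-in sort rather than Sort::Fields; the helper name sort_by_leading_number is hypothetical, and lines without a leading number are assumed to sort as 0.

```perl
use strict;
use warnings;

# Hypothetical helper: sort lines numerically by their leading number,
# ignoring leading whitespace, and return the lines unmodified.
sub sort_by_leading_number {
    my @lines = @_;
    return map  { $_->[1] }                                  # recover the original line
           sort { $a->[0] <=> $b->[0] }                      # compare the extracted keys
           map  { [ (/^\s*(\d+)/ ? $1 : 0), $_ ] } @lines;   # pair each key with its line
}

my @data = ("  56 1752.eps", "8720 trace.exe", "   8 labs", "1160 trace.orig.exe");
print "$_\n" for sort_by_leading_number(@data);
```

The key is normalized only inside the comparison, so nothing has to be stripped and restored afterwards.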

      You could also add an extraneous delimiter to the lines that don't have one.

                         Field 1      Field 2      Field 3      Field 4
                         -----------  -----------  -----------  -----------
        56 1752.eps      "",          "56",        "1752.eps"
       1160 trace.exe    "",          "1160",      "trace.exe"
       123 foo bar.pl    "",          "123",       "foo",       "bar.pl"
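A rough sketch of that second normalization, again with core split standing in for the module's field splitting. The helper name pad_leading_delimiter is hypothetical; it prepends one space to any line that does not already start with whitespace, so the key ends up in field 2 on every row.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure every line starts with a delimiter.
sub pad_leading_delimiter {
    my @lines = @_;             # work on copies, not the caller's data
    s/^(?=\S)/ / for @lines;    # add a space only where the line starts with non-space
    return @lines;
}

for my $line (pad_leading_delimiter("  56 1752.eps", "1160 trace.exe", "123 foo bar.pl")) {
    my @fields = split /\s+/, $line;
    # Field 1 ($fields[0]) is now "" on every row; the key is in field 2.
    print qq(key "$fields[1]" from "$line"\n);
}
```

With the data normalized this way, a field specification that sorts numerically on field 2 would apply uniformly to all rows.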

      By the way, why not just let ls do the sorting if you're going to use ls?

      Update: Improved visuals.

        Yes, I agree with you that the key cannot vary by row.

        Since this is what Sort::Fields is doing, are you agreeing with me that this is a bug in that module?

        As I implied earlier, the module should sort the exact same data, the same way, no matter where that data shows up (field 1, field 2, or field N).

        Thanks

        -Craig

        update: Ah, sorry, you were still editing, and added more to the reply while I was responding.

        I think my point is that Sort::Fields will "do the right thing" if the data containing initial spaces is in any place other than field 1. If the data is in field 1, then it does something different.

        I believe this is because the author specifically uses /\s+/ as the field delimiter in his code. This works fine for every field except field 1 (as shown by the discussion at the beginning of this post). I would like to see this module do the same thing on the same data, no matter what field it shows up in.

        I believe this should be easily fixable by replacing the /\s+/ with ' ' (the single-space string that makes split skip leading whitespace), as you showed me in an earlier post. I just can't figure out how to do that in his code. I also don't know what side effects that would have.
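For reference, the difference between the two delimiters can be seen with core split alone. This only illustrates split's behavior on a sample line; it is not a patch to Sort::Fields, and whether the module's internals can be changed this way is not established here.

```perl
use strict;
use warnings;

my $line = "  56 1752.eps";

# A regex delimiter treats the leading whitespace as a separator,
# which produces an empty first field.
my @by_regex = split /\s+/, $line;

# The special single-space string skips leading whitespace first,
# then splits on runs of whitespace.
my @by_space = split ' ', $line;

printf "%-6s gives %d fields, first is \"%s\"\n", '/\s+/', scalar(@by_regex), $by_regex[0];
printf "%-6s gives %d fields, first is \"%s\"\n", q(' '),  scalar(@by_space), $by_space[0];
```

Note that split ' ' would also hide a genuinely empty first column, which may be one of the side effects worth checking before changing the module.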