Hi, I'm seeing some unexpected/undesired behavior from some "sort" code I wrote, and I was hoping the Perl Monks could help me understand/solve my problem.

Sparing what I hope are unnecessary details, I have a large tab-delimited spreadsheet and I'm trying to sort the rows according to the data in a couple of the columns. Here is the relevant code snippet.

    use utf8;
    use 5.022;
    use strict;

    # Open the file for input, discard all the header lines, but stop on and save the column names in an array.
    # Read all the remaining lines in to an array and close the file.
    open($fh, "<", $filename) or die "Cannot open \"$filename\" for input: $!\n";
    my $column_line = "";
    $column_line = <$fh> while !($column_line =~ /ABC RBC Name/i);
    chomp(my @columns = split /\t/, $column_line);
    chomp(my @data_lines = <$fh>);
    close $fh;

    # Sort the data lines according to the "Company Name" field and then the "Invoice ID" field.
    my ($company_name_index) = grep { $columns[$_] eq "Company Name" } (0..$#columns);
    my ($invoice_ID_index) = grep { $columns[$_] eq "Invoice ID" } (0..$#columns);
    @data_lines = sort {
        my($company_name_a, $invoice_ID_a) = (split /\t/, $a)[$company_name_index, $invoice_ID_index];
        my($company_name_b, $invoice_ID_b) = (split /\t/, $b)[$company_name_index, $invoice_ID_index];
        fc($company_name_a) cmp fc($company_name_b) or $invoice_ID_a <=> $invoice_ID_b
    } @data_lines;

    # Open the file for output, print a new header and the column line to it, then print the now sorted data to it and close the file.
    open($fh, ">", $filename) or die "Cannot open \"$filename\" for output: $!\n";
    print $fh "Replacement Header Text Here\n\n$column_line";
    print $fh "$_\n" foreach @data_lines;

The unexpected/undesired behavior is that the first-level sort by company name puts any company with a comma in its name at the top of the list. For example, given the following set of company names:

    SEALEVEL SYSTEMS
    SEALEVEL SYSTEMS, INC.
    SEBASTIAN COMMUNICATIONS
    MASQUE SOUND
    MASSTECH, INC
    MASTERBILT
    SE INTERNATIONAL

The sort will give:

    MASSTECH, INC
    SEALEVEL SYSTEMS, INC.
    MASQUE SOUND
    MASTERBILT
    SE INTERNATIONAL
    SEALEVEL SYSTEMS
    SEBASTIAN COMMUNICATIONS

What I actually want is this:

    MASQUE SOUND
    MASSTECH, INC
    MASTERBILT
    SE INTERNATIONAL
    SEALEVEL SYSTEMS
    SEALEVEL SYSTEMS, INC.
    SEBASTIAN COMMUNICATIONS

I'm hoping I'm just missing something obvious, but why does it always put the ones with commas at the top? Is there a simple way to get it to sort the way I'd like instead?

UPDATE: Ha, well, I found the problem after wasting a day on this. Sorry guys, I appreciate all the replies, but this turned out to be a case of Microsoft Excel being the bane of my existence once again. It turns out the database tool that originally compiles all this data puts double quotes around every name that has a comma in it, but good ol' Excel silently removes them when you open a tab-delimited file in it. Due to the sheer amount of data, I'd only been viewing it in Excel to make it easier to sift through and make sure my program was doing what I thought it was doing. When I copied and pasted certain cells from Excel to Notepad to post here in the forum, there were no double quotes because Excel had already removed them. However, when I open the tab-delimited text file in Notepad without Excel's interference: "hey, look at that, double quotes are around all those things... Does a double quote come first in asciibetical order? Yup, sure does."
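For anyone who lands here with the same symptom, here is a minimal sketch of one way to get the desired order without touching the stored data: compare on a copy of the name with the surrounding double quotes removed. The `unquote` helper and the sample list are hypothetical, not part of my actual program, and this assumes each quoted field has exactly one pair of quotes around it with no escaped quotes inside.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() needs Perl 5.16 or newer

# Hypothetical helper (not from the original program): remove one pair of
# surrounding double quotes, if present, and return the result.
sub unquote {
    my ($field) = @_;
    $field =~ s/\A"(.*)"\z/$1/s;
    return $field;
}

# Sample names as they appear in the raw file, quotes included.
my @names = (
    'SEALEVEL SYSTEMS',
    '"SEALEVEL SYSTEMS, INC."',
    'SEBASTIAN COMMUNICATIONS',
    'MASQUE SOUND',
    '"MASSTECH, INC"',
    'MASTERBILT',
    'SE INTERNATIONAL',
);

# Compare on the unquoted, case-folded name. Only the sort key changes;
# the strings in @sorted still carry their original quotes.
my @sorted = sort { fc(unquote($a)) cmp fc(unquote($b)) } @names;

print "$_\n" for @sorted;
```

In the original sort block this would amount to wrapping the company name in `unquote(...)` before passing it to `fc`. If the file can contain escaped quotes or quoted tabs, a real parser such as the CPAN module Text::CSV is probably the safer route.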

UPDATE: Sorry again guys, I hope I didn't waste too much of your time. Not a very promising first post for me, I know.


In reply to Sorting an array of strings when some of the strings have commas in them? by perldigious
