
I have a CSV file full of fax transaction records, one per line. I need to sort these transactions by the contents of field 10. I wrote a program to import the records from a file, sort them based on the contents of field 10, and export each one to either a SORTED.TXT file or a REJECT.TXT file, depending on what was in field 10. I used split to import each line as an array and attempted to filter for a specific value in field 10 ("32"). Because one of the fields before field 10 included a comma embedded within a quoted string, the split pushed the contents of field 10 to field 11. The data in the first two records looks like this:

"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754
"SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754

There are 21 fields in each line. Because field 4 in the first record has a comma in it ("Hawaiian Properties, Ltd."), the split creates an array of 22 elements, moving the contents of field 10 from index 9 to index 10, so the record fails to pass the filter properly. I've spent two weeks working on this: looking in the books, working on it again, and going online to search for regexes that I can understand and apply to this problem. I am extremely frustrated by my inability to get this job done. Any guidance would be appreciated.
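Here is a minimal reproduction of the shift, as far as I can tell. It uses only the first sample record and a plain split on commas; the variable names are just for illustration.

```perl
use strict;
use warnings;

# First sample record, copied from the data above (single-quoted so nothing interpolates).
my $line = q{"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754};

# A plain split knows nothing about quotes, so the comma inside
# "Hawaiian Properties, Ltd." is treated as a field separator.
my @field = split /,/, $line;

print "elements: ", scalar(@field), "\n";   # elements: 22
print "index 9:  $field[9]\n";              # index 9:  6
print "index 10: $field[10]\n";             # index 10: 32
```

So the value I want to test (32) ends up one position to the right of where the filter looks.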

#!c:\perl\bin\perl.exe -w
use strict;
use diagnostics;

# Open a filehandle READFILE and associate it with my data full of CSV records

open (READFILE,"<s:\\RFax-L7.txt") or die "Couldn't open RFax-L7.txt file: $!\n";

# Establish an array @lines and populate it with the lines in the data file

my @lines = <READFILE>;

# Open two more files to write to: keepers go in SORTED, rejects in REJECT

open SORTED,">","s:\\sorted.txt" or die "Couldn't open SORTED.TXT file: $!\n";
open REJECT,">","s:\\reject.txt" or die "Couldn't open REJECT.TXT file: $!\n";

# preprocess $lines to remove embedded commas in quoted fields
# The following line worked when I declared the contents of $::lines as "Aaron,\"1234 Main St, USA\",555-555-1212"
# it produced the output "Aaron,"1234 Main St USA",555-555-1212"
# but when I moved it into this program incorporating it into the loop it doesn't work!

foreach $::lines (@lines) {
    $::lines =~ s/("*),(,")/\$1\$2/g;

    # Import each line into an array, separating on the commas.
    @::field = split(/,/, $::lines);

    # Test to see if the 10th field contains 32; these we keep.
    if ($::field[9] != 32) {
        print REJECT "* rejected not 32 * $::lines";
        print "* REJECTED * $::lines";
    } else {
        print SORTED "$::lines";
        print "\$ KEEPER ==> $::lines";
    }
}
close READFILE;
close SORTED;
close REJECT;

I managed to find a code snippet online which seemed to work when using a static $text value defined within the program.

# remove commas from quoted text strings;

$text = "Aaron,\"1234 Main St, USA\",555-555-1212";

print " INBOUND \$text is equal to: $text\n";

$text =~ s/("[\w\ ]*),([\w\ ]+")/\1\2/g;

print " OUTBOUND \$text is equal to: $text";

# This produced output as follows:
# INBOUND $text is equal to: Aaron,"1234 Main St, USA",555-555-1212
# OUTBOUND $text is equal to: Aaron,"1234 Main St USA",555-555-1212

I tried to move the s/// line into my earlier program as a pre-processing step to remove any commas from within "quoted" string values of $text, but in the first program it fails. I have the Text-CSV_XS package but can't figure out how to use it. I would greatly appreciate some direction from more experienced Perl programmers. Thanks for taking a look.

In reply to Commas in quoted CSV records by generator
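One possible approach, offered as a sketch rather than a definitive fix: let Text::CSV_XS split each line instead of stripping commas with a regex, since it understands quoted fields. The example below assumes Text::CSV_XS is installed (you say you have it), inlines your two sample records instead of reading s:\RFax-L7.txt, and uses a hypothetical helper named is_keeper that is not part of the module.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 allows quoted fields to contain arbitrary characters.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create CSV parser: " . Text::CSV_XS->error_diag();

# Hypothetical helper: true when field 10 (index 9) is the number 32.
sub is_keeper {
    my ($line) = @_;
    chomp $line;                       # drop a trailing newline, if any
    $csv->parse($line) or return 0;    # treat unparsable lines as rejects
    my @field = $csv->fields();        # quote-aware split, quotes removed
    return 0 unless defined $field[9] && $field[9] =~ /^\d+$/;
    return $field[9] == 32;
}

# The two sample records from the question, used here in place of the file.
my @records = (
    q{"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754},
    q{"SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754},
);

for my $line (@records) {
    if (is_keeper($line)) {
        print "\$ KEEPER ==> $line\n";
    } else {
        print "* REJECTED * $line\n";
    }
}
```

With this, "Hawaiian Properties, Ltd." stays a single field, so field 10 remains at index 9 in both records. In your program you would loop over the lines read from READFILE and print the original, unmodified line to SORTED or REJECT instead of printing to the screen.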
