Hi guys, I mentioned this problem to somebody today in the Chatterbox but had no luck, so I'm putting myself in the hands of the experts. Basically, I'm trying to parse a CSV file and load the values into a database. Even before getting to the database part, however, I'm having issues with the parsing.
My original solution (I'm sure it's sub-optimal!) was based on regex matching:
#solution 1, using regex and parsewords
use Text::ParseWords;
my $file = "zs_PS0001_9_epack_2006b.csv";
open(FILE,"<$file") or die("Could not open $file: $!.");
$count = 0;
# Process the data.
while (<FILE>){
    # Line used to debug the script taking only a few lines.
    exit if ($count >= 20);
    #print "=========================== Record $count\n";
    $count++;
    print "myCounter: ", $count , "\n";
    @words= $_ =~ m/"[^"\r\n]*"|[^,\r\n]*/mg;
    @result = &quotewords(',', 0, @words );
    #build the array for the database command
    #@myArray=($result[0],$result[2],$result[4],$result[6],$result[8],$result[10],$result[12],$result[14],$result[16],$result[18]);
    my $courseID=$result[0];
    my $sessionID=$result[1];
    my $userID=$result[2];
    my $PersonID=$result[3];
    my $mydatetime=$result[4];
    my $millisecond=$result[5];
    my $loc=$result[6];
    my $action=$result[7];
    my $page=$result[8];
    my $time=$result[9];
    print STDOUT "result: ", $courseID,"\n";
    print STDOUT "result: ", $sessionID,"\n";
    print STDOUT "result: ", $userID,"\n";
    print STDOUT "result: ", $PersonID,"\n";
    print STDOUT "result: ", $mydatetime,"\n";
    print STDOUT "result: ", $millisecond,"\n";
    print STDOUT "result: ", $loc,"\n";
    print STDOUT "result: ", $action,"\n";
    print STDOUT "result: ", $page,"\n";
    print STDOUT "result: ", $time,"\n";
    print STDOUT "***********\n";
}
# end solution 1_______________________
# solution 2, using Text::CSV_XS
use Text::CSV_XS;
my $csv = Text::CSV_XS->new();
$count=0;
my $file = 'PS0002_9_2006b.txt';
if (defined $ARGV[0]) {
    $file = $ARGV[0];
}
my $sum = 0;
open(my $data, '<', $file) or die "Could not open '$file'\n";
while (my $line = <$data>) {
    chomp $line;
    exit if ($count >= 20);
    $count++;
    if ($csv->parse($line)) {
        my @columns = $csv->fields();
        $sum += $columns[2];
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
print "$sum\n";
# end solution 2_______________________
Now, if I run either of these on my CSV files, this is what I get (example from the first loop iteration):
myCounter: 1
result: ■z s _ P S 0 0 0 1 _ 9 _ e p a c k
result: 9 1 6 9 4 9 8 1 1 0 2 1
result: g u e s t
result: 9 1 6 9 4 7 6 2 7 0 2 1
result: 2 0 0 6 - 1 0 - 1 1 2 3 : 1 2 : 0 0 . 0 0 0
result: 1 1 6 0 6 0 4 7 2 0 0 0 0
result: l o g o u t
result: l o g i n
result: W e b C T V i s t a s e r v e r
result: 0
***********

I've no idea why there are spaces in the output that don't show up when the file is opened in any application... I had a look at the encoding and tried converting the files; it seems that with UTF-16 the output looks like the above, but the spaces disappear if I re-save the file as UTF-8.
The problem is that something is messing up the strings; solution 2 doesn't work at all and refuses to parse the lines of the CSV. Depending on the encoding of the file, you can see a 'funny' box character at the start of the output above; in UTF-8 it looks like this:
´╗┐zs_PS0001_9_epack ....(rest of the line)
Does anyone have any suggestions? Thanks for your help!

By the way, I also tried stripping the first character, to no avail: the character is removed from every line but the first, i.e.:
my $courseID=substr($result[0],1);
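
A minimal sketch of an encoding-aware read, assuming the file really is UTF-16 and the box character is its byte-order mark (both are guesses from the output above, not confirmed). `read_utf16_csv` and the `demo_utf16.csv` sample file are made-up names for illustration; only core modules are used, and `quotewords` stands in for a real CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: read a UTF-16 CSV file and return a reference
# to an array of rows, each row being an array of fields.
sub read_utf16_csv {
    my ($file) = @_;
    # :raw disables any platform CRLF translation on the raw bytes;
    # :encoding(UTF-16) uses the BOM to pick the byte order and drops it.
    open(my $fh, '<:raw:encoding(UTF-16)', $file)
        or die "Could not open '$file': $!";
    my @rows;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        push @rows, [ quotewords(',', 0, $line) ];
    }
    close $fh;
    return \@rows;
}

# Build a small UTF-16LE sample with a BOM so the sketch is self-contained.
my $sample = 'demo_utf16.csv';
open(my $out, '>:raw:encoding(UTF-16LE)', $sample)
    or die "Could not create '$sample': $!";
print {$out} "\x{FEFF}zs_PS0001_9_epack,916949811021,guest\r\n";
print {$out} "\"a,b\",2,3\r\n";
close $out;

my $rows = read_utf16_csv($sample);
print join('|', @$_), "\n" for @$rows;
unlink $sample;
```

If the guess about the encoding is wrong, the `:encoding(...)` layer would need to name the file's actual encoding instead.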

Finally, if anyone can enlighten me on this: when I went back to edit this post, I noticed that what I pasted in from the cmd window shows up as character codes: more precisely as &#9632; in the first example and as ´&#9559;&#9488; in the second.

lorenzo

In reply to CSV nightmare by lorenzov
