PerlMonks  

dump text file in ASCII and hex

by kevind0718 (Scribe)
on Jun 26, 2008 at 18:26 UTC

kevind0718 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks:

I come to you because I know you are kind and wise.
I have written a number of parsers for CSV files and they have all been straightforward. I just used Text::CSV_PP to parse the lines into an array and away you go.
However, the latest CSV file I have been tasked with parsing has caused me much grief. There is something different about it. If I attempt to parse the file as I received it, the following code returns undefined:
$status = $csv->parse($row);
$col = $csv->fields();

$col is undefined. CSV_PP seems to be getting confused. Which means this loop will also fail:
while (defined($col = $csv->getline($fh) )) {

If I open the same CSV file in MS-Excel and save it out under a different name, as a CSV file, the parsing works just fine.

To get a handle on what is going on here I wrote this bit of code.

while (defined($line = <> )) {
    print $line . "~~\n";
    for ( $i=0; $i <length($line); $i++) {
        $char = substr($line, $i,1);
        $hex = sprintf("%1x", $char);
        print $char . "\t". $hex . "\n";
    } #for
    print "--new line--\n";
} #while
What I am trying to do is read the text file in and then print out the line. Then print each character, every character including non-printables, and the corresponding hex value. This line is failing:
$hex = sprintf("%1x", $char);

I just want to get the hex value of the character, but my Perl is not strong enough.

Your kind assistance is requested.

kd

Replies are listed 'Best First'.
Re: dump text file in ASCII and hex
by zentara (Archbishop) on Jun 26, 2008 at 18:31 UTC
    This is what I use to dump ASCII and hex. It prints the hex value vertically under each ASCII character, making it easy to read. Just feed it a file as an argument. I still get confused trying to figure out the extra Perl options...... but it works. :-)
    #!/usr/bin/perl -wnl012
    # Prints the contents of a file a line at a time
    # followed by the ASCII value of each character in vertical columns.
    # Useful for debugging.
    # If no filename is specified then input is read from the keyboard.
    # Version 1.00 Ian Howlett ian@ian-howlett.com 6 July 2001
    # Version 1.10 James Yolkowski ajy@sentex.net 8 July 2001
    print;                                                  # Print the line we've just read
    @hexvals = map {sprintf "%02X", ord $_} split //;       # Get hex value of each char
    for $a (0, 1) {print map {substr $_, $a, 1} @hexvals}   # Print the hex values.
    print "\n";

    I'm not really a human, but I play one on earth CandyGram for Mongo
Re: dump text file in ASCII and hex
by tachyon-II (Chaplain) on Jun 26, 2008 at 18:35 UTC

    You need two hex digits to cover all 256 character values, hence %02x. You also want the character number given by ord, so this should work:

    $hex = sprintf "%02x", ord($char);
    # but why not just do it all in one line
    printf "%s %02x\n", $char, ord($char);
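To spell out why the original line went wrong: %x formats a number, and a one-character string such as "f" or a double quote is not numeric, so Perl treats it as 0 (and warns about it under -w). Calling ord first supplies the character code. Below is a minimal sketch of the corrected loop, wrapped in a helper called char_hex; that name is made up for illustration and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative, not from the thread):
# return one "char<TAB>hex" entry per character of the input line.
sub char_hex {
    my ($line) = @_;
    return map { sprintf "%s\t%02x", $_, ord $_ } split //, $line;
}

# Demo: the comma, the double quote and the trailing newline all show up.
print "$_\n" for char_hex(qq{a,"\n});
```

Every character gets an entry, including the newline at the end of the line, which is the part the original loop was meant to expose.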
      Thank you monks, your suggestions helped me learn more about the details of the CSV file.

      I thought that there might be a strange end-of-line code or something like that, but that does not seem to be the case. The best I can determine is that Text::CSV_PP is having an issue with the double quotes ("). Please take a look at the following test data:
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","G1150G111","","2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N","",1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","004930202","","2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ","",1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00817Y108","","2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N","",1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","00846U101","","2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N","",1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","008916108","","2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N","",1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00971T101","","2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ","",1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","01741R102","","2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N","",1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","018804104","","2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N","",1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","019589308","","2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N","",1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      test lines, without double double quotes
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N",,1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","G1150G111",,"2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N",,1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","004930202",,"2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ",,1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00817Y108",,"2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N",,1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","00846U101",,"2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N",,1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","008916108",,"2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N",,1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00971T101",,"2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ",,1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","01741R102",,"2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N",,1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","018804104",,"2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N",,1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","019589308",,"2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N",,1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      test line, without any double quotes
      fred1234,bedrock quary,S,t,88579Y101,4851,2595708,US88579Y1010,MMM,3M CO,USD,USD,SB7,1,19610718,19610718,225212,MMM UN,MMM.N,,1,C,710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,G1150G111,,2763958,BMG1150G1116,ACN,ACCENTURE LTD-CL A,USD,USD,SB7,19610718,19610718,225212,ACN UN,ACN.N,,1,C,-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,004930202,,2575818,US0049302021,ATVI,ACTIVISION INC,USD,USD,SB7,19610718,19610718,225212,ATVI UW,ATVI.OQ,,1,C,819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,00817Y108,,2695921,US00817Y1082,AET,AETNA INC,USD,USD,SB7,19610718,19610718,225212,AET UN,AET.N,,1,C,2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,00846U101,,2520153,US00846U1016,A,AGILENT TECHNOLOGIES INC,USD,19610718,19610718,225212,A UN,A.N,,1,C,-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,008916108,,2015530,CA0089161081,AGU,AGRIUM INC,USD,USD,SB7,19610718,19610718,225212,AGU UN,AGU.N,,1,C,4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,00971T101,,2507457,US00971T1016,AKAM,AKAMAI TECHNOLOGIES,USD,USD,SB7,19610718,19610718,225212,AKAM UW,AKAM.OQ,,1,C,2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,01741R102,,2526117,US01741R1023,ATI,ALLEGHENY TECHNOLOGIES INC,USD,USD,SB7,19610718,19610718,225212,ATI UN,ATI.N,,1,C,-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,018804104,,2017677,US0188041042,ATK,ALLIANT TECHSYSTEMS INC,USD,USD,SB7,19610718,19610718,225212,ATK UN,ATK.N,,1,C,314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,019589308,,2039831,US0195893088,AW,ALLIED WASTE INDUSTRIES INC,USD,USD,SB7,19610718,19610718,225212,AW UN,AW.N,,1,C,538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,R
      If I run the above data through the following code, using the command: perl -w dumpascii2hex.pl testpos_b2.csv > testpos_b.2out.txt
      use Text::CSV_PP;
      use Data::Dumper;

      $csv = Text::CSV_PP->new();     # create a new CSV parser object

      while (defined($line = <> )) {
          print $line . "~~\n";
          #**for ( $i=0; $i <length($line); $i++) {
          #**    $char = substr($line, $i,1);
          #**    $hex = sprintf("%02x", ord($char));
          #**    print $char . "\t". $hex . "\n";
          #**} #for
          $status = $csv->parse($line);
          @col = $csv->fields();
          print Dumper @col;
          print "--new line--\n";
      } #while

      I get the following:
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","G1150G111","","2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N","",1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","004930202","","2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ","",1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00817Y108","","2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N","",1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","00846U101","","2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N","",1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","008916108","","2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N","",1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00971T101","","2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ","",1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","01741R102","","2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N","",1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","018804104","","2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N","",1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","019589308","","2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N","",1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      test lines, without double double quotes
      ~~
      $VAR1 = 'test lines';
      $VAR2 = ' without double double quotes';
      --new line--
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N",,1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","G1150G111",,"2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N",,1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","004930202",,"2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ",,1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00817Y108",,"2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N",,1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","00846U101",,"2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N",,1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","008916108",,"2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N",,1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00971T101",,"2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ",,1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","01741R102",,"2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N",,1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","018804104",,"2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N",,1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","019589308",,"2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N",,1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      test line, without any double quotes
      ~~
      $VAR1 = 'test line';
      $VAR2 = ' without any double quotes';
      --new line--
      fred1234,bedrock quary,S,t,88579Y101,4851,2595708,US88579Y1010,MMM,3M CO,USD,USD,SB7,1,19610718,19610718,225212,MMM UN,MMM.N,,1,C,710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '88579Y101';
      $VAR6 = '4851';
      $VAR7 = '2595708';
      $VAR8 = 'US88579Y1010';
      $VAR9 = 'MMM';
      $VAR10 = '3M CO';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '1';
      $VAR15 = '19610718';
      $VAR16 = '19610718';
      $VAR17 = '225212';
      $VAR18 = 'MMM UN';
      $VAR19 = 'MMM.N';
      $VAR20 = '';
      $VAR21 = '1';
      $VAR22 = 'C';
      $VAR23 = '710.964086349999';
      $VAR24 = '710.964086349999';
      $VAR25 = '710.96408635';
      $VAR26 = '';
      $VAR27 = '';
      $VAR28 = '2.45938';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = '';
      $VAR32 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,G1150G111,,2763958,BMG1150G1116,ACN,ACCENTURE LTD-CL A,USD,USD,SB7,19610718,19610718,225212,ACN UN,ACN.N,,1,C,-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = 'G1150G111';
      $VAR6 = '';
      $VAR7 = '2763958';
      $VAR8 = 'BMG1150G1116';
      $VAR9 = 'ACN';
      $VAR10 = 'ACCENTURE LTD-CL A';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ACN UN';
      $VAR18 = 'ACN.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '-699.4375011625';
      $VAR23 = '-699.4375011625';
      $VAR24 = '-699.4375011625';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,004930202,,2575818,US0049302021,ATVI,ACTIVISION INC,USD,USD,SB7,19610718,19610718,225212,ATVI UW,ATVI.OQ,,1,C,819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '004930202';
      $VAR6 = '';
      $VAR7 = '2575818';
      $VAR8 = 'US0049302021';
      $VAR9 = 'ATVI';
      $VAR10 = 'ACTIVISION INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATVI UW';
      $VAR18 = 'ATVI.OQ';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '819.153462549999';
      $VAR23 = '819.153462549999';
      $VAR24 = '819.15346255';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,00817Y108,,2695921,US00817Y1082,AET,AETNA INC,USD,USD,SB7,19610718,19610718,225212,AET UN,AET.N,,1,C,2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '00817Y108';
      $VAR6 = '';
      $VAR7 = '2695921';
      $VAR8 = 'US00817Y1082';
      $VAR9 = 'AET';
      $VAR10 = 'AETNA INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AET UN';
      $VAR18 = 'AET.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '2831.9813292375';
      $VAR23 = '2831.9813292375';
      $VAR24 = '2831.9813292375';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,00846U101,,2520153,US00846U1016,A,AGILENT TECHNOLOGIES INC,USD,19610718,19610718,225212,A UN,A.N,,1,C,-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '00846U101';
      $VAR6 = '';
      $VAR7 = '2520153';
      $VAR8 = 'US00846U1016';
      $VAR9 = 'A';
      $VAR10 = 'AGILENT TECHNOLOGIES INC';
      $VAR11 = 'USD';
      $VAR12 = '19610718';
      $VAR13 = '19610718';
      $VAR14 = '225212';
      $VAR15 = 'A UN';
      $VAR16 = 'A.N';
      $VAR17 = '';
      $VAR18 = '1';
      $VAR19 = 'C';
      $VAR20 = '-45.9117876750024';
      $VAR21 = '-45.9117876750024';
      $VAR22 = '-45.911787675';
      $VAR23 = '';
      $VAR24 = '';
      $VAR25 = '2.45938';
      $VAR26 = '';
      $VAR27 = '';
      $VAR28 = '';
      $VAR29 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,008916108,,2015530,CA0089161081,AGU,AGRIUM INC,USD,USD,SB7,19610718,19610718,225212,AGU UN,AGU.N,,1,C,4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '008916108';
      $VAR6 = '';
      $VAR7 = '2015530';
      $VAR8 = 'CA0089161081';
      $VAR9 = 'AGU';
      $VAR10 = 'AGRIUM INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AGU UN';
      $VAR18 = 'AGU.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '4754.379720375';
      $VAR23 = '4754.379720375';
      $VAR24 = '4754.379720375';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,00971T101,,2507457,US00971T1016,AKAM,AKAMAI TECHNOLOGIES,USD,USD,SB7,19610718,19610718,225212,AKAM UW,AKAM.OQ,,1,C,2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '00971T101';
      $VAR6 = '';
      $VAR7 = '2507457';
      $VAR8 = 'US00971T1016';
      $VAR9 = 'AKAM';
      $VAR10 = 'AKAMAI TECHNOLOGIES';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AKAM UW';
      $VAR18 = 'AKAM.OQ';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '2580.1367137875';
      $VAR23 = '2580.1367137875';
      $VAR24 = '2580.1367137875';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,01741R102,,2526117,US01741R1023,ATI,ALLEGHENY TECHNOLOGIES INC,USD,USD,SB7,19610718,19610718,225212,ATI UN,ATI.N,,1,C,-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '01741R102';
      $VAR6 = '';
      $VAR7 = '2526117';
      $VAR8 = 'US01741R1023';
      $VAR9 = 'ATI';
      $VAR10 = 'ALLEGHENY TECHNOLOGIES INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATI UN';
      $VAR18 = 'ATI.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '-655.71107175';
      $VAR23 = '-655.71107175';
      $VAR24 = '-655.71107175';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,018804104,,2017677,US0188041042,ATK,ALLIANT TECHSYSTEMS INC,USD,USD,SB7,19610718,19610718,225212,ATK UN,ATK.N,,1,C,314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '018804104';
      $VAR6 = '';
      $VAR7 = '2017677';
      $VAR8 = 'US0188041042';
      $VAR9 = 'ATK';
      $VAR10 = 'ALLIANT TECHSYSTEMS INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATK UN';
      $VAR18 = 'ATK.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '314.388352562499';
      $VAR23 = '314.388352562499';
      $VAR24 = '314.3883525625';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,019589308,,2039831,US0195893088,AW,ALLIED WASTE INDUSTRIES INC,USD,USD,SB7,19610718,19610718,225212,AW UN,AW.N,,1,C,538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '019589308';
      $VAR6 = '';
      $VAR7 = '2039831';
      $VAR8 = 'US0195893088';
      $VAR9 = 'AW';
      $VAR10 = 'ALLIED WASTE INDUSTRIES INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AW UN';
      $VAR18 = 'AW.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '538.672694612502';
      $VAR23 = '538.672694612502';
      $VAR24 = '538.6726946125';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--


      The only time the parsing works is when there are no double quotes in the text. This confuses me, because I parse six other files from this source with the same CSV encoding (i.e. strings are enclosed in double quotes) and there are no issues.

      I need the help of someone with more advanced Perl skills to tell me where my mistake is.

      Many thanks

      kd
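One way to narrow down an end-of-line suspicion like the one above is to dump only the last few bytes of each line rather than the whole thing. The sketch below is an assumption about how that could be done, not something run in this thread; tail_hex is a made-up name and the sample record is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: hex codes of the last $n characters of a line,
# so a trailing space (20), CR (0d) or LF (0a) becomes visible.
sub tail_hex {
    my ($line, $n) = @_;
    $n = 4 unless defined $n;
    my $tail = length($line) > $n ? substr($line, -$n) : $line;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $tail;
}

# Demo on an invented record ending in quote, space, newline.
print tail_hex(qq{"2.45938",,,,"R" \n}), "\n";
```

Comparing the tail of a line that fails with the tail of the same line after a round trip through Excel would show whether the two files differ in how their lines end.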
Re: dump text file in ASCII and hex
by psini (Deacon) on Jun 26, 2008 at 18:34 UTC

    Should it not be:

    $hex = sprintf("%2x", $char);

    with 2 instead of 1?

    Update: Yes, of course, %02x

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: dump text file in ASCII and hex
by ganeshk (Monk) on Jun 26, 2008 at 21:08 UTC

    I just remembered the hexdump utility in Unix for doing this sort of thing. You could also try Data::HexDump or Data::Hexdumper if you want to do it in Perl.


    Thanks,
    Ganesh
Re: dump text file in ASCII and hex
by oko1 (Deacon) on Jun 27, 2008 at 04:16 UTC

    As the editor at the Linux Gazette, I get a lot of articles, code, etc. that's been written in every possible variety of editor out there and sometimes contains weird and invisible characters. My solution was to code up a script that I called "weirdchar" that will display and highlight the characters and their ASCII values along with the line (and line number) where they occur. It's solved and prevented a huge variety of problems for me over the years.

    Note: this is *nix-specific (works in Linux and Solaris), since it uses an external prog.

    #!/usr/bin/perl -w
    # Created by Ben Okopnik on Tue Feb 15 18:48:24 EST 2005
    # Weird character highlighter

    my $a=`/usr/bin/tput -T $ENV{TERM} smso`;    # Start 'standout' mode
    my $b=`/usr/bin/tput -T $ENV{TERM} rmso`;    # End 'standout' mode
    my $re = qr/([^\011\012\015\040-\176])/;     # "Inverted" list of valid chars

    while (<>){
        print "Line $.: $_" if s/$re/"$a\\" . sprintf( "%03o", ord $1 ) . $b/eg;
    }
    
    -- 
    Human history becomes more and more a race between education and catastrophe. -- HG Wells
    
      To look for nonprinting chars you can also run the file through cat -vt and diff it with the original.
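Where cat is not available, a rough stand-in can be sketched in plain Perl. The helper below (visible is an invented name) escapes every character outside printable ASCII as \xNN. It is only an approximation: cat -vt uses ^X and M- notation, so the output will not match it character for character.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# (0x20-0x7e) with a \xNN escape so it cannot hide in the output.
sub visible {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf '\\x%02x', ord $1/ge;
    return $text;
}

# Demo: the CR and LF are escaped, which also makes the space
# in front of them stand out.
print visible(qq{"R" \r\n}), "\n";
```

Because the line terminator itself is escaped, a space that sits just before it ends up in the middle of the printed string instead of being invisible at the end.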
Re: dump text file in ASCII and hex
by DrHyde (Prior) on Jun 27, 2008 at 10:38 UTC

    I'll wager a small amount that this will fix it for you:

    my $csv = Text::CSV_PP->new({binary => 1});

    Also it doesn't look like you're checking the $status returned from the parse() method. If that returns false, then the error_input() and error_diag() methods may be useful.

Re: dump text file in ASCII and hex
by salva (Canon) on Jun 27, 2008 at 12:59 UTC
    I use this:
    sub hexdump {
        no warnings qw(uninitialized);
        my $data = shift;
        while ($data =~ /(.{1,32})/smg) {
            my $line=$1;
            my @c= (( map { sprintf "%02x",$_ } unpack('C*', $line)),
                    ((" ") x 32))[0..31];
            $line=~s/(.)/ my $c=$1; unpack("c",$c)>=32 ? $c : '.' /egms;
            print join(" ", @c, '|', $line), "\n";
        }
    }
Re: dump text file in ASCII and hex
by Rudif (Hermit) on Jun 28, 2008 at 22:07 UTC
    kevind0718

    The Text::CSV_PP manual is your friend. It mentions a method that can be helpful:

    $csv->error_diag()
    When I added a call and a print like this
    $status = $csv->parse($line);
    printf "status=%d\n", $status;
    unless ($status) {
        printf "error=%d %s\n", ( $csv->error_diag());
    }
    @col = $csv->fields();
    print Dumper \@col;
    I obtained this printout for your first test line:
    "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
    ~~
    status=0
    error=2027 EIQ - Quoted field not terminated
    $VAR1 = [
              undef
            ];
    --new line--
    You will notice that there is a space after the last doublequote character, and this is what the parser does not like.

    Below, I added a kludge that makes the problem go away.

    #! perl -w
    use strict;
    use Text::CSV_PP;
    use Data::Dumper;

    my $csv = Text::CSV_PP->new();     # create a new CSV parser object

    while (defined(my $line = <> )) {
        chomp $line;
        $line =~ s/\" $/\"/;    ### kludge to remove space after the last doublequote, if any
        print $line . "~~\n";
        my $status = $csv->parse($line);
        printf "status=%d\n", $status;
        unless ($status) {
            printf "error=%d %s\n", ( $csv->error_diag ());
        }
        my @col = $csv->fields();   # declared with my so the script compiles under strict
        print Dumper \@col;
        print "--new line--\n";
    } #while
    HTH

    Rudif
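The kludge above targets exactly one pattern, a single space after a closing double quote. A slightly more general cleanup, sketched here under the assumption that trailing whitespace is never meaningful data in this file, would drop all trailing whitespace before the line goes to the parser. The helper name strip_trailing_ws and the sample line are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove trailing whitespace of any kind
# (spaces, tabs, CR, LF) rather than only '" ' at the end of the line.
sub strip_trailing_ws {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return $line;
}

# Demo on an invented record that ends in quote, space, CR, LF.
my $raw = qq{"fred1234","bedrock quary","R" \r\n};
printf "[%s]\n", strip_trailing_ws($raw);
```

Note the trade-off: unlike the kludge, this also trims an unquoted last field, so the 'R ' seen in the earlier dumps would come out as 'R'.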

Node Type: perlquestion [id://694239]
Approved by almut
Front-paged by DrHyde