nafri has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl script, not written by me, which parses data files and writes them to a database. The file I am having an issue with is a CSV file that uses CRLF line terminators. When I run the script it throws the error "Something went wrong while parsing/importing " and displays the line number. I have managed to diagnose that there is an LF in the middle of a line, in addition to the normal CRLF at the end. How can I remove this LF? Notepad++ tells me it is an LF.

    # parse, process and save data for further usage
    my $fh = new IO::File;
    my $ic_failed = 0;
    my $ic_total = 0;
    if ($fh->open($feed)) {
        my $parser = Parse::CSV->new(
            handle   => $fh,
            csv_attr => $csv_setup,
            fields   => $csv_fields,
            filter   => sub { retrievefiles($_) },
        );

I have tried setting binary mode in $main::csv_setup, but I still can't get the script to read the file.

Replies are listed 'Best First'.
Re: issue with LF & CRLF
by Loops (Curate) on Aug 06, 2013 at 22:33 UTC

    You don't show the value of $csv_setup, but it needs to include binary => 1, also make sure that $/ = "\r\n"; before you open the feed. That should be enough.

    Parse::CSV uses Text::CSV_XS as the underlying parser. So you can look at the documentation there for which parameters are available to you.

      Hi, thanks for replying. binary => 1 is already included. Where exactly do I put $/ = "\r\n";? I added it before if ($fh->open($feed)) { but it makes no difference.

        If you create an example file that contains one good record, and one problematic record, then you'll have a test case that is easier to Dump and share here. Below is a little example that works here and shows an embedded newline being handled correctly by Parse::CSV:

        use Parse::CSV;
        use Data::Dump 'pp';

        my $fh = new IO::File('failing.csv', 'r');
        my $fail = do { local $/; <$fh> };
        pp $fail;                               # Print input
        $fh->seek(0,0);

        $/ = "\r\n";
        my $parser = Parse::CSV->new(
            handle   => $fh,
            csv_attr => { binary => 1 },
        );
        print pp $_ while $_ = $parser->fetch;  # Print output

        Which prints:

        "a,b,c,d,e,f,g\r\na,b,c,\"kkkk\n\",d,e,f,g\r\n"
        ["a" .. "g"]["a", "b", "c", "kkkk\n", "d" .. "g"]

Re: issue with LF & CRLF
by Athanasius (Archbishop) on Aug 07, 2013 at 02:53 UTC
    I have managed to diagnose there is LF in the middle of line and the normal CRLF in the end. how can i remove this LF.

    You can use a negative look-behind assertion to remove each LF that is not immediately preceded by a CR:

    #! perl
    use strict;
    use warnings;

    my $string = qq{"Tap Rackmount for 3 units\x0A","USR4500-RMK "\x0D\x0A};

    print "Before:\n>>>$string<<<\n";

    #       Dec  Hex  Oct
    # LF     10   0A  012
    # CR     13   0D  015

    $string =~ s/ (?<!\x0D) \x0A //gx;

    print "After:\n>>>$string<<<\n";

    Output:

    12:49 >perl 679_SoPW.pl
    Before:
    >>>"Tap Rackmount for 3 units
    ","USR4500-RMK "
    <<<
    After:
    >>>"Tap Rackmount for 3 units","USR4500-RMK "
    <<<

    12:49 >

    See the section “Look-Around Assertions” in perlre#Extended-Patterns.
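
    The same substitution can be applied to a whole file before it reaches the CSV parser. The following is only a sketch under stated assumptions: the helper name strip_bare_lf and the two command-line arguments (input and output path) are hypothetical and not part of the original script. Both files are opened in :raw mode so that no line-ending translation hides the bytes being edited.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): delete every LF
# that is not immediately preceded by a CR, leaving CRLF pairs intact.
sub strip_bare_lf {
    my ($text) = @_;
    $text =~ s/(?<!\x0D)\x0A//g;
    return $text;
}

# Hypothetical usage: perl strip_lf.pl in.csv out.csv
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;

    open my $ifh, '<:raw', $in or die "Cannot read $in: $!";
    my $data = do { local $/; <$ifh> };    # slurp the whole file
    close $ifh;

    open my $ofh, '>:raw', $out or die "Cannot write $out: $!";
    print {$ofh} strip_bare_lf($data);
    close $ofh or die "Cannot close $out: $!";
}
```

    Note that this also deletes line feeds that sit inside quoted fields, which is the intent here but would be wrong for a feed where embedded newlines carry meaning.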

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: issue with LF & CRLF
by Laurent_R (Canon) on Aug 07, 2013 at 07:03 UTC

    Or you can remove all line separators from the line (and then add one at the end again if needed):

    $line =~ s/[\r\n]//g;
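
    For that substitution to see the stray LF at all, the record has to be read up to the CRLF rather than up to the first LF. A minimal sketch, assuming the file is read with $/ set to CRLF; the helper name clean_records and the in-memory sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a filehandle opened in :raw mode and returns
# the records with stray CR/LF removed and a single CRLF restored at the end.
sub clean_records {
    my ($fh) = @_;
    local $/ = "\x0D\x0A";        # a record ends at CRLF, not at a bare LF
    my @records;
    while (my $line = <$fh>) {
        chomp $line;              # strips the trailing CRLF (the current $/)
        $line =~ s/[\r\n]//g;     # strips any separator left inside the record
        push @records, "$line\x0D\x0A";
    }
    return @records;
}

# Demo on an in-memory handle; a real script would open the feed file instead.
my $data = qq{"a","b"\x0D\x0A"Tap Rackmount\x0A","c"\x0D\x0A};
open my $fh, '<:raw', \$data or die "Cannot open in-memory handle: $!";
print clean_records($fh);
close $fh;
```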
      Raw dump of the file. The first line is okay; the second line has the issue:

      "Networking Hardware \",\"MDUSR226 \",\"10 Gigabit LR Networking TAP \",\"USR4516 \",\"USRobotics \",363.33,454.17,0\r\n\"Networking Hardware \",\"MDUSR227 \",\"Tap Rackmount for 3 units\n \",\"USR4500-RMK \",\"USRobotics \",43.33,54.17,0\r\n\

      There is a \n within the quotes. I thought setting binary to 1 would take care of this. I have tried the other solutions but none of them seem to work.

        Okay, the data looks just as you described it. I placed it into a file named "failing.csv" and ran:

        use Parse::CSV;
        use Data::Dump 'pp';

        my $fh = new IO::File('failing.csv', 'r');
        my $fail = do { local $/; <$fh> };
        pp $fail;                         # Print input
        $fh->seek(0,0);

        $/ = "\r\n";
        my $parser = Parse::CSV->new(
            handle   => $fh,
            csv_attr => { binary => 1 },
        );
        pp $_ while $_ = $parser->fetch;  # Print output

        Which produced...

        "\"Networking Hardware \",\"MDUSR226 \",\"10 Gigabit LR Networking TAP \",\"USR4516 \",\"USRobotics \",363.33,454.17,0\r\n\"Networking Hardware \",\"MDUSR227 \",\"Tap Rackmount for 3 units\n \",\"USR4500-RMK \",\"USRobotics \",43.33,54.17,0\r\n"
        [
          "Networking Hardware ",
          "MDUSR226 ",
          "10 Gigabit LR Networking TAP ",
          "USR4516 ",
          "USRobotics ",
          363.33,
          454.17,
          0,
        ]
        [
          "Networking Hardware ",
          "MDUSR227 ",
          "Tap Rackmount for 3 units\n ",
          "USR4500-RMK ",
          "USRobotics ",
          43.33,
          54.17,
          0,
        ]

        If this doesn't work for you there is some local configuration issue. Odd.

Re: issue with LF & CRLF
by Anonymous Monk on Aug 06, 2013 at 23:39 UTC
    Data::Dump::dd-er up a few lines of input data please

      Sample data from the CSV. These should be two lines but they are split into three: the first line is okay, but lines 2 and 3 should be one line.

      1)  "Networking Hardware ","MDUSR226    ","10 Gigabit LR Networking TAP            ","USR4516             ","USRobotics                    ",363.33,454.17,0
      2) "Networking Hardware ","MDUSR227 ","Tap Rackmount for 3 units
      3) ","USR4500-RMK ","USRobotics ",43.33,54.17,0

      When I print the Dumper output, I only get the line before the error:

      $VAR1 = {
        'manufacturer_url' => '',
        'eancode' => '',
        'model' => 'MDUSR226',
        'availability' => 1,
        'longsummary' => '',
        'manufacturer_model' => 'USR4516',
        'category_id' => '198',
        'icecat_prodid' => 0,
        'manufacturer_id' => '49',
        'weight' => 0,
        'category' => 'Networking Hardware',
        'vendor_id' => 7,
        'quantity' => '0',
        'description' => '10 Gigabit LR Networking TAP',
        'image' => 'product_noimage.gif',
        'shortdesc' => '',
        'manufacturer' => 'US Robotics',
        'price' => '377.8632',
        'title' => '10 Gigabit LR Networking TAP',
        'shortsummary' => '',
        'product_url' => ''
      };

        sample data from ...

        That is not a Dumper-ing of a raw file -- that doesn't preserve the magic bytes

        Try  perl -MData::Dump -MFile::Slurp -e " dd scalar read_file shift, { qw/ binmode :raw / }; "  tenlinefile > tenlinefileasperl.pl
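
        If Data::Dump and File::Slurp are not installed, roughly the same raw view can be had with core modules only. This is a sketch, not a drop-in replacement for the one-liner above: the helper name visible is hypothetical, and the file to inspect is assumed to be passed as the first command-line argument. It relies on Data::Dumper's Useqq setting to show CR and LF as \r and \n.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: render a string as a double-quoted Perl literal
# so that control characters such as CR and LF become visible escapes.
sub visible {
    my ($raw) = @_;
    local $Data::Dumper::Useqq  = 1;    # escape control characters
    local $Data::Dumper::Terse  = 1;    # omit the "$VAR1 = " prefix
    local $Data::Dumper::Indent = 0;    # keep everything on one line
    return Dumper($raw);
}

# Assumed usage: perl rawdump.pl tenlinefile
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    my $raw = do { local $/; <$fh> };   # slurp without any layer translation
    close $fh;
    print visible($raw), "\n";
}
```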