TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

It's been a long time since I've been here, but I've run into a problem I need some assistance with. Hopefully it's something easy, but I just can't seem to get the code to work the way I want it to.

So, I have this CSV file that I want to parse, but the problem is that the person who was updating the spreadsheet put newlines in the cells (using Alt+Return, I believe). So, as you can imagine, when I try to parse the file, one record may span several lines in the file.

What I would like to do is replace the newlines (or returns) with <br> so that when I update our database and view the data in a browser, the formatting is preserved. I'm doing the replacement with $_ =~ s/(\r+|\n+)/<br>/g;, which works, but it's putting <br> in places where I don't want it.
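An editorial side note on the substitution itself (not from the original post): in (\r+|\n+) the alternation matches a Windows "\r\n" pair as two separate hits, so each CRLF becomes <br><br>. A minimal sketch, using a made-up cell value, of a pattern that turns "\r\n", a lone "\r", or a lone "\n" into exactly one <br>:

```perl
use strict;
use warnings;

# Hypothetical cell value mixing a CRLF break and a bare LF break.
my $cell = "line one\r\nline two\nline three";

# Original pattern: "\r\n" is matched by \r+ and then by \n+ separately.
( my $doubled = $cell ) =~ s/(\r+|\n+)/<br>/g;

# Try "\r\n" first, then a lone "\r" or "\n", so each break yields one <br>.
( my $single = $cell ) =~ s/\r\n|\r|\n/<br>/g;

print "$doubled\n";    # line one<br><br>line two<br>line three
print "$single\n";     # line one<br>line two<br>line three
```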

For example: 1. The first line in the CSV file holds the column names, and I don't need <br> there. 2. Each new record begins with "TC-", so I don't need a <br> at the very end of a record (just before the next "TC-") either. Again, a record may span several lines (it varies) until the next "TC-" is found.

Any thoughts ???

Thanks all !!!

Re: Parsing a CSV file in a unique way
by Fletch (Bishop) on Oct 11, 2006 at 17:49 UTC

    Use Text::CSV_XS and enable its binary mode (which allows newlines and other things inside fields).

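    use Text::CSV_XS ();
    # binary => 1 allows newlines (and other binary characters) inside quoted fields
    my $c = Text::CSV_XS->new( { binary => 1 } );
    $c->parse( qq{"abc","def\nghi","jkl"\n} );
    # print each field between >> and << markers
    print ">>", join( "<<\n>>", $c->fields ), "<<\n";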

    Update: Twiddled markers around items.

Re: Parsing a CSV file in a unique way
by blue_cowdawg (Monsignor) on Oct 11, 2006 at 17:49 UTC
        Any thoughts ???

    Have you looked at Text::CSV_XS, perchance? I have used that module with great success on comma-delimited files that had given me trouble before.


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Parsing a CSV file in a unique way
by shmem (Chancellor) on Oct 11, 2006 at 18:10 UTC
    Check whether the newlines in fields are really CRLF ("\r\n"); they could be just "\n", while the record separator is "\r\n". Then you could set $/ = "\r\n" and do a s/\n/<br>/g over the fields.
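    A minimal sketch of that first suggestion, assuming the cells contain bare LF while each record ends in CRLF (check your own file before relying on this). The sample data is hypothetical; the :raw layer keeps Perl from translating CRLF to LF, which it would otherwise do for text-mode reads on Windows.

```perl
use strict;
use warnings;

# Hypothetical sample: CRLF ends each record, a bare LF sits inside a cell.
my $data = "ID;Notes\r\nTC-1;first\nsecond\r\nTC-2;only\r\n";

open my $fh, '<:raw', \$data or die "Cannot open in-memory file: $!";

my @rows;
{
    local $/ = "\r\n";    # read one CRLF-terminated record at a time
    while ( my $rec = <$fh> ) {
        chomp $rec;             # chomp removes $/, i.e. "\r\n"
        $rec =~ s/\n/<br>/g;    # any remaining LF is an in-cell break
        push @rows, $rec;
    }
}
close $fh;

print "$_\n" for @rows;
```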

    If that wasn't the case, I'd do something like (untested)

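    my $sep = ';';    # field separator
    # count the fields in the header line
    chomp( my $header = <> );
    my $expected_field_count = () = split /\Q$sep\E/, $header, -1;
    while (<>) {
        chomp;
        my @fields = split /\Q$sep\E/, $_, -1;
        # too few fields: the record continues on the next line(s)
        while ( @fields < $expected_field_count ) {
            my $fieldline = <>;
            last unless defined $fieldline;    # EOF in mid-record
            chomp $fieldline;
            my @l = split /\Q$sep\E/, $fieldline, -1;
            # the first piece belongs to the field that was cut off
            $fields[-1] .= '<br>' . ( @l ? shift @l : '' );
            push( @fields, @l ) if @l;
        }
        # ... process @fields here ...
    }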

    That works only if your csv has a fixed number of fields. BTW, there are CSV modules out there...

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Howdy!

      Fletch got it in one. I've used the same technique with no problems and great success. CSV is one of those areas where you can screw it up easily and get wrong results. Use the CPAN, Luke!
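      To illustrate how easy it is to get wrong results (a made-up record, not from the thread): a naive split breaks a quoted field that contains the separator, which is exactly the case a CSV module handles for you.

```perl
use strict;
use warnings;

# Hypothetical record: the first field is quoted because it contains a comma.
my $line = '"Smith, John",42,"TC-7"';

my @naive = split /,/, $line;
print scalar(@naive), " fields\n";    # 4 fields, not 3
print "[$_]\n" for @naive;
```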

      yours,
      Michael
        Of course I use CPAN, herveus, and I included a CSV module search link in my previous post. But I'd learn less if I ran to CPAN and let modules do my work every time I hit a problem. IMHO, the way is: know how to solve the problem, then use CPAN modules to make your life easier...

        --shmem
