in reply to Regexp nightmare with CSV

I would recommend checking out one of the CSV modules on CPAN rather than rolling your own. Possible candidates:

----
Coyote

Re (tilly) 2: Regexp nightmare
by tilly (Archbishop) on May 28, 2001 at 21:05 UTC
    Text::CSV cannot handle embedded returns, nor is its API consistent with handling them. For a pure Perl solution that does handle embedded returns correctly you can try Text::xSV.
      Do you mean CR or CRLF in the fields?

      The way I always get around it with Text::CSV_XS is to treat it like an MS-DOS/Win32 text file.
      # Code that writes CSV out.
      $csvstring =~ s/\cM\cJ/\cM/g;       # fold embedded CRLF to a bare CR
      print SH $csvstring . "\cM\cJ";     # end each record with CRLF

      # Code that reads and parses CSV.
      {
          local $/ = "\cM\cJ";            # end of line is now \cM\cJ
          while (my $line = <INFILE>) {
              if ($csv->parse($line)) {
                  my @columns = $csv->fields;
                  # Process data here
              } else {
                  # Method calls do not interpolate inside a string
                  die "Error Parsing: " . $csv->error_input . "\n";
              }
          }
      }
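
The record-separator trick above can be tried without any CSV module. What follows is only a sketch using core Perl: write_records and read_records are hypothetical helper names, an in-memory filehandle stands in for SH and INFILE, and the CSV parsing step itself is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: join already-quoted CSV records into one string,
# folding any embedded CRLF to a bare CR and ending every record with CRLF.
sub write_records {
    my @records = @_;
    my $out = '';
    for my $rec (@records) {
        (my $copy = $rec) =~ s/\cM\cJ/\cM/g;   # embedded CRLF becomes a bare CR
        $out .= $copy . "\cM\cJ";              # CRLF marks the end of the record
    }
    return $out;
}

# Hypothetical helper: split such a string back into records.
sub read_records {
    my ($data) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    binmode $fh;
    local $/ = "\cM\cJ";        # a record ends only at CRLF
    my @records;
    while (my $line = <$fh>) {
        chomp $line;            # chomp removes the current $/, here the CRLF
        push @records, $line;
    }
    close $fh;
    return @records;
}

my @in  = ("1,\"two\cM\cJlines\",3", "4,plain,6");
my @out = read_records(write_records(@in));
print scalar(@out), " records read\n";
```

The embedded return survives as a bare \cM inside the first record, so each record would reach the parser as a single line.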


      -Lee

      "To be civilized is to deny one's nature."
        By default on Windows I handle things in text format so the file will have \r\n and it will be seen as \n. There is an "input filter" option that I use to strip the carriage returns on Linux, which I also use to strip the moronic "smart quotes". On MacOS you would need to play games with $/ or convert the file.
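
A filter of the kind described above might look roughly like the following. This is a sketch under assumptions, not the module's own code: clean_line is a hypothetical name, the input is assumed to be Windows-1252 bytes (where the "smart quotes" are \x91 to \x94), and mapping them to ASCII quotes rather than deleting them is a choice made for illustration.

```perl
use strict;
use warnings;

# Hypothetical filter: takes one raw line and returns a cleaned copy.
sub clean_line {
    my ($line) = @_;
    $line =~ s/\r(?=\n)//g;       # drop a CR only when it precedes a newline
    $line =~ tr/\x91\x92/''/;     # assumed Windows-1252 single smart quotes -> '
    $line =~ tr/\x93\x94/""/;     # assumed Windows-1252 double smart quotes -> "
    return $line;
}

my $raw = "name,\x93quoted\x94,it\x92s\r\n";
print clean_line($raw);
```

A bare \r that is not followed by \n is left alone, so an embedded return inside a field is not touched by this step.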

        My handling of these things is intended to be compatible with the output of Microsoft applications. When I want my own file format, I have plenty of options available in which there are no subtle, "Can't handle some data" issues and likewise no subtle, "Works differently on different platforms".

        So my answer is that if you have data in an Access table that includes embedded returns, and you export that table to a .csv file, then on Windows Text::xSV will handle the return as exported by Access. On other operating systems there are file format issues, which can be solved in a number of ways, the most general being to convert the file to native text format (whatever that is).

      Lovely - does exactly what it says on the tin. I particularly like bind_header() and the ability to extract only those fields you require. Thank you for that; you have solved my problem. Pingu (logged in at work and can't remember my p/word ---