in reply to Re: Regexp nightmare
in thread Regexp nightmare with CSV

Text::CSV cannot handle embedded returns, nor is its API designed in a way that could handle them. For a pure Perl solution that does handle embedded returns correctly, you can try Text::xSV.

Re: Re (tilly) 2: Regexp nightmare
by shotgunefx (Parson) on May 29, 2001 at 00:29 UTC
    Do you mean CR or CRLF in the fields?

    The way I always get around it with Text::CSV_XS is to treat it like an MS-DOS/Win32 text file.
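    # Code that writes CSV out.
    binmode SH;                        # write "\cM\cJ" literally, even on Win32
    $csvstring =~ s/\cM\cJ/\cM/g;      # embedded CRLF -> bare CR
    print SH $csvstring . "\cM\cJ";    # each record ends with CRLF

    # Code that reads and parses CSV
    use Text::CSV_XS;
    my $csv = Text::CSV_XS->new({ binary => 1 });  # binary allows \cM inside quoted fields
    binmode INFILE;                    # no CRLF -> LF translation on Win32
    {
        local $/ = "\cM\cJ";           # end of line is now \cM\cJ
        while (my $line = <INFILE>) {
            chomp $line;               # remove the \cM\cJ terminator
            if ($csv->parse($line)) {
                my @columns = $csv->fields;
                # Process data here
            } else {
                die "Error parsing: " . $csv->error_input . "\n";
            }
        }
    }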
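A self-contained sketch of the same CRLF-terminator trick, using only core Perl and an in-memory filehandle in place of real files. The sub names write_records/read_records and the sample records are hypothetical, and no CSV parsing is done here; it only shows that a record containing an embedded return comes back as one record, with the embedded CRLF reduced to a bare CR.

```perl
use strict;
use warnings;

# Writer side: collapse embedded CRLF to a bare CR inside each record,
# then terminate the record with CRLF.
sub write_records {
    my ($fh, @records) = @_;
    binmode $fh;    # no newline translation, so "\r\n" is written as-is
    for my $rec (@records) {
        (my $line = $rec) =~ s/\r\n/\r/g;
        print {$fh} $line, "\r\n";
    }
}

# Reader side: with $/ set to CRLF, a bare CR inside a record
# no longer ends the record.
sub read_records {
    my ($fh) = @_;
    binmode $fh;
    local $/ = "\r\n";
    my @records;
    while (my $line = <$fh>) {
        chomp $line;    # chomp removes the current $/, i.e. "\r\n"
        push @records, $line;
    }
    return @records;
}

my $buf = '';
open my $out, '>', \$buf or die "Cannot open in-memory handle: $!";
write_records($out, qq{1,"line one\r\nline two"}, qq{2,"plain"});
close $out;

open my $in, '<', \$buf or die "Cannot open in-memory handle: $!";
my @recs = read_records($in);
close $in;

printf "%d records\n", scalar @recs;   # prints "2 records"
```

Each string in @recs would then be handed to $csv->parse as in the code above.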


    -Lee

    "To be civilized is to deny one's nature."
      By default I read files in text mode, so on Windows a file with \r\n line endings is seen as having \n. There is an "input filter" option that I use to strip the carriage returns on Linux; I also use it to strip the moronic "smart quotes". On Mac OS you would need to play games with $/ or convert the file.
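A rough idea of what such an input filter might do, written as a plain core-Perl sub. This is not Text::xSV's filter interface; clean_line is a hypothetical name, and it assumes the input is undecoded Windows-1252 bytes, where the smart quotes are 0x91 to 0x94.

```perl
use strict;
use warnings;

# Hypothetical input filter: drops carriage returns and maps the
# Windows-1252 "smart quote" bytes to plain ASCII quotes.
# Assumes undecoded cp1252 bytes; decoded Unicode text would contain
# U+2018/U+2019/U+201C/U+201D instead.
sub clean_line {
    my ($text) = @_;
    $text =~ tr/\r//d;          # drop carriage returns
    $text =~ tr/\x91\x92/''/;   # single smart quotes -> '
    $text =~ tr/\x93\x94/""/;   # double smart quotes -> "
    return $text;
}

my $raw = "\x93Hello,\x94 she said, \x91twice\x92.\r\n";
print clean_line($raw);   # prints: "Hello," she said, 'twice'.
```

One design caveat: turning a smart double quote into a plain " inside a quoted CSV field produces an unescaped quote, so a real filter might prefer to map those to ' or drop them.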

      My handling of these things is intended to be compatible with the output of Microsoft applications. When I want my own file format, I have plenty of options in which there are no subtle "can't handle some data" issues and likewise no subtle "works differently on different platforms" issues.

      So my answer is that if you have data in an Access table that includes embedded returns and you export that table to a .csv file, Text::xSV on Windows will handle the returns as Access exports them. On other operating systems there are file format issues that can be solved in a number of ways, the most general being to convert the file to the native text format (whatever that is).
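A minimal sketch of "convert the file to native text format" in core Perl, assuming the input uses either CRLF (DOS/Windows) or bare CR (classic Mac OS) line endings. The helper names are hypothetical. Note that it converts returns embedded in quoted fields as well, so afterwards they are ordinary native newlines.

```perl
use strict;
use warnings;

# Normalize CRLF and bare CR line endings to "\n", Perl's logical
# newline. Printing the result to a handle opened in text mode then
# yields the platform's native line ending.
sub to_native_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # CRLF or lone CR -> LF
    return $text;
}

# Hypothetical file-to-file conversion helper.
sub convert_file {
    my ($src, $dst) = @_;
    open my $in, '<:raw', $src or die "Cannot read $src: $!";
    my $text = do { local $/; <$in> } // '';   # slurp the whole file
    close $in;
    open my $out, '>', $dst or die "Cannot write $dst: $!";   # text mode
    print {$out} to_native_newlines($text);
    close $out or die "Cannot close $dst: $!";
}
```

Usage would be something like convert_file('export.csv', 'export-native.csv'), with both file names made up for illustration; the converted copy can then be read in text mode as usual.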

        I just checked out Text::xSV; it looks pretty solid. I deal with CSV data a lot (usually coming from Office apps) and have solved the problem many times over. I kept meaning to write a module to handle it but never got around to it. (Writing modules is "somewhat" new to me. I think I'm getting the hang of it.)

        Do you see any problems with my approach when dealing only with CSV? I've been using it for a while without any problems.

        -Lee

Re: Re (tilly) 2: Regexp nightmare
by Anonymous Monk on May 31, 2001 at 16:47 UTC
    Lovely - does exactly what it says on the tin. I particularly like bind_header() and the ability to extract only the fields you require. Thank you for that; you have solved my problem. Pingu (logged in at work and can't remember my p/word ---