Re^2: Compare and group unmatched records from 2 CSV files together

Re^3: Compare and group unmatched records from 2 CSV files together by Tux (Canon) on Jun 24, 2014 at 11:39 UTC
This smells like a binary `\r` inside an unquoted field while the `binary` option is not passed to the parser. Which versions of Text::CSV (and, if installed, Text::CSV_XS) and Tie::Array::CSV are you using? It would also help to show us a hex dump of the data that fails. Enjoy, Have FUN! H.Merijn
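A hex dump like the one asked for here can be produced with core Perl when tools such as `od` or `xxd` are not at hand. This is only a minimal sketch: `hex_bytes` is a hypothetical helper name, and the sample record is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: render every byte of a string as two hex digits,
# separated by spaces, so that stray 0d (CR) bytes become visible.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack '(H2)*', $bytes;
}

# Made-up sample record with a Windows line ending (CR LF).
my $record = "1,abc\r\n";
print hex_bytes($record), "\n";   # prints: 31 2c 61 62 63 0d 0a
```

For a real file, open it with the `:raw` layer so that no line-ending translation hides the bytes you are looking for.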
Re^3: Compare and group unmatched records from 2 CSV files together by Laurent_R (Canon) on Jun 24, 2014 at 22:01 UTC
Or this smells like a file prepared under Windows and used under Linux or Unix. If this is the case, just remove the CR (carriage return, or \r) characters from your file before processing. One possible command to strip these noisy Windows characters from a file under your shell: `perl -pi.bak -e 's/\r//g;' my_file.txt` This removes the Windows CR characters from the file and saves the original file as my_file.txt.bak (just in case something goes wrong).
Re^4: Compare and group unmatched records from 2 CSV files together by AppleFritter (Vicar) on Jun 24, 2014 at 22:17 UTC
There's also dos2unix (and unix2dos) for this sort of conversion. Very handy!
Re^5: Compare and group unmatched records from 2 CSV files together by Laurent_R (Canon) on Jun 25, 2014 at 06:00 UTC
Right, usually available on Linux, but not on all Unix environments (for example, not on our version of AIX, where I made a dos2unix alias that uses the above Perl one-liner).
Re^4: Compare and group unmatched records from 2 CSV files together by Tux (Canon) on Jun 25, 2014 at 12:09 UTC
And it will also remove `\r` when it is correctly quoted. A file with the header line `code,value` followed by the record `1,"abc\rdef"` is valid CSV (`binary => 1` is needed for Text::CSV and Text::CSV_XS), but stripping every `\r` changes its content. Not good! Enjoy, Have FUN! H.Merijn
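To illustrate the point, here is a small sketch that assumes Text::CSV is installed. With `binary => 1` the quoted `\r` survives parsing as part of the field, whereas a blanket `s/\r//g` applied beforehand silently alters the value. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows CR (and other binary characters) inside quoted fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# The record from the post above: a quoted field containing a CR.
my $line = qq{1,"abc\rdef"};

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @kept = $csv->fields;       # second field still contains the CR

# The same record after stripping every CR first.
(my $stripped = $line) =~ s/\r//g;
$csv->parse($stripped) or die "Parse failed: " . $csv->error_diag;
my @changed = $csv->fields;    # second field no longer contains the CR

print "content changed\n" if $kept[1] ne $changed[1];
```

Whether that change matters depends on the data: if the embedded CR is meaningful content, the parser has to handle line endings rather than a preprocessing regex.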
Re^5: Compare and group unmatched records from 2 CSV files together by Laurent_R (Canon) on Jun 25, 2014 at 17:20 UTC
Not quite sure I understand what the point is. I just know that I used to have a regex like this: `s/\r\n/\n/` or even: `s/\r\n$/\n/` but it turned out to be insufficient, because the input data sometimes had lines ending with "\r\r\n" and sometimes also "\r" in the middle of the line, generating all kinds of problems. In the end, this regex: `s/\r//g` turned out to solve all the problems. Now, of course, it depends on what your input data is and what you need at the end of the day. With different data and different goals, the regex would probably have to be changed. But that's not hot news: if you need to do data munging, the first prerequisite is to know your data well.
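The difference between the three substitutions is easy to see on made-up sample lines. The sketch below applies each one to a plain CRLF line, a line ending in "\r\r\n", and a line with a "\r" in the middle. The sample strings and the labels in `%regex` are assumptions for illustration, not the actual input data.

```perl
use strict;
use warnings;

# Each entry copies its argument, applies one substitution, returns the copy.
my %regex = (
    'first crlf'  => sub { (my $s = shift) =~ s/\r\n/\n/;  $s },
    'crlf at end' => sub { (my $s = shift) =~ s/\r\n$/\n/; $s },
    'every cr'    => sub { (my $s = shift) =~ s/\r//g;     $s },
);

# Made-up samples: plain CRLF, doubled CR before LF, CR inside the line.
my @samples = ("a,b\r\n", "a,b\r\r\n", "a\rb,c\r\n");

for my $name (sort keys %regex) {
    for my $line (@samples) {
        my $out = $regex{$name}->($line);
        # Report whether any CR is left after the substitution.
        printf "%-12s %s\n", $name, ($out =~ /\r/ ? 'CR left' : 'clean');
    }
}
```

Only the global substitution cleans all three samples; the first two leave a CR behind on the "\r\r\n" line and on the line with a CR in the middle.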