Hi all,
Thanks in advance for looking.
I have some csv log files that come with embedded newlines ('\n's within double quotes). When I loop through them with a typical:
while (<$FILE>)

the loop sees the embedded newlines as "real" newlines and breaks my CSV line up into pieces. I am running the script on a Linux (RHEL) machine, fwiw in regards to filesystem newlines.
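To make the symptom concrete, here is a toy demonstration (the in-memory filehandle is just for the demo): a two-field record whose second field contains a newline comes back from the line loop as two fragments, so three physical lines where there are only two CSV records.

```perl
use strict;
use warnings;

# One header record plus one data record whose quoted field spans two lines.
my $csv = qq{id,comment\n1,"first line\nsecond line"\n};

open my $fh, '<', \$csv or die $!;   # read the string as if it were a file
my @lines = <$fh>;                   # same splitting as while (<$fh>) { ... }
close $fh;

print scalar(@lines), " physical lines\n";   # 3 lines, but only 2 CSV records
```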
I did a little bit of research and settled (somewhat unwillingly, since I can't quite parse it) on the following one-liner to remove these embedded newlines, and it worked.
perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F' filename.csv > filename.csv.tmp

Some of you probably already know where I'm headed with this, but basically it chokes badly on larger files, throwing "Out of memory!" errors (the machine I'm running it on only has 4GB of memory).
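For anyone else squinting at it, my reading of the one-liner, spelled out as a script (the function name is mine): it walks the slurped file one character at a time, toggles a flag on every double quote, and turns newlines into spaces while the flag is set. Holding the whole file, and a per-character list on top of it, is what exhausts memory on big inputs.

```perl
use strict;
use warnings;

# Expanded version of the one-liner's logic (my annotation, not the original).
sub strip_embedded_newlines {
    my ($text) = @_;
    my ($seen, $out) = (0, '');
    for my $ch (split //, $text) {       # like -F'' -a building @F
        $seen = !$seen if $ch eq '"';    # the q(") toggle
        $ch = ' ' if $seen && $ch eq "\n";  # newline inside quotes -> space
        $out .= $ch;
    }
    return $out;
}

# The memory-hungry part corresponds to slurping the whole file first:
#   my $text = do { local $/; <> };
#   print strip_embedded_newlines($text);
```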
So, getting down to it: I've been trying to turn the one-liner into a program that will read the file line by line and remove the embedded newlines, but I keep running into the very reason I'm trying to remove them -- <> cannot distinguish the embedded newlines from the "real" ones.
Has anyone run into this issue before? Is it possible to look at the file in chunks and remove the embedded ones rather than searching the entire thing at once?
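One possible line-by-line approach, sketched under the same assumption the one-liner makes (double quotes only ever delimit or escape fields): carry the quote-parity flag *across* iterations of the read loop. An odd number of quotes on a physical line flips whether we are inside a quoted field; while we are inside one, the trailing newline must be embedded, so print a space in its place. The function name and handle-passing style are mine.

```perl
use strict;
use warnings;

# Stream from $in to $out, joining lines that end inside a quoted field.
sub join_embedded_newlines {
    my ($in, $out) = @_;
    my $in_quotes = 0;                        # parity persists across lines
    while (my $line = <$in>) {
        my $quotes = () = $line =~ /"/g;      # count quotes on this line
        $in_quotes = !$in_quotes if $quotes % 2;
        $line =~ s/\n\z/ / if $in_quotes;     # embedded: join with next line
        print {$out} $line;
    }
}

# Filter usage:
# join_embedded_newlines(\*STDIN, \*STDOUT);
```

Note that CSV-style escaped quotes ("") count in pairs, so they don't upset the parity; stray quotes in *unquoted* fields would break it, but that limitation is shared with the one-liner. If the data is at all messy, Text::CSV / Text::CSV_XS with binary => 1 parses records with embedded newlines directly via getline, which sidesteps the whole problem.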
Thanks again for looking!
In reply to Embedded Newlines or Converting One-Liner to Loop by mwb613