in reply to Parsing a text file

On my machine it told me it found data on 1 line, which is what I would expect...

  1. neither elsif (/^M/) nor split /^M/ is doing what you expect. If you want Ctrl-M you want \cM, not ^M -- though I'd use \x0D, but that's a matter of taste.

  2. especially because something odd is apparently happening, I'd recommend an or die "failed to open $csv: $!" after the open.

  3. you don't really need the for loop: you could simply push @chunks, split(/\cM/, $_). Note that split throws away trailing empty fields, so if a line ends "\cM\cM\cM" you won't end up with three blank lines -- which may or may not be what you want.

  4. you pass the filename $csv to the read_csv subroutine, but don't use it, which doesn't look right.
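To make points 1 to 4 concrete, here is a minimal sketch of how the pieces might fit together. It is not the original poster's script: the sample data and the in-memory filehandle (opening a reference to a string) are assumptions made so the example runs without a real file, and only the name read_csv is taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: three records separated by carriage returns
# (Ctrl-M, \x0D) and terminated by a single newline.
my $data = "a,1\x0Db,2\x0Dc,3\x0A";

# Points 2 and 4: actually use the argument that is passed in, and check the open.
sub read_csv {
    my ($csv) = @_;
    open my $fh, '<', $csv or die "failed to open $csv: $!";
    my @chunks;
    while (<$fh>) {
        chomp;
        # Points 1 and 3: explicit escape for Ctrl-M, and no inner for loop,
        # because push accepts the whole list that split returns.
        push @chunks, split(/\x0D/, $_);
    }
    close $fh;
    return @chunks;
}

# A reference to a string stands in for a filename here (in-memory file).
my @chunks = read_csv(\$data);
print scalar(@chunks), " records\n";
```

With a real file you would pass the path instead of \$data; the rest of the subroutine stays the same.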

But I cannot explain why you seem to get 0 lines... I don't suppose it's possible that you have set $/ to undef?
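For anyone wondering why $/ matters here, the sketch below shows what setting it to undef does to a line-by-line read. The data and the helper count_reads are made up for illustration, and this only demonstrates the effect of $/; it does not claim to explain the original problem.

```perl
use strict;
use warnings;

# Hypothetical three-line input, read through an in-memory filehandle.
my $data = "one\ntwo\nthree\n";

# Count how many times the readline operator returns something.
sub count_reads {
    my ($text) = @_;
    open my $fh, '<', \$text or die "failed to open in-memory file: $!";
    my $n = 0;
    while (defined(my $line = <$fh>)) {
        $n++;
    }
    close $fh;
    return $n;
}

my $normal = count_reads($data);    # $/ is "\n": one read per line

my $slurped;
{
    local $/;                       # $/ is undef inside this block only
    $slurped = count_reads($data);  # whole input arrives in a single read
}

print "default \$/: $normal reads; undef \$/: $slurped read\n";
```

Using local keeps the change confined to the block, which is the usual way to avoid an undef $/ leaking into other code.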

Re^2: Parsing a text file
by calmthestorm (Acolyte) on Jan 14, 2009 at 00:53 UTC
    Thank you, oshalla, for your time. Here are my responses to your bullet points.

    1. I have made this change even though I had this part working merely because you are right and ^M's don't always behave the way one would expect.

    2. Also added this change even though I know it was not failing to open the csv file as it was printing the contents. Suffice it to say that it is a good coding practice that I usually use. I just whipped up an example script for perlmonks.org to show my problem.

    3. Must have the for loop as the ^M's are delimiting lines of data. Throwing away anything after a ^M is throwing away data I need.

    4. You are right, in this particular example I am not using the passed variable name. It's just a habit.

      I realised as I woke up that the ^M in your code could be actual Ctrl-M characters and not the two-character caret-and-M sequence that I had mistaken them for. Now that that's clear, I'd still use an explicit escape sequence e.g. \x0D (or the divinely retro \015).

      I fear I didn't get the point across re the for loop... push takes a LIST of things to push onto the ARRAY. So in push @chunks, split(/\cM/, $_) the entire list returned by split is pushed onto @chunks all in one go.
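A small side-by-side may help here. The sample line below is invented for the purpose; the point is only that the explicit loop and the single push end up with the same array, so no data after a Ctrl-M is lost.

```perl
use strict;
use warnings;

# Hypothetical line holding three records separated by Ctrl-M.
my $line = "r1\x0Dr2\x0Dr3";

# With the explicit loop: push one piece at a time.
my @with_loop;
for my $piece (split /\cM/, $line) {
    push @with_loop, $piece;
}

# Without the loop: push receives the whole list from split in one go.
my @without_loop;
push @without_loop, split(/\cM/, $line);

print "same result\n" if "@with_loop" eq "@without_loop";
```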

      The caveat about trailing separators and split can be seen in

      print map("'$_' ", split(/:/, 'a:b:c:::') ), "\n" ;     # 'a' 'b' 'c'
      print map("'$_' ", split(/:/, 'a:b:c:::', -1)), "\n" ;  # 'a' 'b' 'c' '' '' ''
      see split. I fear I confused the issue by addressing two points in one paragraph, for which I will now do penance and hope for forgiveness from the gods of clear English.

      I note that the problem is now fixed. Since the code as posted worked on my machine, I'm particularly curious as to what the problem was.