RandomWalk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks
I'm playing with S. Cozens's Finance::QIF module, and it contains the code:
    sub parse_file {
        my ($class, $file) = @_;
        local $/;
        open my $f, $file or croak $!;
        my $data = <$f>;
        return $class->parse($data);
    }

    =head2 parse

    Creates a C<Finance::QIF> object from a string.

    =cut

    sub parse {
        my ($class, $data) = @_;
        my @lines = split /\n/, $data;
        my $type = shift @lines;
        croak "Can only handle bank accounts right now, not type $type"
            unless $type eq "!Type:Bank";
...
I try to parse a QIF file but croak:
    $ perl -MCarp=verbose qifread.pl
     at /usr/local/lib/perl5/site_perl/5.8.4/Finance/QIF.pm line 124
            Finance::QIF::parse('Finance::QIF', '!Type:Bank\x{d}\x{a}D09/07/2004\x{d}\x{a}PDTE ENERGY PAYMENT\x{d}\x{a}MDTE ENERGY PAY...') called at /usr/local/lib/perl5/site_perl/5.8.4/Finance/QIF.pm line 111
            Finance::QIF::parse_file('Finance::QIF', 'dfcu0904.qif') called at qifread.pl line 6
My question is why didn't the file split on "\x{a}"? Isn't it the same as "\n"?

Thanks (as always).

20040909 Edit by ysth: change pre to code tags

Sorry for the "chatty" question. Yes, I understand the sub would still choke on the carriage return character even if "\x{a}" were to be treated as "\n". I just thought maybe someone might explain when "\n" and "\x{a}" are the same in Perl and when they're different. Or maybe point out where in the docs to read--my searches haven't turned anything up.

Re: end of the line
by graff (Chancellor) on Sep 10, 2004 at 01:55 UTC
    Did your post actually contain all the text from Carp? I'm wondering, because I'd expect that the messages would include the string "Can only handle bank accounts right now..." -- when you run the test without "-MCarp=verbose", I'd expect you get just the "Can only handle..." report.

    Maybe the part you're confused about is that when you use the "verbose" setting on Carp, the error messages include not only what the module author wants to say but also the full content of @_ as it was passed to the failing subroutine.

    So, the string in $data really is being split on "\n" (which really is the same as \x{a}), but the Carp output isn't showing you the results of the split -- it's showing you what args were passed to the subroutine, and (I presume) the error message provided by the author.

    If you were to modify the source code, so that the message would be "Can't handle type <<$type>> things, only !Type:Bank", I think the problem would become clear.
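
    A minimal sketch of that idea, not the module's actual code: the sample string is a hypothetical stand-in for the file contents shown in the Carp trace, and the `$visible` escaping step is an added assumption so that the stray carriage return prints as text instead of moving the cursor.

```perl
use strict;
use warnings;

# Hypothetical sample data with CRLF line endings, mimicking the trace.
my $data  = "!Type:Bank\x0D\x0AD09/07/2004\x0D\x0A";
my @lines = split /\n/, $data;
my $type  = shift @lines;

# Assumption for illustration: escape non-printable characters so an
# invisible trailing carriage return shows up as \x{d} in the message.
(my $visible = $type) =~ s/([^\x20-\x7E])/sprintf('\\x{%x}', ord $1)/ge;

print "Can't handle type <<$visible>> things, only !Type:Bank\n"
    unless $type eq "!Type:Bank";
```

    With delimiters around the value, anything trailing after "!Type:Bank" lands inside the brackets, where it is hard to miss.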

      "... the error messages include ... the full content of @_ as it was passed to the failing subroutine."

      That's where I misunderstood. (I was using -MCarp=verbose because the unremoved \x{d} cut the terse error message off at the first line.)

      Dumb now that I see, which makes me all the more appreciative of your detailed and gentle response.

Re: end of the line
by simonm (Vicar) on Sep 09, 2004 at 22:28 UTC
    It looks like your lines have CRLF line endings -- maybe you copied a Windows text file to a Unix host without ASCII conversion? You may need to strip these out before parsing.
Re: end of the line
by ikegami (Patriarch) on Sep 09, 2004 at 22:40 UTC
    My question is why didn't the file split on "\x{a}"? Isn't it the same as "\n"?

    In *ix, it's the same. Elsewhere, it may not be. But that's not the problem, because what you see displayed is what $data contained when parse() was called, before split is even reached.
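
    One way to check this on a given machine is the short sketch below. It only reports what "\n" means to the perl that runs it, so the result is platform-dependent; the "Newlines" section of perlport is the place in the docs that covers the differences.

```perl
use strict;
use warnings;

# Report the code point this perl uses for "\n" on this platform.
my $newline_code = ord "\n";
printf "\\n is character number %d here\n", $newline_code;

# Compare the logical newline with the literal linefeed character.
my $same = ("\n" eq "\x{a}") ? 1 : 0;
print $same ? "\\n and \\x{a} are the same here\n"
            : "\\n and \\x{a} differ here\n";
```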

    The problem is that every line will have a trailing \x{d}. For example, $type will contain !Type:Bank\x{d} when you're expecting it to hold !Type:Bank. This is probably because you sent the file over using a method that didn't convert DOS/Windows newlines ("\x0D\x0A") to *ix ones ("\x0A").

    perl -i -pe 's/\x0D\x0A/\x0A/sg' dfcu0904.qif should fix your file
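
    If changing the file in place isn't an option, the same cleanup could be done in code before comparing. This is only a sketch under assumptions: the sample string is hypothetical, and splitting on an optional carriage return is a suggestion here, not something Finance::QIF itself does.

```perl
use strict;
use warnings;

# Hypothetical sample data with DOS/Windows line endings.
my $data = "!Type:Bank\x0D\x0AD09/07/2004\x0D\x0APDTE ENERGY PAYMENT\x0D\x0A";

# Split on LF with an optional preceding CR, so both CRLF and
# plain LF files yield clean lines.
my @lines = split /\x0D?\x0A/, $data;
my $type  = shift @lines;

print "type matches\n" if $type eq "!Type:Bank";
```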