in reply to Text::CSV_XS and encoding

What version of Text::CSV_XS do you have? Try upgrading to the latest version, 1.36.

print $Text::CSV_XS::VERSION;

poj

Re^2: Text::CSV_XS and encoding
by PeterKaagman (Beadle) on Sep 16, 2018 at 19:52 UTC

    Nope that is not it... :(

    pkn@precious:~/scripts/perl/csv$ ./parse.pl
    1.36
    $VAR1 = [];

    I did manage to get "detect_bom => 1" into it with the new version, but as a result it no longer parses the file (correctly).

    pkn@precious:~/scripts/perl/csv$ ./parse.pl
    1.36
    $VAR1 = [
      {
        'Naam' => 'Peter',
        'Adres' => "Li\x{eb}r",
        'Woonplaats' => "\x{f4}lsten"
      }
    ];

    Without BOM detection it parses the test file, but the encoding is still screwed.

    Got some more reading to do. One thing I came across was:

    my $aoh = csv( in => $FH, headers => 'auto' );
    I thought having "headers => 'auto'" in there would trigger automagic detection of the encoding. According to something I read in a man page, this is not the case. Now if I could only remember which man page I was reading, Text::CSV or Text::CSV_XS :S. Should not be too hard to find again.

Re^2: Text::CSV_XS and encoding
by PeterKaagman (Beadle) on Sep 16, 2018 at 15:58 UTC

    Version 1.21-1 from the Ubuntu repo.
    Will reinstall from CPAN and try again.
    Thanks.... did not think of that.

      The detect_bom attribute was added in 1.22. I must admit that the ChangeLog was not very clear about that, as it was part of the new header work, and naming all attributes related to that didn't look very useful at the time. The docs clearly state:

      BOM (or Byte Order Mark) handling is available only inside the "header" method.

      The BOM-related changes in versions 1.25, 1.31, 1.33, 1.34, and 1.35 make its use more reliable. Note that BOM-handling is unreliable (or not working at all) in perl-5.6.x.
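
A minimal sketch of BOM handling through the "header" method, assuming a reasonably recent Text::CSV_XS. The sample data and the temporary file are made up for illustration, and munge_column_names is set to "none" here only so that the header names keep their original case:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV_XS;

# Illustration only: write a small UTF-8 file that starts with a BOM.
my ($tmp, $file) = tempfile(SUFFIX => ".csv", UNLINK => 1);
binmode $tmp;
print {$tmp} "\xEF\xBB\xBF", "Naam,Adres,Woonplaats\n",
             "Peter,Li\xC3\xABr,\xC3\xB4lsten\n";
close $tmp;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Open without an encoding layer so header() can inspect the raw bytes.
open my $fh, "<", $file or die "$file: $!";

# header() reads the first line, looks for a BOM, and sets the column
# names that getline_hr() uses for the hash keys.
my @hdr = $csv->header($fh, { detect_bom => 1, munge_column_names => "none" });

my @rows;
while (my $row = $csv->getline_hr($fh)) {
    push @rows, $row;
}
close $fh;
```

Whether this behaves the same on a given input depends on the module version and on how the file handle was opened, so treat it as a starting point rather than a recipe.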


      Enjoy, Have FUN! H.Merijn

        The thing which triggered my interest in BOM detection was an ugly escape sequence at the very beginning of my original data stream. I suspected it to be a BOM. I was kinda hoping BOM detection would get it out of the way. Otherwise I've got to come up with some other way of stripping it off.
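
If the sequence really is a UTF-8 BOM (the thread does not confirm this), one way of stripping it off by hand could look like the sketch below. It uses only core modules; strip_utf8_bom is a hypothetical helper name and the sample bytes are made up. Other BOMs, such as UTF-16, would need different handling.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: drop a leading UTF-8 BOM (bytes EF BB BF) from a
# byte string, then decode the remainder as UTF-8.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return decode("UTF-8", $bytes);
}

# Made-up sample: a BOM followed by a header line and one record.
my $raw  = "\xEF\xBB\xBFNaam,Adres\nPeter,Li\xC3\xABr\n";
my $text = strip_utf8_bom($raw);
```

The decoded string can then be handed to the parser, for example through an in-memory file handle.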