hberven1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I'm having some difficulty reading a .csv file in Perl. I am trying to read a number of files containing data in specific columns, where some of the column headers contain superscripts and other unusual characters. For some reason Perl is not able to read them and just outputs them as a question mark with a black circle behind it. Below are the column headers giving me problems, followed by the code I use to extract them:

405nm Data: Nº of Objects(Nº)

405nm Data: Total 1º Area(µm²)

405nm: single cells: Mean 1º Area(µm²)

405nm: Sub G1: Nº of Objects(Nº)

405nm: Cell: Nº of Objects(Nº)

#!/usr/bin/perl
use strict;
use warnings;

my $sql;
my $line;
my @words;
my $tag;

# without "die", a failed open was silently ignored
open (FILE, "test.txt") or die "Cannot open file: $!";
while ($line = <FILE>) {
    print $line;
    chomp $line;
    @words = split(/\t\s*/, $line);
    last;    # only the header row is needed
}
foreach $tag (@words) {
    print $tag."\n";
}

Any help would be greatly appreciated!

thanks,

Haakon

Re: Odd symbols in read file function?
by jethro (Monsignor) on Jul 08, 2011 at 13:33 UTC

    See the documentation of open (perldoc -f open) for how to specify the encoding of the files you read or write. I would just blindly try both:

    open(FILE, "<:encoding(UTF-8)", "test.txt") or ...
    #or
    open(FILE, "<:encoding(iso-8859-1)", "test.txt") or ...

    and see whether either works. If not, you will have to find out the real encoding of your file and use that instead.
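    A minimal sketch of the "find out the real encoding" step, assuming the file is either UTF-8 or ISO-8859-1 (if it came from a Windows tool, cp1252 may be worth trying instead). It reads the header line as raw bytes, tries a strict UTF-8 decode with the core Encode module, falls back to ISO-8859-1, and writes UTF-8 to STDOUT, which assumes your terminal displays UTF-8. The helper name decode_bytes is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess whether a string of raw bytes is UTF-8 or
# ISO-8859-1. Strict UTF-8 decoding croaks on malformed input, so a
# successful decode is good evidence that the data really is UTF-8.
sub decode_bytes {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may modify its argument
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return ($text, 'UTF-8') if defined $text;
    # Every byte is a valid ISO-8859-1 character, so this never fails.
    return (decode('iso-8859-1', $bytes), 'iso-8859-1');
}

# Assumption: the terminal expects UTF-8 output.
binmode STDOUT, ':encoding(UTF-8)';

my $file = shift(@ARGV) // 'test.txt';
if (-e $file) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $header = <$fh>;
    close $fh;
    if (defined $header) {
        $header =~ s/\r?\n\z//;    # strip an LF or CRLF line ending
        my ($text, $encoding) = decode_bytes($header);
        print "Header looks like $encoding\n";
        print "$_\n" for split /\t/, $text;
    }
}
```

    This is only a heuristic: pure ASCII also passes the UTF-8 check, and the ISO-8859-1 fallback accepts any bytes at all, so check the printed headers by eye.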

Re: Odd symbols in read file function?
by Tux (Canon) on Jul 08, 2011 at 14:04 UTC

    Maybe this particular file can be split on TAB correctly, but even then, it is not CSV (the C stands for Comma). There are, however, several CSV modules that handle TSV (tab-separated) data perfectly well. You might want to try either of these.

    Using DBD::CSV (UTF-8 encoded, tab-separated .txt files):

    use DBI;
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_ext        => ".txt/r",
        f_encoding   => "utf-8",
        csv_sep_char => "\t",
        # more options available
        RaiseError   => 1,
        });
    my $sth = $dbh->prepare ("select * from test");
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        # do something with the fields
        }

    Using Text::CSV_XS:

    use Text::CSV_XS;
    my $csv = Text::CSV_XS->new ({
        binary    => 1,
        auto_diag => 1,
        sep_char  => "\t",
        });
    open my $fh, "<:encoding(utf-8)", "test.txt" or die "test.txt: $!";
    while (my $row = $csv->getline ($fh)) {
        # do something with @$row
        }
    close $fh;
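    On the point that a file can only sometimes be split on TAB correctly: the split in the original question, /\t\s*/, also swallows empty fields, because \s matches a tab too. The hypothetical helpers below compare it with splitting on exactly one tab (with a -1 limit so trailing empty fields survive). Neither handles quoted fields that contain tabs, which is what the CSV modules above take care of.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the split in the question: /\t\s*/
# treats a run of tabs as ONE separator, because \s also matches \t.
sub split_naive {
    my ($line) = @_;
    return split /\t\s*/, $line;
}

# Hypothetical helper: split on exactly one tab. The -1 limit keeps
# trailing empty fields, which split would otherwise drop.
sub split_tabs {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

my $row = "405nm\t\t42\t";    # second and fourth fields are empty
print join('|', split_naive($row)), "\n";    # 405nm|42
print join('|', split_tabs($row)), "\n";     # 405nm||42|
```

    With the naive split, the value 42 ends up in the second column instead of the third, so every later column is shifted.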

    Enjoy, Have FUN! H.Merijn
Re: Odd symbols in read file function?
by Anonymous Monk on Jul 08, 2011 at 13:38 UTC

    For some reason Perl is not able to read them and just outputs them as a question mark with a black circle behind it.

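    It's a common misconception: perl doesn't draw anything you see in your console. The ? is drawn by cmd.exe (or whatever terminal you use), typically when it receives bytes that are not valid in the encoding it expects.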
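    One way to check what perl actually read, independent of what the console draws, is to print the non-ASCII bytes in hex. This sketch assumes the data is in test.txt (or the file named on the command line); non_ascii_bytes is a made-up helper. Single bytes such as BA, B5 and B2 would be º, µ and ² in ISO-8859-1, while pairs such as C2 BA would be the UTF-8 encoding of º.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the non-ASCII bytes of a raw byte string
# as "offset:HEX" pairs.
sub non_ascii_bytes {
    my ($bytes) = @_;
    my @found;
    while ($bytes =~ /([\x80-\xFF])/g) {
        push @found, sprintf '%d:%02X', pos($bytes) - 1, ord $1;
    }
    return @found;
}

my $file = shift(@ARGV) // 'test.txt';
if (-e $file) {
    # :raw so that no decoding layer changes the bytes we inspect
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $header = <$fh>;
    close $fh;
    print join(' ', non_ascii_bytes($header // '')), "\n";
}
```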