fredo2906 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have some very simple code that needs to read the output of a shell command, and this command may return lines containing double-bytes charcters.
    my @o = `./convert.sh \"$filename\"`;
    my @array;
    foreach my $line (@o) {
        next if ($line =~ /^\s*$/);    # skip blank lines
        my %obj;
        my $error = getRecordObject($line, \%obj);
        print "Error code returned is $error for record\n$line\n" if ($error < 0);
        push @array, \%obj;
    }
    .... doing something
    print "Total Records : ".scalar(@array)."\n";
The above prints "Total Records : 12708". However, wc -l reports more lines:
    cat ../file.converted | wc -l
    12715

The 6 missing records are the lines that contain double-byte characters, for example:

20000 1 0 928 20 20131008100121164+09 79a87b78ade9ea2f4c059055ace98046.±¿±Ä»ö̳¶É_<> 902

How can I make Perl accept those lines? Thanks a lot.

Replies are listed 'Best First'.
Re: Double-bytes handling with Perl
by kcott (Archbishop) on Oct 10, 2013 at 08:29 UTC

    G'day fredo2906,

    It sounds like the open pragma would possibly resolve your issue.
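
    A minimal sketch of what that could look like, assuming (purely for illustration) that the command emits UTF-8. The open pragma sets default layers for qx// as well as open(), so backtick output is decoded inside its lexical scope. The printf call is a hypothetical stand-in for ./convert.sh, and the layer name has to be replaced with the real encoding once it is known.

```perl
use strict;
use warnings;

# Assumption: the command's output is UTF-8. Replace the layer with the
# actual encoding, e.g. ':encoding(euc-jp)', once it has been identified.
use open IN => ':encoding(UTF-8)';

# Stand-in for `./convert.sh "$filename"`: printf emits one ASCII line and
# one line that starts with the two-byte UTF-8 sequence for U+00E9.
# qx'...' (single-quote delimiter) passes the command to the shell as-is.
my @lines = qx'printf "plain\n\303\251 accented\n"';

chomp @lines;

# length() counts characters here, not bytes, because the input was decoded.
printf "%d lines read; second line has %d characters\n",
    scalar @lines, length $lines[1];
```

    If the layer does not match the real encoding, Perl may warn about malformed input, which is itself a useful hint about which lines are affected.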

    If you need further help, please advise (as already requested) what specific encoding you're dealing with: "double-bytes charcters" [sic] is not particularly helpful.

    Also, a small sample of the data could be useful. Please post this within <pre>...</pre> tags. Yes, that's contrary to other advice you may have read about posting data within <code>...</code> tags; however, Unicode characters are often rendered as character entity references when <code>...</code> tags are used. Your data won't wrap within <pre>...</pre> tags: one of the reasons I stressed "a small sample".

    -- Ken

Re: Double-bytes handling with Perl
by Corion (Patriarch) on Oct 10, 2013 at 07:59 UTC

    Maybe you've already determined the cause of your problems, but there is one potential difference between wc -l and print scalar @array:

    In your code, you skip empty lines:

    next if ($line =~ /^\s*$/);

    ... but wc -l will count these. I would keep separate counts for all types of lines encountered and output these as statistics.
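
    The per-category counts suggested above might be sketched as follows. The helper name count_line_types and the three categories are assumptions chosen for illustration; the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every line by category instead of silently
# skipping some, so the totals can be compared against `wc -l`.
sub count_line_types {
    my @lines = @_;
    my %count = (total => 0, blank => 0, high_bit => 0, ascii => 0);
    for my $line (@lines) {
        $count{total}++;
        if    ($line =~ /^\s*$/)        { $count{blank}++ }     # same test as the original skip
        elsif ($line =~ /[^\x00-\x7F]/) { $count{high_bit}++ }  # contains a non-ASCII byte
        else                            { $count{ascii}++ }
    }
    return \%count;
}

# Made-up sample input: two records, two blank lines.
my $stats = count_line_types("record one\n", "\n", "caf\xE9 record\n", "   \n");
printf "%-8s %d\n", $_, $stats->{$_} for sort keys %$stats;
```

    In the original script the same tally could be fed with @o, so that total can be checked directly against the wc -l figure.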

      Yes, I skip empty lines in Perl, just in case. But in this example there are no empty lines in the file. Thanks for your concern.

Re: Double-bytes handling with Perl (UTF16 or UTF8)?
by Anonymous Monk on Oct 10, 2013 at 07:54 UTC
      Potentially I am not supposed to know what the encoding of those characters is. Going to try use encoding "euc-jp". Thanks.
        It doesn't work. It looks like the issue is in the call to the shell command. I piped the command to "tee" and then ran a diff, and I get 6 lines that are not in the file. Very strange, because if I execute the command directly from my shell, it works perfectly.
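
One way to narrow down a difference between running the command from Perl and from an interactive shell is to read the output as raw bytes and look at the exit status and environment. This is a sketch, not a diagnosis: the helper read_command_lines is hypothetical, and the child perl process stands in for ./convert.sh.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without going through a shell, read its
# output as raw bytes, and return the lines together with the exit status.
sub read_command_lines {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run '$cmd[0]': $!";
    binmode($pipe, ':raw');    # no decoding and no newline translation
    my @lines = <$pipe>;
    close($pipe);              # closing a pipe waits for the child and sets $?
    return (\@lines, $? >> 8);
}

# Stand-in for ./convert.sh: the running perl prints one ASCII line and one
# line that contains high-bit bytes.
my ($lines, $exit) = read_command_lines(
    $^X, '-e', 'print "plain\n\xB1\xBF\xB1\xC4 line\n"'
);

# The locale variables are printed because a script may behave differently
# when they differ from those of the interactive shell.
printf "%d lines, exit status %d, LANG=%s, LC_ALL=%s\n",
    scalar @$lines, $exit, $ENV{LANG} // '(unset)', $ENV{LC_ALL} // '(unset)';
```

Comparing the printed line count, exit status, and locale variables with what the interactive shell shows would indicate whether lines are lost inside the command itself or while Perl reads them.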