Hello Monks,

I am using PDL in some modules, and I want to import numerical data from a text file using the rcols function found in PDL::IO::Misc. The rcols function will import the text file into piddles that correspond to the columns of the text file.

I stumbled across an issue when importing data from a tab-delimited text file in which some positions are blank. The handling of blank entries seems to be inconsistent. If a blank entry is in the last column of the file, the value of $PDL::undefval is used in the piddle. If a blank appears anywhere else, a value of "0" appears to be used instead.

Here is an example.

I have two data files. The data.txt file is tab-delimited and does not contain blank entries.
    1	6	11
    2	7	12
    3	8	13
    4	9	14
    5	10	15
The data_missing.txt file is tab-delimited but contains some blank entries.
    1	6	11
    2	7
    3	8	13
    4		14
    5	10	15
I use the following script to test the contents of the pdls created by rcols:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use PDL;
    use PDL::IO::Misc;

    my $file_name = shift;
    die 'No file given.' unless defined($file_name);

    open(my $fh, '<', $file_name) or die "Can not open file: $!";

    my @pdls = rcols $fh, { COLSEP => "\t" };

    foreach (@pdls) {
        print "$_\n";
    }

    exit;
The output for <data.txt> is:
    [1 2 3 4 5]
    [6 7 8 9 10]
    [11 12 13 14 15]
The output for <data_missing.txt> is:
    [1 2 3 4 5]
    [6 7 8 0 10]
    [11 0 13 14 15]
So far, so good. However, if I change the value of $PDL::undefval, I get a strange result. First, note that the default value of $PDL::undefval is zero:
    perl -MPDL -E 'say $PDL::undefval'
    0
Here is the code with $PDL::undefval set to -999.
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use PDL;
    use PDL::IO::Misc;

    my $file_name = shift;
    die 'No file given.' unless defined($file_name);

    open(my $fh, '<', $file_name) or die "Can not open file: $!";

    local $PDL::undefval = -999;

    my @pdls = rcols $fh, { COLSEP => "\t" };

    foreach (@pdls) {
        print "$_\n";
    }

    exit;
The output for <data.txt> is:
    [1 2 3 4 5]
    [6 7 8 9 10]
    [11 12 13 14 15]
The output for <data_missing.txt> is:
    [1 2 3 4 5]
    [6 7 8 0 10]
    [11 -999 13 14 15]
The value of $PDL::undefval is used in one case (where the 12 was deleted at the end of a row in the input file), but a zero is used in the other (where the 9 was deleted in the middle of a row in the input file).
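A possible explanation, offered as an assumption rather than something confirmed from the rcols source: if each line is split on COLSEP with Perl's split and no LIMIT argument, trailing empty fields are dropped. A blank in the last column would then leave that field missing (undef, so $PDL::undefval is substituted), while a blank in the middle would survive as an empty string, which numifies to 0. The sketch below uses only core Perl; naive_fields and all_fields are hypothetical helper names, not part of PDL.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): split one chomped line on tabs
# with no LIMIT argument, so trailing empty fields are discarded.
sub naive_fields {
    my ($line) = @_;
    return split /\t/, $line;
}

# Same split, but a negative LIMIT keeps trailing empty fields.
sub all_fields {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

# First line: blank in the last column. Second line: blank in the middle.
for my $line ("2\t7\t", "4\t\t14") {
    my @naive = naive_fields($line);
    my @all   = all_fields($line);
    printf "no limit: %d fields, limit -1: %d fields\n",
        scalar @naive, scalar @all;
}
```

With the first line, the unlimited split returns two fields and the limit -1 split returns three; with the second line, both return three fields, the middle one being an empty string.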

This looks like a bug to me. Does anyone else have experience using this feature of PDL?
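In the meantime, one workaround might be to fill in the blanks before the data ever reaches rcols, so that the result no longer depends on how blanks are treated. The following is a minimal sketch using core Perl only; fill_blank_fields is a hypothetical helper, and the column count and fill value are assumptions taken from the example files above. It has not been tested together with rcols.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: replace empty tab-separated fields with $fill
# and pad short rows so that every row has $ncols fields.
sub fill_blank_fields {
    my ($line, $ncols, $fill) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;       # -1 keeps trailing empty fields
    push @fields, '' while @fields < $ncols;  # pad rows that lack trailing tabs
    return join "\t", map { /\S/ ? $_ : $fill } @fields;
}

# Rows modelled on data_missing.txt: complete, blank at end, blank in middle.
print fill_blank_fields($_, 3, -999), "\n"
    for "1\t6\t11\n", "2\t7\t\n", "4\t\t14\n";
```

The cleaned lines could be written to a temporary file and that file passed to rcols in place of the original. Because every field is then populated explicitly, the fill value is the same for every column.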


In reply to Unexpected behavior when using PDL::IO::Misc::rcols with $PDL::undefval by kevbot
