karpatov has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks, I am new to Perl and I have been trying to cobble together some scripts for dataset manipulation (for a few hours :-)). I was trying to extract relevant lines from a csv file. For numbers it worked great, but for columns with strings I kept getting
Use of uninitialized value in string eq line 59 <ALL>
(one of the input files). But the value was initialized (I mean the variable was declared). I guess it must be some really basic mistake, but I have no idea what's wrong. Please help.

Thank you karpatov

The program should simplify the csv file by removing rows where a value of certain column is not in a subset of possible values.

What the program does:
1. reads a file with a keyID (1st line) and its values = keys (2nd...nth lines)
2. reads the file to be simplified line by line
3. compares the values of the column whose header equals keyID with the keys
4. lines that pass the test are written to a new file.
My script is as follows:

    #! perl -w
    use strict;
    use warnings;

    sub trim($) {
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

    my $forextract = shift; # IDs to be extracted, 1 per row, further info tab delimited
                            # header[0] is matched against header of dtp_all
    my $dtp_all = shift;    # csv CANCER60xxx.lis....
    my $dtp_my = shift;     # csv output

    print "Warning: The file with items to extract has to have a header - the first column name will be matched.\n";

    open(IDS,"<$forextract") or die "Opening $forextract failed.\n";
    open(ALL, "<$dtp_all") or die "Opening $dtp_all failed.\n";
    open(MY, ">$dtp_my") or die "Opening $dtp_my failed.\n";

    my @ids_table=<IDS>;
    close(IDS);

    my $myID="";
    my @ids=();
    my $count=0;
    my $colN=-1;

    foreach my $row (@ids_table){
        my @allcolumns = split /\t/, $row;
        if($count==0){$myID= trim($allcolumns[0]);}else{ $ids[$count] = trim($allcolumns[0]);}
        $count++;
    }

    # copy the header
    my $line = <ALL>;
    print MY $line;
    my @colnames= split /,/, $line;
    my $count2=0;
    foreach my $colname (@colnames){
        if($colname eq $myID){$colN = $count2;}
        $count2++;
    }
    if ($colN ==-1){print "Column $myID not found.\n"; exit; }

    # parse the input line by line, when ids matched write the line into the output
    while ($line = <ALL>) {
        my @columns = split /,/, $line;
        my $NSC = trim($columns[$colN]);
        foreach my $id (@ids) {
            if ($id eq $NSC) {
                print MY $line;
            }
        }
    }
    close(MY) or die "Not completed $dtp_my\n";
    close(ALL);

Re: Uninitialized value in string eq error but I initialized
by FunkyMonk (Bishop) on Jan 05, 2008 at 20:59 UTC
    But the value was initialized (I mean variable was declared).
    Declaring a variable (my $var;) and initialising a variable (my $var = 42;) are not the same thing. When a scalar variable is declared without an initialiser, it's given a value of undef. That's probably the source of your uninitialized value:
        $ perl -we 'my $x; print $x'
        Use of uninitialized value in print at -e line 1.

    I'm sure that if you were to post some sample data, somebody would offer better-targeted help.

Re: Uninitialized value in string eq error but I initialized
by FunkyMonk (Bishop) on Jan 05, 2008 at 23:32 UTC
    The part causing troubles is:
        foreach my $id (@ids) {
            if ($id eq $NSC) {
                print MY $line;
            }
        }
    and the uninitialized value is *** $id. ***.
    Well. $id comes from @ids. Where does @ids come from?

    You declare @ids about 20-ish lines above, and then populate the array with (reformatted, for clarity)

        foreach my $row (@ids_table) {
            my @allcolumns = split /\t/, $row;
            if($count==0) {
                $myID= trim($allcolumns[0]);
            }
            else {
                $ids[$count] = trim($allcolumns[0]);
            }
            $count++;
        }

    You handle the first row differently (when $count is zero it becomes $myID), but $count is still incremented for that row. The first real ID is therefore stored by $ids[$count] = ... inside the else block at index 1, and $ids[0] is never assigned. So, $ids[0] is always going to be undefined.

    If you want to add something to the end of an array, use push, not subscripts.
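
    A minimal sketch of the difference, using made-up rows (the header NSC and the two IDs are hypothetical sample data, not taken from the original files). It fills one array by subscript, the way the posted loop does, and one with push:

```perl
use strict;
use warnings;

# Hypothetical sample rows: a header line followed by two IDs.
my @rows = ("NSC\textra\n", "740\tfoo\n", "3053\tbar\n");

my (@by_index, @by_push);
my $count = 0;
for my $row (@rows) {
    my ($first) = split /\t/, $row;
    if ($count > 0) {
        $by_index[$count] = $first;   # first ID lands in slot 1, slot 0 stays undef
        push @by_push, $first;        # push appends to the end, so no gap
    }
    $count++;                         # also incremented for the header row
}

printf "by index: %d elements, first is %s\n",
    scalar @by_index, defined $by_index[0] ? $by_index[0] : 'undef';
printf "by push:  %d elements, first is %s\n",
    scalar @by_push, defined $by_push[0] ? $by_push[0] : 'undef';
```

    The subscript version ends up one element longer than the number of IDs, and that extra undef element is what eq complains about on every input line.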

      That really was the problem. Perl didn't allow me to push into an empty array, so I changed the code as follows: $ids[$count-1] = trim($allcolumns[0]), but I will push whenever possible :-). Thanks for the help to FunkyMonk and all the others.
        Perl didn't allow me to push into an empty array...

        Are you sure?

            my @stuff;
            push @stuff, 'element';
            push @stuff, 'elephant';
            local $" = ')(';
            print "(@stuff)\n";
Re: Uninitialized value in string eq error but I initialized
by pc88mxer (Vicar) on Jan 05, 2008 at 21:49 UTC
    I would bet it's because one of your input lines doesn't have enough columns, and, in particular, doesn't have the $colN-th column.

    When $columns[$colN] is undefined, trim returns undef, which makes $NSC undefined and causes the "Uninitialized value in string eq" error message.

    To check this, right after splitting the columns, just put in a check like:

        ...
        while ($line = <ALL>) {
            my @columns = split /,/, $line;
            warn "column $colN not found" unless defined($columns[$colN]);
            ...
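
    One way a row can come up short even though the file looks complete: split discards trailing empty fields unless it is given a negative limit. A small sketch with a made-up line (the values are hypothetical, and the line is chomped first so that the last field really is empty):

```perl
use strict;
use warnings;

# Hypothetical CSV line whose last two fields are empty.
my $line = "740,leukemia,,\n";
chomp $line;

my @default = split /,/, $line;       # trailing empty fields are dropped
my @keep    = split /,/, $line, -1;   # a negative limit keeps them

printf "default: %d fields, with -1: %d fields\n",
    scalar @default, scalar @keep;

# Index 3 exists only in the second list.
print '$default[3] is ', (defined $default[3] ? 'defined' : 'undefined'), "\n";
print '$keep[3] is ',    (defined $keep[3]    ? 'defined' : 'undefined'), "\n";
```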
      Thanks for the answer(s).
      The column is definitely there. It is a middle column, and I am getting the error literally a thousand times (every line, I guess).

      The part causing troubles is:

          foreach my $id (@ids) {
              if ($id eq $NSC) {
                  print MY $line;
              }
          }
      and the uninitialized value is *** $id. ***. That is what causes the error. I still get the result, but it takes longer, probably because of the thousands of error messages. Could the error be caused by the fact that I declared my @ids=()? I will definitely check those libraries, but that would take a beginner some time, so I am still looking for my error. Thanks to all, karpatov
        Well, I incorporated the recommended warn code (and will use the trick later as well), but it is not executed, and the error really is on every single line :-(.
Re: Uninitialized value in string eq error but I initialized
by hipowls (Curate) on Jan 05, 2008 at 22:37 UTC

    Welcome to the wonderful world of Perl ;) One of the best features of Perl is CPAN (http://www.cpan.org). Most common problems have a ready-made solution just waiting for you to find it.

    Have a look at Text::CSV or Text::CSV_XS; the latter uses C for speed, but you will need a compiler to install it. They provide all the functionality you need for parsing CSV files, are aware of all the traps, and have been heavily tested.

Re: Uninitialized value in string eq error but I initialized
by jwkrahn (Abbot) on Jan 06, 2008 at 00:33 UTC
    Instead of using an array for ids it may be better to use a hash, something like this:
        #! perl --
        use strict;
        use warnings;

        @ARGV == 3 or die "usage: $0 ID-file CSV-file CSV-output\n";

        my $forextract = shift; # IDs to be extracted, 1 per row, further info tab delimited
                                # header[0] is matched against header of dtp_all
        my $dtp_all = shift;    # csv CANCER60xxx.lis....
        my $dtp_my = shift;     # csv output

        print "Warning: The file with items to extract has to have a header - the first column name will be matched.\n";

        open my $IDS, '<', $forextract or die "Opening $forextract failed: $!";

        my ( $myID, %ids );
        while ( <$IDS> ) {
            /^\s*([^\t]+?)\s*\t/ or next;
            if ( $. == 1 ) {
                $myID = $1;
            }
            else {
                $ids{ $1 } = 1;
            }
        }
        close $IDS;

        open my $ALL, '<', $dtp_all or die "Opening $dtp_all failed: $!";
        open my $MY, '>', $dtp_my or die "Opening $dtp_my failed: $!";

        # copy the header
        my $header = <$ALL>;
        print $MY $header;
        my @colnames = split /,/, $header, -1;
        my $colN;
        for my $col ( 0 .. $#colnames ) {
            if ( $colnames[ $col ] eq $myID ) {
                $colN = $col;
                last;
            }
        }
        unless ( defined $colN ) {
            die "Column $myID not found.\n";
        }

        # parse the input line by line, when ids matched write the line into the output
        while ( my $line = <$ALL> ) {
            my $col = ( split /,/, $line, -1 )[ $colN ];
            s/^\s+//, s/\s+$// for $col;
            if ( exists $ids{ $col } ) {
                print $MY $line;
            }
        }
        close $MY or die "Not completed $dtp_my\n";
        close $ALL;