SixTheCat has asked for the wisdom of the Perl Monks concerning the following question:

Oh monks of the Holy Order of Perl. I bring grave news of blasphemy in my variables! I am new to Perl (only a week in), so it's very likely a problem with me, but... I'm trying to write a simple script that opens a CSV file and reads two columns, which are then used to convert one name to the other. The problem is that while each line is read correctly from the file and seems to be split correctly by the split function, the variables themselves act wonky afterwards. If I print (or say) both of the variables in the same string, the string displays fine when one variable comes first, but if the other is output first, it doesn't show up. I don't see any hidden terminating characters in the CSV file that would cause this problem. Any ideas? The CSV file contains the following data:

rs6413438,CYP2C19_10

rs4986910,CYP2C19_20

The output looks something like this

--------- Converting Star Allele references to rs numbers ---------

Current input line is

Index 0 is rs6413438

Index 0 is rs6413438 is stored as rs6413438 <-- Correct display

Index 1 is CYP2C19_10

Index 1 is CYP2C19_10

is stored as CYP2C19_10 <--- WTF, where is the first variable?

Comparing CYP2C19_10

and CYP2C19_10

Comparing CYP2C19_10

and CYP2C19_12

Current input line is

Index 0 is rs4986910

Index 0 is rs4986910 is stored as rs4986910 <-- Correct display

Index 1 is CYP2C19_20

Index 1 is CYP2C19_20

is stored as CYP2C19_20 <--- WTF

Comparing CYP2C19_20

and CYP2C19_10

Comparing CYP2C19_20

and CYP2C19_12

-------------- Done converting Star Allele references -------------

#!perl
use strict;
use 5.010;

my $STARFile;                 # File handle to reference file
my @Stars;
$Stars[0] = "CYP2C19_10";     # Mock array of values to cross reference
$Stars[1] = "CYP2C19_12";
# if(@Stars==0){return;}      # If no Star Alleles were specified then no need to do this so return to the main body

if(! open $STARFile,"<","test.csv"){die "Reference file could not be found or could not be opened.";}   # Open the Star reference file to prepare to convert information and store the file handle to $STARFile.
print "Converting specified Star Designations to SNPs...";

# The conversion table is opened so convert the Star name to rs numbers and then store the rs numbers to the @SNPs array and the corresponding Star name to the @Stars array at the same index.
my @SNPs;
my @StarsCon;
my $RefIndex;                 # Holds the line in the reference table file
my $StarIndex;                # Holds the index of the @Stars Array that is being checked
my $tmpSNPIndex;              # Holds the index in the @SNPs array that we are comparing
my $tmpStar;                  # Holds the Star Allele name
my $tmpRS;                    # Holds the SNP's rs number
my @tmpConv;                  # Holds the split Star and rs numbers

say "\n--------- Converting Star Allele references to rs numbers ---------";
while (<$STARFile>){          # Input a line from the database and as long as we haven't reached the end of the file
    chomp;                    # Remove the trailing newline
    say "Current input line is @_";
    @tmpConv = split ",",$_;  # Split the CSV line from the reference table such that @tmpConv[0] = Star name and @tmpConv[1] = rs number
    $tmpStar = $tmpConv [1];
    $tmpRS = $tmpConv[0];
    say "Index 0 is $tmpConv[0]";                         # Displays correctly
    say "Index 0 is $tmpConv[0] is stored as $tmpRS";     # Displays correctly
    say "Index 1 is $tmpConv[1]";                         # Displays correctly
    say "Index 1 is $tmpConv[1] is stored as $tmpStar";   # Displays INcorrectly
    for($StarIndex=0;$StarIndex<@Stars;$StarIndex++){
        say "Comparing $tmpStar and $Stars[$StarIndex]";
        if($tmpStar eq $Stars[$StarIndex]){               # If the current line of the database file contains the Star Allele rs number then
            $tmpSNPIndex = @SNPs;                         # Get the number of entries in the @SNPs array.
            say "1. $tmpRS was converted from $tmpStar";
            say "2. $tmpStar was converted to $tmpRS";
            say "3. $tmpStar was converted to $tmpRS";
            say "4. $tmpRS was converted from $tmpStar";
            push @StarsCon, $tmpStar;                     # Add the Star allele name to the @StarsCon array
            push @SNPs, $tmpRS;                           # Add the new rs number to the @SNPs array
            if(@Stars>0){                                 # If we have more than one SNP then
                splice @Stars,$StarIndex,1;               # and @Stars array
            }else{                                        # Otherwise Pop off the last one
                pop @Stars;                               #
            }
            last;                                         # Exit the for loop
        }
    }
    if(! @Stars>0){last;}                                 # If that was the last entry then stop searching
}
say "-------------- Done converting Star Allele references -------------";
if(@Stars>0){                                             # If any SNPs have not been found then
    say "\n"."Conversions not completed: @Stars.";        # Inform the user which ones were not found
}else{                                                    # Otherwise
    say "\n"."All conversions successful.";               # Inform the user that all were found
}
close $STARFile;                                          # Close the reference file
print "Done!\n";

Re: Variable blasphemy
by BrowserUk (Patriarch) on Jul 24, 2015 at 15:04 UTC

    This: @tmpConv[0] should be this: $tmpConv[0]. (If you had enabled use warnings, it would have told you that.)

    You use @array to refer to the whole array, and $array[i] to refer to the individual scalars it contains.

    Whether that will fix your problem I haven't checked, but it's a good start.
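
A minimal sketch of the sigil rule described above. The array contents are illustrative only (copied from the first CSV line in the question), and the variable names $count, $first and @both are made up for the example:

```perl
use strict;
use warnings;

# Illustrative data only: the two fields of the first CSV line.
my @tmpConv = ('rs6413438', 'CYP2C19_10');

my $count = @tmpConv;        # whole array in scalar context: number of elements
my $first = $tmpConv[0];     # $ sigil plus subscript: a single scalar element
my @both  = @tmpConv[0, 1];  # @ sigil plus subscript: a slice, i.e. a list of elements

print "count=$count first=$first both=@both\n";
```

Writing @tmpConv[0] asks for a one-element slice, which usually still works but is what use warnings complains about.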


      I'll give it a try. Thanks!
      Updated the code in the original post and added the "use warnings". Same output problems though. No warnings given. Sigh.
        Are you reading a Windows file on a Unix machine?
        Try changing chomp to s/\s+$//g;
        Update:

        Assuming each RSno has only one StarName and vice versa, it would be simpler to use a hash for the conversion

        #!perl
        use strict;

        my @Stars = qw(CYP2C19_10 CYP2C19_12);
        exit if (@Stars == 0);   # nothing to convert (return is only valid inside a sub)

        # each line of the reference file is: no,name
        my $ref_file = "test.csv";
        open my $star_FH,'<',$ref_file
          or die "Could not find reference file $ref_file : $!";

        my %RSno=();
        my $count=0;
        print "Reading from ref_file $ref_file .. ";
        while (<$star_FH>){
          s/\s+$//;                        # strip trailing whitespace, including \r and \n
          my ($no,$name) = split ",",$_;
          $RSno{$name} = $no;              # look up the rs number by Star name
          ++$count;
        }
        close $star_FH;
        print "$count records read\n";

        for my $StarIndex (@Stars){
          if (exists $RSno{$StarIndex}){
            print "$StarIndex is $RSno{$StarIndex}\n";
          } else {
            print "$StarIndex NO CONVERSION\n";
          }
        }
        poj
Re: Variable blasphemy
by Laurent_R (Canon) on Jul 24, 2015 at 15:52 UTC
    Hi SixTheCat,

    this looks like a file prepared under Windows being used under Unix.

    Under Unix, new line characters are \n (line feed). Under Windows, new line is a combination of two characters: \r\n (carriage return and line feed).

    If you chomp a line under Unix (or Linux), it will remove only \n (line feed).

    If the file was prepared under Windows, this will leave the carriage return, meaning that the cursor will go back to the start of the line without doing a line feed, thereby clobbering the beginning of the line.

    If that's your problem, you could change chomp to:

    s/[\r\n]+$//;
    to remove any combination of the two end-of-line characters at line end.
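
The effect described above can be reproduced without any file at all. This is only a sketch: the string below stands in for one line read on Unix from a CRLF file, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Stand-in for a line read on Unix from a file saved with Windows line endings.
my $line = "rs6413438,CYP2C19_10\r\n";

my $chomped = $line;
chomp $chomped;                          # removes only the trailing "\n"

(my $cleaned = $line) =~ s/[\r\n]+$//;   # removes "\r", "\n", or both

my $star_chomped = (split /,/, $chomped)[1];
my $star_cleaned = (split /,/, $cleaned)[1];

# The stray "\r" is invisible when printed but still counts as a character.
printf "after chomp: length %d\n", length($star_chomped);
printf "after s///:  length %d\n", length($star_cleaned);
print  "matches CYP2C19_10 after chomp? ", ($star_chomped eq 'CYP2C19_10' ? "yes" : "no"), "\n";
print  "matches CYP2C19_10 after s///?  ", ($star_cleaned eq 'CYP2C19_10' ? "yes" : "no"), "\n";
```

The chomped value is 11 characters long and fails the eq comparison, while the cleaned value is 10 characters long and matches, so the leftover carriage return breaks string comparisons as well as the display.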
      Thanks! It's all good now. Curse my use of Windows!

        Nah, Perl plays Unix and Windows well. If you have both in your environment, it would behoove you to remember the small handful of portability issues you encounter, and make handling them part of your default programming habits.

        Modules often help with this.
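
One possible default habit, sketched here as an assumption rather than as what the poster had in mind, is to read text files through PerlIO's :crlf layer, which converts each CR/LF pair to a plain "\n" on input. The temporary file and the %rs_for hash are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file with Windows (CRLF) line endings.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                            # raw bytes, so "\r\n" is written exactly as given
print {$out} "rs6413438,CYP2C19_10\r\n", "rs4986910,CYP2C19_20\r\n";
close $out or die "Cannot close $path: $!";

# The :crlf layer turns each "\r\n" pair into "\n" while reading.
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my %rs_for;
while (my $line = <$in>) {
    chomp $line;                         # now enough, the "\r" is already gone
    my ($rs, $star) = split /,/, $line;
    $rs_for{$star} = $rs;
}
close $in;

print "$_ => $rs_for{$_}\n" for sort keys %rs_for;
```

Files that already use Unix line endings pass through the layer unchanged, so the same open call can be used for both kinds of input.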

Re: Variable blasphemy
by Tux (Canon) on Jul 25, 2015 at 11:49 UTC

    If you used a proper CSV parser like Text::CSV_XS, you would not have seen this problem in the first place.


    Enjoy, Have FUN! H.Merijn
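
A minimal sketch of the parser-based approach, assuming Text::CSV_XS is installed from CPAN (it is not part of core Perl). The sample data is the CSV from the question with Windows line endings, read through an in-memory filehandle; %rs_for is a made-up name:

```perl
use strict;
use warnings;
use Text::CSV_XS;   # CPAN module, must be installed separately

# Sample data with CRLF line endings, read through an in-memory filehandle.
my $data = "rs6413438,CYP2C19_10\r\nrs4986910,CYP2C19_20\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# With no eol attribute set, the parser accepts "\n", "\r\n" and "\r" line endings.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

my %rs_for;
while (my $row = $csv->getline($fh)) {
    my ($rs, $star) = @$row;   # fields arrive without any line-ending characters
    $rs_for{$star} = $rs;
}
close $fh;

print "$_ => $rs_for{$_}\n" for sort keys %rs_for;
```

The parser also handles quoted fields and embedded commas, which a plain split on "," does not.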
Re: Variable blasphemy
by Monk::Thomas (Friar) on Jul 24, 2015 at 15:51 UTC

    I tried to reproduce this, but I don't get the same output. I have run your script as provided. (Only change: I converted the EOLs to UNIX-style.)

    $ cat test.csv 
    rs6413438,CYP2C19_10
    rs4986910,CYP2C19_20
    

    running your script yields

    Converting specified Star Designations to SNPs...
    --------- Converting Star Allele references to rs numbers ---------
    Current input line is 
    Index 0 is rs6413438
    Index 0 is rs6413438 is stored as rs6413438
    Index 1 is CYP2C19_10
    Index 1 is CYP2C19_10 is stored as CYP2C19_10
    Comparing CYP2C19_10 and CYP2C19_10
    1. rs6413438 was converted from CYP2C19_10
    2. CYP2C19_10 was converted to rs6413438
    3. CYP2C19_10 was converted to rs6413438
    4. rs6413438 was converted from CYP2C19_10
    Current input line is 
    Index 0 is rs4986910
    Index 0 is rs4986910 is stored as rs4986910
    Index 1 is CYP2C19_20
    Index 1 is CYP2C19_20 is stored as CYP2C19_20
    Comparing CYP2C19_20 and CYP2C19_12
    -------------- Done converting Star Allele references -------------
    
    I do NOT get this output:
    Index 1 is CYP2C19_10
    
    is stored as CYP2C19_10
    

    There seem to be some unexpected line breaks - this smells like a 'needs a chomp', but you already do that. Maybe you should have a second look at your csv input file?

    btw. there is a bug and 2 style problems in your code

    Bug: 'Current input line is ' does not actually print the line

    - say "Current input line is @_";
    + say "Current input line is $_";

    Style problem 1 - no need to explicitly refer to $_

    - @tmpConv = split ",",$_;
    + @tmpConv = split ",";

    Style problem 2 - variable scoping

    replace

    my $tmpStar;
    my $tmpRS;
    my @tmpConv;
    while (<$STARFile>){
        [...]
        @tmpConv = split ",",$_;
        $tmpStar = $tmpConv [1];
        $tmpRS = $tmpConv[0];

    with

    while (<$STARFile>){
        [...]
        my ($tmpRS, $tmpStar) = split ",";
      poj identified the problem. It was something about the CSV file having been made under Windows. It works great now =D