waekit has asked for the wisdom of the Perl Monks concerning the following question:

Hi, please help. I'm very new to Perl and coding, I have a big file with 15 columns and some with missing information. What I'm trying to do, is select column [ 13] with missing information and column [ 14] with no missing information; and then use column [ 4] as a key to map to column [ 2] and print all lines in column [ 2].

Here's how my data file looks like (tab delimited):

BC000 1 1 2 3 F 3 51 51 + BC000 0 2 M 999 BC000 0 3 37 36 F 65 + BC000 0 4 2 3 M 50 +50 BC000 0 5 2 3 F 45 47 + 46 BC000 0 6 2 3 F 3 42 + BC000 0 7 2 3 M 99 +9 BC000 0 8 2 3 F 3 42 + BC000 0 9 2 3 F 1 39 + BC000 0 10 2 3 F 3 35 + BC000 0 11 45 8 M 11 + BC000 0 12 45 8 F 9

Basically: column [ 13] = year of birth column [ 4] = motherID column [ 2] = individualID All motherIDs will have an individualID (the same as motherID) Here's my code:

#!/usr/bin/perl use warnings; use strict; my $filename = 'pedigree_proband_testfile.txt'; open (FILE, "<", $filename) or die "Cannot open file $!"; my @data = <FILE>; my @column; my $motherID; foreach my $line (@data) { @column = split ( /\t/, $line); if ($column[13] eq "" && $column[4] ne "" ) { $motherID = join($column[2],$column[4]); if ("$motherID" eq "$column[2]") { print "$line\t\n"; } } }

I tried this but it doesn't return any values. Can you please help? I'm really new to this, I'm trying to write a script to avoid mapping the columns manually in MS Excel as it's a big file with over 4000 rows. Many many thanks in advance!

Replies are listed 'Best First'.
Re: How to map data from one column based on another column in perl
by NetWallah (Canon) on Jun 06, 2018 at 05:39 UTC
    join($column[2],$column[4])
    Is probably not doing what you think it is doing.

    join EXPR,LIST

        Joins the separate strings of LIST into a single string with fields separated by the value of EXPR, and returns that new string

    In your case, LIST is a single item, so the result of the join will be just $column[4].

    You probably wanted:

    $motherID = "$column[2]$column[4]" # Interpolation # OR (They both do the same thing) $motherID = $column[2] . $column[4]; # Concatenation

                    Memory fault   --   brain fried

Re: How to map data from one column based on another column in perl
by hippo (Archbishop) on Jun 06, 2018 at 08:01 UTC
    What I'm trying to do, is select column [ 13] with missing information and column [ 14] with no missing information

    Unfortunately your line which does this is missing a digit. You have if ($column[13] eq "" && $column[4] ne "" ) which would do this if it said [14] instead of [4]. Is this your first problem?

    #!/usr/bin/perl use warnings; use strict; my $filename = 'pedigree_proband_testfile.txt'; open (FILE, "<", $filename) or die "Cannot open file $!"; my @data = <FILE>; my @column; my $motherID; foreach my $line (@data) { @column = split ( /\t/, $line); if ($column[13] eq "" && $column[14] ne "" ) { print "DEBUG: $line"; # It isn't clear what you are trying to do below here $motherID = join($column[2],$column[4]); if ("$motherID" eq "$column[2]") { print "$line\t\n"; } } }

    See item number 2 on the Basic debugging checklist for the use of debugging aids like this print statement.

Re: How to map data from one column based on another column in perl
by Marshall (Canon) on Jun 06, 2018 at 19:41 UTC
    The description of the problem is not enough for me to figure out what you want as an end objective.

    So, as a clarification step, I wrote code to print the columns of interest (the ones that you mentioned).
    In the code below, I translated the column indices into names based upon your post. Usually it is a good idea to use a meaningful name for a column instead of a number just because that is much easier for humans to understand. Perl is very flexible in how this can be done.

    Please explain what the desired output is for each line below.

    Also, in my opinion a tab delimited file is a difficult critter to deal with because it is difficult to see the difference between space and tab. I guess you are exporting this from an Excel table? Better is to use standard CSV or I often use pipe "|" delimiters for files that may be subsequently modified by humans. Humans cannot be expected to write correct syntax for columns containing embedded commas when using a comma delimited file. However, "|" delimited files work well because this character is not part of names or addresses. This tab delimited stuff is just hard to deal with.

    #/usr/bin/perl use strict; use warnings; $|=1; #turns stdio buffer off so that error line order is correct while (defined (my $line = <DATA>)) { chomp $line; #removes end of line character(s) next unless $line =~ /\S/; # skip blank lines my ($motherId,$yearBirth,$individualId) = (split /\t/,$line)[4,13, +2]; $yearBirth //= ""; # DATA line 6 is a bit weird # this defines $yearBirth as "" if undefined print "UniqId=$individualId Mother=$motherId BirthY=$yearBirth \ +n"; } =Prints UniqId=1 Mother=3 BirthY=51 UniqId=2 Mother= BirthY= UniqId=3 Mother=36 BirthY=65 UniqId=4 Mother=3 BirthY=50 UniqId=5 Mother=3 BirthY=47 UniqId=6 Mother=3 BirthY= UniqId=7 Mother=3 BirthY= UniqId=8 Mother=3 BirthY=42 UniqId=9 Mother=3 BirthY=39 UniqId=10 Mother=3 BirthY=35 UniqId=11 Mother=8 BirthY=11 UniqId=12 Mother=8 BirthY=9 =cut __DATA__ BC000 1 1 2 3 F 3 51 51 + BC000 0 2 M 999 BC000 0 3 37 36 F 65 + BC000 0 4 2 3 M 50 +50 BC000 0 5 2 3 F 45 47 + 46 BC000 0 6 2 3 F 3 42 + BC000 0 7 2 3 M 99 +9 BC000 0 8 2 3 F 3 42 + BC000 0 9 2 3 F 1 39 + BC000 0 10 2 3 F 3 35 + BC000 0 11 45 8 M 11 + BC000 0 12 45 8 F 9
    UPDATE:
    I converted this tab delimited file to a "|" delimited file with my text editor.
    I don't know what col[14] is supposed to mean? Added below as "what?".
    When using a visible column delimiter, we can see issues with say Male vs Female (not a single character).
    #/usr/bin/perl use strict; use warnings; $|=1; #turns stdio buffer off so that error line order is correct while (defined (my $line = <DATA>)) { chomp $line; #removes end of line character(s) next unless $line =~ /\S/; # skip blank lines my ($motherId,$yearBirth,$individualId,$what) = (split /\|/,$line) +[4,13,2,14]; $yearBirth //= ""; # DATA line 6 is a bit weird # this defines $yearBirth as "" if undefined $what //= ""; # I don't know what this is? print "UniqId=$individualId Mother=$motherId BirthY=$yearBirth W +hat=$what\n"; } =Prints UniqId=1 Mother=3 BirthY=51 What= UniqId=2 Mother= BirthY= What=999 UniqId=3 Mother=36 BirthY=65 What= UniqId=4 Mother=3 BirthY=50 What=50 UniqId=5 Mother=3 BirthY=47 What=46 UniqId=6 Mother=3 BirthY= What= UniqId=7 Mother=3 BirthY= What=999 UniqId=8 Mother=3 BirthY=42 What= UniqId=9 Mother=3 BirthY=39 What= UniqId=10 Mother=3 BirthY=35 What= UniqId=11 Mother=8 BirthY=11 What= UniqId=12 Mother=8 BirthY=9 What= =cut __DATA__ BC000|1|1|2|3|F ||3|51|||||51| BC000|0|2|||M |||||||||999 BC000|0|3|37|36|F ||||||||65| BC000|0|4|2|3|M ||||||||50|50 BC000|0|5|2|3|F |||45|||||47|46 BC000|0|6|2|3|F ||3|42|||||| BC000|0|7|2|3|M |||||||||999 BC000|0|8|2|3|F ||3||||||42| BC000|0|9|2|3|F ||1||||||39| BC000|0|10|2|3|F ||3||||||35| BC000|0|11|45|8|M ||||||||11| BC000|0|12|45|8|F ||||||||9|
Re: How to map data from one column based on another column in perl
by NetWallah (Canon) on Jun 06, 2018 at 18:19 UTC
    Here is some idiomatic code to help get you to where I think you are headed:
    #!/usr/bin/perl use warnings; use strict; use Data::Dumper; my $filename = 'pedigree_proband_testfile.txt'; my %n = (birth_year=> 13, motherID=> 4, individualID=> 2, gender => +5); my %mominfo; open my $f, "<", $filename or die "ERROR: Cannot open $filename:$!"; while (<$f>){ chomp; my @c = split /\t/, $_; my $currentMomID = $c[ $n{motherID} ] || $c[ $n{individualID} ] or +die "No Mother ID in record nbr $.=$_;"; my $currentMom = $mominfo{ $currentMomID } ||= {}; # Create an empty + mom if she does not exist if ($currentMomID == $c[ $n{individualID} ]){ print "This is Mom ($currentMomID)'s main record: $_\n"; $currentMom->{birth_year} = $c[ $n{birth_year} ]; }else{ push @{ $currentMom->{CHILDREN} }, {ID=> $c[ $n{individualID} ] , + birth_year => $c[ $n{birth_ye +ar} ], gender => $c[ $n{gender} ]}; } } print Dumper \%mominfo;
    Output:

                    Memory fault   --   brain fried