BioNrd has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to do something that Perl was made for, but it is giving me the runaround.
I have 2 text files (represented in the code by RESULT and DATA). I would like to match the file name that appears in both text files (i.e. D:/Program Files/Eclipse/workspace... with its match in the other file), and at the same time manipulate the data contained within RESULT so that it includes info from DATA.
To put it another way: RESULT has the data I need, but lacks identifying numbers. DATA has the identifying numbers but lacks the data. I want a file that has both the data I need and the ID numbers.
I have started on the idea below, but can't seem to make it the rest of the way. I need to somehow match @my_data to $woot_woot, and then reprint everything to look something like this (all spaces are tabs):
For the first match:
6
4 0.03609 0
3 0.0416887 0.0305891 0
2 0.0281343 0.0229377 0.0346726 0
5 0.0512866 0.0432893 0.0442696 0.0455792 0
Here is the code I have started. It knows where it wants to go, but just can't get there.

    open (RESULT, "FSTdist.sum") || die "Can't open FSTdist.sum: $!\n";
    #RESULT contains:
    # Sorry I don't know how to make data happen twice in a script
    # I tabbed over once from the #, the tab does not exist in the text file.
    #
    # -------------------------------------------------------------------
    # Arlequin batch run summary file"
    # file_name
    # "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" 0
    # 0.03609 0
    # 0.0416887 0.0305891 0
    # 0.0281343 0.0229377 0.0346726 0
    # 0.0512866 0.0432893 0.0442696 0.0455792 0
    # p-Values matrix : No of permutations : 110
    #
    # 0
    # 0 0
    # 0 0 0
    # 0 0 0 0
    #
    # "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" 0
    # 0.0100529 0
    # 0.00927136 0.00858208 0
    # 0.00938612 0.0144204 0.0112178 0
    # 0.00852316 0.00662521 0.00921253 0.0148837 0
    # 0.00552453 0.00510669 0.0145038 0.0130359 0.0141625 0
    # p-Values matrix : No of permutations : 110
    #
    # 0.234234
    # 0.324324 0.567568
    # 0.36036 0.027027 0.216216
    # 0.432432 0.792793 0.396396 0.00900901
    # 0.900901 0.936937 0.027027 0.0900901 0.045045
    #
    # "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" 0
    # 0.03609 0
    # 0.0416887 0.0305891 0
    # 0.0281343 0.0229377 0.0346726 0
    # 0.0512866 0.0432893 0.0442696 0.0455792 0
    # p-Values matrix : No of permutations : 110
    #
    # 0
    # 0 0
    # 0 0 0
    # 0 0 0 0
    #
    # "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" 0
    # 0.0100529 0
    # 0.00927136 0.00858208 0
    # 0.00938612 0.0144204 0.0112178 0
    # 0.00852316 0.00662521 0.00921253 0.0148837 0
    # 0.00552453 0.00510669 0.0145038 0.0130359 0.0141625 0
    # p-Values matrix : No of permutations : 110
    #
    # 0.297297
    # 0.423423 0.414414
    # 0.378378 0.027027 0.234234
    # 0.459459 0.756757 0.45045 0.027027
    # 0.882883 0.918919 0.036036 0.0990991 0.027027

    while (<RESULT>){
        @my_data=<RESULT>;
        print "@my_data\n"
    }

    my $HEADERS = 2;
    my @prev_cells;
    while (<DATA>) { # Input data is really from a different file.
        chomp;
        my @cells = split /\"/, $_;

        # Remove redundant headers.
        my @display_cells = @cells;
        if (@prev_cells) {
            for (@display_cells[ 0 .. $HEADERS-1 ]) {
                if ($_ ne shift @prev_cells and $_ =~ m/D:/) {
                    $woot_woot = $_;
                }
                $_ = '';
            }
            print "$woot_woot\n";
            print "$display_cells[3]\n";
            # I am thinking I need a hash with the $woot_woot as the key, and all values
            # of $display cells[3] as the value of the key.
            # But I can't figure how to do that.
        }
        @prev_cells = @cells;
    }
    __DATA__
    -------------------------------------------------------------------
    Arlequin batch run summary file"
    Gene diversity indices:
    file_name pop_name nb_gene_copies nb_haplotypes [orig_nb_haplotypes] num_loci _nb_usable_loci _nb_pol_sites gene_diversity
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "6" 60 58 10 10 10 0.998870
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "4" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "3" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "2" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "5" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "6" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "4" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "1" 60 59 10 10 10 0.999435
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "3" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "2" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "5" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "6" 60 58 10 10 10 0.998870
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "4" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "3" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "2" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_final_0.res/sample_20080422103123_1_final_0.arp" "5" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "6" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "4" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "1" 60 59 10 10 10 0.999435
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "3" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "2" 60 60 10 10 10 1.000000
    "D:/Program Files/Eclipse/workspace/Converter_PC/recoded_pop_gen/20080503203813/sample_20080422103123_1_starting_0.res/sample_20080422103123_1_starting_0.arp" "5" 60 60 10 10 10 1.000000
    __END__

UPDATE: I wanted to share my final solution. Here, gen_div.sum holds the __DATA__ contents from above, and pairdist.sum holds the file contents shown in the comments above. Thanks for the advice on processing files, which helped me plot out my course, and for the mock-up code. I took both and ran with them.

    my %data;
    open(A, '<', 'gen_div.sum');
    while (<A>) {
        chomp;
        my @cells = split(/\"/, $_);
        if ($cells[1] =~ m/:\//) {
            push @{$data{$cells[1]}}, $cells[3];
        }
    }
    close(A);

    open(B, '<', 'pairdist.sum');
    while (<B>) {
        chomp;
        my @cells = split(/\"/, $_);
        if ($cells[1] =~ m/:\//) {
            for my $key_finder (keys %data) {
                if ($key_finder eq $cells[1]) {
                    $unlocked = $key_finder;
                    @holder_inside = @{$data{$unlocked}};
                    @holder_sig = @holder_inside;
                    my $pop = shift @holder_inside;
                    shift @holder_sig;
                    print "$key_finder FINDER\n";
                    print "$pop\t0\n";
                }
            }
        }
        else {
            if (exists $holder_inside[0]) {
                my $pop_value = shift @holder_inside;
                print "$pop_value\t@cells\n";
            }
            else {
                @sig_check = split(/\t/, $cells[0]);
                if ($sig_check[0] =~ /[0-9]/ and not m/p-Value/) { #/[0-9]+\..*\.output/
                    my $pop_value = shift @holder_sig;
                    print "$pop_value\t";
                    foreach my $item (@sig_check) {
                        if ($item =~ m/\d/ and $item <= 0.05) {
                            print "+\t";
                        }
                        elsif ($item =~ m/\d/ and $item >= 0.05) {
                            print "-\t";
                        }
                    }
                    print "\n";
                }
            }
        }
    }

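For anyone landing here later, the core idea (a hash keyed by file name, whose value is the list of IDs, consumed one per matrix row) can be boiled down to a small self-contained sketch. This is only an illustration under assumptions: the shortened `C:/run/a.arp` path, the in-memory sample strings, and the variable names (`%ids_for`, `@pending`, `@out`) are all made up for the example and are not part of the real files or the solution above.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-ins for the two input files.
my $gen_div = join "\n",
    '"C:/run/a.arp" "6" 60 58',
    '"C:/run/a.arp" "4" 60 60',
    '"C:/run/a.arp" "3" 60 60',
    '';
my $pairdist = join "\n",
    '"C:/run/a.arp" 0',
    "0.03609\t0",
    "0.0416887\t0.0305891\t0",
    'p-Values matrix : No of permutations : 110',
    '';

# Pass 1: file name => list of population IDs, in order of appearance.
my %ids_for;
open my $ids_fh, '<', \$gen_div or die "Can't read ID data: $!";
while (my $line = <$ids_fh>) {
    next unless $line =~ /^"([^"]+)"\s+"([^"]+)"/;
    push @{ $ids_for{$1} }, $2;
}
close $ids_fh;

# Pass 2: prefix each distance-matrix row with the next ID for its file.
my @out;
my @pending;    # IDs not yet used for the current matrix
open my $dist_fh, '<', \$pairdist or die "Can't read distance data: $!";
while (my $line = <$dist_fh>) {
    chomp $line;
    if ($line =~ /^"([^"]+)"/) {
        # A quoted file name starts a new matrix; copy its ID list
        # and emit the first ID on a row of its own.
        @pending = @{ $ids_for{$1} || [] };
        push @out, shift @pending if @pending;
    }
    elsif (@pending and $line =~ /^\d/) {
        # A numeric row while IDs remain: label it with the next ID.
        push @out, join "\t", shift(@pending), $line;
    }
    else {
        # Anything else (blank line, p-Values header) ends the matrix.
        @pending = ();
    }
}
close $dist_fh;

print "$_\n" for @out;
```

With the sample strings above, this prints the ID `6` on its own line, followed by the two distance rows prefixed with `4` and `3`. Copying the ID list into `@pending` for each matrix (rather than shifting from the hash directly) means the same file name can show up more than once in the distance file without running out of IDs.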
Replies are listed 'Best First'.

Re: Combine files, while parsing info.
by pc88mxer (Vicar) on May 04, 2008 at 07:27 UTC

Re: Combine files, while parsing info.
by Erez (Priest) on May 04, 2008 at 07:03 UTC