in reply to find shortest path for each query from a CSV file

The following appears to handle the shortest path variant:

#!/usr/bin/perl
use warnings;
use strict;

my %group = (    # Hash table/dictionary for all the groups
    'P'        => 'I_1',
    'Pl'       => 'I_2',
    'P.P'      => 'I_3',
    'P.Pl'     => 'I_4',
    'Pl.P'     => 'I_5',
    'Pl.Pl'    => 'I_6',
    'P.P.P'    => 'I_7',
    'P.P.Pl'   => 'I_8',
    'P.Pl.P'   => 'I_9',
    'P.Pl.Pl'  => 'I_10',
    'Pl.P.P'   => 'I_11',
    'Pl.P.Pl'  => 'I_12',
    'Pl.Pl.P'  => 'I_13',
    'Pl.Pl.Pl' => 'I_14',
    'E'        => 'II_15',
    'P.E'      => 'II_16',
    'Pl.E'     => 'II_17',
    'P.P.E'    => 'II_18',
    'P.Pl.E'   => 'II_19',
    'Pl.P.E'   => 'II_20',
    'Pl.Pl.E'  => 'II_21',
    'E.P'      => 'III_22',
    'E.Pl'     => 'III_23',
    'P.E.P'    => 'III_24',
    'P.E.Pl'   => 'III_25',
    'Pl.E.P'   => 'III_26',
    'Pl.E.Pl'  => 'III_27',
    'E.P.P'    => 'III_28',
    'E.P.Pl'   => 'III_29',
    'E.Pl.P'   => 'III_30',
    'E.Pl.Pl'  => 'III_31',
    'E.E'      => 'IV_32',
    'P.E.E'    => 'IV_33',
    'Pl.E.E'   => 'IV_34',
    'E.P.E'    => 'IV_35',
    'E.Pl.E'   => 'IV_36',
    'E.E.P'    => 'IV_37',
    'E.E.Pl'   => 'IV_38',
    'E.E.E'    => 'IV_39',
);

<DATA>;    # Skip the headers (first row).
my %tree;

while (<DATA>) {    # parse through the input data and fill in our tree data structure
    chomp;
    my ($child, $parent, $prob) = split /\t/;

    if ($child eq 'Q') {
        push @{$tree{$child}}, {parent => '', prob => $prob, dist => 0};
        next;
    }

    if ($parent eq 'Q') {
        push @{$tree{$child}}, {parent => $parent, prob => $prob, dist => 1};
        next;
    }

    for my $opt (@{$tree{$parent}}) {
        my $dist = $opt->{dist} + 1;
        push @{$tree{$child}},
            {parent => $parent, prob => $prob, dist => $dist};
    }
}

for my $child (sort {length $a <=> length $b or $a cmp $b} keys %tree) {
    my @bestPath = findBestPath($child, \%tree);
    my $probs = join '.', map {$_->{prob}} @bestPath;

    printf "%-5s ", "$child:";
    # Join the likelihood path. Then if group is found for a likelihood
    # from the group hash table then print it, else quit
    print join '<-', $child, grep {$_} map {$_->{parent}} @bestPath;
    print ", $probs";
    print ", $group{$probs}" if exists $group{$probs};
    print "\n";
}

sub findBestPath {
    my ($child, $tree) = @_;

    return $tree->{Q}[0] if $child eq 'Q';

    my @alts = sort {$a->{dist} <=> $b->{dist}} @{$tree->{$child}};
    return $alts[0], findBestPath($alts[0]{parent}, $tree);
}

__DATA__
child, Parent, likelihood
M7	Q	P
M54	M7	Pl
M213	M54	E
M206	M54	E
M194	M54	E
...

Prints (in part):

Q:    Q, E, II_15
M6:   M6<-Q, E.E, IV_32
M7:   M7<-Q, P.E, II_16
M10:  M10<-Q, E.E, IV_32
M13:  M13<-M7<-Q, E.P.E, IV_35
M17:  M17<-Q, P.E, II_16
M18:  M18<-Q, E .E
M22:  M22<-Q, E.E, IV_32
M23:  M23<-Q, E.E, IV_32
M28:  M28<-M6<-Q, P.E.E, IV_33
M33:  M33<-M28<-M6<-Q, E.P.E.E
True laziness is hard work

Re^2: find shortest path for each query from a CSV file
by zing (Beadle) on Nov 22, 2013 at 12:07 UTC
    Sorry, but I'm getting this error (even though I tried the download link under your code):

    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M7:   M7<-Q, P.
    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M54:  M54<-M7<-Q, Pl.P.
    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M194: M194<-M54<-M7<-Q, E.Pl.P.
    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M206: M206<-M54<-M7<-Q, E.Pl.P.
    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M213: M213<-M54<-M7<-Q, E.Pl.P.

      I truncated the data in the code I posted to reduce the number of uninteresting lines. The '...' is an ellipsis and is used to indicate missing data. If you substitute the data from Re^2: find shortest path for each query from a CSV file the code runs correctly without warnings.
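The warning can be traced in the posted code: findBestPath returns $tree->{Q}[0] once it reaches Q, and the truncated sample has no row with Q in the child column, so that entry is undefined when the likelihoods are joined. A minimal sketch of a guard follows, assuming the same tree layout as the script. The helper name has_root_row is hypothetical and not part of the posted code, and the root likelihood 'E' is taken from the "Q: Q, E, II_15" output line.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the parsed tree has at least one entry for the
# root node (default 'Q'), which findBestPath needs as its stopping point.
sub has_root_row {
    my ($tree, $root) = @_;
    $root = 'Q' unless defined $root;
    return (exists $tree->{$root} && @{$tree->{$root}}) ? 1 : 0;
}

# Truncated sample from the post: no row has Q in the child column.
my %truncated = (
    M7  => [{parent => 'Q',  prob => 'P',  dist => 1}],
    M54 => [{parent => 'M7', prob => 'Pl', dist => 2}],
);

# Same data plus a root row, as the full data set is expected to supply.
my %complete = (%truncated, Q => [{parent => '', prob => 'E', dist => 0}]);

print "truncated: ", (has_root_row(\%truncated) ? "ok" : "no Q row"), "\n";
print "complete:  ", (has_root_row(\%complete)  ? "ok" : "no Q row"), "\n";
```

Calling such a check right after the while (<DATA>) loop would turn the repeated warning into one clear message about the missing row.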

Re^2: find shortest path for each query from a CSV file
by Anonymous Monk on Nov 22, 2013 at 19:52 UTC
    Please help, I'm still getting this error:

    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M7:   M7<-Q, P.
    Use of uninitialized value in join or string at check_22nov_metabolite_pred_2.pl line 74, <DATA> line 6.
    M54:  M54<-M7<-Q, Pl.P.

      Just glancing at the code GrandFather posted, I see a couple of spots where the term 'prob' appears to be a bare word. Should it be a variable at some point? I could be wrong here. There are also three dots in the __DATA__ section that should probably be deleted.
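On the bare word question: in Perl the => operator quotes a simple identifier on its left, so prob => $prob uses the string 'prob' as a hash key, and a simple identifier inside {} in $_->{prob} is likewise treated as a string. Both are valid under use strict. A small illustration, where make_entry is a name made up for this sketch:

```perl
use strict;
use warnings;

# Hypothetical constructor mirroring the hash entries in the posted script.
sub make_entry {
    my ($parent, $prob, $dist) = @_;
    # parent, prob and dist on the left of => are auto-quoted string keys.
    return {parent => $parent, prob => $prob, dist => $dist};
}

my $entry = make_entry('M7', 'Pl', 2);

# A simple identifier inside {} is also treated as a quoted string,
# so these two lookups read the same element.
print "same key\n" if $entry->{prob} eq $entry->{'prob'};
print join(',', sort keys %$entry), "\n";
```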

Re^2: find shortest path for each query from a CSV file
by zing (Beadle) on Nov 25, 2013 at 06:05 UTC
    Sorry, but there's a problem with the code. For example, consider the 2nd line of your output. It gives a doubled probability in the third column (E.E):

    M6:   M6<-Q, E.E, IV_32

    whereas according to the __DATA__, M6 comes directly from Q:

    M6    Q    E

    Thus the correct output should be:  M6:   M6<-Q, E, II_15

    To give you an intuition: this line in the data means that the probability of M6 coming from Q is 'E'. That is what I want in the third column. Suppose instead that M6 came from M76, which in turn comes from Q (M6<-M76<-Q), and the input data for these were:

    __DATA__
    M6    M76    E
    M76    Q    E

    Then in this case M6: M6<-M76<-Q, E.E, IV_32 would have been the correct output.

    So basically the third column is giving the wrong output, and because the fourth column is derived from the third, it is incorrect as well.
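The extra E comes from the root: findBestPath ends its list with the Q entry, and that entry's own likelihood (E, per the "Q: Q, E, II_15" output line) is joined onto every path. One possible adjustment, offered as a sketch rather than a fix tested against the full data set, is to join only the entries that have a parent. The helper name edge_probs is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join the likelihoods along a path as returned by
# findBestPath, leaving out the root entry (the one with an empty parent).
# A path consisting of the root alone keeps the root's own likelihood.
sub edge_probs {
    my @path  = @_;
    my @edges = grep { $_->{parent} ne '' } @path;
    @edges = @path unless @edges;
    return join '.', map { $_->{prob} } @edges;
}

my $root = {parent => '',    prob => 'E', dist => 0};
my $m76  = {parent => 'Q',   prob => 'E', dist => 1};
my $m6   = {parent => 'M76', prob => 'E', dist => 2};

# M6 directly under Q: one edge.
print edge_probs({parent => 'Q', prob => 'E', dist => 1}, $root), "\n";
# M6<-M76<-Q: two edges.
print edge_probs($m6, $m76, $root), "\n";
```

In the posted script this would stand in for the line that builds $probs, for example my $probs = edge_probs(@bestPath);, so that M6<-Q maps to E (II_15) and M6<-M76<-Q maps to E.E (IV_32).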

      Sorry, but any problem with the code is now your problem. A trivial examination of the output, and a little thought about it, will tell you why it is as it is. Feel free to correct the code as you see fit.
