find shortest path for each query from a CSV file

Replies are listed 'Best First'.
Re: find shortest path for each query from a CSV file by marinersk (Priest) on Nov 22, 2013 at 06:37 UTC
I'd recommend `Text::CSV_XS` for reading the CSV file, though the structure seems reliable enough to simply use `split` if you wish. I'd start by storing the data in a hash of hashes, but perhaps hash of arrays will be needed if multiple children nodes are possible. There might be some useful tools in `List::Utils`, but if not, I'd probably store a sister element with depth as I walked through the tree (more efficient to build that whilst assembling the master tree, I should think). From there it is likely to be straight subtraction of the depth values and select the smallest one(s). Would likely expect a tree walk for that part. Probably build a reverse path hash on the way in, report on it on the way out. For more details, post some code and show what isn't working as expected.	[reply] [d/l] [select]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 22, 2013 at 09:15 UTC
The code as requested. But its printing only the longest path. Read more... (10 kB)	[reply] [d/l]
Re: find shortest path for each query from a CSV file by Tux (Canon) on Nov 22, 2013 at 07:18 UTC
Read the data with Text::CSV_XS (or Text::CSV) and then model into a Graph where you can use methods like `SPT_Dijkstra`, `SPT_Bellman_Ford`, `APSP_Floyd_Warshall` and their variants. Searching CPAN might help too with various solutions to shortest-path problems. Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re: find shortest path for each query from a CSV file by GrandFather (Saint) on Nov 22, 2013 at 11:53 UTC
The following looks to do the shortest path variant: #!/usr/bin/perl use warnings; use strict; my %group = ( # Hash table/dictionary for all the groups 'P' => 'I_1', 'Pl' => 'I_2', 'P.P' => 'I_3', 'P.Pl' => 'I_4', 'Pl.P' => 'I_5', 'Pl.Pl' => 'I_6', 'P.P.P' => 'I_7', 'P.P.Pl' => 'I_8', 'P.Pl.P' => 'I_9', 'P.Pl.Pl' => 'I_10', 'Pl.P.P' => 'I_11', 'Pl.P.Pl' => 'I_12', 'Pl.Pl.P' => 'I_13', 'Pl.Pl.Pl' => 'I_14', 'E' => 'II_15', 'P.E' => 'II_16', 'Pl.E' => 'II_17', 'P.P.E' => 'II_18', 'P.Pl.E' => 'II_19', 'Pl.P.E' => 'II_20', 'Pl.Pl.E' => 'II_21', 'E.P' => 'III_22', 'E.Pl' => 'III_23', 'P.E.P' => 'III_24', 'P.E.Pl' => 'III_25', 'Pl.E.P' => 'III_26', 'Pl.E.Pl' => 'III_27', 'E.P.P' => 'III_28', 'E.P.Pl' => 'III_29', 'E.Pl.P' => 'III_30', 'E.Pl.Pl' => 'III_31', 'E.E' => 'IV_32', 'P.E.E' => 'IV_33', 'Pl.E.E' => 'IV_34', 'E.P.E' => 'IV_35', 'E.Pl.E' => 'IV_36', 'E.E.P' => 'IV_37', 'E.E.Pl' => 'IV_38', 'E.E.E' => 'IV_39', ); <DATA>; # Skip the headers (first row). my %tree; while (<DATA>) { # parse through the input data and fill in our tree data structure chomp; my ($child, $parent, $prob) = split /\t/; if ($child eq 'Q') { push @{$tree{$child}}, {parent => '', prob => $prob, dist => 0 +}; next; } if ($parent eq 'Q') { push @{$tree{$child}}, {parent => $parent, prob => $prob, dist + => 1}; next; } for my $opt (@{$tree{$parent}}) { my $dist = $opt->{dist} + 1; push @{$tree{$child}}, {parent => $parent, prob => $prob, dist => $dist}; } } for my $child (sort {length $a <=> length $b or $a cmp $b} keys %tree) + { my @bestPath = findBestPath($child, \%tree); my $probs = join '.', map {$_->{prob}} @bestPath; printf "%-5s ", "$child:"; # Join the likelihood path. Then if group is found for a likelihoo +d #from the group hash table then print it, else quit print join '<-', $child, grep {$_} map {$_->{parent}} @bestPath; print ", $probs"; print ", $group{$probs}" if exists $group{$probs}; print "\n"; } sub findBestPath { my ($child, $tree) = @_; return $tree->{Q}[0] if $child eq 'Q'; my @alts = sort {$a->{dist} <=> $b->{dist}} @{$tree->{$child}}; return $alts[0], findBestPath($alts[0]{parent}, $tree); } __DATA__ child, Parent, likelihood M7 Q P M54 M7 Pl M213 M54 E M206 M54 E M194 M54 E ... [download] Prints (in part): `Q: Q, E, II_15 M6: M6<-Q, E.E, IV_32 M7: M7<-Q, P.E, II_16 M10: M10<-Q, E.E, IV_32 M13: M13<-M7<-Q, E.P.E, IV_35 M17: M17<-Q, P.E, II_16 M18: M18<-Q, E .E M22: M22<-Q, E.E, IV_32 M23: M23<-Q, E.E, IV_32 M28: M28<-M6<-Q, P.E.E, IV_33 M33: M33<-M28<-M6<-Q, E.P.E.E` [download] True laziness is hard work	[reply] [d/l] [select]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 22, 2013 at 12:07 UTC
Sorry but Im getting this error (even though I have tried download link under your code) :- Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M7: M7<-Q, P. Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M54: M54<-M7<-Q, Pl.P. Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M194: M194<-M54<-M7<-Q, E.Pl.P. Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M206: M206<-M54<-M7<-Q, E.Pl.P. Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M213: M213<-M54<-M7<-Q, E.Pl.P. [download]	[reply] [d/l]
Re^3: find shortest path for each query from a CSV file by GrandFather (Saint) on Nov 22, 2013 at 23:15 UTC
I truncated the data in the code I posted to reduce the number of uninteresting lines. The '...' is an ellipsis and is used to indicate missing data. If you substitute the data from Re^2: find shortest path for each query from a CSV file the code runs correctly without warnings. True laziness is hard work	[reply]
Re^2: find shortest path for each query from a CSV file by Anonymous Monk on Nov 22, 2013 at 19:52 UTC
Please help Im still getting this error :- `Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M7: M7<-Q, P. Use of uninitialized value in join or string at check_22nov_metabolite +_pred_2.pl line 74, <DATA> line 6. M54: M54<-M7<-Q, Pl.P.` [download]	[reply] [d/l]
Re^3: find shortest path for each query from a CSV file by VincentK (Beadle) on Nov 22, 2013 at 21:33 UTC
Just glancing at the code GrandFather posted I see a couple of spots where term 'prob' appears to be a bare word. At some point should it be a variable? I could be wrong here. There are also three dots in the __DATA__ portion that should probably be deleted too.	[reply]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 25, 2013 at 06:05 UTC
Sorry but theres a problem with the code. For example consider 2nd line of you output. Its giving double probablity in third column (E.E) `M6: M6<-Q, E.E, IV_32` Whereas according to the __DATA__ M6 is coming directly from Q `M6 Q E` Thus correct output should be :- `M6: M6<-Q, E, II_15` To give you an intuition this line in data means that the probability of M6 coming from Q is 'E'. That is what I want in third column. If suppose M6 were coming from M76 which in turn comes from Q `M6<-M76<-Q` and the input data for these were `__DATA__ M6 M76 E M76 Q E` [download] Then in this case M6: M6<-M76<-Q, E.E, IV_32 would have been a correct output. So basically the third column is giving wront output, due to which fourth column is also giving incorrect results as it is based on third for its input.	[reply] [d/l] [select]
Re^3: find shortest path for each query from a CSV file by GrandFather (Saint) on Nov 25, 2013 at 06:29 UTC
Sorry, but any problem with the code is now your problem. A trivial examination of the output and thinking about it will tell you why it is as it as. Feel free to correct the code as you see fit. True laziness is hard work	[reply]
Re: find shortest path for each query from a CSV file by hdb (Monsignor) on Nov 22, 2013 at 10:11 UTC
I could not resist but I will leave the testing to you. I also suspect you have circles in your data. Being lazy, I have also used some global variables. use warnings; no warnings 'recursion'; use strict; <DATA>; # Skip the headers (first row). my %edges; my %kids; while (<DATA>) { chomp; my ($child, $parent, $likelihood) = split /\t/; $edges{$parent}{$child} = $likelihood; $kids{$child} = 1; } my $nkids = scalar keys %kids; my %pathes = ( Q => { path => [ 'Q' ], lh => [] } ); sub shortest_pathes { my( $end, $level ) = @_; return if $level > $nkids; return unless $edges{$end}; my @lh = @{$pathes{$end}{lh}}; my @path = @{$pathes{$end}{path}}; my @pre = keys %{$edges{$end}}; for( @pre ) { if( ( !exists $pathes{$_} ) or ( scalar @{$pathes{$_}{ +path}} > 1 + scalar @path ) ) { $pathes{$_} = { path => [ $_, @path ], lh => [ + $edges{$end}{$_}, @lh ] }; } } shortest_pathes( $_, $level+1 ) for @pre; } shortest_pathes 'Q', 0; delete $pathes{Q}; for (sort {substr($a,1) <=> substr($b,1)} keys %pathes) { print "$_\t"; print join "<-", @{$pathes{$_}{path}}; print "\t"; print join ".", @{$pathes{$_}{lh}}; print "\n"; } __DATA__ child, Parent, likelihood M7 Q P M54 M7 Pl M213 M54 E M206 M54 E M194 M54 E M53 M7 Pl M186 M53 Pl M194 M53 Pl M187 M53 E M204 M53 E M201 M53 E M202 M53 E M179 M53 E M13 M53 E M157 M53 E M173 M53 E M205 M53 E M195 M53 E M196 M53 E M197 M53 E M198 M53 E M57 M7 E M44 M7 E M61 M7 E M13 M7 E M50 M7 E M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M51 M7 E M174 M51 Pl M50 M51 Pl M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M173 M51 Pl M175 M51 E M176 M51 E M62 M7 E M55 M7 E M17 Q P M83 M17 Pl M54 M17 Pl M213 M54 E M206 M54 E M194 M54 E M84 M17 E M101 M17 E M98 M17 E M99 M17 E M76 M17 E M206 M76 P M278 M76 E M289 M76 E M13 M76 E M282 M76 E M283 M76 E M294 M76 E M285 M76 E M13 M17 E M88 M17 E M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M89 M17 E M174 M89 P M88 M89 P M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M326 M89 Pl M333 M89 E M335 M89 E M336 M89 E M283 M89 E M330 M89 E M331 M89 E M332 M89 E M102 M17 E M92 M17 E M93 M17 E M94 M17 E M95 M17 E M22 Q E M6 Q E M28 M6 P M115 M28 Pl M114 M28 Pl M112 M28 E M117 M28 E M113 M28 E M107 M28 E M33 M28 E M50 M28 E M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M51 M28 E M174 M51 Pl M50 M51 Pl M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M173 M51 Pl M175 M51 E M176 M51 E M7 M28 E M54 M7 Pl M213 M54 E M206 M54 E M194 M54 E M53 M7 Pl M186 M53 Pl M194 M53 Pl M187 M53 E M204 M53 E M201 M53 E M202 M53 E M179 M53 E M13 M53 E M157 M53 E M173 M53 E M205 M53 E M195 M53 E M196 M53 E M197 M53 E M198 M53 E M57 M7 E M44 M7 E M61 M7 E M13 M7 E M50 M7 E M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M51 M7 E M174 M51 Pl M50 M51 Pl M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E M160 M50 E M161 M50 E M173 M51 Pl M175 M51 E M176 M51 E M62 M7 E M55 M7 E M36 M6 P M126 M36 Pl M115 M36 Pl M131 M36 E M132 M36 E M127 M36 E M138 M36 E M139 M36 E M120 M36 E M33 M36 E M88 M36 E M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M89 M36 E M174 M89 P M88 M89 P M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M326 M89 Pl M333 M89 E M335 M89 E M336 M89 E M283 M89 E M330 M89 E M331 M89 E M332 M89 E M134 M36 E M135 M36 E M136 M36 E M17 M36 E M83 M17 Pl M54 M17 Pl M213 M54 E M206 M54 E M194 M54 E M84 M17 E M101 M17 E M98 M17 E M99 M17 E M76 M17 E M206 M76 P M278 M76 E M289 M76 E M13 M76 E M282 M76 E M283 M76 E M294 M76 E M285 M76 E M13 M17 E M88 M17 E M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M89 M17 E M174 M89 P M88 M89 P M158 M88 P M306 M88 Pl M307 M88 Pl M317 M88 E M318 M88 E M319 M88 E M320 M88 E M282 M88 E M299 M88 E M311 M88 E M312 M88 E M313 M88 E M314 M88 E M315 M88 E M316 M88 E M326 M89 Pl M333 M89 E M335 M89 E M336 M89 E M283 M89 E M330 M89 E M331 M89 E M332 M89 E M102 M17 E M92 M17 E M93 M17 E M94 M17 E M95 M17 E M34 M6 E M35 M6 E Q M6 E M10 Q E M46 M10 P M216 M46 P M189 M46 P M244 M46 E M246 M46 E M247 M46 E M248 M46 E M249 M46 E M243 M46 E M70 M10 P M257 M70 Pl M216 M70 Pl M267 M70 E M265 M70 E M266 M70 E M250 M70 E M268 M70 E M269 M70 E M270 M70 E M261 M70 E M262 M70 E M263 M70 E M264 M70 E M72 M10 E M75 M10 E M71 M10 E M23 Q E M18 Q E [download]	[reply] [d/l]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 22, 2013 at 10:53 UTC
Thanks pal. What if I also wanted a 4th column for corresponding groups for each likelihood combination in the output. For example , which would then take values from this hash :- my %group = ( # Hash table/dictionary for all the group +s 'P'=> 'I_1', 'Pl'=>'I_2', 'P.P'=>'I_3', 'P.Pl'=>'I_4', 'Pl.P'=>'I_5', 'Pl.Pl'=>'I_6', 'P.P.P'=>'I_7', 'P.P.Pl'=>'I_8', 'P.Pl.P'=>'I_9', 'P.Pl.Pl'=>'I_10', 'Pl.P.P'=>'I_11', 'Pl.P.Pl'=>'I_12', 'Pl.Pl.P'=>'I_13', 'Pl.Pl.Pl'=>'I_14', 'E'=> 'II_15', 'P.E' => 'II_16', 'Pl.E' => 'II_17', 'P.P.E'=>'II_18', 'P.Pl.E'=>'II_19', 'Pl.P.E'=>'II_20', 'Pl.Pl.E'=>'II_21', 'E.P'=> 'III_22', 'E.Pl'=>'III_23', 'P.E.P'=>'III_24', 'P.E.Pl'=>'III_25', 'Pl.E.P'=>'III_26', 'Pl.E.Pl'=>'III_27', 'E.P.P'=>'III_28', 'E.P.Pl'=>'III_29', 'E.Pl.P'=>'III_30', 'E.Pl.Pl'=>'III_31', 'E.E'=>'IV_32', 'P.E.E'=>'IV_33', 'Pl.E.E'=>'IV_34', 'E.P.E'=>'IV_35', 'E.Pl.E'=>'IV_36', 'E.E.P'=>'IV_37', 'E.E.Pl'=>'IV_38', 'E.E.E'=>'IV_39', ); [download] So if my third column "in the output" has value P.P.P, fourth column will have the group_id (taken from the "group" hash) as I_7, and so on.	[reply] [d/l]
Re: find shortest path for each query from a CSV file by hdb (Monsignor) on Nov 22, 2013 at 07:12 UTC
I would pick one of the solutions from constructing a tree from csv file and then discard all but the shortest pathes.	[reply]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 22, 2013 at 09:06 UTC
Actually that solution is showing only the longest path. I dont know how to tweak it now !!!!!	[reply]
Re: find shortest path for each query from a CSV file by LanX (Saint) on Nov 22, 2013 at 23:31 UTC
Did you try good old Dijkstra's algorithm ? Cheers Rolf ( addicted to the Perl Programming Language)	[reply]
Re: find shortest path for each query from a CSV file by oiskuu (Hermit) on Nov 23, 2013 at 16:57 UTC
Probably rather inefficient, but it might do. #! /usr/bin/perl my %DEF = ( I => [qw( P Pl P.P P.Pl Pl.P Pl.Pl P.P.P P.P.Pl P.Pl.P P.Pl.Pl Pl. +P.P Pl.P.Pl Pl.Pl.P Pl.Pl.Pl )], II => [qw( E P.E Pl.E P.P.E P.Pl.E Pl.P.E Pl.Pl.E )], III => [qw( E.P E.Pl P.E.P P.E.Pl Pl.E.P Pl.E.Pl E.P.P E.P.Pl E.Pl.P + E.Pl.Pl )], IV => [qw( E.E P.E.E Pl.E.E E.P.E E.Pl.E E.E.P E.E.Pl E.E.E )] ); my @rank = map @$_, @DEF{qw(I II III IV)}; my %rank = map {$rank[$_-1] => $_} 1..@rank; my @group = map {($_) x @{$DEF{$_}}} qw(I II III IV); my %group = map {$rank[$_-1] => $group[$_-1]."_".$_} 1..@group; sub rank { $rank{$a->[2]} <=> $rank{$b->[2]} } my %T; sub oh { map values %$_, @_ } sub ab { my ($b, $a) = @_; [$b->[0], $a->[1], qq($a->[2].$b->[2]), qq($b->[3]<-$a->[3])] } sub xtend { my $a = shift; map {ab $_, $a} oh @{$T{$a->[0]}}{@_} } sub ins { $T{$_[3] //= $_[1]}{$_[2]}{$_[0]} = \@_ } ins split /,\s*/ for <DATA>; ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T); ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T); for (sort {rank} grep {$_->[1] eq 'Q'} (oh oh oh \%T)) { printf "%-4s: %20s, %-8s %6s\n", $_->[0], qq($_->[0]<-$_->[3]), $_->[2], $group{$_->[2]}; } __DATA__ M7, Q, P, M7, M28, E, M28, M6, E, M6, Q, Pl, [download]	[reply] [d/l]
Re^2: find shortest path for each query from a CSV file by zing (Beadle) on Nov 25, 2013 at 06:12 UTC
Thanks but your code doesnt print anything on this data `__DATA__ M7 Q P M54 M7 Pl M213 M54 E M206 M54 E M194 M54 E M53 M7 Pl M186 M53 Pl M194 M53 Pl M187 M53 E M204 M53 E M201 M53 E M202 M53 E M179 M53 E M13 M53 E M157 M53 E M173 M53 E M205 M53 E M195 M53 E M196 M53 E M197 M53 E M198 M53 E M57 M7 E M44 M7 E M61 M7 E M13 M7 E M50 M7 E M158 M50 P M157 M50 P M153 M50 Pl M162 M50 E M164 M50 E M165 M50 E M147 M50 E M159 M50 E` [download]	[reply] [d/l]

Text::CSV_XS

split

I'd start by storing the data in a hash of hashes, but perhaps hash of arrays will be needed if multiple children nodes are possible.

There might be some useful tools in List::Utils, but if not, I'd probably store a sister element with depth as I walked through the tree (more efficient to build that whilst assembling the master tree, I should think).

From there it is likely to be straight subtraction of the depth values and select the smallest one(s). Would likely expect a tree walk for that part.

Probably build a reverse path hash on the way in, report on it on the way out.

For more details, post some code and show what isn't working as expected.

[reply]
[d/l]
[select]

Read more... (10 kB)

[reply]
[d/l]

Read the data with Text::CSV_XS (or Text::CSV) and then model into a Graph where you can use methods like SPT_Dijkstra, SPT_Bellman_Ford, APSP_Floyd_Warshall and their variants.

Searching CPAN might help too with various solutions to shortest-path problems.

Enjoy, Have FUN! H.Merijn

[reply]
[d/l]
[select]

The following looks to do the shortest path variant:

#!/usr/bin/perl
use warnings;
use strict;

my %group = (    # Hash table/dictionary for all the  groups
    'P'        => 'I_1',
    'Pl'       => 'I_2',
    'P.P'      => 'I_3',
    'P.Pl'     => 'I_4',
    'Pl.P'     => 'I_5',
    'Pl.Pl'    => 'I_6',
    'P.P.P'    => 'I_7',
    'P.P.Pl'   => 'I_8',
    'P.Pl.P'   => 'I_9',
    'P.Pl.Pl'  => 'I_10',
    'Pl.P.P'   => 'I_11',
    'Pl.P.Pl'  => 'I_12',
    'Pl.Pl.P'  => 'I_13',
    'Pl.Pl.Pl' => 'I_14',
    'E'        => 'II_15',
    'P.E'      => 'II_16',
    'Pl.E'     => 'II_17',
    'P.P.E'    => 'II_18',
    'P.Pl.E'   => 'II_19',
    'Pl.P.E'   => 'II_20',
    'Pl.Pl.E'  => 'II_21',
    'E.P'      => 'III_22',
    'E.Pl'     => 'III_23',
    'P.E.P'    => 'III_24',
    'P.E.Pl'   => 'III_25',
    'Pl.E.P'   => 'III_26',
    'Pl.E.Pl'  => 'III_27',
    'E.P.P'    => 'III_28',
    'E.P.Pl'   => 'III_29',
    'E.Pl.P'   => 'III_30',
    'E.Pl.Pl'  => 'III_31',
    'E.E'      => 'IV_32',
    'P.E.E'    => 'IV_33',
    'Pl.E.E'   => 'IV_34',
    'E.P.E'    => 'IV_35',
    'E.Pl.E'   => 'IV_36',
    'E.E.P'    => 'IV_37',
    'E.E.Pl'   => 'IV_38',
    'E.E.E'    => 'IV_39',
);

<DATA>;    # Skip the headers (first row).

my %tree;
while (<DATA>) {
    # parse through the input data and fill in our tree data structure
    chomp;
    my ($child, $parent, $prob) = split /\t/;

    if ($child eq 'Q') {
        push @{$tree{$child}}, {parent => '', prob => $prob, dist => 0
+};
        next;
    }

    if ($parent eq 'Q') {
        push @{$tree{$child}}, {parent => $parent, prob => $prob, dist
+ => 1};
        next;
    }

    for my $opt (@{$tree{$parent}}) {
        my $dist = $opt->{dist} + 1;
        push @{$tree{$child}},
            {parent => $parent, prob => $prob, dist => $dist};
    }
}

for my $child (sort {length $a <=> length $b or $a cmp $b} keys %tree)
+ {
    my @bestPath = findBestPath($child, \%tree);
    my $probs = join '.', map {$_->{prob}} @bestPath;
    printf "%-5s ", "$child:";

    # Join the likelihood path. Then if group is found for a likelihoo
+d
    #from the group hash table then print it, else quit
    print join '<-', $child, grep {$_} map {$_->{parent}} @bestPath;
    print ", $probs";
    print ", $group{$probs}" if exists $group{$probs};
    print "\n";
}

sub findBestPath {
    my ($child, $tree) = @_;

    return $tree->{Q}[0] if $child eq 'Q';

    my @alts = sort {$a->{dist} <=> $b->{dist}} @{$tree->{$child}};
    return $alts[0], findBestPath($alts[0]{parent}, $tree);
}

__DATA__
child,    Parent,    likelihood
M7    Q    P
M54    M7    Pl
M213    M54    E
M206    M54    E
M194    M54    E
...
[download]

Prints (in part):

Q:    Q, E, II_15
M6:   M6<-Q, E.E, IV_32
M7:   M7<-Q, P.E, II_16
M10:  M10<-Q, E.E, IV_32
M13:  M13<-M7<-Q, E.P.E, IV_35
M17:  M17<-Q, P.E, II_16
M18:  M18<-Q, E .E
M22:  M22<-Q, E.E, IV_32
M23:  M23<-Q, E.E, IV_32
M28:  M28<-M6<-Q, P.E.E, IV_33
M33:  M33<-M28<-M6<-Q, E.P.E.E
[download]

True laziness is hard work

[reply]
[d/l]
[select]

Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M7:   M7<-Q, P.
Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M54:  M54<-M7<-Q, Pl.P.
Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M194: M194<-M54<-M7<-Q, E.Pl.P.
Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M206: M206<-M54<-M7<-Q, E.Pl.P.
Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M213: M213<-M54<-M7<-Q, E.Pl.P.
[download]

[reply]
[d/l]

I truncated the data in the code I posted to reduce the number of uninteresting lines. The '...' is an ellipsis and is used to indicate missing data. If you substitute the data from Re^2: find shortest path for each query from a CSV file the code runs correctly without warnings.

True laziness is hard work

[reply]

Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M7:   M7<-Q, P.
Use of uninitialized value in join or string at check_22nov_metabolite
+_pred_2.pl line 74, <DATA> line 6.
M54:  M54<-M7<-Q, Pl.P.
[download]

[reply]
[d/l]

Just glancing at the code GrandFather posted I see a couple of spots where term 'prob' appears to be a bare word. At some point should it be a variable? I could be wrong here. There are also three dots in the __DATA__ portion that should probably be deleted too.

[reply]

M6: M6<-Q, E.E, IV_32

M6 Q E

M6: M6<-Q, E, II_15

To give you an intuition this line in data means that the probability of M6 coming from Q is 'E'. That is what I want in third column. If suppose M6 were coming from M76 which in turn comes from Q M6<-M76<-Q and the input data for these were

__DATA__
M6   M76   E
M76   Q    E
[download]

[reply]
[d/l]
[select]

Sorry, but any problem with the code is now your problem. A trivial examination of the output and thinking about it will tell you why it is as it as. Feel free to correct the code as you see fit.

True laziness is hard work

[reply]

I could not resist but I will leave the testing to you. I also suspect you have circles in your data. Being lazy, I have also used some global variables.

use warnings;
no warnings 'recursion';
use strict;

<DATA>; # Skip the headers (first row).

my %edges;
my %kids;
while (<DATA>) {
    chomp;
    my ($child, $parent, $likelihood) = split /\t/;
        $edges{$parent}{$child} = $likelihood;
        $kids{$child} = 1;
} 
my $nkids = scalar keys %kids;

my %pathes = ( Q => { path => [ 'Q' ], lh => [] } );
sub shortest_pathes {
        my( $end, $level ) = @_;
        return if $level > $nkids;
        return unless $edges{$end};
        my @lh   = @{$pathes{$end}{lh}};
        my @path = @{$pathes{$end}{path}};
        my @pre  = keys %{$edges{$end}};
        for( @pre ) {
                if( ( !exists $pathes{$_} ) or ( scalar @{$pathes{$_}{
+path}} > 1 + scalar @path ) ) {
                        $pathes{$_} = { path => [ $_, @path ], lh => [
+ $edges{$end}{$_}, @lh ] };
                }
        }
        shortest_pathes( $_, $level+1 ) for @pre;
}

shortest_pathes 'Q', 0;
delete $pathes{Q};

for (sort {substr($a,1) <=> substr($b,1)} keys %pathes) {
        print "$_\t";
        print join "<-", @{$pathes{$_}{path}};
        print "\t";
        print join ".", @{$pathes{$_}{lh}};
        print "\n";
}

__DATA__
child,    Parent,    likelihood
M7    Q    P
M54    M7    Pl
M213    M54    E
M206    M54    E
M194    M54    E
M53    M7    Pl
M186    M53    Pl
M194    M53    Pl
M187    M53    E
M204    M53    E
M201    M53    E
M202    M53    E
M179    M53    E
M13    M53    E
M157    M53    E
M173    M53    E
M205    M53    E
M195    M53    E
M196    M53    E
M197    M53    E
M198    M53    E
M57    M7    E
M44    M7    E
M61    M7    E
M13    M7    E
M50    M7    E
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M51    M7    E
M174    M51    Pl
M50    M51    Pl
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M173    M51    Pl
M175    M51    E
M176    M51    E
M62    M7    E
M55    M7    E
M17    Q    P
M83    M17    Pl
M54    M17    Pl
M213    M54    E
M206    M54    E
M194    M54    E
M84    M17    E
M101    M17    E
M98    M17    E
M99    M17    E
M76    M17    E
M206    M76    P
M278    M76    E
M289    M76    E
M13    M76    E
M282    M76    E
M283    M76    E
M294    M76    E
M285    M76    E
M13    M17    E
M88    M17    E
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M89    M17    E
M174    M89    P
M88    M89    P
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M326    M89    Pl
M333    M89    E
M335    M89    E
M336    M89    E
M283    M89    E
M330    M89    E
M331    M89    E
M332    M89    E
M102    M17    E
M92    M17    E
M93    M17    E
M94    M17    E
M95    M17    E
M22    Q    E
M6    Q    E
M28    M6    P
M115    M28    Pl
M114    M28    Pl
M112    M28    E
M117    M28    E
M113    M28    E
M107    M28    E
M33    M28    E
M50    M28    E
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M51    M28    E
M174    M51    Pl
M50    M51    Pl
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M173    M51    Pl
M175    M51    E
M176    M51    E
M7    M28    E
M54    M7    Pl
M213    M54    E
M206    M54    E
M194    M54    E
M53    M7    Pl
M186    M53    Pl
M194    M53    Pl
M187    M53    E
M204    M53    E
M201    M53    E
M202    M53    E
M179    M53    E
M13    M53    E
M157    M53    E
M173    M53    E
M205    M53    E
M195    M53    E
M196    M53    E
M197    M53    E
M198    M53    E
M57    M7    E
M44    M7    E
M61    M7    E
M13    M7    E
M50    M7    E
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M51    M7    E
M174    M51    Pl
M50    M51    Pl
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
M160    M50    E
M161    M50    E
M173    M51    Pl
M175    M51    E
M176    M51    E
M62    M7    E
M55    M7    E
M36    M6    P
M126    M36    Pl
M115    M36    Pl
M131    M36    E
M132    M36    E
M127    M36    E
M138    M36    E
M139    M36    E
M120    M36    E
M33    M36    E
M88    M36    E
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M89    M36    E
M174    M89    P
M88    M89    P
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M326    M89    Pl
M333    M89    E
M335    M89    E
M336    M89    E
M283    M89    E
M330    M89    E
M331    M89    E
M332    M89    E
M134    M36    E
M135    M36    E
M136    M36    E
M17    M36    E
M83    M17    Pl
M54    M17    Pl
M213    M54    E
M206    M54    E
M194    M54    E
M84    M17    E
M101    M17    E
M98    M17    E
M99    M17    E
M76    M17    E
M206    M76    P
M278    M76    E
M289    M76    E
M13    M76    E
M282    M76    E
M283    M76    E
M294    M76    E
M285    M76    E
M13    M17    E
M88    M17    E
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M89    M17    E
M174    M89    P
M88    M89    P
M158    M88    P
M306    M88    Pl
M307    M88    Pl
M317    M88    E
M318    M88    E
M319    M88    E
M320    M88    E
M282    M88    E
M299    M88    E
M311    M88    E
M312    M88    E
M313    M88    E
M314    M88    E
M315    M88    E
M316    M88    E
M326    M89    Pl
M333    M89    E
M335    M89    E
M336    M89    E
M283    M89    E
M330    M89    E
M331    M89    E
M332    M89    E
M102    M17    E
M92    M17    E
M93    M17    E
M94    M17    E
M95    M17    E
M34    M6    E
M35    M6    E
Q    M6    E
M10    Q    E
M46    M10    P
M216    M46    P
M189    M46    P
M244    M46    E
M246    M46    E
M247    M46    E
M248    M46    E
M249    M46    E
M243    M46    E
M70    M10    P
M257    M70    Pl
M216    M70    Pl
M267    M70    E
M265    M70    E
M266    M70    E
M250    M70    E
M268    M70    E
M269    M70    E
M270    M70    E
M261    M70    E
M262    M70    E
M263    M70    E
M264    M70    E
M72    M10    E
M75    M10    E
M71    M10    E
M23    Q    E
M18    Q    E
[download]

[reply]
[d/l]

my %group = (               # Hash table/dictionary for all the  group
+s
             'P'=> 'I_1',
             'Pl'=>'I_2',
             'P.P'=>'I_3',
             'P.Pl'=>'I_4',
             'Pl.P'=>'I_5',
             'Pl.Pl'=>'I_6',
             'P.P.P'=>'I_7',
             'P.P.Pl'=>'I_8',
             'P.Pl.P'=>'I_9',
             'P.Pl.Pl'=>'I_10',
             'Pl.P.P'=>'I_11',
             'Pl.P.Pl'=>'I_12',
             'Pl.Pl.P'=>'I_13',
             'Pl.Pl.Pl'=>'I_14',
             'E'=> 'II_15',
             'P.E'  => 'II_16',
             'Pl.E' => 'II_17',
             'P.P.E'=>'II_18',
             'P.Pl.E'=>'II_19',
             'Pl.P.E'=>'II_20',
             'Pl.Pl.E'=>'II_21',
             'E.P'=> 'III_22',
             'E.Pl'=>'III_23',
             'P.E.P'=>'III_24',
             'P.E.Pl'=>'III_25',
             'Pl.E.P'=>'III_26',
             'Pl.E.Pl'=>'III_27',
             'E.P.P'=>'III_28',
             'E.P.Pl'=>'III_29',
             'E.Pl.P'=>'III_30',
             'E.Pl.Pl'=>'III_31',
             'E.E'=>'IV_32',
             'P.E.E'=>'IV_33',
             'Pl.E.E'=>'IV_34',
             'E.P.E'=>'IV_35',
             'E.Pl.E'=>'IV_36',
             'E.E.P'=>'IV_37',
             'E.E.Pl'=>'IV_38',
             'E.E.E'=>'IV_39',
            );
[download]

[reply]
[d/l]

I would pick one of the solutions from constructing a tree from csv file and then discard all but the shortest pathes.

[reply]

Actually that solution is showing only the longest path. I dont know how to tweak it now !!!!!

[reply]

Dijkstra's algorithm

Cheers Rolf

( addicted to the Perl Programming Language)

[reply]

#! /usr/bin/perl

my %DEF = (
  I   => [qw( P Pl P.P P.Pl Pl.P Pl.Pl P.P.P P.P.Pl P.Pl.P P.Pl.Pl Pl.
+P.P Pl.P.Pl Pl.Pl.P Pl.Pl.Pl )],
  II  => [qw( E P.E Pl.E P.P.E P.Pl.E Pl.P.E Pl.Pl.E )],
  III => [qw( E.P E.Pl P.E.P P.E.Pl Pl.E.P Pl.E.Pl E.P.P E.P.Pl E.Pl.P
+ E.Pl.Pl )],
  IV  => [qw( E.E P.E.E Pl.E.E E.P.E E.Pl.E E.E.P E.E.Pl E.E.E )]
);
my @rank = map @$_, @DEF{qw(I II III IV)};
my %rank = map {$rank[$_-1] => $_} 1..@rank;
my @group = map {($_) x @{$DEF{$_}}} qw(I II III IV);
my %group = map {$rank[$_-1] => $group[$_-1]."_".$_} 1..@group;
sub rank { $rank{$a->[2]} <=> $rank{$b->[2]} }

my %T;

sub oh { map values %$_, @_ }
sub ab {
  my ($b, $a) = @_;
  [$b->[0], $a->[1], qq($a->[2].$b->[2]), qq($b->[3]<-$a->[3])]
}
sub xtend {
  my $a = shift;
  map {ab $_, $a} oh @{$T{$a->[0]}}{@_}
}
sub ins { $T{$_[3] //= $_[1]}{$_[2]}{$_[0]} = \@_ }

ins split /,\s*/ for <DATA>;
ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T);
ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T);

for (sort {rank} grep {$_->[1] eq 'Q'} (oh oh oh \%T)) {
  printf "%-4s: %20s,  %-8s %6s\n",
      $_->[0], qq($_->[0]<-$_->[3]), $_->[2], $group{$_->[2]};
}

__DATA__
M7,     Q,      P,
M7,     M28,    E,
M28,    M6,     E,
M6,     Q,      Pl,
[download]

[reply]
[d/l]


__DATA__
M7    Q    P
M54    M7    Pl
M213    M54    E
M206    M54    E
M194    M54    E
M53    M7    Pl
M186    M53    Pl
M194    M53    Pl
M187    M53    E
M204    M53    E
M201    M53    E
M202    M53    E
M179    M53    E
M13    M53    E
M157    M53    E
M173    M53    E
M205    M53    E
M195    M53    E
M196    M53    E
M197    M53    E
M198    M53    E
M57    M7    E
M44    M7    E
M61    M7    E
M13    M7    E
M50    M7    E
M158    M50    P
M157    M50    P
M153    M50    Pl
M162    M50    E
M164    M50    E
M165    M50    E
M147    M50    E
M159    M50    E
[download]

[reply]
[d/l]