in reply to Text manipulation on a file with multiple entries, obo format
I am not familiar with the OBO format, but I had a look at your data set and thought I could share some technical input on it.
Your requirement is clear enough: you want to build a data structure such as a hash of arrays, from which you can readily extract your data by the id key.
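To make the target shape concrete, here is a minimal sketch of a hash of arrays keyed by id. The sample values are the ones from the output of my run; the names `%parents_of` and `parents_for` are just illustrative, not something from your code.

```perl
use strict;
use warnings;

# Hash of arrays: each term id maps to an anonymous array of descriptions.
# The values are taken from the extraction output in this post.
my %parents_of = (
    'HP:0000008' => [
        'Abnormal internal genitalia',
        'Abnormality of the female genitalia',
    ],
    'HP:0000007' => ['Mode of inheritance'],
);

# Hypothetical helper: return the list stored under an id,
# or an empty list when the id is not in the hash.
sub parents_for {
    my ($id) = @_;
    return @{ $parents_of{$id} // [] };
}

print "$_\n" for parents_for('HP:0000008');
```

The `//` (defined-or) operator needs Perl 5.10 or later; on older versions you would test with `exists` instead.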
However, as I mentioned, your data set has some interesting features. I hope the OBO parser solves your problem, because code written from scratch for data sets like this is hardly extensible.
In any case, I was able to structure your data set by splitting each line on the first space and then splitting the is_a value on the '!' character. When you think about it, it might not be a great approach, but then again, that is your data set :D
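The two splits can be sketched on their own like this. I am assuming lines of the form `tag: value` and is_a values of the form `ID ! description`; the helper names are made up for illustration and `HP:XXXXXXX` is only a placeholder id.

```perl
use strict;
use warnings;

# Split a line on the first space only, giving ("tag:", "rest of line").
sub parse_tag_line {
    my ($line) = @_;
    my ($tag, $value) = split / /, $line, 2;
    return ($tag, $value);
}

# Take the text after the first '!' and trim surrounding whitespace.
# Returns undef when the value contains no '!'.
sub is_a_description {
    my ($value) = @_;
    my (undef, $desc) = split /!/, $value, 2;
    return unless defined $desc;
    $desc =~ s/^\s+|\s+$//g;
    return $desc;
}

# Placeholder id; the line format itself is an assumption.
my ($tag, $value) = parse_tag_line('is_a: HP:XXXXXXX ! Abnormal internal genitalia');
print "$tag\n";
print is_a_description($value), "\n";
```

The limit of 2 in `split` matters: without it, a description containing spaces would be broken into many fields. The trim is an addition here, since splitting on '!' alone leaves a leading space on the description.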
To do this in practice you need a thorough understanding of Perl data structures. Not to worry, you can read the docs and figure it out: perldata and perlreftut are good places to begin.
The key feature I needed for the extraction was the use of anonymous arrays and references.
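As a small illustration of anonymous arrays and references, the sketch below appends values under an id by dereferencing the hash slot directly. Perl creates the anonymous array on first use (autovivification), so no explicit "does the reference exist yet" check is needed. The helper name `add_parent` is hypothetical.

```perl
use strict;
use warnings;

my %hash;

# Append a value to the array stored under $id in the given hash reference.
sub add_parent {
    my ($store, $id, $value) = @_;
    push @{ $store->{$id} }, $value;   # creates the anonymous array if missing
    return;
}

add_parent(\%hash, 'HP:0000008', 'Abnormal internal genitalia');
add_parent(\%hash, 'HP:0000008', 'Abnormality of the female genitalia');

my $ref = $hash{'HP:0000008'};
print ref($ref), "\n";          # ARRAY
print scalar(@{$ref}), "\n";    # 2
print $ref->[1], "\n";          # Abnormality of the female genitalia
```

This is an alternative to copying the array into a temporary and re-storing a new reference on every is_a line; both end up with the same structure.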
Even though the code works, I suspect it is hardly extensible if your requirements change. If you are asked to analyse a data set of a million or so records, I think it is best to use a standard module (like OBO::Parser).
Note: I saved the data from your OP to a file and passed it as an argument to the script below.
    #!/usr/bin/perl
    use strict;

    my (%hash, $hash_id);
    my $isa_array_ref;

    open(my $fh, "<", $ARGV[0]) || die "$0: can't open $ARGV[0] for reading: $!";

    LINE: while (<$fh>) {
        chomp($_);
        next LINE if ($_ eq "Term");

        # split on first blank space
        my @TermRow = split(/ /, $_, 2);

        if ($TermRow[0] eq 'id:') {
            $hash_id       = $TermRow[1];
            $isa_array_ref = undef;
        }
        elsif ($TermRow[0] eq 'is_a:') {
            my @TermISAText = split(/!/, $TermRow[1]);

            # checking if anonymous array reference already exists
            if ($isa_array_ref) {
                my @temp_array = @{$isa_array_ref};
                push(@temp_array, $TermISAText[1]);
                $isa_array_ref  = \@temp_array;
                $hash{$hash_id} = $isa_array_ref;
            }
            else {
                # creating an anonymous array reference
                $isa_array_ref  = [$TermISAText[1]];
                $hash{$hash_id} = $isa_array_ref;
            }
        }
    }
    close($fh);

    print "Result of Extraction:\n ";
    my @id_keys = keys %hash;
    foreach (@id_keys) {
        print "key : $_";
        print "list of values \n";
        foreach (@{$hash{$_}}) {
            print $_, "\n";
        }
        print "\n";
    }

Output

    XXXXXX:progs$ perl term_reader.pl ./term.txt
    Result of Extraction:
     key : HP:0000008list of values
    Abnormal internal genitalia
    Abnormality of the female genitalia

    key : HP:0000007list of values
    Mode of inheritance