Hi

I am not familiar with the OBO data set, but I was looking through it and thought I could share some technical input. Your requirement was clear enough: you want to build a data structure such as a hash of arrays, or something similar, from which you can readily extract your data by the id key.
However, as I mentioned, your data set has some interesting features. I hope the OBO Parser solves your problem, because code written from scratch for data sets like this is hardly extensible.
In any case, I was able to find a way to structure your data set by splitting each line on the first space and then on the '!' character. On reflection it might not be a great approach, but then again that is how your data set is laid out :D
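To make the two-step split concrete, here is a minimal sketch. The sub name parse_is_a and the sample line are my own assumptions for illustration (the line is not copied from your file), and the /r modifier needs Perl 5.14 or newer.

```perl
use strict;
use warnings;

# Parse one "is_a:" line into (parent id, label). Assumes the shape
# "is_a: <id> ! <label>"; returns an empty list for any other line.
sub parse_is_a {
    my ($line) = @_;

    # Limit of 2: split on the first space only, keep the rest intact.
    my ($tag, $rest) = split / /, $line, 2;
    return unless defined $tag && $tag eq 'is_a:' && defined $rest;

    # Split on '!' and trim surrounding whitespace from both pieces.
    my ($id, $label) = map { s/^\s+|\s+$//gr } split /!/, $rest, 2;
    return ($id, $label);
}

# Hypothetical sample line, not taken from the original data file.
my ($id, $label) = parse_is_a('is_a: HP:0000812 ! Abnormal internal genitalia');
print "$id => $label\n";
```

Trimming the pieces avoids the leading space that a bare split on '!' leaves in front of each label.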
To do this in practice you need a solid understanding of Perl data structures. Not to worry, though: you can read the docs and figure it out. perldata and perlreftut are good places to begin.
The key feature I needed for the extraction was anonymous arrays and references. Even though the code works, I suspect it will be hard to extend if your requirements change, and if you are asked to analyse a data set of a million or so records, I think it is best to use a standard module (like OBO::Parser).
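If anonymous arrays and references are new to you, this small sketch shows the hash-of-arrays idea on its own. The variable names are my own, and the pairs are simply the ids and labels that appear in the output of this post. It relies on autovivification, so there is no need to check whether the array reference already exists before pushing.

```perl
use strict;
use warnings;

my %parents_of;    # id => reference to an array of is_a labels

# (id, label) pairs standing in for records that were already parsed.
my @pairs = (
    [ 'HP:0000008', 'Abnormal internal genitalia' ],
    [ 'HP:0000008', 'Abnormality of the female genitalia' ],
    [ 'HP:0000007', 'Mode of inheritance' ],
);

for my $pair (@pairs) {
    my ($id, $label) = @$pair;

    # Autovivification: Perl creates the anonymous array on first use.
    push @{ $parents_of{$id} }, $label;
}

# Sorted keys give a stable order; plain "keys" order is not predictable.
for my $id (sort keys %parents_of) {
    print "$id: ", join('; ', @{ $parents_of{$id} }), "\n";
}
```

Looking up one id afterwards is just `@{ $parents_of{'HP:0000008'} }`.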
Note - I saved the data from your OP to a file and passed it as an argument to the script below.

    #!/usr/bin/perl
    use strict;

    my (%hash, $hash_id);
    my $isa_array_ref;

    open(my $fh, "<", $ARGV[0]) || die "$0: can't open $ARGV[0] for reading: $!";

    LINE: while (<$fh>) {
        chomp($_);
        next LINE if ($_ eq "Term");

        # split on first blank space
        my @TermRow = split(/ /, $_, 2);

        if ($TermRow[0] eq 'id:') {
            $hash_id       = $TermRow[1];
            $isa_array_ref = undef;
        }
        elsif ($TermRow[0] eq 'is_a:') {
            my @TermISAText = split(/!/, $TermRow[1]);

            # checking if anonymous array reference already exists
            if ($isa_array_ref) {
                my @temp_array = @{$isa_array_ref};
                push(@temp_array, $TermISAText[1]);
                $isa_array_ref = \@temp_array;
                $hash{$hash_id} = $isa_array_ref;
            }
            else {
                # creating an anonymous array reference
                $isa_array_ref = [$TermISAText[1]];
                $hash{$hash_id} = $isa_array_ref;
            }
        }
    }
    close($fh);

    print "Result of Extraction:\n ";
    my @id_keys = keys %hash;
    foreach (@id_keys) {
        print "key : $_";
        print "list of values \n";
        foreach (@{$hash{$_}}) {
            print $_, "\n";
        }
        print "\n";
    }
Output
    XXXXXX:progs$ perl term_reader.pl ./term.txt
    Result of Extraction:
    key : HP:0000008list of values
    Abnormal internal genitalia
    Abnormality of the female genitalia

    key : HP:0000007list of values
    Mode of inheritance

The Great Programmer is one who inspires others to code, not just one who writes great code

In reply to Re: Text manipulation on a file with multiple entries, obo format by perlron
in thread Text manipulation on a file with multiple entries, obo format by Sakti
