kungfoo,monkee has asked for the wisdom of the Perl Monks concerning the following question:
Hi!
So, I admit I am a little new to Perl, but I think I can communicate what I need done just fine, so please be a little patient with me. And I am sorry it's a little long.
What I am trying to do is successfully parse an XML file. I figured that I could do it with the XML::Simple module, and a friend and I have put together something that does just that, but it's a little messy. So here's where we'd like to go next.
1) So, the XML file is 2 gigs. To grab information from it, it needs to be gone through line by line. I know XML::Simple puts everything into a hash, but it's behaving very poorly (I'll show why below). What I want is to be able to jump to a specific line in the file. So, for example, when I get input A, I need to know that more information about input A is located at some line in the file, which I will call B. What I want to know is the byte location of line B. I know how to find the line that I want using XML::Parser and handlers, but I don't know how to get this byte location, or how to jump to it later.
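For what it's worth, the "byte location of line B" part can be sketched with Perl's built-in `tell` and `seek`. This is only a minimal sketch, under the assumption that each `<entry ...>`, `<accession>` and `</entry>` tag sits on its own line, as in the pretty-printed sample; `build_index`, `read_entry`, the accession `X00001` and the demo names are hypothetical, made up for illustration. If the offsets should come from XML::Parser handlers instead, the Expat object passed to each handler has a `current_byte` method that may be a more robust source.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Map each accession to the byte offset of the line that opens its <entry>.
# Assumption: one tag per line, as in the pretty-printed sample file.
sub build_index {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my (%offset_of, $entry_start);
    while (1) {
        my $pos  = tell $fh;              # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        $entry_start = $pos if $line =~ /<entry\b/;
        if (defined $entry_start
            && $line =~ m{<accession>([^<]+)</accession>}) {
            $offset_of{$1} = $entry_start unless exists $offset_of{$1};
        }
    }
    close $fh;
    return \%offset_of;
}

# Jump straight to a stored offset and read up to the closing </entry>.
sub read_entry {
    my ($path, $offset) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek($fh, $offset, SEEK_SET) or die "Cannot seek in $path: $!";
    my $xml = '';
    while (my $line = <$fh>) {
        $xml .= $line;
        last if $line =~ m{</entry>};
    }
    close $fh;
    return $xml;
}

# Tiny stand-in for the 2 GB file (contents are made up).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} <<'XML';
<uniprot>
<entry dataset="demo">
<accession>X00001</accession>
<name>FIRST_DEMO</name>
</entry>
<entry dataset="demo">
<accession>P15455</accession>
<name>SECOND_DEMO</name>
</entry>
</uniprot>
XML
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $index = build_index($tmp_path);
my $chunk = read_entry($tmp_path, $index->{P15455});
print $chunk;
```

The index only has to be built once; it could be saved to disk and reloaded, so that later lookups cost a single `seek` rather than another pass over the whole file. The small chunk returned by `read_entry` could then be handed to XML::Simple on its own.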
2) If that's not possible, then here's what I mean by the code being messy. This is an excerpt.
    # read XML file
    $data = $xml->XMLin($contents, keyattr => {property => 'type'});

    # finding protein names
    @names = ();
    $names_ref = $data->{entry}->{protein}->{name};
    if (ref($names_ref) eq 'ARRAY')    ## more than one name
    {
        @nameArray = @$names_ref;      ## so dereference to array and step through
        foreach $nameA_ref (@nameArray)
        {
            if (ref($nameA_ref) eq 'HASH')    ## it shouldn't be a hash, but sometimes it is
            {
                %nameTable = %$nameA_ref;
                push (@names, $nameTable{"content"});
            }
            else    ## it is a friendly scalar
            {
                push (@names, $nameA_ref);
            }
        }
    }
    else    ## only one name, so $names_ref is probably a scalar
    {
        if (ref($names_ref) eq 'HASH')    ## it shouldn't be a hash, but sometimes it is
        {
            %namesTable = %$names_ref;
            push (@names, $namesTable{"content"});
        }
        else    ## it is a friendly scalar
        {
            push (@names, $names_ref);
        }
    }
This is how the data in the file is being processed. I am not sure why a 'HASH' or a scalar suddenly comes up. I've been trying to figure out whether ForceArray does anything, and how to use it. So far it has only given errors, even though I think I've been using it right.
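As far as I understand XML::Simple, the mixed shapes come from its folding rules: a `name` element that carries attributes becomes a hash with its text under `content`, one without attributes becomes a plain scalar, and an array only appears when the element is repeated. The documented `ForceArray => ['name']` and `ForceContent => 1` options to `XMLin` are meant to make that shape uniform. Independently of those options, the two nearly identical branches can be collapsed into one loop. A minimal sketch, where `extract_names` and the sample values are hypothetical:

```perl
use strict;
use warnings;

# Flatten whatever XML::Simple produced for <name> into a list of strings.
# Accepts a scalar, a hash with a 'content' key, or an array mixing both.
sub extract_names {
    my ($node) = @_;
    return () unless defined $node;

    # Treat a single value as a one-element list so there is only one code path.
    my @items = ref $node eq 'ARRAY' ? @$node : ($node);

    my @names;
    for my $item (@items) {
        my $text = ref $item eq 'HASH' ? $item->{content} : $item;
        push @names, $text if defined $text;   # skip hashes with no text
    }
    return @names;
}

# Made-up inputs covering the shapes seen in the excerpt.
my @single = extract_names('Alpha');
my @hashed = extract_names({ content => 'Beta', evidence => 'EC1' });
my @mixed  = extract_names([ 'Gamma', { content => 'Delta', ref => '2' } ]);

print join(', ', @single, @hashed, @mixed), "\n";
```

In the original excerpt this would be called as `@names = extract_names($data->{entry}->{protein}->{name});`. Unlike the excerpt, the sketch silently skips a hash that has no `content` key instead of pushing an undefined value.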
Anyway, the above method does seem to work, but it's just not very nice. I can't change the XML in any way, so maybe it's not supposed to be very nice to grab info out, and maybe our method is right. I appreciate any help. If you're curious, a sample of the XML format is here: http://beta.uniprot.org/uniprot/P15455.xml
Thanks!
Replies are listed 'Best First'.
Re: Jumping to a location in a file
by Limbic~Region (Chancellor) on May 12, 2008 at 23:16 UTC
Re: Jumping to a location in a file
by dragonchild (Archbishop) on May 13, 2008 at 00:48 UTC
by Anonymous Monk on May 13, 2008 at 01:41 UTC
Re: Jumping to a location in a file
by scorpio17 (Canon) on May 13, 2008 at 13:31 UTC