ad23 has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
I have multiple XML files in a directory. They have the following information:
    <?xml version="1.0"?>
    <t_volume>
      <info>
        <info_name>FZGA34177.b1</info_name>
        <center_project>4085729</center_project>
        <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
        <qual_file>SETARIA_ITALICA/JGI/qscore/FZGA34177.b1.qscore</qual_file>
        <it_flank_left>AATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCAC</it_flank_left>
        <it_flank_right>GTGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGAGTATTCTAT</it_flank_right>
        <it_size>35000</it_size>
        <it_stdev>3500</it_stdev>
        <plate_id>357</plate_id>
        <program_id>KB 1.3.0</program_id>
        <seq_lib_id>FZGA</seq_lib_id>
        <ncbi_project_id>32913</ncbi_project_id>
        <ncbi_info_archive>
          <ti>2167749207</ti>
          <taxid>4555</taxid>
          <basecall_length>899</basecall_length>
          <load_date>Nov 26 2008 4:06PM</load_date>
          <state>active</state>
        </ncbi_info_archive>
      </info>
      <info>
        <info_name>FZGA34178.b1</info_name>
        <center_project>4085729</center_project>
        <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
        <qual_file>SETARIA_ITALICA/JGI/qscore/FZGA34177.b1.qscore</qual_file>
        <it_flank_left>AATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCAC</it_flank_left>
        <it_flank_right>GTGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGAGTATTCTAT</it_flank_right>
        <it_size>12000</it_size>
        <it_stdev>1200</it_stdev>
        <plate_id>357</plate_id>
        <program_id>KB 1.3.0</program_id>
        <seq_lib_id>FZGA</seq_lib_id>
        <ncbi_project_id>32913</ncbi_project_id>
        <ncbi_info_archive>
          <ti>2167749207</ti>
          <taxid>4555</taxid>
          <basecall_length>899</basecall_length>
          <load_date>Nov 26 2008 4:06PM</load_date>
          <state>active</state>
        </ncbi_info_archive>
      </info>
      <info>
        <info_name>FZGA34179.b1</info_name>
        <center_project>4085729</center_project>
        <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
        <qual_file>SETARIA_ITALICA/JGI/qscore/FZGA34177.b1.qscore</qual_file>
        <it_flank_left>AATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCAC</it_flank_left>
        <it_flank_right>GTGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGAGTATTCTAT</it_flank_right>
        <it_size>7000</it_size>
        <it_stdev>700</it_stdev>
        <plate_id>357</plate_id>
        <program_id>KB 1.3.0</program_id>
        <seq_lib_id>FZGA</seq_lib_id>
        <ncbi_project_id>32913</ncbi_project_id>
        <ncbi_info_archive>
          <ti>2167749207</ti>
          <taxid>4555</taxid>
          <basecall_length>899</basecall_length>
          <load_date>Nov 26 2008 4:06PM</load_date>
          <state>active</state>
        </ncbi_info_archive>
      </info>
      <info>
        <info_name>FZGA34180.b1</info_name>
        <center_project>4085729</center_project>
        <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
        <qual_file>SETARIA_ITALICA/JGI/qscore/FZGA34177.b1.qscore</qual_file>
        <it_flank_left>AATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCAC</it_flank_left>
        <it_flank_right>GTGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGAGTATTCTAT</it_flank_right>
        <it_size>3000</it_size>
        <it_stdev>300</it_stdev>
        <plate_id>357</plate_id>
        <program_id>KB 1.3.0</program_id>
        <seq_lib_id>FZGA</seq_lib_id>
        <ncbi_project_id>32913</ncbi_project_id>
        <ncbi_info_archive>
          <ti>2167749207</ti>
          <taxid>4555</taxid>
          <basecall_length>899</basecall_length>
          <load_date>Nov 26 2008 4:06PM</load_date>
          <state>active</state>
        </ncbi_info_archive>
      </info>
      <info>
        <info_name>FZGA34181.b1</info_name>
        <center_project>4085729</center_project>
        <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
        <qual_file>SETARIA_ITALICA/JGI/qscore/FZGA34177.b1.qscore</qual_file>
        <it_flank_left>AATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCAC</it_flank_left>
        <it_flank_right>GTGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGAGTATTCTAT</it_flank_right>
        <it_size>7000</it_size>
        <it_stdev>700</it_stdev>
        <plate_id>357</plate_id>
        <program_id>KB 1.3.0</program_id>
        <seq_lib_id>FZGA</seq_lib_id>
        <ncbi_project_id>32913</ncbi_project_id>
        <ncbi_info_archive>
          <ti>2167749207</ti>
          <taxid>4555</taxid>
          <basecall_length>899</basecall_length>
          <load_date>Nov 26 2008 4:06PM</load_date>
          <state>active</state>
        </ncbi_info_archive>
      </info>
I want to retrieve some information from these files as a hash (key and value).
I am trying to parse this XML file as follows:
    #!/usr/bin/perl
    use XML::Simple;
    use Data::Dumper;

    $xml = new XML::Simple(KeyAttr=>[]);
    $data = $xml->XMLin("InfoFile.xml");
    #print Dumper($info);
    print "XML read in\n";
    foreach $e (@{$data->{info}}) {
        print $e->{info_name}, "\n";
        print "it Size: ", $e->{it_size}, "\n";
        print "\n";
    }
My code reads the XML file, but nothing is printed inside the foreach loop. Can someone please suggest where I am going wrong?
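For the "hash (key and value)" goal, here is a minimal sketch that maps each `<info_name>` to its `<it_size>` across several XML files. It is one possible approach, not a diagnosis of the script above. The sub names `sizes_from_data` and `load_sizes` are hypothetical. The `ForceArray => ['info']` option is an addition to the original constructor call: without it, XML::Simple returns a lone `<info>` record as a hash ref rather than an array ref, so the sketch accepts both shapes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the structure returned by XMLin into { info_name => it_size }.
# {info} may be an array ref (several records) or a hash ref (one record
# parsed without ForceArray), so both shapes are handled.
sub sizes_from_data {
    my ($data) = @_;
    my $info = $data->{info};
    return {} unless defined $info;
    my @records = ref $info eq 'ARRAY' ? @$info : ($info);
    my %size_for;
    for my $rec (@records) {
        next unless defined $rec->{info_name} && defined $rec->{it_size};
        $size_for{ $rec->{info_name} } = $rec->{it_size};
    }
    return \%size_for;
}

# Parse every XML file given and merge the results into one hash.
sub load_sizes {
    my (@files) = @_;
    require XML::Simple;    # loaded here so sizes_from_data() has no dependency
    my $xs = XML::Simple->new( KeyAttr => [], ForceArray => ['info'] );
    my %size_for;
    for my $file (@files) {
        my $sizes = sizes_from_data( $xs->XMLin($file) );
        @size_for{ keys %$sizes } = values %$sizes;
    }
    return \%size_for;
}

# Usage: perl load_sizes.pl *.xml
if (@ARGV) {
    my $size_for = load_sizes(@ARGV);
    print "$_\t$size_for->{$_}\n" for sort keys %$size_for;
}
```

If the same `info_name` appears in more than one file, the file read last wins. Whether that is acceptable depends on the data.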
Furthermore, I have multiple fasta files and I want to compare the XML key information (i.e. <info_name>...</info_name>) with the Fasta header name. The example of a fasta file is:
    >FZGA34177.b1 bg_2167749207
    CATAACAGGAGAGTAAACATGTAACTCCTATAACTCGCGGGGTGTGCTGTTATTACCTCCTTGGTGGAAC
    AGGAAACCTGGGAAACGCTTGTTCAGATATTCGTCTGTTTCCCATGTTGCTTCATCTTCAGTGTGGTTTA
    >FZGA34178.b1 bg_2167749208
    ACTCTCTTGAGGCATTCACCGGATTGACCGGCGGTGTCCTGGAAGGAGGTGTCCTTCAGGCCTCGTTCAG
    TAGCATAGGATTGGCACTAGACCAAATTTTGATCATGGTCAGGATCGAGTGGATCCTGTTTTCTCATTGA
    AACTTGGTGACTAATCATTCCTCCCCAGGATCAAAACCATTGATTCAAAAGCAGTGTTTGGCTGGAGAGG
    AAAGAAAACAGGGGATCAAATAGAGCTGTACTAGAAAGCAATGAACAGAGCTGGCTAGGATCCAGAGCCA
    >FZGA34179.b1 bg_2167749209
    CAGCCTTGGCCGACAGGCCCGGGTAATCTTGGGAAATTTCATCGTGATGGGGATAGATCATTGCAATTGT
    TGGTCTTCAACGAGGAATGCCTAGTAAGCGCGAGTCATCAGCTCGCGTTGACTACGTCCCTGCCCTTTGT
    ACACACCGCCCGTCGCTCCTACCGATTGAATGGTCCGGTGAAGTGTTCGGATCGCGGCGACGGAGGCGGT
If the header ID of a fasta record (e.g. FZGA34177.b1) matches an <info_name> (e.g. <info_name>FZGA34177.b1</info_name>), the script should look up the corresponding hash value (e.g. <it_size>35000</it_size>) and write the fasta record (both header and sequence) to a new file named after that value (e.g. 35000.fasta.output). There will be one such output file for each distinct "it_size". The difficulty is that there are multiple XML files and multiple fasta files, so I need to read all of them together in order to find the sequences that belong to each <it_size>.
Can someone please guide me on how to approach this problem?
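The matching step described above could be sketched roughly as follows, using core Perl only. This is an illustration under stated assumptions, not a definitive solution. The sub name `split_fasta_by_size` is hypothetical. The sketch assumes the name-to-size hash has already been built from the XML files, and that the record ID is the header text between `>` and the first whitespace. Each matching record is appended to `<it_size>.fasta.output`, and records with no XML entry are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append each FASTA record whose header ID is a key of %$size_for to
# "<it_size>.fasta.output" in $out_dir. Returns the number of records written.
sub split_fasta_by_size {
    my ( $fasta_file, $size_for, $out_dir ) = @_;
    $out_dir = '.' unless defined $out_dir;

    open my $in, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my %out_fh;         # it_size => output filehandle
    my $fh;             # handle for the record being read, undef if skipped
    my $written = 0;
    while ( my $line = <$in> ) {
        if ( $line =~ /^>(\S+)/ ) {
            # New record: look its ID up in the XML-derived hash.
            my $size = $size_for->{$1};
            if ( defined $size ) {
                unless ( $out_fh{$size} ) {
                    my $path = "$out_dir/$size.fasta.output";
                    open my $new, '>>', $path or die "Cannot open $path: $!";
                    $out_fh{$size} = $new;
                }
                $fh = $out_fh{$size};
                $written++;
            }
            else {
                $fh = undef;    # no XML entry for this ID: skip the record
            }
        }
        print {$fh} $line if $fh;    # header and sequence lines alike
    }
    close $_ for values %out_fh;
    close $in;
    return $written;
}

# Usage: perl split_fasta.pl reads.fasta   (file name is hypothetical)
# The hash is hand-written here from the sample values in the post.
if (@ARGV) {
    my %size_for = (
        'FZGA34177.b1' => 35000,
        'FZGA34178.b1' => 12000,
        'FZGA34179.b1' => 7000,
    );
    my $n = split_fasta_by_size( $ARGV[0], \%size_for );
    print "$n record(s) written\n";
}
```

Because the output files are opened in append mode, several fasta files can be processed one after another into the same set of outputs. The trade-off is that rerunning the script without first deleting old output files duplicates records.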
Thanks.
Re: Parse XML and compare with Fasta in Perl
by toolic (Bishop) on Jul 06, 2010 at 20:10 UTC
by ad23 (Acolyte) on Jul 06, 2010 at 20:41 UTC
by toolic (Bishop) on Jul 06, 2010 at 20:50 UTC

Re: Parse XML and compare with Fasta in Perl
by graff (Chancellor) on Jul 07, 2010 at 02:45 UTC

Re: Parse XML and compare with Fasta in Perl
by graff (Chancellor) on Jul 07, 2010 at 03:05 UTC