matth has asked for the wisdom of the Perl Monks concerning the following question:
    1<species xxx = "sp">
    1 <sequence xx = "" xxxxx = "xxxxxxx">
    1 <genome_xxxxxx = "CDS" xxxxxx = "" xxxxxxx = "" xxxxxxxxx = " ">
    1 <gene xx = "xxxxxxxxxxx" xxxxxx = "x">
    1 <gene_seq xxxxxxx = "" xxxxxx = "" xxxxxxx = "2" xxxxxxxxx = "" xxxxx = "5999" xxxx = "6318" xxxxxxx = "" xxxxxxx = "" xxxxxx = "F">
    1 </gene_seq>
    1 </gene>
    1 </genome_feature>
    1 </sequence>
    1</species>
    2<species xxx = "sp">
    2 <sequence xx = "" xxxxx = "xxxxxxx">
    2 <genome_xxxxxx = "CDS" xxxxxx = "" xxxxxxx = "" xxxxxxxxx = " ">
    2 <gene xx = "xxxxxxxxxxx" xxxxxx = "x">
    2 <gene_seq xxxxxxx = "" xxxxxx = "" xxxxxxx = "2" xxxxxxxxx = "" xxxxx = "5999" xxxx = "6318" xxxxxxx = "" xxxxxxx = "" xxxxxx = "F">
    2 </gene_seq>
    2 </gene>
    2 </genome_feature>
    2 </sequence>
    2</species>
    etc.........................................

(xxxxxxs substitute real words)

My program goes through this file (above) with a while statement. It seeks to remove duplicate nodes so that it does not go back to the root nodes of species, sequence etc. for each gene tag. But it does not work. I have a subroutine along the lines of: (just add a few more lines dealing with more XML nodes)
    sub deal_with_xml_line_by_line($){
        $final_out = "new_out_again.txt";
        open (OUTPUT_SLIMED, "+>>$final_out");
        my ($XML_line) = @_;
        $XML_class_node_X_old = $XML_class_node_X;
        $XML_class_first_node_old = $XML_class_first_node;
        if ($XML_process_line =~ /^(\d{1,10})([\%|\<].{1,1000}\>)/){
            print "\nhereF\n";
            print "\n$1\n";
            #exit;
            $XML_class_node_X = "$1.$2";
            if ($XML_class_node_X_old == $XML_class_node_X){
                #do nothing
            }
            else{
                print OUTPUT_SLIMED "$XML_class_node_X\n";
                return $XML_class_node_X;
            }
        }
        if ($XML_process_line =~ /^(\d{1,10})(\s[\%|\<].{1,1000}\>)/){
            print "\nhereF\n";
            print "\n$1.$2\n";
            #exit;
            $XML_class_first_node = $1.$2;
            # print ":$XML_class_fist_node\n";
            if ($XML_class_first_node_old == $XML_class_first_node){
                #do nothing
            }
            else{
                print OUTPUT_SLIMED "$XML_class_first_node\n";
                return $XML_class_first_node;
            }
        }
    }

The output produced from this is:
    1 <species xxx = "sp">
    1 <sequence xx = "" xxxxx = "xxxxxxx">
    1 <genome_xxxxxx = "CDS" xxxxxx = "" xxxxxxx = "" xxxxxxxxx = " ">
    1 <gene xx = "xxxxxxxxxxx" xxxxxx = "x">
    1 <gene_seq xxxxxxx = "" xxxxxx = "" xxxxxxx = "2" xxxxxxxxx = "" xxxxx = "5999" xxxx = "6318" xxxxxxx = "" xxxxxxx = "" xxxxxx = "F">
    2 <species xxx = "sp">
    2 <sequence xx = "" xxxxx = "xxxxxxx">
    2 <genome_xxxxxx = "CDS" xxxxxx = "" xxxxxxx = "" xxxxxxxxx = " ">
    2 <gene xx = "xxxxxxxxxxx" xxxxxx = "x">
    2 <gene_seq xxxxxxx = "" xxxxxx = "" xxxxxxx = "2" xxxxxxxxx = "" xxxxx = "5999" xxxx = "6318" xxxxxxx = "" xxxxxxx = "" xxxxxx = "F">

This is not what I want. Given the time, I expect that I could solve this problem, but I have to go to bed now. Any suggestions?
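For readers following along, here is a minimal sketch of one way the comparison could be made to work. It is not the poster's code and not taken from any reply. It assumes each input line is a record number followed by exactly one tag, that "duplicate" means an opening tag identical to the last one printed at the same nesting depth, and that `gene` and `gene_seq` should be kept for every record. The names `slim_lines` and `%always_keep` are hypothetical. Two things differ from the subroutine as posted: strings are compared with `eq` rather than the numeric `==`, and the record number is left out of the comparison, since otherwise "1..." and "2..." lines can never match.

```perl
use strict;
use warnings;

# Assumption: elements listed here are printed for every record,
# even when their text repeats.
my %always_keep = map { $_ => 1 } qw(gene gene_seq);

# Hypothetical helper: takes the numbered lines, returns the ones to keep.
sub slim_lines {
    my @lines = @_;
    my @last_at_depth;    # last opening tag printed at each nesting depth
    my $depth = 0;
    my @kept;

    for my $line (@lines) {
        # Split off the record number; skip lines that do not fit the layout.
        my ($num, $tag) = $line =~ /^(\d+)\s*(<.*>)\s*$/ or next;

        # Closing tag: move up one level and print nothing.
        if ($tag =~ m{^</}) {
            $depth-- if $depth > 0;
            next;
        }

        my $name = $tag =~ /^<\s*(\w+)/ ? $1 : '';

        # eq compares strings; == would compare only the numeric value.
        my $seen = defined $last_at_depth[$depth]
                && $last_at_depth[$depth] eq $tag;

        if (!$seen || $always_keep{$name}) {
            push @kept, "$num $tag";
            $last_at_depth[$depth] = $tag;
            # A newly printed parent starts a new subtree: forget deeper tags.
            $#last_at_depth = $depth;
        }
        $depth++;
    }
    return @kept;
}
```

Something like `print "$_\n" for slim_lines(<$fh>);` would then print the slimmed lines, where `$fh` is a filehandle opened on the numbered file. Tracking depth by counting opening and closing tags only holds if every element is explicitly closed, as in the sample above.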
Replies are listed 'Best First'.

Re: Removing duplicate subtrees from XML
by dingus (Friar) on Dec 03, 2002 at 08:28 UTC

Re: Removing duplicate subtrees from XML
by elusion (Curate) on Dec 03, 2002 at 01:55 UTC
by matth (Monk) on Dec 03, 2002 at 11:27 UTC

Re: Removing duplicate subtrees from XML
by dakkar (Hermit) on Dec 03, 2002 at 11:51 UTC
by matth (Monk) on Dec 03, 2002 at 12:49 UTC
by mirod (Canon) on Dec 03, 2002 at 12:58 UTC

Re: Removing duplicate subtrees from XML
by Zaxo (Archbishop) on Dec 03, 2002 at 02:15 UTC

Re: Removing duplicate subtrees from XML
by mirod (Canon) on Dec 03, 2002 at 12:41 UTC
by matth (Monk) on Dec 09, 2002 at 16:16 UTC