
I am trying to parse an XML file with Perl regexes (I know about the XML::Parser and XML::Twig modules, but using regexes is a requirement).

My XML document looks like this:

<?xml version="1.0"?>
<t_volume>
        <info>
            <info_name>FZGA34177.b1</info_name>
            <center_project>4085729</center_project>
            <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
            <it_size>35000</it_size>
            <it_stdev>3500</it_stdev>
            <plate_id>357</plate_id>
            <program_id>KB 1.3.0</program_id>
            <seq_lib_id>FZGA</seq_lib_id>
            <project_id>32913</project_id>
            <info_archive>
                <ti>2167749207</ti>
                <taxid>4555</taxid>
        <basecall_length>899</basecall_length>
                <state>active</state>
            </info_archive>
        </info>
<info>
            <info_name>FZGA34177.b1</info_name>
            <center_project>4085729</center_project>
            <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
            <it_size>35000</it_size>
            <it_stdev>3500</it_stdev>
            <plate_id>357</plate_id>
            <program_id>KB 1.3.0</program_id>
            <seq_lib_id>FZGA</seq_lib_id>
            <project_id>32913</project_id>
            <info_archive>
                <ti>2167749207</ti>
                <taxid>4555</taxid>
        <basecall_length>899</basecall_length>
                <state>active</state>
            </info_archive>
        </info>
<info>
            <info_name>FZGA34177.b1</info_name>
            <center_project>4085729</center_project>
            <base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
            <it_size>35000</it_size>
            <it_stdev>3500</it_stdev>
            <plate_id>357</plate_id>
            <program_id>KB 1.3.0</program_id>
            <seq_lib_id>FZGA</seq_lib_id>
            <project_id>32913</project_id>
            <info_archive>
                <ti>2167749207</ti>
                <taxid>4555</taxid>
        <basecall_length>899</basecall_length>
                <state>active</state>
            </info_archive>
        </info>
</t_volume>

I have written the following code so far:

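#!/usr/bin/perl
use strict;
use warnings;

my @files = glob('/abc*/info.xml');

foreach my $xmlname (@files)
{
    open my $xml, '<', $xmlname or die "Cannot open $xmlname for reading: $!\n";

    my ($info_name, $it_size);
    while (my $line = <$xml>) {
        # $1 holds the captured text; assigning the match itself in
        # scalar context would only yield 1 or an empty string.
        if ($line =~ /<info_name>(\S+)<\/info_name>/i) {
            $info_name = $1;
        }
        if ($line =~ /<it_size>(\S+)<\/it_size>/i) {
            $it_size = $1;
        }
    }
    close $xml;

    # Prints only the last record seen in each file.
    print "$info_name : $it_size\n";
}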

I want to get these values as a hash, with the data in <info_name> as the key and the data in <it_size> as the value.

How do I go about creating a hash for this?

Thanks in advance!
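
For reference, one way the hash could be built is sketched below. It is only a sketch under some assumptions: each <info_name> and <it_size> pair sits inside its own <info>...</info> record, the values contain no whitespace, and the file is small enough to slurp into memory. The helper name info_sizes and the variable %size_for are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes the XML text as
# one string and returns a hash reference of info_name => it_size.
sub info_sizes {
    my ($xml) = @_;
    my %size_for;

    # Walk through each <info>...</info> record. /s lets . match newlines,
    # /g moves on to the next record on every pass, and .*? keeps each
    # match confined to a single record.
    while ($xml =~ m{<info>(.*?)</info>}sg) {
        my $record = $1;

        # A match in list context returns the captured text.
        my ($name) = $record =~ m{<info_name>\s*(\S+)\s*</info_name>};
        my ($size) = $record =~ m{<it_size>\s*(\S+)\s*</it_size>};
        next unless defined $name && defined $size;

        # A repeated name overwrites the earlier value.
        $size_for{$name} = $size;
    }
    return \%size_for;
}

# Usage, assuming the same glob pattern as in the post.
foreach my $xmlname (glob('/abc*/info.xml')) {
    open my $fh, '<', $xmlname or die "Cannot open $xmlname for reading: $!\n";
    my $xml = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    my $size_for = info_sizes($xml);
    print "$_ : $size_for->{$_}\n" for sort keys %$size_for;
}
```

Since all three records in the sample share the same <info_name>, the resulting hash would hold a single entry for that file. If duplicate names need to be kept, the value could be an array reference instead (push @{ $size_for{$name} }, $size).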

In reply to Parse XML with Perl regex by ad23
