sandip has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have an XML file and want to parse it into a flat file. I am new to Perl and don't know much. The sample data looks like this:

<FinInstnCdtTrf>
  <GrpHdr>
    <MsgId>0000003714</MsgId>
    <CreDtTm>2013-03-04T16:01:57</CreDtTm>
    <NbOfTxs>1</NbOfTxs>
    <TtlIntrBkSttlmAmt Ccy="INR">234.00</TtlIntrBkSttlmAmt>
    <IntrBkSttlmDt>2013-02-05</IntrBkSttlmDt>
    <SttlmInf>
      <SttlmMtd>CLRG</SttlmMtd>
    </SttlmInf>
    <InstgAgt>
      <FinInstnId>
        <ClrSysMmbId>
          <MmbId>HOMEMEMEBER</MmbId>
        </ClrSysMmbId>
      </FinInstnId>
    </InstgAgt>
    <InstdAgt>
      <FinInstnId>
        <ClrSysMmbId>
          <MmbId>FOREIGNMEMBER</MmbId>
        </ClrSysMmbId>
      </FinInstnId>
    </InstdAgt>
  </GrpHdr>
</FinInstnCdtTrf>

The output should be:

GrpHdr::MsgId::0000003714
GrpHdr::CreDtTm::2013-03-04T16:01:57
GrpHdr::NbOfTxs::1
GrpHdr::TtlIntrBkSttlmAmt Ccy="INR"::234.00
GrpHdr::IntrBkSttlmDt::2013-02-05
GrpHdr::SttlmInf::SttlmMtd::CLRG
GrpHdr::InstgAgt::FinInstnId::ClrSysMmbId::MmbId::HOMEMEMEBER
GrpHdr::InstdAgt::FinInstnId::ClrSysMmbId::MmbId::HOMEMEMEBER

Replies are listed 'Best First'.
Re: XML parsing
by Jenda (Abbot) on Mar 25, 2013 at 09:45 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules => {
            _default => sub {
                my ($tag,$attrs,$context) = @_;
                if (%$attrs) {
                    my @tags = (@$context[1..$#$context], $tag);
                    my $content = delete $attrs->{_content};
                    if (%$attrs) {
                        foreach my $attr (keys %$attrs) {
                            $tags[-1] .= " $attr=\"$attrs->{$attr}\"";
                        }
                    }
                    if (defined $content) {
                        print join('::',@tags),"::",$content,"\n";
                    } else {
                        print join('::',@tags),"\n";
                    }
                }
                return;
            }
        }
    );

    $parser->parse(\*DATA);

    __DATA__
    <FinInstnCdtTrf>
      <GrpHdr>
        <MsgId>0000003714</MsgId>
        ...

    The code installs an unnamed subroutine and runs it for each tag encountered in the file. The subroutine checks whether the tag has any text content or attributes, builds a list containing all parent tags except the root plus the current tag, appends the attributes (if any) to the last entry, prints the list joined by '::' followed by the text content (if any), and returns nothing. Here $parser->parse() gets the data from the filehandle DATA (that is, it reads whatever follows __DATA__ in the script), but it can also accept the XML in a scalar or from a file. Check the docs.
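The path-building steps described above can be sketched on their own, without XML::Rules installed. This is only an illustration: format_line() is a hypothetical helper (not part of XML::Rules), and the arguments passed to it are hand-written stand-ins for the tag name, attribute hash (with the text under _content) and list of enclosing tags that the handler in the reply receives. Attributes are sorted here purely to make the output deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one output line from the same three values
# the _default rule works with (tag name, attribute hash, enclosing tags).
sub format_line {
    my ($tag, $attrs, $context) = @_;
    my %attrs = %$attrs;                  # copy, so the caller's hash is untouched
    return unless %attrs;                 # neither text nor attributes: nothing to print

    # All parent tags except the root, followed by the current tag.
    my @tags = (@$context[1 .. $#$context], $tag);

    # The text content is stored under the _content key.
    my $content = delete $attrs{_content};

    # Append any remaining attributes to the last path element.
    $tags[-1] .= qq{ $_="$attrs{$_}"} for sort keys %attrs;

    return join '::', @tags, (defined $content ? $content : ());
}

# Hand-written sample inputs, taken from the XML in the question.
print format_line('MsgId', { _content => '0000003714' },
                  ['FinInstnCdtTrf', 'GrpHdr']), "\n";
# prints: GrpHdr::MsgId::0000003714

print format_line('TtlIntrBkSttlmAmt', { Ccy => 'INR', _content => '234.00' },
                  ['FinInstnCdtTrf', 'GrpHdr']), "\n";
# prints: GrpHdr::TtlIntrBkSttlmAmt Ccy="INR"::234.00
```

Container tags such as GrpHdr produce no line of their own, because the early return fires when the tag has neither text nor attributes.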

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: XML parsing
by hdb (Monsignor) on Mar 25, 2013 at 08:39 UTC

    This does nearly what you want:

    use strict;
    use XML::Simple;

    my $xml = <<EOX;
    <FinInstnCdtTrf>
      <GrpHdr>
        <MsgId>0000003714</MsgId>
        <CreDtTm>2013-03-04T16:01:57</CreDtTm>
        <NbOfTxs>1</NbOfTxs>
        <TtlIntrBkSttlmAmt Ccy="INR">234.00</TtlIntrBkSttlmAmt>
        <IntrBkSttlmDt>2013-02-05</IntrBkSttlmDt>
        <SttlmInf>
          <SttlmMtd>CLRG</SttlmMtd>
        </SttlmInf>
        <InstgAgt>
          <FinInstnId>
            <ClrSysMmbId>
              <MmbId>HOMEMEMEBER</MmbId>
            </ClrSysMmbId>
          </FinInstnId>
        </InstgAgt>
        <InstdAgt>
          <FinInstnId>
            <ClrSysMmbId>
              <MmbId>FOREIGNMEMBER</MmbId>
            </ClrSysMmbId>
          </FinInstnId>
        </InstdAgt>
      </GrpHdr>
    </FinInstnCdtTrf>
    EOX

    sub parsetree {
        my $ref = shift;
        my $txt = shift;
        foreach my $key ( keys %{$ref} ) {
            if( ref( $$ref{$key} ) =~ /HASH/ ) {
                parsetree( $$ref{$key}, $txt."::".$key );
            } else {
                print $txt."::".$key."::".$$ref{$key}."\n";
            }
        }
    }

    parsetree(XMLin($xml));

    Output:

    ::GrpHdr::InstgAgt::FinInstnId::ClrSysMmbId::MmbId::HOMEMEMEBER
    ::GrpHdr::IntrBkSttlmDt::2013-02-05
    ::GrpHdr::NbOfTxs::1
    ::GrpHdr::SttlmInf::SttlmMtd::CLRG
    ::GrpHdr::TtlIntrBkSttlmAmt::Ccy::INR
    ::GrpHdr::TtlIntrBkSttlmAmt::content::234.00
    ::GrpHdr::CreDtTm::2013-03-04T16:01:57
    ::GrpHdr::InstdAgt::FinInstnId::ClrSysMmbId::MmbId::FOREIGNMEMBER
    ::GrpHdr::MsgId::0000003714

      A quick and dirty fix is to just strip off any leading "::" before printing:

      sub parsetree {
          my $ref = shift;
          my $txt = shift;
          foreach my $key ( keys %{$ref} ) {
              if( ref( $$ref{$key} ) =~ /HASH/ ) {
                  parsetree( $$ref{$key}, $txt."::".$key );
              } else {
                  my $out = $txt . '::' . $key . '::' . $$ref{$key} . "\n";
                  $out =~ s/^:://;
                  print $out;
              }
          }
      }

      This will produce (Update: closer to) the requested output, so long as "order" doesn't matter.


      Dave

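        There is one more difference from the requested output: the currency (INR) and the amount (234.00) are printed on separate lines. This results from the way XML::Simple handles an element that has both an attribute and text, such as <tag name="value">text</tag>: the attribute and the text end up under separate hash keys (here Ccy and content).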
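One way to fold the attribute back into the tag name is to treat any hash that carries a content key as a single leaf. The sketch below is an assumption-laden illustration rather than a drop-in fix: flatten() is a hypothetical function, $tree is a hand-written stand-in for part of what XMLin() returned in the output above (so the example runs without XML::Simple), and it assumes that every other key in such a hash is a plain attribute value. Keys are sorted only to make the output repeatable; the original document order is still lost.

```perl
use strict;
use warnings;

# Hand-written stand-in for part of the structure shown in the output above;
# 'content' is the key under which the element text appeared.
my $tree = {
    GrpHdr => {
        MsgId             => '0000003714',
        TtlIntrBkSttlmAmt => { Ccy => 'INR', content => '234.00' },
        SttlmInf          => { SttlmMtd => 'CLRG' },
    },
};

# Hypothetical helper: returns one "a::b::value" line per leaf.
sub flatten {
    my ($ref, @path) = @_;
    my @lines;
    for my $key (sort keys %$ref) {
        my $val = $ref->{$key};
        if (ref $val eq 'HASH' && exists $val->{content}) {
            # Element with text plus attributes: merge the attributes into the tag.
            my %attrs = %$val;
            my $text  = delete $attrs{content};
            my $tag   = $key;
            $tag .= qq{ $_="$attrs{$_}"} for sort keys %attrs;
            push @lines, join '::', @path, $tag, $text;
        }
        elsif (ref $val eq 'HASH') {
            # Ordinary container element: recurse with a longer path.
            push @lines, flatten($val, @path, $key);
        }
        else {
            # Plain text element.
            push @lines, join '::', @path, $key, $val;
        }
    }
    return @lines;
}

print "$_\n" for flatten($tree);
# prints:
# GrpHdr::MsgId::0000003714
# GrpHdr::SttlmInf::SttlmMtd::CLRG
# GrpHdr::TtlIntrBkSttlmAmt Ccy="INR"::234.00
```

Because the path is passed as a list and joined only at the leaves, no leading "::" is produced and nothing needs stripping afterwards. An element that is genuinely named content would be misread by this sketch.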