jmno has asked for the wisdom of the Perl Monks concerning the following question:

I want to convert an XML file to CSV. I found some code online and modified it for my purpose, but I don't know enough about Perl to fix it further.
use strict;
use XML::Rules;
use Text::CSV_XS;
use FileHandle;

my $csv = Text::CSV_XS->new({eol => "\n"});
my $parser = XML::Rules->new(
    rules => [
        _default => 'content',
        Class => sub {
            $csv->print(
                $_[4]->{parameters},
                [ map {$_[1]->{$_}} qw(
                    Subject Course Title Description Prequisites Corequisites
                    Requisites LectureHours LaboratoryHours CreditHours Flags
                ) ]
            );
            return;
        }
    ]
);

open my $FH, '>&STDOUT';
open my $File, 'alpha31.xml';
print $FH "Subject,Course,Title,Description,Prequisites,Corequisites,LectureHours,LaboratoryHours,CreditHours,Flags\n";
$parser->parse( $File, $FH);
And a brief excerpt of alpha31.xml:
<Class>
  <Subject>AAH</Subject>
  <Course>119</Course>
  <Title>History of World Architecture I</Title>
  <Description>Comprehensive background as well as concentration on individual cultures and their architects from ancient to modern times. Discussion of architectures from around the world. Specific details and expressions of more generalized theories and strategies will be explored.</Description>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <Flags>(H)(C)</Flags>
</Class>
<Class>
  <Subject>AAH</Subject>
  <Course>120</Course>
  <Title>History of World Architecture II</Title>
  <Description>Comprehensive background as well as concentration on individual cultures and their architects from ancient to modern times. Discussion of architectures from around the world.Specific details and expressions of more generalized theories and strategies will be explored.</Description>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <Flags>(H)(C) <Flags>
</Class>
<Class>
  <Subject>AAH</Subject>
  <Course>301</Course>
  <Title>Thinking About Art</Title>
  <Description>A course designed for those who find art pleasing, meaningful or significant and who want to extend the range of their sensibilities. Theories of art will be studied for insight, as well as for historical interest and continuity. Works of art will be studied for their intrinsic value, for their relation to ideas and events, and as cultural artifacts. Regular visits to area museums and galleries will be required.</Description>
  <Prerequisite>HUM 102, 104, or 106.</Prerequisite>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <Flags>(H)(C)</Flags>
</Class>
<Class>
  <Subject>AAH</Subject>
  <Course>322</Course>
  <Title>19th Century American Art and Culture</Title>
  <Description>This course explores the artistic history of the United States, from an agrarian society that developed into an industrialized nation with a distinguished national art. This broad chronological survey begins with the colonial art of Copley, Peale, West and Stuart, followed by the nation building iconography of the Hudson River School. The art of Mount and Bingham reflect antebellum culture, followed by Johnson in post-Civil War America on the eve of the Gilded Age. Finally, the course examines the realism of Homer and Eakins, defining a truly American iconography.</Description>
  <Prerequisite>HUM 102, 104, or 106.</Prerequisite>
  <LectureHours>3</LectureHours>
  <LaboratoryHours>0</LaboratoryHours>
  <CreditHours>3</CreditHours>
  <Flags>(H)(C)</Flags>
</Class>
It converts it to CSV the way I want, but it only outputs a single line and then complains about garbage after the </Class> tag. Could anyone help me make this work on the entire file? I have posted 4 entries from the XML file, which should print out 4 CSV rows.

Re: xml to csv
by Anonymous Monk on Nov 04, 2009 at 02:38 UTC
    It converts it to CSV the way I want, but it only outputs a single line and then complains about garbage after the </Class> tag.

    Google "XML junk after tag". Basically, it means you do not have XML: if it is not well-formed, it is not XML.

Re: xml to csv
by Jenda (Abbot) on Nov 04, 2009 at 14:19 UTC

    You've encountered the silly "each XML document must have one and only one root tag" restriction. Let's see what the XML::Rules docs say about that:

    If you need to parse an XML file without the top-most tag (something that each and any sane person would allow, but the XML committee did not), you can parse
    <!DOCTYPE doc [<!ENTITY real_doc SYSTEM "$the_file_name">]><doc>&real_doc;</doc>
    instead.
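
    As a rough illustration of the same idea (supplying the missing root tag), here is a minimal sketch that takes a simpler route than the entity declaration quoted above: it holds the fragments in a string and wraps that string in a root element before parsing. This is a swapped-in technique, not the one from the docs, and it keeps the whole input in memory, so it only suits files of modest size. The <doc> wrapper name, the inline sample data, and the shortened column list are assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Handle;
use XML::Rules;
use Text::CSV_XS;

# Hypothetical rootless input: two <Class> fragments with no enclosing tag.
my $fragments = <<'XML';
<Class><Subject>AAH</Subject><Course>119</Course><Flags>(H)(C)</Flags></Class>
<Class><Subject>AAH</Subject><Course>120</Course><Flags>(H)(C)</Flags></Class>
XML

# Shortened column list, for illustration only.
my @heads = qw( Subject Course Flags );
my $csv   = Text::CSV_XS->new( { eol => "\n" } );

my $parser = XML::Rules->new(
    stripspaces => 3,    # trim leading and trailing whitespace in content
    rules       => [
        _default => 'content',
        Class    => sub {
            my ( $tag, $attrs, $context, $parents, $rules ) = @_;
            # The second argument of parse() is available as {parameters}.
            $csv->print( $rules->{parameters}, [ @{$attrs}{@heads} ] );
            return;
        },
    ],
);

# Collect the output in memory so it can be inspected before printing.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
print {$out} join( ',', @heads ), "\n";

# <doc> is an arbitrary wrapper name; any single root element would do.
$parser->parse( "<doc>$fragments</doc>", $out );
close $out;

print $buffer;
```

    To use this on a real file, the contents of the file would have to be read into $fragments first; for very large files the entity-based wrapper quoted above is the approach the XML::Rules docs describe.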

    And the

    map {$_[1]->{$_}} qw( Subject Course Title Description Prequisites Corequisites Requisites LectureHours LaboratoryHours CreditHours Flags )
    is better written as
    @{$_[1]}{qw( Subject Course Title Description Prequisites Corequisites Requisites LectureHours LaboratoryHours CreditHours Flags )}
    No need to map(), just slice the hash.
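
    For readers unfamiliar with hash slices, here is a small sketch in core Perl showing that the map and the slice return the same list. The %class hash and the shortened @heads list are made up for the example; in the handler above, $_[1] plays the role of the hash reference.

```perl
use strict;
use warnings;

# Hypothetical record, shaped like the hash a tag handler receives in $_[1].
my %class = (
    Subject => 'AAH',
    Course  => '119',
    Title   => 'History of World Architecture I',
);
my @heads = qw( Subject Course Title Flags );

# map version: look up one key at a time.
my @via_map = map { $class{$_} } @heads;

# Hash slice: look up all keys in a single expression.
my @via_slice = @class{@heads};

# The same slice through a hash reference, as in @{ $_[1] }{ ... }.
my $ref           = \%class;
my @via_ref_slice = @{$ref}{@heads};

# Keys that are missing (Flags here) yield undef in every variant,
# so they are turned into empty strings before joining.
print join( '|', map { defined $_ ? $_ : '' } @via_ref_slice ), "\n";
```

    Either form leaves undef in the list for elements a given <Class> does not contain, which Text::CSV_XS writes as an empty field.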

    You may also want to add stripspaces => 3 to ensure that the whitespace around the <Class> tags is not accumulated in $_[1]->{_content} for the handler of the root tag.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      #!/usr/bin/perl --

      use warnings;
      use strict;
      use XML::Rules;
      use Text::CSV_XS;

      Main(@ARGV);
      exit(0);

      sub Main {
          use autodie;
          open my $File, shift || 'alpha31.xml';
          RuleCvs( $File, \*STDOUT );
          #~ use FileHandle;
          #~ RuleCvs( FileHandle->new(shift // 'alpha31.xml'), \*STDOUT ); use 5.10.0;
          #~ RuleCvs( FileHandle->new(shift || 'alpha31.xml'), \*STDOUT );
          #~ RuleCvs( FileHandle->new(shift), \*STDOUT );
      } ## end sub Main

      BEGIN {
          my (@Heads) = qw(
            Subject Course Title Description Prequisites Corequisites
            Requisites LectureHours LaboratoryHours CreditHours Flags
          );

          sub RuleCvs {
              my ( $InFh, $OutFh ) = @_;
              my $csv = Text::CSV_XS->new( { eol => "\n" } );
              my $parser = XML::Rules->new(
                  rules => [
                      _default => 'content',
                      Class    => sub {
                          $csv->print( $_[4]->{parameters}, [ @{ $_[1] }{@Heads} ] );
                          return;
                      }
                  ]
              );
              print $OutFh join( ',', @Heads ), "\n";
              $parser->parse( $InFh, $OutFh );
              return;
          } ## end sub RuleCvs
      } ## end BEGIN
      __END__
Re: xml to csv
by llancet (Friar) on Nov 04, 2009 at 01:59 UTC
    I hate this kind of XML parser. They are weird. I think something like XML::XPath is much easier to use and understand...

      I hate a lot of things. And I think something like XML::XPath is much harder to use and understand.

      Show us your code that does the same and doesn't run out of memory for huge files!

      Jenda