dmsparts has asked for the wisdom of the Perl Monks concerning the following question:

HI, I am trying to write a script that will convert an XML file with details of many Cars into a CSV file per car. I have managed to get it to work for one vehicle at a time by specifying which vehicle tag to use. I would like to process the whole file at once. i thought by using a 'foreach' statement, but i cannot work out how to do this. Here is the code i have that will work for one vehicle,
#!/usr/bin/perl use XML::Simple; ######XML PARSE########## $xml = 'e:\cars\input.xml'; $config = XMLin($xml); $exportid = "$config->{Header}->{ExportID}"; $numveh = "$config->{Summary}->{NumberOfVehicles}"; $memid = "$config->{Header}->{MemberID}"; $auctionid = "$config->{Vehicle}[0]->{AuctionID}"; $make = "$config->{Vehicle}[0]->{Manufacturer}"; $model = "$config->{Vehicle}[0]->{Model}"; $reg = "$config->{Vehicle}[0]->{RegNo}"; $year = "$config->{Vehicle}[0]->{RegYear}"; $colour = "$config->{Vehicle}[0]->{Colour}"; $fueltype = "$config->{Vehicle}[0]->{Fuel}"; $damage = "$config->{Vehicle}[0]->{Damage}"; $doors = "$config->{Vehicle}[0]->{Doors}"; $cc = "$config->{Vehicle}[0]->{CC}"; $mileage = "$config->{Vehicle}[0]->{Speedo}"; $transpeed = "$config->{Vehicle}[0]->{TransSpeed}"; $trantype = "$config->{Vehicle}[0]->{TransType}"; $trim = "$config->{Vehicle}[0]->{TrimLevel}"; $abicat = "$config->{Vehicle}[0]->{Cat}"; $reserve = "$config->{Vehicle}[0]->{Reserveprice}"; $vat = "$config->{Vehicle}[0]->{HasVAT}"; if ($vat eq 1){$hasvat = "Yes"} else {$hasvat = "No"}; $xmlstarts = "$config->{Vehicle}[0]->{Starts}"; if ($xmlstarts eq 1){$starts = "Starts"} else {$starts = "Does Not Sta +rt"}; $xmldrives = "$config->{Vehicle}[0]->{Drives}"; if ($xmldrives eq 1){$drives = "Drives"} else {$drives = "Does Not Dri +ve"}; $xmlkeys = "$config->{Vehicle}[0]->{Keys}"; if ($xmlkeys eq 1){$keyssupplied = "yes"} else {$keyssupplied = "No"}; $image1 = "$config->{Vehicle}[0]->{Images}->{Image_1}"; $image2 = "$config->{Vehicle}[0]->{Images}->{Image_2}"; $image3 = "$config->{Vehicle}[0]->{Images}->{Image_3}"; $image4 = "$config->{Vehicle}[0]->{Images}->{Image_4}"; $image5 = "$config->{Vehicle}[0]->{Images}->{Image_5}"; $image6 = "$config->{Vehicle}[0]->{Images}->{Image_6}"; $image7 = "$config->{Vehicle}[0]->{Images}->{Image_7}"; $image8 = "$config->{Vehicle}[0]->{Images}->{Image_8}"; $image9 = "$config->{Vehicle}[0]->{Images}->{Image_9}"; $image10 = "$config->{Vehicle}[0]->{Images}->{Image_10}"; $image11 = "$config->{Vehicle}[0]->{Images}->{Image_11}"; $image12 = "$config->{Vehicle}[0]->{Images}->{Image_12}"; ###########Write CSV############ $opath = "e:/out/" ; $ipath = "e:/in/" ; $add='0'; $add2='xx.csv'; $add5=''; $lineno = 10; $add5 = $exportid.$lineno ; $add3 = $opath.$exportid.$lineno.$add2 ; $add6 = "xx"; $mileage =~ tr/,//d ; print "Content-type: text/html\n"; print "\n"; open (FILENAME,">$add3"); print FILENAME "Ref No,Make,Model,Trim,Vehicle Sub Class,Colour,Engine + Size,Fuel Type,Transmission,Year,Mileage,Registration No,Keys Suppli +ed,Damage Report,VAT Applicable,Condition,FSH,ABI Category,Reserve,Ch +assis No,Vehicle Source,Agent,Region,Vehicle Location,Source Name,Dat +e Approved\n"; print FILENAME "$add5,$make,$model,$trim,$doors,$colour,$cc,$fueltype, +$transpeed $trantypr,$year,$mileage,$reg,$keyssupplied,$damage,$hasva +t,$starts $drives,No,$abicat,$reserve,na,$memid,$memid,$memid,$memid, +$auctionid,na,\n"; close FILENAME ; print "Done!";
A sample of the XML i am using
<?xml version="1.0"?> <!--DMS Auction Export - Version 1.2--> <root> <Header> <MemberID>005</MemberID> <ExportID>9376</ExportID> </Header> <Vehicle> <AuctionID>25020</AuctionID> <VehicleID>155809</VehicleID> <Ref>1477980</Ref> <Manufacturer>FORD</Manufacturer> <Model>FIESTA</Model> <RegNo>MV04xxx</RegNo> <RegYear>2004</RegYear> <Colour>SILVER/BLACK</Colour> <Fuel>Petrol</Fuel> <Damage>FRONT AND O/S</Damage> <Doors>3</Doors> <Body>Hatchback</Body> <CC>1388</CC> <Speedo>39,260</Speedo> <TransSpeed>5 Speed</TransSpeed> <TransType>Manual</TransType> <TrimLevel>LEATHER</TrimLevel> <Engine></Engine> <Cat>C</Cat> <ReservePrice>886</ReservePrice> <StartPrice>0</StartPrice> <HasVAT>0</HasVAT> <Keys>1</Keys> <Starts>1</Starts> <Drives>1</Drives> <Stereo>0</Stereo> <VINPlate>1</VINPlate> <LogBook>0</LogBook> <DateAuction>24.07.2009</DateAuction> <Location>BURSCOUGH</Location> <CanBeViewed>YES</CanBeViewed> <CostExVAT>0</CostExVAT> <PAV>0</PAV> <Images> <Image_1>742281.jpg</Image_1> <Image_2>742285.jpg</Image_2> <Image_3>742286.jpg</Image_3> <Image_4>742282.jpg</Image_4> <Image_5>742283.jpg</Image_5> <Image_6>742284.jpg</Image_6> <Image_7>742287.jpg</Image_7> <Image_8></Image_8> <Image_9></Image_9> <Image_10></Image_10> <Image_11></Image_11> <Image_12></Image_12> </Images></Vehicle> <Vehicle> <AuctionID>25019</AuctionID> <VehicleID>156171</VehicleID> <Ref>1480714</Ref> <Manufacturer>TOYOTA</Manufacturer> <Model>RAV 4</Model> <RegNo>DG52xxx</RegNo> <RegYear>2002</RegYear> <Colour>Blue</Colour> <Fuel>Petrol</Fuel> <Damage>FRONT</Damage> <Doors>3</Doors> <Body>4X4</Body> <CC>1794</CC> <Speedo>66,876</Speedo> <TransSpeed>5 Speed</TransSpeed> <TransType>Manual</TransType> <TrimLevel>CLOTH</TrimLevel> <Engine></Engine> <Cat>U</Cat> <ReservePrice>954</ReservePrice> <StartPrice>0</StartPrice> <HasVAT>0</HasVAT> <Keys>1</Keys> <Starts>1</Starts> <Drives>1</Drives> <Stereo>0</Stereo> <VINPlate>1</VINPlate> <LogBook>0</LogBook> <DateAuction>24.07.2009</DateAuction> <Location>BURSCOUGH</Location> <CanBeViewed>YES</CanBeViewed> <CostExVAT>0</CostExVAT> <PAV>0</PAV> <Images> <Image_1>742273.jpg</Image_1> <Image_2>742277.jpg</Image_2> <Image_3>742278.jpg</Image_3> <Image_4>742274.jpg</Image_4> <Image_5>742275.jpg</Image_5> <Image_6>742276.jpg</Image_6> <Image_7>742280.jpg</Image_7> <Image_8></Image_8> <Image_9></Image_9> <Image_10></Image_10> <Image_11></Image_11> <Image_12></Image_12> </Images></Vehicle> <Vehicle> <Summary> <NumberOfVehicles>39</NumberOfVehicles> <DateExport>20.07.2009 12:52:09</DateExport> </Summary> </root>
My Knowledge of PERL is very basic and have learned by altering old scripts that were in use before i started here. Many Thanks, Michael

Replies are listed 'Best First'.
Re: Parse XML into CSV Files
by dHarry (Abbot) on Jul 21, 2009 at 11:59 UTC

    I haven taken a look at the code and the XML document and tried to run it. I see many points for improvements, so many that I don't quite know where to start;)

    • Let's start by adding:
      use strict; use warnings;
      at the top of the file. This can help you catch common mistakes, unsafe programming constructs etc.
    • Next think about the variable you want to use and declare them with "my" for example
      my $xml = "e:\cars\input.xml";
      There are variables in your program which are never used.
    • You might want to use arrays instead of variables $image1...$image12 or $add, $add2...$add6.
    • use the 3 argument version of open function, see hereunder.
    • Test open, close and the print statements for failure, for example
      open (FILENAME,">", $add3); or die "Can't open $add3 $!";
    • The sample XML file you provided is not well-formed. Apart from that some of the tag names, e.g. the <Images> construct looks a bit strange, i.e. the <Image_1> to <Image_11> tags.

    While your approach should work I am going to suggest a different approach all together. XML to CSV sounds like a standard application of XSLT to me. See Sal Mangano's XSLT Cookbook, it has a recipe for transforming XML into delimited data. There are modules on CPAN to do XSLT processing.

    HTH
    Harry

      Thanks for the Help,
      I have no control over the XML. it is exported from a vehicle management program in order to put the details of vehicles onto a website.

      I have generated a couple of variables to use later on in the script, once the Vehicle processing has happened.

      Do i have to use the "my" before variables, i have not in the past and things seemed to work, however i have seen some scripts that do use it

      I will have a look at the XSLT, but i might have found the answer to my problem in another post on here.

      just need to work out what to do if the XML field is Blank.

      Cheers
        Do i have to use the "my" before variables

        No it's not mandatory, but it often makes sense to do so, i.e. explicitly declare variables that you want to use and give them a specific scope. See perldoc for documentation. Using strict and warnings is also not mandatory but using them in combination with my would for example have given you the information that some of the variables are never used.

Re: Parse XML into CSV Files
by apl (Monsignor) on Jul 21, 2009 at 12:29 UTC
    Continuing in the spirit of dHarry:
    • Break your code up into subroutines. This will enable you to look at a logically related block of code at one time, demonstrate its correctness, and ultimately determine why it works on one record, but fails when it's executed twice.
    • Consider replacing this construction
      if ($vat eq 1){$hasvat = "Yes"} else {$hasvat = "No"};
      with
      $hasvat = ($vat eq 1) ? "Yes" : "No";
Re: Parse XML into CSV Files
by toolic (Bishop) on Jul 21, 2009 at 16:26 UTC
    Here is how I would approach this problem using XML::Twig.

    This reads in the XML file, and for each 'Vehicle' element, calls a subroutine named 'vehicle'. This sub opens a new output file whose name is the value of 'AuctionID' (since that seems unique) with a '.csv' extension. It outputs a comma-separated header line, then a CSV line. I only showed 4 values, but you could easily extend the code to add more values, in the order you require. This could be something you can build upon.

    The code creates 2 output files: 25019.csv and 25020.csv.

    Note: the XML you posted is invalid because it is missing a closing 'Vehicle' tag.

    use strict; use warnings; use XML::Twig; my $twig= new XML::Twig( twig_handlers => { Vehicle => \&vehicle } ); $twig->parsefile('in.xml'); exit; sub vehicle { my ($twig, $car) = @_; return if $car->first_child('Summary'); my $file = $car->first_child('AuctionID')->text() . '.csv'; open my $fh, '>', $file or die "can not open $file: $!"; print $fh "AuctionID,Ref,VehicleID,Manufacturer\n"; print $fh join ',' , $car->first_child('AuctionID' )->text() , $car->first_child('Ref' )->text() , $car->first_child('VehicleID' )->text() , $car->first_child('Manufacturer')->text() ; print $fh "\n"; close $fh; }

    Here are the contents of output file 25019.csv:

    AuctionID,Ref,VehicleID,Manufacturer 25019,1480714,156171,TOYOTA

    Since some of your data does contain commas, it might be prudent to also use a CSV-type CPAN module such as Text::CSV_XS.

Re: Parse XML into CSV Files
by alexm (Chaplain) on Jul 21, 2009 at 12:01 UTC
    Could you please show us the code that gives you trouble with the foreach?
      I was not sure where or how to declare the "foreach" statement for the contents of the XML, Hence my queery. Regards

        Once you remove the last <Vehicle> tag from the XML, the XMLin parser should read the file and then you can iterate over the vehicle array like this:

        my @vehicle_list = @{ $config->{Vehicle} }; foreach my $vehicle ( @vehicle_list ) { print $vehicle->{AuctionID} . "\n"; }
A reply falls below the community's threshold of quality. You may see it by logging in.