Hi Monks,

Purpose: I have hundreds of compressed (.gz) XML files, and I need to build a distribution of the values of the 'EstimatedCPC' tags in them.

Issue: I'm not sure how to read a gzipped file with XML::Twig, and I don't want to unzip each file to disk before reading it. Is there a way to read the compressed files directly? Please help.

Also, Code 2 prints this message: "Wide character in print at /usr/local/share/perl/5.22.1/XML/Twig.pm line 8628." How can I fix it?

Perl Code 1:

    use strict;
    use warnings;
    use XML::Twig;

    my $file = 'file.xml';
    my $twig = XML::Twig->new;          # get the twig object
    $twig->parsefile($file);            # parse the file to build the twig
    my $root     = $twig->root;         # get the root element of the twig
    my @elements = $root->children;     # get the list of child elements (the offers)
    foreach my $e (@elements) {
        my $cpc = $e->first_child('EstimatedCPC')->text * 100;
        print $cpc, "\n";
    }

Perl Code 2:

    use XML::Twig;

    my $field = 'EstimatedCPC';     # assumption: $field is not defined in the posted snippet
    my $twig  = XML::Twig->new;     # assumption: the twig object is not created in the posted snippet

    $twig->parsefile("file.xml");   # build the twig
    my $root    = $twig->root;      # get the root of the twig (stats)
    my @players = $root->children;  # get the player list

    # sort it on the text of the field
    my @sorted = sort {
        $b->first_child($field)->text <=> $a->first_child($field)->text
    } @players;

    print '<?xml version="1.0"?>';                  # print the XML declaration
    print '<!DOCTYPE stats SYSTEM "stats.dtd" []>';
    print '<stats>';                                # then the root element start tag
    foreach my $player (@sorted) {                  # the sorted list
        $player->print;                             # print the xml content of the element
        print "\n";
    }
    print "</stats>\n";                             # close the document

Example of XML:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <CatalogListings>
      <Offer id="af94bdd18ff9ffbf66afb5286dcb68fa">
        <Command>new</Command>
        <Title>Puma Pitch Shorts</Title>
        <Description>Pitch Shorts: Let your football team look like pros and play like pros with these lightweight shorts from PUMA. Highly functional materials draw sweat away from your skin and help keep you dry and comfortable during exercise. Get ready for dry with dryCELL. Bio-based wicking finish to keep you dry.</Description>
        <EstimatedCPC>0.0434</EstimatedCPC>
        <LastModified>2017-02-15 21:31:41</LastModified>
        <Images>
          <Image available="true">
            <Url>http://r.kelkoo.com/r/uk/11210623/100353523/90/90/http%3A%2F%2Fpumaecom.scene7.com%2Fis%2Fimage%2FPUMAECOM%2F702075_25_01_EEA%3F%24PUMA_GRID%24/d4qCltxt.0XARAbgLGRcGsAKxgSY3iHhaVcF_7bEuPg-</Url>
          </Image>
        </Images>
        <Url>http://ecs-uk.kelkoo.co.uk/ctl/go/offersearchGo?.ts=1487245527057&amp;.sig=Ch1dMBKSr5hhrL8bNhlNkv_GMSg&amp;catId=100353523&amp;localCatId=100353523&amp;comId=11210623&amp;offerId=af94bdd18ff9ffbf66afb5286dcb68fa&amp;searchId=null&amp;affiliationId=96951977&amp;country=uk&amp;wait=true&amp;contextLevel=2&amp;service=11</Url>
        <MobileFriendly>false</MobileFriendly>
        <Merchant id="11210623"/>
        <Category id="100353523">
          <Name>Miscellaneous</Name>
        </Category>
        <Price currency="GBP">
          <Price>20.0</Price>
          <DeliveryCost>3.95</DeliveryCost>
          <TotalPrice>23.95</TotalPrice>
        </Price>
        <ProductClass>0</ProductClass>
        <Availability>1</Availability>
        <OffensiveContent>false</OffensiveContent>
        <Ean>4055261425365</Ean>
        <MerchantCategory>Male|Mens Sports Football Pants &amp; Shorts</MerchantCategory>
        <Brand>Puma</Brand>
        <BrandId>2571</BrandId>
        <Model>Pitch Shorts</Model>
        <Currency>GBP</Currency>
      </Offer>
    </CatalogListings>
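
For reference, here is a minimal sketch of one way the two issues might be approached. It is not a confirmed answer. It assumes each decompressed file fits in memory, that every offer is an <Offer> element as in the example XML, and that the files to process are passed on the command line. The helper name estimated_cpcs_from_gz is hypothetical. The sketch decompresses each file into a string with the core module IO::Uncompress::Gunzip, so nothing is unzipped on disk, and sets a UTF-8 output layer on STDOUT, which is the usual remedy for "Wide character in print".

```perl
use strict;
use warnings;
use XML::Twig;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: returns every EstimatedCPC value found in one .gz file.
sub estimated_cpcs_from_gz {
    my ($gz_file) = @_;

    # Decompress into memory; the .gz file on disk is left untouched.
    # Assumption: the decompressed XML is small enough to hold in a string.
    gunzip $gz_file => \my $xml
        or die "gunzip failed for $gz_file: $GunzipError\n";

    my @cpcs;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for each completely parsed <Offer> element.
            Offer => sub {
                my ($t, $offer) = @_;
                my $cpc = $offer->first_child_text('EstimatedCPC');
                push @cpcs, $cpc if length $cpc;
                $t->purge;    # release the offers parsed so far
            },
        },
    );
    $twig->parse($xml);
    return @cpcs;
}

# Encode output as UTF-8 so printing non-ASCII text does not warn
# with "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

# Count how often each value (scaled by 100, as in Code 1) occurs.
my %distribution;
for my $gz_file (@ARGV) {
    $distribution{ $_ * 100 }++ for estimated_cpcs_from_gz($gz_file);
}
printf "%s\t%d\n", $_, $distribution{$_}
    for sort { $a <=> $b } keys %distribution;
```

Saved under a name of your choosing, it could be run as perl cpc_dist.pl *.xml.gz (the script name is only an example). For very large files, handing XML::Twig a decompressing filehandle instead of a string would keep memory use lower; that variant is not shown here.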

In reply to How to read compressed (gz) file in xml::twig by CSharma
