Saner has asked for the wisdom of the Perl Monks concerning the following question:

Hiya.
I have an XMLTV file that I want to parse so I can import some of its info into a MySQL database. I had someone help me with the basics, but I am still running into problems.
Original file - http://paste.ubuntu.com/11109433/ (Large File!)
From this file I want to strip out the extra detail so that I get output like this:
UPDATE channel SET channum="101",xmltvid="9601.dvb.guide" WHERE callsign="RTE One";
I have this so far.
```perl
#!/usr/bin/perl
use v5.14;
use warnings;

until (eof()) {
    my ($id, $chan) = <> =~ /id="([^"]*)".*number="(\d+)"/;
    my ($sign)      = <> =~ />(.*)</;
    <>;    # Skip </channel>
    say qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$sign");
}
```
But when I run it, it never finishes; it just seems to hang. If I press Enter, I get some "uninitialized" warnings (I know I could silence these with no warnings 'uninitialized';), but it seems to get as far as this record and then stop:
```
<channel id="9001.dvb.guide" <!-- number="65535" type="0x95" flags="0xffff" bouquet="4109" region="0" sid="9001" -->>
<display-name>[65535.9001.(null)]</display-name>
```
Any pointers as to where I am going wrong would be great.
Thanks in advance.
S.

Re: Parsing an XML file to generate a certain output.
by Corion (Patriarch) on May 13, 2015 at 07:58 UTC

    Why not just use a real XML parser?

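```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
use XML::Twig;

# Each handler call is followed by purge(), so at most one <channel>
# is kept in memory at a time.
my $twig = XML::Twig->new(
    twig_handlers => {
        channel => sub {
            my ($t, $node) = @_;    # handlers get the twig, then the element
            my $id   = $node->att('id');
            # assumes "number" is a real attribute (in the raw dump it is not)
            my $chan = $node->att('number');
            my $sign = $node->first_child_text('display-name');
            say qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$sign";);
            $t->purge;
        },
    },
);
$twig->parsefile('my_big.xml');
```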
      Eh, I was trying the same solution, and with the code below (yours) as well as with my own attempt I get "not well-formed (invalid token) at line 1, column 29, byte 29 at..". That seems to be the comment.
      If I remove the comment entirely, it runs fine. I'm playing with comments => "process" as the authors said, but with no luck.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        channel => sub {
            my ($t, $node) = @_;    # handlers get the twig, then the element
            my $id   = $node->att('id');
            my $chan = $node->att('number');
            my $sign = $node->first_child_text('display-name');
            print qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$sign";\n);
            $t->purge;
        },
    },
);
$twig->parse(\*DATA);    # pass the filehandle itself, not a list of lines

__DATA__
<channel id="6945.dvb.guide" <!-- number="84" type="0x19" flags="0xf" bouquet="4110" region="2000000e" sid="6945" --> >
<display-name>6945</display-name>
</channel>
```

      L*

        If I remove the comment entirely, it runs fine. I'm playing with comments => "process" as the authors said, but with no luck.

        Hmm, XML parsers are required to reject XML that is not well-formed, and a comment inside a tag is not well-formed XML :)

        I wouldn't keep playing with it :D

      Thanks for the speedy reply!
      I am new to this and had not heard of this approach. I had tried a few others, and someone suggested Perl, so I got caught up in regexes and so on.
      I feel I am close with this, but I get an error on line 11:
```
$ perl xml2
syntax error at xml2 line 11, near "say qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$sign")"
Execution of xml2 aborted due to compilation errors.
```
      Any pointers?
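        Try adding use feature qw/ say /; near the top of the script. say is only available when that feature is enabled (use v5.10 or later enables it too).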
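A minimal sketch of that suggestion. The helper name update_sql and the sample values are illustrative only; the point is that once the feature is enabled, say behaves like print with an automatic trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;    # enables say; "use v5.10;" or later does the same

# Hypothetical helper: build one statement in the format asked for above
sub update_sql {
    my ($chan, $id, $sign) = @_;
    return qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$sign";);
}

# say = print + "\n"
say update_sql(101, '9601.dvb.guide', 'RTE One');
```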
      Cheers, everyone. Between all the help here, I managed to get it working. It's not the most elegant solution (I strip out all the junk and then parse it), but it works.
      As to where the file came from: it is a dump from SkyUK's OpenTV system. By default (by design, I think) I can only get Now/Next; with this setup I can get a full 7 days on my PVR.
      Thanks once again for all your input.
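For reference, here is one possible sketch of the "strip out the junk, then parse" step. It assumes, as in the samples posted in this thread, that the dump puts a <!-- ... --> comment inside each <channel ...> start tag and that the comment holds name="value" pairs; the helper name clean_channel_tags is hypothetical. Removing just the comment delimiters turns those pairs into ordinary attributes, so the sample record becomes well-formed and could then be handed to an XML parser (for example XML::Twig's $twig->parse($clean)).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: unwrap a comment that sits *inside* a <channel ...>
# start tag so its name="value" pairs become real attributes.
# Comments elsewhere in the document are left alone.
sub clean_channel_tags {
    my ($xml) = @_;
    $xml =~ s{(<channel\b[^<>]*?)<!--(.*?)-->\s*>}{$1$2>}gs;
    return $xml;
}

my $raw = <<'XML';
<channel id="6945.dvb.guide" <!-- number="84" type="0x19" flags="0xf" bouquet="4110" region="2000000e" sid="6945" --> >
  <display-name>6945</display-name>
</channel>
XML

print clean_channel_tags($raw);
```

This only helps while the comment contents really are unique name="value" pairs; anything else in there would still leave the tag malformed.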
Re: Parsing an XML file to generate a certain output.
by hippo (Archbishop) on May 13, 2015 at 08:27 UTC

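    Your code as it stands reads through <>, which reads the files named on the command line or, if there are none, STDIN. If it "just seems to hang", you are probably not feeding it any data, so it is sitting there waiting for input from the terminal (which would also explain the uninitialized warnings when you press Enter: an empty line doesn't match your regexes). In that case, either send it the data via STDIN (perl yourscript.pl < file.xml), pass the filename as an argument, or amend your code to read from a file or URL or some other channel instead.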
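A sketch of the "read from a file" option, still regex-based and so still relying on each tag sitting on its own line as in the sample from the question. The helper name updates_from_fh is hypothetical. Looping with while (<$fh>) ends cleanly at end of file, and the script prints a usage message instead of silently waiting on STDIN when no filename is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;

# Hypothetical helper: scan an open filehandle line by line and return
# one UPDATE statement per <channel> that has a number and a display-name.
sub updates_from_fh {
    my ($fh) = @_;
    my (@sql, $id, $chan);
    while (my $line = <$fh>) {                  # stops at end of file
        if ($line =~ /<channel id="([^"]*)".*number="(\d+)"/) {
            ($id, $chan) = ($1, $2);            # remember the current channel
        }
        if (defined $id and $line =~ m{<display-name>(.*?)</display-name>}) {
            push @sql, qq(UPDATE channel SET channum="$chan",xmltvid="$id" WHERE callsign="$1";);
            undef $id;                          # one statement per channel
        }
    }
    return @sql;
}

# Read the file named on the command line rather than falling back to STDIN
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    say for updates_from_fh($fh);
    close $fh;
}
else {
    warn "usage: $0 channels.xml\n";
}
```

Usage would be something like perl yourscript.pl channels.xml > updates.sql.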

Re: Parsing an XML file to generate a certain output.
by locked_user sundialsvc4 (Abbot) on May 13, 2015 at 13:39 UTC

    Where in the heck did this file come from? If it's not well-formed XML, then maybe there's a rather serious bug in the program that generated it, and that ought to be the place to begin. If a generated output file isn't valid XML, then quite frankly I would not trust the rest of its content, either.

    Beyond that, yes: use an XML parser. Broadly there are two types: those that load the entire document into memory as a tree, and those that stream through it without doing so. Either way, this frees you from monkeying with regular expressions yourself.

    Given that your available memory is probably much larger than the file, strongly consider using XPath expressions, a query language for XML documents, implemented e.g. by XML::LibXML. This technique saves you from writing program logic that mimics the structure of the file you wish to process. You simply write an XPath string that describes what you are looking for, and the library returns all the matches to you as a [Perl ...] data structure.
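A rough sketch of the XPath approach with XML::LibXML (a CPAN module, not part of core Perl). It only works on well-formed input, so the sample below is a hypothetical hand-cleaned record in which number is already a real attribute; the helper name updates_from_xml is also made up for illustration. The whole document is loaded into memory as a tree, and one XPath query finds every <channel> wherever it sits.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
use XML::LibXML;    # CPAN module (libxml2 bindings), not core Perl

# Hypothetical helper: query the in-memory tree with XPath instead of
# writing code that mirrors the file's structure.
sub updates_from_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);    # whole tree in memory
    my @sql;
    for my $chan ($doc->findnodes('//channel')) {        # every <channel>
        my $id   = $chan->getAttribute('id');
        my $num  = $chan->getAttribute('number');
        my $sign = $chan->findvalue('display-name');     # XPath relative to $chan
        push @sql, qq(UPDATE channel SET channum="$num",xmltvid="$id" WHERE callsign="$sign";);
    }
    return @sql;
}

# Hypothetical cleaned-up record (the raw dump would be rejected as malformed)
my $sample = <<'XML';
<tv>
  <channel id="6945.dvb.guide" number="84">
    <display-name>6945</display-name>
  </channel>
</tv>
XML

say for updates_from_xml($sample);
```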