Re: Extracting elements from an XML chunk leads to crash

I think I'm in the "if you see an easier way" camp.

You seem to be working too hard. An RSS feed isn't designed to be over-written completely every time you update it. That's why items have a GUID. Just add your new items, and leave the old ones there too, for as long as you want.

Obviously the file will get too big if you leave them there for weeks, but you seem to be addressing the wrong problem -- "a certain RSS reader has a certain lookup rate and I need to match that" -- RSS is designed for agents to come and look at the feed whenever it suits them, and to figure out what's new for themselves, based on pubDates and GUIDs.
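To make the "add new items, keep the old ones, expire them eventually" idea concrete, here is a minimal sketch in plain Perl. It assumes each item has already been reduced to a hash with a `guid` and an epoch `pub` time; the `merge_items` helper, the two-week `$MAX_AGE` and the sample data are all made up for illustration, not taken from your feed.

```perl
use strict;
use warnings;

# Assumed expiry window: two weeks, in seconds.
my $MAX_AGE = 14 * 24 * 60 * 60;

# Hypothetical helper: merge stored and freshly scraped items by GUID,
# drop anything older than $MAX_AGE, and return the rest newest first.
sub merge_items {
    my ($old, $new, $now) = @_;
    my %by_guid;
    # Later entries win, so a re-scraped item replaces its stored copy.
    $by_guid{ $_->{guid} } = $_ for @$old, @$new;
    return [
        sort { $b->{pub} <=> $a->{pub} }
        grep { $now - $_->{pub} <= $MAX_AGE }
        values %by_guid
    ];
}

my $now = 1_193_214_960;    # arbitrary fixed timestamp, for a repeatable run
my $old = [
    { guid => 'a', pub => $now - 20 * 86_400, title => 'stale'  },
    { guid => 'b', pub => $now -  3 * 86_400, title => 'recent' },
];
my $new = [
    { guid => 'b', pub => $now -  3 * 86_400, title => 'recent' },
    { guid => 'c', pub => $now -      3_600,  title => 'fresh'  },
];

my $feed = merge_items($old, $new, $now);
print join(', ', map { $_->{guid} } @$feed), "\n";    # prints "c, b"
```

Readers that track GUIDs will see only `c` as new on their next visit, whenever that happens to be.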

Nobody says perl looks like line-noise any more
kids today don't know what line-noise IS ...

Replies are listed 'Best First'.

Re^2: Extracting elements from an XML chunk leads to crash
by Hue-Bond (Priest) on Oct 24, 2007 at 08:36 UTC

Don't you need a call to XML::LibXML::XPathContext->new there somewhere, passing the fragment node created by parse_balanced_chunk?

Honestly I don't know, since I'm beginning with all this XML::LibXML stuff.

Just add your new items, and leave the old ones there too, for as long as you want. [...] "a certain RSS reader has a certain lookup rate and I need to match that"

Of course I wasn't going to match Google Reader's polling time; I just wanted to take the opportunity to implement an expiry time of a couple of weeks or so.

I just discovered I can use a DOM-like interface, and quickly went down this road. However, I found another segfault condition. I'm going to notify the module's author, but first I would like to share the offending code with you monks, so I don't send them a suboptimal snippet. This is it:

use warnings;
use strict;
use XML::LibXSLT;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $xslt = XML::LibXSLT->new;

## original web page
my $xml_src = $parser->parse_html_string (<<'EOT');
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <title>foo</title>
</head>
<body>
  <p>foo</p>
</body>
</html>
EOT

## which we will "transform"
my $stylesheet = $xslt->parse_stylesheet ($parser->parse_string (<<'EOT'));
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" indent="yes" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <channel>
      <item>
        <title>one item</title>
      </item>
    </channel>
  </xsl:template>
</xsl:stylesheet>
EOT

my $parsed = $stylesheet->transform ($xml_src);
## we'll move this item to another document
my $item = ($parsed->getElementsByTagName ('item'))[0];
$item->unbindNode;

## this is the other document:
my $saved = $parser->parse_string (<<'EOT');
<?xml version="1.0"?>
<channel>
  <item>
    <title>other item</title>
  </item>
</channel>
EOT

my $ch = ($saved->findnodes ('/channel'))[0];
my $addto = ($ch->findnodes ('item'))[0];

$ch->insertBefore ($item, $addto);

print "going to boom...\n";
END { print "unreached\n"; }

In particular I'm concerned about those (findnodes)[0]. I tried a couple of other approaches but didn't find anything that returned only the first element. The XSLT transformation does nothing interesting but is actually needed to crash the program.
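On the "only the first element" question, two alternatives to the `(...)[0]` slice come to mind; this is a sketch against a cut-down copy of the saved document, not a fix for the segfault. It relies on `findnodes` returning an XML::LibXML::NodeList in scalar context, whose `get_node` is 1-based. The second `item` is added here only to show that a single node comes back.

```perl
use strict;
use warnings;
use XML::LibXML;

my $saved = XML::LibXML->new->parse_string(<<'EOT');
<?xml version="1.0"?>
<channel>
  <item><title>other item</title></item>
  <item><title>second item</title></item>
</channel>
EOT

# The root element is available directly, no XPath needed.
my $ch = $saved->documentElement;

# 1. List assignment: the parentheses on the left keep only the first match.
my ($first) = $ch->findnodes('item');

# 2. Let XPath select the first match, then take it out of the NodeList.
my $by_xpath = $ch->findnodes('item[1]')->get_node(1);

print $first->findvalue('title'), "\n";
print $first->isSameNode($by_xpath) ? "same node\n" : "different nodes\n";
```

Both forms give `undef` when nothing matches, so the result still needs checking before it is passed to `insertBefore`.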

--
David Serrano
