To make trackback ping data easier to auto-discover, and therefore make it easier to send trackback pings to blogs, many blogs embed blocks of RDF in their HTML that describe how to ping them.
The Net::Trackback module uses regexes to extract the relevant data from the HTML so that a user can format a ping request. That's one way of doing it. Here's another. This code uses a regex to extract the RDF blocks from the HTML (this is not perfect, but there is no canonical way of doing this at the moment. XSLT would be best, but I couldn't find an implementation. You never know, I might get round to it myself one day :) ).
Once we have the RDF, this code uses RDF::Redland to parse it and extract the RDF graphs using RDQL. Each matching graph is put into a hash, and a reference to that hash is pushed onto an array, with the aim that the array of hash refs could then be passed to some other Perl process/program/subroutine/whatever for more munging.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use RDF::Redland;

my $url = $ARGV[0] or die "Usage: $0 <url>\n\n";
# LWP::Simple's get() returns undef on failure; it does not set $!
my $html = get($url) or die "Whoops! Couldn't get $url\n";

# place to store all our ping data
my @trackback_data;

# use a regex to get data out of html
while ($html =~ /(<\s*rdf:RDF.*?<\s*\/rdf:RDF\s*>)/sg) {
    my $rdf = $1;

    my $storage = RDF::Redland::Storage->new("memory")
        || die "unable to create storage";
    my $model = RDF::Redland::Model->new($storage, "")
        || die "unable to create model";
    my $base_uri = RDF::Redland::URI->new($url);
    my $parser = RDF::Redland::Parser->new("rdfxml", "application/rdf+xml");

    my $stream;
    # some data might be corrupt so we use an eval block in case the
    # parser chokes on it
    eval { $stream = $parser->parse_string_as_stream($rdf, $base_uri); };

    ## If the parser chokes we still want the next bit of RDF
    if ($@ || !$stream) {
        warn $@ || "parser returned no stream\n";
        next;
    }

    while (!$stream->end) {
        $model->add_statement($stream->current);
        $stream->next;
    }

    ## The RDQL query - a condition of a trackback graph is that the
    ## subject should be the same as the Dublin Core identifier object
    my $string = <<RDQL;
SELECT ?identifier, ?ping, ?title
WHERE
  (?what dc:identifier ?identifier)
  (?what trackback:ping ?ping)
  (?what dc:title ?title)
AND ?what == ?identifier
USING dc for <http://purl.org/dc/elements/1.1/>
      trackback for <http://madskills.com/public/xml/rss/module/trackback/>
RDQL

    my $query = RDF::Redland::Query->new($string);
    my $results = $model->query_execute($query);
    next unless $results;

    while (!$results->finished) {
        # a fresh hash for every result row; reusing one hash would push
        # the same reference each time and overwrite earlier rows
        my %graph;
        for (my $i = 0; $i < $results->bindings_count(); $i++) {
            my $name  = $results->binding_name($i);
            my $value = $results->binding_value($i);
            $graph{$name} = $value->as_string;
        }
        push @trackback_data, \%graph;
        $results->next_result;
    }
}

foreach my $graph (@trackback_data) {
    while (my ($key, $value) = each %$graph) {
        print "$key: $value\n";
    }
    print "\n";
}

print "Number of trackback ping records: " . @trackback_data . "\n";
exit 0;
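If you want to try the extraction step without installing RDF::Redland, here is a minimal sketch that runs the same regex against a canned page. The sample HTML, the example.com URLs, and the variable name @rdf_blocks are made up for illustration; only the pattern itself comes from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page: two RDF blocks hidden in HTML comments.
# All URLs and titles here are invented for the example.
my $html = <<'HTML';
<html><body>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.com/post/1"
    dc:identifier="http://example.com/post/1"
    dc:title="First post"
    trackback:ping="http://example.com/tb/1" />
</rdf:RDF>
-->
<p>Some entry text.</p>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.com/post/2"
    dc:identifier="http://example.com/post/2"
    dc:title="Second post"
    trackback:ping="http://example.com/tb/2" />
</rdf:RDF>
-->
</body></html>
HTML

# Same pattern as the script: .*? is non-greedy so each match stops at
# the first closing tag, /s lets '.' cross newlines, and /g in list
# context returns every captured block at once.
my @rdf_blocks = $html =~ /(<\s*rdf:RDF.*?<\s*\/rdf:RDF\s*>)/sg;

printf "Found %d RDF block(s)\n", scalar @rdf_blocks;
print "$_\n\n" for @rdf_blocks;
```

Note that the pattern only matches blocks that use the literal rdf: prefix on the root element, which is part of why it is "not perfect".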
In reply to Discover trackback ping data by Nomad