LanX has asked for the wisdom of the Perl Monks concerning the following question:
Hi
Since many of you are bored and I'm rusty at parsing XML (well, at choosing a module in the first place), here's a little challenge for the benefit of the monastery.
Once we have a script, adding new channels will be easy and ppl will have a lower entrance barrier to comment on talks.
So what's your most elegant approach? :)
INPUT:
TPC2022 recordings https://www.youtube.com/feeds/videos.xml?playlist_id=PLA9_Hq3zhoFyOpb-U3DMU7OT93dPUdtpE
TASK:
Parse it and produce HTML output for Perlmonks. Extra points if the HTML output is templated and easy to maintain.
OUTPUT:
<h3> TPC 2022 in Houston </h3>
<b>Conference in the Cloud! A Perl and Raku Conf</b>
<ul>
<li> [https://www.youtube.com/watch?v=waHAGThlRH8|Raku -Ofun for Everyone - Daniel Sockwell] <p>
<i>Description: Rakoons like to say that Raku is -Ofun (optimized for fun). This talk examines SHORTENED </i></li>
<li> ... etc </li>
</ul>
PREVIEW
TPC 2022 in Houston
Conference in the Cloud! A Perl and Raku Conf
- Raku -Ofun for Everyone - Daniel Sockwell
Description: Rakoons like to say that Raku is -Ofun (optimized for fun). This talk examines several of the steps that the Raku community has taken to deliver on this promise, from our documentation and code of conduct to our commitment to code examples and mentoring. For each example, we’ll discuss the current progress Raku has made as well as how you could help. https://sched.co/11nfM
- ... etc
UPDATE
wrote a first solution in the meantime, result can be seen here. No code yet, don't wanna spoil the challenge :)
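The sample OUTPUT above cuts each description short (the SHORTENED marker). A minimal sketch of one way to do that with core Perl only; the helper name `shorten` and the default limit of 100 characters are assumptions for illustration, not part of the original challenge:

```perl
use strict;
use warnings;

# Hypothetical helper: trim $text to at most $max characters,
# cutting at a word boundary where one exists.
sub shorten {
    my ($text, $max) = @_;
    $max //= 100;                        # assumed default, not from the post
    $text =~ s/\s+/ /g;                  # collapse newlines and runs of spaces
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text if length($text) <= $max;

    # Longest prefix of at most $max characters that ends before whitespace.
    my ($cut) = $text =~ /^(.{1,$max})(?=\s|\z)/;
    $cut //= substr($text, 0, $max);     # a single overlong word: hard cut
    return "$cut ...";
}

print shorten('Rakoons like to say that Raku is -Ofun (optimized for fun).', 20), "\n";
```

Any of the solutions in this thread could pass the description through such a helper just before interpolating it into the HTML.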
Re: Challenge: Parse XML Feed for Youtube channel -- XML::Twig
by Discipulus (Canon) on Jun 28, 2022 at 11:29 UTC
Hello LanX,
Oh gee, I forgot how ugly handling XML can be! Since I'm used to XML::Twig (last update 2016! fear!), I propose a rather old-style, brute-force approach. It works as expected. It's too hot here to also venture into template systems... heredocs are enough :)
use strict;
use warnings;
use LWP::UserAgent;
use XML::Twig;

binmode(STDOUT, ":encoding(UTF-8)");

my $xml = LWP::UserAgent->new->get('https://www.youtube.com/feeds/videos.xml?playlist_id=PLA9_Hq3zhoFyOpb-U3DMU7OT93dPUdtpE')->decoded_content;
my ($title_title, $title_name, @elements);

my $twig = XML::Twig->new(
    twig_handlers => {
        '/feed/title'       => sub { $title_title = $_[1]->text },
        '/feed/author/name' => sub { $title_name  = $_[1]->text },
        '/feed/entry'       => sub {
            my @found;
            # title, then the video id (third colon-separated field of <id>)
            push @found,
                $_[1]->first_child('title')->text,
                ( split ':', $_[1]->first_child('id')->text )[2];
            # damn schema prefix i cant get rid of it without this ugly...
            for ( $_[1]->descendants ) {
                push @found, $_->text if $_->gi eq 'media:description';
            }
            push @elements, \@found;
        },
    }
);
$twig->parse($xml);

print <<"EOT";
<h3> $title_title </h3>
<b>$title_name</b>
<ul>
EOT

foreach my $ele (@elements) {
    print <<"EOT";
<li> [https://www.youtube.com/watch?v=$ele->[1]|$ele->[0]] <p>
<i>$ele->[2]</i></li>
EOT
}
print '</ul>';
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
> heredoc are enough :)
Heredocs are fine, they are just built-in templating. I used them too.
NB: Unfortunately it turned out that the YT feed is incomplete, it only lists 13 talks in the playlist and I have no idea where to get a better source. :(
Re: Challenge: Parse XML Feed for Youtube channel -- XML::XSH2
by choroba (Cardinal) on Jun 28, 2022 at 12:59 UTC
register-namespace a http://www.w3.org/2005/Atom ;
register-namespace y http://www.youtube.com/xml/schemas/2015 ;
register-namespace m http://search.yahoo.com/mrss/ ;
my $html := create html ;
open videos.xml ;
my $head := insert element head into $html/html;
my $title_text = /a:feed/a:title/text() ;
insert text $title_text into &{ insert element title into $head } ;
my $body := insert element body into $html/html ;
insert text $title_text into &{ insert element h3 into $body } ;
copy /a:feed/a:author/a:name/text() into &{ insert element b into $body } ;
my $list := insert element ul into $body ;
for my $entry in /a:feed/a:entry {
    my $item := insert element li into $list ;
    my $anchor := insert element a into $item ;
    copy $entry/a:title/text() into $anchor ;
    copy $entry/a:link/@href into &{ insert attribute 'href=""' into $anchor } ;
    my $desc = $entry/m:group/m:description/text() ;
    $desc ||= '--' ;
    my $para := insert element p into $item ;
    copy $desc into &{ insert element i into $para } ;
}
save :F html :r $html ;
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Challenge: Parse XML Feed for Youtube channel -- XML::LibXML + Template
by choroba (Cardinal) on Jun 28, 2022 at 18:27 UTC
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;
use Template;

my $xpc = 'XML::LibXML::XPathContext'->new;
$xpc->registerNs(a => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(y => 'http://www.youtube.com/xml/schemas/2015');
$xpc->registerNs(m => 'http://search.yahoo.com/mrss/');

my $dom = 'XML::LibXML'->load_xml(location => 'videos.xml');

my $title = $xpc->findvalue('/a:feed/a:title', $dom);
my $name  = $xpc->findvalue('/a:feed/a:author/a:name', $dom);

my $entries;
for my $entry ($xpc->findnodes('/a:feed/a:entry', $dom)) {
    my $title       = $xpc->findvalue('a:title', $entry);
    my $url         = $xpc->findvalue('a:link/@href', $entry);
    my $description = $xpc->findvalue('m:group/m:description', $entry);
    push @$entries, {url         => $url,
                     title       => $title,
                     description => $description || '--'};
}

binmode *DATA, ':encoding(UTF-8)';
my $template = do { local $/; <DATA> };

my $tt = 'Template'->new;
binmode *STDOUT, ':encoding(UTF-8)';
$tt->process(\$template, {title   => $title,
                          name    => $name,
                          entries => $entries});

__DATA__
<h3>[% title %]</h3>
<b>[% name %]</b>
<ul>
[% FOR entry IN entries %]
<li> [[% entry.url %]|[% entry.title | html %]]
<p><i>[% entry.description | html %]
</i></p></li>
[% END %]
</ul>
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Thanks.
When I said "template" I meant anything maintainable with logical blocks of HTML. This includes here-docs when well organized and doesn't necessarily involve a template system or even MVC pattern.
Or negatively expressed: I really despise° print statements with single tags.
Not sure how to say that without the word "template". Suggestions welcome.
°) wrong word, it makes me cringe.
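To illustrate what "logical blocks of HTML" in here-docs could look like without a template system, here is a minimal sketch using core Perl only. The sub names (`esc`, `entry_html`, `page_html`) and the sample data are made up for illustration and do not come from any post in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that matter inside HTML text.
sub esc {
    my ($s) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$map{$1}/g;
    return $s;
}

# One sub per logical block of HTML; each returns a string.
sub entry_html {
    my (%p) = @_;
    return <<"HTML";
<li> [$p{url}|@{[ esc($p{title}) ]}] <p>
<i>@{[ esc($p{desc}) ]}</i></li>
HTML
}

sub page_html {
    my (%p) = @_;
    my $items = join '', map { entry_html(%$_) } @{ $p{entries} };
    return <<"HTML";
<h3>@{[ esc($p{title}) ]}</h3>
<b>@{[ esc($p{name}) ]}</b>
<ul>
$items</ul>
HTML
}

# Made-up sample data, only to exercise the blocks.
my $page = page_html(
    title   => 'Sample playlist',
    name    => 'Sample channel',
    entries => [
        { url => 'https://example.com/watch?v=1', title => 'A & B', desc => 'x < y' },
    ],
);
print $page;
```

Each block stays readable as HTML, and the parsing code only has to build the list of entries, whichever XML module produces it.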
Re: Challenge: Parse XML Feed for Youtube channel -- Mojo::DOM
by LanX (Saint) on Jun 28, 2022 at 14:04 UTC
Here's my solution with Mojo::DOM
(my first steps, I'm sure there is room for improvement)
FWIW: I shortened the input in DATA to two entries to keep it short.
use v5.12;
use warnings;
use Mojo::DOM;

my $data = join "", <DATA>;
my $dom  = Mojo::DOM->new($data);

my $title = $dom->at('title')->text;
my $name  = $dom->at('name')->text;

say <<__HTML__;
<h3>$title</h3>
<b>$name</b>
<ul>
@{[
entries()
]}
</ul>
__HTML__

sub entries {
    my @res;
    for my $entry ( $dom->find('entry')->each ) {
        my $title = $entry->at("title")->text;
        my $href  = $entry->at("link")->attr('href');
        my $desc  = $entry->at('media\:group > media\:description')->text;
        push @res, <<__HTML__;
<li>[$href|$title]<p>
$desc
</li>
__HTML__
    }
    return @res;
}
OUTPUT
<h3>TPC 2022 in Houston</h3>
<b>Conference in the Cloud! A Perl and Raku Conf</b>
<ul>
<li>[https://www.youtube.com/watch?v=waHAGThlRH8|Raku -Ofun for Everyone - Daniel Sockwell]<p>
Rakoons like to say that Raku is -Ofun (optimized for fun). This talk ... YADDA YADDA ...
</li>
<li>[https://www.youtube.com/watch?v=3BYlObnzuKQ|Introducing Perl Data Types - Will Braswell]<p>
Data types are hints to a computer language, telling the language’s ... YADDA YADDA ...
</li>
</ul>
update
shortened description to ... YADDA YADDA ... for line wrap
use v5.12;
use warnings;
use Mojo::DOM;

my $data = join "", <DATA>;
my $dom  = Mojo::DOM->new($data);

# ->text extracts the text content; a bare Mojo::DOM element
# would stringify to its markup, tags included.
say &t::page(
    TITLE   => $dom->at('title')->text,
    NAME    => $dom->at('name')->text,
    ENTRIES => \&entries,
);

sub entries {
    map {
        t::entry(
            TITLE => $_->at("title")->text,
            HREF  => $_->at("link")->{href},
            DESC  => $_->at('media\:group > media\:description')->text,
        );
    } $dom->find('entry')->each;
}

package t;    # poor man's templates

sub page { my %p = @_; << "____" }
<h3>$p{ TITLE }</h3>
<b>$p{ NAME }</b>
<ul>
@{[ $p{ ENTRIES }->() ]}
</ul>
____

sub entry { my %p = @_; << "____" }
<li>[$p{ HREF }|$p{ TITLE }]<p>
$p{ DESC }
</li>
____

package main;
Re: Challenge: Parse XML Feed for Youtube channel
by Aldebaran (Curate) on Jun 28, 2022 at 08:04 UTC
Thanks for posting, LanX, I needed something to help me ignore the Christofascists stomping on the US. I find it heartening that Perl could take a lectern in Texas, a cauldron of freak faith, obscene violence, and nauseating law enforcement. Playlist for TRPC 2022 starts the playlist. Let there be logic....
You know, I just saw the great keynote from Ruth and realized that I could never give such a speech.
And it wasn't the language barrier. This talk was heart warming, inspirational and somehow lacked content. Like in church.
You guys have a knack for sermons which goes against my nature. (Maybe because of anti-clerical traditions and restrictions in Europe after all those confessional wars.)
At the same time the level of extreme black&white simplistic rhetoric in US politics and society is frightening.
I was in Florida° for the 2016 YAPC - i.e. pre-Trump - and experienced it first-hand. (I could repeat stark slogans from both sides)
But maybe it's just the internet, this new church which will also radicalize Europe, and I'm just arrogant and delusional.
°) OTOH I liked New York a lot.