
Challenge: Parse XML Feed for Youtube channel

by LanX (Saint)
on Jun 27, 2022 at 13:31 UTC ( [id://11145112]=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

Since many of you are bored and I'm rusty at parsing XML (well, at choosing a module in the first place), here's a little challenge for the benefit of the monastery.

Once we have a script, adding new channels will be easy and people will have a lower entrance barrier to comment on talks.

So what's your most elegant approach? :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

INPUT:

TPC2022 recordings https://www.youtube.com/feeds/videos.xml?playlist_id=PLA9_Hq3zhoFyOpb-U3DMU7OT93dPUdtpE

TASK:

Parse it and produce HTML output for PerlMonks. Extra points for maintainability of the templated HTML output.

OUTPUT:

<h3> TPC 2022 in Houston </h3>
<b>Conference in the Cloud! A Perl and Raku Conf</b>
<ul>
<li> [https://www.youtube.com/watch?v=waHAGThlRH8|Raku -Ofun for Everyone - Daniel Sockwell]
<p> <i>Description: Rakoons like to say that Raku is -Ofun (optimized for fun). This talk examines SHORTENED </i></li>
<li> ... etc </li>
</ul>

PREVIEW

TPC 2022 in Houston

Conference in the Cloud! A Perl and Raku Conf
  • Raku -Ofun for Everyone - Daniel Sockwell

    Description: Rakoons like to say that Raku is -Ofun (optimized for fun). This talk examines several of the steps that the Raku community has taken to deliver on this promise, from our documentation and code of conduct to our commitment to code examples and mentoring. For each example, we’ll discuss the current progress Raku has made as well as how you could help. https://sched.co/11nfM

  • ... etc

UPDATE

I wrote a first solution in the meantime; the result can be seen here. No code yet, I don't want to spoil the challenge :)

Replies are listed 'Best First'.
Re: Challenge: Parse XML Feed for Youtube channel -- XML::Twig
by Discipulus (Canon) on Jun 28, 2022 at 11:29 UTC
    Hello LanX,

    Oh gee, I forgot how ugly handling XML can be! Since I'm used to XML::Twig (last update 2016! fear!), I propose a quite old-style, brute-force approach. It works as expected. It is too hot here to also venture into template systems... heredocs are enough :)

    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::Twig;

    binmode(STDOUT, "encoding(UTF-8)");

    my $xml = LWP::UserAgent->new->get(
        'https://www.youtube.com/feeds/videos.xml?playlist_id=PLA9_Hq3zhoFyOpb-U3DMU7OT93dPUdtpE'
    )->decoded_content;

    my ($title_title, $title_name, @elements);

    my $twig = XML::Twig->new(
        twig_handlers => {
            '/feed/title'       => sub { $title_title = $_[1]->text },
            '/feed/author/name' => sub { $title_name  = $_[1]->text },
            '/feed/entry'       => sub {
                my @found;
                push @found,
                    $_[1]->first_child('title')->text,
                    ( split ':', $_[1]->first_child('id')->text )[2];
                # damn schema prefix i cant get rid of it without this ugly...
                for ( $_[1]->descendants ) {
                    push @found, $_->text if $_->gi eq 'media:description';
                }
                push @elements, \@found;
            },
        }
    );

    $twig->parse($xml);

    print <<"EOT";
    <h3> $title_title </h3>
    <b>$title_name</b>
    <ul>
    EOT

    foreach my $ele (@elements) {
        print <<"EOT";
    <li> [https://www.youtube.com/watch?v=$ele->[1]|$ele->[0]]
    <p> <i>$ele->[2]</i></li>
    EOT
    }

    print '</ul>';
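A note on the `split ':'` step in the entry handler: it takes the third colon-separated field of the entry's `<id>` text, which in this feed appears to have the form `yt:video:<VIDEO_ID>`. Below is a minimal sketch of the same idea with an explicit pattern check, so that an unexpected id yields undef rather than a broken link. The helper names `video_id` and `watch_url` are hypothetical and not part of the post, and the `yt:video:` shape is an assumption based on the sample feed.

```perl
use strict;
use warnings;

# Assumed shape of the feed's <id> text: "yt:video:<VIDEO_ID>".
# Returns the video id, or undef if the text does not match.
sub video_id {
    my ($atom_id) = @_;
    return $atom_id =~ /\Ayt:video:([\w-]+)\z/ ? $1 : undef;
}

# Builds the watch URL used in the list items, or undef for an unknown id.
sub watch_url {
    my ($atom_id) = @_;
    my $id = video_id($atom_id);
    return defined $id ? "https://www.youtube.com/watch?v=$id" : undef;
}

print watch_url('yt:video:waHAGThlRH8'), "\n";
# prints https://www.youtube.com/watch?v=waHAGThlRH8
```

The other replies avoid this step entirely by reading the `href` attribute of the entry's `link` element.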

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      > ... heredocs are enough :)

      Heredocs are fine, they are just built-in templating. I used them too.

      NB: Unfortunately it turned out that the YT feed is incomplete: it only lists 13 talks in the playlist, and I have no idea where to get a better source. :(

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Hm. The feed for the complete channel also seems to contain only 15 elements, which certainly isn't complete either, although the channel (https://www.youtube.com/c/YAPCNA/videos) contains about 42 videos from between 2022-06-24 and 2022-06-24 (plus many more older ones).

        Perhaps we need another challenge first: generate an RSS feed from the Youtube webpage for a playlist (or channel) 😛

        (Update: the biggest part of this second challenge is probably overcoming dynamic scrolling, and perhaps this is where YT's own feed generator fails.)
Re: Challenge: Parse XML Feed for Youtube channel -- XML::XSH2
by choroba (Cardinal) on Jun 28, 2022 at 12:59 UTC
    Using XML::XSH2:

    register-namespace a http://www.w3.org/2005/Atom ;
    register-namespace y http://www.youtube.com/xml/schemas/2015 ;
    register-namespace m http://search.yahoo.com/mrss/ ;

    my $html := create html ;
    open videos.xml ;

    my $head := insert element head into $html/html;
    my $title_text = /a:feed/a:title/text() ;
    insert text $title_text into &{ insert element title into $head } ;

    my $body := insert element body into $html/html ;
    insert text $title_text into &{ insert element h3 into $body } ;
    copy /a:feed/a:author/a:name/text() into &{ insert element b into $body } ;
    my $list := insert element ul into $body ;

    for my $entry in /a:feed/a:entry {
        my $item := insert element li into $list ;
        my $anchor := insert element a into $item ;
        copy $entry/a:title/text() into $anchor ;
        copy $entry/a:link/@href into &{ insert attribute 'href=""' into $anchor } ;
        my $desc = $entry/m:group/m:description/text() ;
        $desc ||= '--' ;
        my $para := insert element p into $item ;
        copy $desc into &{ insert element i into $para } ;
    }

    save :F html :r $html ;

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Challenge: Parse XML Feed for Youtube channel -- XML::LibXML + Template
by choroba (Cardinal) on Jun 28, 2022 at 18:27 UTC
    Using XML::LibXML and Template:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;
    use Template;

    my $xpc = 'XML::LibXML::XPathContext'->new;
    $xpc->registerNs(a => 'http://www.w3.org/2005/Atom');
    $xpc->registerNs(y => 'http://www.youtube.com/xml/schemas/2015');
    $xpc->registerNs(m => 'http://search.yahoo.com/mrss/');

    my $dom = 'XML::LibXML'->load_xml(location => 'videos.xml');

    my $title = $xpc->findvalue('/a:feed/a:title', $dom);
    my $name = $xpc->findvalue('/a:feed/a:author/a:name', $dom);
    my $entries;
    for my $entry ($xpc->findnodes('/a:feed/a:entry', $dom)) {
        my $title = $xpc->findvalue('a:title', $entry);
        my $url = $xpc->findvalue('a:link/@href', $entry);
        my $description = $xpc->findvalue('m:group/m:description', $entry);
        push @$entries, {url => $url,
                         title => $title,
                         description => $description || '--'};
    }

    binmode *DATA, ':encoding(UTF-8)';
    my $template = do { local $/; <DATA> };
    my $tt = 'Template'->new;
    binmode *STDOUT, ':encoding(UTF-8)';
    $tt->process(\$template, {title => $title,
                              name => $name,
                              entries => $entries});

    __DATA__
    <h3>[% title %]</h3>
    <b>[% name %]</b>
    <ul>
    [% FOR entry IN entries %]
    <li> [[% entry.url %]|[% entry.title | html %]]
    <p><i>[% entry.description | html %] </i></p></li>
    [% END %]
    </ul>

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks.

      When I said "template" I meant anything maintainable with logical blocks of HTML. This includes here-docs when well organized and doesn't necessarily involve a template system or even an MVC pattern.
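One way to read "logical blocks of HTML" is one sub per block, each returning a here-doc. The sketch below is only an illustration under that reading: the sub names (`esc`, `item_block`, `page_block`) are made up for the example, the sample entry is taken from the question, and the escaping covers only the four characters that matter in HTML text. It does not handle `|` or `]` inside a title, which would break the `[url|title]` link markup.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that matter inside HTML text.
sub esc {
    my ($text) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$map{$1}/g;
    return $text;
}

# One logical block per sub: a single list item ...
sub item_block {
    my (%p) = @_;
    my ($title, $desc) = map { esc($_) } @p{qw(title desc)};
    return <<"HTML";
<li> [$p{url}|$title]
<p><i>$desc</i></p></li>
HTML
}

# ... and the page that wraps the items.
sub page_block {
    my (%p) = @_;
    my ($title, $name) = map { esc($_) } @p{qw(title name)};
    my $items = join '', map { item_block(%$_) } @{ $p{entries} };
    return <<"HTML";
<h3>$title</h3>
<b>$name</b>
<ul>
$items</ul>
HTML
}

print page_block(
    title   => 'TPC 2022 in Houston',
    name    => 'Conference in the Cloud! A Perl and Raku Conf',
    entries => [
        {
            url   => 'https://www.youtube.com/watch?v=waHAGThlRH8',
            title => 'Raku -Ofun for Everyone - Daniel Sockwell',
            desc  => 'Rakoons like to say that Raku is -Ofun (optimized for fun).',
        },
    ],
);
```

Each block can be edited without touching the parsing code, and no tag is printed on its own.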

      Or negatively expressed: I really despise° print statements with single tags.

      Not sure how to say that without the word "template". Suggestions welcome.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      °) wrong word, it makes me cringe.

Re: Challenge: Parse XML Feed for Youtube channel -- Mojo::DOM
by LanX (Saint) on Jun 28, 2022 at 14:04 UTC
    Here is my solution with Mojo::DOM

    (my first steps, I'm sure there is room for improvement)

    FWIW: I shortened the input in DATA to two entries to keep it short.

    use v5.12;
    use warnings;
    use Mojo::DOM;

    my $data = join "", <DATA>;
    my $dom = Mojo::DOM->new($data);

    my $title = $dom->at('title')->text;
    my $name = $dom->at('name')->text;

    say <<__HTML__;
    <h3>$title</h3>
    <b>$name</b>
    <ul>
    @{[ entries() ]}
    </ul>
    __HTML__

    sub entries {
        my @res;
        for my $entry ( $dom->find('entry')->each ) {
            my $title = $entry->at("title")->text;
            my $href  = $entry->at("link")->attr('href');
            my $desc  = $entry->at('media\:group > media\:description')->text;
            push @res, <<__HTML__;
      <li>[$href|$title]<p>
          $desc
      </li>
    __HTML__
        }
        return @res;
    }

    OUTPUT
    <h3>TPC 2022 in Houston</h3>
    <b>Conference in the Cloud! A Perl and Raku Conf</b>
    <ul>
      <li>[https://www.youtube.com/watch?v=waHAGThlRH8|Raku -Ofun for Everyone - Daniel Sockwell]<p>
          Rakoons like to say that Raku is -Ofun (optimized for fun). This talk ... YADDA YADDA ...
      </li>
      <li>[https://www.youtube.com/watch?v=3BYlObnzuKQ|Introducing Perl Data Types - Will Braswell]<p>
          Data types are hints to a computer language, telling the language’s ... YADDA YADDA ...
      </li>
    </ul>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    shortened description to ... YADDA YADDA ... for line wrap

      Here is a new approach with Mojo::DOM, taking advantage of its overloading to do away with the ->text and ->attr method calls.

      It also uses a home-rolled "template system" for readability.

      The explicit map in the callback might also be more easily done with a Mojo::DOM method ... maybe?

      use v5.12;
      use warnings;
      use Mojo::DOM;

      my $data = join "", <DATA>;
      my $dom = Mojo::DOM->new($data);

      say &t::page(
          TITLE   => $dom->at('title'),
          NAME    => $dom->at('name'),
          ENTRIES => \&entries,
      );

      sub entries {
          map {
              t::entry (
                  TITLE => $_->at("title"),
                  HREF  => $_->at("link")->{href},
                  DESC  => $_->at('media\:group > media\:description')
              );
          } $dom->find('entry')->each ;
      }

      package t;    # poor man's templates

      sub page { my %p = @_; << "____" }
      <h3>$p{ TITLE }</h3>
      <b>$p{ NAME }</b>
      <ul>
      @{[ $p{ ENTRIES }->() ]}
      </ul>
      ____

      sub entry { my %p = @_; << "____" }
        <li>[$p{ HREF }|$p{ TITLE }]<p>
          $p{ DESC }
        </li>
      ____

      package main;
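One detail worth knowing when reusing the `@{[ ... ]}` idiom from these here-doc templates: a list interpolated into a double-quoted string is joined with `$"`, the list separator, which defaults to a single space. A small sketch of the effect, with a made-up `items` sub standing in for `entries`:

```perl
use strict;
use warnings;

# Stand-in for entries(): returns one rendered block per argument.
sub items { return map { "<li>$_</li>\n" } @_ }

# Default: the interpolated list is joined with $" (a single space).
my $default = "<ul>\n@{[ items('a', 'b') ]}</ul>\n";

# Localizing $" to the empty string removes the separator for this scope only.
my $tight = do {
    local $" = '';
    "<ul>\n@{[ items('a', 'b') ]}</ul>\n";
};

print $default, $tight;
```

In `$default` the second `<li>` line starts with a space; in `$tight` it does not. For HTML output the extra space is harmless, so this only matters when the exact whitespace of the result is significant.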

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Challenge: Parse XML Feed for Youtube channel
by Aldebaran (Curate) on Jun 28, 2022 at 08:04 UTC

    Thanks for posting, LanX, I needed something to help me ignore the Christofascists stomping on the US. I find it heartening that Perl could take a lectern in Texas, a cauldron of freak faith, obscene violence, and nauseating law enforcement. Playlist for TRPC 2022 starts the playlist. Let there be logic....

      You know, I just saw the great keynote from Ruth and realized that I could never give such a speech.

      And it wasn't the language barrier. This talk was heartwarming, inspirational and somehow lacked content. Like in church.

      You guys have a knack for sermons which goes against my nature. (Maybe because of anti-clerical traditions and restrictions in Europe after all those confessional wars.)

      At the same time the level of extreme black&white simplistic rhetoric in US politics and society is frightening.

      I was in Florida° for the 2016 YAPC - i.e. pre-Trump - and experienced it first-hand. (I could repeat stark slogans from both sides)

      But maybe it's just the internet, this new church which will also radicalize Europe, and I'm just arrogant and delusional.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      °) OTOH I liked New York a lot.

Node Type: perlquestion [id://11145112]
Approved by marto
Front-paged by Corion