Re: Challenge: Parse XML Feed for Youtube channel -- XML::Twig

by Discipulus (Canon)
on Jun 28, 2022 at 11:29 UTC ( [id://11145134] )


in reply to Challenge: Parse XML Feed for Youtube channel

Hello LanX,

Oh gee, I forgot how ugly handling XML can be! Since I was used to XML::Twig (last update 2016! fear!), I propose a rather old-style, brute-force approach. It works as expected. It is too hot here to also venture into template systems... heredocs are enough :)

use strict;
use warnings;
use LWP::UserAgent;
use XML::Twig;

binmode(STDOUT, ":encoding(UTF-8)");

# fetch the Atom feed of the playlist
my $xml = LWP::UserAgent->new->get('https://www.youtube.com/feeds/videos.xml?playlist_id=PLA9_Hq3zhoFyOpb-U3DMU7OT93dPUdtpE')->decoded_content;

my ($title_title, $title_name, @elements);

my $twig = XML::Twig->new(
    twig_handlers => {
        '/feed/title'       => sub { $title_title = $_[1]->text },
        '/feed/author/name' => sub { $title_name  = $_[1]->text },
        '/feed/entry'       => sub {
            my @found;
            # entry title, then the video id (third field of "yt:video:ID")
            push @found, $_[1]->first_child('title')->text,
                         ( split ':', $_[1]->first_child('id')->text )[2];
            # damn schema prefix i cant get rid of it without this ugly...
            for ( $_[1]->descendants ) {
                push @found, $_->text if $_->gi eq 'media:description';
            }
            push @elements, \@found;
        },
    }
);
$twig->parse($xml);

# emit the list using PerlMonks [url|title] link markup
print <<"EOT";
<h3> $title_title </h3>
<b>$title_name</b>
<ul>
EOT

foreach my $ele (@elements) {
    print <<"EOT";
<li> [https://www.youtube.com/watch?v=$ele->[1]|$ele->[0]] <p> <i>$ele->[2]</i></li>
EOT
}
print '</ul>';
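The loop over all descendants can probably be avoided. XML::Twig does no namespace processing by default, so a prefixed tag such as media:description should be addressable by its literal name in the navigation methods. The following is only a sketch under that assumption: the inline XML is a made-up sample shaped like the feed, not real data, and nothing is fetched from the network.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample mimicking the shape of the YouTube feed (not real data)
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <title>Sample playlist</title>
  <entry>
    <id>yt:video:abc123</id>
    <title>Sample talk</title>
    <media:group>
      <media:description>Sample description</media:description>
    </media:group>
  </entry>
</feed>
XML

my @elements;

XML::Twig->new(
    twig_handlers => {
        '/feed/entry' => sub {
            my ( $twig, $entry ) = @_;

            # without namespace processing the gi is literally "media:description"
            my $desc = $entry->first_descendant('media:description');

            push @elements, [
                $entry->first_child_text('title'),
                ( split /:/, $entry->first_child_text('id') )[2],   # "yt:video:ID" -> ID
                defined $desc ? $desc->text : '',                   # tolerate a missing description
            ];
        },
    },
)->parse($xml);

print join( ' | ', @$_ ), "\n" for @elements;
```

Compared with the loop, this stops at the first matching element and always yields exactly three fields per entry, even when an entry has no description.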

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: Challenge: Parse XML Feed for Youtube channel -- XML::Twig
by LanX (Saint) on Jun 28, 2022 at 11:53 UTC
    > . heredoc are enough :)

    Heredocs are fine, they are just built-in templating. I used them too.
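    As a rough illustration of heredocs as built-in templating, here is a minimal sketch. The data and the render_item helper are hypothetical, and the indented <<~ form assumes Perl 5.26 or newer.

    ```perl
    use strict;
    use warnings;

    # Hypothetical data; in this thread it would come from the parsed feed.
    my @talks = (
        { id => 'abc123', title => 'Sample talk' },
        { id => 'def456', title => 'Another talk' },
    );

    # render_item is an illustrative helper, not something from the thread.
    sub render_item {
        my ($talk) = @_;

        # <<~ strips the common leading whitespace, so the template
        # can be indented along with the surrounding code
        return <<~"EOT";
            <li><a href="https://www.youtube.com/watch?v=$talk->{id}">$talk->{title}</a></li>
            EOT
    }

    print "<ul>\n", ( map { render_item($_) } @talks ), "</ul>\n";
    ```

    Note that nothing is HTML-escaped here; a real template system would usually take care of that.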

    NB: Unfortunately, it turned out that the YT feed is incomplete: it lists only 13 talks in the playlist, and I have no idea where to get a better source. :(

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Hm. The feed for the complete channel also seems to contain only 15 elements, which certainly isn't complete either, although the channel (https://www.youtube.com/c/YAPCNA/videos) contains about 42 videos from between 2022-06-24 and 2022-06-24 (plus many more older ones).

      Perhaps we need another challenge first: generate an RSS feed from the Youtube webpage for a playlist (or channel) 😛

      (Update: biggest part of this second challenge probably is to overcome dynamic scrolling, and perhaps this is the challenge where YT's own feed generator fails)
        > (Update: biggest part of this second challenge probably is to overcome dynamic scrolling, and perhaps this is the challenge where YT's own feed generator fails)

        I wouldn't be surprised if the dynamic scrolling were implemented by polling some (unofficial) JSON structure.

        I think YT is deliberately neglecting to update the feed.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
