PerlMonks  

Another YouTube Video Downloader using GtkWebkit

by zentara (Archbishop)
on Feb 12, 2010 at 19:04 UTC ( [id://822913]=CUFP )

This is a very crude hack I just whipped together to save YouTube videos. It uses Gtk2::WebKit, plus a hack to pull the cached flv files out of the /tmp directory. My method of detecting the files is very crude and full of possible pitfalls, but it works as a simple demo. What happens is that Gtk2::WebKit will respect the JavaScript and the cws Flash loader file, and automatically load the Flash video. The problem comes in detecting when the cached video file is written to /tmp.

Anyway... it is getting harder and harder to automatically save videos from the YouTube servers, but this works. Gtk2::WebKit::Download would seem to be a more direct way to do it, but it doesn't seem to respect the JavaScript or the Flash loader... though someone else may know a way.

The WebKit browser will delete the cache files automatically upon exit, so I copy them to the current working directory.

UPDATE: As an afterthought, I did very little testing of this on big video downloads. My method of saving the video as soon as the cache file is detected may get you some truncated videos, as that cache file probably grows as buffering occurs. The answer, of course, may be to just have a save button that lets you save the video after you have seen it fully load once... or maybe to tap into the progress activity somehow, as in Perl Web Browser using Gtk2 and WebKit. All I wanted to mention is that this script works as is for small videos, and that the JavaScript and cws loader involved make it quite a complex beast to tame. :-)

#!/usr/bin/perl
use strict;
use warnings;
use Gtk2 -init;
use Gtk2::WebKit;
use File::Copy;

my $url = 'http://www.youtube.com/watch?v=9uDgJ9_H0gg';

my $pwd = `pwd`;
chomp $pwd;

#where the flash will be cached temporarily by the browser
my $tmpdir = Glib::get_tmp_dir();
print "$tmpdir\n";

# find names like FlashGRsDw9
opendir(my $dir, $tmpdir) or die("ack: $!");
my @files = grep /^Flash[A-Za-z0-9]{6}/, readdir $dir;
closedir $dir;
print "initial files in /tmp-> ", @files, "\n";

my $view = Gtk2::WebKit::WebView->new;
$view->signal_connect( 'notify::progress' => \&notify_progress, undef );
$view->signal_connect( 'load_finished' => \&load_finished, undef );

my $sw = Gtk2::ScrolledWindow->new;
$sw->add($view);

my $win = Gtk2::Window->new;
$win->set_default_size(800, 600);
$win->signal_connect(destroy => sub { Gtk2->main_quit });
$win->add($sw);
$win->show_all;

$view->open($url );
Gtk2->main;

sub notify_progress{
    my $load_progress = $view->get('progress');
    print "$load_progress\n";
}

sub load_finished{
    print "load complete\n";
    find_save();
    return 0;
}

sub find_save{
    my $timer = Glib::Timeout->add (1000, sub {
        # check for flash file every second
        # to allow flash loader to load file
        print "test for file\n";
        opendir(my $dir, $tmpdir) or die("ack: $!");
        my @files = grep /^Flash[A-Za-z0-9]{6}/, readdir $dir;
        closedir $dir;
        print "files-> ", @files, "\n";
        print scalar @files, "\n";
        if( scalar @files >= 1 ){
            print "success\n";
            foreach my $flash(@files){
                print "$flash\n";
                copy("$tmpdir/$flash","$pwd/$0-$flash".'.flv') or warn "Can not copy$!\n";
            }
            return 0;   # end timer
        }else{ return 1 }   # keep timer going
    });
}

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku

Replies are listed 'Best First'.
Re: Another YouTube Video Downloader using GtkWebkit
by zentara (Archbishop) on Feb 13, 2010 at 14:45 UTC
    Here is an improved version that will grab an entire video by testing for file size increases with stat. It may truncate videos on a slow dialup line, because I use a 3 second timer between stat calls... it works on slow DSL though :-)
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Gtk2 -init;
    use Gtk2::WebKit;
    use File::Copy;

    my $url = 'http://www.youtube.com/watch?v=EAtBki0PsC0'; #'9uDgJ9_H0gg';

    my $pwd = `pwd`;
    chomp $pwd;

    #where the flash will be cached temporarily by the browser
    my $tmpdir = Glib::get_tmp_dir();
    print "$tmpdir\n";

    # stat to monitor buffering
    my $old_size = 0 ;

    # find names like FlashGRsDw9
    opendir(my $dir, $tmpdir) or die("ack: $!");
    my @files = grep /^Flash[A-Za-z0-9]{6}/, readdir $dir;
    closedir $dir;
    print "initial files in /tmp-> ", @files, "\n";

    my $view = Gtk2::WebKit::WebView->new;
    $view->signal_connect( 'notify::progress' => \&notify_progress, undef );
    $view->signal_connect( 'load_finished' => \&load_finished, undef );
    $view->signal_connect( 'notify::load-status' => \&notify_load_status, undef );

    my $sw = Gtk2::ScrolledWindow->new;
    $sw->add($view);

    my $win = Gtk2::Window->new;
    $win->set_default_size(800, 600);
    $win->signal_connect(destroy => sub { Gtk2->main_quit });
    $win->add($sw);
    $win->show_all;

    $view->open($url );
    Gtk2->main;

    sub notify_progress{
        my $load_progress = $view->get('progress');
        print "$load_progress\n";
    }

    sub load_finished{
        print "load complete\n";
        find_save();
        return 0;
    }

    sub notify_load_status{
        print $view->get('load_status'),"\n";
    }

    sub find_save{
        # this is just to detect the cache filename
        my $timer = Glib::Timeout->add (1000, sub {
            # check for file every second
            # to allow flash loader to load file
            print "test for file\n";
            my @files;
            opendir(my $dir, $tmpdir) or die("ack: $!");
            @files = grep /^Flash[A-Za-z0-9]{6}/, readdir $dir;
            closedir $dir;
            print "files-> ", @files, "\n";
            print scalar @files, "\n";
            if( scalar @files >= 1 ){
                print "found file success\n";
                watch_save( $files[0] );
                return 0;   # end timer
            }else{ return 1 }   # keep timer going
        });
    }

    sub watch_save{
        my $file = shift;
        my $watch = "$tmpdir/$file";
        print "watching file-> $watch\n";

        # this is the problemsome point, if you are on a slow
        # network, greater than 3 second delays may occur
        # and trip a false EOF
        # but it works on my slow dsl
        my $timer = Glib::Timeout->add (3000, sub{
            my $new_size = (stat $watch )[7];
            $new_size ||= 0;
            print "new size-> $new_size\n";
            #see perldoc -f stat
            #print lstat $file,"\n";;
            if( $old_size != $new_size ){
                print "1 still buffering $old_size $new_size \n";
                $old_size = $new_size;
                return 1;   #keep timer going
            }else{
                #done unless there is a network hang
                print "flash done writing $pwd/$0-$file".'.flv',"\n";
                copy("$watch" ,"$pwd/$0-$file".'.flv') or warn "Can not copy $!\n";
                return 0;   #end timer
            }
        });
    }

Re: Another YouTube Video Downloader using GtkWebkit
by SuicideJunkie (Vicar) on Feb 12, 2010 at 21:07 UTC
    Re: Progress Activity

    What if you check the size of the file every 1.0 seconds or so, and grab it once it stops growing?

    Or alternatively, scan in a tighter loop, but only grab a copy if the new file is larger than the last copy you made.
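
One way to guard against a false EOF on a slow connection might be to require the size to stay unchanged across several consecutive polls rather than a single one. The following is only a sketch under that assumption: `wait_until_stable` and its `interval`, `needed`, and `max_polls` options are hypothetical names, not part of the scripts in this thread. It also blocks with `sleep`, so inside a Gtk2 main loop the same counter would have to be kept in a Glib::Timeout callback instead.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # core module; allows fractional-second sleeps

# Hypothetical helper: poll $path and return its size once the size has
# matched the previous poll $needed times in a row.
# Returns undef if $max_polls is used up before that happens.
sub wait_until_stable {
    my ($path, %opt) = @_;
    my $interval  = $opt{interval}  // 1.0;   # seconds between polls
    my $needed    = $opt{needed}    // 3;     # consecutive unchanged polls
    my $max_polls = $opt{max_polls} // 600;   # give up after this many polls

    my ($last, $stable) = (-1, 0);
    for (1 .. $max_polls) {
        # stat returns an empty list when the file is missing; count that as 0
        my $size = (stat $path)[7] // 0;
        if ($size > 0 && $size == $last) {
            $stable++;
            return $size if $stable >= $needed;
        }
        else {
            $stable = 0;   # file grew (or is still empty), start over
        }
        $last = $size;
        sleep $interval;
    }
    return;
}
```

A call such as `wait_until_stable("$tmpdir/$file", interval => 1, needed => 5)` would then wait for five quiet seconds before treating the file as complete. A longer quiet period makes a false EOF less likely but delays the copy by the same amount.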

      Yeah, stat'ing the file in a tight loop until it stops growing is the way I would do it. But to be honest, I wanted to leave the code purposely ineffective so as not to teach people how to do DoS attacks. :-)

      The node probably should be renamed "A way to save YouTube videos to disk".

      I suspect the engineers at YouTube are already working on a way to stream to multiple cache files for a single video, so as to render this method obsolete. But for the time being, it does offer a way to save YouTube videos locally to disk, which is useful for archiving and offline viewing. I will post an example with a save button later, after I test how the file buffering goes.

      I'm sure someone smarter than me out there would know how to detect the progress of the on_load JavaScript request, as well as the cws file they send as a pre-loader to the actual video. There seem to be a lot of signals which one can tap into.


Re: Another YouTube Video Downloader using GtkWebkit
by Anonymous Monk on Mar 07, 2010 at 08:20 UTC
    Instead of trying to time it just right and copying off the file, what if you created a hard link to the file as soon as you know which one it is? Then you will have all the time in the world. You could wait till tomorrow if you wanted.
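
The hard link idea could be sketched with Perl's built-in `link`. This is an illustration rather than a tested replacement for the scripts in this thread: `pin_cache_file` is a hypothetical helper name, and the `.flv` suffix simply follows the naming used in those scripts. Hard links only work within one filesystem, so this assumes the destination directory is on the same filesystem as the temp directory; where it is not, `link` fails and a copy would still be needed.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: hard-link $cache_file into $dest_dir so the data
# stays reachable after the browser deletes its temporary name.
# Returns the new path, or undef (with $! set by link) on failure.
sub pin_cache_file {
    my ($cache_file, $dest_dir) = @_;
    my $dest = "$dest_dir/" . basename($cache_file) . '.flv';
    return link($cache_file, $dest) ? $dest : undef;
}
```

Because both names point at the same data, the linked file keeps growing while the browser buffers, and it remains on disk after the browser removes the original name on exit.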
