soulrain has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to teach myself some Perl and thought I would make a small project for myself.

My first project was simply parsing some HTML and downloading the videos.

#!/usr/bin/perl -w
use WWW::Mechanize;

$url = 'https://class.coursera.org/automata-002/lecture/index';
$m   = WWW::Mechanize->new();
$m->get($url);
my @links = $m->find_all_links(url_regex => qr/lecture_id/i);
for my $link (@links) {
    printf "link: %s\n", $link->url;
    printf("%s", $link->url);
}

So I just piped this into a file and used wget -i to read the URLs from the file, but I cannot figure out how to do this all inside Perl. Any help or links would be appreciated. My goal is simply to download and name the video links.

Thanks in advance!

Re: Trying to download video from URL links
by Tanktalus (Canon) on Dec 22, 2013 at 02:36 UTC

    Did you read the WWW::Mechanize docs? A simple perusal gives a few obvious choices; plus, if the files are small enough to fit in memory, you can just $m->get($link->url) and then save $m->content to a file of your choosing.

    There are many other options as well where you can get lower level and be able to deal with the file in chunks as it comes in, or save directly to disk, with other HTTP client modules on CPAN.

    You can then get even more advanced and try to pull down multiple files in parallel, either inside a single thread (using event clients, such as POE or AnyEvent) or multi-threaded (using threads and queues). And then, more advanced, is to ensure you're only downloading some files at a time - because if you download too many it'll go slower than if you download only a few at a time. At least on most home internet connections.
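    Not from the thread, but a process-based sketch of the "a few files at a time" idea, assuming Parallel::ForkManager is installed; the URL list and the basename logic here are placeholders:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use Parallel::ForkManager;

    # Placeholder URLs -- substitute the links found by find_all_links()
    my @urls = (
        'http://example.com/videos/a.mp4',
        'http://example.com/videos/b.mp4',
    );

    my $pm = Parallel::ForkManager->new(3);    # at most 3 downloads at once

    for my $url (@urls) {
        $pm->start and next;                   # fork; parent continues the loop
        (my $file = $url) =~ s{.*/}{};         # crude basename for the local filename
        my $mech = WWW::Mechanize->new();
        $mech->get($url, ':content_file' => $file);
        $pm->finish;                           # child process exits here
    }
    $pm->wait_all_children;
    ```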

    Best of luck!

      Reading through WWW::Mechanize again did the trick! Thanks for the obvious but very helpful suggestion!

      Here is what I ended up with:

      #!/usr/bin/perl -w
      use strict;
      use WWW::Mechanize;

      my $mech     = WWW::Mechanize->new();
      my $filename = "fileblahblahblah.mp4";
      my $url      = "url";
      $mech->get($url, ':content_file' => $filename);
      Thanks again!
Re: Trying to download video from URL links
by aitap (Curate) on Dec 22, 2013 at 09:11 UTC

    Well, $mech->get is an overloaded method from LWP::UserAgent which can be called as $mech->get( $uri, ':content_file' => $filename ); (quote from the WWW::Mechanize documentation). I've been using this method in my own downloader, but later decided to switch to wget to do the actual downloading. Wget gives me a progress bar and download resuming for free, without writing any additional code, and I already have it installed everywhere.

    It's even considered good by "UNIX-way" philosophy to call a program which does one thing and does it well when you need this thing to be done.
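    A minimal sketch of that division of labour, assuming wget is on the PATH; the URL and regex are the ones from the original post:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('https://class.coursera.org/automata-002/lecture/index');

    for my $link ($mech->find_all_links(url_regex => qr/lecture_id/i)) {
        # Let wget supply the progress bar and resuming (-c) for free
        system('wget', '-c', $link->url) == 0
            or warn "wget failed for " . $link->url . "\n";
    }
    ```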

        So I added the ability to pass in command-line arguments: arg 1 is the URL, arg 2 is the regex to match in the links you are looking for, and arg 3 is the file extension you want appended onto the link text.

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use WWW::Mechanize;

        my $total_args = $#ARGV + 1;
        if ($total_args != 3) {
            print "usage: getLinks url regEx suffix(file type)\n";
            exit;
        }
        my $mech     = WWW::Mechanize->new();
        my $url      = $ARGV[0];
        my $regEx    = $ARGV[1];
        my $fileType = "." . $ARGV[2];
        $mech->get($url);
        my @links = $mech->find_all_links(url_regex => qr#$regEx#);
        for my $link (@links) {
            printf("Following: %s\n at address: %s\n", $link->text, $link->url);
            printf("Saving file as %s\n", $link->text . $fileType . "\n");
            $mech->get($link->url, ':content_file' => $link->text . $fileType);
        }

        I would like to add a progress bar similar to what wget uses so could you explain a bit more about:

        $lwpmechua->show_progress( 1 )?

        Thanks as always!

         $lwpmechua->show_progress( 1 );
        Wow. There is even WWW::Mechanize::Plugin::Retry which probably can be hacked to automatically continue partial downloads, so Mechanize eventually could replace wget.

        While I have to admit that I didn't remember about show_progress, the progress bar of wget shows the percentage and ETA and the -c option automagically continues a broken download which doesn't look like a simple task in Mechanize.
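        For the record, show_progress is inherited from LWP::UserAgent, so a minimal sketch (URL and filename here are placeholders) is just:

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new();
        $mech->show_progress(1);   # inherited from LWP::UserAgent; logs progress to STDERR

        # Placeholder URL and filename
        $mech->get('http://example.com/video.mp4', ':content_file' => 'video.mp4');
        ```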

      Ah, this is true and definitely supported by the software-tools principle!
Re: Trying to download video from URL links
by Lennotoecom (Pilgrim) on Dec 22, 2013 at 03:05 UTC
    use LWP::Simple;

    $url = get("https://class.coursera.org/automata-002/lecture/index");
    $link = '';
    foreach (split /\n/, $url) {
        if (/download\.mp4/) {
            $_ =~ m/href="(.+)"/;
            $link = $1;
        }
        if (/(Video \(MP4\) for )(.+)<\/div>/) {
            print "downloading $2\n" if $1;
            $file = get($link);
            open FILE, '>' . $2 . '.mp4' or die $!;
            print FILE $file;
            close FILE;
        }
    }
    update
    though it downloads the files
    they can't be opened afterwards
    my bad, sorry
    P.S. Why are you against wget?
      though it downloads the files
      they can't be opened afterwards
      Do you happen to work on a Windows machine? Then binmode is your friend.
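      A sketch of the fix, with a stand-in payload in place of the fetched video data; without binmode, Windows would translate the \n inside the payload to \r\n and corrupt the file:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $data = "\x00\x01\r\n\x02";      # stand-in binary payload (5 bytes)
      open my $fh, '>', 'out.bin' or die $!;
      binmode $fh;                        # stop CRLF translation mangling the bytes
      print $fh $data;
      close $fh;
      ```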
      Not against wget, just want to see how to do it with WWW::Mechanize or perhaps another module... I was reading the documentation wrong, which is easy for me to do as I am just learning Perl...