Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm an artist on mp3.com and a new Perl programmer. The method of checking stats on mp3.com is tiresome and redundant: it involves logging in and being taken to a page generated by a CGI script that lets you view one song's stats at a time; to change songs you must click a button and refresh. Excuse the possible simplicity, but I searched rather deeply through the manuals on my Linux machine and found absolutely nothing about having Perl interact with websites to request the info I want and then retrieve it. If anyone could help or point me in the right direction, it would be a great help to a new monk ;)

Re: mp3.com stats
by lhoward (Vicar) on Jul 02, 2000 at 05:01 UTC
    Where you need to start is the LWP and HTML::Parser modules. The LWP modules let you write a Perl program that acts like a browser and retrieves information from a website. HTML::Parser provides a framework for parsing HTML documents.
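A minimal sketch of the two modules working together, assuming a recent libwww-perl and HTML::Parser (v3 API). The helper name `extract_links` is hypothetical; it pulls the `href` of every `<a>` tag out of an HTML string, and the LWP part only runs when a URL is passed on the command line. It is an illustration of the idea, not a tool for the mp3.com pages specifically.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;

# extract_links (hypothetical helper): return the href of every <a> tag.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # handler receives the (lowercased) tag name and a hash of attributes
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @links, $attr->{href} if $tag eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return @links;
}

# Act like a browser: fetch the page, but only when a URL is given.
if (@ARGV) {
    my $url = $ARGV[0];
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    print "$_\n" for extract_links($res->decoded_content);
}
```

Usage: `perl links.pl http://www.example.com/` prints one link per line.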

    The LWP modules have built-in support for basic authentication. However, if the mp3.com login works through cookies, it may get tricky.

      HTTP::Cookies would probably handle the cookies, and easily, too.
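A sketch of both routes mentioned above, assuming current libwww-perl. Giving `LWP::UserAgent` a `cookie_jar` makes it store cookies from every response and send them back on later requests automatically; `credentials` covers sites that use HTTP basic authentication instead. The helper `make_agent` and the host, realm, user and password values are hypothetical placeholders.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# make_agent (hypothetical helper): an agent that keeps cookies between
# requests and, optionally, answers basic-auth challenges.
sub make_agent {
    my (%opt) = @_;
    # With a jar attached, LWP extracts and re-sends cookies by itself,
    # so no manual extract_cookies/add_cookie_header calls are needed.
    my $ua = LWP::UserAgent->new(cookie_jar => HTTP::Cookies->new);
    if (defined $opt{user}) {
        # credentials(host:port, realm, username, password)
        $ua->credentials($opt{netloc}, $opt{realm}, $opt{user}, $opt{pass});
    }
    return $ua;
}

if (@ARGV) {
    my $ua  = make_agent();
    my $res = $ua->get($ARGV[0]);
    print $res->status_line, "\n";
    print $ua->cookie_jar->as_string;    # cookies the server asked us to keep
}
```

For basic auth you would call something like `make_agent(netloc => 'www.example.com:80', realm => 'Members', user => 'me', pass => 'secret')`, all placeholder values.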
Re: mp3.com stats
by maverick (Curate) on Jul 04, 2000 at 03:01 UTC
    Here's some code to get you connected and snag the first song page. jeffa is going to go back and pretty this up to display all songs and their stats; this will just get you connected :) You need Crypt::SSLeay, OpenSSL and libwww installed for this to work.
    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;
    use HTTP::Request::Common;
    use HTTP::Cookies;

    # info to pick which page
    my $email    = '';    # SET THIS TO YOUR EMAIL
    my $password = '';    # SET THIS TO YOUR PASSWORD
    my $band_id  = '';    # SET THIS TO YOUR BAND ID (it's in the URL after 'band_id' once you log in)

    # make a new agent
    my $agent = LWP::UserAgent->new;

    # make a new cookie jar
    my $cookie_jar = HTTP::Cookies->new;

    # build the request for the login web page (so we get the set of cookies)
    my $req = POST('https://login.mp3.com/login',
        [ cmd      => 'login',
          dest     => 'http://studio.mp3.com/cgi-bin/artist-admin/login.cgi?step=Intro',
          tmpl     => 'login_artist.html',
          email    => $email,
          password => $password,
        ]);

    # issue the request
    my $res = $agent->request($req);

    # extract the cookies
    $cookie_jar->extract_cookies($res);

    # build the request for the stats page
    my $req2 = GET("http://stats.mp3.com/cgi-bin/artist-stats.cgi?band_id=$band_id");

    # add the cookies we got from the previous request
    $cookie_jar->add_cookie_header($req2);

    # issue the request
    $res = $agent->request($req2);

    # CGI header goes out first, so the error branch is also valid CGI output
    print "Content-type: text/html\n\n";
    if ($res->is_success) {
        print "<html>";
        print $res->content;
    } else {
        print $res->as_string;
    }
    Enjoy
    /\/\averick
(jeffa) Re: mp3.com stats
by jeffa (Bishop) on Jul 02, 2000 at 20:20 UTC
    mp3.com does work with cookies, and it is very tricky. I have tried my darndest to grab an admin page and I just can't get it to work. What I can offer that works is a script that displays the Netscape cookies that are currently set. It requires Term::ANSIColor. This is helpful for quickly finding the key-value pairs for a cookie.
    # cookies.pl
    # usage: cookies.pl [site] - specify a site, or leave it off for all
    use strict;
    use Term::ANSIColor;

    my $site = $ARGV[0];
    my %values;

    open(COOKIE, "$ENV{HOME}/.netscape/cookies") or die "Yack!: $!\n";
    while (<COOKIE>) {
        next if /^[#\n]/;    # skip comments and blank lines
        chomp;
        # fields: domain, flag, path, secure, expires, name, value
        my ($domain, undef, $path, undef, undef, $key, $value) = split /\t/;
        $values{$domain}{$key} = $value;
    }
    close(COOKIE);

    foreach my $d (sort keys %values) {
        next if $site and $d !~ /$site/;    # filter by site if one was given
        print color("underline"), $d, color("reset"), "\n";
        foreach my $k (sort keys %{ $values{$d} }) {
            print "\t", color("green"), $k;
            print color("reset"), " => ";
            print color("yellow"), $values{$d}{$k};
            print color("reset"), "\n";
        }
    }
    Now, back to the issue at hand - mp3's artist log-in URL is https://login.mp3.com/login?origin=artist. I wrote the following piece of test code to see if I could simply dump the contents:
    # grabme.pl
    use strict;
    use LWP::UserAgent;
    use HTTP::Request::Common;
    use HTTP::Cookies;

    # instantiate a new agent
    my $agent = LWP::UserAgent->new;
    $agent->agent("Mozilla/4.7 [en] (X11; I; Linux 2.2.13-mosix i686; Nav)");

    # issue a request for the web page
    my $req = GET("https://login.mp3.com/login?origin=artist");
    my $res = $agent->request($req);
    if ($res->is_success) {
        print $res->content;
    } else {
        print $res->error_as_HTML;
    }
    And I get this as a result (abbreviated): 501 Protocol scheme 'https' is not supported. Wow. If you replace https with plain old http, a 302 (redirect) status comes back instead. If anyone can suggest an alternate route, I would greatly appreciate it.
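Both symptoms can be checked from Perl. The 501 means LWP has no https backend installed (in 2000 that was Crypt::SSLeay, as a reply below notes; current libwww-perl 6 uses LWP::Protocol::https). A 302 is a redirect, not an error, and LWP only follows redirects for GET and HEAD by default, which matters for a POSTed login form. This is a sketch; the helper names are hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# make_redirecting_agent (hypothetical helper)
sub make_redirecting_agent {
    my $ua = LWP::UserAgent->new(cookie_jar => {});    # in-memory cookie jar
    # Default is ['GET', 'HEAD']; a login form is usually a POST answered
    # with "302 Found", so allow POST redirects as well.
    push @{ $ua->requests_redirectable }, 'POST';
    return $ua;
}

# 1 if this LWP installation can load an https protocol module, else 0.
sub https_supported {
    return LWP::UserAgent->new->is_protocol_supported('https') ? 1 : 0;
}

if (@ARGV) {
    print "https support: ", (https_supported() ? "yes" : "no"), "\n";
    my $res = make_redirecting_agent()->get($ARGV[0]);
    print "final: ", $res->status_line, "\n";
    # each redirect hop is kept on the ->previous chain
    for (my $r = $res->previous; $r; $r = $r->previous) {
        print "  via ", $r->code, " -> ", ($r->header('Location') // ''), "\n";
    }
}
```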

    In the meantime I will continue to work on this admin page display thingy. I take it that you want to get a list of songs from the Artist Admin Page, feed each one to their CGI engine via a GET or POST request, and print the stats. Great idea, because doing that via a web browser is a time-consuming process.

    <SHAMELESS PLUG>
    my music
    </SHAMELESS PLUG>

      I believe you can get LWP to work with HTTPS URLs by installing Crypt::SSLeay. It requires SSLeay (which is provided as part of the OpenSSL package).

      I just installed that module and verified that I can GET the URL you are interested in.

      Also, there is an HTTP::Cookies::Netscape package as part of HTTP::Cookies that lets you manipulate the Navigator cookie file without reinventing the wheel. You can use it like so:

      use HTTP::Cookies;
      my $cookiejar = HTTP::Cookies::Netscape->new(File => "$ENV{HOME}/.netscape/cookies");
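A sketch of jeffa's cookie dumper rebuilt on HTTP::Cookies::Netscape, assuming a current HTTP-Cookies release, where HTTP::Cookies::Netscape lives in its own module file and is loaded explicitly. The `scan` method calls back once per cookie with (version, key, value, path, domain, ...). The helper `netscape_cookies` and the command-line layout are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Cookies::Netscape;

# netscape_cookies (hypothetical helper): returns
# { domain => { cookie_name => value } } for a Netscape-format cookie file.
sub netscape_cookies {
    my ($file) = @_;
    my $jar = HTTP::Cookies::Netscape->new(file => $file);
    my %values;
    $jar->scan(sub {
        my (undef, $key, $val, undef, $domain) = @_;
        $values{$domain}{$key} = $val;
    });
    return \%values;
}

# usage: perl dump-cookies.pl COOKIE_FILE [site]
if (@ARGV) {
    my ($file, $site) = @ARGV;
    my $values = netscape_cookies($file);
    foreach my $d (sort keys %$values) {
        next if defined $site and $d !~ /\Q$site\E/;
        print "$d\n";
        print "\t$_ => $values->{$d}{$_}\n" for sort keys %{ $values->{$d} };
    }
}
```

Unlike the hand-rolled split, the jar also understands the file's header line and expiry column.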
(jeffa) Re: mp3.com stats
by jeffa (Bishop) on Jul 04, 2000 at 08:02 UTC
    Whew!
    Here it is. If you want to use it as a CGI file, just put it in your cgi-bin directory of choice. If you don't have CGI access, use it like so:
    ./mp3admin-display.cgi > out.html
    and load out.html in your browser. This has been fun. ;)

    UPDATE: mp3.com regularly changes its submit variable names, so there's a good chance this code is broken, though it should be easily fixed. Not to mention that I wrote this over a year ago without using any HTML parser...

    #!/usr/bin/perl -w
    # mp3admin-display.cgi
    use strict;
    use LWP::UserAgent;
    use HTTP::Request::Common;
    use HTTP::Cookies;

    # form variables
    my $email    = '';    # SET THIS TO YOUR EMAIL
    my $password = '';    # SET THIS TO YOUR PASSWORD
    my $band_id  = '';    # SET THIS TO YOUR BAND ID

    # bot variables
    my $agent      = LWP::UserAgent->new;
    my $cookie_jar = HTTP::Cookies->new;

    # URLs of importance
    my $login_url = 'https://login.mp3.com/login';
    my $admin_url = 'http://stats.mp3.com/cgi-bin/artist-stats.cgi';

    # list of songs and their ids
    my %songs = ();

    print "Content-type: text/html\n\n";
    login($agent, $cookie_jar, $email, $password, $login_url);
    printImageAndFindSongs($agent, $cookie_jar, $band_id, \%songs, $admin_url);
    foreach my $song_id (keys %songs) {
        dumpAttribs($agent, $cookie_jar, $band_id, $song_id, $songs{$song_id}, $admin_url);
    }
    exit;

    # login is necessary to set cookies
    sub login {
        my ($agent, $cookie_jar, $email, $password, $login_url) = @_;
        # build request for login web page (so we get the set of cookies)
        my $req = POST($login_url,
            [ cmd      => 'login',
              dest     => 'http://studio.mp3.com/cgi-bin/artist-admin/login.cgi?step=Intro',
              tmpl     => 'login_artist.html',
              email    => $email,
              password => $password,
            ]);
        # issue the request
        my $res = $agent->request($req);
        # extract the cookies
        $cookie_jar->extract_cookies($res);
    }

    # prints out Total Stat Image and builds list of songs
    sub printImageAndFindSongs {
        my ($agent, $cookie_jar, $band_id, $songs, $admin_url) = @_;
        my ($start, $last);
        # build the request for the stats page
        my $req = GET("$admin_url?band_id=$band_id");
        # add the cookies we got from the previous request
        $cookie_jar->add_cookie_header($req);
        # issue the request
        my $res = $agent->request($req);
        if ($res->is_success) {
            # print stat image first
            my @lines = split(/\n/, $res->content);
            for (my $i = 0; $i <= $#lines; $i++) {
                if ($lines[$i] =~ /^\s*<form.+?name="stats">$/i) {
                    $start = $i;
                } elsif ($lines[$i] =~ /Estimated payments/i) {
                    $last = $i;
                }
            }
            if (defined $start and defined $last) {
                print join("\n", @lines[$start..$last]);
                print "</td></tr></table>\n";
            }
            # now get the songs (stored through the hash reference we were passed)
            foreach (grep { /^\s*<OPTION/i } @lines) {
                $songs->{$1} = $2 if /^\s*<OPTION\s+\w*\s*\w+="(\d+)">(.+?)\s*$/i;
            }
        } else {
            print "Error trying to load first admin page\n";
        }
    }

    # loads song page and prints stats
    sub dumpAttribs {
        my ($agent, $cookie_jar, $band_id, $song_id, $song, $admin_url) = @_;
        my ($start, $last);
        # build the request for the stats page
        my $req = GET("$admin_url?band_id=$band_id&song_id=$song_id");
        # add the cookies we got from the previous request
        $cookie_jar->add_cookie_header($req);
        # issue the request
        my $res = $agent->request($req);
        if ($res->is_success) {
            my @lines = split(/\n/, $res->content);
            for (my $i = 0; $i <= $#lines; $i++) {
                if ($lines[$i] =~ /^\s*<input type=hidden name="report_month".*$/i) {
                    $start = $i;
                } elsif ($lines[$i] =~ /^\s*<!-- Outer Table -->.*$/i) {
                    $last = $i - 3;
                }
            }
            print "<form>\n";
            print join("\n", @lines[$start..$last]) if defined $start and defined $last;
        } else {
            print "Error trying to load song page $song_id\n";
        }
    }
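Since the UPDATE notes this script matches mp3.com's markup with line-based regexes, here is a sketch of the song-list step done with HTML::TokeParser (shipped with HTML::Parser) instead. It assumes, as the regex in printImageAndFindSongs does, that songs appear as `<option value="NNN">Title` entries with numeric values; that markup is an assumption, and `option_map` is a hypothetical helper. A tokenizer does not care about line breaks, letter case, or missing `</option>` tags.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# option_map (hypothetical helper): { option value => visible label },
# keeping only numeric values (song ids, per the original regex).
sub option_map {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my %options;
    while (my $tag = $p->get_tag('option')) {
        my $value = $tag->[1]{value};    # attribute names come back lowercased
        # label = text up to the next </option>, <option> or </select>
        my $label = $p->get_trimmed_text('/option', 'option', '/select');
        $options{$value} = $label if defined $value and $value =~ /^\d+$/;
    }
    return \%options;
}

# usage: perl options.pl saved-stats-page.html
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "open $ARGV[0]: $!\n";
    my $html  = do { local $/; <$fh> };
    my $songs = option_map($html);
    print "$_\t$songs->{$_}\n" for sort { $a <=> $b } keys %$songs;
}
```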
    Please inform me of any problems. Now, all you monks out there with mp3.com artist accounts, rush to use this. Hmm, do I hear crickets chirpin'?
Re: mp3.com stats
by locked_user POASquall (Initiate) on Jul 04, 2000 at 18:58 UTC
    Thanks for all the help you guys put forth for me. Before this, my curiosity about Perl wasn't getting me much deeper than writing ad-lib programs, but now I can thankfully say I'm going to spend the next few months sitting up at my shrine (computer), drinking Bawls and programming like an archbishop monk. My spark is lit.

    Oh, and now I can see how I didn't make any mp3c money because I suck.

    Heh.


    -Jon/POASquall // mp3.com/PhysicsOfASquall