Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a tough one here that is not working. I need to get the date and time of a web page update. Here is how it looks on the web page (the date and time change every day):

Page last updated:
Wednesday, May 08, 2002 14:53:02

Here is how it looks in my browser View/page source:
<SCRIPT LANGUAGE="JavaScript"> document.write("<font face=arial size=1>Page last updated: " + "<br>" + document.lastMod<FONT face=arial size=1> Page last updated: <BR>Wednesday, May 08, 2002 14:53:02
Now I have tried every regular expression combination:
(/Page last updated: \<BR\>(.*)$/) (/Page last updated: \<BR\>(.*)$/) (/Page last updated: <BR>(.*)$/)


As an example:
$content = get($url);
print $content;
my ($dat) = $content =~ (/Page last updated: <BR>(.*$)/i);
print "$1\n";
print $dat;


Nothing prints, so I cannot fetch the date and time. If I print $content (the result of get($url)) and look at the whole page's HTML, the date and time DO NOT show up. Please advise: if the date and time are generated by JavaScript, does that mean they are not accessible?

Replies are listed 'Best First'.
Re: Reg Expression on Javascript?
by cLive ;-) (Prior) on May 09, 2002 at 12:19 UTC
    Think about it!!! How does JavaScript work?
    • browser requests HTML
    • HTML is parsed by browser
    • browser executes javascript
    And what is LWP doing?
    • LWP requests HTML

    *sigh*

    cLive ;-)

    ps - but in answer to your question, perhaps reading the LWP::Simple docs might help (hint - look at the head() function)

    --
    seek(JOB,$$LA,0);

Re: Reg Expression on Javascript?
by tachyon (Chancellor) on May 09, 2002 at 12:57 UTC

    Do it the easy way....

    use LWP::Simple;
    $data = head('http://www.somesite.com');
    print $data->{'_headers'}->{'last-modified'};
    # this shows you the available data in the header
    use Data::Dumper;
    print "\n\n", Dumper $data;
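
A possible variation on the snippet above, offered as a sketch rather than the poster's method: in list context, LWP::Simple's head() returns (content type, document length, modified time, expires, server), so the Last-Modified value can be read without reaching into the response object's internals. The modified time comes back as epoch seconds, or undef if the server sends no Last-Modified header. The helper name format_modified is hypothetical, the URL is assumed to come from the command line, and the output is in UTC, whereas the browser most likely shows local time.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: render epoch seconds in the style shown on the page (UTC).
sub format_modified {
    my ($epoch) = @_;
    return strftime('%A, %B %d, %Y %H:%M:%S', gmtime $epoch);
}

# Only touch the network when a URL is passed on the command line.
if (my $url = shift @ARGV) {
    require LWP::Simple;    # loaded lazily so the formatter works without LWP

    # List context: head() returns an empty list on failure.
    my ($type, $length, $modified) = LWP::Simple::head($url)
        or die "HEAD request failed for $url\n";

    print defined $modified
        ? format_modified($modified) . "\n"
        : "Server sent no Last-Modified header\n";
}
```

Run it with the page's address as the only argument. Without an argument it does nothing, which keeps the formatting helper usable on its own.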

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Parsing HTML source that contains JavaScript
by mrbbking (Hermit) on May 09, 2002 at 12:16 UTC
    JavaScript is interpreted and executed by your web browser.

    When you use Perl to get the same page, you just get the text of the page - there is no provision to execute any JavaScript that the page might contain.

    You noticed that print $content; does not show you the date and time, only the JavaScript that *would have* produced it, had it been executed. That's why you can't find it ... it's not there...
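
To make this concrete, here is a small sketch that needs no network access. The $content string is a hypothetical stand-in for what get($url) returns, loosely modeled on the page source in the question. It checks for a rendered date and for the script call that would have produced it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page: only the script text is present.
my $content = <<'HTML';
<SCRIPT LANGUAGE="JavaScript">
document.write("Page last updated: <br>" + document.lastModified);
</SCRIPT>
HTML

# A date shaped like "Wednesday, May 08, 2002 14:53:02" never appears in the source...
my $has_date = $content =~ /\w+day, \w+ \d{2}, \d{4} \d{2}:\d{2}:\d{2}/;

# ...but the JavaScript call that the browser would have executed does.
my $has_script = $content =~ /document\.lastModified/;

print "rendered date present: ", ($has_date   ? "yes" : "no"), "\n";
print "script call present:   ", ($has_script ? "yes" : "no"), "\n";
```

No regular expression can capture text that only comes into existence once a browser runs the script.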

Re: Reg Expression on Javascript?
by perlplexer (Hermit) on May 09, 2002 at 12:10 UTC
    Perhaps there aren't as many spaces after the ':' as you expect?
    Do this:
    /page last updated:\s*<br>(.*)/i;
    --perlplexer

    Update:
    Forgot <br>
    Update:
    Err, never mind...
    I think I should've gotten some coffee before answering that. :)
    Read all the replies below.