Updated: fixed the missing HTML::Strip object.
Updated: now does a tr on whitespace (thanks, shmem).
It's a simple script to diff the text on web pages.

Someone used the phrase somewhere on the PerlMonks site, which made me think it would be a handy thing to have. Thanks to them.
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use Text::Diff;
use HTML::Strip;
require 5.008_000;

my $STORE = "/home/charlie/diffs";
my $hs    = HTML::Strip->new();

die("Usage: $0 <URL_TO_DIFF>\n") unless ($#ARGV == 0);
my $url = $ARGV[0];

# 'nice' URL: drop the scheme and flatten slashes so it can be used as a file name
my $n_url = $url;
$n_url =~ s/^http:\/\///;
$n_url =~ s/\//_/g;

# First run stores the page as-is; later runs store a ".new" copy to compare against
my $store_as = (-e "$STORE/$n_url") ? "$STORE/$n_url.new" : "$STORE/$n_url";

if (is_success(getstore($url, $store_as))) {
    # Only diff when a previously stored copy already existed
    unless ($store_as eq "$STORE/$n_url") {
        # Newly fetched copy
        open(IN, $store_as) or die "Cannot read $store_as: $!";
        my @from = <IN>;
        close IN;
        # Previously stored copy
        open(IN, "$STORE/$n_url") or die "Cannot read $STORE/$n_url: $!";
        my @to = <IN>;
        close IN;

        # Strip the HTML, then squeeze runs of spaces and tabs into one space.
        # eof resets the HTML::Strip object between the two documents.
        my $from = $hs->parse(join ' ', @from);
        $hs->eof;
        $from =~ tr/ \t/ /s;
        my $to = $hs->parse(join ' ', @to);
        $hs->eof;
        $to =~ tr/ \t/ /s;

        # The diff runs from the newly fetched text to the stored text
        my $diff = diff \$from, \$to;
        print $diff;
        rename $store_as, "$STORE/$n_url";
    }
}
else {
    warn "Storing $store_as failed. Life sucks.";
}

__END__

=head1 NAME

web_diff.pl

=head2 VERSION

0.1

=head1 SYNOPSIS

diff text from a page retrieved off interweb and page stored locally

=head1 DESCRIPTION

  Retrieve and store a page locally
  If we have a previously stored local copy,
    Compare retrieved and local page
    If they are not identical
      Strip html from them
      Print a diff

=head2 OPTIONS

=over

=item C<URL TO DIFF>

This isn't sanitized properly; this code is not for use by people you don't trust implicitly :-)

=back

=head1 REQUIREMENTS

=over

=item Perl >= 5.8.0 (not tested on earlier versions)

=item HTML::Strip

=item Text::Diff

=item LWP::Simple

=back

=head1 COPYRIGHT AND LICENCE

Copyright (C)2006 Charlie Harvey

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Also available on line: http://www.gnu.org/copyleft/gpl.html

=head1 SEE ALSO

=cut
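The update note about running tr on whitespace can be tried in isolation. What follows is a minimal sketch, not part of web_diff.pl: the helper names squeeze_ws and squeeze_ws_with_brackets and the sample string are made up for illustration. It relies on the fact that tr takes a plain list of characters, not a regex character class, so square brackets in the search list are translated like any other character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): map tabs to spaces and
# squeeze each run of spaces/tabs down to a single space.
sub squeeze_ws {
    my ($text) = @_;
    $text =~ tr/ \t/ /s;
    return $text;
}

# Same idea, but with brackets in the search list. tr treats '[' and ']'
# as ordinary characters, so they are turned into spaces too.
sub squeeze_ws_with_brackets {
    my ($text) = @_;
    $text =~ tr/[ \t]/ /s;
    return $text;
}

my $sample = "a \t  b [1]";
print squeeze_ws($sample), "\n";                 # prints "a b [1]"
print squeeze_ws_with_brackets($sample), "\n";   # prints "a b 1 " (note the trailing space)
```

If page text containing literal brackets should survive the cleanup, the bracket-free search list is probably the one to use.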

In reply to web_diff.pl by ciderpunx
