A simple program to help you systematically identify what changed between the last version of a document and this one. It is geared specifically at picking up subtle changes such as table cell widths.
I was thinking of using more elaborate means to diff a couple of HTML documents, but this serves my needs when our friends the HTML designers have just been fiddling and the two docs are basically the same.
#!/usr/bin/perl
# -w not used because of a few noisy warnings in write's

# tag_comp.pl
# - jlawrenc@infonium.com - use at your own risk
#
# A quick 'n dirty to help you compare HTML tags across two similar documents.
#
# This happens to me from time to time. We have an HTML template that has been
# adapted for server-side use. Then the graphic designer goes off and reformats
# with different fonts, tag sizes or whatever. It could be easier to scope out the
# changes and then just re-edit our template document rather than reworking the
# supplied HTML back into a template.
#
# Invoke thusly:
#   tag_comp fn1 fn2 [tag [shift]]
#
# ie/
#   tag_comp index.html new_index.html table
#     generates a report of how the <table> tag is used differently between the two
#     documents
#
#   tag_comp index.html new_index.html img 2
#     a report of how <img> tags have changed, shifting the left col down a couple
#     of rows to help line up the differences
#
#
# Things to consider
#   a - tag regex is real simple "<" + not > 1 or more times + ">"
#       this may not always work for you
#   b - tag compares are lowercased
#
# It would be nice to try and line up the matches more effectively but a human
# will do the job for now.

# Report header
format STDOUT_TOP =
---------------------------------------------------------------------------------
@|||||||||||||||||||||||||||||||||||||| | @||||||||||||||||||||||||||||||||||||||
$fn1, $fn2
---------------------------------------------------------------------------------
.

# Report body - lines that do not match
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~ | ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$srch1[$i], $srch2[$i]
---------------------------------------------------------------------------------
.

# Report body - lines that do match
format STDOUT_MATCH =
* match: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$srch1[$i]
---------------------------------------------------------------------------------
.

# Our input arguments - file name1, file name2, tag to report on, shift value
($fn1, $fn2, $tag, $shift) = @ARGV;

if (!$fn1 or !$fn2) {
    die "Please supply two file names to compare.";
}

# Default to "img" tags
if (!$tag) {
    $tag="img";
    print STDERR "Defaulting to search for <$tag>s\n\n";
}

# Check for positive shift
if ($shift<0) {
    print STDERR "shift only works with positive vals.\n";
    print STDERR "if you want to shift the other way then try reversing your file names. :)\n";
}

# Slurp our files (bail out if either one cannot be opened)
undef $/;
open FIN, $fn1 or die "Cannot open $fn1: $!\n";
$file1=<FIN>;
open FIN, $fn2 or die "Cannot open $fn2: $!\n";
$file2=<FIN>;

# Grab our tags - real crude regex that may not always do the trick
while ($file1 =~ /(<[^>]+>)/gms) {
    push @tags1, $1;
}
while ($file2 =~ /(<[^>]+>)/gms) {
    push @tags2, $1;
}

# Get our list of matching tags
@srch1=grep /^<$tag(\s|>)/i, @tags1;
@srch2=grep /^<$tag(\s|>)/i, @tags2;

# Shift first search result if needed
for ($i=0; $i<$shift; $i++) {
    unshift @srch1, "";
}

# Find out who has more rows - set1 or 2
$rows=$#srch1 > $#srch2 ? $#srch1 : $#srch2;

# Write our header
$~="STDOUT_TOP";
write;

# Write report body
foreach ($i=0; $i<=$rows; $i++) {
    # One format for rows that are the same, another for those that are not
    if (lc $srch1[$i] ne lc $srch2[$i]) {
        $~="STDOUT";
        write;
    } else {
        $~="STDOUT_MATCH";
        write;
    }
}

# Done - coffee time
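The heart of the script is the extract-then-filter step followed by a row-by-row, lowercased compare. Here is a minimal sketch of just that part, without the report formats. The sub names `tags_of` and `diff_tags` and the inline sample HTML are hypothetical, not part of tag_comp.pl, and the `//` operator assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every <tag ...> of the given name found in $html,
# using the same crude regex as tag_comp.pl ("<", not-">" one or more times, ">").
sub tags_of {
    my ($html, $tag) = @_;
    my @all = $html =~ /(<[^>]+>)/g;
    # \Q..\E quotes the tag name; the original interpolates it unquoted.
    return grep { /^<\Q$tag\E(\s|>)/i } @all;
}

# Hypothetical helper: pair the tags up by position and mark each row
# 'same' or 'diff', comparing lowercased text as the original does.
sub diff_tags {
    my ($old, $new, $tag) = @_;
    my @a = tags_of($old, $tag);
    my @b = tags_of($new, $tag);
    my $rows = @a > @b ? @a : @b;    # the longer list sets the row count
    my @report;
    for my $i (0 .. $rows - 1) {
        my ($l, $r) = ($a[$i] // '', $b[$i] // '');
        push @report, [ (lc $l eq lc $r ? 'same' : 'diff'), $l, $r ];
    }
    return @report;
}

# Made-up sample input standing in for the two files.
my $old = '<table width="100"><tr><td width="50">a</td><TD>b</td></tr></table>';
my $new = '<table width="100"><tr><td width="60">a</td><td>b</td></tr></table>';

for my $row (diff_tags($old, $new, 'td')) {
    printf "%-4s | %-18s | %s\n", @$row;
}
```

With this sample input the first row is flagged as a difference (the cell width changed) and the second as a match, since `<TD>` and `<td>` compare equal once lowercased. Caveat (a) from the script still applies: an attribute value containing a literal ">" will cut the tag short.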

In reply to HTML tag compares between similar files by jlawrenc
