Great problem, it could also be used to figure out what portions of a web page change so an bot could rip out stories from news sites. I suggest thinking of this along the lines of a diff. First determine you record delimiter. In a diff the delimiter is a new line. However with regular expressions you might go with white space (or this could be a command line option). Look up diff and use a similar algorithm. Once you find the components that are different look at the differences. Would they both fit in the same character class. Maybe just go down the following list to see which describes both first.
/^[0-9]$/
/^[0-9]+$/
/^[0-9]*$/
/^[0-9A-Za-z]$/
/^[0-9A-Za-z]+$/
/^[0-9A-Za-z]*$/
/^[0-9A-Za-z.,]$/
/^[0-9A-Za-z.,]+$/
/^[0-9A-Za-z.,]*$/
/^.$/
/^.+$/
#Giving up
/^.*$/
This might be good for a first pass.
----
I always wanted to be somebody... I guess I should have been more specific.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.