First, many thanks for trying.
IMHO the KISS solution is:
- If there are no <pre> tags, surround the entire post with <code> tags.
- If there are <pre> tags and no other formatting:
  - replace the <pre> tags with <code> tags
  - if there is any text above the first <pre> tag, surround it with <code> tags
  - if there is any text between a </pre> and the next <pre>, surround it with <code> tags
  - if there is any text below the last </pre> tag, surround it with <code> tags
Note: I omitted the obvious simplification (remove the <pre> tags and surround the entire post with <code> tags) on the assumption that the post author sees something conceptually distinct from the rest of the text in whatever is surrounded with <pre> tags. It is possible that it would be valuable to allow that section to have its own download link.
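For what it's worth, here is a rough sketch of the steps above. It is in Python only to keep it short and testable, and it is not meant as the site's actual code. The function name `kiss_format`, the helper `wrap`, and the short list of "other formatting" tags are all my own assumptions for illustration. The rules above don't say what to do with a post that has <pre> tags plus other formatting, so the sketch simply leaves such a post alone. It also assumes the <pre> tags are balanced.

```python
import re

# A complete <pre>...</pre> block; the body is captured in group 1.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre\s*>", re.IGNORECASE | re.DOTALL)
# Any opening or closing <pre> tag.
PRE_TAG = re.compile(r"</?pre\b[^>]*>", re.IGNORECASE)
# Illustrative subset of "other formatting" tags (an assumption, not the full approved list).
OTHER_TAG = re.compile(
    r"</?(?:p|br|code|c|b|i|tt|ul|ol|li|blockquote)\b[^>]*>", re.IGNORECASE
)


def wrap(text):
    """Surround text with <code> tags, unless it is empty or only whitespace."""
    return f"<code>{text}</code>" if text.strip() else text


def kiss_format(post):
    """Apply the KISS rules to a post and return the reformatted text."""
    # Rule 1: no <pre> tags at all, so wrap the entire post.
    if not PRE_TAG.search(post):
        return wrap(post)
    # Assumption: a post with <pre> tags plus other formatting is left untouched.
    if OTHER_TAG.search(post):
        return post
    # Rule 2: turn each <pre> block into a <code> block and wrap the
    # text above, between, and below the blocks separately.
    parts = []
    pos = 0
    for match in PRE_BLOCK.finditer(post):
        parts.append(wrap(post[pos:match.start()]))
        parts.append(f"<code>{match.group(1)}</code>")
        pos = match.end()
    parts.append(wrap(post[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(kiss_format("Hello\n<pre>print 1;</pre>\nBye"))
```

Each former <pre> section stays in its own <code> block, which keeps it distinct from the surrounding text as the note above intends.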
The above algorithm won't make the text "pretty", but it will deal with the major sources of pain from badly formatted posts:
- People who don't know how to use HTML (or aren't comfortable with it) tend to space text as they want to see it. We will, most likely, be preserving the user's original formatting, so we have a halfway decent chance of making things look readable.
- If the text contains embedded code, we will actually be able to read it - imagine that!
- <pre> tags are the main reason for janitorial emergencies. Getting rid of these on otherwise unformatted posts would save the janitors time. For the rest of us, we don't have to wait on the janitors to get access to our precious sidebars.
Attempting to insert both <c> and <p> tags is actually quite a difficult task because it requires us to distinguish between code and text. That is non-trivial. Since Perl borrows many words from English, it requires parsing not just the words but their context. I'm not surprised that you found the task too hard to do to your satisfaction in 2 days or so.
Best, beth
Update: explained why KISS doesn't include a very obvious simplification.