I've made several minor improvements to the HTML nesting enforcement code based on the
testing done so far. Thanks go to the few who I saw "complain," especially ysth. (:
First, note that P is no longer a block tag as far as PerlMonks HTML nesting is concerned
(because people often don't close a P tag and
that shouldn't prevent earlier tags from being closed).
The other improvements mostly have to do
with how corrections are displayed. Below
is my attempt at documenting the htmlerror
reporting levels. With luck, the SiteDocClan will improve these into a PM doc node (along
with the previous
announcement).
So there are two new settings in User
Settings. And pretty soon the htmlnest option will go away (it will become the default
behavior that can only be disabled by adding
;htmlnest=0 to a URL).
I also took the opportunity to move all of
the Nodelet Settings out of the over-full
User Settings and clean it up slightly.
HTML error reporting levels
Reporting levels summary
- Level 0 shows invalid/unapproved HTML
tags as plain text.
- Level 1 also shows non-trivial closing HTML tags that had to be inserted (as grey
text).
- Level 2 also shows invalid/unapproved
attributes of approved HTML tags (enclosed in
the approved tag; all as grey text).
- Level 3 also shows non-</p> closing HTML tags that were ignored (as grey text with
a line drawn through it).
- Level 4 also shows trivial closing
HTML tags that were ignored or inserted (as
grey text; ignored tags have a line drawn
through them). Trivial tags are </p> and
non-nesting tags inserted other than at the end of the HTML being filtered.
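Read as a lookup table, the summary above amounts to one threshold per kind of correction. Here is a minimal sketch in Python, not the site's actual code; the kind names are labels made up for this illustration.

```python
# Lowest htmlerror level at which each kind of correction becomes visible.
# The keys are hypothetical labels; only the thresholds come from the summary.
MIN_LEVEL = {
    "invalid_tag_as_text": 0,        # always shown, as plain text
    "inserted_nontrivial_close": 1,  # grey text
    "rejected_attribute": 2,         # grey text, wrapped in the tag name
    "ignored_close_other_than_p": 3, # grey text with a line through it
    "trivial_close": 4,              # </p> and most non-nesting closers
}

def is_displayed(kind, level):
    """Return True if a correction of this kind is visible at this level."""
    return level >= MIN_LEVEL[kind]

def visible_kinds(level):
    """List every kind of correction that is visible at this level."""
    return [kind for kind, minimum in MIN_LEVEL.items() if level >= minimum]
```

For instance, visible_kinds(2) lists the first three kinds, because each level shows everything the lower levels show.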
Reporting levels details
No matter what htmlerror reporting is set
to, any unrecognized, invalid, or unapproved
tags simply have their opening < changed
to &lt; so that the tag becomes visible
as text (see the More HTML escaping
announcement).
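To make the escaping step concrete, here is a small Python sketch. It is only an approximation of the idea: the approved-tag list is invented for the example, and the regular expression only looks at things shaped like a tag name after a <.

```python
import re

# Hypothetical approved list; the real PerlMonks list is not reproduced here.
APPROVED = {"p", "b", "i", "code", "img", "a"}

# Matches the start of an opening or closing tag: "<name" or "</name".
TAG_START = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def escape_unapproved(html):
    """Change the opening < of any unapproved tag to &lt; so it shows as text."""
    def fix(match):
        if match.group(2).lower() in APPROVED:
            return match.group(0)  # approved: leave untouched
        return "&lt;" + match.group(1) + match.group(2)
    return TAG_START.sub(fix, html)
```

With this list, escape_unapproved("<b>ok</b> <blink>no</blink>") leaves the B tags alone and turns both BLINK tags into visible text.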
When you have htmlerror reporting set to 4
(the maximum), nearly all other corrections
made to the HTML will also be made visible, but in a grey font:
- Any approved closing tags that get
ignored in order to enforce proper nesting of
tags will be made visible inside of
<font color="#808080" class="htmlignored">
tags.
The default PerlMonks CSS includes
font.htmlignored { text-decoration:
line-through; } which means the grey text will have a line drawn through it as if
<strike> tags had been used (unless your
browser does not support CSS). The strike-out allows you to distinguish them from inserted
closing tags (and you can use CSS to customize their appearance).
- Any closing tags that have to be
inserted in order to enforce proper nesting of tags will also be displayed, but inside of
<font color="#808080" class="htmlinserted"> tags.
- Any unrecognized, invalid, or
unapproved attributes in an approved tag will
be displayed inside angle brackets with the tag name. All of this will be inside of <font
color="#808080" class="htmlattrib"> tags.
For example, if IMG is an approved tag with
approved attributes of ALT, HEIGHT, and WIDTH, then HTML of <Img ALT=purdy align="top" oops> will be changed to <img alt='purdy' /><font color="#808080" class="htmlattrib">&lt;img align="top" oops&gt;</font> so you'll see "<img
align="top" oops>" displayed after the
image. These are the only non-closing tags
that will be displayed in grey.
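The IMG example can be reproduced with a short Python sketch. This is an illustration under assumptions, not the filter itself: it handles one well-formed tag at a time, the function name is made up, and values containing quote characters are not treated specially.

```python
import re

# Approved attributes per tag, taken from the IMG example above.
APPROVED_ATTRS = {"img": {"alt", "height", "width"}}

# One attribute: a name, optionally followed by = and a quoted or bare value.
ATTR = re.compile(r"""([A-Za-z][\w-]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?""")

def filter_tag(tag_source):
    """Split one tag into its approved part and a grey report of the rest."""
    m = re.match(r"<\s*([A-Za-z][A-Za-z0-9]*)\s*(.*?)\s*/?>$", tag_source, re.S)
    name = m.group(1).lower()
    kept, rejected = [], []
    for am in ATTR.finditer(m.group(2)):
        attr, raw = am.group(1).lower(), am.group(2)
        if attr in APPROVED_ATTRS.get(name, set()):
            value = raw.strip("\"'") if raw is not None else attr
            kept.append(f"{attr}='{value}'")
        else:
            rejected.append(am.group(0))  # keep the author's original text
    out = "<" + " ".join([name] + kept) + " />"
    if rejected:
        out += ('<font color="#808080" class="htmlattrib">&lt;'
                + " ".join([name] + rejected) + "&gt;</font>")
    return out
```

Calling filter_tag on <Img ALT=purdy align="top" oops> yields the cleaned IMG tag followed by the grey report of align="top" and oops.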
This level (4) of htmlerror reporting is
rather obnoxious and is reserved for when you
are composing your own nodes or temporarily
request it by adding ;htmlerror=4 to a
PerlMonks URL.
If you lower the htmlerror reporting level
to 3, then inserted and ignored </p> tags
are not displayed. Neither are closing tags
that were inserted to close a non-nesting tag
other than at the end of the HTML being
filtered.
For example, if you have an HTML
table that is missing all of its </tr>,
</th>, and </td> tags, then these will be inserted but not be displayed (as long as the
</table> is not missing).
Level 3 omits showing these most common
lapses (that are harmless unless you consider
strict compliance to newer HTML standards as
a goal in itself) but shows nearly all other
mistakes.
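For the curious, here is a rough stack-based Python sketch of inserting and ignoring closing tags, including the trivial versus non-trivial distinction. It is a guess at the general approach, not the real filter: the sets of non-nesting and empty tags are assumptions, and comments and attributes get no special handling.

```python
import re

# Assumption: opening one of these tags first closes any listed tag still open.
IMPLICIT_CLOSE = {"tr": ("td", "th", "tr"), "td": ("td", "th"), "th": ("td", "th")}
VOID = {"br", "hr", "img"}  # assumed list of tags that never take a closing tag
TOKEN = re.compile(r"<(/?)([a-z][a-z0-9]*)[^>]*>", re.I)

def fix_nesting(html):
    """Return (fixed_html, notes); each note is (action, tag, trivial)."""
    out, stack, notes, pos = [], [], [], 0

    def close_top(at_end):
        tag = stack.pop()
        out.append(f"</{tag}>")
        # Trivial: </p>, or a non-nesting tag closed before the end of the HTML.
        trivial = tag == "p" or (tag in IMPLICIT_CLOSE and not at_end)
        notes.append(("inserted", tag, trivial))

    for m in TOKEN.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        closing, tag = m.group(1) == "/", m.group(2).lower()
        if not closing and tag in VOID:
            out.append(m.group(0))
        elif not closing:
            while stack and stack[-1] in IMPLICIT_CLOSE.get(tag, ()):
                close_top(at_end=False)
            stack.append(tag)
            out.append(m.group(0))
        elif tag in stack:
            while stack[-1] != tag:
                close_top(at_end=False)
            stack.pop()
            out.append(m.group(0))
        else:
            notes.append(("ignored", tag, tag == "p"))  # dropped from output
    out.append(html[pos:])
    while stack:
        close_top(at_end=True)
    return "".join(out), notes
```

On a table with no closing row or cell tags but an intact </table>, every inserted tag is marked trivial, so nothing would be reported below level 4. Drop the </table> as well and the tags inserted at the end become non-trivial.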
Level 2 omits showing ignored closing tags. So it shows non-trivial inserted closing tags
(described in the next paragraph) and ignored
attributes.
Level 1 omits showing ignored attributes.
This means that it only shows when tags had to be inserted to close an unclosed or misnested
tag (but never shows non-nesting tags unless
they were inserted at the end of the filtered
HTML, and never shows </p>).
Level 0 (the default) just fixes nesting
errors but doesn't display any of them.
User settings
In user settings, you can select between
htmlerror reporting levels of 0, 1, 2, or 3 to be used when you view nodes at PerlMonks. You can temporarily select any reporting level
(including 4) by appending ;htmlerror=4 (for
example) to any PerlMonks URL. [1]
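If you want to script the URL trick, a tiny Python helper might look like the sketch below. The function name and the node id in the usage note are invented, and falling back to ? when the URL has no query string is an assumption rather than documented behavior.

```python
def with_htmlerror(url, level):
    """Return url with ;htmlerror=LEVEL appended ahead of any #fragment."""
    if not 0 <= level <= 4:
        raise ValueError("htmlerror levels run from 0 to 4")
    base, hash_sep, fragment = url.partition("#")
    # Assumption: start a query string with ? if the URL has none yet.
    joiner = ";" if "?" in base else "?"
    return f"{base}{joiner}htmlerror={level}{hash_sep}{fragment}"
```

For example, with_htmlerror("https://www.perlmonks.org/?node_id=12345", 4) gives the same URL with ;htmlerror=4 on the end.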
Note that using an error reporting level
of 3 will show you harmless "errors" so you
shouldn't select this unless you can deal with
seeing a lot of "mistakes" without becoming
obnoxious in pointing them out to others.
When you start composing a new node, for the first preview you can select between htmlerror
reporting levels 3 and 4 (the default choice
is also controlled in user settings and
defaults to 3). For previews after the first,
you can pick any reporting level via a form
element on the preview page.
[ The patches to Preview are a bit
complicated and haven't been finished. At the
time of this writing, the first preview uses
your 'preview' level of error reporting and there is no form element
for adjusting the level while previewing.
]
[1] You can't select 4 as your
default error reporting level (except for when
previewing your own nodes) because it reports
harmless "errors" that we expect to be made
often by many members and we don't want to hear complaints about such.
Re: Site HTML filtering, Phase II
by Abigail-II (Bishop) on Feb 11, 2004 at 10:11 UTC
Actually, the Perlmonks stuff is pretty simple. The
only hard part is remembering which of the less common
but harmless and useful HTML tags don't work. (ISTR
that cite doesn't work, but I could be misremembering;
maybe it was q that doesn't work. I'm not sure. I
often just use them anyway, because when they're what
you intend, there's nothing else with the right
semantics.) That, and remembering
the entity for escaping the left square bracket. (I
usually just put code tags around it. Easier to
remember.) If you want to see some needlessly
complicated and gratuitously different site markup,
have a look at Wikipedia sometime. I am continually
thankful that Perlmonks markup is mostly just HTML.
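For the record, the numeric entities in question are &#91; for [ and &#93; for ]. A throwaway Python helper (the name is made up) that does the replacement outside of code tags could be as simple as:

```python
def escape_brackets(text):
    """Replace square brackets with their numeric character references."""
    return text.replace("[", "&#91;").replace("]", "&#93;")
```

It does not touch &, < or >, so it is only useful for text that is otherwise already safe to post.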
Can't we just have a setting that puts
an implicit <code> and </code> around our
postings?
Well, you could always change your node template to
that in protest. Such a protest would have about
as much impact on the rest of us as Coruscate's
XP/reputation/voting protest, but we'd all know
where you stand on the issue.
My first reaction when I read the description of these
new changes is that the error checking is quite lenient.
I suppose that's a good thing. If I had written the
checker, it would probably just reject or escape
anything that's not well-formed (in addition to
anything that smacks of javascript), which would
probably
be a major annoyance to people who still write legacy
HTML, of whom there are still quite a few out there
I suspect, the number of years since XHTML was put
forward notwithstanding. So, be happy that tye
wrote it, because he did a pretty good job IMO of
making the checker as lenient as could be reasonably
hoped for. (There are people who would want no
checker at all, but I think you understand why that
would cause problems in practice.)
update3: Hmmm... What I
*thought* I saw was that it actually got stripped.
What I *actually* discovered is that View Selection
Source in Mozilla does not give exactly the same
source as View->Page Source does. The former shows
<hr> and the latter shows <hr />. Weird.
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
The only hard part is remembering which of the less common but harmless and useful HTML tags don't work.
No, no, no. The hard part is finding out which elements are
named the same in both HTML and Perlmonks, but act differently. <code> for instance means something else in HTML than in Perlmonks. But I still haven't
figured out how the <a> element is working on Perlmonks. Sometimes, it creates a link. Sometimes it appears
as is.
That, and remembering the entity for escaping the left square bracket. (I usually just put code tags around it. Easier to remember.)
Easier to remember, but not easier to type. Having to type
13 extra characters to be able to type a common
character in Perl isn't what I call "easy". At least in POD,
you only need three extra characters: C<[>. And in POD, you
don't even have to put any markup around a function() or a $variable. POD knows.
If you want to see some needlessly complicated and gratuitously different site markup, have a look at Wikipedia sometime.
Actually, I've contributed some bits to Wikipedia the last
week. I vastly prefer the [[link]] syntax
over [link] as it means one can use unescaped
left brackets if they aren't followed by another left bracket. [..] is common when discussing perl. [[..]] is a rare appearance
in Perl code. I also prefer mechanisms like ''foo''
or *bar* to make something emphasized/italics or
strong/bold, like Wikis or news/mail readers do.
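The difference is easy to see with two regular expressions. This Python sketch is only an illustration: the patterns are simplified stand-ins for the two link syntaxes, not what either site actually uses.

```python
import re

# Simplified stand-ins for the two link syntaxes.
SINGLE = re.compile(r"\[([^\[\]]+)\]")      # [link]
DOUBLE = re.compile(r"\[\[([^\[\]]+)\]\]")  # [[link]]

def find_links(text, pattern):
    """Return the link targets that the given syntax would pick out of text."""
    return pattern.findall(text)

sample = "see [[HomePage]] and $x[0]"
single_hits = find_links(sample, SINGLE)  # also catches the array index
double_hits = find_links(sample, DOUBLE)  # only the intended link
```

The single-bracket pattern treats the 0 in $x[0] as a link target, while the double-bracket pattern leaves ordinary Perl subscripts alone.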
Abigail
The XML-style closing / gets stripped out too
What? Yes, </hr> gets stripped now and didn't use to. But for some time now, <hr> has been changed to the XMLish <hr />.
Oh, I see. There is a bug in that <hr /> can *report* (if you have error reporting set high enough) that the / was stripped when in fact it wasn't. I'll fix that soon.
Thanks.
If you give me a list of tags, and where you think they should be allowed, I'll look at them. Can't promise more, I'm rather busy at present.
Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
BTW, the node you are replying to isn't discussing any changes to how you mark up nodes at PerlMonks. Feel free to ignore it.
The previous related node involved a fairly minor change: Instead of just expecting contributors to get their HTML elements properly nested, we are now checking for it and trying to fix any errors we find (trying to balance DWIM with code complexity/performance). We wouldn't be doing this except such errors can and do impact the contributions of others.
This node is discussing (in quite a bit of detail) how much feedback you can choose to see from this process. If you find it too complicated for you to understand (or it just taxes your patience), then you should probably stop reading after the short summary (or just ignore it completely and keep the default settings or even just try different settings when you get bored).
Implicit <code> tags would make for a rather ugly presentation (and a much less flexible one). I and others discuss POD elsewhere. With LaTeX, would we deliver the results as PDF or just big PNGs? (Sorry, I haven't used LaTeX in many years so I don't know how nice any LaTeX-to-HTML engines are -- but I suspect they'd take a lot more load than the current PerlMonks HTML production process.) Plain HTML would make posting Perl code difficult without using a program to help produce the HTML.
I didn't have anything to do with the development of the "near-HTML-subset plus square bracket" syntax. I don't find it particularly hard to understand (and this was back when the documentation was much worse). And I appreciate the short cuts it provides (and realize it isn't a perfect choice for Perl, a language that makes fairly heavy use of nearly every printable ASCII character).
If you simply want text, then the requirements are very simple:
- Put <p> where you want a blank line.
- Put <code> tags around any code (or other uses of &, <, >, [, and ] or text you need displayed in a fixed-width font, such as ASCII drawings). Try not to use this when you don't need it.
You later complain about producing links. Plain text doesn't have links, so you need to decide whether you want plain text or not. If you want links, then please stop asking why you can't have plain text. (:
Re: Site HTML filtering, Phase II
by Anonymous Monk on Feb 11, 2004 at 15:18 UTC
<h3>this is a broken title</h4>
<!--
But it displays correctly
-->
<h3>this is a broken title spanning beyond the end of my post</h>
<!--
It will "infect" all the page, till the end
-->
Re: Site HTML filtering, Phase II
by ysth (Canon) on Feb 16, 2004 at 04:16 UTC
How would you feel about a ;htmlerror= level to show the source (as it would be shown in an Update window)? It would make it easier to see what's going on when other people's nodes come out strange (for instance, seeing what's up with the readmore tags on 329062). Obviously this level wouldn't be an option in user settings.
Thanks for the reminder, tye; does the XML view bypass the html correction? (Update: of course it does; should it?)