Additions to Approved HTML

G'day All,

This post was prompted by ambrus' comment, in Re^4: Approved PM markup?:

"You may try to petition us with a complete proposal of what attributes to enable in what elements, and then maybe I'll write a patch and maybe some god will apply it."

However, I'd much rather see discussion and consensus before submitting any proposal; accordingly, please freely discuss the following and let's see if a consensus can be reached.

For reference, here are the currently approved elements and attributes: Perl Monks Approved HTML tags

Non-standard Elements

Firstly, the non-standard <code> tag requires a little more explanation than the remainder do, so I'll deal with that first. (Assume everything here also applies to the <c> tag.)

I pretty much like everything about the <code> tag except for the fact that it renders Unicode characters with code points greater than 255 (U+00FF) as character entity references. To illustrate, consider this markup:

<pre>
U+007E: ~
U+007F: DEL
U+00FF: ÿ
U+0100: Ā
</pre>
<code>
U+007E: ~
U+007F: DEL
U+00FF: ÿ
U+0100: Ā
</code>

which renders as

U+007E: ~
U+007F: DEL
U+00FF: ÿ
U+0100: Ā

U+007E: ~
U+007F: DEL
U+00FF: ÿ
U+0100: A&#772;
[download]

So, in order to post code containing these characters, we need to use <pre> (or <tt> for inline text). When we do this, we lose all the features of <code>, such as code-wrapping and the [download] function.
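To make the behaviour concrete, here is a minimal sketch in Python that mimics the escaping described above. It is not PerlMonks' actual code: the function name is hypothetical, and the only assumption is the rule stated earlier, that code points above U+00FF come out as numeric character references.

```python
import unicodedata


def render_like_code_tag(text: str) -> str:
    """Hypothetical stand-in for the <code> rendering: keep every
    character up to U+00FF, replace anything higher with &#NNN;."""
    return "".join(ch if ord(ch) <= 0xFF else f"&#{ord(ch)};" for ch in text)


precomposed = "\u0100"                                 # single code point U+0100
decomposed = unicodedata.normalize("NFD", precomposed)  # "A" followed by U+0304

print(render_like_code_tag("U+00FF: \u00ff"))
print(render_like_code_tag("U+0100: " + precomposed))
print(render_like_code_tag("U+0100: " + decomposed))
```

Under this assumption, a precomposed U+0100 becomes a single reference, whereas a decomposed one (A plus a combining macron) leaves the A alone and escapes only the combining character.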

Therefore, I'd like to suggest a uni attribute for <code> that might be used either as <code uni="1"> or, given <code> is non-standard anyway, and won't appear in the final HTML, just <code uni>. Unless feedback indicates otherwise, I'll recommend: <code uni>.

It would, of course, be important that <code uni> still renders character entity references as written; e.g. &gt; still renders as &gt; and not >.

Update (code): This is not something that can be done easily, so I've removed it from the proposal. See ambrus' comment below.

I'm not suggesting any changes to the remaining non-standard elements: <spoiler> and <readmore>. Having said that, I do note that <readmore> allows a title attribute whereas <spoiler> does not: feel free to argue the case for <spoiler title="...">.

Standard Elements and Attributes

These are all straightforward and simply represent attributes that are missing from certain elements but allowed elsewhere. I'm really just aiming for consistency here.

Attribute	Elements to Allow this Attribute
class	All standard elements that don't currently allow it.
dir	All standard elements that don't currently allow it, except `br`.
lang	All standard elements that don't currently allow it, except `br`.
title	All standard elements that don't currently allow it.

Unknown Element: `wbr`

I've no idea what the <wbr> tag is. It's not standard HTML (see Index of the HTML4 Elements), nor is it documented as being non-standard.

Unless there's a good reason to keep it, removal would seem to be the appropriate action. Conversely, if we want to keep it, it should be documented. I'll recommend removal unless feedback indicates otherwise.

Update (wbr): Feedback has indicated otherwise (see Re: Additions to Approved HTML ( ISO-8859-1)): this just needs documenting.

-- Ken


Replies are listed 'Best First'.
Re: Additions to Approved HTML by ambrus (Abbot) on Jun 29, 2015 at 16:00 UTC
Code

Anonymous has already told a bit about the state of code tags. While it would be nice to permit code snippets that can have characters that aren't in cp1252, it's not so easy. The main problem is that the content of all nodes is submitted and stored cp1252-encoded, so you cannot represent characters literally in the perlmonks-HTML source code of a node unless they have an encoding in cp1252. If perlmonks wanted to allow the raw text of some nodes to be encoded as utf-8, we'd have to change a lot of code, and this is not a priority. (Note that perlmonks nodes were officially encoded as iso-8859-1 until 2008-31, but as cp1252 since then. This caused few user-visible changes, except in the XML tickers. See Tidings.)

In theory, we could add a tag that lets you put code snippets into a node that can contain any character, but then you couldn't just represent all those characters literally in the perlmonks-HTML source of your node, but would have to somehow encode them on your side. This is already partly possible if you don't insist on actually using specific code tags, but just write ordinary text (or monospaced text) and ampersand-encode some characters. In fact, some browsers already encode characters that aren't in cp1252 that way, because they can't encode those characters to cp1252 but the submission form on the website asks them to. You then have to manually encode the punctuation characters `><&[]` and whitespace as well. The drawback is that in this case people can't use the "download code" option to download just the code part of your node.

The other alternative for a poster is to link to an off-site web resource. None of these is perfect, but there's certainly no easy solution.

WBR

For information about the WBR element, please see http://www.w3.org/html/wg/drafts/html/master/semantics.html#the-wbr-element and https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr.

I don't see the need to forbid it once we've already allowed it, but then I also didn't see why we had to forbid Q tags.

Spoiler

If you want to allow a title attribute for spoilers, please tell us precisely how its value would be used in the formatted spoiler, in all combinations of the five possible values of the "Render <spoiler> as" setting of Display Settings, and whether the spoiler is expanded or not, and then convince us why that change would be worth it over the current state of not being able to specify a title. I'm not saying I'm completely against this, because spoiler tags on some web forums do have a title that's useful, but if you want a change, please give precise proposals or we won't change anything.

Other attributes

I have no big problem with permitting the class, title, dir, lang attributes in posts for any element, given that we already permit these in some elements. The only problem these could cause IMO is that an unclosed tag with a dir or lang attribute could mess up the formatting of the rest of the page. (Unclosed tags can get into the formatted output with some display settings that some old users may still have.) However, unclosed tags can cause problems in lots of other ways, and you could already add these attributes to a blockquote tag, so the situation wouldn't be any worse. For this reason, however, I would recommend that dir and lang attributes are not added to the PerlMonks Approved Chatter HTML Tags unless there is a compelling use case for them.
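The manual encoding described under "Code" above can be sketched roughly as follows. This is only an illustration in Python, not anything PerlMonks provides: `encode_for_post` and the `PUNCTUATION` table are hypothetical names, and whitespace is deliberately left untouched because how it should be encoded depends on the tag the text ends up in.

```python
# Punctuation that has to be ampersand-encoded by hand when code is posted
# as ordinary (or monospaced) text instead of inside code tags.
PUNCTUATION = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "[": "&#91;",
    "]": "&#93;",
}


def encode_for_post(text: str) -> str:
    """Escape the punctuation above and turn every character that has no
    cp1252 encoding into a numeric character reference."""
    out = []
    for ch in text:
        if ch in PUNCTUATION:
            out.append(PUNCTUATION[ch])
            continue
        try:
            ch.encode("cp1252")          # representable: keep it literal
            out.append(ch)
        except UnicodeEncodeError:       # not in cp1252: use &#NNN;
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(encode_for_post('print "\u0100" if $h[0] < 3;'))
```

The result would go inside ordinary or monospaced text rather than code tags, which is why the "download code" link is lost with this approach.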
Re^2: Additions to Approved HTML by kcott (Archbishop) on Jun 29, 2015 at 19:03 UTC
OK, thanks. Here's where I see the proposal to be (in terms of what can be done now and what will receive approval from one of the gods):

Code
No action: Remove from proposal.

WBR
Action point: A small documentation change to Perl Monks Approved HTML tags. [See below.]

Spoiler
No action: This never was part of the proposal ("I'm not suggesting any changes to the remaining non-standard elements: `<spoiler>` and `<readmore>`.").

Other attributes
Action point: Add the attributes `class`, `dir`, `lang` and `title` to the elements as described in the table in my OP.

For the `wbr` documentation change, this might save yourself (or someone else in pmdev) a bit of work.

In the table, change

`wbr`

to

`wbr<sup>5</sup>`

Add to the end of the existing notes:

`<p> <sup>5</sup>The <c>wbr</c> element provides a <em>suggested</em> wrapping point for long strings that contain no whitespace. See [http://www.w3.org/html/wg/drafts/html/master/semantics.html#the-wbr-element|W3C: The <c>wbr</c> element] for more information. </p>`

Obviously, I don't have access to the code for adding attributes to the existing elements; however, if that turns out to be a lot of work (due to the large number of element/attribute combinations), and you'd like an extra pair of hands, just let me know.

-- Ken
Re: Additions to Approved HTML by ww (Archbishop) on Jun 29, 2015 at 13:05 UTC
This is to register general agreement with kcott's OP... while acknowledging the general accuracy of the first reply from an AnonyMonk.

That said, I favor making the non-standard elements like <code> tags accept attributes in a manner as close as possible to w3c standards; IOW, `<code uni="1"> and <spoiler title="...">`

I, for one, would particularly like to be permitted to use color, strike, size, etc. (using a <span class=... or <span style=...) inside code tags to facilitate highlighting (and permit more concise snippets). Yeah, sure, the use of comments to call attention to an error or suggested change works (well "sorta' works") but often only at the expense of extra lines or lines that exceed screen widths of some users.

PM's stated aversion to <br> is something I've never agreed with or understood. Is there some wise monk who'll explain that? (I included it when writing Markup... only because I'd been chastised as a newbie for using break tags.)

Aside: perhaps modification efforts should also be directed to PM's handling of <br /> (which Perl Monks Approved HTML tags appears to countenance but which generates an error indication when HTML preview error reporting is set to 4). Here's an example: => -- NB: the XHTML form follows the => & precedes this statement but the base form [ without a slash] follows. The XHTML form appears fairly often -- possibly used by authors for whom it's the norm. (For those seeking a non-authoritative but compact explanation, an XHTML trott from U of Wash. offers this: `A few tags are called non-container tags, because they don't contain any content - they stand alone. Examples are images and line breaks. XHTML requires that all open tags must be closed, even if they're not container tags. Therefore, non-container tags end in />. For example, the tag for a line break is <br />.`)

Another aside: someone recently suggested (in the CB?) that we'd be well served by automating insertion of code tags around anything that looks like data or code before the preview stage of creating a new post. (Caveat: I have no idea of how to code an implementation nor any intent to try to do so anytime soon!)

IIRC, the bold, big and (sup or sub) tags also sometimes behave oddly when combined. This may be only in special cases and I have not been able to conjure up an illustration right now.

Spirit of the Monastery
Re^2: Additions to Approved HTML by kcott (Archbishop) on Jun 29, 2015 at 17:49 UTC
Thanks for registering your general agreement, ww. ++

When considering the "`<code uni="1">`" and "`<code uni>`" forms, I chose the latter for reasons of brevity and, therefore, ease of use. The W3C Standards (for HTML4) do include examples of this form; for instance, checked and disabled. Your preference for "`<code uni="1">`" is noted and will be taken into account with other feedback I receive.

Referring back to ambrus' post (Re^4: Approved PM markup?), you'll see that embedded CSS (via the `style` element or `style` attributes) will not be allowed, so I won't be proposing "`<span style=...`" at all. Similarly, from the same post, "`<span class=...`" within code tags would allow syntax-highlighting, so that's off the table as well.

Regarding `br`, that's a modification so I won't be including it in the proposal. I believe if I keep the proposal tight, i.e. just to additions, I'm more likely to get somewhere with this than I would if I allowed feature-creep. Having said that, should you wish to make a separate proposal, for instance, "allow `br` inside `p` blocks but continue to discourage its use as an alternative to using `p` blocks", I would certainly support that.

-- Ken
Re^3: Additions to Approved HTML by ww (Archbishop) on Jun 30, 2015 at 17:45 UTC
OK, re-read Re^4: Approved PM markup? but should not have needed to had I read more carefully before.

However, just as we implemented non-standard tags -- <code> etc -- skilled individuals could, I suspect, implement equivalents for a limited variety of style-attribute-equivalents -- .red, .green, etc. Creating a few of those would go a long way, IMO, to making it possible to highlight important subject matter in ways that would enhance communication other than <font>; <b>; <em>; <big> etc. As things stand now, explaining some issues appearing in CODE requires lengthy notes pointing out that problem x at line such-and-such in your original creates problems at... at a remove from the problem (i.e., in the narrative rather than the code) or requires an extended comment in the code section, interrupting the flow.

My intent re span is NOT TO PROVIDE SYNTAX HIGHLIGHTING but rather to provide a non-standard tag -- say <note> maybe -- as a means to create a visibly distinct rendering for comments inserted inside code tags without the limitations of using the hash_sign or pod markup.

As to <br>, you invited general comments. My thoughts on that were clearly an aside for the community reading your (++ed) thoughts.
Re: Additions to Approved HTML ( ISO-8859-1) by Anonymous Monk on Jun 29, 2015 at 09:29 UTC
Hello and welcome monk, your concern/thoughtfulness is appreciated and is refreshing

"`<code uni="1">`"

perlmonks doesn't support unicode, so, um, yeah, it would be nice if it did, but the perlmonks faq doesn't mention unicode or utf8 or utf-8, only latin1 or windows-1252 or something like that; its encoding="ISO-8859-1" ie ISO-8859-1

"These are all straightforward and simply represent attributes that are missing from certain elements but allowed elsewhere. I'm really just aiming for consistency here."

When is the last time you've seen "cite" used? It's one of those semantic-web ideas that are useless to humans that browsers don't display -- it's library search engine stuff -- since the beginning of perlmonks there have been fewer than 30 uses of it (one by you, four since 2009), that's over 15 years

also a good 99.99% of nodes only ever use `<p>` and `<code>` tags

Also just consider dir, used to indicate the directionality of text -- latin1 is always one direction, so yeah

Perlmonks uses the windows-1252 (similar to Latin-1) encoding ...

Sure theoretically perlmonks welcomes other languages and with entities they could be represented, but well, non-english postings are practically unicorns

"Unknown Element: wbr"

tye likes it http://mdn.beonex.com/en/HTML/Element/wbr.html and documentation is good (like a link or something)

"However, I'd much rather see discussion and consensus before submitting any proposal; accordingly, please freely discuss the following and let's see if a consensus can be reached."

:) Wow such optimism, refreshing

update: oh kcott user since 2010, that explains it :P I like you
Re^2: Additions to Approved HTML ( ISO-8859-1) by kcott (Archbishop) on Jun 29, 2015 at 15:17 UTC
That's a fair amount of research you've done in compiling your response. Thank you for this effort.

My main aim here was to add, rather than remove. I'm guessing (and correct me if I'm wrong) that you brought up `cite` as something to be removed. While I won't be adding it to this proposal, you are, of course, absolutely free to start a "Removals from Approved HTML" thread.

I included `dir` for reasons of consistency: it's already allowed in one place, so why exclude it from others? I'm fairly certain I've never used it on this site, but I most definitely have used it elsewhere for Arabic text. (Hebrew text would be another place where it would be appropriate.) Thanks for the feedback: if enough people don't want `dir`, I'll remove it from the proposal.

`wbr`: you learn something every day. I'll change the proposal to document and get rid of the suggestion for removal.

"I like you"

I like you too: ++

-- Ken
Re: Additions to Approved HTML by ikegami (Patriarch) on Jun 29, 2015 at 20:58 UTC
The problem is that your browser is sending `&#256;` for both "Ā" and for "&#256;", and there's no way to differentiate between the two. Fixing this requires changing the charset PerlMonks uses throughout.
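The ambiguity can be reproduced with a minimal Python sketch. It assumes, as described earlier in the thread, that the browser replaces characters it cannot encode in cp1252 with numeric character references; Python's `xmlcharrefreplace` error handler stands in for that browser behaviour here, and the variable names are only illustrative.

```python
# What the poster might type into the form: a literal character in one
# case, the entity written out by hand in the other.
typed_literal = "\u0100"
typed_entity = "&#256;"

# Stand-in for the browser encoding the form data as cp1252 and falling
# back to numeric character references for anything unencodable.
sent_literal = typed_literal.encode("cp1252", errors="xmlcharrefreplace")
sent_entity = typed_entity.encode("cp1252", errors="xmlcharrefreplace")

print(sent_literal)
print(sent_entity)
print(sent_literal == sent_entity)
```

Both submissions arrive as the same bytes, so on the receiving side nothing is left to tell whether the poster meant the character or the entity text.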
Re^2: Additions to Approved HTML by kcott (Archbishop) on Jun 29, 2015 at 21:52 UTC
Thanks, ikegami.

I had removed the original suggestion for changes to the '`<code>`' element from the proposal (see Re^2: Additions to Approved HTML). I should've also updated the OP - now done.

-- Ken
Re^3: Additions to Approved HTML by ikegami (Patriarch) on Jun 30, 2015 at 19:28 UTC
It's still in the OP.

"It would, of course, be important, that `<code uni>` still renders character entity references as written; e.g. `&gt;` still renders as `&gt;` and not `>`."

It's impossible to have auto-escaping and be able to display Ā.

Update: Oh I see what you mean. You "removed" it by adding an update that says it's been removed.