PerlMonks  

Web forum markup language and the Monastery

by szabgab (Priest)
on Jan 15, 2005 at 09:27 UTC ( [id://422484]=perlmeditation )

I am writing a web forum application to be used mainly by programmers. One of the things I have to do is to decide on the markup language and special tags (or shortcuts) to be used there. I like the way the Monastery works and I'd like to see what the Monks can say about the following questions:

  • What have you learned (regarding the markup) from the past 5 years here ?
  • Is the list of approved HTML tags the Monks have a good list ?
    Are there HTML tags that would make life easier ?
  • How do you like the system of the shortcuts ?
  • Are there things that make it difficult to expand the system ?
  • What limitations did you encounter so far, if any ?
  • What would you do differently if you opened the Monastery today ?
  • Would it make more sense to disallow the submission of bad markup or that of not approved HTML tags ?

In order to learn more about how the Monastery works I tried to play around with the various tags.
Here is what I learned.
(Correct me if I am not right in some of the points.)

In the Monastery you can use

  1. A restricted set of approved HTML tags
  2. <code> to wrap snippets of code
  3. Shortcuts within [ ] tags

HTML tags, regular text

The approved HTML tags are displayed as they were submitted. Anything else that resembles an HTML tag (valid or not) is HTML-escaped, so you can even type in

while (<IMG>) {}

in your regular text and it will show up correctly.
This means that the list of approved HTML tags cannot be extended (with tags valid today or with others added to the HTML standard later) without breaking old posts that contain such tags in their regular text (e.g. the <IMG> above).
I find this might be slightly problematic in the long run.
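
The escaping rule described above can be sketched roughly as follows. This is only an illustration in Python, not the Monastery's actual code: the APPROVED_TAGS set and the render_text function are made-up names, and only bare tags without attributes are passed through.

```python
import html
import re

# Hypothetical whitelist; the Monastery's real list of approved tags is longer.
APPROVED_TAGS = {"b", "i", "p", "ul", "ol", "li", "blockquote"}

# Matches bare opening or closing tags such as <b>, </b> or <IMG>.
# Tags carrying attributes do not match and are therefore escaped as text.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\s*>")


def render_text(text):
    """Keep approved tags as submitted; HTML-escape everything else."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Plain text between tags is always escaped.
        out.append(html.escape(text[pos:m.start()], quote=False))
        if m.group(1).lower() in APPROVED_TAGS:
            out.append(m.group(0))  # approved: pass through unchanged
        else:
            out.append(html.escape(m.group(0), quote=False))  # not approved
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(render_text("while (<IMG>) {}"))  # while (&lt;IMG&gt;) {}
    APPROVED_TAGS.add("img")                # the whitelist grows later on...
    print(render_text("while (<IMG>) {}"))  # while (<IMG>) {}
```

The last two lines show the problem: once "img" joins the whitelist, the old text is no longer escaped and starts behaving like a tag.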

The good news is that you can write almost any other characters as they are, except those used for the shortcuts.

The bad news is that the Monastery does not check that HTML tags are closed correctly, so I might forget to close a <b> tag and turn the rest of the page bold.
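
One way a forum engine could guard against this is to balance the tags before display. The following is a minimal sketch in Python under my own assumptions: balance_tags and VOID_TAGS are invented names, and the input is assumed to contain only bare tags without attributes. It is not how the Monastery works.

```python
import re

# Bare opening or closing tags only, e.g. <b> or </b>.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")

# Assumed set of tags that never take a closing tag.
VOID_TAGS = {"br", "hr"}


def balance_tags(markup):
    """Drop closing tags that were never opened and close tags left open."""
    out = []
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for m in TAG_RE.finditer(markup):
        out.append(markup[pos:m.start()])
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name in VOID_TAGS:
            if not closing:
                out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close inner tags first so the nesting stays well-formed.
            while stack:
                top = stack.pop()
                out.append("</" + top + ">")
                if top == name:
                    break
        # Otherwise: a stray closing tag, which is silently dropped.
    out.append(markup[pos:])
    # Close whatever is still open at the end of the post.
    out.extend("</" + name + ">" for name in reversed(stack))
    return "".join(out)
```

With this, a forgotten </b> only affects the post it was written in, not the rest of the page.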

<code>

The <code> tags are actually valid HTML tags, but in the Monastery they are used to mark code snippets. This means that <code> will never be an approved HTML tag that users can use as plain HTML. This is probably not a big issue, but one that should be mentioned. In any case, the engine of the Monastery takes the <code> tags and replaces them with some other HTML markup. Text within <code> tags is fully HTML-escaped. The only thing I can't seem to insert correctly within the code tags is the closing </code> tag itself.
This is usually not a big issue either.
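
To make the behaviour above concrete, here is a small Python sketch of how such a <code> handler might look. The names render_code_blocks and render_rest, and the <pre class="code"> replacement markup, are assumptions of mine; I do not know what markup the Monastery really emits.

```python
import html
import re

# Non-greedy match: the first </code> always ends the block, which is why
# a literal </code> cannot appear inside a snippet.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)


def render_code_blocks(text, render_rest=lambda s: s):
    """Escape everything inside <code>...</code>; hand the rest to render_rest."""
    out = []
    pos = 0
    for m in CODE_RE.finditer(text):
        out.append(render_rest(text[pos:m.start()]))
        # The replacement markup is an assumption, not the Monastery's own.
        out.append('<pre class="code">' + html.escape(m.group(1)) + "</pre>")
        pos = m.end()
    out.append(render_rest(text[pos:]))
    return "".join(out)
```

Here render_rest stands for whatever handles regular text (approved tags, shortcuts). It defaults to the identity function so that the sketch runs on its own.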

shortcuts

Normally in the Monastery you can use tags like [id://nodeid] and they will automagically show up as a link to the node with that id. You can even type in a pair of [ ] characters in your text and the string between them will be used to link to a node with that title.
That's very nice.
This also means that if you'd like to add both [ and ] characters in your regular text in this order then you'll have to escape one of them by yourself.
If within the [ ] pair there is :// somewhere like in this: [something://otherthing] then this is considered as either a shortcut (if something is one of the valid shortcut names) or is displayed as it is. This means that in the unlikely event that someone has entered a piece of text that looks like this: [something://otherthing] using something that is currently not a shortcut, if and when something becomes an official Monastery shortcut this piece of text will automatically behave as a real shortcut. As it is not likely that people use [...://....] in their regular text this is only a small expansion annoyance.
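
The shortcut rules as I understand them can be sketched like this in Python. Everything named here is hypothetical: the SHORTCUTS table, the /node/ and /title/ paths and the function names are placeholders, not the Monastery's real shortcuts or URLs.

```python
import html
import re
from urllib.parse import quote

# Hypothetical shortcut table: scheme -> function building the link target.
SHORTCUTS = {
    "id": lambda target: "/node/" + quote(target),
}

# A [ ... ] pair with no nested brackets inside.
SHORTCUT_RE = re.compile(r"\[([^\[\]]+)\]")


def link(href, label):
    """Build an anchor with both parts HTML-escaped."""
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(label))


def expand_shortcuts(text):
    """Turn [scheme://target] and [title] into links; leave unknown schemes alone."""
    def replace(m):
        body = m.group(1)
        scheme, sep, target = body.partition("://")
        if not sep:
            # No "://" at all: treat the whole body as a node title.
            return link("/title/" + quote(body), body)
        if scheme in SHORTCUTS:
            return link(SHORTCUTS[scheme](target), target)
        return m.group(0)  # unknown scheme: display as written
    return SHORTCUT_RE.sub(replace, text)
```

With this table, [something://otherthing] is displayed as written, but it turns into a link as soon as a "something" entry is added to SHORTCUTS. Plain text such as $a[0] also gets linked, which is why one of the brackets has to be escaped by hand.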

Replies are listed 'Best First'.
Re: Web forum markup language and the Monastery ([[...]])
by tye (Sage) on Jan 15, 2005 at 17:07 UTC

    Use [[...]] for shortcuts not [...].

    Define which fields hold HTML and which hold text and display them properly (always escape the text into entities and always filter the HTML).

    Use UTF-8.

    Define the valid ranges of characters allowed (may vary by field). For example, filter out non-whitespace control characters everywhere. You might want to disallow poorly supported and dingbat-like characters from titles and/or usernames.

    Spend some quality time designing login / security. Provide automatic means for handling when people forget their password and lose access to their selected e-mail address.

    Keep the mark-up simple and be *very* wary of purists and pedants. For example, HTML tables have proven to work better in a wide variety of environments than anything I've seen anyone propose.

    I'll probably write up more specifics when I have more time.

    - tye        

      HTML tables have proven to work better in a wide variety of environments than anything I've seen anyone propose.

      Non-horizontal layout works better. Especially if everything can be left-aligned. If you don't need a chatterbox or always-visible poll, find a simple layout. Avoid tables and CSS if you can. If you cannot, in 2005, I think it's best to try and find a layout that works with pure CSS. And make sure testing isn't limited to the usual browsers. Include some PDA browsers and text browsers and hack in special cases if needed. Oh, and if you dislike headaches, let ancient browsers be the problem of their users instead of your problem, because it will only get worse as time passes.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Oh, and if you dislike headaches, let ancient browsers be the problem of their users instead of your problem, because it will only get worse as time passes.

        I cannot agree with this more. Given that Firefox is a free download, installs quickly, and has minimal impact on the system ... there is no excuse to not have a CSS-capable browser. Period.

        And, if you're complaining that you may not be able to install it at work - what're you doing reading Perlmonks at work? :-)

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      I think for a site supporting a Perl-like language that <[...]> might be better. It's a little uglier, but IMO a lot less likely to occur in code.

      ---
      demerphq

        I am not sure one needs to be able to include shortcuts in the code snippets. PerlMonks does not let me. At least not in this response:
        [Code]
        Then I only have to care that the end-of-code sequence (</code> in PerlMonks) is not likely to occur in code.
Re: Web forum markup language and the Monastery
by Juerd (Abbot) on Jan 15, 2005 at 11:58 UTC

    What have you learned (regarding the markup) from the past 5 years here ?

    That tables suck for design, because they make the entire page wider if only one node is wide. And that demerphq doesn't like users to use structured HTML with h1 and h2, but wants them to start with h4 instead, because PM itself uses h3 for the title. (I disagree and think PM should be fixed, because that can be done much more easily (with CSS, for example).)

    Is the list of approved HTML tags the Monks have a good list ? Are there HTML tags that would make life easier ?

    It's good now. Although some complain that images cannot be used, I think this is a good thing. Images are slow and, for Win32 users with MSIE, dangerous :)

    How do you like the system of the shortcuts ?

    I like it a lot, but I still don't understand why // is needed. [id:422484] instead of [id://422484] would be nice. The double slash has a specific meaning according to RFC 1738, and PM breaks this. // is a promise that //<user>:<password>@<host>:<port>/<url-path> (the common internet scheme syntax) can be used.

    Would it make more sense to disallow the submission of bad markup or that of not approved HTML tags ?

    Yes and no. I don't think this is a big deal. It would be nice if a simple HTML parser were used to close open tags, or to avoid tags being closed without ever having been opened.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      And that demerphq doesn't like users to use structured HTML with h1 and h2, but wants them to start with h4 instead, because PM itself uses h3 for the title. (I disagree and think PM should be fixed, because that can be done much more easily (with CSS, for example).)

      Well, i agree with your disagreement, but until it is changed I do have more or less that opinion. OTOH since I've only ever considered a node about it once (which was yours) and that the community didn't agree with me when I did, I pretty much recognize that moaning about it is a lost cause. :-) (And by inference im not really sure why you brought it up, my opinion in things like this is no more relevant than anyone else's.)

      ---
      demerphq

        im not really sure why you brought it up

        To answer the question. I had always learned that h1, h2, etc were for structuring, but your POV appears to be that the number in there is to indicate the size of the heading. This demonstrates that even something with a spec, like HTML, is open to multiple interpretations. For the OP, it is a hint to define these things, or at least think of a way to handle them.

        I pretty much recognize that moaning about it is a lost cause. :-)

        I am not convinced that it is. If you feel this is important, we should still seek a way to fix it. Either through CSS (just define font-size in the right contexts) or by disallowing the "big" (or, from my point of view: higher level) headings. There was no clear consensus on the consideration, but there were in fact more people who voted edit than who voted keep, and this proves that you're not the only one who dislikes the current state of things.

        About the headings being too big: I agree, even.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Web forum markup language and the Monastery
by dimar (Curate) on Jan 16, 2005 at 02:50 UTC

    Just a quick comment on "shortcuts" along with a favorite soapbox rant.

    As has already been well-discussed, and well-considered: whatever character sequence you use for interpolating "shortcuts" will preclude their use as ordinary text, therefore it is best to choose a "rare" character sequence.

    The problem is, in the discussion-space of 'computer programming languages' (especially versatile ones like perl) there *are* no rare character sequences. With all the different mini-syntaxes, protocols, idioms and neologisms out there, it's (arguably) impossible to choose a syntax that can never be misinterpreted as a command when it was really intended as plain text. Especially when you limit the command-delimiters to the fewest number of total characters. This is the classic linguistic problem of 'use versus mention'.

    Ironically, perl (and unfortunately, perl seems to be alone on this) solves this universal problem *exceptionally* well. Namely, allow the user to specify her *own* command delimiters, and completely obviate the need to add cumbersome 'escape sequences' in nearly all circumstances.

    q§This is a $brilliant$ idea.§;
    "This is a $brilliant idea.";
    qq^This is a "$brilliant" idea.^;
    q{This is a $brilli@nt ide@};
    qŠThi§ i§ a $brilliant idĽaŠ;

    This unique aspect of perl should be on the "best practices" short list of how to solve this particular problem. A+

      Indeed. Let's play chase the delimiters (trying to get deparse to resort to escaping):
      $ perl -MO=Deparse -e'print "foo"'
      print 'foo';
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo'"'"'"'
      print q[foo'];
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo['"'"'"'
      print q(foo[');
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo(['"'"'"'
      print q<foo(['>;
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo<(['"'"'"'
      print q{foo<(['};
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo{<(['"'"'"'
      print q/foo{<(['/;
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo/{<(['"'"'"'
      print q"foo/{<(['";
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo\"/{<(['"'"'"'
      print q#foo"/{<(['#;
      -e syntax OK
      $ perl -MO=Deparse -e'print "foo#\"/{<(['"'"'"'
      print 'foo#"/{<([\'';
      -e syntax OK
      The problem is, in the discussion-space of 'computer programming languages' (especially versatile ones like perl) there *are* no rare character sequences.
      That I do not believe. Sure, every character sequence can occur inside a Perl program (just put it inside quotes), but many sequences are rare. And the current delimiters, [ ], are among the most common. [[ ]] would be far less common. Sure, one can write Perl code that uses [[ ]], but if you ran some statistics on the <code> fragments on PerlMonks, you would see that [ ] is used far, far more often than [[ ]].

        Hence my suggestion of <[]>, something that I don't think I've ever seen in real code. IMO [[]] is all too often found in Perl code:

        my $AoA=[[1,2],[1,3]]
        ---
        demerphq

Node Type: perlmeditation [id://422484]
Approved by grinder