
I think your best bet is to officially declare that your application accepts Markdown, and then add tweaks for anything Word produces that doesn't format the way you like. For instance, start with the CommonMark library, and if it doesn't like the bullet point characters in your example, you could write a quick search/replace such as $text =~ s/\x{2022}/*/g; or whatever is required to make it valid Markdown.
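A slightly fuller sketch of that idea, in case it helps. The helper names are made up for illustration, and the extra code points beyond U+2022 are only guesses at what Word might paste; adjust them to what you actually see. The conversion step assumes the CPAN CommonMark module and is skipped when it isn't installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn bullet glyphs at the start of a line into
# Markdown "* " list markers, keeping any leading indentation.
sub normalize_word_bullets {
    my ($text) = @_;
    # U+2022 bullet; U+25E6 and U+25AA are assumed nested-level glyphs
    $text =~ s/^([ \t]*)[\x{2022}\x{25E6}\x{25AA}][ \t]*/$1* /mg;
    return $text;
}

# Hypothetical helper: convert Markdown to HTML if the CPAN CommonMark
# module is available, otherwise return nothing so the caller can fall back.
sub markdown_to_html {
    my ($markdown) = @_;
    return unless eval { require CommonMark; 1 };
    return CommonMark->markdown_to_html($markdown);
}

my $pasted   = "Shopping:\n\x{2022}\tapples\n\x{2022} pears\n";
my $markdown = normalize_word_bullets($pasted);
print $markdown;    # plain ASCII Markdown after normalization
```

Anchoring the substitution to the start of the line leaves a bullet character in the middle of a sentence alone, which the one-liner above would also replace.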

I suggest Markdown because it's the most common rich-text-in-plain-text format on the Internet, and because there's no standard I'm aware of to receive MS Word formatting into a standard HTML form element. I expect there are custom MS extensions for Edge that can do it, but I don't have a desktop install of Word available to test with. Building on Markdown also helps with identifying indent levels of nested lists, which would be unreasonably hard to do with regexes.

There are also full-featured client-side JavaScript rich text editors like CKEditor which you could integrate; those submit HTML to the back-end, no Perl translation required. They may have much better support for content pasted from Word, but some require a paid license for professional use, and you'd have to spend some time finding which one works best for your use case.
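If you go the editor route, the back-end would normally still restrict the submitted HTML to a small set of tags before storing or displaying it. A minimal sketch, assuming the CPAN module HTML::Scrubber is installed; the helper name and the tag list are only illustrative, not a recommendation for your application.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only a short whitelist of tags from the HTML
# an editor submits. Tags that are not listed are dropped from the output.
sub filter_editor_html {
    my ($html) = @_;
    require HTML::Scrubber;    # CPAN module, not part of core Perl
    my $scrubber = HTML::Scrubber->new(
        allow => [qw(p br b strong i em ul ol li)],
    );
    return $scrubber->scrub($html);
}
```

In a controller this would be called as something like my $clean = filter_editor_html($submitted_html); before the value goes anywhere near the database or a template.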

Re^6: Plain Text To HTML
by LanX (Saint) on Sep 21, 2024 at 10:27 UTC
    > There are also fully-featured javascript client-side rich text editors like CKEditor

    Or TinyMCE, ...

    Those client-side solutions can handle the paste event, read the clipboard data (event.clipboardData, or the asynchronous navigator.clipboard API) and then choose the best of the MIME types offered.

    From personal experience I know that TinyMCE at least offers to display the content as some kind of HTML.

    Pretty off topic here (no Perl involved) and probably beyond the skills of the OP.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Beyond the skills? They'd just follow the installation instructions of the component, and simplify their controller.
        The generated HTML can become complex and might need to be filtered.

        I had to do this once.

        And customizing such feature-rich applications requires skills.

        Cheers Rolf