I think your best bet is to officially declare that your application accepts Markdown format, and then add some tweaks for anything that Word produces which doesn't format the way you like. For instance, start with the CommonMark library, and if it doesn't like those bullet point characters in your example, you could write a quick search/replace of $text =~ s/\x{2022}/*/g; or whatever is required to make it valid Markdown.
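As a rough sketch of that pre-processing step, the snippet below rewrites Word-style bullet lines as Markdown list items before the text goes to a Markdown renderer. The helper name word_bullets_to_markdown is hypothetical, and it assumes the bullets arrive as U+2022 or U+00B7 followed by whitespace, which may not match what your copy of Word actually produces.

```perl
use strict;
use warnings;

# Hypothetical helper: turn Word-style bullet lines into Markdown list items.
# Assumption: a bullet is U+2022 or U+00B7 at the start of a line (after
# optional indentation), followed by spaces or a tab.
sub word_bullets_to_markdown {
    my ($text) = @_;
    # Keep any leading indentation ($1) so nesting depth is not lost.
    $text =~ s/^([ \t]*)[\x{2022}\x{00B7}][ \t]*/$1* /mg;
    return $text;
}

my $pasted   = "Intro line.\n\x{2022}\tFirst\n\x{2022}\tSecond\nClosing line.\n";
my $markdown = word_bullets_to_markdown($pasted);
```

The result can then be handed to whichever Markdown renderer you settle on. Note that under CommonMark rules a plain line directly after the last bullet is treated as a continuation of that list item unless a blank line separates them, so you may also want to insert a blank line after each list.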
I suggest Markdown because it's the most common rich-text-in-plain-text format on the Internet, and because there's no standard I'm aware of to receive MS Word formatting into a standard HTML form element. I expect there are custom MS extensions for Edge that can do it, but I don't have a desktop install of Word available to test with. Building on Markdown also helps with identifying indent levels of nested lists, which would be unreasonably hard to do with regexes.
There are also fully-featured javascript client-side rich text editors like CKEditor which you could integrate, and those will submit HTML to the back-end, no Perl translation required. They may have much better support for stuff pasted from Word, but some require a paid license for professional use, and you'd have to spend some time finding which one works the best for your use case.
> There are also fully-featured javascript client-side rich text editors like CKEditor
Or TinyMCE, ...
Those client-side solutions can handle the paste event, read the clipboard contents (the event's clipboardData, or the navigator.clipboard API), and then choose the best of the MIME types offered.
From personal experience I know that TinyMCE at least offers to display the content as some kind of HTML.
Pretty off topic here (no Perl involved) and probably beyond the skills of the OP.
Beyond the skills? They'd just follow the installation instructions of the component, and simplify their controller.
Here is an example solution. It may not be the best but at least it is correct. You can take this and modify it further as your requirements change.
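use strict;
use warnings;
use utf8;
use Test::More tests => 1;

# Sample input. Assumption: each bullet is U+2022 followed by a tab.
my $in = <<EOT;
This is a test posting.
\x{2022}\tHello there!
\x{2022}\tHow are you?
\x{2022}\tVery well I hope!
This is the end of the posting.
EOT

my $want = '<p>This is a test posting.<ul><li>Hello there!</li><li>How are you?</li><li>Very well I hope!</li></ul>This is the end of the posting.</p>';

my $have   = '<p>';
my $inlist = 0;
for (split /\n/, $in) {
    if (s/^\x{2022}\s*/<li>/) {            # bulleted line becomes a list item
        $_ .= '</li>';
        $_ = '<ul>' . $_ unless $inlist;   # open the list on its first item
        $inlist = 1;
    }
    elsif ($inlist) {                      # first non-bullet line closes the list
        $_ = '</ul>' . $_;
        $inlist = 0;
    }
    $have .= $_;
}
$have .= '</ul>' if $inlist;               # close a list that runs to the end of the input
$have .= '</p>';
is $have, $want;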
Even here the text is not displayed as entered. The info in the copied Word document was in the form of a statement, a bulleted list, and another statement. Here it is shown as two sentences.
That's because you submitted two sentences with some dots in them, not properly formatted HTML with <ul> and <li>.
Had you done <pre> pastedtext </pre>, the formatting would've been preserved.
What you want is almost impossible with a back-end program and a simple text area.
Have you tried pasting bold, italic, or different fonts into your text area?
Using the pre (for preformatted) tag is as close as you will get without implementing a WYSIWYG editor on your form.
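For what it's worth, the pre approach might look something like this on the back end. This is only a sketch: text_to_pre is a made-up helper name, and it assumes the submitted text is plain text that must be HTML-escaped before it is wrapped.

```perl
use strict;
use warnings;

# Hypothetical helper: escape HTML metacharacters, then wrap the text in
# <pre> so the browser keeps the line breaks and tabs as they were pasted.
sub text_to_pre {
    my ($text) = @_;
    my %escape = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );
    $text =~ s/([&<>"])/$escape{$1}/g;
    return "<pre>$text</pre>";
}

my $html = text_to_pre("Totals:\n\x{2022}\tA & B\n\x{2022}\t1 < 2\n");
```

Escaping first matters: without it, any stray < or & in the pasted text would be read as markup instead of being shown as typed.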