Re^2: Plain Text To HTML

More clarification.

The form input is sent to a program on a web server, which processes it and displays it as part of a web page that site users will see. The input might simply be typed into the text area, or a document might be copied and pasted into it. Anyone who is authorized can post information for display on our website, so the method must be simple: either type the information into the form or copy and paste something into it. In either case the input is unlikely to contain any HTML formatting, yet the final product will need some for display. Consequently, the program that accepts the input must format it before it is displayed.

I hope this makes clear what I am trying to do.

Re^3: Plain Text To HTML by hippo (Archbishop) on Sep 19, 2024 at 14:41 UTC
It would aid your cause greatly if you were to show sample input as pasted into the form (say 3 lines max) and the equivalent desired HTML of that input once it has been transformed. At the moment everyone is left to guess what you actually want to happen during this transformation. When you have a moment, perhaps a read of "How to ask better questions using Test::More and sample data" will be of help. 🦛
Re^4: Plain Text To HTML by Milti (Beadle) on Sep 19, 2024 at 16:01 UTC
Here's a sample input. It is a copy and paste of a Word document: This is a test posting. • Hello there! • How are you? • Very well I hope! This is the end of the posting. As you can see, as entered here the bullets are not indented as they were in the document I copied. What I am looking for is a routine that will read the input and display it as this code would: `<p>This is a test posting.<ul><li>Hello there!</li><li>How are you?</li><li>Very well I hope!</li></ul>This is the end of the posting.</p>`
Re^5: Plain Text To HTML by NERDVANA (Priest) on Sep 19, 2024 at 16:32 UTC
I think your best bet is to officially declare that your application accepts Markdown, and then add some tweaks for anything Word produces that doesn't format the way you like. For instance, start with the CommonMark library, and if it doesn't like the bullet point characters in your example, you could write a quick search/replace such as `$text =~ s/\x{2022}/*/g;`, or whatever is required to make it valid Markdown.

I suggest Markdown because it's the most common rich-text-in-plain-text format on the Internet, and because there's no standard I'm aware of for receiving MS Word formatting in a standard HTML form element. I expect there are custom MS extensions for Edge that can do it, but I don't have a desktop install of Word available to test with. Building on Markdown also helps with identifying the indent levels of nested lists, which would be unreasonably hard to do with regexes.

There are also fully featured client-side JavaScript rich text editors, such as CKEditor, which you could integrate. Those submit HTML to the back end, so no Perl translation is required. They may have much better support for content pasted from Word, but some require a paid license for professional use, and you'd have to spend some time finding which one works best for your use case.
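As a rough illustration of the "tweak the input until it is valid Markdown" step, here is a minimal sketch in core Perl. The helper name `bullets_to_markdown` is hypothetical and not from this thread. It assumes each bullet sits at the start of its own line, and it adds blank lines around the list on the assumption that a Markdown renderer would otherwise fold the following sentence into the last list item.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not from the thread): rewrite Word-style bullet
# lines as Markdown list items and separate the list from its neighbours.
sub bullets_to_markdown {
    my ($text) = @_;
    my @out;
    my $in_list = 0;

    for my $line (split /\n/, $text) {
        if ($line =~ s/^\s*\x{2022}\s*/* /) {
            # First bullet of a list: put a blank line before it
            push @out, '' if !$in_list && @out && $out[-1] ne '';
            $in_list = 1;
        }
        elsif ($in_list) {
            # First non-bullet line after a list: put a blank line before
            # it so it is not read as a continuation of the last item
            push @out, '' if $line ne '';
            $in_list = 0;
        }
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

my $markdown = bullets_to_markdown(
    "This is a test posting.\n"
  . "\x{2022} Hello there!\n"
  . "\x{2022} How are you?\n"
  . "\x{2022} Very well I hope!\n"
  . "This is the end of the posting.\n"
);

binmode STDOUT, ':encoding(UTF-8)';
print $markdown;
```

The resulting string would then be handed to whichever Markdown renderer is chosen. Nested lists and other bullet characters are deliberately left out of this sketch.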
Re^6: Plain Text To HTML by LanX (Saint) on Sep 21, 2024 at 10:27 UTC
Re^7: Plain Text To HTML by NERDVANA (Priest) on Sep 23, 2024 at 04:22 UTC
Re^5: Plain Text To HTML by hippo (Archbishop) on Sep 19, 2024 at 18:08 UTC
Here is an example solution. It may not be the best but at least it is correct. You can take this and modify it further as your requirements change.

    use strict;
    use warnings;
    use utf8;
    use Test::More tests => 1;

    my $in = <<EOT;
    This is a test posting.
    • Hello there!
    • How are you?
    • Very well I hope!
    This is the end of the posting.
    EOT

    my $want = '<p>This is a test posting.<ul><li>Hello there!</li><li>How are you?</li><li>Very well I hope!</li></ul>This is the end of the posting.</p>';

    my $have   = '<p>';
    my $inlist = 0;
    for (split /\n/, $in) {
        if (s/•\s*/<li>/) {
            # Bullet line: wrap it as a list item, opening the list if needed
            $_ .= '</li>';
            $_ = '<ul>' . $_ unless $inlist;
            $inlist = 1;
        }
        elsif ($inlist) {
            # First non-bullet line after a list: close the list
            $_ = '</ul>' . $_;
            $inlist = 0;
        }
        $have .= $_;
    }
    $have .= '</p>';

    is $have, $want;

🦛
Re^5: Plain Text To HTML by Milti (Beadle) on Sep 19, 2024 at 16:12 UTC
Even here the text is not displayed as entered. The information in the copied Word document was in the form of a statement, a bulleted list, and another statement. Here it is shown as two sentences.
Re^6: Plain Text To HTML by Maelstrom (Beadle) on Sep 21, 2024 at 07:57 UTC