in reply to Plain Text To HTML

Microsoft Word is one of the banes of my existence. People send me Word docs all the time, and it frustrates me to no end, because on my Mac (my primary work machine) I refuse to buy, once again, a license just to properly read and write their proprietary format.

My suggestion, since you asked? Ask Microsoft to standardize on a globally recognized format. Otherwise, write a CPAN distribution that handles their format, and keep updating it whenever, at the whim of Microsoft, that proprietary format changes. My other suggestion? Demand that people not send you files in a Microsoft Word format.

Re^2: Plain Text To HTML
by marto (Cardinal) on Sep 23, 2024 at 08:16 UTC

    The Office_Open_XML format has been the standard since 2007, and you don't need proprietary software to work with it.

      Plus there are various open source readers like LibreOffice available which don't require a licence.

      Alternatively, there are web applications like Google Docs.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      The Office_Open_XML format has been the standard since 2007, and you don't need proprietary software to work with it.

      There is a lot to unpack in that assertion, and it's all the harder now that Groklaw.net is offline. You don't need proprietary software to work with OOXML, but then OOXML is not .docx, and part of the question here is about MS Word documents, which default to the proprietary .docx series.

      Yes, OOXML, aka ISO/IEC 29500, is one format standard. It was whipped up in great haste to compete with the actual universal format, OpenDocument Format, aka ISO/IEC 26300. Both are technically open standards, but while OOXML weighs in at well over 6,000 pages, it is incompletely documented, and no one, not even Microsoft, implements it or even can implement it. In contrast, ODF is fully documented and fully implemented in Calligra, LibreOffice, and several others. ODF is already partially implemented in MSO, but that work appears to have stalled as MS has gone back to proprietary formats like the .docx series. Also, OOXML suffers from a tremendous amount of NIH, while ODF reuses many existing standards for components.

      As for the original question, converting from Markdown to HTML would be one option, as mentioned by nerdvana and Anonymous Monk. Markdown is rather close to plain text with minimal structure, and it is easy to convert between Markdown and HTML using Perl. However, structure is the key: one can go from more to less, but one cannot automatically produce more detail from less detail.
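To make the Markdown-to-HTML point concrete, here is a minimal sketch using only core Perl. It is not a real Markdown implementation: the helper name tiny_markdown_to_html is hypothetical, and it assumes a tiny subset (ATX headings, blank-line-separated paragraphs, *emphasis* and **bold**). For real documents a CPAN module such as Text::Markdown would be the more sensible choice.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Inline markup: **bold** is handled first, then *emphasis*.
sub inline_to_html {
    my ($text) = @_;
    $text = escape_html($text);
    $text =~ s{\*\*(.+?)\*\*}{<strong>$1</strong>}g;
    $text =~ s{\*(.+?)\*}{<em>$1</em>}g;
    return $text;
}

# Hypothetical helper (an assumption for illustration): understands only
# single-line ATX headings ("# Title") and blank-line-separated paragraphs.
sub tiny_markdown_to_html {
    my ($markdown) = @_;
    my @html;
    for my $block (split /\n{2,}/, $markdown) {
        $block =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next unless length $block;
        if ($block =~ /^(#{1,6})\s+(.*)$/) {
            my $level = length $1;         # number of '#' gives the heading level
            push @html, "<h$level>" . inline_to_html($2) . "</h$level>";
        }
        else {
            (my $joined = $block) =~ s/\s*\n\s*/ /g;   # unwrap hard line breaks
            push @html, '<p>' . inline_to_html($joined) . '</p>';
        }
    }
    return join "\n", @html;
}

print tiny_markdown_to_html("# Report\n\nPlain *text* with **structure**.\n"), "\n";
```

Note that input without any markers comes out as nothing but paragraphs. The converter can only carry over structure that is already marked in the source, which is the "more to less" limitation in practice.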

      Milti, could you please explain more about the task?