in reply to fix ODT files with line breaks looking poor

melutovich:

I don't do a lot of XML processing, but if I had to do what you're talking about, I think I'd reach for XML::Twig. It will handle the XML parsing for you, and you can add handlers that fire on particular tags, inside which you can edit the XML. You might be able to handle your task by adding handlers for <text:p> and <text:line-break> to let you detect the line breaks and split the content into multiple paragraphs with the correct style.
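
Something along these lines might be a starting point. It's only a rough sketch, not tested against a real ODT: the sample XML and the split_paragraph name are made up for illustration, and it assumes every <text:line-break> is a direct child of its <text:p>, which may not hold in your documents.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample; a real content.xml declares many more namespaces and styles.
my $xml = <<'XML';
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:p text:style-name="P1">first line<text:line-break/>second line</text:p>
</office:text>
XML

my $twig = XML::Twig->new(
    # The handler runs once each <text:p> has been completely parsed.
    twig_handlers => { 'text:p' => \&split_paragraph },
);
$twig->parse($xml);
print $twig->sprint, "\n";

# Hypothetical helper: replace one paragraph with several, split at line breaks.
sub split_paragraph {
    my ($t, $para) = @_;

    # Leave paragraphs without a direct <text:line-break> child untouched.
    return unless grep { $_->tag eq 'text:line-break' } $para->children;

    # Each new paragraph copies the original's attributes (including its style).
    my $new_para = sub {
        XML::Twig::Elt->new('text:p' => { %{ $para->atts || {} } });
    };

    my $current = $new_para->();
    my @pieces  = ($current);

    # Detach all children, then redistribute them across the new paragraphs.
    for my $child ($para->cut_children) {
        if ($child->tag eq 'text:line-break') {
            $current = $new_para->();          # a break starts a new paragraph
            push @pieces, $current;
        }
        else {
            $child->paste(last_child => $current);
        }
    }

    # Put the pieces where the original paragraph was, then drop the original.
    $_->paste(before => $para) for @pieces;
    $para->delete;
}
```

If the line breaks turn up inside other elements, the handler gets the element itself as its second argument, so calling $elt->parent (or $elt->ancestors) on it should tell you where you are.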

Another option might be XML::XSLT, which lets you write transformation rules to alter the document. I've used XSLT in projects before, and it worked well. The difficulty I had with it is that it's essentially another language, and since I didn't use it often, each project with it was a learning experience. If you're going to do a lot of transformations, it may be worth your while to learn it.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: fix ODT files with line breaks looking poor
by melutovich (Acolyte) on Apr 07, 2019 at 15:20 UTC

    I'll give this some thought.

    As for your XML::Twig suggestion, unfortunately it would not be that simple: I discovered that <text:line-break> can occur inside various parent/enclosing tags. If I pursue this, I'll have to see whether XML::Twig can tell me the parent/enclosing tag from within the <text:line-break> handler...

    I see that there is a module, OpenOffice::OODoc, which uses XML::Twig; however, its last release is from 2010.

    I'll give XML::XSLT a quick look also.

    Thanks for the quick reply.