in reply to fix ODT files with line breaks looking poor
I know nothing about ODT, and the example haukex posted fails to open, so I can't test whether I broke anything. Still, here's a possible solution using XML::Rules.
```perl
use strict;
use XML::Rules;

my $filter = XML::Rules->new(
    style      => 'filter',
    namespaces => {
        'urn:oasis:names:tc:opendocument:xmlns:text:1.0'   => 'text',
        'urn:oasis:names:tc:opendocument:xmlns:office:1.0' => 'office',
    },
    rules => {
        # we do not care what's inside the tags,
        # we just want to preserve everything
        _default => 'raw',

        # this doesn't seem to do anything, but it's necessary.
        # The filter mode sends everything outside tags
        # with special rules directly to output
        'text:p' => sub { return $_[0] => $_[1] },

        'text:line-break' => sub {
            my ($tag, $attrs, $parents, $parentAttrs, $parser) = @_;

            # find the <text:p> tag enclosing this one
            my $idx = $#$parents;
            $idx-- while ($idx >= 0 && $parents->[$idx] ne 'text:p');

            # line break outside paragraph, leave alone
            return $tag => $attrs if ($parents->[$idx] ne 'text:p');

            my $level = $#$parents - $idx + 1;

            # output the <text:p> and everything inside we read so far
            print { $parser->{FH} } $parser->parentsToXML($level);
            # close the opened tags all the way to the <text:p>
            print { $parser->{FH} } $parser->closeParentsToXML($level);
            print { $parser->{FH} } "\n";

            # remove the printed content; leaves the attributes intact
            foreach my $i ($idx .. $#$parents) {
                delete $parentAttrs->[$i]->{_content};
            }

            return;    # remove the <text:line-break/>
        },
    },
);

$filter->filter(\*DATA, \*STDOUT);

__DATA__
<?xml version="1.0"?>
<office:document-content office:version="1.2"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<office:body><office:text>
<text:p text:style-name="P1">
Fo<text:span text:style-name="T1">o<text:line-break/>
B</text:span><text:span text:style-name="T3">a</text:span>
<text:span text:style-name="T5">r<text:line-break/></text:span>
</text:p>
</office:text></office:body>
</office:document-content>
```
The code will work correctly (provided I understood the requirements right) no matter how many tags are open within the <text:p>.
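The reason the nesting depth doesn't matter is the walk up `@$parents` in the `text:line-break` rule: it finds the nearest enclosing `<text:p>` and counts how many open tags sit between it and the line break, and that count is what gets passed as `$level`. Here's that computation pulled out into a standalone sketch so it can be tried without an ODT file or XML::Rules. `levels_to_close` is a hypothetical helper name, not part of XML::Rules, and the tag lists are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Rules: given the names of the
# currently open tags (outermost first, the same order the rule's
# @$parents uses), return how many of them must be closed to get out of
# the nearest enclosing <text:p>, or undef if there is none.
sub levels_to_close {
    my @parents = @_;
    my $idx = $#parents;
    # walk from the innermost open tag outwards until we hit a <text:p>
    $idx-- while $idx >= 0 && $parents[$idx] ne 'text:p';
    return undef if $idx < 0;       # line break outside any paragraph
    return $#parents - $idx + 1;    # the <text:p> plus everything inside it
}

my @outer = qw(office:document-content office:body office:text);
print levels_to_close(@outer, 'text:p'), "\n";                        # 1
print levels_to_close(@outer, 'text:p', 'text:span'), "\n";           # 2
print levels_to_close(@outer, 'text:p', 'text:span', 'text:a'), "\n"; # 3

my $none = levels_to_close(@outer);
print defined $none ? "found\n" : "none\n";                           # none
```

Each extra tag open inside the paragraph just adds one to the count, so the same rule handles a bare `<text:p>` and a deeply nested span alike.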
Jenda
Enoch was right!
Enjoy the last years of Rome.
Re^2: fix ODT files with line breaks looking poor
by haukex (Archbishop) on Apr 17, 2019 at 19:48 UTC
by Jenda (Abbot) on Apr 18, 2019 at 13:11 UTC