mdavies23 has asked for the wisdom of the Perl Monks concerning the following question:

I have been able to tokenize an RTF document and then print it to another RTF document. My question is whether it is possible to keep the original formatting from the first document (font, font color, background color). Some things in the document are colored seemingly at random, so keeping the formatting is important.

Here is the tokenizer code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use RTF::Writer;
    use Data::Dumper;
    use RTF::Tokenizer;

    die "usage: $0 input output\n" unless @ARGV == 2;
    my $infile  = shift;
    my $outfile = shift;

    my $tokenizer = RTF::Tokenizer->new();
    $tokenizer->read_file($infile);

    my ( $token_type, $argument, $parameter );
    {
        # reduce bogus warnings
        no warnings 'uninitialized';

        # get past the header
        ( $token_type, $argument, $parameter ) = $tokenizer->get_token()
            until ( $token_type eq 'control' and $argument eq 'par' );
    }

    my @final;
    while ( $token_type ne 'eof' ) {
        ( $token_type, $argument, $parameter ) = $tokenizer->get_token();
        push @final, $argument if $token_type eq 'text';
    }

    my $rtf = RTF::Writer->new_to_file($outfile);
    my @sorted = sort {
        my @fields_a = split / /, $a;
        my @fields_b = split / /, $b;
        chomp( $a, $b );
        $fields_a[0] cmp $fields_b[0];
    } @final;

    $rtf->prolog;
    $rtf->print( \@sorted );
    $rtf->close;

Re: Keeping RTF formatting when moving files
by Corion (Patriarch) on Jul 13, 2017 at 13:35 UTC

    Currently, you throw away all the tokens you read in.

    Instead of throwing them away, write them to your output stream.

      Which tokens are for the formatting?

        How would I know?

        In your code, here:

            {
                # reduce bogus warnings
                no warnings 'uninitialized';

                # get past the header
                ( $token_type, $argument, $parameter ) = $tokenizer->get_token()
                    until ( $token_type eq 'control' and $argument eq 'par' );
            }

        you call ->get_token(), which reads a set of tokens and throws them away.

        Maybe consider changing your code to:

            my @tokens;
            {
                # reduce bogus warnings
                no warnings 'uninitialized';

                do {
                    # get past the header
                    ( $token_type, $argument, $parameter ) = $tokenizer->get_token();
                    push @tokens, [ $token_type, $argument, $parameter ];
                } until ( $token_type eq 'control' and $argument eq 'par' );
            }

        and then, when writing your output document, first output all the saved stuff from @tokens.
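Writing the saved tokens back out could look roughly like the sketch below. It assumes the token shapes described in the RTF::Tokenizer documentation: `text` tokens carry the text in the argument, `group` tokens use argument 1 for an opening brace and 0 for a closing one, and `control` tokens carry the control word plus an optional parameter. The helper name `token_to_rtf` and the sample token list are made up for illustration, and `\bin` data is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Tokenizer): turn one saved
# [type, argument, parameter] triple back into RTF source text.
sub token_to_rtf {
    my ( $type, $argument, $parameter ) = @{ $_[0] };
    $parameter = '' unless defined $parameter;

    return $argument if $type eq 'text';
    return $argument ? '{' : '}' if $type eq 'group';
    return '' if $type ne 'control';    # 'eof' or anything unexpected

    # Control symbols (\{ \} \\ \~ \'hh ...) take no delimiter after them.
    return "\\$argument$parameter" if $argument !~ /^[A-Za-z]+$/;

    # Control words are followed by one space, which RTF readers treat
    # as a delimiter and do not render.
    return "\\$argument$parameter ";
}

# Assumed sample of saved tokens, in the same shape as @tokens above.
my @saved = (
    [ 'group',   1,     '' ],
    [ 'control', 'rtf', 1 ],
    [ 'control', 'cf',  2 ],
    [ 'text',    'red', '' ],
    [ 'control', "'",   'e9' ],
    [ 'group',   0,     '' ],
    [ 'eof',     1,     0 ],
);

my $rtf_source = join '', map { token_to_rtf($_) } @saved;
print "$rtf_source\n";    # prints {\rtf1 \cf2 red\'e9}
```

The resulting string is raw RTF, so it would need to go to the output file as-is (for example through an ordinary filehandle) rather than being escaped again as plain text.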

        I don't know RTF, I don't know your input document, and I don't know what output you want, so I can't give much more concrete help.

        While you cannot help me with the first issue, you can help with the second and third by posting a short example of your RTF input and the RTF output you want, especially a part that contains the randomly coloured text or whatever else you want to keep. With that, I guess I could fake knowing RTF well enough to keep the formatting.