Hello exalted monks,

I have a problem that I cannot seem to find an answer to, either within the monastery or 'out there in Google-land'.

I have been looking at converting Word documents to PDF format using Perl. I soon realised that I had hit my programming and time limits, so I thought of converting the Word .doc to .rtf first. Then I could use a couple of modules available from CPAN: one to convert from RTF to HTML (RTF::HTML::Converter), and another to take the resulting HTML file and convert it to PDF (PDF::FromHTML). Kind of like putting a cow through two black boxes and getting barbecue steak at the end...

The only problem is that the first black box (RTF::HTML::Converter) seems to hang the server and return nothing unless the .rtf file is very simple. My code that attempts the conversion follows; it uses the modules in a way that closely resembles the usage shown in their respective documentation:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw(fatalsToBrowser);
    my $q = CGI->new;

    use RTF::HTML::Converter;
    use PDF::FromHTML;

    print $q->header;
    print $q->start_html;

    my $base_directory = '.';
    my $base_filename  = 'text_only1';
    my $rtf_file  = "$base_directory/$base_filename" . '.rtf';
    my $html_file = "$base_directory/$base_filename" . '.html';
    my $pdf_file  = "$base_directory/$base_filename" . '.pdf';

    open (RTF_FILE, "< $rtf_file") || die "Couldn't open RTF file: $!";
    open (HTML_FILE, "> $html_file") || die "Couldn't open HTML file: $!";

    # Convert the rtf file to HTML format
    my $file = RTF::HTML::Converter->new(output => \*HTML_FILE);
    $file->parse_stream( \*RTF_FILE ) || die "Error converting RTF to HTML: $!";

    close RTF_FILE;
    close HTML_FILE;
    print "Converted RTF to HTML.<br />\n";

    # Convert the HTML file to PDF format
    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file($html_file);
    $pdf->convert(
        Font       => '/path/to/font.ttf',
        LineHeight => 10,
        Landscape  => 0,
    );
    $pdf->write_file($pdf_file);
    print "Converted HTML to PDF.<br />\n";

    print $q->end_html;

Have any monks here experienced this behaviour with that module, or even walked a different path to start with RTF and arrive at HTML (or better yet, PDF)?

Any help would be greatly appreciated.

mlh2003


In reply to Converting RTF documents to PDF format by mlh2003
