If, like me, you have been looking for a long time for a way to convert Word documents (.doc, .rtf ...) into plain text, this piece of code will probably interest you. Admittedly it ties you to Windows, but it gives good results. All you need is Apache and Perl on Win32 with Word 97, 98 or 2000.
#!perl -w
#
# Convert an uploaded Word document (.doc, .rtf ...) to plain text.
# It depends on Windows, but it gives good results.
#
# Requirements: Apache / Perl on Win32 with Word 97, 98 or 2000.
#
use CGI;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

my $query = new CGI;
print $query->header;

my $filepath = $query->param('document');
my $html     = $query->param('html');
my ( $filename, $inputFileName );

if ( defined $filepath && $filepath ne '' ) {

    # Keep only the base name of the client-side path.
    if ( $filepath =~ /([^\/\\]+)$/ ) { $filename = $1; }
    else                              { $filename = $filepath; }
    $filename =~ s/\s+//g;
    $inputFileName = "c:\\doctotexte\\$filename";

    # Save the upload to disk.
    if ( !open( WFD, ">$inputFileName" ) ) {
        print "Error: could not save the uploaded file to disk";
        exit(0);
    }
    binmode WFD;    # Word files are binary; set this once, before writing.
    my ( $bytes_read, $buff );
    while ( $bytes_read = read( $filepath, $buff, 2096 ) ) {
        print WFD $buff;
    }
    close(WFD);

    my ($outputFileName) = "c:\\doctotexte\\$filename.doc";
    die("Can't find $inputFileName\n") if ( !-e $inputFileName );
    unlink($outputFileName);

    # Let Word open the file and save it again as text with line breaks.
    my ($word) = Win32::OLE->new( 'Word.Application', 'Quit' );
    my ($doc)  = $word->Documents->Open($inputFileName);
    $word->{DisplayAlerts} = 0;    # Stop msg box: Do you wish to save...?
    $word->{Visible}       = 1;    # Watch what happens.
    $doc->SaveAs( $outputFileName, wdFormatTextLineBreaks );
    $doc->Close();
    $word->Quit();

    # Success.
    open( FIC, $outputFileName );
    my $texte = join( '', <FIC> );
    close(FIC);

    if ( defined $html && $html eq 'on' ) { $texte =~ s/\n/<br>\n/g; }
    print $texte;

    unlink($inputFileName);
    unlink($outputFileName);
}
else {
    print 'No Attachment in document field';
}

# --------------------------------------------------------------------
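
The script builds the path on disk from a file name supplied by the browser, so it may be worth tightening that step. Below is a minimal sketch, not part of the original snippet: safe_upload_name is a hypothetical helper that keeps the same base-name and whitespace handling as the script, and additionally assumes that a conservative character whitelist is acceptable for your file names.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original post):
# reduce a client-supplied path to a bare file name that is safe to store.
sub safe_upload_name {
    my ($path) = @_;
    return unless defined $path && length $path;

    # Keep only what follows the last slash or backslash, as the script does.
    my ($name) = $path =~ /([^\/\\]+)$/;
    return unless defined $name;

    # Strip whitespace (as the script does), then replace every character
    # outside a conservative whitelist with an underscore.
    $name =~ s/\s+//g;
    $name =~ s/[^A-Za-z0-9._-]/_/g;

    # Reject names that are empty or made only of dots ("." and "..").
    return if $name =~ /^\.*$/;
    return $name;
}

print safe_upload_name('C:\\My Documents\\annual report.doc'), "\n";
```

In the script, the result would replace $filename before "c:\\doctotexte\\$filename" is built, and an undefined result would be treated like a missing attachment.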

In reply to WORD TO TEXT SIMPLY by iguane
