in reply to Converting doc to txt without WIN32::OLE

I did some work on Text::Extract::Word last September. See: Re: Problem in Text::Extract::Word.

I put together two very simple scripts for you that read the file contents into a string. For test documents, I used the .doc files from the t directory of Text::Extract::Word:
#!/usr/bin/perl -l
# single file
use strict qw/refs/;
use warnings FATAL => 'all';
use Text::Extract::Word;

binmode STDOUT, ':encoding(UTF-8)';

my $file      = '/root/Desktop/xls/test1.doc';
my $extractor = Text::Extract::Word->new($file);

# get_text returns the document text as one string
my $string = $extractor->get_text;
print "$string";
close STDOUT;

#!/usr/bin/perl
# multiple files
BEGIN { $| = 1; }    # unbuffered output
use autodie;
use strict qw/refs/;
use Text::Extract::Word;
use warnings FATAL => 'all';

binmode STDOUT, ':encoding(UTF-8)';

my(@data) = qw(
    test1.doc test2.doc test3.doc
    test4.doc test5.doc test6.doc
);

# extract and print the text of each file in turn
foreach my $data(@data) {
    my $file = Text::Extract::Word->new($data);
    my $str  = $file->get_text;
    print "$str===>File done<===\n\n";
    sleep 2;
}
close STDOUT;

Re^2: Converting doc to txt without WIN32::OLE
by mrguy123 (Hermit) on Jun 21, 2012 at 07:56 UTC
    Hi, thanks for the scripts.
    The problem is that when I run them I get:
    Can't locate object method "new" via package "Text::Extract::Word" at monks.pl line 11.
    I guess that something went wrong with the installation, or that the module isn't stable. I will try to fix it. The legacy interface works, but not very well (some docs just aren't parsed):
    # legacy interface
    use Text::Extract::Word qw(get_all_text);
    my $text = get_all_text("test1.doc");
Re^2: Converting doc to txt without WIN32::OLE
by mrguy123 (Hermit) on Jun 21, 2012 at 08:17 UTC
    OK, so it seems I am working with the legacy code for some reason. I am now installing the new version...hope it works better
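
For anyone hitting the same "Can't locate object method" error, here is a minimal sketch of one way to check which copy of the module perl actually loads and whether it provides the object interface. The helper describe_module is hypothetical (it is not part of Text::Extract::Word), and the idea that an older install lacks new() is only inferred from the error message above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Extract::Word): report what perl
# actually loads for a given module name.
sub describe_module {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm

    my %info = ( module => $module, loaded => 0 );
    return \%info unless eval { require $path; 1 };

    $info{loaded}  = 1;
    $info{file}    = $INC{$path};                      # file perl picked up
    $info{version} = $module->VERSION // 'unknown';    # $VERSION, if set
    $info{has_new} = $module->can('new') ? 1 : 0;      # OO interface present?
    return \%info;
}

my $info = describe_module('Text::Extract::Word');
if ( $info->{loaded} ) {
    print "Loaded from: $info->{file}\n";
    print "Version:     $info->{version}\n";
    print "Has new():   ", ( $info->{has_new} ? 'yes' : 'no' ), "\n";
}
else {
    print "Text::Extract::Word was not found in \@INC\n";
}
```

If "Has new()" comes back "no", or the file path points somewhere unexpected, an older copy is probably shadowing the new install, and the scripts above will only work once the newer copy is the one found first in @INC.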