mahesh1532 has asked for the wisdom of the Perl Monks concerning the following question:

I want to access the data in a Word file in a Linux environment. In particular, I need to work with the table elements in the document. Can anyone help me?

Replies are listed 'Best First'.
Re: acessing the data from word(.doc) file in linux environment
by mwah (Hermit) on Sep 17, 2009 at 08:40 UTC
    word file in linux environment

    There seem to be a lot of options of very different complexity, depending on which Word format you are dealing with.

    If it's a plain old Word 2000-2003 file, you already know what your tables look like, and you need only some data from a few cells, you could simply run:

    $> abiword --to=rtf myworddocument.doc
    and then:
    $> perl extract-table-cells.pl myworddocument.rtf

    In the latter (extract-table-cells.pl), you would simply search for:

    [pseudo]
    ...
    # table content part already extracted to $tablecontent
    my @cells = $tablecontent =~ /\} ([^}]*) \}\\cell\{/xgs;
    ...

    which might give you the cells in @cells.
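
    A minimal sketch of what such an extract-table-cells.pl could do, assuming the RTF marks the end of each table cell with the \cell control word and the end of each row with \row (as the RTF format defines). The helper name rtf_table_cells and the sample string are made up for illustration. Real abiword output is more deeply nested and contains escapes this does not handle, so a CPAN tokenizer such as RTF::Tokenizer is probably the safer route for anything non-trivial.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of rows, each an array ref of cell texts.
sub rtf_table_cells {
    my ($rtf) = @_;
    my @rows;

    # \row closes a table row. The lookahead keeps longer control words
    # that merely start with the same letters from matching.
    for my $row_chunk (split /\\row(?![a-zA-Z])/, $rtf) {
        next unless $row_chunk =~ /\\cell(?![a-zA-Z])/;

        # \cell closes a cell; \cellx1000 (a cell boundary) is not a match.
        my @parts = split /\\cell(?![a-zA-Z])/, $row_chunk, -1;
        pop @parts;    # whatever follows the last \cell is not cell content

        my @cells;
        for my $part (@parts) {
            $part =~ s/\\[a-zA-Z]+-?\d* ?//g;    # drop control words such as \pard or \fs24
            $part =~ s/[{}]//g;                  # drop group braces
            $part =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
            push @cells, $part;
        }
        push @rows, \@cells;
    }
    return @rows;
}

# Hand-written sample with two rows of two cells (an assumption, not real abiword output).
my $sample = '{\rtf1\ansi \trowd\cellx1000\cellx2000 \pard\intbl Name\cell '
           . '\pard\intbl Qty\cell \row \trowd\cellx1000\cellx2000 '
           . '\pard\intbl Apple\cell \pard\intbl 3\cell \row }';

print join(' | ', @$_), "\n" for rtf_table_cells($sample);
```

    Splitting on the control words rather than matching braces avoids depending on exactly how the converter groups the cell text, at the price of ignoring hex escapes, Unicode escapes, and nested tables.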

    But it depends on your problem. What are the scale and purpose of your project?

    Regards

    mwa

Re: acessing the data from word(.doc) file in linux environment
by Anonymous Monk on Sep 17, 2009 at 07:34 UTC
Re: acessing the data from word(.doc) file in linux environment
by GrandFather (Saint) on Sep 17, 2009 at 08:34 UTC

    What type of Word document? There are many different versions of Word, and each version can generate a variety of document formats. Some formats (RTF, for example) are well documented and have some degree of CPAN support. The native Word formats, however, are much more troublesome to manipulate on a non-Windows system. If you have any control over how the Word documents are generated, ask for HTML or RTF.


    True laziness is hard work