
I've tried the Office HTML approach (with Excel, not Word), and it works great, but there are a few limitations. One is that unless you learn MSFT's bizarre XML-ish syntax, you can't use many of the features of these applications (then again, maybe that's a good thing :). Second, though this may not apply to your case, it's hard to tell Office what the types of your data are, which can affect how your values are formatted and interpreted. Third, for especially large documents, Office takes longer to process HTML than its native formats. If none of these apply to your situation (and they may not), then HTML is my suggestion too.

Re: Re: Re: Re: parse MS Word Template fields for legal documents
by dimar (Curate) on May 14, 2004 at 20:52 UTC

    Good point. The limitations Errto mentions can indeed be a major pain in the *fill-the-blank*. So here is a quick 'step-by-step' guide that sidesteps the first of them (learning the markup) and may save you a lot of wasted time.

    STEP: Open the 'form letter' MSFT WORD document that contains the blanks (e.g., ClientIntakeFormFoo.doc)

    STEP: Use MSFT WORD to fill in the document with obviously bogus data (e.g. FAKE_FIRSTNAME, FAKE_LASTNAME, FAKE_FOO, FAKE_BAR)

    STEP: Save the filled-in document in MSFT HTML format as ClientIntakeFormFoo.htm

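    STEP: Search through the file you just saved for every instance of m/FAKE_\w+/ (use \w+ rather than [^\s]+ so the match stops at the end of the placeholder instead of running on into adjacent markup, as in FAKE_NAME</span>)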
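If you'd rather not eyeball the file, here is a small sketch that lists every distinct placeholder in the saved HTML. It assumes all your placeholders start with FAKE_ and consist only of word characters; the script name find_placeholders.pl is made up. If a placeholder you typed is missing from the list, Word may have broken it up with markup, so check that spot by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct FAKE_* placeholders in a chunk of Word-generated HTML,
# sorted so the list is easy to check against what you typed into Word.
sub find_placeholders {
    my ($html) = @_;
    my %seen;
    $seen{$1}++ while $html =~ /(FAKE_\w+)/g;
    return sort keys %seen;
}

# Usage (hypothetical script name): perl find_placeholders.pl ClientIntakeFormFoo.htm
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for find_placeholders($html);
}
```

Each placeholder is printed once, even if it appears several times in the document.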

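    STEP: Replace each placeholder you found in the previous step with a 'quotelike escape' (e.g., dear ^.$NAME.q^, we are gonna sue you if you don't pay ^.$AMOUNT.q^ .). Each escape closes the current string with ^, concatenates the variable, and reopens a new string with q^.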

    STEP: Enclose the entire HTML file in an 'outer quotelike': $sOutput = q^ DOCUMENT GOES HERE ^; (check first that the HTML contains no literal ^ characters, since one would end the string early)
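The replace-and-enclose steps can also be done by a script instead of by hand. This is a minimal sketch, not the only way to do it: instead of separate scalars like $NAME, it assumes the values live in a hash %f, so FAKE_NAME becomes ^.$f{NAME}.q^ (the hash name, the FAKE_XYZ to XYZ key mapping and the script name make_template.pl are all my assumptions). It refuses HTML that already contains a ^, and doubles backslashes so they stay literal inside q^...^.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert Word HTML containing FAKE_* placeholders into Perl source for one
# non-interpolating q^...^ expression. FAKE_XYZ becomes ^.$f{XYZ}.q^ :
# close the string, concatenate $f{XYZ} (hypothetical hash), reopen the string.
sub html_to_quotelike {
    my ($html) = @_;

    # A literal ^ would end the q^...^ string early, so refuse rather than guess.
    die "HTML contains '^'; choose a different q delimiter\n" if $html =~ /\^/;

    # Inside q^...^ a backslash can escape the delimiter or another backslash,
    # so double every backslash to keep it literal.
    (my $body = $html) =~ s/\\/\\\\/g;

    # Splice each placeholder out of the string and into a hash lookup.
    $body =~ s/FAKE_(\w+)/^.\$f{$1}.q^/g;

    return "q^$body^";
}

# Usage (hypothetical script name): perl make_template.pl ClientIntakeFormFoo.htm
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print '$sOutput = ', html_to_quotelike($html), ";\n";
}
```

The printed assignment expects a hash named %f to be in scope wherever it ends up being evaluated.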

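    STEP: Save the whole thing as a Perl module that you can use from your Perl scripts (with the assignment inside a subroutine, so the values are spliced in each time you call it rather than once when the module loads), and you are basically done.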
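Here is roughly what such a module might look like, with the q^...^ expression returned from a sub so the values are filled in on every call. ClientIntakeFormFoo, render and the %f field hash are hypothetical names, and the HTML is cut down to one line. To keep the sketch runnable as a single file, the module and the caller share one file, separated by package statements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --- ClientIntakeFormFoo.pm (normally its own file) ---
package ClientIntakeFormFoo;

# Field values arrive as name => value pairs and are spliced into the
# otherwise literal HTML each time render() is called.
sub render {
    my (%f) = @_;
    # Stand-in for the full Word HTML; only the quotelike escapes matter here.
    return q^<html><body><p>Dear ^.$f{NAME}.q^, we are gonna sue you if you don't pay ^.$f{AMOUNT}.q^ .</p></body></html>^;
}

1;    # a module file must end with a true value

# --- caller script ---
package main;

my $sOutput = ClientIntakeFormFoo::render(NAME => 'John Doe', AMOUNT => '$1,200');
print $sOutput, "\n";
```

In real use you would put the package in ClientIntakeFormFoo.pm somewhere in @INC (or add its directory with use lib) and call it after use ClientIntakeFormFoo;.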

    Beware of all the limitations that Errto and others have mentioned, but this solution should work well because it saves you from having to learn the ugly and complicated MSFT markup: all you have to do is fill in your easily found 'blanks' and ignore the rest. Be sure to enclose your document in the non-interpolating q quotelike (not double quotes or qq), so that Perl does not accidentally interpolate anything inside your file, such as a $ or @, other than the 'quotelike escapes' that you supplied.
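A quick way to see why the plain q matters: Word's HTML typically includes CSS rules such as @page and @font-face, and a legal letter may well mention amounts like $500. This small sketch (the %f hash is just for illustration) shows that with q^...^ those stay literal and only the spliced-in value changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %f = (NAME => 'John Doe');

# Inside q^...^ the @page rule and the $500 are plain text; only the value
# between ^. and .q^ is inserted.
my $html = q^<style>@page Section1 {margin:1.0in}</style><p>Dear ^.$f{NAME}.q^, you owe $500.</p>^;

# With qq^...^ instead, Perl would try to interpolate @page and $500 as
# variables, and under use strict the undeclared @page is a compile error.
print $html, "\n";
```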