Re: Design Question
by saintbrie (Scribe) on May 13, 2004 at 20:13 UTC
|
I'd suggest using HTML for the interface. You've already got the skills and it is fairly easy to build an attractive look and feel for the thing. Going with HTML has the added bonus that the programs would be available from anywhere there is internet access. (Do I need to discuss the virtues of the client/server development model?)
You might try Win32::OLE. You'll need a working copy of MS Word on the computer you are programming for. I don't think there is a module that specifically parses templates, so you'll probably end up rolling your own.
Since they already have their stuff in Word, Win32::OLE may be your best bet. You can probably do the same (or similar) with PDF: generate PostScript and send it to ps2pdf or equivalent. Heck, you might even be able to make templates out of PDF files. I know there's text in some (though not all) PDF files. Perhaps a monk higher up the food chain would care to comment?
| [reply] |
Re: parse MS Word Template fields for legal documents
by serf (Chaplain) on May 14, 2004 at 11:58 UTC
|
I am in the process of generating Word documents using Perl as well. I didn't think about Office HTML, but about 5 years ago I made some RTF documents from a shell script, so I started from scratch and tried that again from Perl, which has turned out to be very easy...
I reverse engineered an RTF document by saving a basic one, then stripping out tags using a text editor till I had the bare bones of what I needed and nothing else. (and it would still open without crashing Word!) This is what I found works as a basis:
http://ref.a32.net/technical/file_format/rtf/basic_table_rtf_source.txt
Although I expect it's more flexible and robust to use horrible Office HTML as people have pointed out, if you can handle that.
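For what it's worth, a bare-bones RTF file of the kind described above can be produced with nothing but core Perl. This is only a rough sketch, not the contents of the linked file: the sub names, the font and the sample text are made up for illustration, and it only handles plain ASCII text (anything above 127 would need RTF's own escapes).

```perl
use strict;
use warnings;

# Escape the three characters RTF treats specially: backslash and both braces.
sub rtf_escape {
    my ($text) = @_;
    $text =~ s/([\\{}])/\\$1/g;
    return $text;
}

# Build a minimal RTF document: header, one font, then one \par per paragraph.
# (Hypothetical helper; font and size are arbitrary choices.)
sub make_rtf {
    my (@paragraphs) = @_;
    my $body = join '', map { rtf_escape($_) . "\\par\n" } @paragraphs;
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
         . "\\f0\\fs24 " . $body . "}";
}

# Sample text is invented; redirect the output to a .rtf file to open it.
print make_rtf('Dear Client,', 'Please sign {both} copies.'), "\n";
```

Run it as `perl make_rtf.pl > letter.rtf` (file names are just examples) and open the result in Word to check it.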
You say you have "Word Template fields" - from my memories of my days of working with that stuff - you use that for doing a mail-merge with a master (now called "main"?) document right?
If that's the case then all you should have to do is export to some database-like format (possibly Excel compatible HTML, or even just CSV - TOO easy!) and then do a mail merge... pulling the data from that document/file into Word...
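Writing the CSV side of that from Perl might look something like the sketch below. The column names and the record are invented, and I'm assuming the header row is what Word offers as merge field names, so check that against your own main document.

```perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes, doubling any embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Turn column names plus a list of records (hash refs) into CSV text,
# header row first, with CRLF line endings.
sub csv_text {
    my ($columns, $records) = @_;
    my @lines = (join ',', map { csv_field($_) } @$columns);
    for my $rec (@$records) {
        push @lines, join ',', map { csv_field($rec->{$_}) } @$columns;
    }
    return join("\r\n", @lines) . "\r\n";
}

# Hypothetical field names and data, purely for illustration.
my @columns = qw(ClientName CaseNumber CourtDate);
my @records = (
    { ClientName => 'Smith, John', CaseNumber => '04-123', CourtDate => 'June 1, 2004' },
);

binmode STDOUT;    # keep the CRLF line endings exactly as written
print csv_text(\@columns, \@records);
```

Quoting every field is overkill, but it means commas and quotes inside names or addresses can't break the columns.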
I've just looked in the help in Word where it says:
"What types of data sources can I use?
You can use just about any type of data source that you want, including a Word table, Microsoft Outlook contact list, Excel worksheet, Microsoft Access database, or ASCII text file. If you haven't already stored information in a data source, Word guides you step by step through setting up a Word table that contains your names, addresses, and other data."
Try this Office help link in I.E. if you're using Word 2000:
mk:@MSITStore:C:\Program%20Files\Microsoft%20Office\Office\1033\wdmain9.chm::/html/wdconOverviewOfMailMerge.htm
Or just crank up Macro$haft Wurd Help and type in "mail merge" :o)
IHTH...
| [reply] |
Re: parse MS Word Template fields for legal documents
by dragonchild (Archbishop) on May 13, 2004 at 19:55 UTC
|
Use VB, not Perl. It sounds like that will make your life a lot easier as it has better integration into the Windows world. VBScript is quite powerful, at least for what you need.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
| [reply] |
Although there are reasons why you might want to stay away from perl for this type of application, you actually may have a very good case (no pun intended) for using perl, even with MSFT Word. Any experienced application developer would suggest you use a database for this type of thing, but you already said you don't have one. Therefore, although it's not necessarily the obvious choice, perl may actually be a very good fit for you (assuming you are competent with it).
Consider these facts:
TEMPLATING: Perl is probably the best package for developing an easily maintained, well designed templating solution. If you do not 'over engineer' it, you can produce good stuff that works quickly. Moreover, you have *much* better string manipulation and delimiting capabilities than with VB (string manipulation and quoting is one of the biggest annoyances with VB) which plays into any 'fill in the blank' templating system.
GUI INTERFACE: A very easy 'front end' on top of perl will almost certainly be a design requirement in order to make the law office happy. They should not have to know that perl is at the 'guts' of your application. I would recommend using HTA, since it leverages your existing knowledge of HTML, as opposed to MSFT office and VBA. Unless you know VB and you don't mind being 'locked' into MSFT office, steer away from VB.
OFFICE HTML: Most people don't realize this, but you can use perl to easily spit out MSFT office documents by simply saving the documents as MSFT office HTML. This enables you to steer clear of the proprietary binary format while still maintaining the precise formatting that lawyers go nuts over. What this means is that you can build a data driven extensible application that does not require a backend database or any fancy conversion software to output MSFT compatible documents. Again, a backend database is good to have for this kind of thing, but not an absolute requirement.
REPURPOSE YOUR PERL CODE: This also means that you can *repurpose* your code to output *anything* that supports text (for example, a lawyer will love you when you tell them that your document 'fill in' solution can also be used to help track billable hours, and also send it to their timekeeping software, this will also win you brownie points for being a genius).
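To make the 'fill in the blank' idea concrete, here is a rough sketch of a template filler in plain perl. The `[% name %]` marker style, the field names and the HTML snippet are all invented for the example; a document actually saved from Word as HTML will be far noisier, and Word may split a marker across several tags, so type the markers in one go and check the saved file.

```perl
use strict;
use warnings;

# Escape the characters that have special meaning in HTML.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Replace every [% name %] marker with its escaped value.
# Dies on a marker that has no value, so a blank never slips through silently.
sub fill_template {
    my ($template, $values) = @_;
    $template =~ s{\[%\s*(\w+)\s*%\]}{
        exists $values->{$1}
            ? html_escape($values->{$1})
            : die "No value supplied for template field '$1'\n"
    }ge;
    return $template;
}

# Hypothetical template text and data, purely for illustration.
my $template = '<p>[% plaintiff %] v. [% defendant %], filed [% date %]</p>';
print fill_template($template, {
    plaintiff => 'Smith & Sons',
    defendant => 'Jones',
    date      => 'May 13, 2004',
}), "\n";
```

Dying on a missing field is a deliberate choice here: with legal documents an unfilled blank is probably worse than an error message.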
| [reply] |
I've tried the Office HTML approach (with Excel, not Word), and it works great, but there are a couple of limitations. One is that unless you learn MSFT's bizarre XML-ish syntax, you can't use many of the features of these applications (then again, maybe that's a good thing :). Second, though this may not apply to your case, it's hard to tell Office what the types of your data are, which can affect things. Third, for especially large documents, it takes Office longer to process HTML than it does its native formats. If none of these apply to your situation (and they may not), then HTML is my suggestion too.
| [reply] |
I dealt with a similar issue. It was legal documents in WA state with line numbers every 3rd line and horizontal and vertical bars at specific measurements and specific widths. I ran into problems when trying to save those as HTML files. The format for the legal documents was very important, and I had a really hard time making the HTML output correctly. Frankly, I never got it to work right.
So, I ended up using perl to write text files that contained data the user had typed into fields in a web form. When they hit submit IN INTERNET EXPLORER, perl would write a file and then spit out code to make the browser execute the MS Word document and I scripted the mail merge with the data in the file. It was a hackish solution that wouldn't work on a public web page, but it was fine for this two person office.
The other thing I explored was Adobe Acrobat. They have a scripted way to insert data into fields, and a perl interface already written. If they are willing to splurge on the cost of Acrobat and translate all their documents to PDFs, you could use PDF Forms (FDF?) and easily script them in perl.
| [reply] |
I should add that MSFT OFFICE HTML also allows you to steer clear of Win32::OLE. Win32::OLE is not a bad option, but it requires a copy of MSFT Word on the machine, and it adds an extra level of abstraction and indirection that may be difficult to debug. It is almost always preferable to simply output text, which can be opened in Word, a web browser, or even a competing office package like OO.
| [reply] |
Re: parse MS Word Template fields for legal documents
by NetWallah (Canon) on May 13, 2004 at 19:57 UTC
|
I would design this as a small database/web application.
Since you use MS Word, you presumably have MS Access as well, so it would be easy to create CLIENT and DOCTEMPLATE tables. The next step is to make web pages to update those. Access can create these too, but you may want to customize them beyond that.
Now you are ready to create a web page that allows the user to select a client, and a template, then supplies a "Print or save document" button.
Lots of programming choices here - you can use MS Word's merge capability, you can store the templates as BLOBs in the database ... have fun...
Offense, like beauty, is in the eye of the beholder, and a fantasy.
By guaranteeing freedom of expression, the First Amendment also guarantees offense.
| [reply] |
But Access (if used as a database in its own right, not just as a front end to a real database) is strictly one-user-at-a-time only. So this would not work in a multi-user set-up.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
I believe the "single-user" issue was true for Access prior to Office 2000. Since then, a single DB can be used by up to 10 users (I have seen 4).
In any case, I was suggesting ODBC access (via the web) to the database, although I did not explicitly state that.
Offense, like beauty, is in the eye of the beholder, and a fantasy.
By guaranteeing freedom of expression, the First Amendment also guarantees offense.
| [reply] |
Re: parse MS Word Template fields for legal documents
by CountZero (Bishop) on May 14, 2004 at 03:38 UTC
|
My first thought was also to do this in VB or VBA, but hey, this is the Perlmonks Monastery, not the VB Nunnery (with my apologies to the members of the religious fairer sex).
If the existing templates are already set up to use the fields through some form of "mail merging", then all you have to do is have your Perl script spit out a "mail merge database file", start MSWord through OLE, and have it process the mail merge database file.
If mail merge is not an option, you're out of luck: you probably do not want to translate all existing template files into the MSWord HTML format and then insert the right data at the appropriate places through some templating system. As these templates are prone to be changed at any odd time, you will have to redo this exercise again and again.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
Yeah, but then some call that 'job security' *shrug*.
| [reply] |
Re: parse MS Word Template fields for legal documents
by Anonymous Monk on May 13, 2004 at 21:07 UTC
|
Actually, everyone recently upgraded to Windows XP Professional and there is no MS Access to be found anywhere. hmmm...
Would you recommend redoing the templates into an HTML form as well?
Seems only practical to keep this local application as a small web site.
| [reply] |