Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need something to check whether my Word document on an NT machine is 54 lines of type or 6000 characters. I'm not sure how to get this to work. The attempt below gives the line count including blank lines, which is not what I want. I need it to count only the lines with type on them, and to count the number of characters as well.
$count = 0;
$file = "bas.txt";
open(FILE, "<$file") or die "Cannot open $file: $!";
$count++ while <FILE>;
print "$count\n";

Re: Number of lines and characters
by CubicSpline (Friar) on Aug 28, 2002 at 15:19 UTC
    Before you try this out, you may want to look at what a Word .doc file actually looks like in a plain old text viewer. There's a lot of junk in there you don't want to count.

    I'd suggest saving your .doc file as text first, if you really want to do this through Perl.

    Once you've got just the text you want to count, you can do what you want in several ways. Here are my contributions:

    open FILE, "<$file" or die "Cannot open $file: $!";
    my @lines = <FILE>;
    print "Lines: " . scalar(@lines) . "\n";
    print "Chars: " . length(join '', @lines) . "\n";
    close FILE;
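
    The snippet above still counts blank lines and line endings. A minimal sketch of the "lines with type only" variant follows, assuming the document has already been saved as plain text. The sub name count_text and the script name count.pl are made up for illustration, and whether spaces should count toward the 6000-character limit is a guess (here they do, line endings do not).

```perl
use strict;
use warnings;

# Count lines that contain at least one non-whitespace character,
# plus the total number of characters excluding line endings.
sub count_text {
    my ($fh) = @_;
    my ($lines, $chars) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;        # drop the LF or CRLF line ending
        $chars += length $line;      # spaces inside the line still count
        $lines++ if $line =~ /\S/;   # skip blank and whitespace-only lines
    }
    return ($lines, $chars);
}

# Hypothetical usage: perl count.pl bas.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my ($lines, $chars) = count_text($fh);
    close $fh;
    print "Lines with text: $lines\n";
    print "Characters:      $chars\n";
}
```

    Keeping the counting in a sub that takes a filehandle makes it easy to try on a string or on STDIN before pointing it at a real file.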

    ~CubicSpline
    "No one tosses a Dwarf!"

Re: Number of lines and characters
by krujos (Curate) on Aug 28, 2002 at 14:27 UTC
    If you only have to do this with one file, the counter in Word is pretty good: Tools -> Word Count. Otherwise, assuming you have already opened the Word document so that Perl can read it (using one of the Windows modules), you could do something like:
    while (<FILE>) { if (/\w/) { $count++; } }

Re: Number of lines and characters
by VSarkiss (Monsignor) on Aug 28, 2002 at 19:39 UTC

    That code is going to count the number of newline characters in your file, which in the case of a Word doc is next to meaningless.

    To do this right, you'll have to open the file with Word itself (using Win32::OLE), and then either use the Document properties, or walk through the document a paragraph or word at a time, using one of the iterators over the document. You can get an estimate from that, but not a real "line count" -- that depends on how the document's rendered by the printer, not the document itself. Hmm, you may be able to use the Print Preview.... /me slaps self

    I've used the Win32 modules for stuff like this, but never with Word, so I can't vouch for how well this will work. But in general you'll need a decent understanding of the Word object model (start the Visual Basic Editor and bring up the Object Browser).
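
    For what it's worth, here is an untested sketch of the "Document properties" route, assuming Windows, an installed copy of Word, and the Win32::OLE module. ComputeStatistics and the WdStatistic values are taken from the Word object model. The sub name word_stats is made up for illustration, and it is not clear whether Word's line count skips blank lines, so treat the result as an estimate.

```perl
use strict;
use warnings;
use File::Spec;

# WdStatistic values from the Word object model.
use constant {
    wdStatisticLines                => 1,
    wdStatisticCharacters           => 3,   # excludes spaces
    wdStatisticCharactersWithSpaces => 5,
};

# Ask Word itself for its line and character counts.
sub word_stats {
    my ($path) = @_;        # should be an absolute path
    require Win32::OLE;     # loaded lazily so the file compiles anywhere
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError();
    my $doc = $word->Documents->Open($path)
        or die "Cannot open $path: " . Win32::OLE->LastError();
    my %stats = (
        lines             => $doc->ComputeStatistics(wdStatisticLines),
        chars             => $doc->ComputeStatistics(wdStatisticCharacters),
        chars_with_spaces => $doc->ComputeStatistics(wdStatisticCharactersWithSpaces),
    );
    $doc->Close(0);         # 0 = wdDoNotSaveChanges
    return %stats;
}

# Hypothetical usage: perl wordstats.pl report.doc
if (@ARGV) {
    my %stats = word_stats(File::Spec->rel2abs($ARGV[0]));
    print "$_: $stats{$_}\n" for sort keys %stats;
}
```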

    And, yes, it's very tedious programming....

Re: Number of lines and characters
by Anonymous Monk on Aug 28, 2002 at 14:12 UTC