GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl app that translates HTML generated from Word documents into TWiki format files. At present the code works well with Word 97 generated HTML, but there are issues with tables for the HTML generated by later versions of Word.

My quandary is: should I publish what I have (probably to the Code section in the Monastery: comments?), or should I beat on it a while longer to sort out the table issues?

Publishing early means people get to use the code sooner, and maybe find a few bugs. Publishing later means the code will be in better shape.


Perl is Huffman encoded by design.

Re: Publish or Polish
by Tanktalus (Canon) on Jun 22, 2005 at 21:04 UTC

    Isn't the open-source motto "release early, release often"? I vote for that - you can offer what you have, others may submit patches, you can offer the combined efforts, more patches, more offerings, and eventually (hopefully not too quickly) it all stabilises into something hand-crafted by dozens of authors for dozens of inputs for your TWiki-format output.

    Just my two cents CDN.

Re: Publish or Polish
by polettix (Vicar) on Jun 22, 2005 at 20:57 UTC
    At present the code works well with Word 97 generated HTML
    This seems a good reason to publish it now. You have something working and useful, and if you clearly document the limitations it's perfectly reasonable to offer it to others. Moreover, you would probably cover a good percentage of the target audience - I suspect there are still plenty of Word 97 installations hanging around.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re: Publish or Polish
by ww (Archbishop) on Jun 22, 2005 at 22:09 UTC
    Semi-OT: Word-generated .html is, in a word, "YUCKY!" ...and, increasingly so, per version. 97 output was somewhat sane; not so, more recent versions.

    So, to address "polishing" issues, you may want to look at demoronizer (which is rather limited but eminently expansible), a project which has been (mostly 'sitting') on my ToDo shelf for far too long.

    But as to your basic question, I can only echo the advice: publish and update; don't delay.

      I've looked at both demoronizer and tidy. Tidy strips out stuff that is useful (like <span> tags). Demoronizer I glanced at, but decided I didn't gain much using it as a pre-pass over the HTML.

      It's easier to use HTML::TreeBuilder to suck in the lot, then pull out the elements that I'm interested in. Mostly works pretty well. I get headings, tables, some character styles (like <code>) and anchors.
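      The approach described above could be sketched roughly as follows. This is only an illustration, not the actual converter: the helper name html_to_twiki is hypothetical, it handles just headings and simple tables, and it assumes tables are not nested (nested tables would have their rows emitted twice). It relies on HTML::TreeBuilder's new_from_content, look_down, as_trimmed_text and delete.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: convert the headings and tables found in an HTML
# string into TWiki markup. Everything else in the document is ignored.
sub html_to_twiki {
    my ($html) = @_;

    # Suck in the whole document as a tree.
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @out;

    # look_down returns matching elements in document order.
    for my $el ($tree->look_down(_tag => qr/^(?:h[1-6]|table)$/)) {
        if (my ($level) = $el->tag =~ /^h([1-6])$/) {
            # TWiki headings: ---+ for level 1, ---++ for level 2, ...
            push @out, '---' . ('+' x $level) . ' ' . $el->as_trimmed_text;
        }
        else {
            # TWiki tables: one line per row, cells separated by '|'.
            for my $row ($el->look_down(_tag => 'tr')) {
                my @cells = map { $_->as_trimmed_text }
                            $row->look_down(_tag => qr/^t[dh]$/);
                push @out, '| ' . join(' | ', @cells) . ' |';
            }
        }
    }

    $tree->delete;    # free the tree explicitly
    return join("\n", @out) . "\n";
}

print html_to_twiki(
    '<h1>Title</h1><table><tr><td>a</td><td>b</td></tr></table>'
);
```

      Walking the tree this way sidesteps most of the Word-specific clutter, since anything that is not explicitly looked for is simply never visited.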


      Perl is Huffman encoded by design.
Re: Publish or Polish
by izut (Chaplain) on Jun 22, 2005 at 21:02 UTC
    If you publish your module now, other members who use your code can help you improve it. If it is mature for Word 97 files, you can release it.
Re: Publish or Polish
by biosysadmin (Deacon) on Jun 22, 2005 at 21:17 UTC
    Publish now. I don't remember who said it, but it's a good idea to "release early, release often." It's not like you can't publish now AND publish later, that way you get the benefits of both scenarios.
      I first saw "Release Early, Release Often" in Eric Raymond's great book "The Cathedral and the Bazaar". From what I understood at the time, he was pointing out that the actual revolution of Linus was not in fact the kernel, but the methodology. The "promiscuous" programming practice of Linus was based on the Release Early, Release Often principle, and Raymond later proves that theory with his Fetchmail controlled experiment.

      I personally think it is no coincidence that the "Rational Process" recommended the Iterative and Incremental method by the late nineties as if it were a natural evolution of _their_ programming practice. IMHO, they were just trying to formalize what Linus had invented almost a decade earlier, but since "Open Source" (by OSI definition) was not coined yet, and its practices were not accepted in the "Enterprise"[ 1 ], Rational took the opportunity and re-discovered warm water[ 2 ].

      Notes:
      [ 1 ] This is a term I truly despise, especially in the words of Gartner and the like, and in things like J2EE which are far from being anything "Enterprise".
      [ 2 ] Sorry if you don't catch the joke here, but it's a Venezuelan expression that refers to a foolish person who thinks he has discovered something new, when it's actually something well known.
Re: Publish or Polish
by gellyfish (Monsignor) on Jun 22, 2005 at 22:24 UTC
Re: Publish or Polish
by shiza (Hermit) on Jun 22, 2005 at 21:23 UTC
    Also, can't newer versions of Word export as XML?

      Maybe, but that is a whole different thing.

      I don't see XML options for saving with Word 2002 and even if there were I'd think the issues are unlikely to be solved that way.

      When starting this project I considered parsing .rtf or HTML and decided that the HTML tools for Perl were probably more mature so I went that way.


      Perl is Huffman encoded by design.

      Future versions of Word will publish in an XML "Open" format (it's not open, and is in fact a weapon MS is using against Open Source Software), but for now, no. From the looks of it the XML export option has a lot in common with Word-generated HTML. Ick.

      More importantly, tools are needed to get documents out of proprietary, vendor-locked formats into something you have a hope of opening in ten years.

Re: Publish or Polish
by zentara (Cardinal) on Jun 23, 2005 at 10:26 UTC
    Publish now as a beta release. Otherwise someone might beat you to it, and put a "software patent" on it. :-)

    I'm not really a human, but I play one on earth. flash japh
Re: Publish or Polish
by DrHyde (Prior) on Jun 23, 2005 at 09:03 UTC
    If you publish early (and I suggest publishing on the CPAN, not here) other people can polish it for you while you drink beer/fuck like a crazed bunny/go on holiday.
      I am not someone who is generally offended by bad language; I use my fair share of it. However, PerlMonks is not one of the places where I expect to see it posted. I must say that I am disappointed in the language used by DrHyde in Re: Publish or Polish. I have always thought of PerlMonks as a community where I could feel safe sharing anything with my children (which I plan to have someday) or even my mother without fear of offending them. Or, possibly more importantly, that I wouldn't have to be concerned that the company censors may someday feel compelled to block PerlMonks because of "objectionable" content. Using f*** would have been just as effective and yet less offensive.

      Since I am sure that many disagree with me, I am expecting to get downvoted anyway, but I must be honest with what I feel.


      -Kevin
      my $a='62696c6c77667269656e6440676d61696c2e636f6d'; while ($a=~m/(^.{2})/s) {print unpack('A',pack('H*',"$1"));$a=~s/^.{2}//s;}