Finally, let's end this thread, which was meant to be purely recreational, with a more serious post... here is the complete code for this 'longest words' thingie.
Please note that this is a revised version, written in a nice and clean (a.k.a. 'serious') coding style :))

#!/usr/bin/perl -Twl
######################################################################
# Which language usually uses the longest words?
####
# coming from a recreational conversation, this issue was transformed
# by the PerlMonks fellows in a real coding debate, so here I come
# back with a complete and hopefully quite clean code, in order to
# offer some useful maybe to the profane from this recreational thread
#
####
# AUTHOR:
#   Marius FERARU, aka AltBlue
# DATE:
#   2002.11.13
# LICENSE:
#   Beerware, as the source of all this debate :P~
######################################################################
use strict;
use LWP::Simple;
use HTML::Parser 3.00;
use POSIX qw(setlocale LC_CTYPE);
use locale;
use Data::Dumper;

{
    ### Locale, URL pairs that are to be parsed
    my $ToParse = [
        [ 'ca', 'file:/var/www/html/index.html.ca' ],
        [ 'cz', 'file:/var/www/html/index.html.cz' ],
        [ 'de', 'file:/var/www/html/index.html.de' ],
        [ 'dk', 'file:/var/www/html/index.html.dk' ],
        [ 'ee', 'file:/var/www/html/index.html.ee' ],
        [ 'el', 'file:/var/www/html/index.html.el' ],
        [ 'en', 'file:/var/www/html/index.html.en' ],
        [ 'es', 'file:/var/www/html/index.html.es' ],
        [ 'fr', 'file:/var/www/html/index.html.fr' ],
        [ 'it', 'file:/var/www/html/index.html.it' ],
        [ 'nl', 'file:/var/www/html/index.html.nl' ],
        [ 'nn', 'file:/var/www/html/index.html.nn' ],
        [ 'no', 'file:/var/www/html/index.html.no' ],
        [ 'pt', 'file:/var/www/html/index.html.pt' ],
        [ 'ru', 'file:/var/www/html/index.html.ru.koi8-r' ],
        [ 'se', 'file:/var/www/html/index.html.se' ],
        [ 'zh', 'file:/var/www/html/index.html.zh' ],
    ];
    run_tests($ToParse);
}

sub run_tests {
    local $" = ', ';
    foreach (@{$_[0]}) {
        setlocale(LC_CTYPE, $_->[0]);
        print $_->[1];
        my @lw = longest_words(strip_html(get($_->[1])));
        print length($lw[0]), ' letters: ', "@lw";
    }
}

######################################################################
# Grabs a chunk of data, computes a list of the longest unique words
# around and returns it sorted alphabetically
sub longest_words {
    my $data = shift || return ();
    my $max = 0;
    my %longest;
    while ($data =~ /\b(\w+)\b/sg) {
        my $word = $1;
        my $length = length $word;
        next if $length < $max;
        if( $max < $length ) {
            $max = $length;
            %longest = ();
            $longest{$word} = 1;
        }
        elsif( $max == $length ) {
            $longest{$word} = 1;
        }
    }
    sort { lc($a) cmp lc($b) } keys %longest;
}

######################################################################
# HTML stripping routine
sub strip_html {
    my $buffer = '';
    my $p = HTML::Parser->new(
        api_version     => 3,
        marked_sections => 1,
        text_h          => [ sub { $buffer .= $/ . $_[0] }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));
    $p->parse("@_");
    $p->eof;
    $buffer;
}
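
For anyone who wants to poke at longest_words() without fetching or parsing any pages, here is a minimal sketch. It repeats the routine so that it runs on core Perl alone (no LWP, no HTML::Parser, no locale switching). The sample string is made up for illustration and does not come from the thread, and the empty-input guard is written a bit more explicitly than the `shift || return ()` used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same algorithm as longest_words() above, repeated here so the
# snippet is self-contained.
sub longest_words {
    my $data = shift;
    return () unless defined $data && length $data;
    my $max = 0;
    my %longest;
    while ($data =~ /\b(\w+)\b/g) {
        my $word   = $1;
        my $length = length $word;
        next if $length < $max;
        if ($length > $max) {    # new record: forget the shorter words
            $max     = $length;
            %longest = ();
        }
        $longest{$word} = 1;     # hash keys keep the list unique
    }
    return sort { lc($a) cmp lc($b) } keys %longest;
}

# Hypothetical sample text (an assumption, not data from the thread).
my $sample = 'gamma beta Delta alpha gamma pi';
my @lw = longest_words($sample);
{
    local $" = ', ';
    print length($lw[0]), " letters: @lw\n";
}
# prints: 5 letters: alpha, Delta, gamma
```

Note how the duplicate 'gamma' shows up only once, and how the case-insensitive sort puts 'Delta' between 'alpha' and 'gamma'.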

--
AltBlue.


In reply to Complete 'clean' code (Re: The longest word) by AltBlue
in thread The longest word .... by AltBlue
