PerlMonks  

Finally, let's end this thread, which started out as purely recreational, with a more serious post: here is the complete code for this 'longest words' thingie.
Please note that this is a revised version, written in a nice and clean (a.k.a. 'serious') coding style :))

#!/usr/bin/perl -Twl
######################################################################
# Which language usually uses the longest words?
####
# coming from a recreational conversation, this issue was transformed
# by the PerlMonks fellows in a real coding debate, so here I come
# back with a complete and hopefully quite clean code, in order to
# offer some useful maybe to the profane from this recreational thread
#
####
# AUTHOR:
#     Marius FERARU, aka AltBlue
# DATE:
#     2002.11.13
# LICENSE:
#     Beerware, as the source of all this debate :P~
######################################################################
use strict;
use LWP::Simple;
use HTML::Parser 3.00;
use POSIX qw(setlocale LC_CTYPE);
use locale;
use Data::Dumper;

{
    ### Locale, URL pairs that are to be parsed
    my $ToParse = [
        [ 'ca', 'file:/var/www/html/index.html.ca' ],
        [ 'cz', 'file:/var/www/html/index.html.cz' ],
        [ 'de', 'file:/var/www/html/index.html.de' ],
        [ 'dk', 'file:/var/www/html/index.html.dk' ],
        [ 'ee', 'file:/var/www/html/index.html.ee' ],
        [ 'el', 'file:/var/www/html/index.html.el' ],
        [ 'en', 'file:/var/www/html/index.html.en' ],
        [ 'es', 'file:/var/www/html/index.html.es' ],
        [ 'fr', 'file:/var/www/html/index.html.fr' ],
        [ 'it', 'file:/var/www/html/index.html.it' ],
        [ 'nl', 'file:/var/www/html/index.html.nl' ],
        [ 'nn', 'file:/var/www/html/index.html.nn' ],
        [ 'no', 'file:/var/www/html/index.html.no' ],
        [ 'pt', 'file:/var/www/html/index.html.pt' ],
        [ 'ru', 'file:/var/www/html/index.html.ru.koi8-r' ],
        [ 'se', 'file:/var/www/html/index.html.se' ],
        [ 'zh', 'file:/var/www/html/index.html.zh' ],
    ];
    run_tests($ToParse);
}

sub run_tests {
    local $" = ', ';
    foreach (@{$_[0]}) {
        setlocale(LC_CTYPE, $_->[0]);
        print $_->[1];
        my @lw = longest_words(strip_html(get($_->[1])));
        print length($lw[0]), ' letters: ', "@lw";
    }
}

######################################################################
# Grabs a chunk of data, computes a list of the longest unique words
# around and returns it sorted alphabetically
sub longest_words {
    my $data = shift || return ();
    my $max = 0;
    my %longest;
    while ($data =~ /\b(\w+)\b/sg) {
        my $word = $1;
        my $length = length $word;
        next if $length < $max;
        if( $max < $length ) {
            $max = $length;
            %longest = ();
            $longest{$word} = 1;
        }
        elsif( $max == $length ) {
            $longest{$word} = 1;
        }
    }
    sort { lc($a) cmp lc($b) } keys %longest;
}

######################################################################
# HTML stripping routine
sub strip_html {
    my $buffer = '';
    my $p = new HTML::Parser (
        api_version     => 3,
        marked_sections => 1,
        text_h          => [ sub { $buffer .= $/ . $_[0] }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));
    $p->parse("@_");
    $p->eof;
    $buffer;
}
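
For anyone who wants to poke at the word-selection part without setting up the localized index.html files, here is a minimal sketch that runs on a plain string. The sample sentence is made up for illustration, and the sub is a trimmed copy of longest_words() from the listing. It leaves out `use locale`, LWP and HTML::Parser, so `\w` follows Perl's default rules here rather than the locale's.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of longest_words() from the post, repeated here so the
# snippet runs on its own (no LWP, no HTML::Parser, no locale switching).
sub longest_words {
    my $data = shift;
    return () unless defined $data && length $data;

    my $max = 0;
    my %longest;    # hash keys keep the words unique (case-sensitively)
    while ($data =~ /\b(\w+)\b/g) {
        my $word   = $1;
        my $length = length $word;
        next if $length < $max;
        if ($length > $max) {
            # Found a longer word: forget the shorter candidates.
            $max     = $length;
            %longest = ();
        }
        $longest{$word} = 1;
    }
    return sort { lc($a) cmp lc($b) } keys %longest;
}

# Hypothetical sample text, not taken from the thread.
my $sample = "Perl monks write wonderful programs; wonderful indeed, and beautiful.";
my @lw = longest_words($sample);
print length($lw[0]), " letters: ", join(', ', @lw), "\n";
# prints "9 letters: beautiful, wonderful"
```

Note that uniqueness is decided on the exact spelling, so "Wonderful" and "wonderful" would both be listed if both appeared in the text.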

--
AltBlue.


In reply to Complete 'clean' code (Re: The longest word) by AltBlue
in thread The longest word .... by AltBlue
