in reply to Re: Create a dictionary from wikipedia

You can strip the markup with a regex: split the text on whitespace and the markup characters, then count the words that are left. Try this:

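use strict;
use warnings;

my $page = "my ##Media Wiki [text|here]";
my %wordcount;

# Split on runs of whitespace and markup/punctuation characters.
# Inside a character class "." is literal; "@" and "$" are escaped
# so Perl does not try to interpolate them.
my @words = split /[\s#\[\]|\@\$!.,]+/, $page;

# Count only fields that contain at least one word character.
foreach my $word (@words) {
    $wordcount{$word}++ if $word =~ /\w/;
}

foreach my $word (keys %wordcount) {
    print "$word\t$wordcount{$word}\n";
}
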
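If you would rather not list every markup character, another option is to match the words directly instead of splitting on separators. This is only a rough sketch: count_words is a name I made up for illustration, and it assumes a "word" is any run of \w characters, so apostrophes and hyphens will break words apart.

```perl
use strict;
use warnings;

# Hypothetical helper: count runs of word characters in a string.
sub count_words {
    my ($text) = @_;
    my %count;
    # In list context, /\w+/g returns every run of word characters,
    # so the markup characters never have to be listed.
    $count{$_}++ for $text =~ /\w+/g;
    return \%count;
}

my $counts = count_words("my ##Media Wiki [text|here]");
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

Sorting the keys just makes the output order predictable; drop the sort if you don't care about that.
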
I hope this helps.

--linuxkid


imrunningoutofideas.co.cc