in reply to Re: Create a dictionary from wikipedia
in thread Create a dictionary from wikipedia
There are ways to remove the markup using regexes. Try this:
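    use strict;
    use warnings;

    my $page = "my ##Media Wiki [text|here]";
    my %wordcount;

    # Split on runs of whitespace and markup/punctuation characters.
    # $ and @ are escaped so Perl does not interpolate them in the pattern.
    my @words = split /[\s#\[\]|\@\$!.,]+/, $page;

    # Count only tokens that contain at least one word character.
    foreach my $word (@words) {
        $wordcount{$word}++ if $word =~ /\w/;
    }

    # Print each word and its count, tab-separated, in sorted order.
    foreach my $word (sort keys %wordcount) {
        print "$word\t$wordcount{$word}\n";
    }

I hope this helps.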