in reply to Parsing HTML question
Some algorithms use a list of stopwords: words so frequent that they would poison any database you use for cataloguing or searching the webpage. Typical examples are "the", "a", "who", ... which I find very unfair if you are a fan of The Who!
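As a rough illustration of the idea above, here is a minimal stopword filter. The word list, the `index_terms` name and the tokenizing regex are all hypothetical choices made for this sketch, not taken from any particular indexer; real stopword lists are much longer. It also shows the problem with "The Who": every word of the band name is on the list, so nothing is left to index.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "who", "of", "and", "to", "in"}


def index_terms(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Lowercase the text, split it into word tokens and drop stopwords."""
    # Tokens are runs of letters, digits and apostrophes (an assumption).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(index_terms("A review of the new album by The Who"))
    # → ['review', 'new', 'album', 'by']
    print(index_terms("The Who"))
    # → []
```

Because the filter works word by word, it cannot tell "the" as an article from "The" as part of a name; keeping stopwords inside quoted phrases is one possible workaround, but that is a separate design decision.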
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: Parsing HTML question
by GrandFather (Saint) on Jun 24, 2008 at 02:18 UTC
Re^2: Parsing HTML question
by vit (Friar) on Jun 23, 2008 at 23:57 UTC
by chromatic (Archbishop) on Jun 24, 2008 at 01:08 UTC
by vit (Friar) on Jun 24, 2008 at 03:08 UTC
by CountZero (Bishop) on Jun 24, 2008 at 05:06 UTC
by vit (Friar) on Jun 24, 2008 at 19:27 UTC
by moritz (Cardinal) on Jun 24, 2008 at 19:39 UTC