Ok, I'll play 'nasty little boy' too (I remember!)
Of course I had to try the stemming that is built into PostgreSQL's full-text search (FTS). I hadn't used it for a while, so this is just playing with it. Below are the results of stemming, and the split between real words and stop-words.
I think this FTS stuff uses Snowball, and I don't know how recent the vocabulary is. (UPDATE: I see regular Snowball-related updates (every few months) in the PostgreSQL git log, so I now think its Snowball code is reasonably up to date.)
-- Below are three chunks/resultsets:
-- 1. Your text
-- 2. Real words:
-- select .. from ts_debug('german', '$yourtxt')
-- where lexemes > 0
-- 3. Stop-words:
-- select .. from ts_debug('german', '$yourtxt')
-- where lexemes = 0
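Written out in full, the two ts_debug queries from the legend above might look roughly like this. It is a sketch, not necessarily the exact statements used for the output in this post: the CTE name sample, the shortened one-line text, and the cardinality() filter are assumptions made for illustration. It relies on ts_debug returning an empty lexemes array ({}) for a stop-word and NULL for tokens that no dictionary handled (spaces, punctuation), so those fall out of both result sets.

```sql
-- Sketch only. cardinality() requires PostgreSQL 9.4 or later.

-- 2. Real words: the dictionary returned at least one lexeme.
WITH sample(txt) AS (
    VALUES ('Ich bin der Geist, der stets verneint!')
)
SELECT alias, token, dictionary, lexemes
FROM sample, ts_debug('german', sample.txt)
WHERE cardinality(lexemes) > 0;

-- 3. Stop-words: recognized by the dictionary, but mapped to an empty array.
WITH sample(txt) AS (
    VALUES ('Ich bin der Geist, der stets verneint!')
)
SELECT alias, token, dictionary, lexemes
FROM sample, ts_debug('german', sample.txt)
WHERE cardinality(lexemes) = 0;
```

Rows where lexemes is NULL make cardinality() return NULL as well, so they match neither WHERE clause.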
                     txt
----------------------------------------------
 Ich Bin Der Geist, Der Stets Verneint!      +
 Und Das Mit Recht; denn alles, was entsteht,+
 Ist wert, daß es zugrunde geht;             +
 Drum besser wär's, daß nichts entstünde.    +
 So ist denn alles, was ihr Sünde,           +
 Zerstörung, kurz, das Böse nennt,           +
 Mein eigentliches Element.
(1 row)
   alias   |    token     | dictionary  |  lexemes
-----------+--------------+-------------+------------
 asciiword | Geist        | german_stem | {geist}
 asciiword | Stets        | german_stem | {stet}
 asciiword | Verneint     | german_stem | {verneint}
 asciiword | Recht        | german_stem | {recht}
 asciiword | entsteht     | german_stem | {entsteht}
 asciiword | wert         | german_stem | {wert}
 asciiword | zugrunde     | german_stem | {zugrund}
 asciiword | geht         | german_stem | {geht}
 asciiword | Drum         | german_stem | {drum}
 asciiword | besser       | german_stem | {bess}
 word      | wär          | german_stem | {war}
 asciiword | s            | german_stem | {s}
 word      | entstünde    | german_stem | {entstund}
 word      | Sünde        | german_stem | {sund}
 word      | Zerstörung   | german_stem | {zerstor}
 asciiword | kurz         | german_stem | {kurz}
 word      | Böse         | german_stem | {bos}
 asciiword | nennt        | german_stem | {nennt}
 asciiword | eigentliches | german_stem | {eigent}
 asciiword | Element      | german_stem | {element}
(20 rows)
   alias   | token  | dictionary  | lexemes
-----------+--------+-------------+---------
 asciiword | Ich    | german_stem | {}
 asciiword | Bin    | german_stem | {}
 asciiword | Der    | german_stem | {}
 asciiword | Der    | german_stem | {}
 asciiword | Und    | german_stem | {}
 asciiword | Das    | german_stem | {}
 asciiword | Mit    | german_stem | {}
 asciiword | denn   | german_stem | {}
 asciiword | alles  | german_stem | {}
 asciiword | was    | german_stem | {}
 asciiword | Ist    | german_stem | {}
 word      | daß    | german_stem | {}
 asciiword | es     | german_stem | {}
 word      | daß    | german_stem | {}
 asciiword | nichts | german_stem | {}
 asciiword | So     | german_stem | {}
 asciiword | ist    | german_stem | {}
 asciiword | denn   | german_stem | {}
 asciiword | alles  | german_stem | {}
 asciiword | was    | german_stem | {}
 asciiword | ihr    | german_stem | {}
 asciiword | das    | german_stem | {}
 asciiword | Mein   | german_stem | {}
(23 rows)
Not perfect, but more useful than I expected for something that took no work to set up.