@andye
Um... no. Simply using a space as the delimiter would give me "monarchy.", "Japan.", "company?", and so on, not just the words themselves.
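To make the difference concrete, here is a minimal Python sketch (the thread does not state the language actually in use, so Python is an assumption, and `sample` is a made-up string, not the real textBlock). It compares splitting on spaces with pulling out runs of word characters using a Unicode-aware pattern.

```python
import re

# Hypothetical sample text; this is NOT the original textBlock.
sample = "Is Japan a monarchy. Who runs the company? Ask Japan."

# Splitting on a single space leaves punctuation glued to the tokens.
by_space = sample.split(" ")

# For str patterns in Python 3, \w is Unicode-aware, so letters such as
# "\u021b" (t with comma below) count as word characters as well.
words = re.findall(r"\w+", sample)

print(by_space)
print(words)
```

Note that `\w` also matches digits and underscores, so this may be too permissive if only alphabetic words are wanted.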
@dk
The text in the variable textBlock doesn't contain "&#1111;" constructs -- it contains the real characters (for example, ț is "T with comma below"). The HTML encoding changed those characters into "&#1111;" constructs when I submitted them.
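For anyone copying the text from the post rather than from the original variable, the numeric character references can be turned back into the real characters first. A minimal sketch, assuming Python and its standard-library `html` module (`posted` is a hypothetical fragment, not the actual submitted text):

```python
import html

# Hypothetical fragment as it might appear after HTML encoding.
posted = "mun&#539;i"

# html.unescape converts numeric character references back to characters.
decoded = html.unescape(posted)

print(decoded)
```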