In case you're not aware of it, you can add a 'lang="xx"' attribute to both block-level and inline elements in HTML 4 and later. If the source documents are marked up that way, it might make parsing a bit easier.
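To illustrate how the 'lang' attribute could help, here is a rough sketch in Python (chosen purely for illustration, using only the standard library's html.parser). The class name LangTextCollector and the helper text_by_lang are made up for this example, and it assumes reasonably well-formed markup where every non-void element has an explicit closing tag. Text is grouped under the nearest enclosing 'lang' value, since the attribute is inherited by child elements.

```python
from html.parser import HTMLParser


class LangTextCollector(HTMLParser):
    """Group text by the nearest enclosing lang attribute (hypothetical helper)."""

    # Elements that never have a closing tag, so they must not touch the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, default_lang=None):
        super().__init__()
        self.stack = [default_lang]  # effective language of each open element
        self.texts = {}              # lang code -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        # An element without its own lang inherits the parent's language.
        lang = dict(attrs).get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.setdefault(self.stack[-1], []).append(text)


def text_by_lang(html, default_lang=None):
    """Return a dict mapping language code to the text fragments found."""
    parser = LangTextCollector(default_lang)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Something like text_by_lang('<p lang="fr">Bonjour</p>') would then give you the French text keyed under "fr". Real-world HTML with unclosed tags would need a more forgiving parser than this sketch.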
One question for clarification: what should the system do if your user requests, for example, French, but the source document is Italian in origin, and some 'chunks' (for want of a better term) have an EN translation but no FR one?
For example: chunk 1 has IT & EN versions, chunk 2 has IT, EN & FR, and chunk 3 has IT only. Chunk 2 would obviously return the FR version and chunk 3 the IT (as it's the only one available), but what about chunk 1? What would the user expect to see for that?
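One way to keep that decision open is to make the fallback order a parameter rather than hard-coding it. The sketch below (Python again, for illustration only) assumes each chunk is a dict mapping a language code to its text; the function name pick_translation and the sample data are hypothetical, and which fallback order is "right" is exactly the question above.

```python
def pick_translation(chunk, preferred, fallbacks=()):
    """Return (lang, text) for the first available language, or None.

    chunk     -- dict mapping language code to text (assumed structure)
    preferred -- the language the user asked for
    fallbacks -- languages to try, in order, if preferred is missing
    """
    for lang in (preferred, *fallbacks):
        if lang in chunk:
            return lang, chunk[lang]
    return None


# The three chunks from the example: IT is the source language.
chunks = [
    {"it": "testo 1", "en": "text 1"},
    {"it": "testo 2", "en": "text 2", "fr": "texte 2"},
    {"it": "testo 3"},
]

# Policy A: fall back to the source language (chunk 1 comes back in IT).
source_first = [pick_translation(c, "fr", ("it",)) for c in chunks]

# Policy B: try EN before the source language (chunk 1 comes back in EN).
english_first = [pick_translation(c, "fr", ("en", "it")) for c in chunks]
```

Under either policy chunk 2 comes back in FR and chunk 3 in IT; only chunk 1 differs, which is the case that needs a decision.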
In reply to Re^3: Extracting appropriate language text from HTML data
by john_oshea
in thread Extracting appropriate language text from HTML data
by UnderMine