I think we just need more information about what the OP actually has. All of these Unicode and HTML problems can be solved. The problem statement as it stands is not correct: the OP's code "works", although it is not the best way to do it. I have often had to resort to viewing a file in binary to find "hidden" characters. That is one possibility here, although I don't think it is likely if this is an HTML page that renders properly in a browser.
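
For what it's worth, here is a rough sketch of the kind of "hidden character" check I mean, written in Python only because I don't know what the OP is using. It assumes the file is UTF-8 (an assumption on my part), and the function name `find_hidden_chars` is made up for illustration. It flags control characters, format characters such as a BOM or zero-width space, and any space separator other than a plain ASCII space.

```python
import sys
import unicodedata


def find_hidden_chars(data: bytes, encoding: str = "utf-8"):
    """Return (char_index, codepoint, name) for each suspicious character.

    Assumes the bytes are in `encoding` (UTF-8 by default); undecodable
    bytes are replaced rather than raising an error.
    """
    text = data.decode(encoding, errors="replace")
    hits = []
    for i, ch in enumerate(text):
        if ch in "\n\r\t":
            continue  # ordinary whitespace controls, not interesting here
        cat = unicodedata.category(ch)
        # Cc = control, Cf = format (BOM, zero-width space, ...),
        # Zs = space separators (flag everything except a plain space)
        if cat in ("Cc", "Cf") or (cat == "Zs" and ch != " "):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py some_file.html
    with open(sys.argv[1], "rb") as fh:  # read raw bytes, no decoding yet
        for index, codepoint, name in find_hidden_chars(fh.read()):
            print(index, codepoint, name)
```

The file is opened in binary mode on purpose, so nothing gets silently translated before you look at it. If this turns up nothing, the problem is probably somewhere else.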