I have a text file that contains text stripped from a PDF document. This text contains non-ascii characters that I have to remove before I can run it through some text-mining software.
I have looked at the ord function as a way to detect characters whose values fall outside the basic ASCII table (0 to 127), but I am not sure how to apply it over the whole text file. I thought of reading the file line by line, then checking each character in turn. I have also looked at previous threads on text cleaning, but those only cover stripping out letters or keeping desired content, not removing non-ASCII characters.
Does anybody have any recommendations for removing these chars?
Many thanks,
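To make the idea concrete, here is a minimal sketch of the per-character approach, with a regex version for comparison. The sub names strip_non_ascii and strip_non_ascii_regex are hypothetical, and the sketch assumes that "non-ascii" means any character with an ord value above 127 and that such characters should be deleted rather than replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only characters whose ord value is below 128.
# This is the "look at each character in turn" idea.
sub strip_non_ascii {
    my ($text) = @_;
    return join '', grep { ord($_) < 128 } split //, $text;
}

# Hypothetical helper: the same filter as a single substitution,
# using a negated character class covering the ASCII range.
sub strip_non_ascii_regex {
    my ($text) = @_;
    (my $clean = $text) =~ s/[^\x00-\x7F]//g;
    return $clean;
}

# Small demonstration on a string containing two Latin-1 characters.
my $sample = "caf\x{e9} na\x{ef}ve";
print strip_non_ascii($sample), "\n";
print strip_non_ascii_regex($sample), "\n";
```

For a whole file, the same substitution could probably be run as a one-liner such as `perl -pe 's/[^\x00-\x7F]//g' in.txt > out.txt`, where in.txt and out.txt are placeholder file names. If the file is read as raw bytes, every byte of a multi-byte UTF-8 sequence is above 127, so the whole sequence is dropped.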
MonkPaul
In reply to Removing Non-Ascii chars from text file by MonkPaul