I imagine you'd face the following problems:
- Character encoding issues (e.g. the search term and the file's text may not be stored as the same bytes)
- Document encoding issues
- False positives from non-text components of the document (e.g. if I search for the word "bold", will I find matches that come from formatting markup rather than from the document's actual text?)
- False negatives from text interrupted by formatting codes (e.g. can you match a sentence containing a bolded word?)
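
To make the encoding, false-positive, and false-negative points concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the document is HTML-like markup; your actual format may be quite different, and the sample string and the `TextExtractor` / `visible_text` names are hypothetical. The idea is simply to search the extracted text rather than the raw file contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only text nodes, discarding tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup: str) -> str:
    """Return the markup's text content with all formatting removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample document: one word is wrapped in formatting markup.
raw = '<p>The <span class="bold">quick</span> brown fox.</p>'
text = visible_text(raw)

# False positive: "bold" appears only in the markup, not in the text.
print("bold" in raw, "bold" in text)

# False negative: the markup interrupts the sentence in the raw form.
print("The quick brown" in raw, "The quick brown" in text)

# Character encoding: the same word is stored as different bytes,
# so a byte-level search with the wrong encoding finds nothing.
word = "caf\u00e9"
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")
print(latin1_bytes in utf8_bytes)
```

Searching the raw string gets both of the first two checks wrong, while searching the extracted text gets them right. For a real document format you would swap the extractor for a parser that understands that format.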