I think I should clear up the confusion about the � stuff =). That character was inserted when I copied the verbatim output of the funky char (\375) to pm -- because pm isn't Unicode and my terminal was... or that's my guess.
Anyway, my problem with web scraping is that HTML::TreeBuilder encodes things as some funky HTML-entity thingy, and that always bites me. I often just want to remove the entities for simplicity rather than decode them and fumble with the complexities of the module.
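For what it's worth, here is a rough sketch of the "just remove them" approach in plain Perl, with no modules involved. The helper name strip_entities and the sample string are made up for illustration, and the regex is only my guess at what counts as an entity. It throws away real characters (including things like &amp;), so it only makes sense when losing them is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TreeBuilder): drop HTML entities
# and any leftover non-ASCII characters instead of decoding them.
sub strip_entities {
    my ($text) = @_;

    # Remove decimal (&#253;), hex (&#xFD;) and named (&nbsp;) entities.
    $text =~ s/&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);//g;

    # Remove anything outside 7-bit ASCII, e.g. a raw \375 character.
    $text =~ s/[^\x00-\x7F]//g;

    return $text;
}

# Sample input is invented; &#253; is the entity form of the \375 char.
print strip_entities("t&#253;pe&nbsp;here\n");
```

If the text matters, decoding the entities is the safer route; this is only the lazy option.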