SpacemanSpiff has asked for the wisdom of the Perl Monks concerning the following question:

Good morning everyone. I've written a script to scrape posts off of an internet forum, but I'm having problems with one last part. Each post has a link in a specific location, but the link text is different on each post. Because of this, I can't program a static value in the find_link command. I thought I got around this by using the following bit of code:
$mech->find_link( text => "$PrevMsgLink" );
my $topic_obj = $mech->find_link( text_regex => qr/$PrevMsgLink/i );
my $PrevMsg = $topic_obj->url;
This worked great until I ran into a post where the link had special characters: "eBay item 380565801 (Ends Jul-09-02 144739 PDT) -" When the script sees this, it dies out with the following error:

"Can't call method "url" on an undefined value at C:\Documents and Settings\Tony\Desktop\Perl\....."

So the question is, what can I do to get the script to find links with special characters? I've searched the site here, and read the manpages for the applicable modules and come up with nothing. I'll admit, I'm still a novice at all of this, and have looked right past my own answers before. Be gentle.

Thanks much in advance! tony

Re: How can I handle special characters in mech->find_link
by Corion (Patriarch) on Jun 12, 2006 at 07:49 UTC

    Your regex doesn't work because your string $PrevMsgLink contains characters that are special in regular expressions, notably the pair of parentheses. You can use \Q...\E to prevent those chars from being interpreted as regular expression special characters:

    my $topic_obj = $mech->find_link( text_regex => qr/\Q$PrevMsgLink\E/i )
        or die "Didn't find anything matching '$PrevMsgLink' in the page!";
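
    To see the effect without fetching a page, here is a minimal standalone sketch that matches the link text from the question against itself, once unescaped and once with \Q...\E. It does not use WWW::Mechanize at all; the variable names $raw, $quoted, $raw_matches and $quoted_matches are made up for the illustration.

    ```perl
    use strict;
    use warnings;

    # Link text from the question; the parentheses are regex metacharacters.
    my $PrevMsgLink = 'eBay item 380565801 (Ends Jul-09-02 144739 PDT) -';

    # Unescaped: "(...)" is read as a capture group, so the pattern no longer
    # expects literal parentheses in the text it is matched against.
    my $raw = qr/$PrevMsgLink/i;

    # Escaped: \Q...\E applies quotemeta, so every character is taken literally.
    my $quoted = qr/\Q$PrevMsgLink\E/i;

    my $raw_matches    = $PrevMsgLink =~ $raw    ? 1 : 0;
    my $quoted_matches = $PrevMsgLink =~ $quoted ? 1 : 0;

    print "raw: $raw_matches, quoted: $quoted_matches\n";   # raw: 0, quoted: 1
    ```

    The unescaped pattern fails even against its own source string, which is why find_link returned undef and the later ->url call died.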

    But what you really should do instead of scraping the eBay website is to use the Net::eBay API, which gives you fast, robust and convenient access to all of eBay, within the limits of the eBay Terms of Use.

      Thanks a ton guys, that did it.

      *tries desperately to commit to memory*

      I'm actually scraping Yahoo Groups as our group is moving to another format. I know there were some modules out there to do this, but Yahoo changes their message format frequently, so they tend to be out of date. Besides, I figured it would be a great project to do some immersion training with Perl.

      Thanks again!

Re: How can I handle special characters in mech->find_link
by Joost (Canon) on Jun 12, 2006 at 07:52 UTC