
To expand a little on TedPride's terse code segment: he is suggesting that you open the page in your browser, save it as "surnames.html" in the same directory as a script containing his code, then run the script and feed "surnames.html" to it through STDIN (like perl script.pl < surnames.html).

Also note that the page's bold tags are lower-case (<b>) while the script looks for <B>, so you probably want to make the match case-insensitive by adding the /i modifier to the regex. Finally, if all you're interested in is the "Mac"s (as your original code fragment suggests), you might want to adjust the regular expression a little:

    use strict;
    use warnings;

    my %hash;
    open(my $handle, '<', 'surnames.html') or die "Cannot open surnames.html: $!";
    while (<$handle>) {
        $hash{$1} = () if m/^<\w+?><B>(Mac\w+?)<\/B>/i;   # keys of %hash collect the unique Mac surnames
    }
    close($handle);
    print '"' . join("\",\n\"", sort keys %hash) . "\"\n";
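To see what the /i modifier changes, here is a minimal sketch that runs the same pattern with and without it against one sample line. The line itself is hypothetical, written in the lower-case style the page is assumed to use. Note that /i also applies to the "Mac" part, so "MACDONALD" or "macdonald" would match as well.

```perl
use strict;
use warnings;

# Hypothetical sample line, assuming the page uses lower-case <b> tags
my $line = '<br><b>MacDonald</b>';

# Without /i, the upper-case <B> in the pattern does not match the page's <b>
my $without_i = $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/ ? $1 : 'no match';

# With /i, case is ignored and the surname is captured in $1
my $with_i = $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/i ? $1 : 'no match';

print "without /i: $without_i\n";   # prints "without /i: no match"
print "with /i: $with_i\n";         # prints "with /i: MacDonald"
```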

[id://TedPride]'s code takes advantage of the fact that all the surnames are between <b> tags. The regular expression matches a line that begins (^) with a general tag (<\w+?>, such as "<br>"), followed by a bold tag (<B>), a name made of "Mac" plus at least one more word character ((Mac\w+?)), and the closing </B>. The parentheses "capture" the matched name into the special variable $1.
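A rough sketch of which lines the pattern accepts and what lands in $1. The sample lines are hypothetical, modeled on the page structure described above; they show the effect of the ^ anchor and of \w+? requiring at least one character after "Mac".

```perl
use strict;
use warnings;

# Hypothetical lines modeled on the page structure described above
my @lines = (
    '<br><b>MacAdam</b>',       # tag, bold tag, Mac name: matches
    '<br><b>Smith</b>',         # name does not start with "Mac": no match
    'text <br><b>MacBeth</b>',  # pattern is anchored with ^: no match
    '<br><b>Mac</b>',           # \w+? needs at least one char after "Mac": no match
    '<p><b>MacLeod</b> etc.',   # any \w+ tag works, trailing text ignored: matches
);

my @found;
for my $line (@lines) {
    # On a successful match, the parenthesized group is available as $1
    push @found, $1 if $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/i;
}

print join(', ', @found), "\n";   # prints "MacAdam, MacLeod"
```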

Updated after [id://Anonymous Monk]'s post below.

Re^3: general regex question
by Anonymous Monk on May 15, 2005 at 04:25 UTC
    then run the script and feed "surnames.html" to it through STDIN (like perl script.pl < surnames.html)
    That's not what he's suggesting at all. He's using open to open surnames.html.
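If it helps to see the two approaches side by side, a minimal sketch could move the loop into a helper that takes any filehandle. The name collect_mac_surnames is hypothetical, and a temporary file with made-up content stands in for surnames.html so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: sorted, de-duplicated "Mac..." surnames from any handle
sub collect_mac_surnames {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        $seen{$1} = 1 if $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/i;
    }
    return sort keys %seen;
}

# Stand-in for surnames.html: a temporary file with hypothetical content
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<br><b>MacLeod</b>\n<br><b>Smith</b>\n<br><b>MacAdam</b>\n<br><b>MacLeod</b>\n";
close($tmp);

# Reading via open, as TedPride's code does
open(my $in, '<', $path) or die "Cannot open $path: $!";
my @names = collect_mac_surnames($in);
close($in);

# Reading via STDIN instead (perl script.pl < surnames.html) would be:
#   my @names = collect_mac_surnames(\*STDIN);

print join(', ', @names), "\n";   # prints "MacAdam, MacLeod"
```

Either way the matching logic is the same; only the source of the handle differs.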