Re^2: general regex question

To expand a little on TedPride's terse code segment, he is suggesting you open the page in your browser, save it as "surnames.html" in the same directory as a script with his code, then run the script ~~and feed "surnames.html" to it through STDIN (like perl script.pl < surnames.html)~~.

Also note that the bold tags in the script are lower-case, so you probably want to make your match case-insensitive by adding the /i modifier to your regex. Finally, if all you're interested in is the "Mac"s (as your original code fragment suggests), you might want to update the regular expression a little:

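# Open the saved page for reading; stop with a message if it is missing
open(my $handle, '<', 'surnames.html') or die "Cannot open surnames.html: $!";
my %hash;
while (<$handle>) {
    # Store each "Mac" surname as a hash key so duplicates collapse
    $hash{$1} = 1 if m/^<\w+?><B>(Mac\w+?)<\/B>/i;
}
close($handle);
# Print the sorted names, each in double quotes, one per line
print '"' . join("\",\n\"", sort keys %hash) . '"';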

[id://TedPride]'s code takes advantage of the fact that all the surnames are between <b> tags. The regular expression matches any line that starts (^) with a general tag (<\w+?>, like "<br>"), immediately followed by a bold tag (<B>) and a name consisting of "Mac" plus one or more word characters ((Mac\w+?)), up to the closing </B>. The \w class covers only letters, digits and underscores, so a name containing a hyphen or an apostrophe would not match. The parentheses "capture" the matched name into the special variable $1.
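To see what the pattern accepts and rejects, here is a minimal sketch that runs it over a few sample lines. The sample lines and the extract_macs helper are made up for illustration; the real page's markup may well differ.

```perl
use strict;
use warnings;

# Hypothetical sample lines (an assumption, not taken from the real page)
my @lines = (
    '<br><b>MacDonald</b> some text',
    '<BR><B>MacLeod</B>',            # upper-case tags still match thanks to /i
    '<br><b>Smith</b>',              # does not start with "Mac"
    '<p>see <b>MacGregor</b>',       # <b> does not directly follow the first tag
    '<br><b>MacDonald</b>',          # duplicate, collapsed by the hash
);

# Hypothetical helper: returns the sorted, de-duplicated "Mac" surnames
sub extract_macs {
    my @input = @_;
    my %seen;
    for my $line (@input) {
        # $1 is only read when the match succeeded
        $seen{$1} = 1 if $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/i;
    }
    my @names = sort keys %seen;
    return @names;
}

print "$_\n" for extract_macs(@lines);
# prints:
# MacDonald
# MacLeod
```

Because of the /i modifier the "Mac" prefix is matched case-insensitively as well, so a name written "MACDONALD" or "macdonald" would also be captured, as a separate hash key.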

Updated after [id://Anonymous Monk]'s post below.

Re^3: general regex question by Anonymous Monk on May 15, 2005 at 04:25 UTC
"then run the script and feed "surnames.html" to it through STDIN (like `perl script.pl < surnames.html`)"

That's not what he's suggesting at all. He's using `open` to open surnames.html.