Re: general regex question

Re^2: general regex question by crashtest (Curate) on May 14, 2005 at 16:58 UTC
To expand a little on TedPride's terse code segment: he is suggesting you open the page in your browser, save it as "surnames.html" in the same directory as a script containing his code, then run the script ~~and feed "surnames.html" to it through STDIN (like `perl script.pl < surnames.html`)~~. Also note that the bold tags in the page are lower-case while the regex looks for `<B>`, so you probably want to make your match case-insensitive by adding the `/i` modifier to your regex. Finally, if all you're interested in is the "Mac"s (as your original code fragment suggests), you might want to update the regular expression a little:

`open (my $handle, '<', 'surnames.html') or die "Cannot open surnames.html: $!"; while (<$handle>) { $hash{$1} = () if m/^<\w+?><B>(Mac\w+?)<\/B>/i; } close ($handle); print '"' . join ("\",\n\"", sort keys %hash) . '"';`

TedPride's code takes advantage of the fact that all the surnames are between `<b>` tags. The regular expression matches any line that starts with a general tag (`<\w+?>`, like `<br>`), followed by a bold tag (`<B>`), a name made up of "Mac" plus one or more word characters (`(Mac\w+?)`), and the closing `</B>` tag. The parentheses "capture" the matched name in the special variable `$1`.

Updated after Anonymous Monk's post below.
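For anyone who wants to try the regex without saving the page first, here is a minimal sketch of the same idea. The sample lines are made up (an assumption about what the page looks like, based on the description above), and the `mac_names` helper and the in-memory filehandle are my own additions for illustration, not part of TedPride's script.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for surnames.html; the real page
# layout is an assumption based on the description in this thread.
my $html = <<'END_HTML';
<br><b>MacDonald</b> some text
<br><b>Smith</b> some text
<p><B>MacLeod</B> other text
<br><b>Mac</b> too short to match
<br><b>MacDonald</b> duplicate entry
END_HTML

# Return the sorted, de-duplicated "Mac" surnames read from a filehandle.
# (mac_names is a hypothetical helper name, not from the original script.)
sub mac_names {
    my ($handle) = @_;
    my %seen;
    while ( my $line = <$handle> ) {
        # ^<\w+?>    a leading tag at the start of the line, such as <br>
        # <B>        the bold tag, matched case-insensitively because of /i
        # (Mac\w+?)  captured name: "Mac" plus at least one word character
        # <\/B>      the closing bold tag
        $seen{$1} = 1 if $line =~ m/^<\w+?><B>(Mac\w+?)<\/B>/i;
    }
    return sort keys %seen;
}

# Read from an in-memory filehandle so the sketch runs without the saved page.
open( my $handle, '<', \$html ) or die "Cannot open sample data: $!";
my @names = mac_names($handle);
close($handle);

print '"' . join( "\",\n\"", @names ) . '"', "\n";
```

To run it against the real page, replace the in-memory open with `open( my $handle, '<', 'surnames.html' )`. Keep in mind that `/i` applies to the whole pattern, so a name written as "MACDONALD" or "macdonald" would be captured as well.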
Re^3: general regex question by Anonymous Monk on May 15, 2005 at 04:25 UTC
> then run the script and feed "surnames.html" to it through STDIN (like `perl script.pl < surnames.html`)

That's not what he's suggesting at all. He's using `open` to open surnames.html.