You don't. You use HTML::TreeBuilder or a similar module. Life is too short to bother reinventing that particular wheel. Markup is hard to parse with regexen because there are many special cases to handle, such as white space. Try something like:
use strict;
use warnings;
use HTML::TreeBuilder;

my $str = <<'STR';
<html><head><title>my page></title></head>
<body>
<table><tr><td>
<a href="http://mysite/bbsui.jsp?id=dxpwd">dxpwd</a>
</td><td>
<a href="http://mysite/bbsui.jsp?id=jimeth">jimeth</a>
</td><td>
<a href="http://mysite/bbsui.jsp?id=jone28">jone28</a>
</td><td>
<a href="http://mysite/bbsui.jsp?id=25528">25528</a>
</td></tr>
</body></html>
STR

my $tree = HTML::TreeBuilder->new;
$tree->parse ($str);
$tree->eof;    # signal end of input so the parser finishes building the tree

# Print the href attribute of every <a> element in the document
print $_->attr ('href') . "\n" for $tree->find ('a');
Prints:
http://mysite/bbsui.jsp?id=dxpwd
http://mysite/bbsui.jsp?id=jimeth
http://mysite/bbsui.jsp?id=jone28
http://mysite/bbsui.jsp?id=25528
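
If you only want some of the links, or the link text as well, the same tree can be filtered with look_down. The following is a rough sketch rather than a definitive solution: the sample markup, the bbsui_links helper name and the "bbsui.jsp?id=" filter are assumptions made up for illustration. The markup deliberately uses upper-case tags, single quotes and stray line breaks, which are the sort of special cases that trip up a hand-rolled regex.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup: similar links, but with awkward white space,
# single quotes and upper-case tag and attribute names.
my $html = <<'HTML';
<html><body>
<A
   HREF='http://mysite/bbsui.jsp?id=dxpwd'>dxpwd</A>
<a class="x" href="http://mysite/bbsui.jsp?id=jimeth">
   jimeth
</a>
<a href="http://mysite/other.jsp">not a bbsui link</a>
</body></html>
HTML

# Hypothetical helper: returns a list of [href, link text] pairs for every
# <a> element whose href matches the assumed "bbsui.jsp?id=" pattern.
sub bbsui_links {
    my ($content) = @_;

    # new_from_content parses the string and finishes the tree in one call
    my $tree = HTML::TreeBuilder->new_from_content ($content);

    my @links;
    for my $anchor ($tree->look_down (_tag => 'a', href => qr/bbsui\.jsp\?id=/)) {
        # as_trimmed_text strips leading and trailing white space from the text
        push @links, [$anchor->attr ('href'), $anchor->as_trimmed_text];
    }

    $tree->delete;    # release the tree once we are done with it
    return @links;
}

print "$_->[1]: $_->[0]\n" for bbsui_links ($html);
```

Note that the regex here is only applied to an attribute value the parser has already extracted, not to the raw markup, so the tag layout no longer matters.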
In reply to "Re: how to use regular expressions read some string from a htm file" by GrandFather, in thread "how to use regular expressions read some string from a htm file" by weihe