I think merlyn is right: trying to scan HTML is difficult. On the other hand, for something as simple as what you are attempting, XML::LibXML may be overkill. In this case, assuming that the page doesn't change its formatting frequently, you are really looking for a pattern like:
/(?<=>)([\w ]+?) PRIMARY SCHOOL/

This will non-greedily match any run of word characters and spaces that starts right after the closing ">" of a tag and is followed by the words " PRIMARY SCHOOL". The overall match includes " PRIMARY SCHOOL" too; the capture group ($1) holds only the name in front of it. This will fail if the line is broken in the middle of the name, but you can get around that by using "\s" instead of literal spaces, both between the words and inside the character class.
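Here is a rough sketch of the "\s" variant in use. The sample markup and the school names in it are made up for illustration (I don't know what the real page looks like), and school_names is just a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical sample markup; the real page's structure is an assumption.
my $html = <<'HTML';
<td><a href="school1.html">ANG MO KIO PRIMARY SCHOOL</a></td>
<td><a href="school2.html">BUKIT
TIMAH PRIMARY SCHOOL</a></td>
HTML

# Return every name that sits right after a ">" and right before
# "PRIMARY SCHOOL", tolerating line breaks inside the name.
sub school_names {
    my ($text) = @_;
    my @names;
    # (?<=>) demands a ">" immediately before the name;
    # \s (instead of a literal space) lets the name span line breaks.
    while ($text =~ /(?<=>)([\w\s]+?)\s+PRIMARY\s+SCHOOL/g) {
        my $name = $1;
        $name =~ s/^\s+//;      # drop leading whitespace after the tag
        $name =~ s/\s+/ /g;     # collapse newlines and runs of spaces
        push @names, $name;
    }
    return @names;
}

print "$_\n" for school_names($html);
```

Note that [\w\s] still won't match names containing punctuation such as "." or "'", so you would have to widen the character class if the page has any of those.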
In reply to Re: Help with regular expression - real file by hanenkamp, in thread Help with regular expression - real file by kiat