Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:
I've got a problematic regex that I hope is something simple that I'm just overlooking.
Given a line of text (and yes that's one contiguous line):
a href="page.cfm?objectid=11933900&method=full&siteid=50144" Costly false alarms /a a href="page.cfm?objectid=11933890&method=full&siteid=50144" Mindless yobs terrorise OAP's /a a href="page.cfm?objectid=11933879&method=full&siteid=50144" Road deaths /a a href="page.cfm?objectid=11933842&method=full&siteid=50144" Twisted porn pervert caged for life /a a href="page.cfm?objectid=11933800&method=full&siteid=50144" Greenbelt homes plan appeal thrown out /a a href="page.cfm?objectid=11933742&method=full&siteid=50144" Youngsters invited to get in the swim /a a href="page.cfm?objectid=11933698&method=full&siteid=50144" Phone mast fears grow /a a href="page.cfm?objectid=11933695&method=full&siteid=50144" Vandals strike at cemetery /a a href="page.cfm?objectid=11928420&method=full&siteid=50144" Windfarm fears are no flights of fancy /a a href="page.cfm?objectid=11928280&method=full&siteid=50144" Mindless vandals wreck headstones /a a href="page.cfm?objectid=11928277&method=full&siteid=50144" Old Firm united to support hospice /a a href="page.cfm?objectid=11928275&method=full&siteid=50144" Family at war over the loss of buses /a a href="page.cfm?objectid=11928273&method=full&siteid=50144" Joining with Jubilations /a a href="page.cfm?objectid=11927986&method=full&siteid=50144" Catch the football fever at coaching days /a a href="page.cfm?objectid=11927859&method=full&siteid=50144" Radio Law ready to hit the airwaves /a
(The <'s and >'s have been removed so that the text displays as plain text rather than being rendered as HTML.)
and a regex of:
m!(<a[^>]*>)(.+</a[^>]*>)!ig
Shouldn't I get $1, $2, $3, etc... until I stop getting matches? All I'm getting at this point is $1 and $2, with $1 having the correct value and $2 having the rest of the string.
Shouldn't the </a[^>]*> match the first occurrence of the closing </a> tag, or am I misunderstanding something in my regex?
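A minimal sketch of what appears to be happening, assuming the real input still has its angle brackets. The `.+` is greedy, so it runs on to the last closing tag on the line. Also, `$1` and `$2` belong to the two pairs of parentheses in the pattern, not to successive matches, so there is never a `$3` here. One way around both is a non-greedy `.+?` with the match evaluated in list context under `/g`. The two-link `$line` and the names `@greedy`, `@captures` and `@links` are made up for illustration.

```perl
use strict;
use warnings;

# Shortened stand-in for the real line, with the angle brackets restored.
my $line = '<a href="page.cfm?objectid=11933900&method=full&siteid=50144">Costly false alarms</a> '
         . '<a href="page.cfm?objectid=11933890&method=full&siteid=50144">Mindless yobs terrorise OAP\'s</a>';

# Greedy: .+ takes as much as it can, so the second capture
# runs all the way to the LAST </a> on the line.
my @greedy = $line =~ m!(<a[^>]*>)(.+</a[^>]*>)!i;

# Non-greedy .+? stops at the first </a>. In list context, /g returns
# the captures of every match flattened into one list:
# (tag, text, tag, text, ...).
my @captures = $line =~ m!(<a[^>]*>)(.+?)</a[^>]*>!ig;

# Regroup the flat list into [tag, text] pairs.
my @links;
while ( my ( $tag, $text ) = splice @captures, 0, 2 ) {
    push @links, [ $tag, $text ];
}

print "$_->[0] => $_->[1]\n" for @links;
```

The same pattern can also be used as `while ( $line =~ m!...!ig ) { ... }`, in which case `$1` and `$2` are refreshed on each pass through the loop.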
And yes I *know* I could be using a module to parse this, but this is the exact reason I'm doing it this way -- to better my understanding of regexes.
It also occurred to me to split this into an array and match it that way, but I was unsure what I could split on, given the varying number of spaces between tags. Any advice on that would be great too!
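On the split idea, one possibility (a sketch only, again assuming the angle brackets are present in the real data) is to split on the closing tag together with whatever whitespace follows it, so the number of spaces between links stops mattering. The sample `$line` and the names `@chunks` and `@titles` are invented for the example.

```perl
use strict;
use warnings;

# Two links from the data above, with an uneven run of spaces between them.
my $line = '<a href="page.cfm?objectid=11933879&method=full&siteid=50144">Road deaths</a>     '
         . '<a href="page.cfm?objectid=11933698&method=full&siteid=50144">Phone mast fears grow</a>';

# Split on the closing tag plus any run of whitespace after it.
# split discards trailing empty fields, so no empty chunk is left at the end.
my @chunks = split m!</a[^>]*>\s*!i, $line;

my @titles;
for my $chunk (@chunks) {
    # Each chunk looks like '<a href="...">Title'; keep what follows the tag,
    # trimmed of surrounding whitespace. Skip chunks that do not match.
    my ($title) = $chunk =~ m!<a[^>]*>\s*(.*\S)!i or next;
    push @titles, $title;
}

print "$_\n" for @titles;
```

This still uses a regex per chunk, but each one only has to deal with a single link, which makes the greediness question much easier to reason about.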
Thanks in advance!
Some people fall from grace. I prefer a running start...
Re: Regex isn't performing like I think it should
by chromatic (Archbishop) on Jun 12, 2002 at 04:13 UTC | |
|
Re: Regex isn't performing like I think it should
by jmarshall99 (Acolyte) on Jun 12, 2002 at 04:35 UTC | |
by Popcorn Dave (Abbot) on Jun 12, 2002 at 04:44 UTC | |
by jmarshall99 (Acolyte) on Jun 12, 2002 at 04:57 UTC | |
by Bird (Pilgrim) on Jun 12, 2002 at 22:13 UTC | |
by Popcorn Dave (Abbot) on Jun 12, 2002 at 22:21 UTC | |
by dsheroh (Monsignor) on Jun 12, 2002 at 16:46 UTC | |
|
Re: Regex isn't performing like I think it should
by u914 (Pilgrim) on Jun 12, 2002 at 05:40 UTC |