Re^2: extracting link *and* tag content from "a href"

I downvoted this because it doesn't actually work on html. It's a good try, but there are several cases it just misses, for example:

<a href='this>breaks>"'>maybe</a>
<a href=#>test</a>
<a href="/path/to/don't/use/this">omg</a>

(The last two are credited to perlygatekeeper in #perl on Freenode)

Your code produces:

URL: this>breaks
Name: "'>maybe

URL: #>
Name:  

URL: "/
Name:

I'm sure you could manage to fix these specific cases, but I seriously doubt you'll ever actually get to the point where it parses every type of valid HTML. And even if you do, what's the point? You just wasted X hours to do something that existing modules already do extremely well. This makes a decent learning exercise, but please do not suggest "home grown" regexen for such complicated tasks.
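To illustrate the "use an existing parser" point, here is a minimal sketch that runs the three problem links through a real HTML tokenizer. It uses Python's standard-library `html.parser` purely as a stand-in; in Perl the same job would go to a CPAN module such as HTML::TokeParser or HTML::LinkExtor, neither of which is named in the post above. The names `LinkCollector` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, already unquoted.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# The three links from the post that broke the regex.
SAMPLES = """<a href='this>breaks>"'>maybe</a>
<a href=#>test</a>
<a href="/path/to/don't/use/this">omg</a>"""

for url, name in extract_links(SAMPLES):
    print("URL:", url)
    print("Name:", name)
```

The parser, not the caller, decides where an attribute value ends, so the stray `>` and the embedded quote characters never reach the extraction logic.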


Replies are listed 'Best First'.
Re^3: extracting link and tag content from "a href" by bageler (Hermit) on Jul 19, 2004 at 20:33 UTC
Well, it worked on his examples :) What's the point? The point is to try to reinvent the wheel. Why would I want to reinvent the wheel? Why not, if I'm getting paid :) Then I learn things too, such as the mistakes you pointed out. Of course, I was working under the assumption that the links are valid HTML, which none of the examples that you or the thread author provided are. Anything not matching `[a-zA-Z0-9]`, such as quotes, angle brackets, etc., should be URL-encoded if put in a URL. In any case, you're right, it's still broken for some cases. Downvote away :)
Re^4: extracting link and tag content from "a href" by BUU (Prior) on Jul 19, 2004 at 20:41 UTC
That'll teach me to take the easy way out! Anyways, I'm glad we've agreed that it's broken =]. The second example is valid though, as far as I know. In the future, if you just say "This is a learning exercise, please use one of the modules", I won't have any problems.
Re^5: extracting link and tag content from "a href" by iburrell (Chaplain) on Jul 20, 2004 at 16:49 UTC
The second example is not valid HTML. The quotes are optional only when the value contains nothing but letters, numbers, periods, and hyphens, `0-9A-Za-z.-` basically. Browsers and some parsers will work around broken markup, but many won't.
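The rule as stated in this reply can be sketched as a one-line check. This is only an illustration of the character set given above (the HTML 4 specification also allows underscores and colons in unquoted values); the name `may_be_unquoted` is hypothetical.

```python
import re

# Character set taken from the reply: letters, digits, period, hyphen.
UNQUOTED_OK = re.compile(r"[0-9A-Za-z.-]+")


def may_be_unquoted(value):
    """True if value could be written without quotes under the stated rule."""
    return UNQUOTED_OK.fullmatch(value) is not None
```

Under this rule `href=#` fails the check because `#` is outside the allowed set, while a value like `index.html` passes.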