RE: Getting an IP out of a string

Don't try to search for the form of the IP address in the string. Instead, use the "http://" prefix and the URL delimiters in the regex:

($host) = m#http://([^/]+)#;   # list context, so $host gets the capture rather than true/false

That will also select a hostname if the URL is something like "http://foo.bar.com/foo".

HTML::Parser is a better solution, but the above would still work if you're forced into parsing non-conformant HTML.
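
As a minimal sketch of the URL-delimiter match above (the sample line and its address are made up for illustration), note that the match has to be evaluated in list context for the captured host to land in the variable; in scalar context the match only returns true or false:

```perl
use strict;
use warnings;

# Hypothetical sample input; the URL is invented for this example.
my $line = '<a href="http://192.168.0.1/index.html">router</a>';

# The parentheses around $host force list context, so the match
# returns its captured group instead of a true/false value.
my ($host) = $line =~ m#http://([^/]+)#;

print defined $host ? "$host\n" : "no URL found\n";
```

Because `[^/]+` stops only at the next slash, a URL with no trailing slash (for example one that ends at a closing quote) would capture more than the host. Tightening the class to something like `[^/"\s]+` is one possible adjustment, depending on the input.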


RE: RE: Getting an IP out of a string by the_slycer (Chaplain) on Aug 16, 2000 at 21:34 UTC
Problem is, it's not always going to have http://ipaddress in front, i.e. it's not always a link :-) Sometimes it will be surrounded by formatting tags, other times by link tags. Basically I'm looking through an HTML file for any IP address. I can do that no problem using hints from the node mentioned above, but that returns the whole line, and now I need to strip everything but the IP :-) Sorry if I wasn't clearer to start out with. Starting to play with HTML::Parser now.
RE: RE: RE: Getting an IP out of a string by knight (Friar) on Aug 17, 2000 at 00:15 UTC
The problem is that what is or is not an IP address really depends on the file semantics, not just on what an IP address "looks like." A simple regex match for dotted-quad "IP addresses" in arbitrary text will give you false positives. In the HTML source for one site I maintain, for example, such a regex would match the following: `1.41.1.1` Is it an IP address? Nope, it's an RCS version number, and blindly assuming it's an IP address would be wrong at best and dangerous at worst. That said, `@ip_addrs = m/{regex to match IP}/g;` will select only the "IP address" texts from $_ for whatever regex you choose, without the danger of relying on $1.
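
To make the list-context `/g` match above concrete, here is a rough sketch. The pattern in `$ip_re` is only one possible choice for "regex to match IP" (it is an assumption, not taken from the thread), and the sample HTML is invented. It restricts each octet to 0-255, yet it still picks up the RCS-style version number, which illustrates the false-positive point:

```perl
use strict;
use warnings;

# One octet in the range 0-255 (an assumed pattern, one of several in common use).
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;

# Four dot-separated octets that are not glued to a neighbouring digit or dot.
my $ip_re = qr/(?<![\d.])$octet(?:\.$octet){3}(?![\d.])/;

# Hypothetical sample text containing real-looking addresses, a version number,
# and an out-of-range quad.
my $html = '<b>10.0.0.1</b> and <a href="http://192.168.1.20/">link</a>, '
         . 'rev 1.41.1.1, bogus 999.1.1.1';

# With /g in list context and no capturing groups, the match returns
# every complete match as a list, so $1 is never consulted.
my @ip_addrs = $html =~ /$ip_re/g;

print "$_\n" for @ip_addrs;
```

With this input the list holds `10.0.0.1`, `192.168.1.20`, and `1.41.1.1`. The last one is syntactically a valid dotted quad, so no regex alone can tell it apart from a real address.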
RE: RE: RE: RE: Getting an IP out of a string by the_slycer (Chaplain) on Aug 17, 2000 at 19:33 UTC
Thank you, that is exactly what I needed. And yes, I prompt the user of the script to confirm that the strings it has picked out of an HTML file really do contain IP addresses, and are ones they would like updated in the future :-) i.e. I'm not relying on my shoddy coding alone.