in reply to Re: Re: Re: Untaint IP address/hostname question
in thread Untaint IP address/hostname question

It is a historical standard because it was implemented in the BSD inet_ntoa and copied into other implementations. It may even be standardized in POSIX.

No RFC describes the long form IP address. The RFCs I know that describe grammars for IPv4 addresses only support dotted quad form. This includes URLs.

You can see a few places where differing expectations create problems. For example, most web browsers parse the host portion out of an http URL and pass it to inet_aton, so they accept "long form" addresses even though the RFCs say they shouldn't. Scammers exploit this by writing URLs like http://www.example.com@0x7F000001/. They use the username field and an unexpected IP address syntax to hide the real destination.
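To make the trick concrete, here is a minimal sketch of how a URL parser splits that example. It uses Python's standard urllib.parse as a stand-in for a browser's URL parser (an assumption; browsers have their own parsers), and it only shows which part is the username and which part is the host.

```python
from urllib.parse import urlsplit

# The example URL from the post: the text before '@' looks like a host name,
# but it is only the username; the real host comes after the '@'.
url = "http://www.example.com@0x7F000001/"
parts = urlsplit(url)

username = parts.username   # the decoy that the reader sees first
hostname = parts.hostname   # what actually gets resolved (lowercased by urlsplit)

print("username:", username)
print("hostname:", hostname)
```

The host string that would be handed to inet_aton is the hex number, not www.example.com.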

Including the long form IP addresses in a regular expression makes it much more complicated. The regex has to match one to three components, each of which could be a decimal, hex, or octal number, just to accept a format that only a few people use.
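For a sense of how many cases are involved, here is a rough sketch of the classic BSD parsing rules written as plain code rather than a regex. The function names are made up for illustration, and the rules are my reading of the traditional inet_aton behaviour (one to four components, the last one filling the remaining bytes), not a copy of any particular libc.

```python
def parse_component(text):
    """Parse one component as C-style hex (0x...), octal (leading 0) or decimal."""
    if text.lower().startswith("0x"):
        digits, base = text[2:], 16
    elif len(text) > 1 and text.startswith("0"):
        digits, base = text[1:], 8
    else:
        digits, base = text, 10
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits.lower()):
        raise ValueError(f"bad component: {text!r}")
    return int(digits, base)


def parse_ipv4(text):
    """Return the address as a 32-bit integer, or raise ValueError."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"wrong number of components: {text!r}")
    values = [parse_component(part) for part in parts]
    *leading, last = values
    # Every component except the last is exactly one byte.
    if any(value > 0xFF for value in leading):
        raise ValueError(f"component out of range: {text!r}")
    # The last component fills whatever bytes are left (4, 3, 2 or 1).
    tail_bytes = 4 - len(leading)
    if last >= 1 << (8 * tail_bytes):
        raise ValueError(f"last component out of range: {text!r}")
    address = 0
    for value in leading:
        address = (address << 8) | value
    return (address << (8 * tail_bytes)) | last


def to_dotted_quad(address):
    """Format a 32-bit integer as a dotted quad."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for example in ("0x7F000001", "2130706433", "127.1", "0177.0.0.1"):
    print(example, "->", to_dotted_quad(parse_ipv4(example)))
```

All four examples name the same address, which is the point: a regex that accepts them has to encode every one of these branches.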

Re: Re: Re: Re: Re: Untaint IP address/hostname question
by Juerd (Abbot) on Mar 09, 2004 at 00:00 UTC

    It is a historical standard because it was implemented in the BSD inet_ntoa and copied into other implementations.

    As was the long decimal format.

    No RFC describes the long form IP address.

    As none describes the dotted quad form IP address.

    The RFCs I know that describe grammars for IPv4 addresses only support dotted quad form. This includes URLs.

    They are all protocols. Protocols using IP don't define IP. Note, by the way, that the RFC for URLs (1630) defines host as digits . digits . digits . digits, thus allowing 999.123.0.12345. It requires a dotted-quad decimal address, but it doesn't say anywhere that the address is an IPv4 address. (Or perhaps I missed that specification.)

    most web browsers parse out the host portion of the http URL and pass it to inet_aton.

    That is exactly what I suggest everyone should do. I've found it hard to find a tool on my Linux box that thinks 0x7F000001 is invalid. You talk about expectations. I think doing what other tools do lives up to people's expectations.

    Including the long form IP addresses in a regular expression makes them much more complicated.

    I'm suggesting that no regex be used.
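A minimal sketch of the no-regex approach, assuming it is acceptable to defer to the platform's own inet_aton. It is written with Python's socket module for illustration; the helper name normalize_ipv4 is hypothetical, and which long forms are accepted depends on the C library underneath.

```python
import socket


def normalize_ipv4(text):
    """Return the address as a dotted quad, or None if inet_aton rejects it.

    The system's inet_aton decides what counts as valid, so the result
    matches what other tools on the same machine accept.
    """
    try:
        packed = socket.inet_aton(text)
    except (OSError, TypeError, ValueError):
        return None
    # inet_ntoa always formats the 4 packed bytes as a dotted quad.
    return socket.inet_ntoa(packed)


for candidate in ("127.0.0.1", "0x7F000001", "not an address"):
    print(candidate, "->", normalize_ipv4(candidate))
```

Whatever comes back is either None or a plain dotted quad, so the rest of the program only ever has to deal with one format.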

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }