There are multiple ways to check an IP address and Regexp::Common::net would be one of the least efficient. It's rather missing the point of the task at hand.
A far better, far faster, far lighter option would be Data::Validate::IP but, like I said, this is not a request for how to validate an IP. It's a request about hacking crypto.
The null return value (that goes into the DB) is 100% about security and 0% about validity.
The presented strings come from NGINX so don't really need any validation. They come from the socket directing the traffic...
| [reply] |
"The presented stings come from NGINX so don't really need any validation."
And yet, you have code that identifies the IP type by a regexp, and throws if it is not "valid."
What do you think is "far better, far faster, far lighter" about
sub _slow_is_ipv4 {
shift if ref $_[0];
my $value = shift;
return undef unless defined($value);
my (@octets) = $value =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
return undef unless (@octets == 4);
foreach (@octets) {
return undef if $_ < 0 || $_ > 255;
return undef if $_ =~ /^0\d{1,2}$/;
}
return join('.', @octets);
}
in Data::Validate::IP (which has a dependency on NetAddr::IP)
as compared to the code in Regexp::Common::net?
Snarky reply to AnomalousMonk's attempt to offer assistance. Downvoted. (And the reply to bliako.)
The way forward always starts with a minimal test.
| [reply] [d/l] |
I'll preface with: I'm not a professional cryptographer. The closest I've gotten is understanding and implementing parts of a blockchain prototype.
So, I'll start with the basics. As has been pointed out, your IP validation is a bit sloppy. However, I'm going to take the opposite view to the previous answers: you don't really care. If you're getting this from your web server, it's 100% clean. The only possible use for validation is to tell you that your code is bad. Since we already know that IPv4 addresses have dots, whereas IPv6 addresses have colons, we can do this even more simply:
my $af = $ip =~ /\./ ? AF_INET : AF_INET6;
my $bytes = pack("H32 a* H32", $packing[0], inet_pton( $af, $ip ), $packing[7]);
return sha3_256_hex($bytes);
If it's invalid, inet_pton will likely barf, but that's fine, it's never going to be invalid because it's not coming from a user (unlike, say, the browser string).
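A minimal sketch of the packing step above, in case a hard failure on a bad address is wanted. As far as I know, Socket's inet_pton returns undef for a string it cannot parse rather than dying, so this version checks for that explicitly. The helper name pack_ip_bytes, the all-zero placeholder salt and the documentation-range addresses are assumptions for illustration, and the family test looks for a colon instead of a dot. The hashing call is left out so the sketch needs only core modules.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# Hypothetical helper: salt + binary IP + salt, as in the snippet above,
# but dying if inet_pton cannot parse the address.
sub pack_ip_bytes {
    my ($ip, $salt_a_hex, $salt_b_hex) = @_;

    # A colon can only appear in an IPv6 address.
    my $af  = $ip =~ /:/ ? AF_INET6 : AF_INET;
    my $bin = inet_pton($af, $ip);
    die "unparseable IP address: $ip\n" unless defined $bin;

    # H32 turns 32 hex digits into 16 bytes; a* appends the 4 or 16 address bytes.
    return pack("H32 a* H32", $salt_a_hex, $bin, $salt_b_hex);
}

my $salt = "00" x 16;    # placeholder only, never a real secret

print length(pack_ip_bytes("192.0.2.1",   $salt, $salt)), "\n";    # 36
print length(pack_ip_bytes("2001:db8::1", $salt, $salt)), "\n";    # 48
```

The returned bytes would then go to sha3_256_hex exactly as in the original snippet.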
Now on to your hashing choices. First off, your salts. You are using two salts - $packing[0] and $packing[7]. I'm not sure why. I've never seen anyone do that before. A single salt suffices. For your purposes, if I'm understanding this, you need to use the same salt for everything, which is unfortunate in that a rainbow table can be trivially produced if the salt is known and the algorithm is known. Your database must not, therefore, have the salt in it. That salt is the equivalent of a secret key, and must be guarded as such. (Thus, don't use the ones you just published here.) It does not go into a public post, a public git repository, or anything of the sort. It also does not get shared among the developers even in a private git repository. It gets stored separately, period.
If you could use a different salt for each key, that would change things, but then you could no longer correlate things based on browser/IP address since they'd result in different hashes.
Next, let's look at SHA-3. Wikipedia indicates that on the typical x64 hardware most of us are probably running, you're looking at about 12.6 cycles per byte for hashing. Your packed input is 48 bytes (two 16-byte salts plus a 16-byte IPv6 address), so that's about 600 cycles to hash it. My system runs at 3.7GHz, so that's over 6 million hashes per second. Per CPU (I have six, but an attacker would have many more). But IPv6 is a very large space, and combined with your browser string, this sounds like a lot of brute force required. Would a hacker be able to reduce this space? Yes, a lot.
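The back-of-envelope arithmetic above can be reproduced as a rough sketch. The three inputs are the figures quoted in this post (12.6 cycles per byte, 48 bytes, 3.7GHz); the function names are made up for illustration, and real throughput will vary with the implementation and hardware.

```perl
use strict;
use warnings;

# Cycles needed for one hash of an input of the given size.
sub cycles_per_hash {
    my ($cycles_per_byte, $input_bytes) = @_;
    return $cycles_per_byte * $input_bytes;
}

# Hashes per second on a single core at the given clock speed.
sub hashes_per_second {
    my ($cycles_per_byte, $input_bytes, $clock_hz) = @_;
    return $clock_hz / cycles_per_hash($cycles_per_byte, $input_bytes);
}

my $cycles = cycles_per_hash(12.6, 48);
my $rate   = hashes_per_second(12.6, 48, 3.7e9);

printf "%.1f cycles per hash\n", $cycles;                        # 604.8
printf "%.1f million hashes per second per core\n", $rate / 1e6; # 6.1
```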
First off, if I were to be looking for a specific person, there's a good chance I can figure out their IP address. And probably also their browser. So, if I get your secret key, I have a single hash to produce to find the key. However, if I don't have the key, I have to try all possible keys to look them up - though that only takes me about 3 minutes, if my math is working properly (I have 6 CPUs, 12 if you count hyperthreading). This is probably not a huge barrier. This is also why bitcoin rigs can cycle through so many hashes per second in an attempt to find the next block's solution.
Now, if I didn't know the browser string, I could probably guess it. There's not a lot of entropy here - "Mozilla", "Win32", ... there are only so many of these. Yes, some people can customise their browser strings, but almost no one (statistically speaking) does. So, knowing the IP address but not the secret key or the browser string, I will have to try all the possible secret keys times the number of browser strings I'm likely to encounter. If I really care, I just throw one machine at each browser string, and we're still talking minutes.
If I don't have the IP address either, now we're just brute-forcing the entire space (though we can still probably restrict the browser string to the likely culprits - we'll not decrypt everything, but we'll be able to use those to narrow everything down). But we still only have to do this once - once we find the secret keys on one hash, we have the secret keys on all hashes, and we can brute force everything else much faster.
What you probably want, then, is something that makes it cost-prohibitive to find the secret key in the first place. Instead of a handful of minutes, you need something where this takes days or weeks or months. When I worked on that blockchain, the solution to this was to switch the hashing to Argon2, and my work PC could only manage about 4 hashes per second (and that used all 4 CPUs). Now going through all possibilities of secret keys will take ~1.5 million times longer. It'll take your server a bit more time to generate the hash as well, but you only have to create it once per entry, not billions of times. (The hashing difficulty for the argon-based blockchain was very very low compared to, say, bitcoin's.)
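To put the slowdown in perspective, here is a small worked estimate using only the rough figures from this post: about 6 million SHA-3 hashes per second, about 4 Argon2 hashes per second, and a search that previously took about 3 minutes. The function names are hypothetical and the result is an order-of-magnitude illustration, not a benchmark.

```perl
use strict;
use warnings;

# How many times slower the expensive hash is than the cheap one.
sub slowdown_factor {
    my ($fast_rate, $slow_rate) = @_;
    return $fast_rate / $slow_rate;
}

# Duration in days of a search once it is slowed down by the given factor.
sub scaled_days {
    my ($seconds_before, $factor) = @_;
    return $seconds_before * $factor / 86_400;
}

my $factor = slowdown_factor(6_000_000, 4);

printf "slowdown: %.1f million times\n", $factor / 1e6;                         # 1.5
printf "a 3-minute search becomes about %d days\n", scaled_days(180, $factor);  # 3125
```

That is roughly eight and a half years on the same hardware, which is the kind of cost that pushes an attacker to look elsewhere.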
Other than that, I'm not seeing a lot of problem here. But, like I said, I'm not a professional cryptographer, so there may be something I'm missing as well.
| [reply] [d/l] [select] |
It's been a while since I wrote any serious perl, but I have some spreadsheets on the COVID-19 statistics which I update several times/day. And would rather automate that. So am looking at Net::Google::Spreadsheets::V4 in CPAN. Here is the link to the shared google sheets where I show these sheets:
Shared (RO) folder on Google Drive
So I will report back with questions as I attempt this. I am 79 1/2, and maybe should have learned Python 10 years ago - but perl is so much more comfortable. I was so pleased to find Perlmonks alive and well! Keep it up, friends.
Boyd
| [reply] |
Hello bliako,
Yes I'm aware of how to fake a browser string. Been doing it for years. If you check your server logs you may well see some amusing messages I left in the string...
The server itself runs a let's encrypt certificate but out in front of it is some Cloudflare proxy infrastructure. Last time I looked they were not banana republic operators given they proxy for 11.6% of the top 10 million websites on the Internet.
They are the man in the middle. For me, they issue a perfectly valid certificate. What country are you in? I'll VPN in and see if I can reproduce the issue.
Given who is doing the proxying, it's possible the MITM proxy issue lies with you, not us. Just a thought...
| [reply] |
I did not say the problem is with you. The problem is with my provider and I found it really weird that they presented me with a certificate issued to them. (perhaps that's how it works!)
Sure you are aware that a browser string can be faked/changed. But how are you going to sanitise it so that you can use it for checking uniqueness together with the IP, which in itself is not unique? For example, a given hospital may have the same IP for all personnel trying to report something to you.
| [reply] |
So the task to hand is to convert an IP and Browser String into a cryptographically secure hash that can not be reversed or revealed with a rainbow table.
The problem with IPv4 address space is that it's too damn small. I don't know about SHA-3, but there have been examples of people just going through all 2^32 possible addresses, concatenating them with a site-specific secret and computing SHA-1 hashes of them, effectively reversing the hashing process. It wouldn't have been much slower if each IP address was salted with its own nonce, either.
Use of much more computationally complex password hashing functions, such as bcrypt, PBKDF2, scrypt or Argon2, would slow down such attacks tremendously. It might also be simpler to just strip the least significant byte from the IP address and sidestep the whole hashing problem.
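Stripping the least significant byte could look something like the sketch below, which zeroes the last octet of an IPv4 address so that only the /24 network is kept. The helper name truncate_ipv4 is made up for illustration, the address comes from a documentation range, and IPv6 is deliberately not handled here (it would need its own choice of how many bits to keep).

```perl
use strict;
use warnings;
use Socket qw(inet_pton inet_ntop AF_INET);

# Hypothetical helper: zero the last octet of an IPv4 address.
sub truncate_ipv4 {
    my ($ip) = @_;

    my $packed = inet_pton(AF_INET, $ip);
    die "not a valid IPv4 address: $ip\n" unless defined $packed;

    # The packed form is 4 bytes; overwrite the fourth with zero.
    substr($packed, 3, 1, "\0");

    return inet_ntop(AF_INET, $packed);
}

print truncate_ipv4("203.0.113.77"), "\n";    # 203.0.113.0
```

Every address in the same /24 then maps to the same stored value, so an individual host can no longer be singled out while coarse correlation remains possible.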
These 2 identifiers will assist researchers in assessing if the crowdsourced data we are gathering is "gamed" or "believable".
Another problem with such datasets is that outsiders may have trouble believing the dataset even if it has a plausible distribution of IP addresses and User-Agent strings (which they wouldn't be sure of because all you would be able to offer them would be opaque hashes). What's to stop the site admins themselves (hypothetically, of course) from faking the data while retaining the IP addresses and the User-Agents, for example?
| [reply] |