Re: Filtering out IP addresses
by atcroft (Abbot) on Jul 22, 2014 at 22:14 UTC
This appears to do what you expect, although it replaces all octets but the last with '###' (rather than based on the number of digits in the octet):
s/(([0-2]?\d{1,2}\.){3})([0-2]?\d{1,2})/###.###.###.$3/g;
Code to test regex:
$ perl -le 'foreach my $c ( q{123.45.67.89 123.45.67.98},
    qw/ 0.0.0.0 10.1.2.3 172.001.002.003 192.168.128.1 255.255.255.255 / ) {
    my $d = $c;
    $d =~ s/(([0-2]?\d{1,2}\.){3})([0-2]?\d{1,2})/###.###.###.$3/g;
    print $c, q{ -> }, $d;
}'
Test output:
123.45.67.89 123.45.67.98 -> ###.###.###.89 ###.###.###.98
0.0.0.0 -> ###.###.###.0
10.1.2.3 -> ###.###.###.3
172.001.002.003 -> ###.###.###.003
192.168.128.1 -> ###.###.###.1
255.255.255.255 -> ###.###.###.255
Hope that helps.
Re: Filtering out IP addresses
by 1s44c (Scribe) on Jul 22, 2014 at 23:27 UTC
It might be fun to write regular expressions but it's also reinventing the wheel. Regular expressions for IPs can be found in Regexp::Common.
perl -MRegexp::Common='net' -n -e 'm/$RE{net}{IPv4}{-keep}/ && print "xxx.xx.xx.$5\n"'
Great idea. Let's generalize to respect the length of each field and make the substitutions in the original text.
use strict;
use warnings;
use Regexp::Common qw /net/;
my $string ='194.66.82.11';
$string =~
    s/$RE{net}{IPv4}{-keep}
     /'x'x(length $2)
      . '.'
      . 'x'x(length $3)
      . '.'
      . 'x'x(length $4)
      . '.'
      . $5
     /xge;
print $string;
Re: Filtering out IP addresses
by AppleFritter (Vicar) on Jul 22, 2014 at 21:33 UTC
$ perl -nle 's/([0-9]?\d\d?|2[0-4]\d|25[0-5])\.([0-9]?\d\d?|2[0-4]\d|25[0-5])\.([0-9]?\d\d?|2[0-4]\d|25[0-5])\.([0-9]?\d\d?|2[0-4]\d|25[0-5])/xxx.xxx.xxx.$4/; print' filename
Depending on what your file contains, this can be simplified a lot, though. If it's e.g. one IP address per line, the following will also be fine:
$ perl -nle 's/\d+\./xxx./g; print' filename
But in the presence of other data, it may or may not work as intended. Also note that it doesn't care whether an IP address has four fields, whether each field ranges from 0 to 255, or even whether a field consists of no more than three decimal digits.
When in doubt, I'd suggest using the first one, even if it's a bit unwieldy.
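To make the difference concrete, here is a small sketch that runs both substitutions over a line containing something other than an IP address. The sub names and the sample line are made up for illustration; the regexes are the two from above.

```perl
use strict;
use warnings;

# Hypothetical helper names; neither appears in the one-liners above.

# Short form: masks every run of digits that is followed by a dot.
sub mask_simple {
    my ($line) = @_;
    $line =~ s/\d+\./xxx./g;
    return $line;
}

# Long form: only touches four dot-separated numeric fields.
sub mask_four_fields {
    my ($line) = @_;
    my $octet = qr/[0-9]?\d\d?|2[0-4]\d|25[0-5]/;
    $line =~ s/($octet)\.($octet)\.($octet)\.($octet)/xxx.xxx.xxx.$4/;
    return $line;
}

my $line = 'kernel 2.6.32 seen from 10.0.0.1';   # made-up sample data
print mask_simple($line), "\n";        # kernel xxx.xxx.32 seen from xxx.xxx.xxx.1
print mask_four_fields($line), "\n";   # kernel 2.6.32 seen from xxx.xxx.xxx.1
```

The short form also eats the version number, which is the kind of surprise to expect when the file holds more than bare addresses.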
perl -nle 's/\d+\./xxx./g; print' filename
As long as we're writing one-liners, that should probably be
perl -ple 's/\d+\./xxx./g' filename
See -p in perlrun.
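For anyone who hasn't met -p before, this is roughly the loop it wraps around your code, written out by hand as a sketch. The sub name mask_lines and the in-memory filehandles are only there so it can be run without a file; the while/continue shape follows what perlrun documents for -p, and the chomp and output separator stand in for -l.

```perl
use strict;
use warnings;

# mask_lines is a hypothetical name; it spells out by hand roughly what
#   perl -ple 's/\d+\./xxx./g'
# does to each line of its input.
sub mask_lines {
    my ($in, $out) = @_;
    local $\ = "\n";              # -l: print() appends a newline
    local $_;
    while (<$in>) {
        chomp;                    # -l: strip the newline on input
        s/\d+\./xxx./g;           # the one-liner's own code
    }
    continue {
        # -p: print $_ after every line, whether or not it changed
        print {$out} $_ or die "-p destination: $!\n";
    }
    return;
}

# Made-up sample input, read from and written to strings instead of files.
my $input = "10.0.0.1\n192.168.1.20\n";
open my $in,  '<', \$input     or die "cannot read from string: $!";
open my $out, '>', \my $masked or die "cannot write to string: $!";
mask_lines($in, $out);
close $out;
print $masked;   # xxx.xxx.xxx.1 and xxx.xxx.xxx.20, one per line
```

With -n you get the same loop without the continue block, which is why the original needed an explicit print.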
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Oh, nice, I'd not been aware of -p yet. Thanks for the pointer!
Thank you, that was a really helpful answer. We need people like you who take time out of their day to help a stranger. Perl Monks is the best!
Awww shucks, sugarcube. *tips hat* You're very welcome!
Re: Filtering out IP addresses
by AnomalousMonk (Archbishop) on Jul 22, 2014 at 23:26 UTC
Another approach. Identifies 'real' decimal octet IP addresses. Matches 'x' replacements to actual octet digits if that's really what you want to do.
c:\@Work\Perl\monks>perl -wMstrict -le
"use Regexp::Common qw(net);
;;
for my $s (
'1.2.3.4', '1.22.111.222',
'a1.2.3.4a b1.22.111.221b',
'999.9.9.9', '9.9.9.999',
) {
printf qq{'$s' -> };
(my $t = $s) =~
s{ (?<! \d) ($RE{net}{IPv4}) (?! \d) }
{ (my $ip = $1) =~ s{ \d (?= \d* [.]) }{x}xmsg; $ip; }xmsge;
print qq{'$t'};
}
"
'1.2.3.4' -> 'x.x.x.4'
'1.22.111.222' -> 'x.xx.xxx.222'
'a1.2.3.4a b1.22.111.221b' -> 'ax.x.x.4a bx.xx.xxx.221b'
'999.9.9.9' -> '999.9.9.9'
'9.9.9.999' -> '9.9.9.999'
Note that if you're using Perl 5.14+, the /r substitution modifier makes the expression a bit simpler.
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.014;
;;
use Regexp::Common qw(net);
;;
for my $s (
'1.2.3.4', '1.22.111.222',
'a111.22.3.4a b1.22.111.221b',
'999.9.9.9', '9.9.9.999',
) {
printf qq{'$s' -> };
my $t = $s =~ s{ (?<! \d) ($RE{net}{IPv4}) (?! \d) }
{ $1 =~ s{ \d (?= \d* [.]) }{x}xmsgr }xmsger;
print qq{'$t'};
}
"
'1.2.3.4' -> 'x.x.x.4'
'1.22.111.222' -> 'x.xx.xxx.222'
'a111.22.3.4a b1.22.111.221b' -> 'axxx.xx.x.4a bx.xx.xxx.221b'
'999.9.9.9' -> '999.9.9.9'
'9.9.9.999' -> '9.9.9.999'
Re: Filtering out IP addresses
by kennethk (Abbot) on Jul 22, 2014 at 22:15 UTC
In general, I agree with AppleFritter's comment; a regular expression is developed according to the environment in which it must operate. With regard to the code you've posted, I wouldn't think
([0-9]?\d\d?|2[0-4]\d|25[0-5])
would be a good way to describe an octet of an IP address, because [0-9] is (barring Unicode) equivalent to \d, so the first alternative already matches any one to three digits. Therefore, your first term swallows your following two. You probably want something more like (1?\d\d?|2[0-4]\d|25[0-5])
but this still allows 00, which you may or may not care about.
So how about
perl -nle 'print if s/((1?\d\d?|2[0-4]\d|25[0-5])\.){3}(?=\d+)/xxx.xxx.xxx./g'
It misses the spec on matching the number of x's to each octet's digit count, but if you are scrubbing, you should probably be scrubbing that too.
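To see the point about the first term swallowing the other two, here is a quick sketch that tests single fields against both octet patterns. The anchors and the sub names are additions for the demonstration, not part of either pattern as posted.

```perl
use strict;
use warnings;

# Hypothetical helpers: each checks one whole field against one octet pattern.
# The ^...$ anchors are added here so a field must match in full.
sub is_loose_octet { return $_[0] =~ /^(?:[0-9]?\d\d?|2[0-4]\d|25[0-5])$/ ? 1 : 0 }
sub is_tight_octet { return $_[0] =~ /^(?:1?\d\d?|2[0-4]\d|25[0-5])$/     ? 1 : 0 }

for my $field (qw(7 00 199 255 256 999)) {
    printf "%-3s  loose=%d  tight=%d\n",
        $field, is_loose_octet($field), is_tight_octet($field);
}
```

The loose pattern accepts all six fields, including 256 and 999, since its first alternative is just "one to three digits". The tighter one rejects 256 and 999 but, as noted, still lets 00 through.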
Re: Filtering out IP addresses
by wjw (Priest) on Jul 23, 2014 at 08:34 UTC
This seems to work as well, though obviously not a one-liner.
#!/usr/bin/perl
use Modern::Perl qw(2014);
while (<DATA>) {
    my @octets = split('\.',$_);
    chomp @octets;
    for (0..2) {
        $octets[$_] =~ s/\d/x/g;
    }
    say join(".", @octets);
}
__DATA__
1.2.3.4
192.168.0.1
255.255.255.128
23.65.98.101
Outputs:
x.x.x.4
xxx.xxx.x.1
xxx.xxx.xxx.128
xx.xx.xx.101
I suppose for further obfuscation one could just stick 3 x's in each of the first three octets and append the last octet of the IP to that string, thus hiding the format...
...just a thought...
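A minimal sketch of that fixed-width variant, assuming each input is a single dotted-quad address with nothing else on the line (mask_fixed is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: always emits three x's per masked octet, so the
# original digit counts are not revealed. Does no validation of the input.
sub mask_fixed {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return join '.', ('xxx') x 3, $octets[-1];
}

print mask_fixed($_), "\n" for qw(1.2.3.4 192.168.0.1 255.255.255.128 23.65.98.101);
```

For the four sample addresses this prints xxx.xxx.xxx.4, xxx.xxx.xxx.1, xxx.xxx.xxx.128 and xxx.xxx.xxx.101.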
...the majority is always wrong, and always the last to know about it...
Insanity: Doing the same thing over and over again and expecting different results...
A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact