
Thanks for the quick replies. Here is the code that I have at the moment. This is currently working and I get the desired results (except for the fact that I get duplicates). I have also updated the code with some of your recommendations.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my $DIRECTORY = "/Users/data/";

    find(\&edits, $DIRECTORY);

    sub edits() {
        if ( -f and /.txt$/ ) {    #Find files ending in .txt and drill down all sub dirs
            my $TEXT_FILE = $_;    #save the results to $_;
            open MATCHING_FILE, $TEXT_FILE;
            my @all_lines = <MATCHING_FILE>;    #Place everything into an array call all_lines
            close MATCHING_FILE;
            for my $each_line ( @all_lines ) {
                if ( $each_line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}|password|(ssn=)/i ) {    #Search for IP or password or ssn=
                    #print $each_line, "Found in $File::Find::name\n";
                    print $each_line;    # Print each line that is found
                }
            }
        }
    }
Results from all files in a directory:

    192.168.1.1
    192.168.1.1
    192.168.1.1
    64.22.34.66
    221.245.23.44
    PASSWORD=FpnmRjE
What I want to do is stop the duplicate 192.168.1.1 lines from being printed, along with any other duplicates that show up.
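A common way to drop duplicates in Perl is to remember every line already printed in a hash and skip any line that is already there. The following is only a minimal sketch of that idea: the `@matches` array is hypothetical sample data standing in for the lines the script above prints, and the `%seen` and `@unique` names are illustrative, not taken from the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the matched lines.
my @matches = (
    "192.168.1.1\n",
    "192.168.1.1\n",
    "192.168.1.1\n",
    "64.22.34.66\n",
    "221.245.23.44\n",
    "PASSWORD=FpnmRjE\n",
);

my %seen;      # line => how many times it has been seen so far
my @unique;    # first occurrence of each line, in original order

for my $each_line (@matches) {
    # Post-increment returns the old count, which is 0 (false)
    # only the first time a given line is encountered.
    next if $seen{$each_line}++;
    push @unique, $each_line;
}

print @unique;
```

In the posted script the same test could presumably guard the `print` inside the loop, provided `%seen` is declared outside `edits` so that it persists across all the files visited.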

Re^3: Sort/Uniq Help
by Corion (Patriarch) on Mar 17, 2008 at 18:13 UTC
Re^3: Sort/Uniq Help
by halfcountplus (Hermit) on Mar 17, 2008 at 23:53 UTC
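    One quick note from glancing at this:

  • Use /\.txt$/ (not /.txt$/) to be exact. The latter will probably work in this context, but consider the difference: \. matches only a literal dot, while an unescaped . matches any character except a newline, so /.txt$/ also accepts a name such as "notes_txt".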
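To make the difference between `.` and `\.` concrete, here is a small sketch that tries both patterns against a few made-up file names. The names and the `%result` hash are hypothetical and only serve the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names used only for illustration.
my @names = ( 'report.txt', 'notes_txt', 'report.txt.bak' );

my %result;
for my $name (@names) {
    $result{$name} = {
        unescaped => ( $name =~ /.txt$/  ? 1 : 0 ),    # . matches any character
        escaped   => ( $name =~ /\.txt$/ ? 1 : 0 ),    # \. matches a literal dot only
    };
    printf "%-15s unescaped: %d  escaped: %d\n",
        $name, $result{$name}{unescaped}, $result{$name}{escaped};
}
```

Only `notes_txt` is treated differently by the two patterns: the unescaped version accepts it, the escaped one does not.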
Re^3: Sort/Uniq Help
by poolpi (Hermit) on Mar 18, 2008 at 09:48 UTC

    You can simplify your long regexp with:

        use Regexp::Common qw /net/;

        /\A $RE{net}{IPv4} | password | (ssn=) \z/xmi;

    hth,

    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb

    Update: Oops, if it's a succession of alternatives: moritz++ ;)
    I also forgot the /m

      I know that TheDamian recommends character classes to escape chars in regexes (in PBP), but it's generally a bad idea because it will disable some optimizations (at least in older versions of perl, don't know about current ones).

      Also \| is shorter than [|], and thus less noise that your brain has to parse.

      But in the original post the | isn't escaped at all, so you're actually modifying the behaviour of the regex.
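The behaviour change can be seen with a short sketch: an unescaped `|` separates alternatives, whereas `\|` and `[|]` both require a literal pipe character in the input. The sample lines and variable names below are hypothetical, chosen only to show which pattern matches what.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input lines used only for illustration.
my @lines = ( '10.0.0.1', 'password=secret', 'ssn=123', '10.0.0.1|password|ssn=' );

my $ip = qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

my $alternation  = qr/$ip|password|ssn=/i;        # IP OR password OR ssn=
my $escaped_pipe = qr/$ip\|password\|ssn=/i;      # IP, literal |, password, literal |, ssn=
my $class_pipe   = qr/$ip[|]password[|]ssn=/i;    # same meaning as the escaped form

my %hits;
for my $line (@lines) {
    $hits{alternation}++  if $line =~ $alternation;
    $hits{escaped_pipe}++ if $line =~ $escaped_pipe;
    $hits{class_pipe}++   if $line =~ $class_pipe;
}

printf "%-13s matched %d of %d lines\n", $_, $hits{$_} || 0, scalar @lines
    for qw( alternation escaped_pipe class_pipe );
```

The alternation matches every sample line, while the two literal-pipe forms match only the last one, which contains actual `|` characters.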

        Out of curiosity:

        This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

            #!/usr/bin/perl
            use strict;
            use warnings;
            use Regexp::Common qw /net/;
            use Benchmark qw( cmpthese );

            my $line = q{127.0.0.1};

            cmpthese -10, {
                RE      => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmi',
                RE_O    => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmio',
                ORIG    => '$line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\|password\|(ssn=)/i',
                RE_CHAR => 'use charnames qw( :full); $line =~ /\A $RE{net}{IPv4} \N{LINE TABULATION} password \N{LINE TABULATION} (ssn=) \z/xmi'
            };
                          Rate RE_CHAR      RE    RE_O   ORIG
            RE_CHAR    17366/s      --     -2%     -2%  -100%
            RE         17704/s      2%      --     -0%  -100%
            RE_O       17747/s      2%      0%      --  -100%
            ORIG    12717477/s  73132%  71732%  71561%     --
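One caveat that may be worth checking when reading these numbers: Benchmark evaluates code passed as a string inside its own module, so a lexical declared with `my` in the calling script (here `$line`) is, as far as I know, not visible to that code, and the snippets may end up matching against an undefined package variable instead. A hedged way around this is to pass code references, which close over the lexical. The sketch below assumes a hand-written IPv4 pattern as a stand-in for `$RE{net}{IPv4}` so that it needs core modules only; the labels and the iteration count are arbitrary choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( timethese cmpthese );

my $line = q{127.0.0.1};

# Hand-written stand-in for an IPv4 matcher (an assumption, not Regexp::Common).
my $ipv4 = qr/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

my $alternation  = qr/$ipv4|password|ssn=/i;      # unescaped |: alternatives
my $literal_pipe = qr/$ipv4\|password\|ssn=/i;    # escaped |: literal pipes

# Code references are closures, so they see the lexical $line.
my $results = timethese(
    500_000,
    {
        alternation  => sub { my $hit = $line =~ $alternation },
        literal_pipe => sub { my $hit = $line =~ $literal_pipe },
    },
    'none',    # suppress the per-test timing lines
);

cmpthese($results);
```

Note that the two patterns do different work on this input (one matches, the other fails), so the comparison says little about `\|` versus `[|]` as such.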

        PooLpi
