learningperl01 has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone. I'm hoping someone can shed some light on the following problem. I have the script below, and I would like it to display only unique results when the regex matches. I am not sure what I am doing wrong; I am pretty new to deduplicating results and have tried several approaches, all without success. Thanks in advance for the help.
use File::Find;

$DIRECTORY = "/Users/data";
find(\&edits, $DIRECTORY);

sub edits() {
    if ( -f and /.txt$/ ) {
        $TEXT_FILE = $_;
        open MATCHING_FILE, $TEXT_FILE;
        @all_lines = <MATCHING_FILE>;
        close MATCHING_FILE;
        for $each_line ( @all_lines ) {
            if ( $each_line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}|password|(ssn=)/i ) {
                @results = $each_line;
                %hashTemp = map { @results => 1 } @results;
                %array_out = sort keys %hashTemp;
                print @array_out;
            }
        }
    }
}

Replies are listed 'Best First'.
Re: Sort/Uniq Help
by moritz (Cardinal) on Mar 17, 2008 at 16:59 UTC
    use strict;
    use warnings;
    ...
    open MATCHING_FILE, $TEXT_FILE;
    my %seen;
    while (my $file = <MATCHING_FILE>) {
        chomp $file;
        if ( $file =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}|password|(ssn=)/i ) {
            $seen{$file}++;
        }
    }
    print "$_\n" for keys %seen;
    Your code won't work because you're assigning to %hashTemp afresh on each iteration, which discards all previous entries.

    Update: fixed copy&paste error, Roy Johnson++
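
    A small self-contained demonstration of that difference (the sample lines are hypothetical, not from the OP's files):

    ```perl
    use strict;
    use warnings;

    my @lines = ("192.168.1.1\n", "10.0.0.1\n", "192.168.1.1\n");

    # Reassigning the hash on every iteration keeps only the last line seen:
    my %hashTemp;
    for my $line (@lines) {
        %hashTemp = map { $_ => 1 } ($line);    # clobbers previous keys
    }
    print scalar(keys %hashTemp), "\n";         # 1

    # Incrementing a counter accumulates across iterations instead:
    my %seen;
    for my $line (@lines) {
        $seen{$line}++;
    }
    print scalar(keys %seen), "\n";             # 2
    ```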

Re: Sort/Uniq Help
by kyle (Abbot) on Mar 17, 2008 at 17:04 UTC

    I can't tell what you're trying to do here, so it's hard to say what to suggest. However, a couple of things stand out.

    @results = $each_line;
    %hashTemp = map { @results => 1 } @results;

    The first line there is trying to put a scalar into an array. You'll have an array with one element, which may not be what you want. If you want to add to the array, look at push and unshift.

    The second line seems to be trying to get the unique elements from @results, but the map block is wrong for that.

    This may be what you're shooting for:

    push @results, $each_line;
    %hashTemp = map { $_ => 1 } @results;

    However, I'm guessing that everything after the push should be outside the for loop, and maybe outside sub edits. That depends on what you're ultimately trying to accomplish.

    Also, I think sub edits() could be sub edits. The former creates a sub with a prototype, and you probably don't want that.

    Finally, I think it would be a really good idea to use strict and warnings.
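
    A sketch combining those points, reusing the thread's variable names, with the hash built once after the loop rather than inside it (the sample lines are hypothetical):

    ```perl
    use strict;
    use warnings;

    my @all_lines = ("password=abc\n", "192.168.1.1\n", "password=abc\n", "nothing\n");
    my @results;

    for my $each_line (@all_lines) {
        if ( $each_line =~ /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|password|ssn=/i ) {
            push @results, $each_line;          # collect every match
        }
    }

    # Build the hash once, outside the loop, then print the unique matches:
    my %hashTemp = map { $_ => 1 } @results;
    my @array_out = sort keys %hashTemp;
    print @array_out;
    ```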

Re: Sort/Uniq Help
by Roy Johnson (Monsignor) on Mar 17, 2008 at 17:08 UTC
    There are a lot of weird things in that code, which makes it hard to figure out how to fix it. If you'd put comments in to indicate what you expect each line or tightly-related group of lines to do, it would be easier. It might also cause you to see some things that don't make sense.

    I think, at least, that you want @results = $each_line to be push @results, $each_line. And %array_out should be @array_out. All the hash processing most likely goes after the for loop.

    use strict; use warnings;
    would be your friends here, as well.

    Caution: Contents may have been coded under pressure.
      Thanks for the quick replies. Here is the code that I have at the moment. This is currently working and I get the desired results (except for the fact that I get duplicates). I have also updated the code with some of your recommendations.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Find;

      my $DIRECTORY = "/Users/data/";
      find(\&edits, $DIRECTORY);

      sub edits() {
          if ( -f and /.txt$/ ) {    # Find files ending in .txt and drill down all sub dirs
              my $TEXT_FILE = $_;    # save the results to $_;
              open MATCHING_FILE, $TEXT_FILE;
              my @all_lines = <MATCHING_FILE>;    # Place everything into an array called all_lines
              close MATCHING_FILE;
              for my $each_line ( @all_lines ) {
                  if ( $each_line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}|password|(ssn=)/i ) {
                      # Search for IP or password or ssn=
                      #print $each_line, "Found in $File::Find::name\n";
                      print $each_line;    # Print each line that is found
                  }
              }
          }
      }
      Results from all files in a directory:

      192.168.1.1
      192.168.1.1
      192.168.1.1
      64.22.34.66
      221.245.23.44
      PASSWORD=FpnmRjE
      What I want to do is stop the duplicate 192.168.1.1 lines, along with all the other duplicates that show up, from being printed.
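
      One way to get there, as a sketch: keep the same regex, but collect matches in a %seen hash inside the wanted sub and print once after find() returns. The -d guard is added here only so the sketch runs even where /Users/data/ does not exist:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Find;

      my $DIRECTORY = "/Users/data/";
      my %seen;

      find( \&edits, $DIRECTORY ) if -d $DIRECTORY;

      # Every distinct match prints exactly once, after the whole tree is scanned:
      print for sort keys %seen;

      sub edits {
          return unless -f and /\.txt$/;
          open my $fh, '<', $_ or return;
          while ( my $line = <$fh> ) {
              $seen{$line}++
                  if $line =~ /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|password|ssn=/i;
          }
          close $fh;
      }
      ```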
        one quick note glancing at this:

      • use /\.txt$/ (not /.txt$/) to be exact... although the latter will probably work in this context, consider the difference between \. (a literal dot) and . (any character)
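
      The difference in one quick test (the filenames are hypothetical):

      ```perl
      use strict;
      use warnings;

      # Unescaped '.' matches any character, so a name merely ending in "txt" matches:
      print "notatxt"  =~ /.txt$/  ? "match\n" : "no match\n";    # match
      print "notatxt"  =~ /\.txt$/ ? "match\n" : "no match\n";    # no match
      print "file.txt" =~ /\.txt$/ ? "match\n" : "no match\n";    # match
      ```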

        You can simplify your long regexp with:

        use Regexp::Common qw /net/;
        /\A $RE{net}{IPv4} | password | (ssn=) \z/xmi;

        hth,

        PooLpi

        'Ebry haffa hoe hab im tik a bush'. Jamaican proverb

        Update: Oops, if it's a succession of alternatives: moritz++ ;)
        I also forgot the /m
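
        Taking the update into account, a possible corrected form: the alternatives are left unanchored since the OP matches anywhere in the line. The sample line is hypothetical, and this assumes Regexp::Common is installed:

        ```perl
        use strict;
        use warnings;
        use Regexp::Common qw(net);

        my $line = "host=192.168.1.1\n";

        # $RE{net}{IPv4} replaces the hand-rolled octet pattern; /x allows the spacing.
        if ( $line =~ / $RE{net}{IPv4} | password | ssn= /xi ) {
            print "matched\n";
        }
        ```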