PerlMonks  

Re: Sort/Uniq Help

by Roy Johnson (Monsignor)
on Mar 17, 2008 at 17:08 UTC ( [id://674603]=note )


in reply to Sort/Uniq Help

There are a lot of weird things in that code, which makes it hard to figure out how to fix it. If you'd put comments in to indicate what you expect each line or tightly-related group of lines to do, it would be easier. It might also cause you to see some things that don't make sense.

I think, at least, that you want @results = $each_line to be push @results, $each_line. And %array_out should be @array_out. All the hash processing most likely goes after the for loop.

use strict; use warnings;
would be your friends here, as well.
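To illustrate the first suggestion, here is a minimal sketch. The sample lines and the two helper subs are made up for the example and are not from the original post. Assigning a scalar to an array replaces the whole array with a one-element list, whereas push appends to it.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for lines read from a file.
my @sample = ("alpha\n", "beta\n", "gamma\n");

# Assignment in a loop: each pass overwrites the array with a one-element list.
sub collect_by_assignment {
    my @results;
    for my $each_line (@_) {
        @results = $each_line;
    }
    return @results;
}

# push in a loop: each pass appends, so every line is kept.
sub collect_by_push {
    my @results;
    for my $each_line (@_) {
        push @results, $each_line;
    }
    return @results;
}

my @kept_assign = collect_by_assignment(@sample);
my @kept_push   = collect_by_push(@sample);
printf "assignment kept %d line(s), push kept %d line(s)\n",
    scalar @kept_assign, scalar @kept_push;
```

With the assignment form only the last line survives the loop, which is probably not what was intended.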

Caution: Contents may have been coded under pressure.

Re^2: Sort/Uniq Help
by learningperl01 (Beadle) on Mar 17, 2008 at 18:09 UTC
    Thanks for the quick replies. Here is the code that I have at the moment. This is currently working and I get the desired results (except for the fact that I get duplicates). I have also updated the code with some of your recommendations.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my $DIRECTORY = "/Users/data/";
    find(\&edits, $DIRECTORY);

    sub edits() {
        if ( -f and /.txt$/ ) {    #Find files ending in .txt and drill down all sub dirs
            my $TEXT_FILE = $_;    #save the results to $_;
            open MATCHING_FILE, $TEXT_FILE;
            my @all_lines = <MATCHING_FILE>;    #Place everything into an array call all_lines
            close MATCHING_FILE;
            for my $each_line ( @all_lines ) {
                if ( $each_line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}|password|(ssn=)/i ) {    #Search for IP or password or ssn=
                    #print $each_line, "Found in $File::Find::name\n";
                    print $each_line;    # Print each line that is found
                }
            }
        }
    }
    Results from all files in a directory
    192.168.1.1
    192.168.1.1
    192.168.1.1
    64.22.34.66
    221.245.23.44
    PASSWORD=FpnmRjE
    What I want to do is stop the duplicate 192.168.1.1 lines from being printed, along with any other duplicates that show up.
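One common Perl idiom for this is a %seen hash that lets a line through only the first time it appears. The sketch below assumes duplicates should be dropped across all input and that two lines are duplicates only when they are identical strings. The sub name unique_lines and the sample data are illustrative and are not part of the script above.

```perl
use strict;
use warnings;

# Return the input lines with duplicates removed, keeping first-seen order.
sub unique_lines {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a line is met, so !0 keeps it;
    # on later encounters the count is already positive and the line is dropped.
    return grep { !$seen{$_}++ } @_;
}

# Sample data modelled on the results shown in the post.
my @matches = (
    "192.168.1.1\n", "192.168.1.1\n", "192.168.1.1\n",
    "64.22.34.66\n", "221.245.23.44\n", "PASSWORD=FpnmRjE\n",
);

print for unique_lines(@matches);
```

In the posted script, the same idea would probably mean declaring my %seen; before the call to find (so it is shared across files) and changing print $each_line; to print $each_line unless $seen{$each_line}++;.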
      One quick note from glancing at this:

      • Use /\.txt$/ (not /.txt$/) to be exact. Although the latter will probably work in this context, consider the difference between \. and .
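A small sketch of that difference, using made-up file names picked so the two patterns disagree (the helper subs are hypothetical as well): an unescaped dot matches any character except a newline, so a name that merely ends in "txt" slips through.

```perl
use strict;
use warnings;

# Hypothetical file names chosen to show where the two patterns disagree.
my @names = ("notes.txt", "mytxt", "report.txt.bak");

# Unescaped dot: any single character (except newline) followed by "txt" at the end.
sub matches_loose  { return $_[0] =~ /.txt$/  ? 1 : 0 }

# Escaped dot: a literal "." followed by "txt" at the end.
sub matches_strict { return $_[0] =~ /\.txt$/ ? 1 : 0 }

for my $name (@names) {
    printf "%-16s loose=%d strict=%d\n",
        $name, matches_loose($name), matches_strict($name);
}
```

Only "mytxt" is treated differently: the loose pattern accepts it, the strict one does not.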

      You can simplify your long regexp with:

      use Regexp::Common qw /net/;
      /\A $RE{net}{IPv4} | password | (ssn=) \z/xmi;

      hth,

      PooLpi

      'Ebry haffa hoe hab im tik a bush'. Jamaican proverb

      Update: Oops, if it's a succession of alternatives: moritz++ ;)
      I also forgot the /m.

        I know that TheDamian recommends character classes to escape chars in regexes (in PBP), but it's generally a bad idea because it will disable some optimizations (at least in older versions of perl, don't know about current ones).

        Also \| is shorter than [|], and thus less noise that your brain has to parse.

        But in the original post the | isn't escaped at all, so you're actually modifying the behaviour of the regex.
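To make that last point concrete, here is a hedged sketch. The patterns are cut down from the one in the thread and the helper subs are invented for the example. An unescaped | means alternation, while \| and [|] both match a literal pipe character, so swapping one for the other changes which lines match.

```perl
use strict;
use warnings;

# Unescaped |: alternation, matches a line containing either word.
sub matches_alternation  { return $_[0] =~ /password|ssn=/i   ? 1 : 0 }

# Escaped \|: requires a literal pipe character between the two words.
sub matches_literal_pipe { return $_[0] =~ /password\|ssn=/i  ? 1 : 0 }

# Character class [|]: also a literal pipe, just spelled differently.
sub matches_class_pipe   { return $_[0] =~ /password[|]ssn=/i ? 1 : 0 }

# Sample line taken from the results shown earlier in the thread.
my $line = "PASSWORD=FpnmRjE";
printf "alternation=%d literal=%d class=%d\n",
    matches_alternation($line),
    matches_literal_pipe($line),
    matches_class_pipe($line);
```

The sample line is found by the alternation form only; the two literal-pipe forms would need the text "password|ssn=" to appear in the line.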
