PerlMonks  

Re: divide one file into multiple arrays

by ramlight (Friar)
on Jun 25, 2013 at 19:57 UTC (node #1040653)


in reply to divide one file into multiple arrays

If your bad list holds many sites (or your log holds many lines), you would be better served by a hash. Searching an array means scanning the whole list for every log line, whereas a hash lookup takes roughly constant time however many sites it holds. You can populate the hash from the contents of the original array and then use exists for the comparison, so the code above could be written as:

use strict;
use warnings;

my @sites = qw/ www.yahoo.com www.google.com www.comcast.com /;
my @bad_log;
my @good_log;

# Build a lookup hash from the list of bad sites
my %bad_hash = ();
foreach my $bad_site (@sites) {
    $bad_hash{$bad_site} = 1;
}

while (my $line = <DATA>) {
    # The sixth field is the site name followed by a run of dashes
    my $www = (split / /, $line)[5];
    $www =~ s/---*//;
    if (exists $bad_hash{$www}) {
        push(@bad_log, $line);
    }
    else {
        push(@good_log, $line);
    }
}

print "\nBad lines are:\n";
foreach (@bad_log) {
    print;
}
print "\nGood lines are:\n";
foreach (@good_log) {
    print;
}

__DATA__
X456 TV-yes DB-no 123.12.23.45 dealio3 www.google.com-------- FX-yes d53 Y-03
X123 TV-yes DB-yes 34.154.43.21 dealio1 www.ask.com-------- FX-no d01 Y-03
X412 TV-no DB-no 192.365.25.23 rayovac2 www.microsoft.com--- FX-yes d13 Y-07
which prints:

Bad lines are:
X456 TV-yes DB-no 123.12.23.45 dealio3 www.google.com-------- FX-yes d53 Y-03

Good lines are:
X123 TV-yes DB-yes 34.154.43.21 dealio1 www.ask.com-------- FX-no d01 Y-03
X412 TV-no DB-no 192.365.25.23 rayovac2 www.microsoft.com--- FX-yes d13 Y-07
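To make the array-versus-hash point concrete, here is a minimal sketch that runs the same membership test both ways. The helper names is_bad_array and is_bad_hash are hypothetical, made up for this illustration. Both give the same answers; the difference is that grep walks all of @sites on every call, while the hash is built once and each exists is a single lookup.

```perl
use strict;
use warnings;

my @sites = qw( www.yahoo.com www.google.com www.comcast.com );

# Array approach (hypothetical helper): grep scans every element on each
# call, so the cost grows with the number of bad sites.
sub is_bad_array {
    my ($www) = @_;
    return scalar grep { $_ eq $www } @sites;
}

# Hash approach (hypothetical helper): build the table once, then each
# test is one hash lookup, roughly constant time on average.
my %bad_hash = map { $_ => 1 } @sites;
sub is_bad_hash {
    my ($www) = @_;
    return exists $bad_hash{$www};
}

# Both tests classify the same inputs identically.
for my $www (qw( www.google.com www.ask.com )) {
    printf "%-15s array:%s hash:%s\n", $www,
        ( is_bad_array($www) ? 'bad' : 'good' ),
        ( is_bad_hash($www)  ? 'bad' : 'good' );
}
```

With three sites the difference is invisible; it only starts to matter when both the bad list and the log grow, because the array version does (log lines × bad sites) comparisons.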

Re^2: divide one file into multiple arrays
by Eily (Monsignor) on Jun 25, 2013 at 20:52 UTC

    I'll have to remember to upvote that one when I get a new load of votes :).

    You can build the bad-site hash directly with map:

    my %isBad= map { $_ => 1 } qw/ www.yahoo.com www.google.com www.comcast.com /;
    With tevus_oriley's @sites array, that becomes my %isBad = map { $_ => 1 } @sites; and since every hash value is 1 (true), you can simply write if ( $isBad{$www} ) instead of if ( exists $isBad{$www} ).
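    A hash slice is another way to fill the same table; this is an alternative, not something from the thread. The sketch below also shows why the plain truth test only works because every value is 1: once a stored value can be false, only exists distinguishes "present but false" from "absent". The %flags hash is hypothetical and exists only for the demonstration.

    ```perl
    use strict;
    use warnings;

    my @sites = qw( www.yahoo.com www.google.com www.comcast.com );

    # Hash slice: assign 1 to every key listed in @sites in one statement.
    # (1) x @sites repeats the list (1) once per element of @sites.
    my %isBad;
    @isBad{@sites} = (1) x @sites;

    # Every stored value is 1 (true), so a plain truth test is enough.
    for my $www (qw( www.google.com www.ask.com )) {
        print "$www is ", ( $isBad{$www} ? 'bad' : 'good' ), "\n";
    }

    # Hypothetical hash whose value is false: the truth test says "no",
    # but exists still reports that the key is present.
    my %flags = ( 'www.yahoo.com' => 0 );
    print "truth test: ",  ( $flags{'www.yahoo.com'}        ? 'yes' : 'no' ), "\n";   # no
    print "exists test: ", ( exists $flags{'www.yahoo.com'} ? 'yes' : 'no' ), "\n";   # yes
    ```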

    Edit: Whoops, posted too fast. chomp returns the number of characters it removed, not the chomped list.
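    A small sketch of the chomp behaviour Eily's edit refers to: chomp changes its arguments in place and returns a count, so the usual idiom is to chomp a named array and then use that array. The sample strings are made up for the illustration.

    ```perl
    use strict;
    use warnings;

    my @lines = ( "www.google.com\n", "www.ask.com\n" );

    # chomp modifies @lines in place and returns the total number of
    # characters it removed, not the chomped strings.
    my $removed = chomp @lines;
    print "chomp removed $removed characters\n";   # 2 (one "\n" per element)
    print "[$_]\n" for @lines;                      # the array itself is now chomped

    # Idiom: assign to a named array inside chomp, then use that array.
    chomp( my @sites = ( "www.yahoo.com\n", "www.comcast.com\n" ) );
    print join( ',', @sites ), "\n";
    ```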

Re^2: divide one file into multiple arrays
by Athanasius (Archbishop) on Jun 26, 2013 at 09:04 UTC

    Golf, anyone?

    If you’re OK reading the whole input file into memory, the part function from List::MoreUtils can populate both arrays at once. (This also incorporates Eily’s use of map to populate %bad_hash. Note that the s///r modifier requires Perl 5.14 or later.)

    #! perl
    use strict;
    use warnings;
    use List::MoreUtils qw( part );

    my %bad_hash = map { $_ => 1 } qw( www.yahoo.com www.google.com www.comcast.com );

    my ($good_log, $bad_log) = part { exists $bad_hash{ (split)[5] =~ s{---*}{}r } } <DATA>;

    print "\nBad lines are:\n";
    print for @$bad_log;
    print "\nGood lines are:\n";
    print for @$good_log;

    __DATA__
    X456 TV-yes DB-no 123.12.23.45 dealio3 www.google.com-------- FX-yes d53 Y-03
    X123 TV-yes DB-yes 34.154.43.21 dealio1 www.ask.com-------- FX-no d01 Y-03
    X412 TV-no DB-no 192.365.25.23 rayovac2 www.microsoft.com--- FX-yes d13 Y-07
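    part puts each element into the array whose index the block returns, which is why $good_log comes first: exists yields false (index 0) for good lines and true (index 1) for bad ones. List::MoreUtils is a CPAN module rather than part of core Perl, so here is a core-only sketch of the same two-bucket split, assuming the module is unavailable. It reads from an in-memory list holding the thread's sample lines instead of __DATA__.

    ```perl
    use strict;
    use warnings;

    my %bad_hash = map { $_ => 1 } qw( www.yahoo.com www.google.com www.comcast.com );

    # The thread's sample lines, held in memory instead of __DATA__.
    my @input = (
        "X456 TV-yes DB-no 123.12.23.45 dealio3 www.google.com-------- FX-yes d53 Y-03\n",
        "X123 TV-yes DB-yes 34.154.43.21 dealio1 www.ask.com-------- FX-no d01 Y-03\n",
        "X412 TV-no DB-no 192.365.25.23 rayovac2 www.microsoft.com--- FX-yes d13 Y-07\n",
    );

    # Core-only stand-in for part with two buckets:
    # index 0 collects good lines, index 1 collects bad lines.
    my @parts = ( [], [] );
    for my $line (@input) {
        my $www = ( split ' ', $line )[5];
        $www =~ s/---*//;    # strip the run of two or more dashes
        push @{ $parts[ exists $bad_hash{$www} ? 1 : 0 ] }, $line;
    }
    my ( $good_log, $bad_log ) = @parts;

    print "\nBad lines are:\n";
    print for @$bad_log;
    print "\nGood lines are:\n";
    print for @$good_log;
    ```

    Seeding @parts with two empty arrays guarantees that both references exist even if one bucket receives no lines.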

    Hope that’s useful,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: divide one file into multiple arrays
by Laurent_R (Canon) on Jun 25, 2013 at 21:25 UTC

    I fully agree that a hash would be far better, even with a relatively small number of sites. It is not only faster but also easier to code: a single hash lookup is more straightforward than a grep (or any other search) over an array.
