PerlMonks  

newbie hasher

by prodevel (Scribe)
on Nov 20, 2003 at 05:07 UTC

prodevel has asked for the wisdom of the Perl Monks concerning the following question:

I've been hacking/man-ing/searching for a couple of hours now and I'm getting a bit tired. I was curious as to an elegant way to read a file into a hash.

Take a hosts file for example...

host1 1.1.1.1
host2 1.2.3.5

I just want to assign this to a hash for further processing, e.g.

while (($host,$ip) = each(%hosts))

I normally figure this stuff out in different ways, but I'd like to use a hash for this.

Thanks!

Replies are listed 'Best First'.
Re: newbie hasher
by Zaxo (Archbishop) on Nov 20, 2003 at 05:18 UTC

    Probably the easiest way is to split each line and explicitly add the pair to the hash,

    my %hosts;
    {
        local $_;
        open my $fh, '<', '/path/to/data.file' or die $!;
        while (<$fh>) {
            my @pair = split;
            $hosts{ $pair[0] } = $pair[1];
        }
        close $fh or die $!;
    }

    After Compline,
    Zaxo

      Though I'm responding to Zaxo, my comment applies to most of the answers I see. With code doing a split, check that $pair[1] aka $ip is actually getting set to a defined value. Otherwise, your hash value may be undef on bad input but you won't get a warning about it until you actually try to use the hash value later on. Best to validate as you go:
      while (<$fh>) {
          chomp;
          my ($host, $ip) = split;
          warn("bad hosts line: $_"), next if !defined $ip;
          $hosts{$host} = $ip;
      }
      Note that if you check defined($ip), the chomp is necessary, since otherwise $ip is set to the empty string (taken from the empty string following the newline character). If you use a regex to split up the line instead, make sure it actually requires both fields and check whether the match succeeds (which gets a little funky if you are using map):
      %hosts = map {
          if (/^(\w+)\s(.+)/) { ($1 => $2) }
          else { (warn "bad hosts line: $_")[1..0]; }
      } <FILE>;
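A minimal sketch of the validate-as-you-go advice above, run against an in-memory list of lines rather than a real file. The helper name parse_hosts and the sample lines are assumptions made up for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "host ip" lines into a hash and
# collects malformed lines instead of storing bad values.
sub parse_hosts {
    my @lines = @_;
    my (%hosts, @bad);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($host, $ip) = split ' ', $copy;
        # Reject lines whose first or second field is missing or empty.
        if (!defined $host || !defined $ip || $ip eq '') {
            push @bad, $copy;
            next;
        }
        $hosts{$host} = $ip;
    }
    return (\%hosts, \@bad);
}

my ($hosts, $bad) = parse_hosts("host1 1.1.1.1\n", "host2\n", "host3 1.2.3.5\n");
print "$_ => $hosts->{$_}\n" for sort keys %$hosts;
print "bad line: $_\n" for @$bad;
```

Checking for an empty string as well as undef means the result does not depend on whether the line was chomped first.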
Re: newbie hasher (No loops!)
by BrowserUk (Patriarch) on Nov 20, 2003 at 05:49 UTC

    Look Ma. No loops :)

    #! perl -slw
    use strict;
    use Data::Dumper;

    my %h = split ' ', do{ local $/; <DATA> };

    print Dumper \%h;

    __DATA__
    host1 1.1.1.1
    host2 2.2.2.2
    host3 3.3.3.3
    host4 4.4.4.4

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Re: newbie hasher
by Anonymous Monk on Nov 20, 2003 at 05:19 UTC
    If your lines are simply host and IP separated by a space, then it's pretty simple:
    open my $fh, "<input.dat" or die "Couldn't open file: $!";
    my %hosts;
    while (<$fh>) {
        chomp;
        my ($host, $ip) = split;
        $hosts{$host} = $ip;
    }
    use Data::Dumper;
    print Dumper \%hosts;
Re: newbie hasher
by Roger (Parson) on Nov 20, 2003 at 05:32 UTC
    Or you could write a one-liner to transform the input file into a hash.

    Method 1 - (map with 'short-circuit', now handles empty lines)
    use strict;
    use Data::Dumper;

    # Uh, well spotted, typo fixed with the extra +
    # which would have no effect anyway. :-)
    # ysth suggested that putting () in map short-circuits
    # empty lines. It worked. Thanks. :-)
    # Wow, even better, dropped that testing bit.
    #my %hostlist = map { /^(\w+)\s(.*)/?($1,$2):() } (<DATA>);
    my %hostlist = map { /^(\w+)\s(.*)/ } (<DATA>);

    print Dumper(\%hostlist);

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5
    Method 2 - Better approach
    use strict;
    use Data::Dumper;

    my %hostlist;
    {
        local $/;
        %hostlist = <DATA> =~ /^(\w+)\s(.*)/gm;
    }

    print Dumper(\%hostlist);

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5
    Updated: added the 3rd method after seeing jonadab's suggestion. Here's my solution for parsing a real hosts file.

    Method 3 - Capture from the /etc/hosts file
    use strict;
    use Data::Dumper;

    my %hosts;
    while (<DATA>) {
        next if /^\s*(?:#|$)/;              # ignore comments and empty lines
        next unless /^([^\s#]+)\s+(.*)/;    # capture ip address and names
        $hosts{$_} = $1 foreach split /\s+/, $2;
    }

    print Dumper(\%hosts);

    __DATA__
    # IP Masq gateway:
    192.168.0.80 pedestrian
    # Primary desktop:
    192.168.0.82 raptor1
    # Family PC upstairs:
    192.168.0.84 trex tyrannosaur family
    # Domain servers:
    205.212.123.10 dns1 brutus
    208.140.2.15 dns2
    156.63.130.100 dns3 cherokee
      By no means am I discounting the other methods, but I really like this one-liner due to my haphazard knowledge of regex:

      %hostlist = <DATA> =~ /^(\w+)+\s(.*)/gm;

      This is exactly what I was looking for even though I was nowhere near explicit about what I wanted.

      I'm already applying this method to a few more scripts. The map was cool too.

      I am somewhat curious as to the extensive use of Data::Dumper instead of print-ing %hash?

      Thanks all!
        I am somewhat curious as to the extensive use of Data::Dumper instead of print-ing %hash?
        Just shows the structure better, and shows up any undefs, tabs, newlines, wide characters, unprintable characters, etc. that sneak into your data better.
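To illustrate the point about Data::Dumper, here is a small sketch. The hash contents are made up for the example: one value deliberately keeps its trailing newline and one is undef. The two settings used, $Data::Dumper::Useqq and $Data::Dumper::Sortkeys, are standard Data::Dumper configuration variables.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print strings double-quoted with escapes (so a newline shows as \n)
# and emit keys in sorted order.
$Data::Dumper::Useqq    = 1;
$Data::Dumper::Sortkeys = 1;

# Hypothetical data: host2's value kept its newline, host3 has no value.
my %hosts = (host1 => '1.1.1.1', host2 => "1.2.3.5\n", host3 => undef);

my $dump = Dumper(\%hosts);
print $dump;
```

A plain print %hosts would run keys and values together in one string and warn about the undefined value, which makes problems like these much harder to spot.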

      Look at your first regex again. /^(\w+)+/ looks quite funny :) aww he decided to remove the double '+'ing :(

      Anyhow... just another way of writing the regex:

      my %hostlist = map { m#\A(\S+)\s+(.*)\Z#; $1 => $2 } <DATA>;

      my %hostlist = map { /^(\w+)\s(.*)/; $1 => $2 } (<DATA>);
      should be....
      my %hostlist = map { /^(\w+)\s+(.*)/; $1 => $2 } (<DATA>);
      (extra + after the whitespace metachar)

      davis
      It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
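A short sketch of why the extra + matters. The input line, with two spaces between the fields, is an assumed example and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical input line with two spaces between the fields.
my $line = "host1  1.1.1.1";

my ($h1, $ip1) = $line =~ /^(\w+)\s(.*)/;     # single \s
my ($h2, $ip2) = $line =~ /^(\w+)\s+(.*)/;    # \s+ consumes the whole gap

print "with \\s:  [$ip1]\n";
print "with \\s+: [$ip2]\n";
```

With a single \s, the second space ends up at the front of the captured value; with \s+ the value is captured cleanly.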
Re: newbie hasher
by davido (Cardinal) on Nov 20, 2003 at 05:35 UTC
    Here's another way:

    use strict;
    use warnings;

    my %hash = map { chomp; split /\s+/, $_, 2 } <DATA>;
    print "$_, $hash{$_}\n" foreach keys %hash;

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5

    Fun huh? ;)


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
Re: newbie hasher
by etcshadow (Priest) on Nov 20, 2003 at 05:24 UTC
    # assuming you've already opened your file...
    while (my $line = <FILE>) {
        chomp $line;
        my ($host,$ip) = split(/\s+/,$line,2);
        $hosts{$host} = $ip;
    }

    ------------
    :Wq
    Not an editor command: Wq
      Close to perfect, IMO, except that I'd add a test for empty lines, like this:
      my ($host,$ip) = split(/\s+/,$line,2) or next;
Re: newbie hasher
by DrHyde (Prior) on Nov 20, 2003 at 09:06 UTC
    While I don't doubt that your file looks like that, it's not a regular hosts file. In /etc/hosts, each line has an IP followed by a name (and optionally more names), whereas you have a name followed by an IP. /etc/hosts can also have comments in it. To deal with a traditional /etc/hosts file ...
    open(HOSTS, '/etc/hosts') || die("Out of cucumber error\n");
    my %hosts;
    while(<HOSTS>) {
        s/#.*//;
        next unless /(\S+)\s+(.*)/;
        my $host = $1;
        push @{$hosts{$host}}, split(/\s+/, $2);
    }
    use Data::Dumper;
    print Dumper(\%hosts);
Re: newbie hasher
by Coruscate (Sexton) on Nov 20, 2003 at 05:37 UTC

    Since nobody else has used map() yet:

    open my $hosts, '<', 'hosts.txt' or die "open failed: $!";
    my %hosts = map { chomp; split /\s+/ } <$hosts>;
    close $hosts or die "close failed: $!";

    Update: Dope! Two people got maps in before me (somehow) lol

Re: newbie hasher
by jonadab (Parson) on Nov 20, 2003 at 12:12 UTC
    my %hosts = map{ split /\s+/, $_, 2 }<DATA>; # Or substitute your favourite filehandle here.
    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5

    Note, however, that the format you give is not the standard format for hosts files. The usual format is more like...

    # IP Masq gateway:
    192.168.0.80 pedestrian
    # Primary desktop:
    192.168.0.82 raptor1
    # Family PC upstairs:
    192.168.0.84 trex tyrannosaur family
    # Domain servers:
    205.212.123.10 dns1 brutus
    208.140.2.15 dns2
    156.63.130.100 dns3 cherokee

    This is easy enough to read too...

    open HOSTS, "</etc/hosts" or die "Can't open hosts file: $!"; # Or "<C:\\WINDOWS\\hosts"
    my %hosts = map {
        my ($ip, @hn);
        if (not /^\s*#/) {
            chomp;
            s/\s*#.*$//;
            ($ip, @hn) = split /\s+/, $_;
        }
        map { $_ => $ip } @hn;
    } <HOSTS>;

    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: newbie hasher
by Art_XIV (Hermit) on Nov 20, 2003 at 13:58 UTC

    If you expect your input to be 'dirty', then regular expressions might serve you better:

    use strict;
    use Data::Dumper;

    my %hosts;
    while (<DATA>) {
        chomp;
        next if /^#/;
        $hosts{$1} = $2 if /^(\w+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
        warn "Garbage at line $. of DATA: $_\n" unless $1 and $2;
    }

    print Dumper(\%hosts);

    1;

    __DATA__
    host1 1.13.10.116 #comment
    host2 21.90.30.31 #blah blah blah
    host3 56.87.1.10
    host4 13.16.17
    #The following hosts are on network B
    host5 57.98.18.10 #yadda yadda yadda
    host6 106.10.3.12

    Splits would be more efficient if you expect the input to be clean, though.

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
