in reply to parsing a terrible /etc/hosts

Whew, I'm not sure where to begin.

https://en.wikipedia.org/wiki/Hosts_(file)
The hosts file contains lines of text consisting of an IP address in the first text field followed by one or more host names. Each field is separated by white space – tabs are often preferred for historical reasons, but spaces are also used. Comment lines may be included; they are indicated by a hash character (#) in the first position of such lines. Entirely blank lines in the file are ignored.

First, there is no rule that the first four lines of the hosts file will be comments.

Hosts files may have blank lines.

Any line may have a comment: any text after the # is taken to be a comment. A comment can make a line effectively blank if there is no IP address or name before it.

Not every IP in the hosts file HAS to be 127.0.0.1. Mine has lines like

192.168.2.1 wifi.mylan
so I can access things on my local net by name.

Notice "followed by one or more host names": multiple names are allowed on one line for the same IP address.

I think this will correctly read a hosts file, and do what you are after.

use strict;
use warnings;

my $file_read='C:/WINDOWS/system32/drivers/etc/hosts';
my $ha=[];      # one hash per input line, in file order
my $names={};   # name => ip
my $ips={};     # ip   => [names]

#open (my $hf,'<',$file_read) or die "Can't open $file_read: $!";
my $hf=\*DATA;
while (my $line=<$hf>) {
    chomp $line;
#   print $line."\n";
    my $h={};
    $h->{line}=$line;
    # everything after the first # is a comment
    my ($pre,$comment)=split('#',$line,2);
    $h->{comment}=$comment if ($comment);
    if ($pre) {
        # split ' ' (rather than /\s+/) also ignores leading whitespace
        my @parts=split(' ',$pre);
        if (scalar(@parts)>1) {
            my $ip=shift @parts;
            $h->{ip}=$ip;
            push @{$ips->{$ip}},@parts;
            $h->{names}=[@parts];
            for my $name (@parts) {$names->{$name}=$ip; }
        } # parts
    } # pre
    push @$ha,$h;
} #line

use Data::Dumper;
print Dumper($ha);
print Dumper($ips);
print Dumper($names);

#open($out, '>', $file_write)|| die "\n error opening file $file_write \n";
my $out=\*STDOUT;
print $out "#Hosts file\n";
print $out "#Last Modified -> ". localtime() . "\n";
print $out "# \n";
print $out "# localhost: Needs to stay like this to work\n";
print $out "127.0.0.1\t localhost\n";
print $out "# \n";
delete $names->{localhost} if ($names->{localhost});

# sort by ip (as a string), then by name
my @ksort=sort { $names->{$a} cmp $names->{$b} or $a cmp $b } keys(%$names);
for my $name (@ksort) {
    print $out $names->{$name}."\t".$name."\n";
}
__DATA__
# Copyright (c) 1993-1999 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host

127.0.0.1 localhost ads.pointroll.com scanner2.malware-scan.com localhost adsys.townnews.com adimages.townnews.com ad.doubleclick.net pagead2.googlesyndication.com ad.yieldmanager.com view.atdmt.com ads.revsci.net servedby.advertising.com jeffcity30.autochooser.com performanceoptimizer.com cache.fimservecdn.com pixel.quantserve.com ads.yimg.com this.content.served.by.adshuffle.com img-cdn.mediaplex.com cache.fimservecdn.com adserving.cpxinteractive.com pixel.quantserve.com s0.2mdn.net
127.0.0.1 www.zip2save.com d1.openx.org c3.openx.org partner.googleadservices.com media.ljworld.com everythingmidmo.com www.everythingmidmo.com edge.quantserve.com pixel.quantserve.com ad-g.doubleclick.net ads.yimg.com ad.wsod.com s0.2mdn.net s0.2mdn.net
192.168.1.1 nat.mylan
192.168.1.100 dhcp1.nat.mylan
192.168.2.1 wifi.mylan
192.168.2.100 dhcp1.wifi.mylan
192.168.1.234 lxle0 lxle0.mylan
192.168.1.200 me me.mylan
192.168.1.200 me me.mylan
192.168.254.251 wan.mylan
last part of result
#Hosts file
#Last Modified -> Fri Mar 24 01:39:59 2017
#
# localhost: Needs to stay like this to work
127.0.0.1 localhost
#
127.0.0.1 ad-g.doubleclick.net
127.0.0.1 ad.doubleclick.net
127.0.0.1 ad.wsod.com
127.0.0.1 ad.yieldmanager.com
127.0.0.1 adimages.townnews.com
127.0.0.1 ads.pointroll.com
127.0.0.1 ads.revsci.net
127.0.0.1 ads.yimg.com
127.0.0.1 adserving.cpxinteractive.com
127.0.0.1 adsys.townnews.com
127.0.0.1 c3.openx.org
127.0.0.1 cache.fimservecdn.com
127.0.0.1 d1.openx.org
127.0.0.1 edge.quantserve.com
127.0.0.1 everythingmidmo.com
127.0.0.1 img-cdn.mediaplex.com
127.0.0.1 jeffcity30.autochooser.com
127.0.0.1 media.ljworld.com
127.0.0.1 pagead2.googlesyndication.com
127.0.0.1 partner.googleadservices.com
127.0.0.1 performanceoptimizer.com
127.0.0.1 pixel.quantserve.com
127.0.0.1 s0.2mdn.net
127.0.0.1 scanner2.malware-scan.com
127.0.0.1 servedby.advertising.com
127.0.0.1 this.content.served.by.adshuffle.com
127.0.0.1 view.atdmt.com
127.0.0.1 www.everythingmidmo.com
127.0.0.1 www.zip2save.com
192.168.1.1 nat.mylan
192.168.1.100 dhcp1.nat.mylan
192.168.1.200 me
192.168.1.200 me.mylan
192.168.1.234 lxle0
192.168.1.234 lxle0.mylan
192.168.2.1 wifi.mylan
192.168.2.100 dhcp1.wifi.mylan
192.168.254.251 wan.mylan
YMMV

Re^2: parsing a terrible /etc/hosts
by afoken (Chancellor) on Mar 24, 2017 at 07:54 UTC
    Not every ip in the hosts file HAS to be 127.0.0.1.

    And not every ip in the hosts file has to be an IPv4 address. It has become quite common to see something like "::1 this-machine.lan this-machine" in /etc/hosts - IPv6.

    Parsing any hosts file is easy: Read line by line, strip comments (s/#.*//), remove trailing and leading whitespace (s/^\s+//; s/\s+$//;), skip empty lines (length or next), split at whitespace (@tmp=split /\s+/). Splitting must return at least two elements. First element must match an IPv4 or IPv6 address (see Regexp::Common::net), all other elements must be valid host names (again, see Regexp::Common::net).

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
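
The recipe in the reply above (strip comments, trim, skip empty lines, split, validate) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses inet_pton from the core Socket module for the address check instead of Regexp::Common::net, the host-name pattern is a deliberately loose approximation, and the names is_ip and parse_hosts_text are made up for this example.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);   # core; stands in for Regexp::Common::net here

# Loose host-name check (an assumption, not a full RFC validation).
my $NAME_RE = qr/^[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_])?$/;

# True if $s parses as an IPv4 or an IPv6 address.
sub is_ip {
    my ($s) = @_;
    return 1 if defined inet_pton(AF_INET,  $s);
    return 1 if defined inet_pton(AF_INET6, $s);
    return 0;
}

# Parse hosts-file text; returns a list of [ $ip, @names ] entries.
sub parse_hosts_text {
    my ($text) = @_;
    my @entries;
    for my $raw (split /\n/, $text) {
        my $line = $raw;
        $line =~ s/#.*//;                 # strip comments
        $line =~ s/^\s+//;                # remove leading whitespace
        $line =~ s/\s+$//;                # remove trailing whitespace (also \r)
        next unless length $line;         # skip empty lines
        my ($ip, @names) = split /\s+/, $line;
        next unless @names;               # need an address plus at least one name
        next unless is_ip($ip);
        next if grep { !/$NAME_RE/ } @names;
        push @entries, [ $ip, @names ];
    }
    return @entries;
}

my @entries = parse_hosts_text(<<'END');
# sample
127.0.0.1   localhost   # loopback
::1 this-machine.lan this-machine
   192.168.2.1 wifi.mylan
not-an-ip somehost
END
print join(' ', @$_), "\n" for @entries;
```

Lines that fail a check are silently skipped here; a real tool would probably want to warn with the line number instead.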
Re^2: parsing a terrible /etc/hosts
by Anonymous Monk on Mar 24, 2017 at 07:05 UTC
    CPAN? Let me try "hosts" ->

    Config::Hosts Interface to /etc/hosts file

    Parse::Hosts Parse /etc/hosts

    App::ParseHosts Parse /etc/hosts (CLI)

    Looks promising

      Config::Hosts Interface to /etc/hosts file

      if ($hosts->{$ip}) {
          print STDERR "Line $l: Warning: duplicate IP entry $ip, the last one will be used\n";
      }
      As far as I remember (AIX, Ubuntu, Windows), you may have duplicate lines with the same IP, and they are "joined".
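
A minimal sketch of that "joined" behaviour, as opposed to "the last one will be used": names are collected per IP in a hash of arrays, so a repeated IP adds to the list rather than replacing it. The function name merge_by_ip and the sample lines are made up for illustration, and this only models the merging; it does not verify what any particular OS resolver does.

```perl
use strict;
use warnings;

# Collect host names per IP; a repeated IP adds to the list instead of replacing it.
sub merge_by_ip {
    my (@lines) = @_;
    my (%names_for, %seen);
    for my $line (@lines) {
        my ($pre) = split /#/, $line, 2;       # drop any comment
        next unless defined $pre;
        my ($ip, @names) = split ' ', $pre;    # split ' ' ignores leading whitespace
        next unless @names;                    # need an address plus at least one name
        for my $name (@names) {
            # keep each name once per IP, in first-seen order
            push @{ $names_for{$ip} }, $name unless $seen{$ip}{$name}++;
        }
    }
    return \%names_for;
}

my $merged = merge_by_ip(
    '192.168.1.200 me',
    '192.168.1.200 me.mylan me   # same IP again',
);
print "$_ => @{ $merged->{$_} }\n" for sort keys %$merged;
# prints: 192.168.1.200 => me me.mylan
```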

      Also, its "output" mixes names and IPs in the same array. While strange, this is a valid line:
      127.0.0.1 192.168.0.1
      It makes the host name 192.168.0.1 map to the IP 127.0.0.1; the common use is by script kiddies, though.

      Parse::Hosts Parse /etc/hosts

      unless (defined $content) {
          open my($fh), "<", "/etc/hosts"
              or return [500, "Can't read /etc/hosts: $!"];
          local $/;
          $content = <$fh>;
      }
      It only reads /etc/hosts, or you have to read the file yourself and pass the content in. Its "output" is an array of hashes {ip => $ip, hosts => \@hosts}.
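
For comparison, records in that {ip => $ip, hosts => \@hosts} shape can be folded into name and IP lookups with a few lines. This is only a sketch: the records below are hand-written sample data in the shape described above, not real output from Parse::Hosts, and index_hosts is a hypothetical helper name.

```perl
use strict;
use warnings;

# Build name => ip and ip => [names] lookups from a list of { ip, hosts } records.
sub index_hosts {
    my ($records) = @_;
    my (%ip_for, %names_for);
    for my $rec (@$records) {
        for my $name (@{ $rec->{hosts} }) {
            $ip_for{$name} = $rec->{ip};                 # a later record wins for a repeated name
            push @{ $names_for{ $rec->{ip} } }, $name;   # names accumulate per ip
        }
    }
    return (\%ip_for, \%names_for);
}

# Hand-written sample records (assumed shape, not real module output).
my $records = [
    { ip => '127.0.0.1',     hosts => ['localhost'] },
    { ip => '192.168.2.1',   hosts => ['wifi.mylan'] },
    { ip => '192.168.1.234', hosts => ['lxle0', 'lxle0.mylan'] },
];
my ($ip_for, $names_for) = index_hosts($records);
print "lxle0.mylan is $ip_for->{'lxle0.mylan'}\n";
print "192.168.1.234 has @{ $names_for->{'192.168.1.234'} }\n";
```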

      App::ParseHosts Parse /etc/hosts (CLI)

      Just a wrapper around Parse::Hosts to allow you to read any file.

      Mine parses the lines the same way they do, and has more usable data structures for the task and the example. Mine doesn't check for a valid IPv4 or IPv6 format, though, which Config::Hosts almost does. I looked at Config::Hosts first and didn't like it.

      Added: and none of them are core modules, either.

      @Anonymous-Monk

      Thanks for the links.

      Well, my first inclination is to write code not search, I like to know what 'stuff' is doing. To each their own. I've been burned before by relying on modules.

Re^2: parsing a terrible /etc/hosts
by f77coder (Beadle) on Mar 25, 2017 at 02:59 UTC
    Huck,

    Thank you for the help. I threw a big string of IPv6 addresses at it

    127.0.0.1 2606:f180:1:2e8:2e8:1fbd:8257:d7c1 2600:3c03::f03c:91ff:fee5:3474 2a02:a450:9137:1:c8c:974c:2b65:7fec 2a03:4a80:2:2d6:2d6:853e:e533:bfa7 2600:3c03::f03c:91ff:fee5:3474

    and no problems.