When I first saw the mention of XML, I was tempted to suggest using something like XML::Simple to parse the data. However, I wasn't sure whether your "sample data" included all of the possible XML tags from your real data files, so that got me thinking about parsing the data by hand instead.
Anyway, I decided to challenge myself to see if I could come up with working code that would do the job without using an XML parsing module. It may not be the "best" way, but the code below appears to do the job: it reports every host-alias that is listed under a host ID in the first file but is not listed under the same host ID in the second file. Hopefully this rough bit of code is good enough to give you some ideas on how to do your file comparison. Enjoy!
Sample File 1 - data1.txt
<host id="bobjones" root-directory=".">
  <host-alias>www.foo.com</host-alias>
  <host-alias>www.bar.com</host-alias>
  <host-alias>www.dj.com</host-alias>
</host>
<host id="bobsmith" root-directory=".">
  <host-alias>www.abc.com</host-alias>
  <host-alias>www.def.com</host-alias>
  <host-alias>www.ghij.com</host-alias>
</host>
<host id="pauljones" root-directory=".">
  <host-alias>www.zyx.com</host-alias>
  <host-alias>www.wvut.com</host-alias>
  <host-alias>www.srqpon.com</host-alias>
</host>
Sample File 2 - data2.txt
<host id="mikebrown" root-directory=".">
  <host-alias>www.foo.com</host-alias>
  <host-alias>www.bar.com</host-alias>
  <host-alias>www.dj.com</host-alias>
</host>
<host id="bobjones" root-directory=".">
  <host-alias>www.bar.com</host-alias>
  <host-alias>www.dj.com</host-alias>
  <host-alias>www.music.com</host-alias>
</host>
<host id="bobsmith" root-directory=".">
  <host-alias>www.abc.com</host-alias>
  <host-alias>www.good.com</host-alias>
  <host-alias>www.def.com</host-alias>
  <host-alias>www.ghij.com</host-alias>
</host>
<host id="pauljones" root-directory=".">
  <host-alias>www.bad.com</host-alias>
  <host-alias>www.zyx.com</host-alias>
  <host-alias>www.wvut.com</host-alias>
  <host-alias>www.srqpon.com</host-alias>
</host>
Code:
use strict;
use warnings;

my $file1 = "data1.txt";
my $file2 = "data2.txt";

my $raw_data1 = Slurp_File($file1);
my $raw_data2 = Slurp_File($file2);

# Split each file into its <host ...> ... </host> sections.
my (@sections1) = ($raw_data1 =~ m/(<host .+?\/host>)/sig);
my (@sections2) = ($raw_data2 =~ m/(<host .+?\/host>)/sig);

# Record every host ID / alias pair found in the second file.
my %parsed_file;
foreach my $section (@sections2) {
    my ($id, @parsed_data) = Parse_Section($section);
    foreach my $alias (@parsed_data) {
        $parsed_file{$id}{$alias}++;
    }
}

# Report each pair from the first file that the second file lacks.
foreach my $section (@sections1) {
    my ($id, @parsed_data) = Parse_Section($section);
    foreach my $alias (@parsed_data) {
        if (!$parsed_file{$id}{$alias}) {
            print "HostID: $id, Host-Alias: $alias was missing from file '$file2'\n";
        }
    }
}

############

# Read a whole file into one string.
sub Slurp_File {
    my $file = shift;
    my $data;

    # Lexical filehandle; the bareword DATA is reserved for the __DATA__ section.
    open(my $fh, "<", $file) or die "Unable to open file '$file': $!\n";
    {
        local $/;    # undefine the record separator to read everything at once
        $data = <$fh>;
    }
    close($fh);

    return $data;
}

# Return the host ID followed by all host-alias values in one section.
sub Parse_Section {
    my $data = shift;

    my ($id)    = ($data =~ m/id=\"(.+?)\"/i);
    my (@alias) = ($data =~ m/host-alias>(.+?)</ig);

    return ($id, @alias);
}
Output:
HostID: bobjones, Host-Alias: www.foo.com was missing from file 'data2.txt'
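The code above only checks one direction (aliases in the first file that are missing from the second). If you also need the reverse, for example to spot www.music.com being added under bobjones, one possible approach is to parse both files into hashes and run the same check twice. The following is only a rough sketch under my own assumptions: the names parse_hosts and missing_aliases are made up for illustration, and the two inline strings are trimmed stand-ins for the contents of data1.txt and data2.txt.

```perl
use strict;
use warnings;

# Parse raw text into a hash of hashes: { host id => { alias => 1 } }.
# (parse_hosts is a hypothetical helper name, not from the code above.)
sub parse_hosts {
    my ($raw) = @_;
    my %hosts;
    while ($raw =~ m{(<host\s.+?</host>)}sig) {
        my $section = $1;
        my ($id) = $section =~ m/id="(.+?)"/i;
        next unless defined $id;
        $hosts{$id} ||= {};
        $hosts{$id}{$_} = 1 for $section =~ m{<host-alias>(.+?)<}ig;
    }
    return \%hosts;
}

# Return "id alias" strings present in $from but absent from $in.
# (missing_aliases is likewise a hypothetical helper name.)
sub missing_aliases {
    my ($from, $in) = @_;
    my @missing;
    for my $id (sort keys %$from) {
        for my $alias (sort keys %{ $from->{$id} }) {
            # Check the id first so the lookup does not autovivify it.
            push @missing, "$id $alias"
                unless $in->{$id} && $in->{$id}{$alias};
        }
    }
    return @missing;
}

# Inline stand-ins for the two files, trimmed from the sample data.
my $old = parse_hosts('<host id="bobjones" root-directory="."> <host-alias>www.foo.com</host-alias> <host-alias>www.bar.com</host-alias> </host>');
my $new = parse_hosts('<host id="bobjones" root-directory="."> <host-alias>www.bar.com</host-alias> <host-alias>www.music.com</host-alias> </host>');

print "Only in first:  $_\n" for missing_aliases($old, $new);
print "Only in second: $_\n" for missing_aliases($new, $old);
```

With real files you would pass the slurped file contents to parse_hosts instead of the inline strings. Note that a host ID that exists in only one file (like mikebrown) will have all of its aliases reported, which may or may not be what you want.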
In reply to Re: Difficulty Mapping Data
by dasgar
in thread Difficulty Mapping Data
by walkingthecow