Unique Array Entries

The_Rev has asked for the wisdom of the Perl Monks concerning the following question:

Hi Again, You guys have been so helpful lately, I thought I would ask for some advice on a piece of code I have constructed. My objective is to consolidate all the unique elements of my array. My current elements are:

  CHILD: Topology/IPClassA
  CHILD: Topology/IPClassA/Device
  CHILD: Topology/IPClassA/Device/log_ratio
  CHILD: Topology/IPClassA/Device/poll_interval
  CHILD: Topology/IPClassA/Device = 2
  CHILD: Topology/IPClassA/Device = 2/poll_interval
  CHILD: Topology/IPClassA/Device = 2/is_managed
  CHILD: Topology/IPClassA/Device = 2/Port
  CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress
  CHILD: Topology/IPClassA/Device = 2/Port/poll_interval
  CHILD: Topology/IPClassC
  CHILD: Topology/IPClassC/Device
  CHILD: Topology/IPClassC/Device/poll_interval
  CHILD: Location
[download]

Now as you can see the first element (Topology/IPClassA) is part of the second element so I would like to disregard that one, and continue this until I have an array of unique data. My desired result would look like this:

  CHILD: Topology/IPClassA/Device/log_ratio
  CHILD: Topology/IPClassA/Device/poll_interval
  CHILD: Topology/IPClassA/Device = 2/poll_interval
  CHILD: Topology/IPClassA/Device = 2/is_managed
  CHILD: Topology/IPClassA/Device = 2/Port
  CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress
  CHILD: Topology/IPClassA/Device = 2/Port/poll_interval
  CHILD: Topology/IPClassC/Device/poll_interval
  CHILD: Location
[download]

Please note that order DOES matter and that the eighth element of the final array Topology/IPClassC/Device/) is the same as the first element.

I have tried such things as removing the child portion of the string and pushing them onto an array with the following code:

  push (@foo, $_) unless ($seen{$_}++);
[download]

The eighth element doesn't make the array, due to the fact that it matches the second element

Thanks
-TheRev

2002-02-02 Edit by Corion : Added formatting

Comment on Unique Array Entries Select or Download Code

Replies are listed 'Best First'.
Re: Unique Array Entries by Corion (Patriarch) on Mar 02, 2002 at 15:40 UTC
use strict; my $lastline; while (<DATA>) { chomp; unless (/^\Q$lastline\E/) { print $lastline,"\n"; }; $lastline = $_; }; # Assuming that the last line read will always # be relevant print $lastline; __DATA__ CHILD: Topology/IPClassA CHILD: Topology/IPClassA/Device CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2 CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC CHILD: Topology/IPClassC/Device CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location [download] gives `D:\Downloads>perl -w tmp.pl Use of uninitialized value in quotemeta at tmp.pl line 9, <DATA> line +1. CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location` [download] This is one line less than what you want, as you seem to want to handle `CHILD: Topology/IPClassA/Device = 2/Port` [download] in a special way. `perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web` [download]	[reply] [d/l] [select]
Re: Re: Unique Array Entries by The_Rev (Acolyte) on Mar 02, 2002 at 16:32 UTC
Corion, You are correct, regarding Child: Topology/IPClassA/Device = 2/Port That was a typo. Thanks	[reply]
Re: Re: Re: Unique Array Entries by The_Rev (Acolyte) on Mar 02, 2002 at 16:36 UTC
That should not have made the list	[reply]
(jeffa) Re: Unique Array Entries by jeffa (Bishop) on Mar 02, 2002 at 16:22 UTC
Neat problem! I like Corion's answer, but here is another way to try out. The idea is to split each entry on the slash to create an array. Then you eval that array into a hash data structure: `use strict; use Data::Dumper; $Data::Dumper::Indent = 1; my %thingy; my @lines = ... # insert your data here for (sort @lines) { my ($type,$stuff) = $_ =~ /([A-Z]):\s+(.*)/; my @parts = split('/',$stuff); push @parts,''; my $str = q\|$thingy{'\| . shift(@parts) . q\|'}\|; for my $part (@parts) { $str .= qq\|->{'$part'}\|; } eval $str; } print Dumper \%thingy;` [download] This produces a data structure like so: $VAR1 = { 'Location' => {}, 'Topology' => { 'IPClassA' => { 'Device' => { 'log_ratio' => {}, 'poll_interval' => {} }, 'Device = 2' => { 'Port' => { 'ifPhysAddress' => {}, 'poll_interval' => {} }, 'is_managed' => {}, 'poll_interval' => {} } }, 'IPClassC' => { 'Device' => { 'poll_interval' => {} } } } }; Now all you need is some code to turn the data structure back into a list of paths: `my (@list,$flat); flatten($_,$thingy{$_}) foreach keys %thingy; sub flatten { my ($key,$rest) = @_; unless ($rest) { push @list,$flat; undef $flat; return; } $flat .= "$key/"; flatten(%$rest); } print Dumper \@list;` [download] But this does not work correctly: `$VAR1 = [ 'Location/', 'Topology/IPClassA/Device/log_ratio/' ];` [download] Sorry, but i am at my wit's end on this one (Saturday morning laziness :D). At this point, my 'answer' turns into a question: 'How do you recursively "flatten" this data structure?' jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l] [select]
Re: (jeffa) Re: Unique Array Entries by Corion (Patriarch) on Mar 02, 2002 at 16:40 UTC
The idea is, once you have that structure, to walk down that structure, and each time you end up in a leaf (that is, you find no more children), to output the whole path you've taken : `# Just walk the created structure, and output # the name whenever we hit a leaf : sub walk { my ($hashref, $path) = @_; #print $path,"\n"; my @keys = keys %$hashref; if (@keys) { walk( $hashref->{$_}, $path . "/$_" ) foreach (@keys); } else { print "$path\n"; }; }; walk( \%thingy, "" );` [download] Of course, the whole method of hashes has the problem of not maintaining the order of the keys, but it's not clear if that will be a problem for The_Rev :-)) `perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web` [download]	[reply] [d/l] [select]
Re: Re: (jeffa) Re: Unique Array Entries by The_Rev (Acolyte) on Mar 02, 2002 at 16:48 UTC
I think that your first snippet would work except for the fact that it does retain that typo: (Topology/Device=2/Port) Is there anyway to disregard that and have the following: `Topology/IPClassA/Device/log_ratio Topology/IPClassA/Device/poll_interval Topology/Device=2/poll_interval Topology/Device=2/is_managed Topology/Device=2/Port/ifPhysAddress Topology/Device=2/Port/poll_interval Topology/IPClassC/Device/poll_interval Location` [download] I am traversing an hlist that I have built, and would like to utilize all of the unique strings elsewhere. Thanks 2002-02-02 Edit by The Perlmonks Node Pixies : Added formatting	[reply] [d/l]
Re: Re: Re: (jeffa) Re: Unique Array Entries by Corion (Patriarch) on Mar 02, 2002 at 16:58 UTC
Re: Re: Re: (jeffa) Re: Unique Array Entries by The_Rev (Acolyte) on Mar 02, 2002 at 16:56 UTC
Re: Re: Re: (jeffa) Re: Unique Array Entries by The_Rev (Acolyte) on Mar 02, 2002 at 16:59 UTC
Re: Unique Array Entries by webadept (Pilgrim) on Mar 02, 2002 at 20:17 UTC
You might try using a hash with the Tie::IxHash module, which will allow you to retrive the values of your hash in the original insertoin order. This way you dedup and keep the sorting with a single effort. webadept.net	[reply]
Re: Unique Array Entries by nandeya (Monk) on Mar 02, 2002 at 18:55 UTC
Here is another way to do... I don't think it is as fast as could be, but I believe that Corion's initial posting assumes that you are comparing with only the previous line, and also that the last line is always 'good' to keep (but what do I know, he is a Saint, and I am a chump, no wait, an Acolyte, about the same thing though). #!perl -w use strict; open (DATAFILE,"D:/JUNK/data.txt"); my $file_content = do {local $/; <DATAFILE> }; my @fa1 = split(/\n/,$file_content); my $cnt=0; my %hv=(); #Set to hash with a count to retain order of list foreach (@fa1) { $hv{$_}=$cnt; $cnt++; } my %already_deleted=(); foreach my $v1 (@fa1) { if (exists($already_deleted{$v1})) { next; } my $len1 = length($v1); foreach my $v2 (keys %hv) { if ((exists($already_deleted{$v1})) \|\| (exists($already_deleted{$v +1}))) { next; } my $len2 = length($v2); if (($len2 > $len1) && (substr($v2,0,$len1) eq $v1)) { #Delete the hash key value delete $hv{$v1}; #Store deleted hash values to help move through the #foreach statements quicker with next statements $already_deleted{$v1}=1; } else { #Nothing } } } #Retain the original order, then print 'em out. my @keys = sort{$hv{$a} <=> $hv{$b}} keys %hv; foreach (@keys) {print "$_ \n"}; ___DATAFILE___ Topology/IPClassA Topology/IPClassA/Device Topology/IPClassA/Device/log_ratio Topology/IPClassA/Device/poll_interval Topology/IPClassA/Device = 2 Topology/IPClassA/Device = 2/poll_interval Topology/IPClassA/Device = 2/is_managed Topology/IPClassA/Device = 2/Port Topology/IPClassA/Device = 2/Port/ifPhysAddress Topology/IPClassA/Device = 2/Port/poll_interval Topology/IPClassC Topology/IPClassC/Device Topology/IPClassC/Device/poll_interval Location [download]	[reply] [d/l]