The_Rev has asked for the wisdom of the Perl Monks concerning the following question:

Hi Again, You guys have been so helpful lately, I thought I would ask for some advice on a piece of code I have constructed. My objective is to consolidate all the unique elements of my array. My current elements are:

CHILD: Topology/IPClassA CHILD: Topology/IPClassA/Device CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2 CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC CHILD: Topology/IPClassC/Device CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location

Now as you can see the first element (Topology/IPClassA) is part of the second element so I would like to disregard that one, and continue this until I have an array of unique data. My desired result would look like this:

CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location

Please note that order DOES matter and that the eighth element of the final array Topology/IPClassC/Device/) is the same as the first element.

I have tried such things as removing the child portion of the string and pushing them onto an array with the following code:

push (@foo, $_) unless ($seen{$_}++);

The eighth element doesn't make the array, due to the fact that it matches the second element

Thanks
-TheRev

2002-02-02 Edit by Corion : Added formatting

Replies are listed 'Best First'.
Re: Unique Array Entries
by Corion (Patriarch) on Mar 02, 2002 at 15:40 UTC
    use strict; my $lastline; while (<DATA>) { chomp; unless (/^\Q$lastline\E/) { print $lastline,"\n"; }; $lastline = $_; }; # Assuming that the last line read will always # be relevant print $lastline; __DATA__ CHILD: Topology/IPClassA CHILD: Topology/IPClassA/Device CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2 CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC CHILD: Topology/IPClassC/Device CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location
    gives
    D:\Downloads>perl -w tmp.pl Use of uninitialized value in quotemeta at tmp.pl line 9, <DATA> line +1. CHILD: Topology/IPClassA/Device/log_ratio CHILD: Topology/IPClassA/Device/poll_interval CHILD: Topology/IPClassA/Device = 2/poll_interval CHILD: Topology/IPClassA/Device = 2/is_managed CHILD: Topology/IPClassA/Device = 2/Port/ifPhysAddress CHILD: Topology/IPClassA/Device = 2/Port/poll_interval CHILD: Topology/IPClassC/Device/poll_interval CHILD: Location

    This is one line less than what you want, as you seem to want to handle

    CHILD: Topology/IPClassA/Device = 2/Port
    in a special way.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      Corion, You are correct, regarding Child: Topology/IPClassA/Device = 2/Port That was a typo. Thanks
        That should not have made the list
(jeffa) Re: Unique Array Entries
by jeffa (Bishop) on Mar 02, 2002 at 16:22 UTC
    Neat problem! I like Corion's answer, but here is another way to try out. The idea is to split each entry on the slash to create an array. Then you eval that array into a hash data structure:
    use strict; use Data::Dumper; $Data::Dumper::Indent = 1; my %thingy; my @lines = ... # insert your data here for (sort @lines) { my ($type,$stuff) = $_ =~ /([A-Z]):\s+(.*)/; my @parts = split('/',$stuff); push @parts,''; my $str = q|$thingy{'| . shift(@parts) . q|'}|; for my $part (@parts) { $str .= qq|->{'$part'}|; } eval $str; } print Dumper \%thingy;
    This produces a data structure like so:
    $VAR1 = {
      'Location' => {},
      'Topology' => {
        'IPClassA' => {
          'Device' => {
            'log_ratio' => {},
            'poll_interval' => {}
          },
          'Device = 2' => {
            'Port' => {
              'ifPhysAddress' => {},
              'poll_interval' => {}
            },
            'is_managed' => {},
            'poll_interval' => {}
          }
        },
        'IPClassC' => {
          'Device' => {
            'poll_interval' => {}
          }
        }
      }
    }; 
    
    Now all you need is some code to turn the data structure back into a list of paths:
    my (@list,$flat); flatten($_,$thingy{$_}) foreach keys %thingy; sub flatten { my ($key,$rest) = @_; unless ($rest) { push @list,$flat; undef $flat; return; } $flat .= "$key/"; flatten(%$rest); } print Dumper \@list;
    But this does not work correctly:
    $VAR1 = [ 'Location/', 'Topology/IPClassA/Device/log_ratio/' ];
    Sorry, but i am at my wit's end on this one (Saturday morning laziness :D). At this point, my 'answer' turns into a question: 'How do you recursively "flatten" this data structure?'

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      The idea is, once you have that structure, to walk down that structure, and each time you end up in a leaf (that is, you find no more children), to output the whole path you've taken :

      # Just walk the created structure, and output # the name whenever we hit a leaf : sub walk { my ($hashref, $path) = @_; #print $path,"\n"; my @keys = keys %$hashref; if (@keys) { walk( $hashref->{$_}, $path . "/$_" ) foreach (@keys); } else { print "$path\n"; }; }; walk( \%thingy, "" );

      Of course, the whole method of hashes has the problem of not maintaining the order of the keys, but it's not clear if that will be a problem for The_Rev :-))

      perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

        I think that your first snippet would work except for the fact that it does retain that typo: (Topology/Device=2/Port) Is there anyway to disregard that and have the following:

        Topology/IPClassA/Device/log_ratio Topology/IPClassA/Device/poll_interval Topology/Device=2/poll_interval Topology/Device=2/is_managed Topology/Device=2/Port/ifPhysAddress Topology/Device=2/Port/poll_interval Topology/IPClassC/Device/poll_interval Location

        I am traversing an hlist that I have built, and would like to utilize all of the unique strings elsewhere.

        Thanks

        2002-02-02 Edit by The Perlmonks Node Pixies : Added formatting

Re: Unique Array Entries
by webadept (Pilgrim) on Mar 02, 2002 at 20:17 UTC
    You might try using a hash with the Tie::IxHash module, which will allow you to retrive the values of your hash in the original insertoin order. This way you dedup and keep the sorting with a single effort.

    webadept.net
Re: Unique Array Entries
by nandeya (Monk) on Mar 02, 2002 at 18:55 UTC
    Here is another way to do...

    I don't think it is as fast as could be, but I believe that Corion's initial posting assumes that you are comparing with only the previous line, and also that the last line is always 'good' to keep (but what do I know, he is a Saint, and I am a chump, no wait, an Acolyte, about the same thing though).
    #!perl -w use strict; open (DATAFILE,"D:/JUNK/data.txt"); my $file_content = do {local $/; <DATAFILE> }; my @fa1 = split(/\n/,$file_content); my $cnt=0; my %hv=(); #Set to hash with a count to retain order of list foreach (@fa1) { $hv{$_}=$cnt; $cnt++; } my %already_deleted=(); foreach my $v1 (@fa1) { if (exists($already_deleted{$v1})) { next; } my $len1 = length($v1); foreach my $v2 (keys %hv) { if ((exists($already_deleted{$v1})) || (exists($already_deleted{$v +1}))) { next; } my $len2 = length($v2); if (($len2 > $len1) && (substr($v2,0,$len1) eq $v1)) { #Delete the hash key value delete $hv{$v1}; #Store deleted hash values to help move through the #foreach statements quicker with next statements $already_deleted{$v1}=1; } else { #Nothing } } } #Retain the original order, then print 'em out. my @keys = sort{$hv{$a} <=> $hv{$b}} keys %hv; foreach (@keys) {print "$_ \n"}; ___DATAFILE___ Topology/IPClassA Topology/IPClassA/Device Topology/IPClassA/Device/log_ratio Topology/IPClassA/Device/poll_interval Topology/IPClassA/Device = 2 Topology/IPClassA/Device = 2/poll_interval Topology/IPClassA/Device = 2/is_managed Topology/IPClassA/Device = 2/Port Topology/IPClassA/Device = 2/Port/ifPhysAddress Topology/IPClassA/Device = 2/Port/poll_interval Topology/IPClassC Topology/IPClassC/Device Topology/IPClassC/Device/poll_interval Location