Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello good monks, I have come seeking your knowledge.

I'm working on a piece of code and want to repeatedly iterate over a data set efficiently without having to write tons of unnecessary code. Basically what I'm looking to do is to perform a regex operation on only SOME parts of a string but not ones that have already been through that loop (which have been stored to a hash with corresponding values) how could I make a conditional loop so that it only performed the operation on parts that haven't appeared yet? The strings are in the format A:B and are then split by the colon into an array containing A and B, processed, and then A:B and a value are stored to a hash. They proceed to get more complicated in later strings such as A:B:C:D:E where A:B and C:D:E have showed up before and I want to split the group into A:B and C:D:E instead of A B C D E, but I haven't gotten anything like that to work. Yes I have attempted to store the values to the hash with another symbol like |, but the source file I'm working from only has colons and that's all I have to work with. Any help is appreciated!

Replies are listed 'Best First'.
Re: Scanning Hashes for Repeated values
by choroba (Cardinal) on Jun 21, 2019 at 19:14 UTC
    Please, include more information. Providing a Short, Self Contained, Correct Example would help us to help you the most.

    For example, I don't understand how to proceed in some situations. Let's say A:B and B:C have been processed. Now A:B:C is read from the input - what should happen with it?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Scanning Hashes for Repeated values
by Anonymous Monk on Jun 21, 2019 at 19:31 UTC

    My apologies for some of the ambiguity, I'll do what I can to clear it up. From the data, only one new group is added at a time, and it will never include parts of an group that has already been established eg. if A:B has been looped through B:C or A:C cannot exist, but C:D can. Only one new group or single element are added at a time. Here's an example list of what I mean: 51:50, 51:50:47, 51:50:47:42, 37:30, 51:50:47:42:37:30 and so on. Here's my current loop which only works for groups of two such as A:B.

    foreach (0..$#names) { my $dist = (($distances[$_]*$xaxisstep)+42); my @tmp = split /:/, $names[$_]; my @tmp1; if (($#tmp+1)==2) { foreach (@tmp){ $_++; print PS " 42.00 $labelmarks{$_} mt $dist $labelmarks{$_} +ls\n"; } print PS " $dist $labelmarks{${tmp[0]}} mt $dist $labelmarks{$ +{tmp[1]}} ls\n"; $xlabelmarks{$tmp[0].":".$tmp[1]}= $dist; $labelmarks{$tmp[0].":".$tmp[1]} = (($labelmarks{$tmp[0]}+$lab +elmarks{$tmp[1]})/2); delete ($labelmarks{$tmp[0]}); delete ($labelmarks{$tmp[1]}); }

      Sorry, but your description is still quite unclear and your code does not run because it contains a lot of variables that we don't know about (I'm going to guess they aren't relevant to the question either). If you could please reduce your problem down to: short sample input, runnable code, the expected output for that input, and the actual output you're getting, including any error messages, each within <code> tags. As choroba already said, please read SSCCE, and also How do I post a question effectively?.

      Just for fun, I'm going to take a guess at what you might mean. Note this is a brute force approach that I just whipped up, there's probably a much more intelligent algorithm than this.

      use warnings; use strict; my @input = qw/ 51:50 51:50:47 51:50:47:42 37:30 51:50:47:42:37:30 /; my %seen; for my $in (@input) { my @items = split /:/, $in; my @subsets; for my $start (0..$#items-1) { for my $end ($start+1..$#items) { push @subsets, join ":", @items[$start..$end]; } } @subsets = sort { length $b <=> length $a } @subsets; my @seen; for my $subset (@subsets) { push @seen, $subset if $seen{$subset}++; } if (@seen) { my $new = $in; my ($re) = map {qr/\b(?:$_)\b/} join '|', map { quotemeta } @seen; 1 while $new =~ s/$re/:/g; $new =~ s/:+/:/g; $new =~ s/\A:|:\z//g; $new = length $new ? "'$new' is new" : "nothing is new"; print "In '$in', I've seen ", join('/', map {"'$_'"} @seen), " before, $new.\n"; } else { print "I have not seen '$in' before, it is new.\n"; } } __END__ I have not seen '51:50' before, it is new. In '51:50:47', I've seen '51:50' before, '47' is new. In '51:50:47:42', I've seen '51:50:47'/'51:50'/'50:47' before, '42' is + new. I have not seen '37:30' before, it is new. In '51:50:47:42:37:30', I've seen '51:50:47:42'/'51:50:47'/'50:47:42'/ +'51:50'/'50:47'/'47:42'/'37:30' before, nothing is new.