Re^3: longest common substring (with needed tweaks)

Re^4: longest common substring (with needed tweaks) by Lennotoecom (Pilgrim) on Nov 05, 2013 at 15:20 UTC
```perl
# take the first line from <DATA> and split it on whitespace into $lines and $matches
($lines, $matches) = split /\s/, <DATA>;

# take the next line from <DATA>, chop off the \n and split the resulting
# string into the @a array of single characters
$_ = <DATA>;
chomp;
@a = split //, $_;

# in this cycle (1) we create all possible substrings of the @a array
# (i.e. of the first line) and set each of them to 1
for $i (0 .. $#a){
    $e = $a[$i];
    $hash{$e} = 1;
    for $y ($i+1 .. $#a){
        $e .= $a[$y];
        $hash{$e} = 1;
    }
}

# in this cycle (2) we read the file line by line and for every line
# we do exactly the same as in the previous cycle, but into a temporary
# hash; then in the foreach cycle (3) we increment the existing keys of
# the first hash if they also occur in the current line
while(<DATA>){
    chomp;
    @a = split //, $_;
    %thash = ();
    for $i (0 .. $#a){
        $e = $a[$i];
        $thash{$e} = 1 if defined $hash{$e};
        for $y ($i+1 .. $#a){
            $e .= $a[$y];
            $thash{$e} = 1 if defined $hash{$e};
        }
    }
    foreach $key (keys %hash){
        $hash{$key}++ if defined $thash{$key};
    }
}

# and finally here we go through the hash, print only those keys whose
# value == $matches, and remember the longest of them in $max
$max = '';
foreach $key (keys %hash){
    if($hash{$key} == $matches){
        print "$key\n";
        $max = $key if length($max) < length($key);
    }
}
print "$max\n";
__DATA__
3 2
strrringggg
ssttrrringggg
stttrrringgg
```

This whole script has a flaw: the resulting hash is built only from the first text line, so a substring that is common to other lines but missing from the first one is never counted. To fix it, record every substring of the current line in cycle 2 (not only those already in %hash), and in cycle 3 create a key in %hash when it is undefined instead of skipping it the way this example does.
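The fix proposed above (every line contributes all of its substrings, and a missing key is created rather than skipped) can be sketched on its own. This is a minimal sketch, not the poster's code: the subroutine name `count_lines_containing` is hypothetical, it runs under `use strict`, and it takes substrings with `substr` instead of splitting each line into characters. A per-line `%seen` hash makes each substring count at most once per line, so the final value is the number of lines that contain it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for each substring of any
# line, count how many lines contain it. Every line contributes all of
# its substrings, so nothing depends on the first line.
sub count_lines_containing {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my %seen;    # substrings of this line, each recorded once
        my $n = length $line;
        for my $i (0 .. $n - 1) {
            for my $len (1 .. $n - $i) {
                $seen{ substr($line, $i, $len) } = 1;
            }
        }
        $count{$_}++ for keys %seen;    # a new key starts from undef and becomes 1
    }
    return \%count;
}

my $counts = count_lines_containing('ac', 'bc', 'b');
for my $s (sort keys %$counts) {
    print "$s $counts->{$s}\n";
}
# prints:
# a 1
# ac 1
# b 2
# bc 1
# c 2
```

A substring shared by exactly $matches lines is then any key whose count equals $matches, and no line is treated specially.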
Re^5: longest common substring (with needed tweaks) by Lennotoecom (Pilgrim) on Nov 06, 2013 at 09:22 UTC
```perl
# f(STRING, HASHREF): put every substring of STRING into the referenced hash with value 1
sub f {
    @a = split //, shift;
    $ih = shift;
    for $i (0 .. $#a){
        $e = $a[$i];
        ${$ih}{$e} = 1;
        for $y ($i+1 .. $#a){
            $e .= $a[$y];
            ${$ih}{$e} = 1;
        }
    }
}

($l, $m) = split /\s/, <DATA>;
$_ = <DATA>;
chomp;
%h = ();
f($_, \%h);
while(<DATA>){
    chomp;
    %th = ();
    f($_, \%th);
    $h{$_}++ foreach (keys %th);
}
foreach $key (keys %h){
    if($h{$key} == $m){
        # push needs an actual array, so dereference the slot;
        # the empty bucket is autovivified on first use
        push @{ $r[length($key)] }, $key;
    }
}
print "@{$r[$#r]}\n";
__DATA__
3 2
ac
bc
b
```

1: creates a sub named f, which takes two parameters, a string and a reference to a hash, and puts every substring of the given string into that hash

2: splits the first line of the file into two variables, $l (the number of lines, not used further) and $m (the number of lines a substring must occur in)

3: takes the next line from the file and passes it to sub f together with a reference to the empty hash %h

4: at this point the first line of the file has been split into all of its substrings, which are keys of %h with value 1

5: then we read the rest of the file line by line and pass each line to sub f along with a reference to an empty hash %th; right after that every key of %th is incremented in %h (a key not yet in %h is created with value 1), so %h counts how many lines contain each substring

6: runs through the %h hash and, if a key's value equals the required number of overlaps $m, pushes the key into @r, an array of arrays indexed by substring length

7: the last line prints all the longest overlaps, i.e. every substring in the highest slot of @r
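Steps 6 and 7 (keep the substrings that occur in exactly $m lines, bucket them by length, print the last bucket) can also be written as a small function. Again a sketch under assumptions: `longest_with_count` is a hypothetical name, the counts are passed in as a hash reference shaped like %h, and the result is sorted so the output does not depend on hash order. It returns an empty list when no substring has the required count.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring steps 6 and 7: keep the substrings that
# occur in exactly $m lines, bucket them by length in @by_len, and return
# the bucket with the greatest length (sorted for stable output).
sub longest_with_count {
    my ($count, $m) = @_;
    my @by_len;
    for my $key (keys %$count) {
        next unless $count->{$key} == $m;
        push @{ $by_len[length $key] }, $key;    # autovivifies the bucket
    }
    return () unless @by_len;                    # nothing had count $m
    return sort @{ $by_len[-1] };                # last slot = longest length
}

# Counts for the lines "ac", "bc", "b", as %h holds them after step 5
my %count = (a => 1, ac => 1, c => 2, b => 2, bc => 1);
print join(' ', longest_with_count(\%count, 2)), "\n";    # prints "b c"
```

Because @by_len only grows up to the highest length that was pushed, its last element is always a non-empty bucket, which is the same reason `$r[$#r]` works in the script.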