
For example:
$a = 'aa ab c c'; $a =~ m/b/;
Now $` contains 'aa a', $& contains 'b', and $' contains ' c c'.
In other words: $` holds everything in the string before the match,
$& holds the match itself,
and $' holds everything after the match.
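
The three variables can be checked with a short script. This is only a minimal sketch: match_parts is a hypothetical helper name that does not appear in the thread, and it simply returns the three pieces for the first match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match, postmatch) for the first
# match of $re in $str, or an empty list when nothing matches.
sub match_parts {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($`, $&, $');
}

my ($pre, $hit, $post) = match_parts('aa ab c c', qr/b/);
print "[$pre] [$hit] [$post]\n";    # prints: [aa a] [b] [ c c]
```

Since Perl 5.10 the same pieces are also available as ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} when the regex is used with the /p modifier.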

Re^4: longest common substring (with needed tweaks)
by Lennotoecom (Pilgrim) on Nov 05, 2013 at 15:20 UTC
    #takes first line from the <DATA> and split values by ' ' into $lines and $matches
    ($lines, $matches) = split /\s/, <DATA>;
    #takes the next line from the <DATA>, chop off the \n and split resulted string
    #into @a array by symbols
    $_ = <DATA>;
    $_ = $` if /$/;
    @a = split //, $_;
    #in this cycle(1) we create all possible combinations of substrings out of the
    #@a array, (out of the first line) and equals them to 1
    for $i (0 .. $#a){
        $e = $a[$i];
        $hash{$e} = 1;
        for $y ($i+1 .. $#a){
            $e .= $a[$y];
            $hash{$e} = 1;
        }
    }
    #in this cycle(2) we read file line by line and for every line
    #we do exactly the same as the previous cycle but into
    #temporal hash and then in the foreach cycle(3) we increment
    #existed keys from the first hash if they are in the current line
    while(<DATA>){
        $_ = $` if /$/;
        @a = split //, $_;
        %thash = ();
        for $i (0 .. $#a){
            $e = $a[$i];
            $thash{$e} = 1 if defined $hash{$e};
            for $y ($i+1 .. $#a){
                $e .= $a[$y];
                $thash{$e} = 1 if defined $hash{$e};
            }
        }
        foreach $key (keys %hash){
            $hash{$key}++ if defined $thash{$key};
        }
    }
    #and finally here we go through the hash
    #and print only those keys which have their value == $matches
    $max = '';
    foreach $key (keys %hash){
        if($hash{$key} == $matches){
            print "$key\n";
            # $max = $key if length($max) < length($key);
        }
    }
    print "$max\n";
    __DATA__
    3 2
    strrringggg
    ssttrrringggg
    stttrrringgg
    This whole script has a flaw:
    the resulting hash is built only from the first text line, so a substring
    that does not occur in the first line is never counted.
    To fix it, cycle (3) should create the hash entry when it is not defined yet,
    instead of skipping it as the script above does.
      # puts every substring of the given string into the hash behind $ih
      sub f {
          @a = split //, shift;
          $ih = shift;
          for $i (0 .. $#a){
              $e = $a[$i];
              ${$ih}{$e} = 1;
              for $y ($i+1 .. $#a){
                  $e .= $a[$y];
                  ${$ih}{$e} = 1;
              }
          }
      }
      ($l, $m) = split /\s/, <DATA>;
      $_ = <DATA>;
      chomp;
      %h = ();
      f($_, \%h);
      while(<DATA>){
          chomp;
          %th = ();
          f($_, \%th);
          $h{$_}++ foreach (keys %th);
      }
      foreach $key (keys %h){
          if($h{$key} == $m){
              $r[length($key)] = [] if ! exists $r[length($key)];
              # push needs an array, so the slot is dereferenced explicitly
              push @{ $r[length($key)] }, $key;
          }
      }
      print "@{$r[$#r]}\n";
      __DATA__
      3 2
      ac
      bc
      b
      1: defines a sub named f, which takes two parameters: a string and a reference to a hash;
      the sub puts every substring of the given string into that hash as a key
      2: splits the first line of the file into two variables, $l and $m
      3: reads the next line of the file and passes it to f together with a reference to the empty hash %h
      4: at this point every substring of the first text line is a key in
      %h with the value 1
      5: then the rest of the file is read line by line; each line is passed to f together with a reference
      to the emptied hash %th, and right after that the count in %h is incremented for every key of %th (keys missing from %h are created)
      6: runs through %h and, if a key's value equals the number of matches we need ($m), pushes the key into @r, an array of arrays indexed by key length
      7: the last line prints all the longest matches, which all have the same length
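
      The steps above can also be sketched so that they run under strict and warnings. This is only an illustration under assumptions, not the code from the thread: the names substrings and longest_common are hypothetical, the lines are passed as a list instead of being read from DATA, and the line count ($l in the script) is left out because the script never uses it. The test is the same "== $m" comparison, so a substring must occur in exactly that many lines.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref whose keys are all distinct
# substrings of $str (steps 1, 3 and 4).
sub substrings {
    my ($str) = @_;
    my %seen;
    my $len = length $str;
    for my $i (0 .. $len - 1) {
        for my $n (1 .. $len - $i) {
            $seen{ substr($str, $i, $n) } = 1;
        }
    }
    return \%seen;
}

# Hypothetical helper: returns, sorted, the longest substrings that occur
# in exactly $want of the given lines (steps 5 to 7).
sub longest_common {
    my ($want, @lines) = @_;
    my %count;
    for my $line (@lines) {
        # each line contributes at most 1 to the count of a substring
        $count{$_}++ for keys %{ substrings($line) };
    }
    my @hits = grep { $count{$_} == $want } keys %count;
    return unless @hits;
    my ($max) = sort { $b <=> $a } map { length($_) } @hits;
    my @longest = grep { length($_) == $max } @hits;
    return sort @longest;
}

print join(' ', longest_common(2, qw(ac bc b))), "\n";    # prints: b c
```

      Like the original, this stores every substring of every line, so memory grows with the square of the line length; it is meant for short inputs such as the sample data.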