There's a difference between

    "I'm trying to write a script which generates the longest common substring + 1 character for each line in a file."

and

    "I'm trying to get the script to return the smallest number of characters to uniquely identify each line."

Assume the following lines:
    AA
    AB
    ABC
    ABCDE
In the first case, the output should be
    AA
    AB
    AB
    AB
In the second case, the output should be
    AA
    AB
    ABC
    ABCD
For the first case, this should work:
    #!/usr/bin/perl -lw
    use strict;

    chomp( my @lines = <DATA> );

    # find shortest line
    my @foo  = sort { length $a <=> length $b } @lines;
    my $diff = $foo[0];

    LINES: for (@lines) {
        # the longest common substring (compared from the start of each
        # line) can only be as long as the shortest line
        my $line = substr $_, 0, length $diff;

        # test the length, so that a remaining "0" is not treated as false
        while ( length $line ) {
            if ( $line eq $diff ) {
                $diff = $line;
                next LINES;
            }
            else {
                # reduce current line and recent common string by one
                $line = substr $line, 0, -1;
                $diff = substr $diff, 0, -1;
            }
        }
    }

    # print the common part plus one more character of each line
    for (@lines) {
        print substr $_, 0, length($diff) + 1;
    }

    __DATA__
    AA
    AB
    ABC
    ABCDE
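For the second case, a minimal sketch could look like the following. It is not from the original thread: the sub name shortest_unique_prefixes is hypothetical, and it assumes that "uniquely identify" means "the shortest prefix that no other line starts with", falling back to the whole line when every prefix is shared (as with AB and ABC in the sample). It compares every line against every other line, so it is only meant for small inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original post):
# for each line, return the shortest prefix that no other line starts with.
# If every prefix is shared, the whole line is returned unchanged.
sub shortest_unique_prefixes {
    my @lines = @_;
    my @result;

    for my $i ( 0 .. $#lines ) {
        my $line   = $lines[$i];
        my $answer = $line;    # fallback: the full line

        LEN: for my $len ( 1 .. length $line ) {
            my $prefix = substr $line, 0, $len;

            for my $j ( 0 .. $#lines ) {
                next if $j == $i;

                # another line starts with this prefix, so try a longer one
                next LEN if index( $lines[$j], $prefix ) == 0;
            }

            # no other line starts with this prefix
            $answer = $prefix;
            last LEN;
        }

        push @result, $answer;
    }

    return @result;
}

# sample lines from the post
print "$_\n" for shortest_unique_prefixes(qw( AA AB ABC ABCDE ));
```

With the sample lines this prints AA, AB, ABC and ABCD, one per line, which matches the second output above. AB and ABC come back whole because each of them is itself the start of a longer line.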
-- Frank ('s first post at perlmonks)
In reply to "Re: longest common substring... almost?" by haoess, in thread "longest common substring... almost?" by Wodenic