in reply to longest common substring... almost?
There's a difference between
I'm trying to write a script which generates the longest common substring + 1 character for each line in a file.
and
I'm trying to get the script to return the smallest number of characters to uniquely identify each line.
Assume the following lines:
AA AB ABC ABCDE
In the first case, the output should be
AA AB AB AB
In the second case, output should be
AA AB ABC ABCD
For the first case, this should work:
#!/usr/bin/perl -lw use strict; chomp( my @lines = <DATA> ); # find shortest line my @foo = sort { length $a <=> length $b } @lines; my $diff = $foo[0]; LINES: for (@lines) { # the longest common substring can only be as long # as the shortest line my $line = substr $_, 0, length $diff; while( $line ) { if ( $line eq $diff ) { $diff = $line; next LINES; } else { # reduce current line and recent common string by one $line = substr $line, 0, -1; $diff = substr $diff, 0, -1; } } } for( @lines ) { print substr $_, 0, length( $diff ) + 1; } __DATA__ AA AB ABC ABCDE
-- Frank ('s first post at perlmonks)
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: longest common substring... almost?
by ambrus (Abbot) on Apr 19, 2004 at 09:03 UTC |