There's a difference between

I'm trying to write a script which generates the longest common substring + 1 character for each line in a file.

and

I'm trying to get the script to return the smallest number of characters to uniquely identify each line.

Assume the following lines:

AA
AB
ABC
ABCDE

In the first case, the output should be

AA
AB
AB
AB

In the second case, the output should be

AA
AB
ABC
ABCD

For the first case, this should work:

#!/usr/bin/perl -lw
use strict;

chomp( my @lines = <DATA> );

# find shortest line
my @foo = sort { length $a <=> length $b } @lines;
my $diff = $foo[0];

LINES:
for (@lines) {
    # the longest common substring (a common prefix here) can only
    # be as long as the shortest line
    my $line = substr $_, 0, length $diff;

    # test the length, not the string itself: a remaining "0" is false in Perl
    while ( length $line ) {
        if ( $line eq $diff ) {
            $diff = $line;
            next LINES;
        }
        else {
            # reduce current line and recent common string by one
            $line = substr $line, 0, -1;
            $diff = substr $diff, 0, -1;
        }
    }
}

# print the common part plus one more character of each line
for (@lines) {
    print substr $_, 0, length($diff) + 1;
}

__DATA__
AA
AB
ABC
ABCDE
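For the second case, something along these lines might do. It is only a minimal sketch: the sub name `shortest_unique_prefixes` is made up for illustration, and it assumes that "uniquely identify" means "the shortest leading part that no other line starts with", falling back to the whole line when the line is itself the start of another line (as AB and ABC are in the sample).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): for each line, return
# the shortest prefix that no other line starts with. If every prefix is
# shared with another line, the whole line is returned.
sub shortest_unique_prefixes {
    my @lines = @_;

    # Count how many lines start with each possible prefix.
    my %seen;
    for my $line (@lines) {
        $seen{ substr( $line, 0, $_ ) }++ for 1 .. length($line);
    }

    my @result;
    for my $line (@lines) {
        my $prefix = $line;    # fallback: the whole line
        for my $len ( 1 .. length($line) ) {
            my $candidate = substr( $line, 0, $len );
            if ( $seen{$candidate} == 1 ) {
                $prefix = $candidate;
                last;
            }
        }
        push @result, $prefix;
    }
    return @result;
}

print "$_\n" for shortest_unique_prefixes(qw(AA AB ABC ABCDE));
```

With the sample lines this prints AA, AB, ABC and ABCD. The hash holds every prefix of every line, so for very long lines or very large files it may be worth sorting the lines and comparing only neighbours instead.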

-- Frank ('s first post at perlmonks)


In reply to Re: longest common substring... almost? by haoess
in thread longest common substring... almost? by Wodenic
