Re: Extract first word from certain lines in a file

Anonymous Monk,
I, like the others, had a hard time understanding exactly what you wanted. I made some assumptions:

Product: is a delimiter
Work needs to be done on lines between delimiters
The first word (defined by whitespace) of each line is important and will be shorter than 64K characters
The longest suffix shared by all of those words is what is desired

CountZero

#!/usr/bin/perl
use strict;
use warnings;
local $/ = "\nProduct:\n";
while ( <DATA> ) {
    my @line = map { /Product:/ ? () : (split " ")[0] } split /\n/;
    print join ' + ', @line;
    print " = ", common( \@line ), "\n";
}
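sub common {
    my $line = shift;
    return '' unless @$line;
    # The common suffix cannot be longer than the shortest word
    my $short = 65536;
    for ( @$line ) { $short = length $_ if length $_ < $short }
    my $len = 0;    # length of the suffix confirmed so far
    CHAR: while ( $len < $short ) {
        my $str = substr( $line->[0], -( $len + 1 ) );
        for ( @$line ) {
            last CHAR if substr( $_, -( $len + 1 ) ) ne $str;
        }
        $len++;
    }
    # substr( $s, -0 ) would return the whole string, so handle 0 separately
    return $len ? substr( $line->[0], -$len ) : '';
}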
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange

Cheers - L~R

Re^2: Extract first word from certain lines in a file
by ihb (Deacon) on Oct 26, 2004 at 16:25 UTC

For a common suffix routine the regex engine's brute force approach comes in handy:

sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] }

sub common_suffix {
    # On a failed match $1 would keep a stale value, so return '' instead
    return $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s ? $1 : '';
}

:-)

Obvious optimizations can be made, such as reordering the arguments so that the string matched against ($_[0]) is the shorter one, or shrinking the longer string, but I didn't want to clutter the essence of the routine.
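
For what it's worth, the reordering and shrinking mentioned above could be sketched as a thin wrapper. This is only an illustration: common_suffix_short is a hypothetical name, and the two helper subs are repeated from the post so that the snippet runs on its own, with one addition (a guard that returns an empty string when the match fails).

```perl
use strict;
use warnings;

# Repeated from the post so the sketch is self-contained.
sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] }

sub common_suffix {
    # Addition in this sketch: return '' on a failed match instead of $1
    return $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s ? $1 : '';
}

# Hypothetical wrapper: match against the shorter string and trim the
# longer one to the same length, since no common suffix can be longer.
sub common_suffix_short {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x) if length $x > length $y;    # $x is now the shorter
    return '' unless length $x;
    $y = substr $y, -length $x;
    return common_suffix($x, $y);
}

print common_suffix_short('greenball', 'stall'), "\n";    # prints "all"
```

The regex then tries at most length($x) start positions, and each is_suffix check compares against a string no longer than $x.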

To get the common suffix of a list, use &reduce from List::Util:

use List::Util qw/ reduce /;

print reduce { common_suffix($a, $b) } qw/ redball greenball stall /;

__END__
all

As the regex might be a bit cryptic, here's how it works: