Extract first word from certain lines in a file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: Extract first word from certain lines in a file
by CountZero (Bishop) on Oct 25, 2004 at 21:26 UTC

ball

But what will you do with: hugespoon and littlespoon -> espoon?

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

[reply]

Re: Extract first word from certain lines in a file
by ides (Deacon) on Oct 25, 2004 at 21:16 UTC

Your question isn't very clear. I have no idea what you mean by redball + greenball = ball.

If you are trying to get multiple lines from a file it is usually good to accumulate the information you retrieve from the file, between Product: markers, as either a scalar with both lines inside of it or as an array with one line per array element.

Maybe if you rephrase your question we can help you out.

Frank Wiles <frank@wiles.org>
http://www.wiles.org

[reply]

Re: Extract first word from certain lines in a file
by ikegami (Patriarch) on Oct 25, 2004 at 21:26 UTC

a thread

[reply]

Re: Extract first word from certain lines in a file
by Limbic~Region (Chancellor) on Oct 25, 2004 at 21:41 UTC

Anonymous Monk

Product: is a delimiter
Work needs to be done on lines between delimiters
The first word (defined by whitespace) of each line is important and will be smaller than 64K
The longest substring that each word ends with is what is desired

CountZero

#!/usr/bin/perl
use strict;
use warnings;
local $/ = "\nProduct:\n";
while ( <DATA> ) {
    my @line = map { /Product:/ ? () : (split " ")[0] } split /\n/;
    print join ' + ', @line;
    print " = ", common( \@line ), "\n";
}
sub common {
    my $line = shift;
    my $short = 65536;
    for ( @$line ) { $short = length $_ if length $_ < $short };
    my $index;
    for ( --$index ; $short-- ; --$index ) {
        my $str = substr($line->[0], $index);
        for ( @$line ) {
            return substr($_, ++$index) if substr($_, $index) ne $str;
        }
    }
}
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
[download]

Cheers - L~R

[reply]
[d/l]

Re^2: Extract first word from certain lines in a file

by ihb (Deacon) on Oct 26, 2004 at 16:25 UTC

For a common suffix routine the regex engine's brute force approach comes in handy:

sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] }

sub common_suffix {
    $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s;
    return $1;
}
[download]

:-)

Obvious optimizations can be done, such as reorder the arguments so that the string matched against ($_[0]) is the shorter string, or shrink the longer string, but I didn't want to clutter the essence of the routine.

To get the common suffix of a list, use &reduce from List::Util:

use List::Util qw/ reduce /;

print reduce { common_suffix($a, $b) } qw/ redball greenball stall /;

__END__
all
[download]

As the regex might be a bit cryptic, here's how it works:

Read more... (2 kB)

Related documents:

• perlvar - Perl predefined variables

• perlreref - Perl regular expressions reference

• perlre - Perl regular expressions

ihb

See perltoc if you don't know which perldoc to read!
Read argumentation in its context!

[reply]
[d/l]
[select]

Re: Extract first word from certain lines in a file
by Anonymous Monk on Oct 25, 2004 at 21:44 UTC

Hopefully this time will be better!

[reply]

Re^2: Extract first word from certain lines in a file

by ikegami (Patriarch) on Oct 25, 2004 at 21:59 UTC

sub dump_product {
   my ($product) = @_;
   print(join(', ', @$product), $/) if (@$product);
   @$product = ();
}

my @product;

while (<DATA>) {
   if (/^Product\b/) {
      dump_product(\@product);
   } elsif (/^(\S+)/) {
      push(@product, $1);
   }
}
dump_product(\@product);

__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
[download]

output:

redball, greenball
smallbox, bigbox
[download]

[reply]
[d/l]
[select]

Re^2: Extract first word from certain lines in a file

by CountZero (Bishop) on Oct 25, 2004 at 21:54 UTC

use strict;
while ( <DATA> ) {
    do {print "\n"; next} if /Product:/;
    my $firstword=(split " ")[0];
    print "$firstword ";
}


__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
[download]

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

[reply]
[d/l]

Re^2: Extract first word from certain lines in a file

by borisz (Canon) on Oct 25, 2004 at 22:29 UTC

my (@w, $c);
local ( $", $_ ) = ", ";
do {
  $_ = <DATA>;
  if ( defined $_ && !/^Product:/ ) {
    push @w,(split ' ')[0];
  }
  elsif ( @w ) {
    print "Product: ", ++$c, "\n@w\n";
    @w = ();
  }
} while ( defined $_ );
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
[download]

Product: 1
redball, greenball
Product: 2
smallbox, bigbox
[download]

Boris

[reply]
[d/l]
[select]

Re^3: Extract first word from certain lines in a file

by CountZero (Bishop) on Oct 26, 2004 at 05:42 UTC

local ( $", $_ ) = ", ";

local

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

[reply]
[d/l]
[select]

Re^4: Extract first word from certain lines in a file

by borisz (Canon) on Oct 26, 2004 at 08:03 UTC