Re: Extract first word from certain lines in a file
by CountZero (Bishop) on Oct 25, 2004 at 21:26 UTC
|
| [reply] |
Re: Extract first word from certain lines in a file
by ides (Deacon) on Oct 25, 2004 at 21:16 UTC
|
Your question isn't very clear. I have no idea what you mean by redball + greenball = ball.
If you are trying to get multiple lines from a file it is
usually good to accumulate the information you retrieve from the file, between Product: markers, as either a scalar with both lines inside of it or as an array with one line per array element.
Maybe if you rephrase your question we can help you out.
| [reply] |
Re: Extract first word from certain lines in a file
by ikegami (Patriarch) on Oct 25, 2004 at 21:26 UTC
|
I don't quite understand either, but it sounds like a Longuest Common Substring problem. I found a thread (seemingly) discussing LCS in depth.
| [reply] |
Re: Extract first word from certain lines in a file
by Limbic~Region (Chancellor) on Oct 25, 2004 at 21:41 UTC
|
Anonymous Monk,
I, like the others, had a hard time understanding what exactly what you wanted. I made some assumptions:
- Product: is a delimiter
- Work needs to be done on lines between delimiters
- The first word (defined by whitespace) of each line is important and will be smaller than 64K
- The longest substring that each word ends with is what is desired
As CountZero points out, this is likely flawed
#!/usr/bin/perl
use strict;
use warnings;
local $/ = "\nProduct:\n";
while ( <DATA> ) {
my @line = map { /Product:/ ? () : (split " ")[0] } split /\n/;
print join ' + ', @line;
print " = ", common( \@line ), "\n";
}
sub common {
my $line = shift;
my $short = 65536;
for ( @$line ) { $short = length $_ if length $_ < $short };
my $index;
for ( --$index ; $short-- ; --$index ) {
my $str = substr($line->[0], $index);
for ( @$line ) {
return substr($_, ++$index) if substr($_, $index) ne $str;
}
}
}
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
| [reply] [d/l] |
|
|
sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] }
sub common_suffix {
$_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s;
return $1;
}
It's not efficient for large strings but I think it's neat anyway. :-)
Obvious optimizations can be done, such as reorder the arguments so that the string matched against ($_[0]) is the shorter string, or shrink the longer string, but I didn't want to clutter the essence of the routine.
To get the common suffix of a list, use &reduce from List::Util:
use List::Util qw/ reduce /;
print reduce { common_suffix($a, $b) } qw/ redball greenball stall /;
__END__
all
As the regex might be a bit cryptic, here's how it works:
Related documents:
| • perlvar | - Perl predefined variables |
| • perlreref | - Perl regular expressions reference |
| • perlre | - Perl regular expressions |
ihb
See perltoc if you don't know which perldoc to read!
Read argumentation in its context!
| [reply] [d/l] [select] |
Re: Extract first word from certain lines in a file
by Anonymous Monk on Oct 25, 2004 at 21:44 UTC
|
| [reply] |
|
|
sub dump_product {
my ($product) = @_;
print(join(', ', @$product), $/) if (@$product);
@$product = ();
}
my @product;
while (<DATA>) {
if (/^Product\b/) {
dump_product(\@product);
} elsif (/^(\S+)/) {
push(@product, $1);
}
}
dump_product(\@product);
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
output:
redball, greenball
smallbox, bigbox
| [reply] [d/l] [select] |
|
|
use strict;
while ( <DATA> ) {
do {print "\n"; next} if /Product:/;
my $firstword=(split " ")[0];
print "$firstword ";
}
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] [d/l] |
|
|
my (@w, $c);
local ( $", $_ ) = ", ";
do {
$_ = <DATA>;
if ( defined $_ && !/^Product:/ ) {
push @w,(split ' ')[0];
}
elsif ( @w ) {
print "Product: ", ++$c, "\n@w\n";
@w = ();
}
} while ( defined $_ );
__DATA__
Product:
redball This is for Mike.
greenball This is for Dave.
Product:
smallbox This is for apples
bigbox This is for orange
output:
Product: 1
redball, greenball
Product: 2
smallbox, bigbox
| [reply] [d/l] [select] |
|
|
I was wondering why you use local ( $", $_ ) = ", "; since there is no enclosing scope other than the script file itself. The local does not seem to serve any purpose.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] [d/l] [select] |
|
|