Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file like:
Product: redball This is for Mike. greenball This is for Dave. Product: smallbox This is for apples bigbox This is for orange

I need to print out like:

redball + greenball = ball smallbox + bigbox = box

2004-10-27 Edited by Arunbear: Changed title from 'How do you get the vallues from mone than one lines?', and added code tags

Replies are listed 'Best First'.
Re: Extract first word from certain lines in a file
by CountZero (Bishop) on Oct 25, 2004 at 21:26 UTC
    Do you have a list of the various products or does the program have to find out all by itself? e.g. the name of the product is the common part between the first word each line within a product section: redball greenball -> ball

    But what will you do with: hugespoon and littlespoon -> espoon?

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Extract first word from certain lines in a file
by ides (Deacon) on Oct 25, 2004 at 21:16 UTC

    Your question isn't very clear. I have no idea what you mean by redball + greenball = ball.

    If you are trying to get multiple lines from a file it is usually good to accumulate the information you retrieve from the file, between Product: markers, as either a scalar with both lines inside of it or as an array with one line per array element.

    Maybe if you rephrase your question we can help you out.

    Frank Wiles <frank@wiles.org>
    http://www.wiles.org

Re: Extract first word from certain lines in a file
by ikegami (Patriarch) on Oct 25, 2004 at 21:26 UTC
    I don't quite understand either, but it sounds like a Longuest Common Substring problem. I found a thread (seemingly) discussing LCS in depth.
Re: Extract first word from certain lines in a file
by Limbic~Region (Chancellor) on Oct 25, 2004 at 21:41 UTC
    Anonymous Monk,
    I, like the others, had a hard time understanding what exactly what you wanted. I made some assumptions:
    • Product: is a delimiter
    • Work needs to be done on lines between delimiters
    • The first word (defined by whitespace) of each line is important and will be smaller than 64K
    • The longest substring that each word ends with is what is desired
    • As CountZero points out, this is likely flawed
    #!/usr/bin/perl use strict; use warnings; local $/ = "\nProduct:\n"; while ( <DATA> ) { my @line = map { /Product:/ ? () : (split " ")[0] } split /\n/; print join ' + ', @line; print " = ", common( \@line ), "\n"; } sub common { my $line = shift; my $short = 65536; for ( @$line ) { $short = length $_ if length $_ < $short }; my $index; for ( --$index ; $short-- ; --$index ) { my $str = substr($line->[0], $index); for ( @$line ) { return substr($_, ++$index) if substr($_, $index) ne $str; } } } __DATA__ Product: redball This is for Mike. greenball This is for Dave. Product: smallbox This is for apples bigbox This is for orange

    Cheers - L~R

      For a common suffix routine the regex engine's brute force approach comes in handy:

      sub is_suffix { substr($_[0], -length($_[1])) eq $_[1] } sub common_suffix { $_[0] =~ m[(?>(.+))(?(?{ not is_suffix($_[1], $^N) })(?!))]s; return $1; }
      It's not efficient for large strings but I think it's neat anyway. :-)

      Obvious optimizations can be done, such as reorder the arguments so that the string matched against ($_[0]) is the shorter string, or shrink the longer string, but I didn't want to clutter the essence of the routine.

      To get the common suffix of a list, use &reduce from List::Util:

      use List::Util qw/ reduce /; print reduce { common_suffix($a, $b) } qw/ redball greenball stall /; __END__ all

      As the regex might be a bit cryptic, here's how it works:

      Related documents:
      perlvar - Perl predefined variables
      perlreref - Perl regular expressions reference
      perlre - Perl regular expressions

      ihb

      See perltoc if you don't know which perldoc to read!
      Read argumentation in its context!

Re: Extract first word from certain lines in a file
by Anonymous Monk on Oct 25, 2004 at 21:44 UTC
    I am sorry if I did not state the question clearly. Actrully I want to get the first word of each lines of each product:
    We can see the output values can be:
    Product 1:
    redball,greenball
    smallbox,bigbox

    Hopefully this time will be better!

      sub dump_product { my ($product) = @_; print(join(', ', @$product), $/) if (@$product); @$product = (); } my @product; while (<DATA>) { if (/^Product\b/) { dump_product(\@product); } elsif (/^(\S+)/) { push(@product, $1); } } dump_product(\@product); __DATA__ Product: redball This is for Mike. greenball This is for Dave. Product: smallbox This is for apples bigbox This is for orange

      output:

      redball, greenball smallbox, bigbox
      That is easy:
      use strict; while ( <DATA> ) { do {print "\n"; next} if /Product:/; my $firstword=(split " ")[0]; print "$firstword "; } __DATA__ Product: redball This is for Mike. greenball This is for Dave. Product: smallbox This is for apples bigbox This is for orange

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      my (@w, $c); local ( $", $_ ) = ", "; do { $_ = <DATA>; if ( defined $_ && !/^Product:/ ) { push @w,(split ' ')[0]; } elsif ( @w ) { print "Product: ", ++$c, "\n@w\n"; @w = (); } } while ( defined $_ ); __DATA__ Product: redball This is for Mike. greenball This is for Dave. Product: smallbox This is for apples bigbox This is for orange
      output:
      Product: 1 redball, greenball Product: 2 smallbox, bigbox
      Boris
        I was wondering why you use local ( $", $_ ) = ", "; since there is no enclosing scope other than the script file itself. The local does not seem to serve any purpose.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law