sharan has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new to Perl. I seek your help in writing a program. I am trying to read a file whose data looks like:
process I know the simple case is finding all the sub; bdjh pond(end)cannot; gsjad pond: process(start); begin;
I need to get the output as:
end process
I tried with:
while(<PAGE>) { if (/pond/) { print "$_"; } } close(PAGE);
But with this code I get the whole line instead of just end and start. Thank you.

Re: read a word from a line
by ELISHEVA (Prior) on Feb 25, 2009 at 14:13 UTC
    When you read in a line from a file like this while(<PAGE>), $_ stores the whole line you just read in. So print "$_" prints the whole line.

    To print out only part of the line, you need to use a capturing regular expression. You will also need to escape the ( and ) because those have special meaning in a regular expression. So to get the stuff between the parentheses after pond:

    print $1 if (/pond\(([^)]*)\)/);

    [^)]* means match everything up until the first closing ). ([^)]*) says to capture it and store it in $1. \( and \) say "treat ( and ) as normal parentheses". See perlretut for more information.

    Best, beth

    Update: Fixed typo, as per AnomalousMonk below. Thanks.

      [^)*] means match everything up until the first closing ).
      [^)*] means "match a single character that is anything other than a ')' or a '*'".

      [^)]* means "greedily match zero or more of any character that is not a ')'".
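
The difference between the two spellings can be checked with a short script. This is only a minimal sketch: the sample string and the variable names are made up for illustration and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen so that the two patterns behave differently.
my $text = 'pond(a*b)';

# [^)*] is a single-character class: one character that is neither ')' nor '*'.
my ($single) = $text =~ /\(([^)*])/;

# [^)]* is the class [^)] repeated: zero or more characters other than ')'.
my ($run) = $text =~ /\(([^)]*)\)/;

print "single: $single\n";   # the one character right after '('
print "run:    $run\n";      # everything up to the closing ')'
```

Here the first pattern captures only a, while the second captures a*b.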

      Thanks for your reply. I tried your command, but it's not working as I expect. My intention is just to read the value between the brackets, i.e. between "(" and ")". That's why my desired output is end and start. Thank you,
        Perhaps you tried the regular expression before it was corrected? The original version had a typo and I apologize for that. The code above (corrected as per AnomalousMonk) does indeed extract just what is between the parentheses. However, it can't be used just by itself. It has to be inserted into a loop that reads in each line. A lot depends on exactly how you do it.
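
One way the corrected expression might sit inside a read loop is sketched below. The line breaks in the sample data are an assumption (the thread does not show where they fall), and the in-memory filehandle and the @found array are only there so the sketch runs without a file on disk.

```perl
use strict;
use warnings;

# Assumed layout of the sample data; the real line breaks are not shown in the thread.
my $data = <<'END';
bdjh pond(end)cannot;
gsjad pond: process(start);
begin;
END

# Read from an in-memory filehandle instead of a real file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @found;
while (my $line = <$fh>) {
    # Capture whatever sits between "pond(" and the next ")".
    if ($line =~ /pond\(([^)]*)\)/) {
        push @found, $1;
        print "$1\n";
    }
}
close $fh;
```

With this data only end is printed, because the pattern requires the ( to follow pond immediately; the line with "pond: process(start)" does not match it.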

        If you are having trouble getting it to work, perhaps you should consider adding an update to your original question and post the code where you are using the regular expression (or just post it in reply to this node). Then we might be able to see if there are other issues besides the choice of regular expression that are causing problems.

        Best, beth

Re: read a word from a line
by atemon (Chaplain) on Feb 25, 2009 at 14:17 UTC

    Hi,

    Try :

    while(<PAGE>) { if (/pond.*\((\w+)\)/) { print "$1\n"; } } close(PAGE);
    output:
    start end

    $_ is the default scalar variable. In your while loop you do not name a variable to hold the line read from the file, so Perl stores it in $_. Likewise, your regular expression does not say which string to match against, so it is matched against $_. That is why printing $_ prints the entire line. $1, by contrast, contains the text captured by the first pair of capturing parentheses in the last successful match. For details, please have a look at perlre and perlvar.

    The above code is the same as

    while($_ = <PAGE>) { if ( $_ =~ /pond.*\((\w+)\)/) { print "$1\n"; } } close(PAGE);
    or can be written without relying on $_ as
    my $line; while($line = <PAGE>) { if ($line =~ /pond.*\((\w+)\)/) { print "$1\n"; } } close(PAGE);
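
If several matches share one line, a greedy .* captures only the last parenthesised word on that line. A possible variation, not taken from the replies above, is to use the /g modifier in a loop and to replace .* with [^()]* so the match cannot run past a parenthesis. The sketch below assumes the sample from the question arrives as a single line; @words is a hypothetical name.

```perl
use strict;
use warnings;

# The sample data from the question, treated here as one line of input (an assumption).
my $line = 'process I know the simple case is finding all the sub; '
         . 'bdjh pond(end)cannot; gsjad pond: process(start); begin;';

my @words;
# [^()]* skips the text between "pond" and the next "(" without crossing
# another parenthesis; /g resumes the search after each successful match.
while ($line =~ /pond[^()]*\((\w+)\)/g) {
    push @words, $1;
}

print "$_\n" for @words;
```

On this input the loop collects end and then start, in the order they appear on the line.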

    Cheers !

    --VC

Re: read a word from a line
by Narveson (Chaplain) on Feb 25, 2009 at 14:16 UTC

    It's good that you're telling us the desired output, but why those two words in particular? It's a bit of a riddle. Your attempted solution gives a clue. Would the following be a good statement of the problem?

    Print the word that follows each occurrence of pond in the given file.
      No, my intention is to grab just the value between the brackets, i.e. between "(" and ")". That's why I need just end and start as output. Thank you.

        Okay, that's clearer, especially now that you've changed your desired output. In your original question your desired output was

        end process