Parsing text without split

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, All,

I almost always use the split function to parse a line of text and then define a variable based on the split result. Here's an example:-

@line_as_list = split(/\s+/, $rptfile[$ln]);
$width = $line_as_list[3];

@rptfile is the file I have slurped in, and $ln is the current line number in that file.

For the type of parsing I typically do, the above procedure is usually fine. Obviously, though, this is more difficult to manage when the text I need to locate is not in a consistent location.

What is the easiest way to define a variable after a keyword is found? Here's an example string:-

Startpoint: flop_a (clocked by t_clk)

I want to quickly assign flop_a to a variable and t_clk to another variable, but sometimes there are extra words after "flop_a" and before "clocked", so splitting by \s+ is too unpredictable. The variables I need will always follow "Startpoint:" and "clocked by", though.

My lines are always referenced as $rtpfile[$ln], where $ln is the current line number I am working on (I often have to bounce back to previous line numbers...)

Thank You!

Re: Parsing text without split by Roy Johnson (Monsignor) on May 27, 2005 at 21:30 UTC
Extract from a regex match: `($flop_a, $t_clk) = $rtpfile[$ln] =~ /Startpoint: (\w+).*?clocked by (\w+)/;` Caution: Contents may have been coded under pressure.
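A minimal, self-contained sketch of that list-context capture, for anyone who wants to try it outside the original script. The sample lines and the `parse_startpoint` helper name are assumptions made for illustration; only the `Startpoint:` and `clocked by` keywords come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped report file.
my @rtpfile = (
    'Startpoint: flop_a (clocked by t_clk)',
    'Startpoint: flop_b extra words here (clocked by u_clk)',
    'Endpoint: flop_c',
);

# parse_startpoint is an illustrative helper, not part of the original post.
# In list context a successful match returns the captured groups;
# a failed match returns an empty list.
sub parse_startpoint {
    my ($line) = @_;
    return $line =~ /Startpoint:\s+(\w+).*?clocked by\s+(\w+)/;
}

for my $ln (0 .. $#rtpfile) {
    my ($start, $clock) = parse_startpoint($rtpfile[$ln]);
    if (defined $start) {
        print "line $ln: start=$start clock=$clock\n";
    }
    else {
        # Leave the variables alone when the keywords are absent.
        print "line $ln: no match\n";
    }
}
```

Checking `defined $start` before using the values matters here, because a line without both keywords leaves both variables undefined.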
Re^2: Parsing text without split by thundergnat (Deacon) on May 27, 2005 at 22:34 UTC
That has the same problem as split... It will fail for variables with spaces in them. If you can rely on the parentheses, it would work better as: `@line_as_list = $rtpfile[$ln] =~ /Startpoint: ([^(]+).*?clocked by ([^ +)]+)\)/;` Update: Doh! Never mind. I misread the OP's specs.
Re^3: Parsing text without split by Roy Johnson (Monsignor) on May 27, 2005 at 22:42 UTC
I took his meaning to be that the "extra words" were not part of the desired capture. You can capture multiple words with spaces by changing `(\w+)` to `([\w\s]+)`. Caution: Contents may have been coded under pressure.
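To make the difference between the two character classes concrete, here is a small sketch. The sample line is made up for illustration, and the trailing-whitespace trim is an extra step I am assuming you would want, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical input with extra words between the name and "(clocked by".
my $line = 'Startpoint: flop_a extra words (clocked by t_clk)';

# (\w+) stops at the first non-word character, so the extra words are skipped.
my ($name_only) = $line =~ /Startpoint: (\w+).*?clocked by \w+/;

# ([\w\s]+) also accepts whitespace, so the capture runs up to the "(".
my ($with_words) = $line =~ /Startpoint: ([\w\s]+).*?clocked by \w+/;

# The wider class also picks up the space just before "("; trim it.
$with_words =~ s/\s+\z//;

print "[$name_only]\n";
print "[$with_words]\n";
```

Which class to use depends on whether the extra words belong to the value you want or are noise to be skipped.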