Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I made a post yesterday (read file twice) asking if anyone could help me read each line in a file until I found a word for the second time (the word will never appear on the same line twice).
I had an excellent response and spent most of yesterday and today implementing each suggestion and playing around with variants. A small flaw in all of the suggestions, due to the wording of the question I submitted, is that they do cope with a second table title, but only if there are two titles and not just one.
The idea behind finding these words is that they are table titles. I find the first table with 'formulae' as a title and process it, extracting the data I require. There is a problem I overlooked: I had only hard copies of the files I'm writing the script for, and failed to notice that some tables that are too long for a page continue on the next page, with the table title & column headers repeated above the continuation.
So in fact, in some files the script has to process 2 tables with the same title and column headers to get all the data. The files contain other tables as well, but I only need the formulae table. The other tables have different titles, which is why I chose to search for 'formulae'.
The original code I had was simple & only found one table title (I was assuming any table that exceeded the page length would just continue on the next page, without another title & set of column headers). I now need to cater for tables that continue on the next page (i.e. 2 tables with the same title).
    do { $_ = <> } until /formulae/;
    <> for (1..3);   # skips 3 lines from title

    # read input file
    while (<>) {
        last if /^\+/;
        ## extract data from table ##
    }
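For illustration, the requirement described above (process every table titled 'formulae', whether the file has one or two of them) could be sketched roughly as follows. This is only a minimal sketch under the same assumptions as the snippet above: three lines follow the title before the data starts, and a line beginning with '+' ends the table. The sub name `read_formulae_rows` and the idea of returning the raw data lines are hypothetical choices made for the example, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the data lines of every table whose
# title line matches /formulae/, read from an already opened filehandle.
# Assumes 3 lines sit between the title and the data (as in the snippet
# above) and that a line starting with '+' marks the end of a table.
sub read_formulae_rows {
    my ($fh) = @_;
    my @rows;

    while ( defined( my $line = <$fh> ) ) {
        next unless $line =~ /formulae/;    # look for the next table title

        # skip the 3 lines that follow the title
        for ( 1 .. 3 ) {
            my $skipped = <$fh>;
            last unless defined $skipped;   # stop quietly at end of file
        }

        # collect rows until the table ends (or the file runs out)
        while ( defined( my $row = <$fh> ) ) {
            last if $row =~ /^\+/;
            chomp $row;
            push @rows, $row;               # real data extraction would go here
        }
    }

    return @rows;
}
```

Because the outer loop simply keeps scanning for the title until end of file, a file with a single 'formulae' table and a file with a continued table on the next page would both be handled by the same code. It could be called with a filehandle opened on the input file, e.g. `my @rows = read_formulae_rows($fh);`.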
Thanks for all your help and the fantastic response I had yesterday; it helped a lot in other parts of the script I needed to write. Steve.
update (broquaint): added link to previous thread
Re: RP: Finding the second line an item appears on
by tachyon (Chancellor) on Oct 28, 2003 at 11:57 UTC