
As I stated earlier, the noise comes before and after the data, and it is possible to identify an index line and work from there.
The number of sets of data is always 70, and the index /^Number:/ is always the third piece of data (after blank lines are removed).
So you can more or less ignore the noise for the purposes of the question. I included it for completeness.

Graq (http://www.graq.co.uk)


Re: Re: Re: More Regular Expressions (text data handling)
by frankus (Priest) on Dec 04, 2001 at 20:39 UTC

    As I see it, you need a lookahead in a regex.

    The line before Number contains the name, and the person's details are terminated by the same name again,
    so a pattern can grab the name and then the text between the two instances of that name.

    You could then build a hash keyed by name, with each value being a hash of that person's details. Does that sound good?
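
The lookahead idea might look something like the following Python sketch. The thread never shows the real data, so the sample layout here is purely an assumption: a name line, some "Key: value" lines, then the same name again as a terminator. The names `SAMPLE`, `RECORD` and `parse_records` are hypothetical, made up for illustration.

```python
import re

# Hypothetical sample; the real layout is not shown in the thread.
SAMPLE = """\
Alice Smith
Number: 1001
Town: Leeds
Alice Smith
Bob Jones
Number: 1002
Town: York
Bob Jones
"""

# A name line (assumed to contain no colon), then a lazy capture of
# everything up to a lookahead for the same name on a line of its own.
RECORD = re.compile(
    r"^(?P<name>[^:\n]+)\n(?P<body>.*?)(?=^(?P=name)$)",
    re.MULTILINE | re.DOTALL,
)

def parse_records(text):
    """Return {name: {key: value}} for each name-delimited record."""
    people = {}
    for match in RECORD.finditer(text):
        details = {}
        for line in match.group("body").splitlines():
            key, sep, value = line.partition(":")
            if sep:  # keep only "Key: value" lines
                details[key.strip()] = value.strip()
        people[match.group("name").strip()] = details
    return people

if __name__ == "__main__":
    print(parse_records(SAMPLE))
```

Because the lookahead does not consume the closing name line, that line is tried again as a possible opening name. In this sample it simply fails to match, but real data with repeated names may need the terminator consumed instead.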

    --

    Brother Frankus.


      The Number is the unique key for the data.
      Having written the problem down and examined it while trying to explain it :), I have decided to attempt this approach:
      1. Remove all blank lines.
      2. Find the index and grab 70 lines (offsets -2..67 relative to the index line).
      3. Split the data into three sections.
      4. Deal with section overlaps.
      The three sections are:
      1. All lines up to (but excluding) the first line with a colon.
      2. All lines with a colon.
      3. The rest.
      Count the lines in 'The rest' and move that many lines out of section 2, placing them ahead of the lines already in section 3.
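
The steps above can be sketched roughly as follows, in Python. This is only an illustration under several assumptions: the block is taken as 70 lines beginning two lines before the index line, section 2 is treated as the contiguous run of colon lines, and the lines moved in step 4 are taken from the end of section 2. The function name `split_sections` and its parameters are hypothetical.

```python
def split_sections(raw_lines, index_prefix="Number:", before=2, total=70):
    """Split one data block into (head, colon_lines, rest)."""
    # Step 1: remove all blank lines.
    lines = [ln.strip() for ln in raw_lines if ln.strip()]

    # Step 2: find the index line and grab `total` lines around it.
    idx = next(
        (i for i, ln in enumerate(lines) if ln.startswith(index_prefix)), None
    )
    if idx is None or idx < before:
        raise ValueError("index line not found where expected")
    start = idx - before
    block = lines[start:start + total]

    # Step 3: split into three sections.
    first_colon = next(
        (i for i, ln in enumerate(block) if ":" in ln), len(block)
    )
    head = block[:first_colon]
    end_colon = first_colon
    while end_colon < len(block) and ":" in block[end_colon]:
        end_colon += 1
    colon_lines = block[first_colon:end_colon]
    rest = block[end_colon:]

    # Step 4: move len(rest) lines from the end of section 2 (assumed)
    # to the front of section 3.
    cut = max(len(colon_lines) - len(rest), 0)
    return head, colon_lines[:cut], colon_lines[cut:] + rest
```

Called with the lines of a file, for example `split_sections(open(path).read().splitlines())`, it returns the three sections as lists, leaving the surrounding noise behind.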

      This should help sort the data.

      Graq (http://www.graq.co.uk)