stanleysj has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I need help in creating a regex to extract the following lines from a text file.

Table: GO Terms
[entity}

Table: Aliases
[Alias]
MAL1P1.18
pfmalp012
alap11

Table: Y2H Interactions

What I need are the lines between "Alias" and "Table: Y2H Interactions". I tried to use the range operator, but for some reason it did not work; maybe I got the syntax wrong. Sometimes there are no entries in between, but if there are aliases then I need to pick them up.

I tried code similar to the following:

if /\[Alias\]/../^\s*\s$/ { push (@alias, $_); }

Re: need help in extracting lines
by Corion (Patriarch) on Jan 13, 2009 at 13:18 UTC

    If Perl thinks there is something wrong with the syntax, Perl tells you so. What did Perl tell you and what did you do about it? Maybe you want to just run your code using the diagnostics pragma?

    I used perl -Mdiagnostics -le  "if /\[Alias\]/../^\s*\s$/ { push (@alias, $_); }" and found the message pretty to the point.

      Another possible conceptual problem with stanleysj's approach is that the regex  /^\s*\s$/ is looking for a line consisting only of one or more whitespace characters, i.e., is equivalent to  /^\s+$/.

      This may or may not be what the OPer really wants to terminate the text block with.
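      For illustration, here is a minimal sketch of the range (flip-flop) operator with the condition parenthesized. It assumes the block should end at the next "Table:" line rather than at a whitespace-only line, and the sample data (including its blank separator lines) is modeled on the question, not taken from the real file.

```perl
use strict;
use warnings;

# Sample input modeled on the question; the blank lines are an assumption.
my @input = (
    "Table: GO Terms\n",
    "[entity}\n",
    "\n",
    "Table: Aliases\n",
    "[Alias]\n",
    "MAL1P1.18\n",
    "pfmalp012\n",
    "alap11\n",
    "\n",
    "Table: Y2H Interactions\n",
);

my @alias;
for (@input) {
    # The condition of "if" must be wrapped in parentheses.
    # The range is true from the [Alias] line through the next "Table:" line.
    if ( /\[Alias\]/ .. /^Table:/ ) {
        # Skip both boundary lines and any blank lines in between.
        next if /\[Alias\]/ or /^Table:/ or /^\s*$/;
        chomp;
        push @alias, $_;
    }
}

print "$_\n" for @alias;
```

      If there are no entries between the two markers, @alias simply stays empty.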

Re: need help in extracting lines
by jdporter (Paladin) on Jan 13, 2009 at 14:44 UTC

    Others have already addressed your explicit question, but I'd like to suggest a different approach to your problem. Of course, I'm making some assumptions about the real nature of your problem, so correct me if I'm wrong. Or simply disregard. :-)

    It appears that your data is a series of chunks separated by empty lines, and that each chunk begins with a Table: ... line. If so, then we could use perl's "paragraph" mode of reading input records:

    local $/ = ''; # read paragraphs
    while (<>) {
        # each paragraph is multiple lines, the first of which is "Table: ..."
        # and the second is some kind of tag enclosed in brackets.
        my( $table ) = /^Table: (.*)/ or die "Hm... bad paragraph:\n$_";
        my( undef, $tag, @lines ) = split /\n/;
        if ( $tag eq '[Alias]' ) {
            push @aliases, \@lines;
        }
    }
    Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
      I tried this code out and needed to make a change or two.
      -- the /^Table regex needs a trailing 's' modifier.
      -- in the 'if ( $tag' condition I changed 'eq' to '=~', because it turns out that the '[Alias]' line actually ends with a trailing space, i.e. it is '[Alias] '.

      But the paragraph mode is great! I once used a customer specific language that did list processing in paragraphs, but that was in the 70's. But it is a great processing mode.

      It is always better to have seen your target for yourself, rather than depend upon someone else's description.
        /^Table regex needs an 's' trailing modifier

        Depends on what you want in the $table variable. I was intending to get the name of the table, i.e. only what follows 'Table:' on that line. Adding the s modifier would put the entire rest of the paragraph into the variable.
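        A small sketch of the difference, using a made-up paragraph string ($para, $name and $rest are illustrative names, not from the code above):

```perl
use strict;
use warnings;

# One "paragraph" as it would arrive in paragraph mode (hypothetical sample).
my $para = "Table: Aliases\n[Alias]\nMAL1P1.18\n";

# Without /s, '.' does not match a newline, so the capture stops at end of line.
my ($name) = $para =~ /^Table: (.*)/;

# With /s, '.' also matches newlines, so the capture runs to the end of the string.
my ($rest) = $para =~ /^Table: (.*)/s;

print "without /s: $name\n";
print "with /s:    $rest";
```

        Here $name holds only "Aliases", while $rest holds everything after "Table: ", including the tag line and the data lines.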

        the '[Alias]' line actually ends with a trailing space, and is '[Alias] '

        In that case, I'd write

        if ( $tag eq '[Alias] ' )
        :-)

        But more importantly, square brackets are special in regular expressions, so you'd want to escape them if you go that route:

        if ( $tag =~ /\[Alias\]/ )
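        Putting the pieces together, here is a hedged sketch of the paragraph-mode loop with a tag match that tolerates trailing whitespace. The in-memory sample data is an assumption modeled on the question, and the anchored pattern /^\[Alias\]\s*$/ is one possible choice, not the only one.

```perl
use strict;
use warnings;

# Sample data modeled on the question; note the trailing space after "[Alias]".
my $data = "Table: GO Terms\n[entity}\n\n"
         . "Table: Aliases\n[Alias] \nMAL1P1.18\npfmalp012\nalap11\n\n"
         . "Table: Y2H Interactions\n";

my @aliases;
{
    local $/ = '';    # paragraph mode: records are separated by empty lines
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (<$fh>) {
        # First line is the "Table: ..." header, second is the bracketed tag.
        my ( undef, $tag, @lines ) = split /\n/;
        next unless defined $tag;    # paragraph with a header line only
        # Escaped brackets, anchored, with optional trailing whitespace.
        if ( $tag =~ /^\[Alias\]\s*$/ ) {
            push @aliases, \@lines;
        }
    }
    close $fh;
}

print "$_\n" for map { @$_ } @aliases;
```

        One caveat: paragraph mode splits only on truly empty lines. If the separator lines in the real file contain spaces or tabs, the paragraphs will run together and the data would need to be normalized first.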

Re: need help in extracting lines
by Bloodnok (Vicar) on Jan 13, 2009 at 13:35 UTC
    Just goes to show that you can see exactly what you want to (see, that is...).

    Without doing the one-liner (coz Corion had already done that:-), it took 3 scans before I noticed the missing parens.

    A user level that continues to overstate my experience :-))