substrate has asked for the wisdom of the Perl Monks concerning the following question:

How do you build a regexp that matches up to but not including a regular expression in perl? For instance, I have the following plain text file:
    .subckt cct0 v0 v1 v2 v3
    + v4 v5 v6
    * useless comment
    + v7 v8 v9
    + va vb
    x00 v0 v1 x0 cct1
    .ends
I want to do an operation on everything from the .subckt line up to, but not including, the x00 line. So I wrote something like this:

    while (<>) {
        if (/^\.subckt/ .. /^\w/) {
            print;
        }
    }

This almost works, but it also prints the x00 line. I know that a word transition is what I want to switch on, but I don't know how to capture everything up to, but not including, the word transition. On the word transitions I'll need to do another operation.

Re: regexp match up to but not including...
by davido (Cardinal) on Dec 18, 2003 at 16:53 UTC
    A positive zero-width lookahead isn't going to be helpful if you keep your current idiom of reading one line at a time and matching with the flip-flop operator.

    Here's one possible solution:

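        use strict;
        use warnings;

        # Note: the loop condition also ends the loop on any line that is
        # outside the range, so as written this relies on the data starting
        # at the .subckt line.
        while ( $_ = <DATA> and /^\.subckt/ .. /^\w/ ) {
            last if /^\w/;    # stop before printing the line that ends the range
            print;
        }

        __DATA__
        .subckt cct0 v0 v1 v2 v3
        + v4 v5 v6
        * useless comment
        + v7 v8 v9
        + va vb
        x00 v0 v1 x0 cct1
        .ends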

    This has the disadvantage of performing a couple of pattern matches on every line of the file read. However, it improves upon your snippet by jumping out of the loop the instant it detects the end of the data you're looking for.
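A variant of the same idea that does not assume the data begins at the .subckt line could look like the sketch below. The helper name header_lines and the leading comment line in the sample are made up for illustration; the sketch swaps the flip-flop for an explicit flag and is not the code from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines from the first /^\.subckt/ line
# up to, but not including, the next line that starts with a word character.
sub header_lines {
    my @lines = @_;
    my @kept;
    my $in_block = 0;
    for my $line (@lines) {
        if ( !$in_block ) {
            # Skip everything before the .subckt line.
            next unless $line =~ /^\.subckt/;
            $in_block = 1;
        }
        elsif ( $line =~ /^\w/ ) {
            # The first line starting with a word character ends the block.
            last;
        }
        push @kept, $line;
    }
    return @kept;
}

# Sample data from the question, plus one illustrative leading line.
my @sample = (
    "* leading comment\n",
    ".subckt cct0 v0 v1 v2 v3\n",
    "+ v4 v5 v6\n",
    "* useless comment\n",
    "+ v7 v8 v9\n",
    "+ va vb\n",
    "x00 v0 v1 x0 cct1\n",
    ".ends\n",
);

print header_lines(@sample);
```

Like the loop above, it stops reading as soon as the terminating line is seen.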


    Dave

Re: regexp match up to but not including...
by ysth (Canon) on Dec 18, 2003 at 16:56 UTC
    To answer the general question, you use /^(?s:(?!ending-regex).)*/ but for any particular case, there will be a more efficient way.

    Update: I didn't mean to apply the //s flag to the ending-regex; should have been /^(?:(?!ending-regex)(?s:.))*/.
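As a rough illustration of the corrected pattern, the sketch below applies it to the sample data held in one string. The helper name up_to and the choice of qr/^\w/m as the ending regex are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text from the start of $text up to, but
# not including, the first position where $ending can match.
sub up_to {
    my ( $text, $ending ) = @_;
    # At each position: fail if $ending matches here, otherwise consume
    # one character (newline included, because of the inner (?s:.)).
    my ($head) = $text =~ /^((?:(?!$ending)(?s:.))*)/;
    return $head;
}

my $text = join '',
    ".subckt cct0 v0 v1 v2 v3\n",
    "+ v4 v5 v6\n",
    "* useless comment\n",
    "+ v7 v8 v9\n",
    "+ va vb\n",
    "x00 v0 v1 x0 cct1\n",
    ".ends\n";

# The /m on the ending regex lets its ^ match at the start of any line.
print up_to( $text, qr/^\w/m );
```

Because the ending regex carries its own flags inside qr//, the /s behaviour stays confined to the single-character step, which is the point of the update.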

    In your case, to do something like that you would need to switch to slurping in the whole file instead of reading line by line (by undef'ing $/). You may prefer to keep the code you have and just interrogate the return value of your ..:

        while (<>) {
            my $inrange = /^\.subckt/ .. /^\w/;
            if ( $inrange && $inrange !~ /E/ ) {
                print;
            }
        }

    The scalar .. operator returns a positive count if you are in the range. At the end of the range the count has "E0" appended to it, explicitly so you can check for this condition.
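To see what the range operator returns on each line, a small sketch like the following can help. It uses the sample data from the question as an in-memory list; the variable names are illustrative.

```perl
use strict;
use warnings;

my @lines = (
    ".subckt cct0 v0 v1 v2 v3\n",
    "+ v4 v5 v6\n",
    "* useless comment\n",
    "+ v7 v8 v9\n",
    "+ va vb\n",
    "x00 v0 v1 x0 cct1\n",
    ".ends\n",
);

# Record the value of the scalar .. operator for every line.
my @seen;
for (@lines) {
    my $inrange = /^\.subckt/ .. /^\w/;
    push @seen, $inrange;
}

# Prints: [1] [2] [3] [4] [5] [6E0] []
print join( ' ', map {"[$_]"} @seen ), "\n";
```

The x00 line gets "6E0", so the test $inrange !~ /E/ excludes exactly that line, and the .ends line gets the empty string because it is outside the range.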
Re: regexp match up to but not including...
by gjb (Vicar) on Dec 18, 2003 at 16:19 UTC

    Look for zero-width positive look-ahead, (?=pattern), in the perldocs. You might also be interested in the s regex switch, which is described in the same document.
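One way this suggestion might be applied is sketched below, assuming the whole file has already been read into a single string. The helper name split_header is hypothetical, and the /m switch is an addition so that ^ can match at the start of each line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the .subckt header (up to, but not including,
# the first line that starts with a word character) and the rest of the text.
sub split_header {
    my ($text) = @_;
    # /s lets . match newlines; /m lets ^ match at every line start.
    # The look-ahead checks for the ending line without consuming it.
    if ( $text =~ /^(\.subckt.*?)(?=^\w)/ms ) {
        return ( $1, substr( $text, $+[0] ) );
    }
    return;
}

my $text = join '',
    ".subckt cct0 v0 v1 v2 v3\n",
    "+ v4 v5 v6\n",
    "* useless comment\n",
    "+ v7 v8 v9\n",
    "+ va vb\n",
    "x00 v0 v1 x0 cct1\n",
    ".ends\n";

my ( $header, $rest ) = split_header($text);
print "HEADER:\n$header";
print "REST:\n$rest";
```

Since the look-ahead is zero-width, the match ends just before the x00 line, which leaves that line available for the second operation.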

    Hope this helps, -gjb-

Re: regexp match up to but not including...
by injunjoel (Priest) on Dec 18, 2003 at 18:20 UTC
    Greetings all,
    If your file is small you could always read it in as a single string and then match across the newlines ("\n") by using the s modifier.
    Here is my example, sans relevant error checking. I made a plain text file with your original posted data for testing (that is what your_file refers to).
        #!/usr/bin/perl -w
        use strict;

        unless (open(F,'./your_file')){
            die "Unable to open file for reading $!";
        }else{
            my $file_str = join("",<F>);
            close F;
            if($file_str =~ m/^(\.subckt.*?)x00/s){
                my $match_str = $1;
                print $match_str."\n";
            }else{
                print "No match for you!";
            }
            exit;
        }

        __OUTPUT__
        .subckt cct0 v0 v1 v2 v3
        + v4 v5 v6
        * useless comment
        + v7 v8 v9
        + va vb

    A little explanation.
    The 'm/^(\.subckt.*?)x00/s' matches as if the string were a single line: with the 's' modifier, '.' also matches "\n", so the match can run across newlines. The '.*?' is a non-greedy match; without the '?', the '.*' would match as much as possible. This really only matters when the ending pattern (here 'x00') occurs more than once in the string and you want to control which occurrence the regexp stops at.
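A tiny sketch of the greedy versus non-greedy difference, using a made-up string in which the ending pattern appears twice:

```perl
use strict;
use warnings;

# Illustrative string (not from the original file): 'x00' occurs twice.
my $text = "a x00 b x00 c";

my ($lazy)   = $text =~ /^(.*?)x00/s;    # stops at the first x00
my ($greedy) = $text =~ /^(.*)x00/s;     # runs on to the last x00

print "non-greedy: '$lazy'\n";     # non-greedy: 'a '
print "greedy:     '$greedy'\n";   # greedy:     'a x00 b '
```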
    hope that helps.
    -injunjoel