eastcoastcoder has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a regex to catch the >>> of Python's interpreter, but ignore any line beginning with >>> if it is preceded or followed by a line beginning with >>.

This:

>>> i = q
Should become:
PYTHON_PROMPT i = q

but this:

>>> i = q
>> No!
should stay as is

I've tried:

s/(?<!^>>[^>])^>>>(?=[^>]*\n)(?!^>>[^>])/PYTHON_PROMPT/mg;
but it only partially worked.

Also, I'd prefer to avoid $1 if possible, for performance reasons.

Thanks!

janitored by ybiC: Minor format tweaks so that entire node isn't enclosed in <code> tags

Re: Using a regex to catch interpreters
by ikegami (Patriarch) on Jul 06, 2005 at 06:25 UTC

    I see three problems:

    1) The [^>] in (?=[^>]*\n) can match newlines.

    2) ">" isn't allowed anywhere on the ">>>" line (after the ">>>"). Is that intentional?

    3) The second negative lookahead is currently anchored to the character right after the ">>>", not to the start of the next line.
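Problem 1 can be seen in isolation with a minimal sketch. The sample text and the variable names are assumptions made for illustration, and the captures are used only to show what each character class consumes:

```perl
use strict;
use warnings;

# Sample input (assumed): a prompt line followed by a ">>" line.
my $text = ">>> i = q\n>> No!\n";

# [^>]* stops only at ">", so it runs across the newline
# and into the start of the next line.
my ($crosses_newline) = $text =~ /^>>>([^>]*)/;

# Excluding "\n" as well keeps the match on the prompt line.
my ($stays_on_line) = $text =~ /^>>>([^>\n]*)/;

# Make the newline visible when printing.
for my $shown ($crosses_newline, $stays_on_line) {
    (my $visible = $shown) =~ s/\n/\\n/g;
    print "[$visible]\n";
}
```

The first capture ends with the newline, while the second ends at the last character of the prompt line.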

    Fixes:

    s/
       (?<!^>>[^>])
       ^>>>
       (?=
          [^\n]*\n       # <-- removed > and added first \n
          (?!^>>[^>])    # <-- moved into (?=...)
       )
    /PYTHON_PROMPT/xmg;

    Updated

      Also, the above (like your original) won't work if anything appears after ">>" on the previous line (including spaces). For example,

      >> text
      >>> text

      gets changed to

      >> text
      PYTHON_PROMPT text

      Is that how it should be? Fixing this isn't easy, and it might be impossible without $1.

      Update: Well, the following doesn't use $1. However, I don't know about its efficiency (since I don't have another solution to compare it with). It works, though, which is more than I can say for my attempts that did use $1.

      {
         foreach ($_) {
            if (/\G (?= >>> )/xgc) {
               # Substitution suppressed by following ">>".
               redo if /\G >>> [^\n]* \n (?= >> (?!>) ) /xcg;

               # Do substitution.
               s/\G >>>/PYTHON_PROMPT/xcg;

               # Go process next line.
               /\G [^\n]* \n /xcg;
               redo;
            }

            # Substitution prevented by preceding ">>"
            /\G >> (?!>) [^\n]* \n >>> [^\n]* \n /xcg && redo;
            #      ^^^^^ optional due to statement order.

            # Go process next line.
            m/\G [^\n]* \n /xcg && redo;
         }

         print;
         print("==========\n");
      }
Re: Using a regex to catch interpreters
by anonymized user 468275 (Curate) on Jul 06, 2005 at 09:11 UTC
    How about reading ahead, which simplifies the required substitution (i.e. modify the previous line of input)? For example:

    #!/usr/bin/perl
    use strict;

    my ( $prev, $this ) = ( undef(), undef() );
    while( <> ) {
        $this = $_;
        /^>>[^>]/ or mySubst( \$prev );
        print $prev;
        $prev = $this;
    }
    # finally, process the last line of input which got left over
    mySubst( \$prev );
    print $prev;

    sub mySubst {
        my $sref = shift;
        $$sref =~ s/^>>>/PYTHON_PROMPT/;
    }
    Update: Having just seen the previous post: The point of reading ahead is to avoid reading the whole file into an array - also (ahem!) the read-ahead version here is one I took a few minutes to test with suitable input.

    One world, one people

Re: Using a regex to catch interpreters
by magnus (Pilgrim) on Jul 06, 2005 at 09:05 UTC
    Assuming you have your file in an array @test (for example purposes), you could do something pretty simple:
    my $end = @test;
    my $cnt = 0;
    until ($cnt == $end) {
        # The last line has no successor, so skip the look-ahead test there.
        if (($test[$cnt] =~ /^>>>\s/)
            && ($cnt + 1 == $end || $test[$cnt+1] !~ /^>>\s/)) {
            $test[$cnt] =~ s/^>>>/PYTHON_PROMPT/;
        }
        $cnt++;
    }
Re: Using a regex to catch interpreters
by Anonymous Monk on Jul 06, 2005 at 10:43 UTC
    The following assumes the entire text in a single scalar, and leaves lines starting with four >'s alone as well.
    s/^>>>(?!>)(?![^\n]*\n>>(?!>))/PYTHON_PROMPT/gm;
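A quick way to see what this substitution does and does not cover is to run it on the cases from the thread. This is only a sketch: the wrapper name `convert` and the variable names are hypothetical, and the inputs are the thread's examples written out with explicit newlines.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the single-scalar substitution above.
sub convert {
    my ($text) = @_;
    $text =~ s/^>>>(?!>)(?![^\n]*\n>>(?!>))/PYTHON_PROMPT/gm;
    return $text;
}

# A lone prompt line is converted.
my $plain = convert(">>> i = q\n");

# A prompt followed by a ">>" line is left alone.
my $followed = convert(">>> i = q\n>> No!\n");

# A prompt preceded by a ">>" line is still converted,
# because the pattern only looks ahead, never behind.
my $preceded = convert(">> text\n>>> text\n");

print $plain, $followed, $preceded;
```

So the lookahead handles the "followed by" half of the question, while the "preceded by" half would need something extra.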
    Also, I'd prefer to avoid $1 if possible, for performance reasons.
    That's a strange remark. You're willing to use regexes, and even Perl, but not $1, "for performance reasons". Apparently, performance is very important to you. In that case, you ought to use C - what you want to do isn't too hard in C.
      Clarification:
      >>> etc
      >>> etc
      >>> etc
      should be converted to
      PYTHON_PROMPT etc
      PYTHON_PROMPT etc
      PYTHON_PROMPT etc
      while
      >>> etc
      >>> etc
      >> etc
      should not. So, those read aheads / arrays / etc won't work, will they?

      Also, re not using $1 for performance: it's not so much that I mind for this regex, it's that I'm under the impression that once you use $1 once, perl needs to track it for every regex in the entire program, which I don't want to slow down. Is there a way to tell Perl to only turn on $1 here, but nowhere else?

        it's that I'm under the impression that once you use $1 once, perl needs to track it for every regex in the entire program...

        You're thinking of $&, $` and $'. They are always set by all regexps when they are used anywhere in the program.

        $1 .. $9 are always set by every regexp that succeeds. When a given regexp has no corresponding capture, they are set to undefined. No attention is paid to whether $1 .. $9 are used anywhere.
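The point about $1 being reset by every successful match can be checked with a small sketch. The sample string and variable names are assumptions made for illustration:

```perl
use strict;
use warnings;

my $text = "abc";   # sample input, assumed

# A successful match with one capture group sets $1.
$text =~ /(b)/ or die "expected a match";
my $after_capture = $1;

# A later successful match without any group leaves $1 undefined,
# whether or not the program ever reads it.
$text =~ /c/ or die "expected a match";
my $after_plain = defined $1 ? "defined" : "undef";

print "after /(b)/: $after_capture\n";
print "after /c/:   $after_plain\n";
```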

        Speaking of tests, here's some code I wrote to do basic testing:
        use Test::Simple tests => 7;

        sub cpi {
            # the converter goes here
        }

        $a = '>> no snake here
        >>> or here
        I agree';
        $b = '>> and neither should this
        >>> be understood snakily
        >>> because it is a nested quote
        >> yes I know it is not really named after a snake
        ';
        $c = '>>
        >>>no snake
        >>>no snake
        ';
        $d= '>>> still no snake seen
        >> right!
        ';
        $snakea = '>>> i = j
        ';
        $snakeb = '>>> i = j + 1
        >>> q is good code';
        $snakec = '>> does this work right?
        lets see:
        >>> i = test(func)
        ';

        ok (cpi($a) eq $a);
        ok (cpi($b) eq $b);
        ok (cpi($c) eq $c);
        ok (cpi($d) eq $d);
        ok (cpi($snakea) eq 'PYTHON_PROMPT_PROTECTED i = j
        ');
        ok (cpi($snakeb) eq 'PYTHON_PROMPT_PROTECTED i = j + 1
        PYTHON_PROMPT_PROTECTED q is good code');
        ok(cpi($snakec) eq '>> does this work right?
        lets see:
        PYTHON_PROMPT_PROTECTED i = test(func)
        ');
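One possible way to fill in the cpi stub is sketched below. It is not code from the thread: it assumes the clarified rule that a run of consecutive ">>>" lines is converted only when neither the line before the run nor the line after it starts with ">>", and it assumes the inputs are multi-line strings with each prompt at the start of a line. It also uses plain line splitting rather than a single substitution.

```perl
use strict;
use warnings;

# Hypothetical body for cpi(): work on runs of consecutive ">>>" lines.
sub cpi {
    my ($text) = @_;
    my @lines = split /^/m, $text;    # keep each line's trailing newline

    my $is_prompt = sub { $_[0] =~ /^>>>(?!>)/ };   # exactly three ">"
    my $is_quote  = sub { $_[0] =~ /^>>(?!>)/ };    # exactly two ">"

    my $i = 0;
    while ($i < @lines) {
        if (!$is_prompt->($lines[$i])) { $i++; next; }

        # Extend $j to the last line of this run of ">>>" lines.
        my $j = $i;
        $j++ while $j + 1 < @lines && $is_prompt->($lines[$j + 1]);

        my $quoted_before = $i > 0       && $is_quote->($lines[$i - 1]);
        my $quoted_after  = $j < $#lines && $is_quote->($lines[$j + 1]);

        # Convert the whole run only if no ">>" line touches it.
        unless ($quoted_before || $quoted_after) {
            s/^>>>/PYTHON_PROMPT_PROTECTED/ for @lines[$i .. $j];
        }
        $i = $j + 1;
    }
    return join '', @lines;
}

print cpi(">>> i = j + 1\n>>> q is good code\n");
print cpi(">>> etc\n>>> etc\n>> etc\n");
```

Treating the run as the unit is what separates this from the line-at-a-time replies: a ">>" line after the last ">>>" line suppresses the conversion for every line in the run, not just the adjacent one.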
      Please see my test cases below - your code failed 6 out of 7!