Using a regex to catch interpreters

eastcoastcoder has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a regex to catch the >>> of Python's interpreter, but ignore any lines beginning with >>> if preceded or followed by a line beginning with >>.

This:

>>> i = q

Should become:

PYTHON_PROMPT i = q

but this:

>>> i = q
>> No!

should stay as is

I've tried:

s/(?<!^>>[^>])^>>>(?=[^>]*\n)(?!^>>[^>])/PYTHON_PROMPT/mg;

but it only partially worked.

Also, I'd prefer to avoid $1 if possible, for performance reasons.

Thanks!

janitored by ybiC: Minor format tweaks so that entire node isn't enclosed in <code> tags


Replies are listed 'Best First'.
Re: Using a regex to catch interpreters by ikegami (Patriarch) on Jul 06, 2005 at 06:25 UTC
I see three problems:

1) The `[^>]*` in `(?=[^>]*\n)` can match newlines.
2) "`>`" isn't allowed anywhere on the "`>>>`" line (after the "`>>>`"). Is that intentional?
3) The second negative lookahead is currently anchored to the character right after the "`>>>`", not to the start of the next line.

Fixes:

    s/
       (?<!^>>[^>])
       ^>>>
       (?=
          [^\n]*\n       # <-- removed > and added first \n
          (?!^>>[^>])    # <-- moved into (?=...)
       )
    /PYTHON_PROMPT/xmg;

Updated.
Re^2: Using a regex to catch interpreters by ikegami (Patriarch) on Jul 06, 2005 at 06:52 UTC
Also, the above (like your original) won't work if anything appears after "`>>`" on the previous line (including spaces). For example,

    >> text
    >>> text

gets changed to

    >> text
    PYTHON_PROMPT text

Is that how it should be? Fixing this isn't easy, and it might be impossible without `$1`.

Update: Well, the following doesn't use `$1`. However, I don't know about its efficiency (since I don't have another solution with which to compare). It works, though, which is more than I can say for my attempts that did use `$1`.

    {
       foreach ($_) {
          if (/\G (?= >>> )/xgc) {
             # Substitution suppressed by following ">>".
             redo if /\G >>> [^\n]* \n (?= >> (?!>) ) /xcg;

             # Do substitution.
             s/\G >>>/PYTHON_PROMPT/xcg;

             # Go process next line.
             /\G [^\n]* \n /xcg;
             redo;
          }

          # Substitution prevented by preceding ">>"
          /\G >> (?!>) [^\n]* \n >>> [^\n]* \n /xcg && redo;
          # ^^^^^ optional due to statement order.

          # Go process next line.
          m/\G [^\n]* \n /xcg && redo;
       }

       print;
       print("==========\n");
    }
Re: Using a regex to catch interpreters by anonymized user 468275 (Curate) on Jul 06, 2005 at 09:11 UTC
How about reading ahead, so simplifying the required substitution (i.e. modify the previous line of input), e.g.:

    #!/usr/bin/perl
    use strict;

    my ( $prev, $this ) = ( undef(), undef() );
    while( <> ) {
        $this = $_;
        /^>>[^>]/ or mySubst( \$prev );
        print $prev;
        $prev = $this;
    }
    # finally, process the last line of input which got left over
    mySubst( \$prev );
    print $prev;

    sub mySubst {
        my $sref = shift;
        $$sref =~ s/^>>>/PYTHON_PROMPT/;
    }

Update: Having just seen the previous post: The point of reading ahead is to avoid reading the whole file into an array - also (ahem!) the read-ahead version here is one I took a few minutes to test with suitable input.

One world, one people
Re: Using a regex to catch interpreters by magnus (Pilgrim) on Jul 06, 2005 at 09:05 UTC
Assuming you have your file in an array `@test` (for example purposes), you could do something pretty simple:

    my $end = @test;
    my $cnt = 0;
    until ($cnt == $end) {
        if (($test[$cnt] =~ /^>>>\s/) && ($test[$cnt+1] !~ /^>>\s/)) {
            $test[$cnt] =~ s/^>>>/PYTHON_PROMPT/;
        }
        $cnt++;
    }
Re: Using a regex to catch interpreters by Anonymous Monk on Jul 06, 2005 at 10:43 UTC
The following assumes the entire text is in a single scalar, and leaves lines starting with four `>`'s alone as well.

    s/^>>>(?!>)(?![^\n]*\n>>(?!>))/PYTHON_PROMPT/gm;

"Also, I'd prefer to avoid $1 if possible, for performance reasons."

That's a strange remark. You're willing to use regexes, and even Perl, but not $1, "for performance reasons". Apparently, performance is very important to you. In that case, you ought to use C - what you want to do isn't too hard in C.
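To see what this single-scalar substitution does and does not cover, here is a small sketch. It assumes the character class is meant to repeat (`[^\n]*`), and the wrapper name `mark_prompts` is made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution, so it can be tried on samples.
sub mark_prompts {
    my ($text) = @_;
    # ^>>>               three '>' at the start of a line (because of /m)
    # (?!>)              ...but not four
    # (?![^\n]*\n>>(?!>)) the next line must not start with exactly two '>'
    $text =~ s/^>>>(?!>)(?![^\n]*\n>>(?!>))/PYTHON_PROMPT/gm;
    return $text;
}

print mark_prompts(">>> i = q\n");           # replaced
print mark_prompts(">>> i = q\n>> No!\n");   # left alone: ">>" follows
print mark_prompts(">> text\n>>> text\n");   # still replaced: nothing looks backwards
```

The third call shows that a preceding `>>` line is not checked at all, since the pattern contains no lookbehind. In a run of several `>>>` lines followed by `>>`, only the last `>>>` line sees the `>>` and is protected.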
Re^2: Using a regex to catch interpreters by eastcoastcoder (Sexton) on Jul 06, 2005 at 14:53 UTC
Clarification:

    >>> etc
    >>> etc
    >>> etc

should be converted to

    PYTHON_PROMPT etc
    PYTHON_PROMPT etc
    PYTHON_PROMPT etc

while

    >>> etc
    >>> etc
    >> etc

should not. So, those read-aheads / arrays / etc. won't work, will they?

Also, re not using $1 for performance: it's not so much that I mind for this regex, it's that I'm under the impression that once you use $1 once, perl needs to track it for every regex in the entire program... which I don't want to slow down. Is there a way to tell Perl to only turn on $1 here, but nowhere else?
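One way to read this clarification is that a whole run of consecutive `>>>` lines is converted or left alone together. The sketch below works under that assumption: a run is skipped when the line directly before or after it starts with exactly two `>`. The name `convert_prompts` and the treatment of `>>>>` as "not a prompt" are illustrative choices, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: convert every line of a run of ">>>" lines, unless the
# run touches a line that begins with exactly ">>" (assumed reading of the rule).
sub convert_prompts {
    my ($text) = @_;
    my @lines = split /^/m, $text;    # split into lines, keeping line endings

    my $is_prompt = sub { $_[0] =~ /^>>>(?!>)/ };   # exactly three '>'
    my $is_quote  = sub { $_[0] =~ /^>>(?!>)/ };    # exactly two '>'

    my $i = 0;
    while ($i < @lines) {
        if (!$is_prompt->($lines[$i])) { $i++; next; }

        # Extend $j to the last line of this run of ">>>" lines.
        my $j = $i;
        $j++ while $j + 1 < @lines && $is_prompt->($lines[$j + 1]);

        my $quoted_before = $i > 0       && $is_quote->($lines[$i - 1]);
        my $quoted_after  = $j < $#lines && $is_quote->($lines[$j + 1]);

        # foreach aliases the slice elements, so this edits @lines in place.
        unless ($quoted_before || $quoted_after) {
            s/^>>>/PYTHON_PROMPT/ for @lines[$i .. $j];
        }
        $i = $j + 1;
    }
    return join '', @lines;
}

print convert_prompts(">>> etc\n>>> etc\n>>> etc\n");   # all three converted
print convert_prompts(">>> etc\n>>> etc\n>> etc\n");    # left unchanged
```

This version needs the whole text in memory, but it uses no capture variables, only lookaheads and plain substitutions.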
Re^3: Using a regex to catch interpreters by ikegami (Patriarch) on Jul 06, 2005 at 18:05 UTC
"it's that I'm under the impression that once you use $1 once, perl needs to track it for every regex in the entire program..."

You're thinking of `$&`, $` and `$'`. Once any of them is used anywhere in the program, they are set by every regexp. `$1` .. `$9` are always set by every regexp that succeeds. When a given regexp has no corresponding capture, they are set to `undef`. No attention is paid to whether `$1` .. `$9` are used anywhere.
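A small illustration of the point about `$1`, as a sketch: after a successful match that has no capturing parentheses, `$1` is simply `undef`. The sample string is made up.

```perl
use strict;
use warnings;

my $line = ">>> i = q\n";

# Capturing group: $1 holds whatever the first pair of parentheses matched.
if ($line =~ /^(>>>)\s/) {
    print "with capture: ", (defined $1 ? "'$1'" : "undef"), "\n";
}

# Non-capturing group (?:...): the match succeeds, but there is no first
# capture, so $1 is undef after this match.
if ($line =~ /^(?:>>>)\s/) {
    print "without capture: ", (defined $1 ? "'$1'" : "undef"), "\n";
}
```

Using `(?:...)` for grouping is therefore enough when the group's text is not needed afterwards.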
Re^3: Using a regex to catch interpreters by eastcoastcoder (Sexton) on Jul 06, 2005 at 17:34 UTC
Speaking of tests, here's some code I wrote to do basic testing:

    use Test::Simple tests => 7;

    sub cpi {
        # the converter goes here
    }

    $a = '>> no snake here >>> or here I agree';
    $b = '>> and neither should this >>> be understood snakily >>> because it is a nested quote >> yes I know it is not really named after a snake ';
    $c = '>> >>>no snake >>>no snake ';
    $d= '>>> still no snake seen >> right! ';
    $snakea = '>>> i = j ';
    $snakeb = '>>> i = j + 1 >>> q is good code';
    $snakec = '>> does this work right? lets see: >>> i = test(func) ';

    ok (cpi($a) eq $a);
    ok (cpi($b) eq $b);
    ok (cpi($c) eq $c);
    ok (cpi($d) eq $d);
    ok (cpi($snakea) eq 'PYTHON_PROMPT_PROTECTED i = j ');
    ok (cpi($snakeb) eq 'PYTHON_PROMPT_PROTECTED i = j + 1 PYTHON_PROMPT_PROTECTED q is good code');
    ok(cpi($snakec) eq '>> does this work right? lets see: PYTHON_PROMPT_PROTECTED i = test(func) ');
Re^2: Using a regex to catch interpreters by eastcoastcoder (Sexton) on Jul 07, 2005 at 17:30 UTC
Please see my test cases below - your code failed 6 out of 7!