replacing code with regex

raflach has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I have a text file with the following in it:

\t\tmy($page)=&create_page(isitvalid("templatefile.html"),\%replace,\%lists,@unhide);

NOTE: the \t's represent actual tabs in the file.

I also have a script with the following regex:

      s/^\W* # we may have some white space at start of line
      my\W*? # all variables will use my
      \(? # variable may or may not be a list
          \W*? # some people space out the parens and some don't
          \$page # assuming for now that page is always the variable
          \W*? # more possible space around parenthesis
      \)? # variable may or may not be a list
      \W*? # whitespace is optional
      = # we have to assign the variable
      \W*? # more optional whitespace
      \&? # function may or may not be explicitly contexted
      create_page\W*? # the name of the function followed by optional space
      \( # parenthesis begins the parameter list
          \W*? # we might have whitespace before first parameter
          ([^,]*?) # BROKEN: this should collect everything up to the next comma in variable number 1
          \W*? # this shouldn't be necessary since whitespace should have been slurped on previous line but this shouldn't hurt either
      , # Got to have the separator
          \W*? # more optional whitespace
          ([^,]*?) # NOT BROKEN: this grabs everything up to the second comma (of course we don't have parens and dblquotes in second param)
          \W*? # this shouldn't be necessary since whitespace should have been slurped on previous line
      , # Got to have the separator
          \W*? # more optional whitespace
          ([^,]*?) # NOT BROKEN: third one gets gotten fine as well
          \W*? # again the unnecessary whitespace collector
      , # Again the separator
          \W*? # Again the optional whitespace
          ([^)]*?) # NOT BROKEN: here's our final collection point
          \W*? # Again with the unnecessary whitespace
      \) # our parameter list has come to a close
      \W*? # whitespace might separate from closing punctuation though it seems unlikely
      ; # closing punctuation ends the statement; the replace doesn't matter for the question
      /my \$renderer = new HRsmart::Lightning::Render ( template => $1,\n\t\tloops => $3,\n\t\treplace => $2,\n\t\tfinals => $4 );\n\$renderer->customize_header\( \$company_id \);\nmy \$page = \$renderer->render;\n\$page =~ s\/~%.{0,20}%~\/\/g;\n/gx;

As you can see, I'm trying to replace the old functional way of doing things with a new semi-OO way.

The problem is that instead of getting 'isitvalid("templatefile.html")' for $1, which is what I was expecting, the match cuts off before the second double quote, so I get 'isitvalid("templatefile.html'.

Can anyone point me in the right direction?

BTW there's no chance of spaces between the doublequotes, so that's not an issue.

UPDATED: Under-commented code replaced with Over-commented code
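
For anyone who wants to reproduce the symptom, here is a reduced sketch rather than the full script. It keeps only the part of the pattern around the first parameter and assumes the sample line is held in a variable called $line, a name chosen here purely for illustration.

```perl
use strict;
use warnings;

# The sample line from the question, with two real tabs in front.
# $line is a name chosen for this sketch.
my $line = "\t\t"
    . 'my($page)=&create_page(isitvalid("templatefile.html"),\%replace,\%lists,@unhide);';

# Only the part of the pattern around the first parameter is kept.
my ($first) = $line =~ m/
    create_page\W*?   # the function name followed by optional space
    \(                # parenthesis begins the parameter list
    \W*?              # we might have whitespace before first parameter
    ([^,]*?)          # should collect everything up to the next comma
    \W*?              # the extra whitespace collector
    ,                 # the separator
/x;

print "$first\n";    # prints: isitvalid("templatefile.html
```

The capture stops one `")` short of the comma, which matches the behaviour described in the question.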


Replies are listed 'Best First'.
Re: replacing code with regex by reasonablekeith (Deacon) on May 11, 2005 at 16:08 UTC
I would suggest you make use of the x modifier, break your regex out over multiple lines, and put some comments in. If you've not figured it out for yourself by then (and I bet you will have), you'll get a much better response from people here.

HTH, Rob
--- my name's not Keith, and I'm not reasonable.
Re^2: replacing code with regex by raflach (Pilgrim) on May 11, 2005 at 17:38 UTC
You overestimate me! Highly commented code still leaves me stumped. Anyone want to jump in here?
Re^3: replacing code with regex by reasonablekeith (Deacon) on May 11, 2005 at 18:34 UTC
As you followed my advice, I could hardly not try and help! Anyway, changing

      (\W*?) # this shouldn't be necessary since whitespace should have been slurped on previous line but this shouldn't hurt either

... to ...

      (\s*?) # this shouldn't be necessary since whitespace should have been slurped on previous line but this shouldn't hurt either

fixes it. Hurrah. $1 printed 'isitvalid("templatefile.html")' when I tested it.

Anyway, \s is a more standard way of matching whitespace, so I guess you could get mileage out of changing all your instances of \W*?, which are probably matching more than you expect. Case in point:

      my $test = '{}[]£$%';
      my $match = ($test =~ m/(\W*)/)[0];
      print $match;
      __OUTPUT__
      {}[]£$%

--- my name's not Keith, and I'm not reasonable.
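
The difference between the two character classes can be checked in isolation. The sketch below is a reduced comparison, not the full substitution: $call holds only the function call from the sample line, and the variable names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical reduced input: just the call from the question's sample line.
my $call = 'create_page(isitvalid("templatefile.html"),\%replace,\%lists,@unhide);';

# Original form: \W*? may absorb any non-word characters,
# including the closing quote and parenthesis before the comma.
my ($with_W) = $call =~ m/create_page\(\W*?([^,]*?)\W*?,/;

# Suggested form: \s*? may only absorb whitespace,
# so the capture has to run all the way up to the comma.
my ($with_s) = $call =~ m/create_page\(\s*?([^,]*?)\s*?,/;

print "with \\W: $with_W\n";   # with \W: isitvalid("templatefile.html
print "with \\s: $with_s\n";   # with \s: isitvalid("templatefile.html")
```

Because the capture is lazy, it ends at the first position from which the rest of the pattern can match. With \W*? that position is just before `")`, since both characters are non-word characters; with \s*? the only such position is directly before the comma.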
Re: replacing code with regex by Joost (Canon) on May 11, 2005 at 16:09 UTC
Is there a particular reason you don't just write the create_page and isitvalid subroutines that do all the OO stuff and then eval() the text file? That seems a lot more robust than this solution, except of course if you can't trust the input.

Update: never mind, I misread your question.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: replacing code with regex by Animator (Hermit) on May 11, 2005 at 18:43 UTC
Based on my first look at it I can already give these hints (I'm still looking at it, though).

This first point follows on from reasonablekeith's: you use \W*? to match optional whitespace, but \W represents a non-word character, whereas \s represents whitespace. Also, do you really want to make it non-greedy? I would suggest replacing \W*? with \s*, and I would not give it the comment "optional whitespace" (but that's just me).

What I suspect is wrong is this part of the regex: `\W*?([^,]*?)\W*?,`

You are making the `[^,]` optional and non-greedy. Maybe \W*? eats all the non-comma symbols? Something like `\s*([^,]+),\s*` might be better.

Verifying by capturing \W*? shows that this is the problem (as in, $2 holds '")').

And if you post a message with a regex, you might want to give both the input and the output you expect. You described it, but giving the actual strings makes it easier for everyone to verify.

Update: I did some further investigating, and here is a regex that works (or at least as far as I can tell):

      s/^ \s*
        my \s* \(?\s*
        \$page      # page is the variable
        \s*\)? \s* = \s*
        \&?         # function may or may not be explicitly contexted
        create_page # function name
        \s* \(\s*   # Begin param
        ([^,]+?),   # 1st param
        \s*
        ([^,]+?),   # 2nd param
        \s*
        ([^,]+?),   # 3rd param
        \s*
        ([^,]+?)    # 4th param
        \s*\)       # End param
        ;
      /your_replace_string/x;

Also note that the /g in your regex is useless, since you have the ^, which will only match at the start of the string (unless of course you have /m too, but that's not the case in your example).
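
To check a pattern of this shape against the sample line, the four captures can be printed before any replacement text is involved. This is a sketch under assumptions: the literal parentheses are written as `\(` and `\)`, the pattern is used as a plain match instead of a substitution, and $line and @params are names invented for the example.

```perl
use strict;
use warnings;

# The sample line from the question, with two real tabs in front.
my $line = "\t\t"
    . 'my($page)=&create_page(isitvalid("templatefile.html"),\%replace,\%lists,@unhide);';

# In list context the match returns the four captured parameters.
my @params = $line =~ m/^ \s*
    my \s* \(? \s*
    \$page              # page is the variable
    \s* \)? \s* = \s*
    \&?                 # optional ampersand on the call
    create_page         # function name
    \s* \( \s*          # begin parameter list
    ([^,]+?) , \s*      # 1st param
    ([^,]+?) , \s*      # 2nd param
    ([^,]+?) , \s*      # 3rd param
    ([^,]+?)            # 4th param
    \s* \)              # end parameter list
    \s* ;
/x;

print "$_\n" for @params;
# isitvalid("templatefile.html")
# \%replace
# \%lists
# @unhide
```

One thing to keep in mind with this approach: `[^,]` also matches spaces, so if a real file ever had whitespace directly before a comma, that whitespace would end up inside the first three captures.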

