denap has asked for the wisdom of the Perl Monks concerning the following question:

community... a fellow monk was kind enough to offer me a solution yesterday to a problem I have remapping a large number of argument lists, i.e.:
# my %translate = (
#     funca => [ qr/funca\( ([^,]+) , ([^,]+) ,([^,]+) ,([^)]+) \)/x,
#                '"func($1,$2,XXX,YYY,$4)"' ],
#     funcb => [ qr/funcb\( ([^,]+) , ([^,]+) ,([^,]+) ,([^)]+) \)/x,
#                '"func($1,$2,,,$4)"' ],
# );
I've since discovered that the arg lists in question *might* themselves contain function calls. i.e.
funca(1,2,funcH(A,B,C),3)
I cannot figure out how to change the regex so that it gathers funcH(A,B,C) as a single arg. Unfortunately, these embedded functions might occur in any arg position. thoughts? thanks.

Replies are listed 'Best First'.
Re: regex assistance for parsing arg list
by ihb (Deacon) on Feb 04, 2003 at 19:14 UTC

    Having a function call in the argument list is not your only problem. An argument can be any arbitrary expression, and many expressions contain commas of their own. The simplest is probably just a string: funca("this is just one argument, I think"). But there are many other common argument expressions, like anonymous arrays or hashes, method calls, and map/grep, which doesn't use "regular" function syntax and has a block that may (and often does) contain a comma in one form or another, which leads me to mention the => operator. Etc etc. It's up to you to decide how detailed you want to be. You'll quite likely end up with a little parser, far beyond simple one-line regexes. For that I recommend using Parse::RecDescent, or at least looking into it to see if it fits you. Parse::RecDescent provides means to extract code blocks and several quote constructs, including qr//, qq!!, etc.

    If you decide to stay with regexes or a hand-rolled parser, you should look into recursive regexes, see the perlre manpage.

    Another approach would be to first try to find something that looks like a function call, by scanning for function names and then just grabbing everything inside the parentheses. You'd do this by "balanced matching", i.e. you match all of "(1,(2,3),4)". Again, look in perlre for an example of this. But you need to handle strings specifically, since they might contain any kind of brackets. After that you can try to parse the argument elements. If that fails, handle those cases specifically. Then you'll at least be aware that there are "strange" function calls.

    This method is far from perfect, but perhaps it's good enough for you.
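
    A minimal sketch of the approach described above, under two assumptions: arguments never contain quoted strings with commas or brackets, and Perl 5.10 or later is available (the `(?1)` recursion and the possessive `++` are newer than the `(??{ })` idiom perlre documented when this thread was written). The helper names `call_args` and `split_top_level_args` are hypothetical, made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # (?1) recursion and possessive quantifiers need Perl 5.10+

# Hypothetical helper: split an argument string on commas that sit at
# parenthesis depth 0, so nested calls stay in one piece.
# Assumption: no quoted strings containing commas or parentheses.
sub split_top_level_args {
    my ($text) = @_;
    my @args  = ('');
    my $depth = 0;
    for my $ch ( split //, $text ) {
        if ( $ch eq ',' && $depth == 0 ) {
            push @args, '';    # a top-level comma starts the next argument
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        $args[-1] .= $ch;
    }
    return @args;
}

# Hypothetical helper: find the first call to $name in $code, grab its
# balanced parentheses, and return the top-level arguments.
# Returns an empty list when no balanced call is found.
sub call_args {
    my ( $name, $code ) = @_;
    # Group 1 matches "(", then any mix of non-paren text and nested
    # balanced groups (via the recursive (?1)), then ")".
    $code =~ /\b\Q$name\E(\((?:[^()]++|(?1))*\))/ or return;
    my $inner = substr( $1, 1, -1 );    # drop the outer parentheses
    return split_top_level_args($inner);
}

my @args = call_args( 'funca', 'x = funca(1,2,funcH(A,B,C),3);' );
print join( ' | ', @args ), "\n";    # prints: 1 | 2 | funcH(A,B,C) | 3
```

    Splitting with a depth counter rather than a second regex keeps the "parse the argument elements" step easy to extend, for example to skip over quoted strings once those need handling.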

    Hope I've helped,
    ihb
      yes, you've helped, thanks for that. I'll look at the references and see where I get. fortunately for me I *know* that the most complex embedded func will be of the form func(1,2,func(A,B),3). No embedded strings with commas, brackets etc.
Re: regex assistance for parsing arg list
by Enlil (Parson) on Feb 04, 2003 at 19:26 UTC
    As with most regular expressions, the following assumes a whole lot. For example, that there are no further embedded functions inside the functions that occur inside the first function. Also that exactly 4 arguments (parameters?) are passed to the first function. Lastly, that there are no spaces between an argument and the comma. You might have to modify this to allow for spacing (adding \s in places and whatnot). Anyhow, here is some code that might work for your purpose:
    use strict;
    use warnings;

    while ( <DATA> ) {
        my $string = $_;
        if ( $string =~ /func[A-Za-z]\(
                 (func[A-Za-z]\([^\)]+\)|[^,]+)   # looks for a function and if not found then
                ,(func[A-Za-z]\([^\)]+\)|[^,]+)   # grab everything to the comma
                ,(func[A-Za-z]\([^\)]+\)|[^,]+)
                ,(func[A-Za-z]\([^\)]+\)|[^,]+)\)/x ) {
            print "FOR STRING: $string",
                  '$1 = ', $1, $/,
                  '$2 = ', $2, $/,
                  '$3 = ', $3, $/,
                  '$4 = ', $4, $/, $/;
        }
    }
    __DATA__
    funca(1,2,3,4)
    funca(funcH(A,B,C),2,3,4)
    funca(1,funcH(A,B,C),3,4)
    funca(1,2,funcH(A,B,C),4)
    funca(1,2,3,funcH(A,B,C))
    Updated: Removed some of the ugliness from the code formatting.

    -enlil

      But what happens when you have a function call as funcH's argument? You've just nested the pattern one level, so the outer call can take only primitive function calls as arguments. You could easily turn this pattern into a recursive one, though. Check out the (??{}) assertion in perlre and it should be pretty trivial.
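
      One way that recursive variant might look, as a sketch rather than a tested drop-in. It keeps the assumptions of the pattern above (exactly four top-level arguments, function names of the form func plus one letter, no whitespace, no quoted strings) and adds the assumption that nested calls take one or more comma-separated arguments. The names `$ARG_RE` and `four_args` are hypothetical.

```perl
use strict;
use warnings;

# $ARG_RE matches one argument: either a nested call whose arguments are
# themselves $ARG_RE, or a plain token with no commas or parentheses.
# (??{ $ARG_RE }) is evaluated at match time, which is what lets the
# pattern refer to itself. A package variable is used so the deferred
# code block can always see it.
our $ARG_RE;
$ARG_RE = qr{
    (?:
        func[A-Za-z] \( (??{ $ARG_RE }) (?: , (??{ $ARG_RE }) )* \)   # nested call
      |
        [^,()]+                                                       # plain argument
    )
}x;

# Hypothetical helper: return the four top-level arguments of the first
# matching call, or an empty list if the string does not fit the pattern.
sub four_args {
    my ($string) = @_;
    return $string =~ /func[A-Za-z]\(($ARG_RE),($ARG_RE),($ARG_RE),($ARG_RE)\)/;
}

my @args = four_args('funca(1,funcH(A,funcZ(x,y),C),3,4)');
print join( ' | ', @args ), "\n";    # prints: 1 | funcH(A,funcZ(x,y),C) | 3 | 4
```
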

      Cheers,
      ihb