PerlMonks  

Re^5: What perl operations will consume C stack space?

by hv (Prior)
on Feb 27, 2006 at 15:33 UTC (id://533039)


in reply to Re^4: What perl operations will consume C stack space?
in thread What perl operations will consume C stack space?

Corion answers your second point. On the first point, refactoring to /(ab+|a)+/ reduces stack usage but does not eliminate it: for me, "a" x $n dumps core with /(ab*)+/ at n=10080 and with /(ab+|a)+/ at n=20157, so it appears to save almost exactly half of the stack usage.
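To illustrate why the failure point scales with the length of the string, here is a toy model in C. It is not perl's engine (S_regmatch() is far more involved); it is a minimal backtracking matcher for (ab*)+ that enters each iteration of the quantified group through a nested C call, so the maximum recursion depth equals the number of group iterations. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy backtracking matcher for (ab*)+ that must consume the whole string.
 * NOT perl's engine: it only models the fact that each iteration of the
 * quantified group is entered through a nested C call. */

static size_t max_depth; /* deepest recursion level seen so far */

/* Try to match one or more iterations of (ab*) starting at s[pos].
 * Returns 1 if the rest of the string can be consumed. */
static int match_group(const char *s, size_t pos, size_t depth)
{
    if (depth > max_depth)
        max_depth = depth;
    if (s[pos] != 'a')            /* each iteration must start with 'a' */
        return 0;
    size_t end = pos + 1;
    while (s[end] == 'b')         /* greedy b* */
        end++;
    /* Backtrack over b*: longest run first, then shorter ones. */
    for (size_t stop = end; stop > pos; stop--) {
        if (s[stop] == '\0')      /* whole string consumed: success */
            return 1;
        if (match_group(s, stop, depth + 1))  /* next iteration: deeper */
            return 1;
    }
    return 0;
}

/* Returns 1 if s matches (ab*)+ in full; stores the maximum recursion
 * depth reached in *depth_out when depth_out is not NULL. */
int match_ab_star_plus(const char *s, size_t *depth_out)
{
    assert(s != NULL);
    max_depth = 0;
    int ok = match_group(s, 0, 1);
    if (depth_out)
        *depth_out = max_depth;
    return ok;
}
```

For a string of n "a" characters the depth is n, so with a fixed cost per level the C stack runs out at a fixed n, which is the pattern in the numbers above. The model says nothing about why /(ab+|a)+/ costs half as much; that depends on the real engine's frame layout.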

As TimToady mentioned, anything that quantifies "a compound submatch of varying length" will trigger it. (In fact even "compound" does not seem required, as /(a+?)+/ attests.)

Hugo

Re^6: What perl operations will consume C stack space?
by BrowserUk (Patriarch) on Feb 27, 2006 at 17:22 UTC

    On my system using 5.8.6, /(ab*){$n}/ dumps core with $n == 21166, whereas /(ab|a){$n}/ completes successfully for all values of $n up to the repetition limit of 32766. If I drop the stack reservation to 8 MB (similar to the default on Linux?), then I get a corresponding breakpoint of 10582.

    That seems to indicate that, on my system, the regex engine requires 792 bytes of stack for each repetition. That seems a lot of state to preserve on the stack, but I know nothing about how the regex engine is implemented, so perhaps it isn't.
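    The 792-byte figure is the stack reservation divided by the breakpoint. Worked out for the 8 MB run, taking 8 MB as 8 × 2^20 bytes (an assumption about the units used):

```latex
\[
\frac{8 \times 2^{20}\ \text{bytes}}{10582\ \text{repetitions}}
  = \frac{8388608}{10582}
  \approx 792.7\ \text{bytes per repetition}
\]
```

    The default-reservation breakpoint of 21166 is almost exactly 2 × 10582, consistent with that reservation being twice as large, although its size is not stated here.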

    It does make me wonder whether repetition counts, at least in these fairly simple cases, couldn't be fulfilled by a tail-recursive routine to alleviate the stack growth.
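    For reference, a tail-recursive routine makes its recursive call as its very last action, so the caller's frame is no longer needed and the call can become a jump. C compilers may, but are not required to, do this themselves. A minimal sketch of the hand transformation for simply counting repetitions of a single character (hypothetical function names):

```c
#include <assert.h>
#include <stddef.h>

/* Count how many times c repeats at the start of s, up to max.
 * Tail-recursive form: the recursive call is the last action, but a C
 * compiler is not obliged to eliminate it, so very long runs can still
 * exhaust the C stack. */
size_t count_reps_rec(const char *s, char c, size_t max, size_t acc)
{
    if (acc == max || s[acc] != c)
        return acc;
    return count_reps_rec(s, c, max, acc + 1);  /* tail call */
}

/* The same routine with the tail call rewritten as a loop:
 * constant stack usage whatever the repetition count. */
size_t count_reps_iter(const char *s, char c, size_t max)
{
    assert(s != NULL);
    size_t acc = 0;
    while (acc < max && s[acc] == c)
        acc++;
    return acc;
}
```

    This only works because nothing about earlier iterations has to be remembered; a backtracking matcher that may need to return to an earlier iteration cannot discard that state so simply.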

    If not, isn't there some scope for putting a check of the form die 'Not enough stack' if reps > stacksize / 792?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      The intention is to remove the C-stack recursion altogether and use perl's dynamic stacks instead. But that involves quite major surgery to the regexp engine, and I don't know when it is likely to happen.
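      To illustrate the general technique (not the design planned for perl's engine), here is a toy matcher for (ab*)+ that keeps its backtracking points on a growable heap array instead of in nested C frames, so its depth is limited by available memory rather than by the C stack. Names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Growable stack of string positions, allocated on the heap. */
typedef struct {
    size_t *items;
    size_t len, cap;
} PosStack;

/* Push pos, doubling the capacity when full. Returns 0 on OOM. */
static int push(PosStack *st, size_t pos)
{
    if (st->len == st->cap) {
        size_t ncap = st->cap ? st->cap * 2 : 64;
        size_t *p = realloc(st->items, ncap * sizeof *p);
        if (!p)
            return 0;
        st->items = p;
        st->cap = ncap;
    }
    st->items[st->len++] = pos;
    return 1;
}

/* Match the whole string against (ab*)+ without recursion. Each stack
 * entry is a position where a new iteration of the group may start.
 * Returns 1 on match, 0 on no match, -1 on allocation failure. */
int match_ab_star_plus_iter(const char *s)
{
    assert(s != NULL);
    PosStack st = { NULL, 0, 0 };
    int result = 0;
    if (!push(&st, 0))
        return -1;
    while (st.len > 0) {
        size_t pos = st.items[--st.len];   /* pop a candidate start */
        if (s[pos] != 'a')
            continue;                       /* this alternative fails */
        size_t end = pos + 1;
        while (s[end] == 'b')
            end++;
        /* Push shorter b* choices first so the longest (greedy) one is
         * popped and tried first, mirroring backtracking order. */
        for (size_t stop = pos + 1; stop <= end; stop++) {
            if (s[stop] == '\0') { result = 1; goto done; }
            if (!push(&st, stop)) { result = -1; goto done; }
        }
    }
done:
    free(st.items);
    return result;
}
```

      In a real engine each saved entry would also carry captures, counters and the pattern position, which is part of why the conversion is major surgery.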

      It does make me wonder whether repetition counts, at least in these fairly simple cases, couldn't be fulfilled by a tail-recursive routine to alleviate the stack growth.

      I don't know how you'd implement it to be tail recursive, but feel free to have a go. I suspect you'd need a quite different matching algorithm, in which case you'd probably end up needing rather more surgery than the current plan.

      If not, isn't there some scope for putting a check of the form die 'Not enough stack' if reps > stacksize / 792?

      As far as I know, the stack size isn't available within the perl process at the moment (nor, more relevantly, is the current free stack space), and the cost per iteration may go up or down depending on the build. If those numbers can be made available then yes, it would be a good idea to put a check in, probably by treating REG_INFTY as min(32766, freestack/stackcost).
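      The proposed check is a clamp on the repetition limit. A sketch, where free_stack and stack_cost are hypothetical inputs (perl does not currently expose them) and 32766 is the limit quoted earlier in the thread:

```c
#include <assert.h>
#include <stddef.h>

#define COMPILED_REG_INFTY 32766  /* repetition limit quoted in the thread */

/* Hypothetical: clamp the repetition limit to what the remaining C stack
 * can hold, given an assumed-known per-iteration stack cost in bytes,
 * i.e. min(32766, free_stack / stack_cost). */
size_t effective_reg_infty(size_t free_stack, size_t stack_cost)
{
    assert(stack_cost > 0);
    size_t by_stack = free_stack / stack_cost;
    return by_stack < COMPILED_REG_INFTY ? by_stack : COMPILED_REG_INFTY;
}
```

      With 8 MB and 792 bytes this gives 10591, close to the observed breakpoint of 10582; the difference comes from rounding the measured cost (about 792.7 bytes) down to 792.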

      Hugo

        Okay, I was just thinking out loud. You know I won't be offering any patches to the regex engine any time soon :)

        It's quite easy to find out the base address and extent of the stack segment on x86, which, combined with the current value of SP, gets you part of the equation. That's probably not true on all platforms, though.
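        As a rough POSIX analogue (an assumption, not the x86 method described above), a program can read the main thread's stack limit with getrlimit(RLIMIT_STACK) and estimate usage by comparing the address of a local recorded in an outer frame with one in the current frame. Converting unrelated pointers to integers and subtracting them is outside strict ISO C, and the sketch assumes a downward-growing stack, as on x86. Helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Approximate stack base: address of a local in an outer frame,
 * recorded once (e.g. at the top of main). */
static uintptr_t stack_base_approx;

void record_stack_base(const void *outer_local)
{
    stack_base_approx = (uintptr_t)outer_local;
}

/* Bytes of stack used between the recorded base and this frame,
 * assuming the stack grows downward. */
size_t stack_used_approx(void)
{
    volatile char here;
    uintptr_t sp = (uintptr_t)&here;
    assert(stack_base_approx != 0 && "call record_stack_base() first");
    return sp < stack_base_approx ? (size_t)(stack_base_approx - sp) : 0;
}

/* Approximate free stack: the soft RLIMIT_STACK minus what is used.
 * Returns SIZE_MAX if the limit is unlimited or cannot be read. */
size_t stack_free_approx(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return SIZE_MAX;
    size_t used = stack_used_approx();
    return used < rl.rlim_cur ? (size_t)rl.rlim_cur - used : 0;
}
```

        This ignores the stack consumed before the base was recorded (environment, argv, C runtime startup), so it overestimates the true free space.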

        For simple, self-recursive routines I've calculated the stack requirement of the routine at runtime by subtracting the address of the first auto from that of the last, plus a fudge factor for parameters and the return address, but it means putting all the autos at the top of the routine. Trying to apply that technique to mutually recursive functions gets tricky, and for functions the size and complexity of S_regmatch() it's a non-starter.
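        A variant of that idea for a simple self-recursive function: record the address of the same local at two consecutive recursion levels, and the distance approximates one whole frame (locals, saved registers, return address and arguments), so no separate fudge factor is needed. Comparing addresses of distinct objects is not guaranteed by ISO C and the result depends on compiler and optimization level; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record the address of a local at recursion levels 0 and 1. */
static void probe(int level, uintptr_t addrs[2])
{
    volatile char local[32];           /* stands in for the routine's autos */
    local[0] = (char)level;
    addrs[level] = (uintptr_t)&local[0];
    if (level == 0)
        probe(1, addrs);
    local[1] = local[0];               /* work after the call: not a tail call */
}

/* Approximate bytes of C stack per recursion level of probe(). The
 * absolute difference is taken so the sketch does not depend on the
 * direction of stack growth. */
size_t frame_size_estimate(void)
{
    uintptr_t addrs[2] = { 0, 0 };
    probe(0, addrs);
    assert(addrs[0] != 0 && addrs[1] != 0);
    return addrs[0] > addrs[1] ? (size_t)(addrs[0] - addrs[1])
                               : (size_t)(addrs[1] - addrs[0]);
}
```

        For a routine like S_regmatch(), whose many code paths use different locals and call each other, a single probe like this says little, which is the difficulty described above.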

        Every time I look at subs that are basically one huge switch statement, I wonder if they couldn't be refactored to use the GCC computed gotos that Parrot uses in one of its dispatch loop variants.
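        For comparison, a tiny bytecode interpreter written both ways. The computed-goto version relies on the GCC "labels as values" extension (&&label and goto *ptr, also accepted by Clang), so it is not portable ISO C. The opcodes are made up and unrelated to perl's regnodes or Parrot's actual ops.

```c
#include <assert.h>

enum { OP_INC, OP_DBL, OP_HALT };   /* hypothetical opcodes */

/* Conventional dispatch: one big switch inside a loop. */
long run_switch(const unsigned char *code)
{
    long acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:  acc += 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_HALT: return acc;
        default:      assert(!"bad opcode"); return -1;
        }
    }
}

/* Computed-goto dispatch (GCC extension): each handler jumps straight to
 * the next one through a table of label addresses. No bounds check, so
 * the bytecode must be valid. */
long run_goto(const unsigned char *code)
{
    static void *const table[] = { &&do_inc, &&do_dbl, &&do_halt };
    long acc = 0;
#define DISPATCH() goto *table[*code++]
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_dbl:
    acc *= 2;
    DISPATCH();
do_halt:
#undef DISPATCH
    return acc;
}
```

        The usual argument for the second form is that each handler ends in its own indirect jump, which can give the branch predictor more context than one shared switch jump. It changes how dispatch happens, not how deeply the engine recurses, so it would not by itself address the stack growth discussed above.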

