Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a small monster of a Perl script that runs slowly at first, then runs fast. I am familiar with programs that run fast and then slow down once they run low on memory, but I can't imagine what would drive it in the other direction. I am wondering if there is some explanation for this behavior. In short, the program uses Inline::C to call a C library that reads in a large data structure (1 GB) and then queries it for each line of an input file. Whether the input is only 20 lines or 1 million lines, progress is very slow for about the first 20% and then greatly accelerated for the rest of the file. I can follow the memory usage with top, and there doesn't seem to be a memory limitation (the system has 4 GB). Is there any possible explanation for this sudden acceleration?

Re: My program drags and then runs fast. Why?
by GrandFather (Saint) on Oct 13, 2008 at 00:48 UTC

    Many. You don't give us much to go on. Caching at some level is most likely - the first 20% builds the cache and the remainder gains the benefit.

    It should be fairly trivial to slow the fast bit down by checking loop iteration time and putting sleeps in as required.


    Perl reduces RSI - it saves typing
Re: My program drags and then runs fast. Why?
by gone2015 (Deacon) on Oct 13, 2008 at 11:14 UTC

    As others have said, you're not giving much away...

    ...however one can speculate.

    • suppose the C stuff reads in the entire 1GB into some simple searchable structure whose leaves are lists of items. A common way of improving performance is to move items to the front of the list when found. Subsequent queries for the same thing will run faster.
    • suppose the C stuff reads in the entire 1GB, but doesn't do much organisation of the data -- perhaps to minimise latency. Each query could do some opportunistic organisation, speeding up future queries.
    • or, as above, but runs a background thread to organise the data.
    • suppose the C stuff does not read in the entire 1GB, but reads parts of it as queries require, but keeps what it's read.
    • suppose the C stuff simply maps the 1GB into VM, and lets the OS load stuff on demand,
    • or, as above, but runs a background thread to read and organise the data.
    These are examples of forms of caching as suggested by GrandFather. It's also possible that:
    • the processing of a query triggers a lot of memory allocation/deallocation/garbage collection activity, but this settles down after a few queries.
    • although you have plenty of real memory, it still takes a while to build up to the full working set.
    • the disc is struggling to read the data, but once it's in memory, you're fine.
    So much for speculation.
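
    To make the first guess concrete, here is a minimal sketch of a move-to-front lookup on a singly linked list. It is only an illustration of the technique, not a claim about what the poster's library does: `struct node`, `mtf_find` and the `steps` counter are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: one entry in a leaf list of the structure. */
struct node {
    const char *key;
    int value;
    struct node *next;
};

/*
 * Search the list at *head for `key`. On a hit, unlink the node and put it
 * at the front of the list, so a repeated query for the same key succeeds
 * on the first comparison. If `steps` is non-NULL it receives the number of
 * nodes examined. Returns the matching node, or NULL if the key is absent.
 */
struct node *mtf_find(struct node **head, const char *key, size_t *steps)
{
    struct node *prev = NULL;
    size_t examined = 0;

    assert(head != NULL && key != NULL);

    for (struct node *cur = *head; cur != NULL; prev = cur, cur = cur->next) {
        examined++;
        if (strcmp(cur->key, key) == 0) {
            if (prev != NULL) {          /* not already at the front */
                prev->next = cur->next;  /* unlink from current position */
                cur->next = *head;       /* relink ahead of the old head */
                *head = cur;
            }
            if (steps != NULL)
                *steps = examined;
            return cur;
        }
    }

    if (steps != NULL)
        *steps = examined;
    return NULL;
}
```

    With a list like this the first lookup of a key near the tail walks most of the list, and the next lookup of the same key stops at the head. That would show up as "slow, then fast" only if later queries tend to repeat earlier keys.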

    Mind you, you say that things are slow to start with, no matter whether the input you're querying with is 20 lines or 1 million lines... and no matter how many lines, it's very slow for the first 20% of them? Everything I can think of I would expect to speed up either as more queries are made, or after a period related to the size of the data. Neither of these is proportional to the number of lines queried.

    You're sure it's not data-dependent?

    Long story, made short: yes, one can imagine ways that a program along the lines you describe can speed up over time, but more information is required to diagnose why in this case!
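
    One way to gather that information would be to time the queries in batches and see whether the slow part follows the position in the run or the data itself (for example by feeding the same lines in a different order). The sketch below assumes the query loop can be driven from C. `query_fn`, `time_batches` and `count_query` are hypothetical names for this example, and `count_query` is only a stand-in for the real per-line query. It uses C11 `timespec_get`, which reads the wall clock, so time spent waiting on the disc is included.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: performs the query for input line `line_no`. */
typedef void (*query_fn)(size_t line_no, void *ctx);

/* Current wall-clock time in seconds, via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Run queries 0 .. total-1 in batches of `batch`, storing the elapsed
 * seconds of each batch in out[]. Stops early once out[] (capacity
 * max_out) is full. Returns the number of batches recorded.
 */
size_t time_batches(query_fn run_query, void *ctx, size_t total,
                    size_t batch, double *out, size_t max_out)
{
    size_t n_batches = 0;

    if (batch == 0)
        return 0;

    for (size_t start = 0; start < total && n_batches < max_out; start += batch) {
        size_t end = (total - start < batch) ? total : start + batch;
        double t0 = now_seconds();

        for (size_t i = start; i < end; i++)
            run_query(i, ctx);

        out[n_batches++] = now_seconds() - t0;
    }
    return n_batches;
}

/* Stand-in query for demonstration: just counts how often it is called. */
static void count_query(size_t line_no, void *ctx)
{
    (void)line_no;
    (*(size_t *)ctx)++;
}
```

    If the per-batch times drop after a fixed number of queries regardless of input size, that points at warm-up or caching. If the slow batches move when the input is reordered, it points at the data.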

      Thanks everyone for your suggestions. (I am the poster—thought I was logged in).

      I know that the data structure is static once it's loaded since I wrote the C library, and that the Perl script is parsing only one line of data at a time. That's why this is really confusing me.

Re: My program drags and then runs fast. Why?
by kyle (Abbot) on Oct 13, 2008 at 15:25 UTC

    I don't think it directly relates to your problem, but it reminds me of Strings and numbers: losing memory and mind. In that case, my large data structure started out as strings and was converted to numbers on the fly as I used it. That caused a lot more memory usage after the point where I thought it would be static. Maybe your data structure is experiencing a similar "one time cost" as it's accessed.

Re: My program drags and then runs fast. Why?
by apomatix (Novice) on Oct 14, 2008 at 18:31 UTC
    Thanks for everyone's suggestions. At least I have some ideas now of why my loops might speed up. I now think that in this particular situation the slow-down is somehow in the C library, and not Perl's or Inline's fault. I am re-implementing it in pure Perl; I think that will be faster for me than figuring out what's going wrong in C.