in reply to Re: Slowness when inserting into pre-extended array
in thread Slowness when inserting into pre-extended array

"It makes no sense to read the entire file into an array .... and almost certainly speed the whole thing up considerably." Oh, certainly. I had temporarily rewritten it this way to see if the slowness was due to readng the file or processing it. Unfortunately the sparse representation I'm using is necessary for the way the data structure will be applied; each record has about 30 features, but which features each record has varies among a set of about 150,000, making any representation which stores the values of all features for all records impractical. I also unfortunately must have all the data in memory at once in order to calculate various infromation statistics for each attribute and to do extensive swapping of records (I'm building a decision-tree learner). :-(

Re: Re: Re: Slowness when inserting into pre-extended array
by tilly (Archbishop) on Jul 20, 2003 at 00:06 UTC
    Have you considered using a data structure backed by a file, such as BerkeleyDB, or storing the data in a relational database like MySQL? They will sometimes have to go to disk for data, of course, but they can make intelligent caching choices so that most of the time you can access data without touching the disk...

    Yes, it is slower than having it all in RAM if you have the memory for that, but it is likely to make better file-access choices than your OS will make when paging randomly accessed RAM in and out of virtual memory...
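
    For example, the core DB_File module can tie a hash to an on-disk Berkeley DB file (a minimal sketch; the file name is arbitrary, and nested per-record structures would need serialization, e.g. via MLDBM or Storable):

        use strict;
        use warnings;
        use Fcntl;
        use DB_File;

        # Tie a hash to a B-tree file on disk; lookups and stores go
        # through Berkeley DB's cache instead of holding everything in RAM.
        tie my %db, 'DB_File', 'records.db', O_RDWR | O_CREAT, 0644, $DB_BTREE
            or die "Cannot tie records.db: $!";

        $db{record_42} = 'color=red size=3';    # written to disk
        print $db{record_42}, "\n";             # read back through the cache

        untie %db;

    DB_File is the lighter core interface; the separate BerkeleyDB module exposes more of the underlying library, including control over the cache size.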

      hm - that could be useful. I'll look into that. Thanks!