
Re^3: Too Many IDs

by The_Dj (Sexton)
on Jan 09, 2020 at 13:53 UTC ( #11111247 )

in reply to Re^2: Too Many IDs
in thread Too Many IDs


I will traverse all the data.

The data lives on another server.

Pulling it all at once is just faster.*

I actually have both selectall_hashref and selectall_arrayref($sql, {Slice => {}}) in my code, each where it fits best (I believe).

* I should probably benchmark that
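[Editorial sketch] For readers following the DBI calls mentioned above, here is a minimal illustration of the two return shapes, and of indexing one result set by both unique keys. The rows, column names (id, sn, val), and key values are made up, since the schema isn't shown; the actual DBI calls are indicated in comments so the sketch runs without a database.

```perl
use strict;
use warnings;

# selectall_arrayref($sql, { Slice => {} }) returns an arrayref of
# hashrefs, one per row -- simulated here with hypothetical rows:
my $rows = [
    { id => 1, sn => 'A100', val => 'foo' },
    { id => 2, sn => 'B200', val => 'bar' },
];

# selectall_hashref($sql, 'id') would instead return one hash keyed
# on id: { 1 => { id => 1, ... }, 2 => { id => 2, ... } }

# From the arrayref form, both unique keys can index the *same* row
# hashrefs -- two lookup tables, but each row is stored only once:
my %by_id = map { $_->{id} => $_ } @$rows;
my %by_sn = map { $_->{sn} => $_ } @$rows;

print $by_sn{'B200'}{val}, "\n";                               # bar
print $by_id{2} == $by_sn{'B200'} ? "same rowref\n" : "copies\n";
```

The second print shows that both hashes hold references to the same underlying row, so the extra lookup key costs only the hash index itself, not a copy of the data.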

Replies are listed 'Best First'.
Re^4: Too Many IDs
by Tanktalus (Canon) on Jan 09, 2020 at 20:42 UTC

    Have you considered just doing the work on the db server, via a stored procedure or some such, and only returning the data needed to render user output (regardless of format)? Since you've said both of these keys are unique, you presumably already enforce that in the db, which means they're indexed anyway, and then you don't really need to worry about how many lookup keys you have.

    And, out of left field, comes Tanktalus...

      Yeah, this. Databases are actually amazingly fast at this type of operation.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Sadly, not possible

      The task is to update the database based on fresh results from a few thousand third-party API calls.

      I suppose it's not impossible to do that as a stored procedure, but I couldn't put that load on the DB server anyway.

      Also, my SQL isn't as strong as my Perl ;-)

        Fair enough. What you're trying to do may best be done off the db server. I only point it out because we had a similar problem about a year ago in a previous job, where another team was trying to do a lot of data manipulation in C#: pulling the data out of postgres, performing analysis, and then pushing the results back to postgres. It was taking 25+ hours to handle 24 hours of data, and the team working on it just couldn't optimise it sufficiently. When my teammate and I were tapped to look at the problem, the first thing we each said was "stored procedure". Once we had written that, it dropped to about 1 hour to handle 24 hours of data, so I thought I should at least propose it on this thread.

        It used less CPU and memory on the DB server, too, because it didn't have to serialise all that data. You might be surprised at how much less load it actually puts on the server. If you were to make your API calls, throw the results into a temp table, and then use a stored procedure to ingest them into the correct tables, it may actually do better than you expect. Or maybe not - there's definitely not enough information here to tell, but sometimes it takes a total algorithmic change to effect the performance gains you need, when simple tweaks are insufficient.

Re^4: Too Many IDs
by LanX (Sage) on Jan 09, 2020 at 16:25 UTC
    So "best" means fastest?

    We don't know how big your table is and if memory is an issue.

    In programming you can almost always trade memory for time!

    ... like pulling n big chunks of the table in sliding windows.
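[Editorial sketch] The sliding-window idea above is often done with keyset pagination: fetch a window of rows keyed on the last id seen, process it, repeat. Here the fetch is simulated over an in-memory array so the sketch runs without a database; with DBI it would be a selectall_arrayref over a query like the one shown in the comment.

```perl
use strict;
use warnings;

# A toy 10-row table; real code would not hold this in memory at all.
my @table = map { { id => $_, sn => "SN$_" } } 1 .. 10;

sub fetch_window {    # stands in for the SQL:
    my ($last_id, $chunk) = @_;   # SELECT ... WHERE id > ? ORDER BY id LIMIT ?
    my @hit = grep { $_->{id} > $last_id } @table;
    return [ @hit[ 0 .. ($chunk < @hit ? $chunk : scalar @hit) - 1 ] ];
}

my ($last_id, $chunk, $seen) = (0, 4, 0);
while (1) {
    my $rows = fetch_window($last_id, $chunk);
    last unless @$rows;
    $seen += @$rows;              # ...process the window here...
    $last_id = $rows->[-1]{id};   # key for the next window
}
print "$seen rows in windows of $chunk\n";   # 10 rows in windows of 4
```

Peak memory is bounded by the window size instead of the table size, at the cost of one round trip per window.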

    For instance, I suppose (see your footnote) that two SQL queries, one keyed on ID and one on SN, are faster, but your map solution occupies less memory (both lookup hashes share the same second-level hashrefs rather than copying them).

    So .... It really depends...

    > * I should probably benchmark that

    That's a bingo! ;)
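[Editorial sketch] The core Benchmark module makes the footnoted comparison easy to start on. This toy only measures the client side (building the two lookup hashes from rows already fetched, via two map passes vs one loop); the real question also includes one round trip vs two, which needs the actual server. Row count and columns are made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);   # core module

my @rows = map { { id => $_, sn => "SN$_" } } 1 .. 1000;

my (%by_id, %by_sn);
cmpthese(200, {
    two_maps => sub {
        %by_id = map { $_->{id} => $_ } @rows;
        %by_sn = map { $_->{sn} => $_ } @rows;
    },
    one_loop => sub {
        (%by_id, %by_sn) = ();    # clear both, then fill in one pass
        for my $r (@rows) {
            $by_id{ $r->{id} } = $r;
            $by_sn{ $r->{sn} } = $r;
        }
    },
});
```

cmpthese prints a rate table comparing the two; either way, both strategies end up with the same two indexes over shared rowrefs.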

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
    FootballPerl is like chess, only without the dice
