Re^4: Too Many IDs

by Tanktalus (Canon)
on Jan 09, 2020 at 20:42 UTC

in reply to Re^3: Too Many IDs
in thread Too Many IDs

Have you considered just doing the work on the db server via a stored procedure or some such, and only returning the data needed to render user output (regardless of format)? Since you've said both of these keys are unique, you've presumably already enforced that in the db, which means they're indexed anyway, and then you don't really need to worry about how many lookup keys you have.
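
As a rough sketch of the shape this takes in Perl/DBI (the DSN, credentials, and the set-returning function render_user_report() below are hypothetical stand-ins), the server does the lookups and only the rows needed for output come back:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details -- adjust for your setup.
    my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'pass',
                            { RaiseError => 1, AutoCommit => 1 } );

    my $user_id = 42;

    # All joins and key lookups happen server-side; we fetch only display rows.
    my $sth = $dbh->prepare('SELECT * FROM render_user_report(?)');
    $sth->execute($user_id);
    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{name}\t$row->{total}\n";
    }
    $dbh->disconnect;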

And, out of left field, comes Tanktalus...

Replies are listed 'Best First'.
Re^5: Too Many IDs
by talexb (Chancellor) on Jan 10, 2020 at 18:51 UTC

    Yeah, this. Databases are actually amazingly fast at this type of operation.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re^5: Too Many IDs
by Anonymous Monk on Jan 13, 2020 at 01:34 UTC

    Sadly, not possible.

    The task is to update the database based on fresh results from a few thousand third-party API calls.

    I suppose it's not impossible to do that as a stored procedure, but I couldn't put that load on the DB server anyway.

    Also, my SQL isn't as strong as my Perl ;-)

      Fair enough. What you're trying to do may best be done off the db server. I only point it out because we had a similar problem about a year ago at a previous job, where another team was trying to do a lot of data manipulation in C#: pulling the data out of postgres, performing analysis, and then pushing the results back to postgres. It was taking 25+ hours to handle 24 hours of data, and the team working on it just couldn't optimise it sufficiently. When my teammate and I were tapped to look at the problem, the first thing we each said was "stored procedure". Once we had written that, it dropped to about 1 hour to handle 24 hours of data, so I thought I should at least propose it on this thread.

      It used less CPU and memory on the DB server, too, because it didn't have to serialise all that data. You might be surprised at how much less load it actually puts on the server. If you were to make your API calls, throw the results into a temp table, and then use a stored procedure to ingest them into the correct tables, it may actually do better than you expect. Or maybe not - there's definitely not enough information here to tell, but sometimes it takes a total algorithmic change to effect the performance gains you need, when simple tweaks are insufficient.
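
      For example, a minimal sketch of that flow in Perl/DBI, assuming Postgres; the staging table api_staging and the procedure ingest_api_staging() are hypothetical names for whatever your schema needs:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'pass',
                                  { RaiseError => 1, AutoCommit => 0 } );

          # Stand-in for the real API results (one id + JSON payload per call).
          my @api_results = (
              { id => 'A1', json => '{"score":10}' },
              { id => 'B2', json => '{"score":7}'  },
          );

          # Session-local staging table; it disappears on disconnect.
          $dbh->do('CREATE TEMP TABLE api_staging (ext_id text, payload jsonb)');

          my $ins = $dbh->prepare(
              'INSERT INTO api_staging (ext_id, payload) VALUES (?, ?)');
          $ins->execute( $_->{id}, $_->{json} ) for @api_results;

          # One set-based pass on the server instead of thousands of round
          # trips. ingest_api_staging() would upsert from the staging table
          # into the real tables.
          $dbh->do('CALL ingest_api_staging()');
          $dbh->commit;
          $dbh->disconnect;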
