in reply to Overhead of Inline::Python?
OP here.
We're using the Google Cloud Platform. They have tons of docs on uploading files, but the basic variants are all explained here, one under each tab: gsutil is their command-line tool, Code Samples shows the client APIs for various languages (Python, Ruby, Go, Java, Node.js, etc.), and REST APIs shows the raw HTTP interface.
We haven't yet run detailed profiling. Our original system simply copied files to a remote filesystem, and thus could use regular filesystem tools. It was vastly faster, even though the raw network speed isn't appreciably different.
I appreciate that some kind of system to batch the files would be better. The problem is that the underlying code is extremely complicated: merely selecting the files for uploading takes hours, and every upload requires a number of database updates. Ideally those updates should happen at the same time as the upload (the location recorded in the database should match the actual location of the file), so selecting all the files up front and then uploading them in one batch would be a problem.
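For what it's worth, that consistency constraint can be met without a full batch by streaming: upload and record one file at a time, committing the database row only after its upload succeeds. A minimal Python sketch, where `upload_file` and `record_location` are hypothetical placeholders for the real storage-client call and database update:

```python
# Sketch: keep the database in lockstep with the uploads by recording
# each file's new location only after its upload has succeeded.
# upload_file and record_location are hypothetical placeholders for the
# real storage-client call and database update.

def upload_file(local_path, bucket, remote_path):
    # placeholder: the real code would call the storage client here
    return f"gs://{bucket}/{remote_path}"

def record_location(db, local_path, remote_url):
    # placeholder: the real code would run an UPDATE in a transaction here
    db[local_path] = remote_url

def upload_in_lockstep(paths, bucket, db):
    uploaded = []
    for local_path in paths:
        remote_path = local_path.lstrip("/")
        url = upload_file(local_path, bucket, remote_path)
        record_location(db, local_path, url)  # commit per file, not per batch
        uploaded.append(url)
    return uploaded

db = {}
urls = upload_in_lockstep(["/data/a.txt", "/data/b.txt"], "my-bucket", db)
```

The point of the shape, not the placeholders: the database never claims a location the file doesn't actually have yet, and a crash mid-run leaves only unrecorded (re-uploadable) files behind.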
Using the REST APIs would require a lot of manual housekeeping (checking for successful completion, retrying interrupted uploads, etc.) that gsutil and the language APIs handle automatically, which is why we felt it was an advantage to use one of those tools. Perhaps we should just do this, but I'd hoped that some way of using one of those tools could still be more stable than rolling it ourselves with the REST API.
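To make concrete what that housekeeping involves, here's a rough sketch of the retry-with-backoff loop you'd have to write around each raw REST upload yourself. `TransientError` and the flaky upload are simulated stand-ins, and the attempt count and delays are arbitrary; this is roughly the bookkeeping gsutil and the client libraries do for you:

```python
import time

class TransientError(Exception):
    """Stand-in for a retriable failure (timeout, 5xx response, etc.)."""

def with_retries(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    # Retry the operation with exponential backoff on transient errors,
    # re-raising only after the final attempt fails.
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Simulated example: an upload that fails twice before succeeding.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated timeout")
    return "ok"

result = with_retries(flaky_upload, sleep=lambda s: None)  # skip real sleeps
```

And this is only the retry part; resumable uploads, integrity checks, and partial-failure reporting would all need the same kind of wrapper.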