October 15, 2020

Image Processing with Amazon S3 and Blitline IPaaS

After sending millions of images to Amazon S3, we at Blitline have learned a few things about how S3 behaves, and we would like to share them in the hope that they make your cloud-based image processing experience with Blitline (and, by extension, S3) better.

For the most part, during image processing, we are a proxy to your S3 bucket. As such, we are stuck with S3's limitations. We have implemented some things to help mitigate these issues, but they are issues nonetheless. First and foremost, let's talk about S3's rate limiting.

S3 Rate Limiting:

We (blitline.com) do not rate limit any customers. We take in all we can, and if we can't process it fast enough, we queue it, then process it. We have monitoring to identify when the system becomes overly burdened, and we could impose rate restrictions on the fly, but we have not needed to do that. As a result, some customers submit thousands of jobs to be processed over a short period of time. We process those jobs and push the images to S3. We have found that if we push more than 10 requests per second to an S3 bucket for an extended period of time, we start to get errors back from S3. They usually take the form of a redirect (301 Moved Permanently) or a more generic error (Errno::ECONNRESET: Connection reset by peer). In response, we throttle our output and try again in a few moments. This is generally referred to as "backing off".
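
To make "backing off" concrete, here is a minimal sketch of a retry loop that waits longer after each failure. It is an illustration only, not our production code: the names push_with_backoff and TransientS3Error, the delay values, and the use of random jitter are all assumptions made for this example.

```python
import random
import time


class TransientS3Error(Exception):
    """Hypothetical stand-in for a 301 redirect or a connection reset from S3."""


def push_with_backoff(push, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call push() until it succeeds, waiting longer after each failure.

    All defaults here are illustrative assumptions, not Blitline's settings.
    """
    for attempt in range(max_attempts):
        try:
            return push()
        except TransientS3Error:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Double the delay on every failed attempt, up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many workers from retrying at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

Here push would be whatever function actually uploads one image. Passing sleep as a parameter is just a convenience so the loop can be exercised without real waiting.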

In cases where backing off isn't enough and we still get an error after multiple retries, we eventually call it a day and return an error in the job. The error message will often look something like...

s3-ap-southeast-1.amazonaws.com temporarily unavailable:

We will work to come up with an elegant solution here at Blitline, but there are some things you can do to mitigate this on your end:

First: If you think you are going to be pushing more than 10 images/sec for long periods of time (more than, say, 5 minutes), you should probably think about sorting your images into different buckets.
Second: It's better to have a bunch of little bursts of images than one huge, long burst of images.
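
Both suggestions can be sketched on the client side. The snippet below is one possible approach, assuming you choose the destination bucket yourself before submitting each job. The helper names (pick_bucket, in_bursts), the hash-based assignment, and the burst size and pause length are illustrative assumptions, not part of the Blitline API.

```python
import hashlib
import time


def pick_bucket(image_key, buckets):
    """Map an image key to one bucket, deterministically, via a hash of the key."""
    digest = hashlib.sha256(image_key.encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


def in_bursts(items, burst_size, pause_seconds, sleep=time.sleep):
    """Yield items in small bursts, pausing between consecutive bursts."""
    for start in range(0, len(items), burst_size):
        if start > 0:
            sleep(pause_seconds)  # give S3 a breather before the next burst
        yield items[start:start + burst_size]
```

Because the same key always hashes to the same bucket, you can find an image again later without keeping a lookup table. You would tune burst_size and pause_seconds to keep each bucket comfortably under the rate at which errors start to appear.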

The other thing about S3 is:

Latency:

Just because something has been successfully pushed to S3 doesn't mean it is immediately accessible. There is occasionally some latency between the push and being able to view the image. In fact, during some problems with Amazon's S3, items pushed to S3 didn't show up until minutes later. This is important to know if you are doing sequential processing and assuming that an image will be available simply because the job has posted back from Blitline. To help mitigate this problem, we have added a "wait_for_s3" option to the top-level JSON (as a peer to "postback_url"). This causes the job to poll the URL of the just-uploaded image. If the image appears at that location, the postback is executed immediately. Otherwise, the job waits 2 seconds and tries again, up to a maximum of 10 seconds, at which point it issues the postback regardless.
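
As a rough illustration, the polling behavior described above can be sketched as follows. This is a simplified model, not our actual implementation. In the job fragment, only the keys "postback_url" and "wait_for_s3" come from the description above; the value true, the example URL, and the function and parameter names are assumptions.

```python
import json
import time

# Hypothetical job fragment. A real job needs other fields, omitted here.
job = {
    "postback_url": "https://example.com/blitline_done",  # placeholder URL
    "wait_for_s3": True,  # assumed value; shown as a peer of "postback_url"
}
job_json = json.dumps(job, indent=2)


def wait_then_postback(image_exists, postback, interval=2.0, max_wait=10.0,
                       sleep=time.sleep):
    """Poll until the image is visible or max_wait elapses, then post back.

    image_exists and postback are caller-supplied callables (hypothetical).
    Returns the number of seconds spent waiting.
    """
    waited = 0.0
    while not image_exists() and waited < max_wait:
        sleep(interval)
        waited += interval
    postback()  # issued whether or not the image ever appeared
    return waited
```

If you would rather do this check on your own side, image_exists could be a function that issues a HEAD request for the image URL and returns True on a 200 response.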