Reading Images from your S3 Bucket

People commonly want to read images from their S3 buckets securely. Because Blitline can talk to S3 directly, you can set things up so that Blitline reads images directly OUT of your bucket, without the need for a signed URL. To accomplish this, you need to do two things:

1. Set permissions on your S3 bucket to allow Blitline to read the images. If you have already given Blitline permission to write to your bucket, the change is simple: follow the instructions here, and then add the following:

Notice the “GetObject” action highlighted in red. It allows Blitline to read (“Get”) objects from your bucket, in addition to writing (“Put”) objects there.

2. In your Blitline job, change your “src” value from a simple URL to a hash containing “name”, “bucket”, “key”, and “location” (the last only if your bucket is in a region outside the default US East). For example:

"src": {
   "name": "s3",
   "bucket": "blitbucket",
   "key": "temp_folder/image.jpg",
   "location": "s3-ap-southeast-1"
}
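The two steps above can be sketched in Python. The exact policy snippet from the linked instructions isn't reproduced here, so the first function assumes a bucket-policy setup and uses a placeholder principal ARN (hypothetical, not Blitline's real identity). The second builds the “src” hash from the example; the helper names are our own.

```python
import json
from typing import Optional

# Hypothetical placeholder: substitute the principal given in Blitline's instructions.
BLITLINE_PRINCIPAL = "arn:aws:iam::000000000000:root"

def read_write_policy(bucket: str, principal: str = BLITLINE_PRINCIPAL) -> dict:
    """Step 1: a bucket policy letting `principal` read (Get) and write (Put) objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBlitlineReadWrite",
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                # GetObject lets Blitline read source images; PutObject lets it write results.
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }

def s3_src(bucket: str, key: str, location: Optional[str] = None) -> dict:
    """Step 2: a Blitline "src" hash pointing at an object in your bucket."""
    src = {"name": "s3", "bucket": bucket, "key": key}
    # Only needed when the bucket lives outside the default US East region.
    if location is not None:
        src["location"] = location
    return src

if __name__ == "__main__":
    print(json.dumps(read_write_policy("blitbucket"), indent=2))
    src = s3_src("blitbucket", "temp_folder/image.jpg", "s3-ap-southeast-1")
    print(json.dumps({"src": src}, indent=2))
```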

We’re including a graphic here to illustrate how functions can be nested inside other functions. Nesting lets you chain functionality to create multiple image results from a single job. It can also improve performance: if you chain resize functions, each later resize runs on the already-resized, smaller image instead of on the original (potentially LARGE) image.
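A minimal sketch of a resize chain as job JSON, built in Python. The field names (“resize_to_fit”, “params”, “save”, “image_identifier”, a nested “functions” list) are our assumptions about Blitline's job format, not confirmed by this post; check the API documentation before relying on them. Each resize is nested inside the next-larger one, so only the outermost resize touches the original image.

```python
import json

def nested_resizes(widths: list[int], bucket: str, key_prefix: str) -> list[dict]:
    """Nest each resize inside the previous (larger) one.

    Assumed field names; treat this as an illustration of the nesting idea.
    """
    if not widths or list(widths) != sorted(widths, reverse=True):
        raise ValueError("widths must be non-empty and in descending order")
    inner = None
    # Build from the smallest (innermost) resize outward.
    for width in reversed(widths):
        fn = {
            "name": "resize_to_fit",
            "params": {"width": width},
            "save": {
                "image_identifier": f"width_{width}",
                "s3_destination": {"bucket": bucket, "key": f"{key_prefix}_{width}.jpg"},
            },
        }
        if inner is not None:
            fn["functions"] = [inner]  # runs on this function's (smaller) output
        inner = fn
    return [inner]

print(json.dumps(nested_resizes([1024, 512, 128], "blitbucket", "thumbs/photo"), indent=2))
```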

Discontinuing OCR Functionality

We are no longer going to support OCR functionality. We made this decision because of lack of interest and high maintenance and support costs. Maintaining and supporting the OCR software was expensive, and very few users used it; in fact, no one was actively using it in a product. We had hoped it would fill a niche, but unfortunately it never gained traction, and we were spending more time supporting it than people were spending using it.

We apologize for any inconvenience this may cause you. We have been trying to remove it from our documentation, but bits and pieces may still exist. We are working to remove these.

Don’t DDOS yourself

Be careful during development to make sure your postbacks don’t overload your own server. If you submit 1000 jobs to Blitline, you are going to get 1000 postbacks in a hurry. If you just have a dev server sitting there and it can’t handle 1000 postbacks arriving quickly, it’s going to be a problem.

Billed time also includes the time it takes your server to handle the postback. If your postback takes 10-15 seconds, that time is included in the cost of the job. If your server can’t keep up with the postbacks coming back, make sure it either fails fast with a 500 or that you submit jobs at a slower pace your server can handle.
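One way to submit jobs at a slower pace is to space submissions evenly on your side. A minimal sketch, assuming you supply a `submit` function (hypothetical; for example, one that POSTs a job to Blitline). Spacing submissions limits how fast postbacks can arrive, though actual postback timing also depends on processing time.

```python
import time
from typing import Callable, Iterable, TypeVar

J = TypeVar("J")
R = TypeVar("R")

def submit_throttled(jobs: Iterable[J], submit: Callable[[J], R], per_second: float,
                     sleep: Callable[[float], None] = time.sleep) -> list[R]:
    """Submit jobs no faster than `per_second`, spacing submissions evenly."""
    if per_second <= 0:
        raise ValueError("per_second must be positive")
    interval = 1.0 / per_second
    results = []
    for i, job in enumerate(jobs):
        if i:  # no wait before the first job
            sleep(interval)
        results.append(submit(job))
    return results
```

The `sleep` parameter is injectable so the pacing can be tested without actually waiting.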

It’s quite common for us to see someone submit 5000 jobs during development, with each postback taking 10-20 seconds. That’s 5000 × 20 seconds = 100,000 seconds, or about 27 hours (roughly $22). That’s pretty expensive for 5000 test photos.
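The arithmetic above, as a small Python helper. The $0.79/hr rate is taken from the regular pricing quoted in the “Fear of Success” post; the function name is our own.

```python
def postback_cost(jobs: int, seconds_per_postback: float,
                  hourly_rate: float = 0.79) -> tuple[float, float]:
    """Return (billed hours, dollars) spent waiting on postbacks alone."""
    hours = jobs * seconds_per_postback / 3600
    return hours, hours * hourly_rate

hours, dollars = postback_cost(5000, 20)
print(f"{hours:.1f} hours, ${dollars:.2f}")  # 27.8 hours, $21.94
```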

If this happens to you accidentally, let us know at support@blitline.com and we will correct your bill. But please be aware of this when developing. :)

How to turn your S3 Bucket into a CDN via CloudFront

Having Amazon serve your images through their awesome CDN is dead simple. It’s also about $0.12/GB per month… cheap.

Log into your Amazon account via the Amazon Console and click on “CloudFront”.

Once you have clicked on CloudFront, find the “Create Distribution” link. It will be at the top of the page, or, if you are a first-time user, it may also appear in the middle of the console page. Either way, click “Create Distribution”.

You can use the default “Download” option on this page and simply click the “Continue” button.

On the Settings page, click in the Origin Domain Name box to get a dropdown of all your S3 buckets. Choose the bucket you want served from the CDN, leave the other settings alone, and click the “Create Distribution” button.

Voilà! You now have a CDN that serves the images from the S3 bucket you chose on the Settings page. The CDN will have an unfriendly URL like uweoqruqyr.cloudfront.net, which becomes the new base URL for your images (for example, https://uweoqruqyr.cloudfront.net/mykey/foo.jpg).

You can then go to your DNS provider and add a CNAME that maps this to a friendly URL like “cdn.myawesomeapp.com”.
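Once the CDN is in place, an image's public URL is just the CDN host plus the S3 key. A small hypothetical helper for building those URLs; it percent-encodes the key but keeps the slashes that act as “folders”.

```python
from urllib.parse import quote

def cdn_url(host: str, key: str) -> str:
    """Build a public CDN URL for an S3 key, e.g. cdn.myawesomeapp.com + mykey/foo.jpg."""
    # quote() leaves "/" intact by default, so key "folders" are preserved.
    return f"https://{host}/{quote(key.lstrip('/'))}"

print(cdn_url("cdn.myawesomeapp.com", "mykey/foo.jpg"))
# https://cdn.myawesomeapp.com/mykey/foo.jpg
```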

“Fear of Success” Pricing

We are glad to announce an updated pricing scheme that helps our clients address their “Fear of Success” scenarios.

A question we often hear is whether Blitline might get too expensive if the application you are working on goes viral. It is entirely reasonable to fear exponential growth and the cost of keeping up with it. Imagine a site that goes from 10,000 image jobs a month to 10M+ image jobs per month: because Blitline’s cost grows linearly with usage, the cost of handling that growth could rapidly get out of hand. (Thus the fear that success will make Blitline too expensive.)

To address this, we are offering up the following assurances to make sure that exploding success doesn’t result in exploding cost for Blitline.

Here is our new extended pricing table:

This means that if your bill at our regular pricing ($7/month + $0.79/hr) goes over $100, you will get a 20% discount. For example, if your bill is $120, we subtract 20% and you are charged only $120 × 0.80 = $96.
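A sketch of the calculation described above, using `Decimal` to avoid floating-point rounding on money. It models only the regular pricing and the single over-$100 tier stated in this paragraph; any other tiers from the pricing table are not modeled.

```python
from decimal import Decimal

BASE_MONTHLY = Decimal("7.00")         # regular pricing: $7/month
HOURLY_RATE = Decimal("0.79")          # plus $0.79/hr
DISCOUNT_THRESHOLD = Decimal("100")    # discount applies to bills over $100
DISCOUNT_MULTIPLIER = Decimal("0.80")  # 20% off

def regular_bill(hours: Decimal) -> Decimal:
    """Bill at regular pricing for a month with `hours` of processing."""
    return BASE_MONTHLY + HOURLY_RATE * hours

def discounted_bill(bill: Decimal) -> Decimal:
    """Apply the 20% discount to bills over $100 and round to cents."""
    if bill > DISCOUNT_THRESHOLD:
        bill *= DISCOUNT_MULTIPLIER
    return bill.quantize(Decimal("0.01"))

print(discounted_bill(Decimal("120")))  # 96.00
```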

Hopefully, having these measures in place will help many of you feel more comfortable about Blitline as both a short term and long term solution.

Canonical ID & S3 Permissions

We are adding a new field to the Blitline account page that allows users to set their AWS Canonical ID. This is an OPTIONAL optimization that allows Blitline to set public permissions AND item-ownership when we push the image to your S3 bucket.

Our Shortcoming:

Currently, by default, when Blitline pushes an image to your bucket we set the headers to include a canned ACL (e.g. ‘x-amz-acl’ = ‘public-read’). This makes the item instantly readable by the public. You can override this header by including “headers” in your s3_destination tag.

This default behavior has a side effect: the bucket owner (YOU) does not have full control over the items that Blitline pushes to your bucket. In most cases this is not an issue, because bucket permissions can be used to show or conceal items, and usually the desired behavior is for those items to simply sit publicly in your bucket. It does mean, however, that you cannot change the ACLs on those items or rename them. This is not ideal, and we recognize that. Our hands are a bit tied by what Amazon allows us to do.

We have always let users set their own headers on the object, so you could already set headers that grant yourself permission, but this is neither obvious nor intuitive. We would like to do it automatically, without a bunch of overhead calls and permission requests to Amazon.
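For reference, here is roughly what setting those headers yourself could look like. This is a sketch, not Blitline's internal implementation: it uses S3's standard grant headers (`x-amz-grant-read`, `x-amz-grant-full-control`), and the bucket, key, and canonical ID values are hypothetical placeholders.

```python
import json
from typing import Optional

# S3's predefined group URI for "everyone" (anonymous public access).
ALL_USERS = 'uri="http://acs.amazonaws.com/groups/global/AllUsers"'

def permission_headers(canonical_id: Optional[str] = None) -> dict:
    """Headers for an s3_destination: public read, plus owner full control if an ID is known."""
    if not canonical_id:
        # The default behavior described above: canned ACL only.
        return {"x-amz-acl": "public-read"}
    # S3 rejects requests that mix x-amz-acl with x-amz-grant-* headers,
    # so the public read is expressed as an explicit grant instead.
    return {
        "x-amz-grant-read": ALL_USERS,
        "x-amz-grant-full-control": f'id="{canonical_id}"',
    }

s3_destination = {
    "bucket": "my-bucket",          # hypothetical
    "key": "images/photo.jpg",      # hypothetical
    "headers": permission_headers("YOUR_CANONICAL_ID"),
}
print(json.dumps(s3_destination, indent=2))
```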

Our Solution:

After careful consideration, we have decided that the cleanest and most straightforward solution is to ask for the client’s AWS Canonical ID as an account setting. If it is set, we can set the permission headers for you, which reduces the size and complexity of the JSON you need to submit. (Otherwise, the permissions would have to be submitted with every s3_destination.)

What this Means:

If you give us your Canonical ID (found at the bottom of your AWS Security Credentials page), we will automatically set the permissions on images we push to your S3 bucket so that they are fully controlled and owned by you. If no Canonical ID is set, we will continue to default to the “public-read” canned ACL, as before.

WE RECOMMEND EVERYONE WHO PUSHES IMAGES TO S3 GO SET THIS VALUE.

Debugging Postbacks

One thing we hear is that our postbacks are difficult to debug. The problem is that you develop on a private localhost machine, but Blitline needs to send the postback to a public URL. You can’t see the data Blitline posts back unless you deploy and view your logs (or tail them, if you’re using Heroku or the like).

Here are a couple of tools we recommend. They do what’s called reverse tunneling: while running, they make your machine publicly accessible, so your localhost can receive callbacks from Blitline.com.

First, there is https://showoff.io/, a $5/month service that gives you unlimited access.

There is also http://progrium.com/localtunnel/, which is free but offers no clear service guarantee and doesn’t support SSL.

Both these services provide a cool and powerful tool for testing webhooks and callbacks. Check them out.
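With a tunnel running, you still need something on localhost to receive the postback. A minimal sketch using only Python's standard library: it logs the raw body of every POST (we make no assumption about the payload format) and replies 200 right away. `send_test_postback` is a hypothetical helper for checking that your tunnel URL reaches the local server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[bytes] = []

class PostbackHandler(BaseHTTPRequestHandler):
    """Logs the raw body of every POST so you can inspect postbacks."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        received.append(body)
        print(f"POST {self.path}: {body.decode('utf-8', errors='replace')}")
        # Respond quickly so the postback (and its billed time) ends promptly.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request access log

def start_server(port: int = 8000) -> HTTPServer:
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PostbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_test_postback(url: str, body: bytes) -> int:
    """POST a fake payload, e.g. to your tunnel's public URL, to check the setup."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    server = start_server(8000)
    print("Listening on http://127.0.0.1:8000 (point your tunnel here)")
    try:
        threading.Event().wait()  # block until Ctrl-C
    except KeyboardInterrupt:
        server.shutdown()
```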

Things to know about S3

After sending millions of images to S3, we at Blitline have learned some things about how S3 works, and we’d like to share them in the hope that they make your experience with Blitline (and therefore S3) better.

For the most part, during image processing we act as a proxy to your S3 bucket, so we are subject to S3’s limitations. We have implemented some measures to mitigate these issues, but they are still issues. First and foremost, let’s talk about S3’s rate limiting.

S3 Rate Limiting:

We (blitline.com) do not rate limit any customers. We take in all the jobs we can; if we can’t process them fast enough, we queue them and then process them. We have monitoring that identifies when the system becomes overly burdened, and we could impose rate restrictions on the fly, but we have not needed to. As a result, some customers submit thousands of jobs over a short period of time. We process those jobs and push the images to S3. We have found that if we push more than 10 requests per second to a single S3 bucket for an extended period of time, S3 starts returning errors. These usually take the form of a redirect (301 Moved Permanently) or a more generic error (Errno::ECONNRESET: Connection reset by peer). In response, we throttle our output and try again a few moments later. This is generally referred to as “backing off”.

If backing off still isn’t enough, and we still get an error back after several retries, we eventually call it a day and return an error in the job. The error message will often look something like…

s3-ap-southeast-1.amazonaws.com temporarily unavailable:
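This post doesn't spell out Blitline's exact retry policy, so here is a generic sketch of exponential backoff that you could also use in your own S3 code. The retryable exception type and the delays are assumptions; in production you would typically add random jitter to the delays.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_backoff(op: Callable[[], T], retries: int = 5, base_delay: float = 1.0,
                 retryable: Tuple[Type[BaseException], ...] = (ConnectionResetError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying with exponentially growing delays (1s, 2s, 4s, ...)."""
    for attempt in range(retries + 1):
        try:
            return op()
        except retryable:
            if attempt == retries:
                raise  # out of retries: surface the error, like a failed job
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```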

We will work to come up with an elegant solution here at Blitline, but there are some things you can do to mitigate this on your end:

  • First: if you expect to push more than 10 images/sec for long periods of time (say, more than 5 minutes), consider spreading your images across different buckets.
  • Second: it’s better to send many small bursts of images than one huge, long burst.
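One simple way to spread images across several buckets is to hash each key and pick a bucket from the result. Because the mapping is deterministic, you can always work out which bucket holds a given image later. The bucket names below are hypothetical.

```python
import hashlib

# Hypothetical bucket names; create these in S3 first.
BUCKETS = ["myapp-images-0", "myapp-images-1", "myapp-images-2", "myapp-images-3"]

def bucket_for(key: str, buckets: list[str] = BUCKETS) -> str:
    """Deterministically map an image key to one of several buckets."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return buckets[int.from_bytes(digest[:8], "big") % len(buckets)]

print(bucket_for("photos/2012/07/cat.jpg"))
```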

The other thing about S3 is:

Latency:

Just because something has been successfully pushed to S3 doesn’t mean it is immediately accessible. There is occasionally a delay between the push and being able to view the image. In fact, during some problems with Amazon’s S3, items pushed to S3 didn’t show up until minutes later. This matters if you are doing sequential processing and assume an image is available simply because Blitline has sent the job’s postback. To help mitigate this problem, we have added a “wait_for_s3” option to the top-level JSON (as a peer to “postback_url”). It causes the job to poll the URL of the just-uploaded image. If the image appears at that location, the postback is executed immediately. Otherwise, the job waits 2 seconds and tries again, for up to a maximum of 10 seconds, after which it issues the postback regardless.
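Enabling the option is a single top-level field; passing it as a boolean `true` is our assumption. The polling it performs can be sketched as follows, with a hypothetical `exists` check. You could reuse the same idea on your own side before processing an image.

```python
import time
from typing import Callable

# Enabling the option on a job (boolean value assumed; other job fields omitted).
job = {"postback_url": "https://example.com/blitline/postback", "wait_for_s3": True}

def wait_for_object(exists: Callable[[], bool], interval: float = 2.0,
                    max_wait: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `exists` every `interval` seconds for up to `max_wait` seconds.

    Returns True as soon as the object is visible, or False once the time
    limit is reached (the caller proceeds anyway, as the postback does).
    """
    waited = 0.0
    while True:
        if exists():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```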

July 23 Outage

As most of you reading this probably know, there was an outage of about 5 hours, from ~1:00am to 6:00am PST. All connectivity to blitline.com was lost when our DNS provider, Zerigo, came under a DDOS attack. Because Zerigo handled our DNS, we also lost email routing, which meant that our primary monitoring service, also provided by Zerigo, could not alert us. Thanks to some astute users who contacted us via Twitter, we were able to mobilize and start working on the problem. We heard nothing from Zerigo for quite some time, so we decided to switch DNS providers and move to Amazon’s Route53. The switch cost us some time, but by about 6:00am we saw blitline.com recovering, and as DNS caches expired, more and more connectivity was restored. We believe we are now fully restored.

These events are unpredictable but inevitable. Thanks to the team here for mobilizing to get this resolved, and our apologies to all of our customers. We still stand behind our decision to use Zerigo in the first place, because we believe they are a competent and, until now, very dependable service.

Many other websites were down (and still are down) due to this problem and our best wishes go out to them.

Our greatest apologies go out to you, though, our end users. All of our work and effort goes into making our service better for you, and issues like this cause us a lot of pain. We just want you to know that our #1 priority has been and always will be to provide you with the most robust, solid, and affordable service.