We invest a lot in our behind-the-scenes infrastructure. We just completed a fairly major set of projects to improve our resiliency and fault tolerance. In addition, we upgraded the versions of a number of the components we use in our infrastructure.
One of the two coolest things we did (for techies) was to improve our SQS queueing implementation. We added a backup SQS queue in the AWS Oregon Region in addition to the Eastern Region queue we have been using. The Eastern queue had a temporary outage several weeks ago. The other redundancies we had in place handled that outage, but we felt it was better to have a distributed queue available as well.
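For the techies, here is a minimal sketch of the failover idea, not the production code. It assumes each queue is wrapped in a hypothetical "sender" function that raises an exception when the queue cannot accept a message; the names `send_with_failover`, `east`, and `oregon` are illustrative only. In a real setup each sender would wrap an SQS client bound to one region, and the consumers would also need to read from both queues.

```python
from typing import Callable, List, Sequence, Tuple

# A sender takes a message body and raises an exception on failure.
# (Hypothetical interface: in practice this would wrap an SQS client for one region.)
Sender = Callable[[str], None]


def send_with_failover(body: str, senders: Sequence[Tuple[str, Sender]]) -> str:
    """Try each queue in order and return the name of the first one that accepts the message."""
    errors: List[str] = []
    for name, send in senders:
        try:
            send(body)
            return name
        except Exception as exc:  # any failure means "try the next queue"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all queues failed: " + "; ".join(errors))


# In-memory stand-ins so the sketch runs without AWS credentials.
delivered: List[str] = []


def failing_sender(body: str) -> None:
    # Simulates the primary queue being unreachable.
    raise ConnectionError("primary queue unreachable")


def backup_sender(body: str) -> None:
    # Simulates the backup queue accepting the message.
    delivered.append(body)


used = send_with_failover("job-1", [("east", failing_sender), ("oregon", backup_sender)])
```

The ordering of the list is the design choice here: the primary queue is always tried first, and the backup only sees traffic when the primary fails.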
The other cool thing we did was to upgrade our Nagios monitoring and auto-repair infrastructure. This is important because it catches problems in our infrastructure and automatically repairs them, for example when there are issues with the load balancing systems.
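As a rough illustration of how auto-repair can hang off monitoring: Nagios can run an event handler when a check changes state, passing it the service state, the state type (SOFT or HARD), and the check attempt number. The sketch below shows one possible decision rule for such a handler. The function names, the threshold of 3, and the `repair` callback are assumptions made for this example, not a description of our actual configuration.

```python
from typing import Callable


def should_repair(state: str, state_type: str, attempt: int,
                  soft_attempt_threshold: int = 3) -> bool:
    """Decide whether to attempt a repair for a reported service state.

    Assumed policy: only act on CRITICAL; act immediately on a HARD state,
    and on a SOFT state once the check has failed `soft_attempt_threshold` times.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt >= soft_attempt_threshold


def handle_event(state: str, state_type: str, attempt: int,
                 repair: Callable[[], None]) -> bool:
    """Run the (hypothetical) repair action if the policy says so; report whether it ran."""
    if should_repair(state, state_type, attempt):
        repair()
        return True
    return False


# Stand-in repair action that just records that it was called.
repairs = []
ran = handle_event("CRITICAL", "HARD", 4, lambda: repairs.append("restart load balancer"))
```

Waiting for a few failed attempts before acting on a SOFT state avoids restarting a service over a single transient blip.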
This is all stuff that our customers really do not need to care about, except that it is the reason why we have such high availability and fast performance.