We have made a number of infrastructure updates over the past few weeks to improve availability, increase performance, and strengthen security.
For example, a major AWS outage on September 5 took out the SQS queuing service for 1 hour and 40 minutes. Most other registration companies were offline, but our backup queues kept transactions running smoothly on RunSignUp.
As you can see from the graphic below, performance has also improved from about 2.7 seconds to about 2.3 seconds, even as traffic has increased from 2.5 million page views per week to over 4 million page views per week. Remember that this is aggregate performance measured on the actual device, so the 60% of users on mobile phones often see slower times because of cell networks, while fast internet connections are for the most part sub-second. This gives races first-class website performance, which helps their brand and conversions.
Here is a partial list of improvements we have made:
– Record and log user logins and location
– Add Multi-Factor Authentication
– Improve background task failure handling (e.g. email marketing)
– Add more ways to start extra web servers
– Auto-start web servers based on load
– Block more attacks at Nginx level before they get to the app server level
– Improve monitoring and alerting of payment processor status
– Improve monitoring and alerting of web server response time
– Update deployment process to prevent occasional errors under high load
– Configure web servers with additional disks
– Refactor database code
– Update database backups
– Automate SQS backup queues
– Add the ability to switch to a backup AWS region for some critical resources
– Improve race pages, for example by lazy-loading images and reducing the CSS loaded
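To make the backup queue idea from the list above more concrete, here is a minimal Python sketch of queue failover. It is an illustration only, not the code running on RunSignUp: the function name `send_with_failover` and the "sender" callables are hypothetical stand-ins for real queue clients (for example, an SQS client in a primary and a backup region).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a "sender" takes a message body and raises on failure.
Sender = Callable[[str], None]


def send_with_failover(message: str, senders: Sequence[Tuple[str, Sender]]) -> str:
    """Try each queue in order; return the name of the queue that accepted the message."""
    errors: List[str] = []
    for name, send in senders:
        try:
            send(message)
            return name
        except Exception as exc:  # a real system would catch the client's specific errors
            errors.append(f"{name}: {exc}")
    # Every queue failed, so surface all collected errors to the caller.
    raise RuntimeError("all queues failed: " + "; ".join(errors))


if __name__ == "__main__":
    backup_store: List[str] = []  # in-memory stand-in for a backup queue

    def primary_down(message: str) -> None:
        # Simulates the primary queue being unavailable during an outage.
        raise ConnectionError("primary queue unavailable")

    used = send_with_failover(
        "registration:12345",
        [("primary", primary_down), ("backup", backup_store.append)],
    )
    print(used, backup_store)
```

The point of the design is that the caller lists queues in priority order, so a transaction only fails if every queue is down at the same time.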
Managing and maintaining the core infrastructure for our tens of thousands of customers is not as flashy as a new feature, but it is the basis of the trust between us and our customers.