2017 Availability Report

Knock on wood, but it has been another great year for RunSignUp availability, as we hit a perfect 100% for the second year in a row! As we covered in February, when a major outage hit many sites on the Internet, including Slack, Trello and Venmo, RunSignUp's advanced infrastructure was able to weather the storm. And having...

Participant Reports in a Disaster

One of the early disaster recovery mechanisms we built into RunSignUp is saving a full list of your participants in a CSV file once a day starting the week before your race. We send race directors an email link when this is first created each year. In the event RunSignUp is not available (knock on...
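To make the schedule concrete, here is a minimal Python sketch of a daily participant export. It rests on a few assumptions: the window is taken to run from seven days before race day through race day itself, and the column names (`bib`, `first_name`, `last_name`, `email`) and function names are hypothetical, not RunSignUp's actual export format.

```python
import csv
import io
from datetime import date, timedelta
from typing import Iterable, Mapping, Optional

# Hypothetical: number of days before race day when the daily export starts.
EXPORT_WINDOW_DAYS = 7
# Hypothetical column set; a real export would likely carry many more fields.
FIELDNAMES = ["bib", "first_name", "last_name", "email"]


def in_export_window(today: date, race_date: date,
                     window_days: int = EXPORT_WINDOW_DAYS) -> bool:
    """True on every day from `window_days` before the race through race day."""
    return race_date - timedelta(days=window_days) <= today <= race_date


def participants_to_csv(participants: Iterable[Mapping[str, str]]) -> str:
    """Serialize participant records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(participants)
    return buf.getvalue()


def daily_export(today: date, race_date: date,
                 participants: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Meant to be run once a day by a scheduler.

    Returns CSV text while inside the export window, otherwise None.
    """
    if not in_export_window(today, race_date):
        return None
    return participants_to_csv(participants)
```

Because this sketch regenerates the file every day of the window, the newest copy a race director has downloaded is never more than about a day behind the live registration list.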

Technology is Hard

Amazon, the leading cloud provider, had some major issues today. The screenshot on the right shows their status page around 1 PM today, and RED is not good. Even USA Today and CNN reported the 4-5 hour outage this afternoon: "People reported outages and delays on services like Slack, Trello, Sprinklr, Venmo and even...

Sunday Afternoon Errors

UPDATE: We have made two changes to our system to try to improve this situation if it happens again. First, we have adjusted the settings on our New Relic monitoring tool to try to catch this type of error, so we can respond more quickly instead of waiting to hear about errors from users. Second, we have added...

Infrastructure Improvement Summary

We try to improve our infrastructure in Q1 of each year to make sure we do not build up technical debt and stay on top of the most current trends in technology. We have four goals when we look at this each year: Improve Availability - reduce the chance our systems will go down, and...

Updated Database Backups

As part of our big infrastructure upgrade, we have improved our processes for database backups, which supplement the AWS Aurora automated backups. The Aurora backups are good from a reliability point of view, with features like point-in-time recovery. However, the backup is only good for 35 days and the time to recover...
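The 35-day limit is the reason a separate backup process is worth having. Below is a minimal sketch of that decision. It assumes the maximum 35-day Aurora retention window (Aurora's automated backup retention period can be set from 1 to 35 days) and ignores the few minutes of lag before Aurora's latest restorable time. The function name and the returned labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Aurora's automated backup retention period tops out at 35 days.
AURORA_MAX_RETENTION = timedelta(days=35)


def restore_source(restore_to: datetime, now: datetime,
                   retention: timedelta = AURORA_MAX_RETENTION) -> str:
    """Pick where a restore to `restore_to` would have to come from.

    Hypothetical labels: "aurora-pitr" when the target is still inside the
    retention window, "supplemental-backup" when it is older than that.
    """
    if restore_to > now:
        raise ValueError("restore target is in the future")
    if now - restore_to <= retention:
        return "aurora-pitr"
    return "supplemental-backup"


# Example: a restore target 40 days back falls outside Aurora's window.
now = datetime(2018, 3, 1, tzinfo=timezone.utc)
print(restore_source(now - timedelta(days=40), now))
```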

Improved Background Jobs Availability

There are a number of processes that run on a scheduled basis in a big system like RunSignUp; these are called cron jobs. In most environments they run on a single server, and that is how we did things until today. As part of our infrastructure upgrade, we have added the ability for these...
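One common way to let several servers share scheduled jobs is a lease-style lock. Every server's scheduler fires, but only the server that acquires the lock actually runs the job. Below is a minimal sketch of that general technique. It is not necessarily what RunSignUp built. `LeaseStore` is a hypothetical in-memory stand-in for a store that every server can reach, such as a database table, and the 300-second lease is an arbitrary example value.

```python
import threading
import time
from typing import Callable, Dict, Optional


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared lock store.

    The mutex makes check-and-set atomic here; a real shared store would need
    an equivalent atomic conditional write.
    """

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._leases: Dict[str, float] = {}  # job name -> lease expiry time

    def try_acquire(self, name: str, ttl_seconds: float, now: float) -> bool:
        with self._mutex:
            expires = self._leases.get(name)
            if expires is not None and expires > now:
                return False  # another server holds an unexpired lease
            self._leases[name] = now + ttl_seconds
            return True


def run_once(store: LeaseStore, job_name: str, job: Callable[[], None],
             ttl_seconds: float = 300.0, now: Optional[float] = None) -> bool:
    """Run `job` only if this caller wins the lease; returns True if it ran."""
    now = time.time() if now is None else now
    if not store.try_acquire(job_name, ttl_seconds, now):
        return False
    job()
    return True
```

The lease expiry is the key design choice. If the server holding the lock dies mid-run, another server can pick the job up once the lease lapses, instead of the job silently never running again. The trade-off is that a job running longer than its lease could overlap with a second copy, so the lease should comfortably exceed the job's normal runtime.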
