Recently, we experienced a system failure that affected the processing and delivery of our MP3 files. All content from July 25th until the present has been restored. We are starting to process the older recordings today and they should be completely restored soon. The failure has prompted us to improve the way we process, deliver, and back up recordings. Here is what we are in the process of doing:

  1. We are moving all of the recordings to an external direct-attached storage (DAS) unit. This unit has built-in fault tolerance, is designed for heavy read/write access, and can be attached to any computer on our system. If the computer hosting the DAS unit fails, we can move the unit to another computer within minutes.
  2. While we do back up all recorded content in triplicate, we’ve discovered that our restoration process is very slow. We are in the process of implementing a new backup procedure. One of the three copies will be kept on a “hot” backup unit. If the original unit fails for any reason and cannot be restored, we can move the hot backup unit into production within an hour.
  3. We are changing the way we process our recorded content. Currently, we process all content serially. In other words, if 20 files are waiting to be processed, all 20 files must be processed before a single one gets posted to the internet. Early next week, we will have a new process in place that will process files in parallel. Smaller files will be processed very quickly without having to wait behind large files that take a long time to process.
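
For readers curious about the difference, here is a minimal sketch of the parallel approach described in item 3. It is an illustration only, not our production code: the function names, the file names and sizes, and the use of a Python thread pool are all assumptions, and the "processing" step is simulated with a delay proportional to file size.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_recording(name, size_mb):
    # Hypothetical stand-in for the real processing step:
    # the simulated work time grows with the file size.
    time.sleep(size_mb * 0.01)
    return name


def process_in_parallel(recordings, workers=4):
    """Process recordings concurrently; return names in the order they finish."""
    posted = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_recording, name, size_mb)
            for name, size_mb in recordings
        ]
        # Each file is "posted" as soon as it is done, instead of
        # waiting for every other file in the queue to finish.
        for future in as_completed(futures):
            posted.append(future.result())
    return posted


# Illustrative queue: a large file is submitted ahead of smaller ones.
recordings = [("large.mp3", 50), ("small.mp3", 1), ("medium.mp3", 10)]
posted_order = process_in_parallel(recordings)
print(posted_order)
```

In this sketch the small file is posted first even though the large file entered the queue ahead of it, which is the behavior we are aiming for.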

With these changes, if we experience the same system failure, we will be back online again within minutes, with ALL content restored.


