And from the ashes it shall rise
Three hundred miles west, as the sun rose over a drowsy Paris, phones began chirping with the first of many pings that day. National media were all over the story, but it was unclear whether the average web user fully grasped its implications.
In the Lifen offices, engineering staff, who had been up since daybreak when they learned what had happened, watched as VPNs went dark, host indicators disappeared, and memory alerts flashed left and right. One of OVHcloud’s main datacenter units had been destroyed and another irreversibly damaged, effectively wiping out most of the Ramsay Santé data managed by Lifen.
A datacenter fire is the stuff nightmares are made of, but it belongs to a set of extremely unlikely disasters that few can predict. Lifen’s engineers had nonetheless crafted a solid recovery plan, in line with the company’s local and international ISO certification requirements, in case a sudden, total loss of a host ever occurred.
Within just a few hours, Ramsay Santé received not only the bad news but also the reassurance that a full recovery plan had been deployed. The plan was reviewed and signed off by Ramsay Santé’s IT decision-makers, and Lifen had the green light to begin raising the phoenix from the ashes: data had to be recovered, verified, and then hosted on brand-new infrastructure. In essence, the process is straightforward, but it is not without its obstacles and pitfalls.
The first step in any data recovery operation is, logically, to get elbow-deep in the backups. If you are lucky, and if your customer is serious about their recovery protocols, half of the work is already done. Fortunately for Ramsay Santé, Lifen had set up backups precisely so that its data could be restored quickly. Lifen was therefore able to start restoring from them right away, with minimal manual coding.
In fact, industrialized Infrastructure-as-Code (IaC), that is, managing and provisioning infrastructure through code rather than through manual processes, was pivotal in helping the team produce an almost exact replica of the original infrastructure in just a few hours. The rebuilt infrastructure was then deployed on a new host, AWS. This was no arbitrary choice: as part of its disaster recovery protocol, Lifen had designated AWS as the fallback host for the affected customer environments, much as a pilot keeps track of nearby landing strips in case an emergency landing becomes necessary.
Once the applications were back online in the new environment, the backup restored, and the VPN bridges rebuilt (and oh, had they been burnt!), the whole system had to be checked. This step cannot be skipped: worse than losing all your data is losing only part of it without knowing which part. What would this mean in a clinical setting? Not knowing whether the most recent X-ray records available are actually up to date, or whether decisive developments in a patient’s pneumonia have been recorded since. Sometimes, a thorough check is a matter of life or death.
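To make the "which part is missing?" question concrete, here is a minimal sketch of a post-restore integrity check. It assumes the backup came with a manifest mapping each record ID to a SHA-256 checksum taken at backup time; the record IDs, the manifest format, and the `verify_restore` function are hypothetical illustrations, not Lifen’s actual tooling.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a record's raw content."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class VerificationReport:
    missing: list[str] = field(default_factory=list)     # in the manifest, absent after restore
    corrupted: list[str] = field(default_factory=list)   # present, but content differs
    unexpected: list[str] = field(default_factory=list)  # restored, but not in the manifest

    @property
    def ok(self) -> bool:
        return not (self.missing or self.corrupted or self.unexpected)


def verify_restore(manifest: dict[str, str], restored: dict[str, bytes]) -> VerificationReport:
    """Compare restored records against the checksums recorded at backup time.

    manifest: hypothetical mapping record_id -> SHA-256 hex digest, written with the backup
    restored: mapping record_id -> raw bytes read back from the new environment
    """
    report = VerificationReport()
    for record_id, expected in sorted(manifest.items()):
        if record_id not in restored:
            report.missing.append(record_id)
        elif checksum(restored[record_id]) != expected:
            report.corrupted.append(record_id)
    report.unexpected = sorted(set(restored) - set(manifest))
    return report


if __name__ == "__main__":
    # Made-up records, for illustration only.
    manifest = {
        "xray-001": checksum(b"chest x-ray, patient A"),
        "xray-002": checksum(b"chest x-ray, patient B"),
        "note-017": checksum(b"pneumonia follow-up, patient A"),
    }
    restored = {
        "xray-001": b"chest x-ray, patient A",
        "note-017": b"pneumonia follow-up, pat",  # truncated during restore
    }
    report = verify_restore(manifest, restored)
    print("missing:", report.missing)      # missing: ['xray-002']
    print("corrupted:", report.corrupted)  # corrupted: ['note-017']
```

The point of the design is that the report names the affected records, so a partial loss is known and bounded rather than silent.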
In the end, Ramsay Santé had its Lifen OS data and full system back up and running. The whole process took exactly two working days, with a whopping 95% of data recovered within that time frame.
Lifen’s ability to recover data at such a pace, literally from the ashes, comes down to the golden triad of disaster recovery: solid incident response protocols, automation of every task that could be automated, and a team of experienced engineers who had been thoroughly trained to respond to such scenarios.
Add to that a genuine commitment to customer happiness, as well as a broad sense of responsibility towards the bigger picture, a value that Lifen puts at the center of its human resources policy. In French, it is simply called Bienveillance (roughly, "goodwill").
But Bienveillance is more than a fancy French word: it can also translate into a very tangible contractual commitment between Lifen and its customers that designated service goals will be met. This is called a Service Level Agreement (SLA), and Lifen proudly wears its 99.9% badge of recorded spotless service. In this context, spotless service means no service disruptions: if the system is down for more than about 44 minutes in a month, Lifen, as the provider, can incur financial penalties, such as crediting the customer.
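The 44-minute figure follows from simple arithmetic: a 99.9% availability target leaves 0.1% of the measurement period as allowed downtime. The sketch below assumes a monthly window based on an average month of 365.25 / 12 ≈ 30.44 days; real SLA contracts define their own windows and exclusions, so the function name and the window are illustrative assumptions.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Maximum downtime, in minutes, allowed by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    period_minutes = period_days * 24 * 60
    return (1.0 - availability) * period_minutes


# Assumption: an "average" month, as some SLAs use; others use calendar months.
AVERAGE_MONTH_DAYS = 365.25 / 12  # about 30.44 days

if __name__ == "__main__":
    monthly = downtime_budget_minutes(0.999, AVERAGE_MONTH_DAYS)
    yearly = downtime_budget_minutes(0.999, 365.25)
    print(f"99.9% over a month: {monthly:.1f} min")   # 99.9% over a month: 43.8 min
    print(f"99.9% over a year: {yearly / 60:.1f} h")  # 99.9% over a year: 8.8 h
```

Rounded, 43.8 minutes is the "about 44 minutes" per month quoted above; over a 30-day month the budget shrinks to 43.2 minutes.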
Datacenter fires are not an everyday occurrence, but they are far from the only threat to your data. Other threats are garden-variety issues: think ransomware, hosting provider outages… or simply good old human error. Lifen’s CTO Dali Kilani offers a few words of advice for healthcare organizations concerned about preserving the integrity of their data under all circumstances:
Double-check your backups
Start by carefully surveying the scenarios in which you might need to restore from a backup. Then set specific parameters, such as a recovery time objective (RTO) and a maximum admissible data loss, often expressed as a recovery point objective (RPO). Only then can you plan the backups you actually need. And that is not the end of the line: monitor backup operations, and regularly validate backup files by restoring them with your predefined recovery procedures.
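As an illustration of the monitoring step, here is a minimal sketch that raises alerts when the newest successful backup is older than the maximum admissible data loss (the RPO), when a restore test is overdue or failed verification, or when the last restore took longer than the RTO. The `BackupRun` and `RestoreTest` records, the 30-day test interval, and the thresholds are hypothetical; in practice these values would come from your backup tool’s logs and your own recovery plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BackupRun:
    finished_at: datetime
    succeeded: bool


@dataclass
class RestoreTest:
    performed_at: datetime
    duration: timedelta   # how long the full restore procedure took
    data_verified: bool   # did the restored data pass integrity checks?


def backup_alerts(
    runs: list[BackupRun],
    last_test: Optional[RestoreTest],
    rpo: timedelta,
    rto: timedelta,
    now: datetime,
    test_every: timedelta = timedelta(days=30),  # assumed restore-test cadence
) -> list[str]:
    """Return human-readable alerts; an empty list means all checks passed."""
    alerts = []
    successful = [r.finished_at for r in runs if r.succeeded]
    if not successful:
        alerts.append("no successful backup on record")
    elif now - max(successful) > rpo:
        age = now - max(successful)
        alerts.append(f"latest successful backup is {age} old, exceeding the RPO of {rpo}")
    if last_test is None or now - last_test.performed_at > test_every:
        alerts.append("restore test overdue")
    elif not last_test.data_verified:
        alerts.append("last restore test failed data verification")
    elif last_test.duration > rto:
        alerts.append(f"last restore took {last_test.duration}, exceeding the RTO of {rto}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    runs = [
        BackupRun(now - timedelta(hours=30), succeeded=True),
        BackupRun(now - timedelta(hours=6), succeeded=False),  # the recent run failed
    ]
    test = RestoreTest(now - timedelta(days=10), timedelta(hours=5), data_verified=True)
    for alert in backup_alerts(runs, test, rpo=timedelta(hours=24), rto=timedelta(hours=4), now=now):
        print(alert)
```

In this example both the RPO check (the last good backup is 30 hours old) and the RTO check (a 5-hour restore against a 4-hour objective) fire, which is exactly the kind of silent drift that only regular monitoring and restore tests reveal.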
Triple-check that your providers have recovery protocols
Not all providers are alike, and some have more experience than others when it comes to worst-case scenarios. During the audit process, make sure to ask how long they believe it would take to recover and rebuild an infrastructure should an unexpected event occur.
Automate your infrastructure
Solutions such as industrialized Infrastructure-as-Code can save the day if your system goes offline because of a host failure. When your entire architecture is described in code, it can easily be reproduced with minimal human intervention, which limits human error and saves your team precious time.
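The core idea behind Infrastructure-as-Code can be sketched in a few lines: the architecture is declared as data, and a tool computes the actions needed to make any environment match that declaration. Real IaC tools (Terraform, for example) work on this principle with far more sophistication; the `Resource` type, the resource names, and the `plan` and `apply` functions below are hypothetical and only illustrate why a declared architecture can be rebuilt on a fresh host with little manual work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str   # e.g. "vpn_gateway", "database", "app_server"
    name: str
    settings: tuple[tuple[str, str], ...] = ()  # immutable key/value pairs


# Hypothetical declared architecture: the single source of truth, kept in version control.
DESIRED = {
    Resource("vpn_gateway", "hospital-bridge", (("peer", "10.0.0.1"),)),
    Resource("database", "records-db", (("engine", "postgres"), ("size", "large"))),
    Resource("app_server", "api", (("replicas", "3"),)),
}


def plan(desired: set[Resource], current: set[Resource]) -> list[tuple[str, Resource]]:
    """Compute the create/update/delete actions that turn `current` into `desired`."""
    desired_by_key = {(r.kind, r.name): r for r in desired}
    current_by_key = {(r.kind, r.name): r for r in current}
    actions = []
    for key in sorted(desired_by_key):
        if key not in current_by_key:
            actions.append(("create", desired_by_key[key]))
        elif current_by_key[key] != desired_by_key[key]:
            actions.append(("update", desired_by_key[key]))
    for key in sorted(set(current_by_key) - set(desired_by_key)):
        actions.append(("delete", current_by_key[key]))
    return actions


def apply(actions: list[tuple[str, Resource]], current: set[Resource]) -> set[Resource]:
    """Simulate applying a plan; a real tool would call a cloud provider's API here."""
    state = {(r.kind, r.name): r for r in current}
    for action, resource in actions:
        key = (resource.kind, resource.name)
        if action == "delete":
            del state[key]
        else:
            state[key] = resource
    return set(state.values())


if __name__ == "__main__":
    # On a brand-new host nothing exists yet, so every declared resource is created.
    fresh: set[Resource] = set()
    actions = plan(DESIRED, fresh)
    for action, resource in actions:
        print(action, resource.kind, resource.name)
    rebuilt = apply(actions, fresh)
    # Idempotence: planning again against the rebuilt environment yields no further actions.
    print("remaining actions:", plan(DESIRED, rebuilt))
```

Because the same declaration drives both the original environment and the replacement, rebuilding elsewhere becomes a matter of running the plan against an empty target rather than reconstructing the setup by hand.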
Service Level Agreements
When choosing a new provider, consider requiring this type of commitment, and check the provider’s historical record of meeting its service goals. Think of an SLA as a guarantee that you will not be dealing with frequent disruptions and that, if things do occasionally go south, you can claim adequate compensation. The value of such agreements shows just how much the right SaaS partner can make a difference.
What’s worse than a fire? A fire roaring outside while you stand in a room full of smoke, in the dark. This is exactly what it can feel like on the customer’s end, and believe us, it’s just as nerve-wracking as it sounds. But it doesn’t have to be. If you’re a provider, communicate clearly with your customers and keep them updated on the recovery status of their data at all times. Be realistic in your estimates and transparent about roadblocks. It may be useful to include a designated Recovery Officer in your protocols and to prepare them to act as your main spokesperson and point of contact when incidents of this sort happen. If you’re a healthcare establishment, make sure to request accurate ETAs, as well as thorough updates on the integrity of your recovered data.
Remember: lives depend on it.