Modern digital and cloud technologies underpin the shift that enables companies to implement new processes, scale them quickly and serve customers in entirely new ways.
Historically, organizations would invest in their own IT infrastructure to support their business objectives, and the role of the IT department would be to keep the "lights on".
To minimize the chance of equipment failure, engineers traditionally introduced an element of redundancy into the architecture. That redundancy can manifest itself on many levels. For example, it can be a redundant data center that is kept as a 'hot' or 'warm' site, with a complete set of hardware and software ready to take on the workload if the primary data center fails. Components of the data center, such as power and cooling, may also be redundant to increase resilience.
On a smaller scale, network infrastructure elements within a single data center may also be redundant. It is not unusual to purchase two firewalls instead of one, either to balance the load between them or to keep the second as a backup. Power and utility companies still stock spare critical industrial control equipment so that they can respond quickly to a failed component.
Traditional data protection
The majority of effort, however, went into protecting data storage. Magnetic disks were assembled into RAID arrays to reduce the risk of data loss in the event of a failure, and backups were relegated to magnetic tape, which preserved less time-sensitive data and was stored in separate physical locations.
Depending on specific business objectives or compliance requirements, organizations had to invest heavily in these architectures. However, upfront investment was only one side of the story. Ongoing maintenance, regular testing and periodic upgrades were also required to keep these components operational. Labor, electricity, insurance and other costs added to the final bill.
In addition, if a company operated in a regulated area, for example processing payments and cardholder data, external audits and certification were also required.
Ensuring resilience with the advent of cloud computing
With the advent of cloud computing, companies were able to abstract away much of this complexity and let someone else handle the construction and operation of data centers, along with the associated physical security compliance issues. However, companies' need for resilience has not disappeared.
Cloud providers can offer options that far surpass traditional infrastructure (at comparable costs), but only if they are configured correctly.
An example of this is the use of 'availability zones', where your resources can be deployed in physically separate data centers. In this scenario, your service can be load-balanced across these availability zones and can continue to work even if one of the zones goes down. The capital investment needed to achieve such functionality is much larger if you want to build your own infrastructure for it. In essence, you would have to build two or more data centers. You had better have a solid business case for that.
However, additional resiliency in the cloud is only achieved if you design your solutions well: running your service in a single zone or, worse, on a single virtual server can be less resilient than running it on a physical machine.
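The gap between a single-zone and a multi-zone deployment can be estimated with a back-of-the-envelope calculation. The following Python sketch assumes that zones fail independently and that one healthy zone is enough to serve traffic; the function name and the 99.5% per-zone figure are illustrative assumptions, not numbers published by any provider.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Availability of a service that stays up while at least one zone is up.

    Assumes zone failures are independent, which real outages do not
    always respect (shared control planes, regional events).
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only when every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones


if __name__ == "__main__":
    # 0.995 is a made-up per-zone availability used only for illustration.
    for n in (1, 2, 3):
        print(f"{n} zone(s): {combined_availability(0.995, n):.4%}")
```

Because the independence assumption rarely holds perfectly, treat the result as an upper bound rather than a forecast.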
It is important to keep this in mind when you decide to move to the cloud from traditional infrastructure. Simply lifting and shifting your applications to the cloud can actually reduce resilience. These applications are unlikely to have been developed to work in the cloud and to take advantage of these additional resilience options. That is why I advise against such a migration in favor of re-architecting.
The SLAs of cloud service providers should also be considered. Compensation may be offered when an SLA is not met, but it is your job to examine how the committed availability compares with the traditional "five nines" of a conventional data center, and how the service credits paid as compensation compare with the business losses caused by a lack of availability.
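Both comparisons are simple arithmetic. The Python sketch below converts an availability target into allowed downtime per year and sets a service credit against the business loss from an outage. The function names and every monetary figure are hypothetical; real SLAs define their own measurement periods and credit tiers.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years for simplicity


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum yearly downtime, in minutes, that still meets the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def uncovered_loss(outage_hours: float, loss_per_hour: float,
                   monthly_fee: float, credit_percent: float) -> float:
    """Business loss left over after the provider's service credit."""
    business_loss = outage_hours * loss_per_hour
    service_credit = monthly_fee * credit_percent / 100
    return business_loss - service_credit


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% allows about {minutes:.1f} minutes of downtime a year")
    # Hypothetical figures: a 4-hour outage costing 10,000 per hour,
    # against a 10% credit on a 5,000 monthly fee.
    print("Loss not covered by credits:", uncovered_loss(4, 10_000, 5_000, 10))
```

Five nines leaves a little over five minutes of downtime a year, and a credit calculated on the service fee is usually small next to the revenue lost during an outage.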
Cloud Service models
You also have to take into account the many differences between cloud service models.
For example, when you purchase SaaS, your ability to manage resilience is significantly reduced. In this case you rely entirely on your provider to keep the service running, which makes the provider a potential single point of failure. In this scenario, archiving and regular data exports may be your only options, apart from reviewing the SLAs and accepting the residual risk. However, even with the data in hand, your options are limited without a second application available to process it, and that may also require data transformation. Study the provider's track record and choose your SaaS provider carefully.
IaaS gives you more options to design an architecture for your application, but with this great freedom comes great responsibility. The provider is responsible for fewer layers of the overall stack when it comes to IaaS, so you have to design and maintain a lot yourself. Assume failure rather than treating it as a (remote) possibility. Availability zones are useful, but not always sufficient.
In which scenarios should the use of a separate geographic region be taken into account? Are there scenarios or requirements that justify a need for a second cloud service provider? The recommendations of the European Banking Authority on Exit and Continuity can be an interesting example to look at from a testing and deliverability perspective.
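At the application level, designing for failure often means never depending on a single endpoint. The Python sketch below tries a list of endpoints in order, whether they sit in different zones, regions or providers. The function name and the endpoint labels are hypothetical, and the `request` callable is a stand-in for whatever client your application actually uses.

```python
from typing import Callable, Sequence


def call_with_failover(endpoints: Sequence[str],
                       request: Callable[[str], str]) -> str:
    """Try each endpoint in order and return the first successful response."""
    errors = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except Exception as exc:  # real code should catch specific errors
            errors.append(f"{endpoint}: {exc}")
    # Every endpoint failed, so surface all collected errors to the caller.
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_request(endpoint: str) -> str:
        # Simulates an outage in the first (hypothetical) zone.
        if endpoint == "zone-a":
            raise ConnectionError("zone unavailable")
        return f"response from {endpoint}"

    print(call_with_failover(["zone-a", "zone-b"], fake_request))
```

A production version would add timeouts, health checks and backoff, but the principle is the same: the failure path is part of the design, not an afterthought.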
Finally, as always, PaaS sits somewhere between SaaS and IaaS. I find that it often depends on the particular platform: some offer options that you can adjust when it comes to resilience, while others retain complete control. Bear in mind that many of the SaaS considerations also apply to PaaS from a redundancy perspective. For example, if you use a proprietary PaaS, you cannot simply lift and shift your data and code.
Above all, when designing for resilience, take a risk-based approach. Not all of your assets have the same criticality. Understand the priorities, and know your RPO and RTO. Remember that a SaaS product can itself be built on top of AWS or Azure, which exposes you to supply chain risk.
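One way to make RPO and RTO concrete is to record them per asset and compare them with what the current setup actually delivers. The Python sketch below is a minimal illustration; the `Asset` fields, the asset names and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    rpo_minutes: int              # maximum tolerable data loss
    rto_minutes: int              # maximum tolerable time to restore
    backup_interval_minutes: int  # how often data is actually backed up
    tested_restore_minutes: int   # restore time measured in the last test


def resilience_gaps(asset: Asset) -> list[str]:
    """Return the objectives the current setup fails to meet."""
    gaps = []
    # In the worst case, everything since the last backup is lost.
    if asset.backup_interval_minutes > asset.rpo_minutes:
        gaps.append("RPO")
    if asset.tested_restore_minutes > asset.rto_minutes:
        gaps.append("RTO")
    return gaps


if __name__ == "__main__":
    assets = [
        Asset("payments", rpo_minutes=15, rto_minutes=60,
              backup_interval_minutes=60, tested_restore_minutes=45),
        Asset("internal wiki", rpo_minutes=1440, rto_minutes=2880,
              backup_interval_minutes=1440, tested_restore_minutes=240),
    ]
    for asset in assets:
        print(asset.name, resilience_gaps(asset) or "meets objectives")
```

Sorting the resulting gaps by asset criticality gives a simple, risk-ordered list of where to spend the resilience budget first.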
Even if you assume the worst, you may not have to keep every single service running if the worst really happens. To begin with, it would be too expensive; ask your business stakeholders. The worst time to define your resilience approach is in the middle of an incident, closely followed by right after one. As with other elements of security in the cloud, resilience must "shift left" and be addressed as early as possible in the delivery cycle.
As the boy scout movement likes to say: be prepared.
About the author: Leron Zinatullin (@le_rond) is an experienced risk consultant specializing in cyber security strategy, management and delivery. He has led large-scale, global, high-quality security transformation projects to improve cost performance and support business strategy. He has extensive knowledge and practical experience in solving information security, privacy and architecture issues in various industry sectors. Visit Leron's blog at https://zinatullin.com/. For more information about the psychology behind information security, read Leron's book, The Psychology of Information Security.