Identity is the Achilles' heel of the cloud

In recent years we have seen outages at large cloud platforms make headlines and leave millions of users staring at a blank screen. Service failures at leading providers have paralyzed online stores, delivery apps and even critical business systems for minutes or hours. For a user, the inconvenience may be not being able to order food or watch a series; for an airline or a bank, it translates into lost revenue, halted processes and reputational damage that can last for weeks.

Behind these outages there is usually more than the machine that serves a website: often the real victim is shared infrastructure, and within that category identity is particularly vulnerable. Modern authentication and authorization architectures rely on a web of services and dependencies (user databases, policy engines, load balancers, DNS and cloud control planes) which, if they fail, prevent any application from verifying and authorizing requests, even if the identity provider itself remains operational.

Providers publish their incidents on status dashboards (for example, AWS at status.aws.amazon.com, Azure at status.azure.com and Cloudflare at cloudflarestatus.com), and their postmortems often confirm that these failures arise from shared dependencies or from changes in service chains. Understanding this interdependence is key: it is not enough for the identity service to be deployed in several regions if all of those deployments use the same managed database, the same DNS provider or the same cloud control plane.
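To make the point about shared dependencies concrete, here is a minimal Python sketch. It assumes you keep an inventory of what each regional deployment of the identity service relies on. The region names and dependency labels are hypothetical and chosen only for illustration.

```python
# Hypothetical inventory: each regional deployment of the identity service
# and the infrastructure it depends on. All names are illustrative.
DEPLOYMENTS = {
    "eu-west": {"managed-db-global", "dns-provider-a", "control-plane-x", "lb-eu"},
    "us-east": {"managed-db-global", "dns-provider-a", "control-plane-x", "lb-us"},
    "ap-south": {"managed-db-global", "dns-provider-a", "control-plane-x", "lb-ap"},
}


def shared_dependencies(deployments):
    """Return the dependencies that every deployment has in common.

    Anything in the result is a common failure domain: if it goes down,
    all regions are affected at once, however many replicas exist.
    """
    if not deployments:
        return set()
    return set.intersection(*deployments.values())


if __name__ == "__main__":
    for dependency in sorted(shared_dependencies(DEPLOYMENTS)):
        print(f"shared by all regions: {dependency}")
```

In this invented inventory only the load balancers are truly regional; the database, the DNS provider and the control plane are shared by every replica.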

Identity is not a matter of "log in and you're done": it is a continuous gatekeeper. Modern security models such as Zero Trust, documented by NIST in SP 800-207, are based on the premise of always verifying. This means that applications, APIs and services request credentials, tokens and authorization decisions constantly. When the subsystem that issues or validates these tokens stops responding, a large part of the platform stops with it: microservices that call each other, third-party integrations and internal automations are left unable to prove identity or permissions.

In addition, real authentication is more complex than it seems at first sight. A typical authentication event may require reading user attributes from directories, building session state, issuing tokens with claims and scopes, and perhaps consulting policy engines for fine-grained decisions. These distributed operations depend on infrastructure that, if degraded, blocks the whole flow. As a result, a failure in an apparently minor component can turn out to be a single point of failure in the middle of a crisis.

Traditional high-availability designs do not always solve this problem. Many architectures rely on regional replication or failover between zones. That works against isolated regional failures, but it offers no protection when the root cause lies in shared global services, such as a third-party managed DNS, a cloud control plane or a global database service, which affect all replicas alike. Real resilience requires looking beyond replication: it involves understanding and reducing common failure domains.

Designing robust identity systems requires intent. Some organizations choose multi-cloud strategies or maintain controlled on-premises alternatives for the most critical parts of the identity flow. Others implement degraded modes that allow restricted access using cached attributes or precomputed authorization decisions, so that the essentials of the operation continue, even with reduced functionality. Not every piece of identity data requires the same availability guarantee, and deciding what can be degraded should be a decision informed by business risk, not by architectural convenience.
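A degraded mode of this kind could look roughly like the following Python sketch. It is one possible shape under stated assumptions, not a reference implementation: the `DegradedAuthorizer` class, the `PolicyEngineUnavailable` exception, the 15-minute staleness limit and the choice of "read" as the only degradable action are all hypothetical.

```python
import time


class PolicyEngineUnavailable(Exception):
    """Raised by the (hypothetical) policy engine client when it cannot answer."""


class DegradedAuthorizer:
    """Ask the live policy engine first; fall back to cached decisions on outage."""

    def __init__(self, decide, max_stale_seconds=900,
                 degradable_actions=frozenset({"read"}), clock=time.monotonic):
        self._decide = decide                  # callable(user, action) -> bool
        self._max_stale = max_stale_seconds    # how old a cached decision may be
        self._degradable = degradable_actions  # actions allowed in degraded mode
        self._clock = clock
        self._cache = {}                       # (user, action) -> (allowed, stored_at)

    def is_allowed(self, user, action):
        key = (user, action)
        try:
            allowed = self._decide(user, action)
        except PolicyEngineUnavailable:
            return self._fallback(key, action)
        # Remember every live decision so it can be reused during an outage.
        self._cache[key] = (allowed, self._clock())
        return allowed

    def _fallback(self, key, action):
        # Fail closed for anything the business has not marked as degradable.
        if action not in self._degradable:
            return False
        cached = self._cache.get(key)
        if cached is None:
            return False
        allowed, stored_at = cached
        # Refuse decisions that are older than the agreed staleness limit.
        if self._clock() - stored_at > self._max_stale:
            return False
        return allowed
```

The sketch fails closed on purpose: during an outage it only replays recent decisions for actions explicitly marked as degradable, which is where the business-risk judgment described above would be encoded.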

Planning how a system will fail is as important as making sure it works under normal conditions. Response to identity incidents should be a priority and should be integrated into business continuity plans: specific monitoring of dependencies, alerts that cross domains and simulation exercises that test scenarios in which token issuance or authorization validation is compromised. Ceasing to treat the unavailability of identity as a secondary problem is a necessary cultural and operational change.
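A simulation exercise can start on paper before anything is switched off. The Python sketch below assumes a hand-maintained map from identity-flow steps to the dependencies each one needs, and reports which steps would stop if a given set of dependencies failed. The step names and dependency labels are hypothetical.

```python
# Hypothetical map of identity-flow steps to the dependencies they need.
FLOW_DEPENDENCIES = {
    "token_issuance": {"user-directory", "signing-keys", "dns"},
    "token_validation": {"signing-keys"},
    "authorization": {"policy-engine", "dns"},
}


def simulate_outage(failed, flow=FLOW_DEPENDENCIES):
    """Return, sorted by name, the flow steps that need any failed dependency."""
    failed = set(failed)
    return sorted(step for step, deps in flow.items() if deps & failed)


if __name__ == "__main__":
    for scenario in ({"dns"}, {"signing-keys"}, {"policy-engine"}):
        print(sorted(scenario), "->", simulate_outage(scenario))
```

A table like this is only as good as the inventory behind it, so it complements real exercises against the running system and does not replace them.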

If you are looking for further reading, in addition to the NIST document on Zero Trust, it is worth reviewing public documents on shared responsibility models in the cloud, such as the AWS Shared Responsibility Model, and the official postmortems that often reveal the real causes behind major incidents. To better understand the differences between authentication and authorization there are well-explained technical resources, for example from Curity, which help to separate the concepts and identify which parts of the flow are critical.

In short, the cloud offers scalability and agility, but it also concentrates dependencies that can become systemic risks. Identity is the axis around which the security and availability of services revolve; it therefore deserves a design of its own, with alternatives and degraded modes intended to protect the business when infrastructure fails. Decisions about where to locate directories, which services to replicate outside common failure domains and how to allow minimal operations during a failure should be made on risk criteria and practiced through real exercises. Only then can you avoid discovering, in the middle of an outage, that something as essential as identity verification is actually a single point of failure.
