Securing the next wave of workload identities in the cloud - Source: www.csoonline.com

Your cloud’s biggest threat is the ghost army of machine IDs you forgot, making a zero-trust strategy for every workload absolutely non-negotiable.

It was a moment of realization for our mid-size law team. As we were constructing a new cloud-native analytics pipeline that spanned our private data center and public clouds, we found ourselves distributing API keys and identity and access management (IAM) roles with abandon. Initially, it seemed like a swift and efficient approach — we were making rapid progress. Little did we know this approach would lead to unforeseen security issues.

However, it soon became a pressing issue. Every new microservice or function had its own identity. We named them things like “svc-backend-prod” or “data-worker-staging,” with minor differences for dev or test. We had scripts rotating our core secrets every week, but nothing for these new accounts. One day, our monitoring flashed a warning: an old service account token was used in two different clouds. My heart sank. It turned out that a developer had copied a key that should have been retired long ago into a forgotten script.

We scrambled to inventory all the machine identities we had. It looked like dozens of shadow accounts hiding in every cloud and cluster. No wonder misconfigurations and excessive privileges had crept in. Our agile dev process had inadvertently set the stage for a breach.

According to one report cited in a VentureBeat analysis, many enterprises are unaware of how many machine IDs they own: the study found “45 times more machine identities than human ones,” most of which go untracked. In our case, I estimate we had hundreds of these identities, far more than we realized.

Cloud identity sprawl in the multi-cloud era

This is the new battleground in cloud security. While we often hear about threats like phishing or ransomware, a more insidious risk is on the rise — machine identities. In a multi-cloud environment, the number of credentials for each microservice, virtual machine (VM) or serverless function can quickly spiral out of control. We found ourselves managing half a dozen IAM systems without a unified view of them. Roles like “etl-service” in one cloud were performing the same function as “etl-worker” in another, and we were struggling to keep track of the duplicates.

It was easy to make mistakes. In our rush to deliver, we gave many service accounts broad admin rights, planning to narrow them down later. The statistics are clear: In its 2024 Top Threats report, the Cloud Security Alliance ranked IAM as the number one concern. That includes human and machine accounts. In practice, a stolen or misused machine identity lets an attacker move laterally — after all, workloads are supposed to trust each other.

Our near miss proved the danger. In one incident, a background data job accidentally used an administrative key to access production databases. We only caught it because our central logging flagged it. Imagine an attacker discovering the same key in a public repository or intercepting it on the network. The damage could have been far worse.

The root cause was apparent: traditional security models were breaking down. The Cloud Security Alliance notes that serverless functions, ephemeral containers and microservices render static policies ineffective. We saw this firsthand. A container might start with a token and be gone an hour later, leaving almost no footprint unless you continuously audit. We knew we had to treat every workload as untrusted and verify it constantly.

That meant extending zero trust to every machine. We moved to mutual TLS and short-lived tokens for service-to-service calls. For example, our containers in Kubernetes now use OIDC tokens from our identity provider instead of embedded keys. These tokens expire after minutes. This approach aligns with frameworks like SPIFFE, which emphasizes short-lived identities to reduce the likelihood of a breach through credential compromise.
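To make the short-lived token idea concrete, here is a minimal sketch using only the Python standard library. It is not a real OIDC or SPIFFE implementation: the HMAC-signed token format and the names `issue_token` and `verify_token` are assumptions made for illustration. It shows only the core property, that a credential carries its own expiry and is rejected once that time has passed.

```python
import base64
import hashlib
import hmac
import json


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(workload: str, key: bytes, now: float, ttl_seconds: int = 300) -> str:
    """Issue a signed token for a workload that expires ttl_seconds after `now`."""
    payload = json.dumps({"sub": workload, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(token: str, key: bytes, now: float) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    payload_b64, signature_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    signature = base64.urlsafe_b64decode(signature_b64)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    claims = json.loads(payload)
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A token issued with a five-minute lifetime verifies while it is fresh and raises `ValueError` afterwards, so a copy leaked into an old script stops working on its own.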

In short, no perpetual keys sit on instances anymore. We also enforced least privilege rigorously. A permissions audit revealed that many roles had too much authority, so we stripped away rights that weren’t strictly needed. If a data loader only needed read access, that’s all it keeps. We automated this via policy-as-code: any new service identity is created with minimal scopes by default. When higher privileges were needed, the service had to request them at run time (just-in-time access) rather than holding them permanently. Over time, we saw complaints about cross-cloud access drop dramatically.
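The two mechanisms in this paragraph, a minimal-scope default and a permissions audit, can be sketched in a few lines of Python. The scope names, the dictionary layout and the function names are hypothetical; a real setup would read granted and used permissions from each cloud's IAM and audit logs.

```python
# Hypothetical default: a new identity can only read unless scopes are requested.
DEFAULT_SCOPES = frozenset({"read"})


def new_identity(name: str, owner: str, scopes=None) -> dict:
    """Create an identity record with minimal scopes unless others are given."""
    granted = set(scopes) if scopes else set(DEFAULT_SCOPES)
    return {"name": name, "owner": owner, "scopes": granted}


def excess_permissions(granted, used) -> set:
    """Return permissions that were granted but never observed in use."""
    return set(granted) - set(used)
```

For example, `excess_permissions({"read", "write", "admin"}, {"read"})` returns the `write` and `admin` permissions, which are the candidates for removal.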

For visibility, we centralized identity events across clouds into our SIEM. We now alert on anomalies such as a container in a dev environment suddenly accessing a critical API. AI-driven analytics helps surface real threats among all the noise. Industry trends reflect this: cloud XDR solutions are evolving to give unified visibility across multi-cloud workloads.
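As an illustration of the kind of rule such alerting can start from, the sketch below flags two cases: an identity that is not in the inventory, and a non-production workload calling a critical API. The event fields, the `CRITICAL_APIS` set and the function name are assumptions for this example, not the schema of any particular SIEM.

```python
from dataclasses import dataclass

# Hypothetical list of APIs considered critical.
CRITICAL_APIS = {"payments.write", "prod-db.admin"}


@dataclass(frozen=True)
class IdentityEvent:
    identity: str
    environment: str
    api: str


def find_anomalies(events, known_identities):
    """Return (event, reason) pairs for events that break the two simple rules."""
    alerts = []
    for event in events:
        if event.identity not in known_identities:
            alerts.append((event, "unknown identity"))
        elif event.environment != "prod" and event.api in CRITICAL_APIS:
            alerts.append((event, "non-prod workload called critical API"))
    return alerts
```

Static rules like these are only a baseline; the analytics described above would sit on top of them to rank and de-duplicate alerts.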

Extending zero trust to workloads

Applying zero trust beyond just passwords is crucial. On the human side, MFA and conditional access are standard. For workloads, we implemented a similar approach using tokens, certificates and continuous checks. When one service calls another, it presents a cryptographic token or certificate, and the target service verifies it each time it is called. In effect, every microservice needs a current “badge swipe” to access resources.

One success story was our batch processing cluster. It previously relied on a service account with a static API key. We reworked it so that each node starts with a short-lived client certificate issued by our internal public key infrastructure (PKI). The servers only accept these short-lived certs. Since certificates are renewed daily, any stolen certificate quickly becomes useless. Plus, we log all certificate requests and approvals through our secure pipeline, so we always know which identities are active.

We also embraced “just-in-time” privileges. Maintenance scripts and admin jobs now request elevated roles via our CIEM-based workflow instead of running with full privileges by default. An identity gets a time-bound token when needed. After the job finishes, we revoke the elevated rights. This change significantly reduced the risk window for our most critical accounts.
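A minimal sketch of time-bound elevation follows. It does not model any specific CIEM product: the `Elevation` record, the `approved` flag standing in for an approval workflow, and the 15-minute default are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Elevation:
    identity: str
    role: str
    expires_at: float


def grant_elevation(identity: str, role: str, now: float,
                    ttl_seconds: int = 900, approved: bool = False) -> Elevation:
    """Issue a time-bound grant, but only after the request has been approved."""
    if not approved:
        raise PermissionError("elevation requires approval")
    return Elevation(identity, role, now + ttl_seconds)


def has_role(grants, identity: str, role: str, now: float) -> bool:
    """True only while a matching grant exists and has not yet expired."""
    return any(
        g.identity == identity and g.role == role and now < g.expires_at
        for g in grants
    )
```

Because every check compares against `expires_at`, the elevated role lapses even if an explicit revocation step is missed.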

The payoff has been substantial. We went from frantic firefighting to proactive control. Developers now consider whether they need a new identity at all; sometimes, a single well-scoped account serves multiple services. And we sleep better knowing that even if one token is leaked, it remains valid for only a short time.

A roadmap for securing workload identities

Based on our journey, here are the practical steps we took:

Take inventory and classify. Discover every non-human identity. We wrote scripts to pull lists of all service accounts, keys, certs and roles from each cloud and cluster. Then, we tagged them by team and purpose. You can’t secure what you can’t see.
Enforce the principle of least privilege. Audit each identity’s permissions and remove any excess permissions. We used automated tools to compare privileges to needs. Any divergence triggers an alert. This ensures no workload has more rights than necessary.
Use short-lived tokens. Replace static secrets with ephemeral credentials. For example, Kubernetes pods now authenticate to cloud services using short-lived OIDC/JWT tokens or X.509 certificates rather than long-lived keys, an approach the SPIFFE framework recommends. Credentials therefore expire automatically. We also automated the regular rotation of any remaining secrets.
Just-in-time access. Integrate a CIEM or vault for time-bound privilege elevation. Engineers request the needed rights through an approval flow, and an ephemeral token is issued. This reduces the number of standing privileged tokens across clouds.
Continuous monitoring. Feed machine identity activity into a cloud-aware XDR/SIEM. We monitor for anomalies, such as unknown workloads calling sensitive APIs. By using AI/ML to prioritize alerts, we can quickly identify and address any misuse of identities.
Policy-driven governance. Codify identity policies in code. Whenever a workload is retired, its identity is automatically revoked. We require all identities to have an owner and an expiration date. New identities are created through pull requests, which undergo peer review and approval.
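The inventory and governance steps lend themselves to an automated check. The sketch below assumes the inventory has already been exported from each cloud into a list of dictionaries; the field names (`name`, `owner`, `expires`) and the function name are hypothetical. It flags identities that lack an owner, lack an expiration date, or are past expiry.

```python
from datetime import date


def governance_violations(identities, today: date) -> dict:
    """Map each non-compliant identity name to the list of rules it breaks."""
    problems = {}
    for ident in identities:
        issues = []
        if not ident.get("owner"):
            issues.append("missing owner")
        expires = ident.get("expires")
        if expires is None:
            issues.append("missing expiration")
        elif expires < today:
            issues.append("expired")
        if issues:
            problems[ident["name"]] = issues
    return problems
```

Run as part of the pull-request review for new identities or on a schedule, a check like this turns the owner and expiration requirements into something enforceable rather than a convention.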

Implementing these took effort, but it built lasting resilience. We moved from reactive fixes to a robust posture. Development teams still innovate quickly but within guardrails. Securing machine identities is now as embedded as securing human users.

Securing the next wave of cloud identities is an ongoing task. As we expand to more cloud providers and edge devices, our policies must adapt. We’ve started exploring decentralized identity proofs (like DIDs) for IoT devices and confidential computing for sensitive workloads. However, the core principle remains: verify every identity and minimize its blast radius.

Today, I’m confident in our next wave of cloud deployments. By treating workloads with the same zero-trust, identity-first rigor we apply to our users, we’ve built a foundation for secure growth. Ultimately, a machine without a validated identity is merely a vulnerable entry point; however, one with the proper controls becomes a trusted part of the system — exactly as it should be in a modern cloud.

This article is published as part of the Foundry Expert Contributor Network.