CrowdStrike / Microsoft outage leaves me stranded

Last year, I took 264 meetings in 19 countries. I’m a member of every airline and hotel program that you can name. I’m used to flight delays, cancellations, and lost luggage—it comes with the territory.

But even for a professional traveler like me, last week’s travel catastrophe was on a whole other level. I had been traveling back home from Singapore and had landed in Incheon, Korea when the CrowdStrike IT outage began to affect 8.5 million Windows devices and everything went sideways. Here are some of the lowlights:

  • The local ground staff knew nothing and could do nothing because all their systems were down.
  • The outage caused knock-on security issues that meant we had to leave the terminal. That in turn caused other problems.
  • Because most systems were down, there was no formal way to handle international travelers like me, who now had to enter Korea. Eventually, we got stickers denoting whether we were transfers or local travelers; my “official entry visa” was a piece of paper on which someone wrote “flight canceled.”
  • Because so many of us were waiting in the terminal for our initial flight, we’d made a lot of duty-free purchases. Naturally, we couldn’t enter Korea with those purchases, so the shop had to come back with a pile of cash (the credit cards weren’t working either) to offer returns. It was a bit like the bank run in It’s a Wonderful Life except with perfume and Kinder Eggs.
  • When I called my airline using my loyalty program, I was told that I’d have to wait more than seven hours to speak with a representative.

I eventually made it home (thank you, Denise). I shudder to think what would have happened if I had reached US Border Control before everything was restored. But the trip was obviously a bit longer and more dramatic than I would have preferred.

CrowdStrike and Microsoft cause global outages

My flight was just one of the more than 42,000 delayed last week. But obviously, it was more than just travel that was affected. “Everything from airlines to banks to healthcare systems in many countries was hit,” reported The New York Times, which noted that the outage “affected emergency 911 lines in multiple states,” canceled elective surgeries in Germany, locked doctors out of the NHS in Britain, delayed JPMorgan Chase from processing trades, and kept TD Bank users out of their accounts. CNN estimates that global costs from the outage could top $1 billion, not to mention “demands for renumerations and very possibly lawsuits.”

And, because cybercriminals never waste a good crisis, the outage has already led to an increase in phishing attacks.

Where do we go from here?

Between my time at the gate, in my hotel, and in the security queue, I had a bit of time to read up on the global Microsoft / CrowdStrike outage and think about how so many organizations got here—and what they can do differently to prevent this from happening again.

Honestly, I can’t understand how so many organizations don’t practice what they preach when it comes to having multiple layers of defense and redundant systems. No one vendor should have this much of an impact on an organization’s operations, period.

Organizations need to expand their security and build resiliency rather than put everything in one basket. Operationally, do you have a fail-over that can keep your business up and running, or are you willing to let your business go down if the cloud does, too?

It’s not just me asking: in the U.S., the Department of Homeland Security said “the best response to Friday’s outage would be to require companies and governments to have redundant systems so they have a backup when their systems go down.”

An object lesson in security-first, redundant systems

I can’t speak to DHS policy, but I can speak to RSA technology. There’s a reason that we’ve built our business on security-first practices, redundant systems, and testing our updates before they go out. We know that cloud outages occur, and we want our customers to be prepared for that eventuality—not left holding the bag.

That’s why RSA® ID Plus provides on-premises failover specifically for situations like these: the capability ensures that organizations can still securely authenticate users and provide access to resources even if the cloud goes down.
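For readers who want to see the general idea in code, here is a minimal sketch of the failover pattern described above. It is not RSA's implementation or the ID Plus API: the function name, the verifier signature, and the stub services are all hypothetical and exist only for illustration. The sketch assumes a cloud verifier that raises a connection or timeout error when it is unreachable, and a local verifier that can make the same decision on-premises.

```python
from typing import Callable, Tuple

# Hypothetical signature: a verifier takes (username, one_time_code) and
# returns True/False. It raises ConnectionError or TimeoutError when the
# service cannot be reached. None of these names come from a real product.
Verifier = Callable[[str, str], bool]


def authenticate_with_failover(
    username: str, code: str, cloud: Verifier, on_prem: Verifier
) -> Tuple[bool, str]:
    """Try the cloud verifier first; fall back to on-premises on an outage.

    Returns (allowed, source), where source records which path decided.
    """
    try:
        return cloud(username, code), "cloud"
    except (ConnectionError, TimeoutError):
        # Only an outage triggers failover. A rejected credential is a
        # valid answer from the cloud and must not be retried locally.
        return on_prem(username, code), "on_prem"


# --- Illustrative stubs (assumptions, not real services) ---
def cloud_down(username: str, code: str) -> bool:
    raise ConnectionError("cloud identity service unreachable")


def cloud_rejects(username: str, code: str) -> bool:
    return False


LOCAL_CODES = {"alice": "123456"}  # stand-in for an on-premises store


def on_prem_verify(username: str, code: str) -> bool:
    return LOCAL_CODES.get(username) == code


if __name__ == "__main__":
    print(authenticate_with_failover("alice", "123456", cloud_down, on_prem_verify))
    print(authenticate_with_failover("alice", "123456", cloud_rejects, on_prem_verify))
```

The design choice worth noting is that only an outage falls through to the local path. If the cloud service answers "no," that answer stands, so the backup cannot be used to get around a rejection.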

It’s one of the most important tools we offer. Because as abstract as tech is at times, it really does have real-world consequences. Last week’s outage shows that when we say something like “a user is locked out of their account,” what we really mean is that patients are locked out of hospitals, families are locked out of their savings, and travelers are locked out of their homes.

I think it’s incumbent on tech to start building alternatives—and organizations to look for resilient solutions and have back-ups in place before they’re needed.

It’s also incumbent on me to send some flowers to Denise. Thanks to all the frontline customer service personnel around the world who had to deal with panicky customers. You worked miracles with pen and paper—but you really shouldn’t have to.

###

RSA® ID Plus includes on-premises failover, which allows organizations to continue deploying IAM capabilities even in a cloud outage. Try ID Plus, the complete identity security platform, now for free!