At approximately 8:30 AM EDT on June 19, all of the Amazon Web Services instances for Websolr.com and Bonsai.io -- both owned by Austin, Texas-based hosted search-as-a-service provider One More Cloud Inc. -- disappeared. It would soon become clear that the instances had been maliciously deleted, initiating a weeks-long scramble to restore customers' data and identify the cause of the attack.
Earlier that week, code-hosting firm Code Spaces, another provider based in the Amazon Web Services (AWS) cloud, had been forced out of business when an attacker gained access to its AWS management console credentials and deleted practically all of its customers' data.
Unlike Code Spaces, One More Cloud was able to restore all of its services relatively quickly, though a harrowing account of the incident by one of the company's founders provides illuminating lessons for any business operating in the cloud.
Haunted by old mistakes
One More Cloud, founded in 2009, has long run its search services on AWS' Elastic Compute Cloud (EC2), and according to co-founder Nick Zadrozny, the business -- which serves hundreds of customers, many of whom also use the cloud application platform Heroku -- had followed industry-standard security best practices from its founding.
In particular, Zadrozny said the company's employees were aware of the dangers that application programming interface (API) keys with unlimited permissions pose to cloud-based organizations like One More Cloud, which is why the firm has long maintained only two keys with truly privileged access: one for Zadrozny and one for co-founder Kyle Maxwell. All other keys were generated via Amazon's CloudFormation provisioning system with a limited scope, he noted, and are rotated on a regular basis. He added that One More Cloud also avoided committing credentials to source code for fear that they would be leaked.
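Key rotation of the sort Zadrozny describes is straightforward to automate. The sketch below, in Python, flags access keys that have outlived a rotation window; the 90-day threshold, the function name and the input shape (modeled loosely on the entries boto3's `iam.list_access_keys()` returns) are illustrative assumptions, not One More Cloud's actual tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed rotation window; choose per your own policy.
MAX_KEY_AGE = timedelta(days=90)

def keys_due_for_rotation(keys, now=None):
    """Return the IDs of access keys older than MAX_KEY_AGE.

    `keys` is a list of dicts shaped like boto3 iam.list_access_keys()
    entries: {"AccessKeyId": str, "CreateDate": tz-aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    return [k["AccessKeyId"] for k in keys
            if now - k["CreateDate"] > MAX_KEY_AGE]
```

Run on a schedule, a check like this would have surfaced a 2006-era key as years overdue for rotation.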
However, these security practices were not strictly followed when the compromised AWS account was initially created, according to Zadrozny. That lapse left attackers a way to obtain an old AWS API key generated in 2006, predating not only the formation of One More Cloud, but also Amazon's identity and access management (IAM) functionality. With that key, the attackers gained full access and deleted all of One More Cloud's AWS instances, knocking both the company's Websolr and Bonsai services offline.
Zadrozny said the company essentially set the stage for the breach by committing a two-pronged mistake: First, the old API key had been mislabeled, so it appeared to have much weaker permissions than it actually did; second, the overly powerful key was committed to source code.
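The second half of that mistake, credentials committed to source code, is one that a simple heuristic scan can catch before code is pushed. The sketch below is an illustrative example, not the company's process: it looks for strings shaped like current-format AWS access key IDs (`AKIA` plus 16 uppercase alphanumerics) and 40-character secret keys, a pattern that may not cover older key formats such as the 2006-era key in this incident.

```python
import re

# Current-format AWS access key IDs start with "AKIA"; secret access keys
# are 40-character base64-like strings. Older key formats may differ, so
# treat these patterns as a heuristic, not a guarantee.
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
SECRET_KEY_RE = re.compile(r"\b[A-Za-z0-9/+=]{40}\b")

def find_suspect_lines(text):
    """Return (line_number, line) pairs that look like embedded AWS credentials."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        if ACCESS_KEY_RE.search(line) or SECRET_KEY_RE.search(line):
            hits.append((n, line))
    return hits
```

Wired into a pre-commit hook or a repository scan, even a crude check like this makes it much harder for a powerful key to sit unnoticed in a codebase.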
One More Cloud also has only three full-time employees, Zadrozny said, meaning the company relies on third-party engineers to collaborate on certain projects as needed -- and those collaborators have generally not been held to the same security standards as internal employees in the past.
Though a third-party security firm is investigating the incident, Zadrozny indicated that the API key at the root of the breach was likely leaked through an insecure system belonging to one of those collaborators with access to the company's private GitHub repositories.
Investigation leaves lingering question
Zadrozny said he received several queries in the immediate aftermath of the One More Cloud attack questioning whether the service outage was the result of an accident rather than malicious action.
The company was able to quickly confirm that it had experienced an attack, he noted, because it doesn't have any automated systems capable of terminating Amazon EC2 instances. Also, the attack began early in the morning before anyone with the authority to take such drastic action would be working.
"It was pretty easy for me to make that judgment call right away that this was not an accident," Zadrozny said, adding that both the logs and the third-party security firm hired for the investigation concurred with his assessment.
Narrowing the initial attack vector down to the nearly decade-old AWS API key was also a relatively simple process, Zadrozny said, thanks to help from AWS support, which confirmed the key had been used to delete the company's instances.
What remains unclear to Zadrozny one week after the incident is why it ever happened.
In the case of the Code Spaces hack, the company noted that attackers had left a note in its AWS management console demanding a ransom in exchange for ceasing a distributed denial-of-service (DDoS) attack. Code Spaces, a frequent target of DDoS attacks, declined to pay the ransom and instead tried to regain control of its AWS console; the company ultimately shut down after the attackers deleted nearly all of its data.
According to Zadrozny, though, One More Cloud never received a ransom demand or any other communication from the attackers. That leaves open the worrying possibility that cloud-based firms are being targeted not only for monetary gain, but also simply because the opportunity exists.
"It's hard to imagine who has anything to gain out of this. Certainly anyone that would be in a position to benefit from us being down would probably have more to lose by doing something that reckless," Zadrozny said. "So I can't imagine the motivation for someone to do it other than probably simple vandalism. Someone knew that they could [take us down] and just decided to go for it."
AWS to share any blame?
With two AWS-based businesses experiencing an attack in the same week, questions have surfaced over the security of Amazon Web Services environments, and whether cloud providers should play a larger role in securing their customers' systems and data.
Zadrozny said providers like AWS are generally responsible for creating tools that "help customers help themselves," and that holds true for security as well. AWS does a good job of offering tools like CloudTrail -- which records API calls and produces corresponding log files -- according to Zadrozny, and indeed might be the security leader among major cloud providers.
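CloudTrail delivers API activity as JSON log files containing a `Records` array, with documented fields such as `eventName`, `eventTime` and `userIdentity.accessKeyId`. A minimal sketch of the kind of after-the-fact audit such logs enable, assuming a simplified record shape and an illustrative (not exhaustive) list of destructive EC2 calls:

```python
import json

# Illustrative subset of API calls worth alerting on; extend per your needs.
DESTRUCTIVE_EVENTS = {"TerminateInstances", "DeleteVolume", "DeleteSnapshot"}

def destructive_calls(log_file_contents):
    """Yield (eventTime, eventName, accessKeyId) for destructive API calls
    found in a CloudTrail log file (a JSON object with a "Records" list)."""
    log = json.loads(log_file_contents)
    for record in log.get("Records", []):
        if record.get("eventName") in DESTRUCTIVE_EVENTS:
            yield (record.get("eventTime"),
                   record.get("eventName"),
                   record.get("userIdentity", {}).get("accessKeyId"))
```

It was exactly this kind of record, tying a destructive call back to a specific access key, that let AWS support confirm which key deleted One More Cloud's instances.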
Still, he said, there were a couple of areas that AWS could improve based on the One More Cloud incident.
For one, when the company was trying to rule out all possibilities besides an attack, Zadrozny said it attempted to determine whether someone had gained access to its AWS management console, but was unable to view active sessions. Further testing confirmed that changing the password for an AWS account does not deactivate other currently active sessions.
CloudTrail logs might be able to provide some of that visibility for security-conscious customers, Zadrozny conceded, but in comparison to services like GitHub and Gmail -- both of which provide a view of active sessions and the ability to expire them -- AWS still has room to improve.
Zadrozny said a current AWS policy, namely the lack of human-readable API key identifiers, also played a part in One More Cloud's central failure: the misidentified API key. Keys appear only as random strings of characters, he explained, making them impossible to differentiate at a glance, even though Amazon already provides more easily understood identifiers when creating elastic load balancers or CloudFormation stacks.
"So it's things like that where you could say it's all up to the customer on AWS to understand these things," Zadrozny said, "but I think there is definitely also room for Amazon to help improve their product and take a little more ownership over those edge cases. I'm sure over time that sort of thing will happen."
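Until AWS offers human-readable key identifiers, one customer-side workaround is to maintain a local registry mapping each key ID to its purpose, then audit it against the account's active keys. The sketch below is a hypothetical illustration; the JSON registry format and function name are assumptions, not an AWS feature or One More Cloud's tooling.

```python
import json

def label_report(active_key_ids, registry_json):
    """Compare an account's active access-key IDs against a locally
    maintained label registry (key ID -> human-readable purpose).

    Returns keys with no label (candidates for mislabeling or surprise
    keys, like One More Cloud's 2006 key) and labels for dead keys.
    """
    registry = json.loads(registry_json)
    unlabeled = sorted(set(active_key_ids) - set(registry))
    stale = sorted(set(registry) - set(active_key_ids))
    return {"unlabeled": unlabeled, "stale_labels": stale}
```

An audit like this only works if the registry is kept honest, but it turns "which key is this?" from guesswork into a reviewable artifact.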
More isolation, communication needed
Though the incident investigation continues, One More Cloud has restored all services for its Websolr and Bonsai customers. Zadrozny said that doesn't mean the event will be forgotten, as the company plans to improve its security and response strategies.
In particular, he said One More Cloud will focus on expanding a strategy that already served it well during the recent attack: isolating accounts. The company's websites and the primary databases that hold customer data already run on different cloud providers and accounts, according to Zadrozny, and plans are now in place to move its log data collection and data backups into separate accounts with unique credentials.
"It's important to have that kind of stuff isolated from your main production accounts because it's also sensitive," Zadrozny said. "Someone could have done more damage if they had decided to delete all of our logs in our S3 buckets, or at least what logs we were collecting at the time, but that fortunately didn't happen."
Apart from the mislabeled API key at the heart of the incident, Zadrozny said perhaps the biggest mistake made by One More Cloud was failing to establish regular lines of communication with customers outside of Twitter and the company's status page.
He indicated that the company had previously relied on individual support tickets, but had neglected to build out the email infrastructure necessary to communicate with customers at mass scale. That system was quickly overwhelmed as One More Cloud's support traffic spiked dramatically with the service down.
"We've invested a lot in being able to quickly and easily respond with polished tools in an outage, especially as far as recovering the service, but have left the communication side of that a little underdeveloped, and it's definitely something we want to work on," Zadrozny said.
Though One More Cloud now has security policies in place that would prevent a similar incident from occurring, Zadrozny said the episode reinforces why security should almost always come first.
"Especially if you are a small company and just getting started out, any time you have one of these security vs. convenience tradeoffs, you have to make that decision: Am I going to go for security or am I going to go for convenience," Zadrozny said. "As an operations company, we know that our customers are giving us a special level of trust with their data, and so we always choose the security side of that tradeoff."