Anatomy of a Hack: SolarWinds and Ripples Beyond

In this special “3x”-long episode of our (otherwise shortform) news analysis show 16 Minutes (past such 2-3X explainer episodes have covered Section 230, TikTok, GPT-3, the opioid crisis, and more), we cover the SolarWinds hack, one of the largest (if not the largest!) publicly known hacks of all time… and the ripple effects are only now starting to be revealed. Just this week, the U.S. Cybersecurity and Infrastructure Security Agency shared (as reported in the Wall Street Journal) that approximately 30% of both private-sector and government victims linked to the hack had no direct connection to SolarWinds. So who was compromised, do they even know, can they even know?!

Because this hack is a supply-chain compromise involving various third-party software and services all connected together in a “chain of chains”, the knock-on effects of it will be revealed (or not!) for years to come. So what do companies — whether large enterprise, mid-sized startup, or small business — do? What actually happened, and when does the timeline really begin? While first publicly revealed in December 2020 — we first covered the news in episode #49 here when it first broke, and there have been countless headlines since (about early known government agency victims, company investigations, other tool investigations, debates over who and how and so on) — the hack actually began not just a few months but years earlier, involving early tests, legit domains, and a very long game.

We help cut through the headline fatigue of it all, tease apart what’s hype/what’s real, and do an “anatomy of a hack” step-by-step teardown (the who, what, where, when, how; from the chess moves to technical details) in an in-depth yet accessible way, with Sonal Chokshi in conversation with a16z expert and former CSO Joel de la Garza and outside expert Steven Adair, founder and president of Volexity. The information security firm (which specializes in incident response, digital forensics/memory analysis, network monitoring, and more) not only posted guidance for responding to such attacks, but also an analysis based on working three separate incidents involving the SolarWinds hackers. But how did they know it was the same group? And why was it not quite the perfect crime?

image: Heliophysics Systems Observatory spacecraft characterize, in the highest cadence, how the constant stream of particles exploding from the sun affects Earth, the planets, and beyond, via NASA Goddard Space Flight Center / Flickr

Show Notes

An overview of how SolarWinds was hacked [2:21], the attackers’ methods [5:33], and their impressive sophistication [8:17]
A step-by-step explanation of how the attack took place [14:20] and how the hackers avoided detection [21:18]
Open discussion of what the experts know so far [23:52], including how we know the attack was coordinated by the same group [29:21]
Big picture security questions [33:26] and how businesses and consumers can protect themselves [42:51]

Transcript

Sonal: Hi, everyone. Welcome to this week’s episode of 16 Minutes, our short form show where we talk about the news, tech trends in the headlines, tease apart what’s hype/what’s real, and where we are on the long arc of innovation. I’m Sonal, and today’s episode is actually one of our special “2-3X” long explainer episodes, which I’ve done every so often for topics that keep coming up over and over in the news (most recently on Section 230 and content moderation, previously on TikTok, and even earlier on the opioid crisis). You can catch all those at a16z.com/16Minutes. But today, we’re covering the SolarWinds hack, one of the largest (at least publicly known) hacks of all time.

Not only has it been in the news a lot since it was first publicly reported in December, with countless headlines since — but the most recent report, from the acting director of the Cybersecurity and Infrastructure Security Agency, was that approximately 30% of both private-sector and government victims linked to the hack had no direct connection to SolarWinds, as reported in the WSJ just yesterday. So, it’s gonna have ripple effects for quite some time.

So, we’re doing an “anatomy of a hack”: a teardown of the specifics we know so far, what went down, and what we need to know — whether big company, small company, or individual.

For quick context before I introduce our experts: Over 18,000 customers downloaded compromised software, though it goes well beyond them. Those customers include several large government agencies (which we covered last year on this show). Private sector victims include companies like Cisco, Intel, Microsoft, NVIDIA, Deloitte, VMware, Belkin, and others. The broad consensus – per a statement issued by the Office of the Director of National Intelligence, the FBI, Department of Homeland Security, and National Security Agency – is that Russia was most likely the origin of the hacking, and more specifically, that the Cozy Bear group (also known as APT29, overseen by Russia’s intelligence service) was responsible.

That’s just a super high level, because we’re actually gonna go deeper to break down the who what when how – and the chess game of it all. So now, let me quickly introduce our experts. Our in-house expert is a16z operating partner for security and former CSO, Joel de la Garza. And our special expert guest is Steven Adair, the president of Volexity, an information security firm that does incident response and forensics (including memory forensics), and they’ve responded to multiple cases of this. Their team actually put out several detailed posts on it and more.

Overview of the SolarWinds attack

Sonal: But first, Steven, can you summarize what happened? Obviously we’ll continue to dig in on the details throughout the episode, but the reason I’m asking is, I’ve started to lose track. I bet a lot of our listeners are getting a little inundated with this headline fatigue too, like — now this, now-what. So, tell me basically what actually happened. What do we know?

Steven: Yeah, sure. So, SolarWinds is a company that creates network and system management software that’s used really heavily by tens of thousands of organizations around the world, so it’s used by large giant commercial companies, Fortune 500. It’s used by small organizations, managed-service providers, and governments. So, it’s a piece of software used to manage these, like, really sensitive important assets. So, think about the IT teams, and people who wanna watch what’s going on on key systems, on network devices, and things that are really important within a network. They have a product called Orion, that’s their flagship product.

And what happened is that SolarWinds was basically breached. How exactly, you know that’s not really been published. We don’t know. But attackers were able to compromise SolarWinds, get into what’s called, like, the “build process” of this product. So essentially, the development or the software that’s downloaded and used by all these organizations, they were able to get into SolarWinds’ networks, and modify that build process.

And what’s interesting and notable about this, is, they didn’t go in and modify the source code. What they did is — think about if you’re on an assembly line, and someone made a change like early on, and they all put it together — they actually waited til the very end, the very last step of compiling this package to make this software. It goes out. And they monitored it, they watched it, they looked at it, they learned, they tested. And they ended up compiling in a backdoor — which would give them access to the systems running SolarWinds Orion — for anyone who installed the update, or downloaded it freshly since they did this.
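Editor's sketch: the point that tampering at the final compile step slips past source-code review can be illustrated with a check the episode itself does not describe, namely comparing the hash of the shipped artifact with the hash of an independent rebuild from the audited source. This assumes a deterministic (byte-for-byte reproducible) build, which many real pipelines do not have; the function names and byte strings below are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def artifacts_match(shipped: bytes, independent_rebuild: bytes) -> bool:
    """True when the shipped artifact is byte-identical to an independent rebuild.

    Assumption: the build is reproducible, so identical source yields identical bytes.
    """
    return sha256_hex(shipped) == sha256_hex(independent_rebuild)


# Hypothetical stand-ins for real build outputs.
clean_build = b"orion-update: monitoring code"
tampered_build = clean_build + b" + code injected at compile time"

print(artifacts_match(clean_build, clean_build))     # same bytes on both sides
print(artifacts_match(tampered_build, clean_build))  # differs, although the source never changed
```

A source audit would pass in both cases, because the source is identical; only a comparison at the artifact level can see the difference.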

So, they were able to modify SolarWinds and push out this update to organizations all around the world. And basically, they’d create a shopping list and selectively target who it was that they wanted to go into, and basically break into and further their access. They could look through and see, “Oh, this company, or this government agency, I’m very interested in them.” They could actually activate and walk right into their network, and they’re already sitting in and going into a very sensitive part of that network.

So in short, it’s what’s called a “supply-chain compromise,” where they’re really in the build process. They insert the backdoor into this legitimate software, expand their access, and do it very stealthily for many, many months, until, you know, FireEye came forward and figured this all out in December 2020.

Sonal: Right. And just to quickly, even more high-level-context it — this is playing out against the broader landscape of — for many years now, companies have obviously been using various providers of third party and cloud software and services. We’ll delve into this whole notion of a supply-chain hack, what it means, what it means for the future of security.

But the thing I wanna really pull on from what you said is that this was very unusual because they didn’t go for the source code, they kind of waited for the updates, and then they were very targeted — as opposed to just sort of spray and pray. So, in your assessment of all the hacks that you’ve seen out there — and Joel I wanna hear your thoughts too here — is this really a sophisticated hack? Because obviously, in our show, we not only tease apart what’s hype/what’s real. I often wonder if that word gets thrown about very casually.

Steven: Yeah, so in our opinion, this aspect of it is certainly one of the more sophisticated that we’ve seen. And it’s not necessarily that there aren’t a lot of smart people around the world, good and bad, that could pull off something similar. It’s, you know (1), the fact that they did; (2), they did it so strategically; and (3), you know, even if they had gone in and modified the source code, people would still be talking about how sophisticated it was. But, they took it up a notch and basically said, “Yeah, we modify this code, or someone’s watching it, or they audit it, or someone’s watching a check-in process.” Basically it went to a system where none of that mattered anymore. And they just kind of bypassed all that and went, like, straight for the jugular, in what I would argue is a much more difficult way to go about it, but a lot more likely to meet with success and go undetected. I think they gambled correctly in this case.

Joel: I mean, I think with these kinds of operations — and this is ultimately an espionage, you know, nation-state professional-type operation — from my perspective, the duration and the extent to which these things can run undetected is usually the indicator of how sophisticated they are. And so, like, these long running, you know, really successful campaigns that avoid detection really belies like a level of sophistication.

Because operational security, right — like, covering your tracks — is actually just about as hard as getting in. And so, you know, the fact that they exercised their ability to cover their tracks for so long, to know where to insert in the process, and to lay low, is just indicative of a level of discipline that you don’t necessarily see in a lot of attackers.

Sonal: Not just get in, but be able to cover their tracks, which is what both of you guys say. And by the way, we’ve only talked about the duration of when the hack was revealed by FireEye, and that it had been you know several months before. Do you guys have specifics on what the latest date-point is, in that timeline?

Steven: Yeah, at first, essentially what they did was an experiment early on — and this has been posted publicly. The code in SolarWinds Orion was modified in late 2019. Where basically they made some initial modifications, which actually didn’t do anything malicious or put a backdoor, allow any type of access.

Software went out. And they basically were able to prove, like, “Hey, I succeeded at doing this, it existed, no one noticed anything” — and essentially waited at some point to move on to phase two, which was, “Okay I can get in and go undetected, I can have it build, it all works, stuff makes it into production, no one notices.” And they said, “Okay, well, I’m satisfied with that. Now it’s time to, you know, go for broke and put the actual code in there, and open the floodgates.”

Sonal: What you just described, Steven, sounds exactly the way a company builds a product. Like, “Hey, we’re gonna test it out. We’re gonna try an experiment, an MVP, a minimum viable product, if you will. Then we’ll, based on that, decide how to deploy it and target it, and blah blah blah.” I mean, I hate to say that, but that’s exactly what you just described sounded like.

Steven: Yeah, it honestly wouldn’t surprise me if they had done some way of trying to basically clone their development environment too, and probably tested this, I would guess, pretty thoroughly before they even <Sonal: wow> ran the tests within their network.

The hackers’ sophisticated methods

Sonal: So, they were incredibly savvy in certain ways, in terms of how targeted they were, and the choices they made.

In the Microsoft blog post, one line in particular really struck me. It said that the threat actors were savvy enough to avoid giveaway terminology like backdoor, keylogger, etc. Instead, they gave their tampered code an innocuous name, “Orion Improvement Business Layer,” that would fit right into a marketing brochure. (This is from an Axios post summarizing it.) “The attack’s crucial door-opening exploit was a small chunk of ‘poisoned code’” (which is what Microsoft dubbed it) “all of five lines long or roughly 160 characters.” And then Ina Fried at Axios goes on to comment (which made me chuckle, even though it’s sad), “This could well be the most damage per character yet achieved in the short history of cyber warfare.”

So, I am curious if you have any thoughts on some of those — honestly quite clever — things that they did, to hide undetected. And any more specifics you could share there. And then we’ll go into the step-by-step in a moment, too.

Joel: The fact that they’re not naming variables and naming things that are commonly used in attacks is mostly a credit to the existing kind of antivirus and anti-malware industry. You’ve got a lot of tools that are out there that are looking for this stuff. And you would imagine any adversary that’s relatively sophisticated is gonna run their changes through all those tools to make sure they don’t get detected before they deploy it.

And so, that’s just table stakes for this kind of activity. It doesn’t really show any kind of real sophistication.

Sonal: Of course, it just depresses me to hear that — and we’ll talk about this at the end, which is what companies and people can do. Because I’m, like, great — the better and better we get, the more and more sophisticated they get, and it just becomes this like never-ending back-and-forth, back-and-forth escalation.

Joel: Espionage 101.

Steven: Yeah. To be completely honest, that stuff doesn’t surprise, especially when their job is to, like, blend in as much as possible.

But I’ll add to one of the things — and make sure that we give credit — some of the analysis of things we’re talking about today are obviously from — a lot of security communities have come together and published a lot of detail, which has been great. But this is one of the other things that they did, is, they actually used an existing config file that is part of SolarWinds Orion, that’s there legitimately — it was there five years ago, it was there two years ago, it’s there right now — but they actually repurposed that exact config file. They created a specific value and said, if this is a three, you shouldn’t beacon it, you’re basically turned off. And they use values in fields within this to then leverage that file that’s already being read and used by the program, to then also inform it on some of what it should do.

So they use, like, native, existing files and functionality and things that are very innocuous-looking. And then they did a couple of other things beyond that that are pretty stealthy. Although they’re not necessarily rocket science, they are very uncommon.

~ One of them is the fact that this backdoor, once it’s loaded, it wouldn’t start its beaconing or calling out for this DNS activity (which I know we haven’t explained yet), but basically, the mechanism by which it actually gives that avenue of control back into the system. You have to meet certain criteria before [you can] even, you know, beacon. For example, if you weren’t “domain joined” — meaning you’re less likely to be an actual corporate asset. You’re someone testing it on a computer, you’re a workstation at home. You’re not even gonna pass the sniff test.

~ But what they then do is actually set a timer. And so, it might be actually up to two weeks before it actually starts doing anything. I might be under scrutiny from QA, or a build, or someone might be looking at it when they first install it, make sure it’s not malicious — so they actually say, “Hey, I’m just gonna wait two weeks. I’m in this environment, this is for the long haul, I’m not in a rush to immediately get access to these systems.” So that’s an interesting aspect. It’s actually fairly uncommon to see malware that is on any timer of significance, or driven by a specific event that’s likely to happen very soon.

~ The other thing that was really interesting: The malware basically would activate when a certain response was given to its query. “Hey, go connect to this domain name,” or “go connect to this website.” And, those domains that they used were actually domains that had expired. One of the telltale signs when you’re looking into malware and things is, like, “Oh, it was just registered last week or last month or earlier today.” So, this would pass that sniff test, all day long. Some of them had five or six years they had existed. It might even have, like, a website. They picked up infrastructure that had a history to it. They actually owned and controlled these domains. They weren’t, like, hacked domains or things like that, where they were using compromised infrastructure. So, just kind of an interesting note on that front.
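Editor's sketch: the “just registered last week” telltale sign Steven mentions can be written as a simple age heuristic, which also shows why domains with years of history pass it. This is a minimal illustration, not any vendor's actual detection logic; the 30-day threshold, the function name, and all dates are assumptions.

```python
from datetime import date


def is_newly_registered(registered_on: date, today: date, max_age_days: int = 30) -> bool:
    """Flag a domain whose registration is at most max_age_days old."""
    return (today - registered_on).days <= max_age_days


# Hypothetical dates for illustration only.
today = date(2020, 7, 15)
fresh_domain = date(2020, 7, 8)  # registered a week before the investigation
aged_domain = date(2015, 3, 1)   # more than five years of history

print(is_newly_registered(fresh_domain, today))  # flagged as suspicious
print(is_newly_registered(aged_domain, today))   # passes the sniff test
```

A check like this is cheap and catches hastily built infrastructure, but an attacker willing to prepare years ahead defeats it by design.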

Sonal: It’s interesting and, honestly, a little creepy. I got goosebumps while you were talking, because it makes me think of every long game. The patience, and waiting, and stalking — that really skilled predators do. And I don’t mean to glorify it by any means, but I am just sharing that what you just shared in technical terms — it gave me goosebumps, quite literally. I don’t know how you think about it.

Steven: When we first saw this in July of last year, we had I think three domains that we had seen used in that actual attack. And as we looked into them, we said wow. Like, we kind of noticed it’s just, like, yeah, these things have a real history. You know, what the hell is going on here? And then we found a way to find more of their infrastructure (even if we hadn’t seen it used in the attack), and they all had this in common. Like, we had a way which we could figure out and find some infrastructure from some mistakes that they had made. That’s why in our post we actually were able to provide a lot of indicators. Like, DHS included that in their list and everything.

But, other than that, each one of the domains we looked into, we just instantly knew at that point — I mean, we already knew we were dealing with an advanced threat actor, but — we were kind of thinking to ourselves, like these guys have really stepped it up a notch. This was actually the third time we had dealt with them in an incident-response engagement. But this was, like, a little bit different than the other two rounds. There’s a number of things that just made it stand out, and that was definitely one of them.

Sonal: This might be the first a16z Podcast Network show to be optioned for a movie. I’m just gonna say it right here, on air. Joel, anything to add to that before I switch into the detailed step-by-step?

Joel: I mean, only if Matthew McConaughey plays me. <Sonal laughs> No, I’m just kidding.

Sonal: I listen to him on the Calm app every other night or so.

Joel: Yeah no, I mean I think that’s exactly it. Just the level of preparation, and just the long game that these guys are playing.

You know, this malware stuff is pretty common on the financial crime-ware type side, right, people trying to steal money. But those actors typically register domain names within a day, it’s just all very phish-y and suspicious. But to see someone build these, like, really advanced, large, complicated infrastructures, years ahead of using it — it just belies a real level of sophistication, you don’t really see every day.

How the attack took place

Sonal: Okay. So, just to recap for listeners where we are and where we’re going — we’ve covered what happened at a high level, including some of what’s hype/what’s real, and interesting or undercovered in the media.

You did a great job summarizing, Steven, but let’s now spiral into that a bit deeper and fill in some blanks that you haven’t covered. Both technical details — you mentioned the beacon, DNS — I want all of it. How folks figured things out — so we can then know what the open questions still are, ripple effects and implications, and then more on supply-chain compromises and what we can all do. But I especially want to know the anatomy of how they got access to the emails. But start from the very beginning of the timeline.

Steven: Yeah, so the story of the SolarWinds supply-chain compromise obviously starts with SolarWinds, and that’s probably where some of the question marks are currently, and they might remain that way. They were breached sometime at least as of late 2019, and then ultimately (what came out later, in May of 2020) pushed out an actual backdoored version of their software. A backdoor meaning, a piece of software that shouldn’t be there, that allows this foreign adversary to have control or remote access into these systems. So we’re talking in late May, that happened. From the cases we’ve been involved in and things that have been published publicly, we’re seeing that a lot of the threat activity started in June and July.

The SolarWinds software would send out this DNS query. So, when you want to go to a website, you wanna go to a16z.com, you type that in. There’s a system called DNS, it says, “Hey, where is this located?” A DNS server says, “Oh, it’s located over here.” It’s kind of the basis [through] which you can find things on the internet, so you’re not memorizing these numeric IP addresses.

So, the malware waited between 10 and 14 days, and once it finally activated, all it did was start making these DNS queries from the SolarWinds Orion server. And those DNS queries contained encoded data. And if you decoded that data, it gave you different information, but one piece was information about the network that that machine is joined to. So in the example of, you know, say, Microsoft, it might show Microsoft.com or Microsoft.internal. Or, you know, for one of these government agencies, it might say treas.gov.

But it would give this indicator, so that the attackers could actually see who these victims were. Remember, they were indiscriminately pushing out this software to potentially tens of thousands of machines. That is an untenable thing to manage: to go and manually look at everything, and try to actually install software and do something of significance. And their goal is to stay under the radar, and not get caught. So now they have to decide who it is they want to go after further.
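Editor's sketch: to make “DNS queries contained encoded data” concrete, here is how an analyst might recover a network name carried in the leftmost label of a queried hostname. The real malware used its own custom encoding; standard Base32 is swapped in here purely for illustration, and the zone name and function names are hypothetical.

```python
import base64

C2_ZONE = "example-c2.test"  # hypothetical zone, not a real indicator


def encode_query(network_name: str, zone: str = C2_ZONE) -> str:
    """Build a hostname whose first label carries the network name (illustration only)."""
    label = base64.b32encode(network_name.encode("ascii")).decode("ascii")
    # Strip '=' padding, which is not valid in hostnames; DNS labels are capped at 63 characters.
    return f"{label.rstrip('=').lower()}.{zone}"


def decode_query(qname: str, zone: str = C2_ZONE) -> str:
    """Recover the network name from a logged query name."""
    suffix = "." + zone
    if not qname.endswith(suffix):
        raise ValueError("query is not under the expected zone")
    label = qname[: -len(suffix)].upper()
    padding = "=" * (-len(label) % 8)  # restore the padding Base32 decoding expects
    return base64.b32decode(label + padding).decode("ascii")


logged_query = encode_query("treas.gov")
print(logged_query)
print(decode_query(logged_query))
```

Because the name travels inside an ordinary lookup, whoever runs the zone's name server can read it without ever opening a direct connection to the victim.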

So, they probably have a shopping list that they started with, and they probably have a new shopping list of things: they’re walking into the grocery store and didn’t even know they wanted that, but now they know they do. And they essentially issued commands that allowed them to initiate this backdoor on whoever it was that they wanted to attack. And they did this through a specific DNS response called a CNAME value. So, the malware says, “Hey, where’s this host name?” and a response comes back. They would actually send a specific response to prep it, so that the malware would know that the next time something happens, it should take a specific action and open the backdoor.

It would respond with these domains. And these domains would basically be the control points where the attackers then have hands on keyboard; a human is doing this at this point. Someone says, “I am ready to take a look at this system,” and now the hackers that are behind this are actually involved, and they’re saying, “Now I wanna look around and figure out: Is this a test machine? Is this a real network I’m interested in? Is this a lab environment? Is this a staging environment?” You know, things like that. And they can figure out, “Is this the real deal? Does this have access where I want? Do I want to proceed?”
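Editor's sketch: since a CNAME answer is described above as the signal that an installation was selected for follow-up, a defender with DNS logs could look for internal hosts that ever received one from a suspicious zone. The record layout, zone name, and function name below are assumptions for illustration, not a real log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    client: str  # internal host that made the query
    qname: str   # name that was queried
    rtype: str   # record type of the answer, e.g. "A" or "CNAME"
    answer: str  # answer data


def hosts_with_cname_answers(records, zone):
    """Map each client to the CNAME targets it received for names under `zone`."""
    suffix = "." + zone.lower().rstrip(".")
    hits = {}
    for record in records:
        name = record.qname.lower().rstrip(".")
        if name.endswith(suffix) and record.rtype.upper() == "CNAME":
            hits.setdefault(record.client, set()).add(record.answer.lower().rstrip("."))
    return hits


# Hypothetical log entries.
logs = [
    DnsRecord("orion-01", "abc123.example-c2.test", "A", "203.0.113.10"),
    DnsRecord("orion-02", "def456.example-c2.test", "CNAME", "control.example-old-domain.test"),
    DnsRecord("laptop-7", "www.example.org", "CNAME", "cdn.example.org"),
]

print(hosts_with_cname_answers(logs, "example-c2.test"))
```

In this toy data only `orion-02` is reported: `orion-01` queried the zone but got no CNAME, and `laptop-7` received a CNAME for an unrelated name.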

And they did this for — we don’t know how many organizations, and that’s the real scary part in all this, is — you have all these people that have come forward, and they’re, like, big companies or they’re these government agencies, and, that’s just the ones we know about. I don’t think anyone has a real notion of the size and scope of where they took a further interest and then actually did something. In our particular case, we got permission to write up and share details of our incident investigation. The attackers were very focused on getting access to email of specific individuals. So, their goal was maintain access, move around — you know, get what they need — having access to specific individuals, and what they’re writing, who’s sending them, why they’re communicating — was a key focus of what they’re doing. We were able to see that they did that.

The interesting part, in kind of stepping away slightly from SolarWinds — and why the intel community and law enforcement says it’s likely tied to Russia (APT29, or the Dukes) — we’ve been tracking a group we call Dark Halo, just because we’ve dealt with APT29 on many occasions in the past, but we just have no real way to link the two.

But what was interesting to us, is the story of this group didn’t start with SolarWinds. We worked three separate incidents involving these SolarWinds attackers, who we called Dark Halo — so, this is a story that starts well before, and has multiple other avenues.

We had actually dealt with them back in 2019. We had an organization we were doing work with, and we kicked the group out. They went away. In our initial response, we had determined they’d been in that organization for 4-5 years prior. They came back in Q1 2020 through an Exchange control panel vulnerability. You know, mail service — they had a vulnerability that attackers will take advantage of. Got back in, stole email for certain individuals. They were kicked out and removed again. That’s what we did. And then they came back a third time with SolarWinds in July of 2020 again. We didn’t have a good way to prove it, and we took steps and mitigations in place to deal with it.

So to say, “Hey, how did they get into, you know, SolarWinds,” or wherever else they’re operating — well, this isn’t their only trick. They have a lot of tricks up their sleeve, and they’ve been able to do this and operate for quite some time.

Sonal: Wait, so how did you make that link across those separate incidents that it was the same group?

Steven: I’ll tell you, and it was something interesting, is — if we had worked them at three different organizations, we actually wouldn’t have come to the conclusion that this was a single threat group. We wouldn’t have linked the three things.

Any advanced attacker, anyone in a network, they have certain commands and things that they’re gonna do. But they changed enough between each of the attacks (the actual techniques, the tools: there’s a custom malware, or a commercial script, or a public script, like <inaudible>, or a pen-testing framework, or these different toolings, or a web shell) that it was very non-obvious it’s the same group.

But what they did is they went after the email of the same people each time. And why we are 100% certain it’s the same group is that, when they would steal email, they would only take a certain amount of email. They would specify, “I want all the emails since the last time I took it.”

Sonal: Oh, so it’s like incrementally building on the total — oh my God, that’s so fascinating. <Steven: Exactly!> Keep going, yes.

Steven: So in early 2020 they got back in, and they said, “Okay, well, I want all the email for these particular users since, you know, a specific date in 2019.” And then when they came back in through the SolarWinds vulnerability, they basically said, “Hey, I want every email for these people, and I only want it starting from this specific date range starting in early 2020.”

So, each time they came back, they asked for the email since the last time they did it. So in the one case, obviously, they had intimate, previous knowledge. In the other cases we worked, they didn’t have as much knowledge. They had to work their way in and kind of figure out the lay of the land. So, we’re dealing with the same group in all three incidents. That’s an interesting tidbit.
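Editor's sketch: the linking logic Steven describes (each mailbox theft asks only for mail since the previous theft) can be expressed as a check that the start-date filters chain together across incidents. All dates, labels, and the seven-day tolerance are hypothetical; Volexity's actual analysis is not published in this form.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Theft:
    label: str
    occurred: date         # when the mailbox export happened
    since: Optional[date]  # start-date filter on the export; None means no filter


def windows_chain(thefts, tolerance=timedelta(days=7)):
    """True when every later theft starts (within tolerance) where the previous one happened."""
    ordered = sorted(thefts, key=lambda t: t.occurred)
    return all(
        later.since is not None and abs(later.since - earlier.occurred) <= tolerance
        for earlier, later in zip(ordered, ordered[1:])
    )


# Hypothetical timeline loosely following the three incidents described.
incidents = [
    Theft("initial intrusion", date(2019, 9, 1), None),
    Theft("Exchange control panel", date(2020, 2, 10), date(2019, 9, 1)),
    Theft("SolarWinds backdoor", date(2020, 7, 20), date(2020, 2, 10)),
]

print(windows_chain(incidents))
```

An intruder with no knowledge of the earlier thefts would have little reason to pick exactly those start dates, which is why matching windows are strong evidence even when tools and techniques differ.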

Sonal: I was about to say, I still have goosebumps. That’s incredible. That was so good, Steven.

How the hackers covered their tracks

Joel: Pretty impressive analysis and work there.

The thing that really jumps out to me is that this is something that is linked together over a 4-plus year campaign, trying to maintain persistent access to the communications of high-value individuals.

I think the other thing that really jumps out to me is that they have a big data problem. They got access to tens of thousands of computers, and potentially thousands of organizations. It sounds like the kind of analysis that Steven has done is pretty unique. There aren’t a whole lot of people in the world that can do that sort of thing. And so, this is probably an incident that we’ll be continuing to understand for the coming months, if not maybe years.

There’s probably gonna be a really long tail on that. These people are still out there, they’re still operating. What are they doing now? That’s particularly concerning.

Sonal: It’s interesting because Martin Casado — you know, our general partner, who’s also a security expert — he mentioned to me that he thinks it’s super interesting how interactive the attackers are during the attack. Because it’s obviously a very sophisticated team of people gathering data and making chess moves in real time.

And it’s so fascinating because when we report and talk about and communicate these types of attacks, we kind of make it seem like it’s malware that does all the work — but it’s really the people that are at the center of it. And then on the other side of it, you have this whole interesting dance, on your end — as sort of this forensics expert with your team, going in, and trying to figure it out, and the puzzles, and everything involved.

Joel: Well, you know, I heard chess is popular now.

Sonal: <laughs> Queen’s Gambit, right.

Joel: This is exactly like playing a game of chess. The difference is that you don’t see the moves immediately — they get revealed over time, and then you’re left kind of piecing other things together.

Sonal: That’s exactly the analogy.

Steven: Yeah, I definitely agree that their goal was actually for their moves, what they did, to never be understood. You know, we noticed the versions of their software that were downloaded. There was an update to SolarWinds Orion (I believe it was in August of 2020) and that version wasn’t backdoored anymore. It didn’t have the malicious code. So we initially speculated, “Oh, did the bad guys remove it? Did SolarWinds find it? Did it inadvertently get removed?” We didn’t know how it was going down at the time.

So, they removed the code. They got in, got all this access, and basically said I’m gonna try and remove this now and, like, fly under the radar. So, if they had their way, they would have pulled off like the perfect caper, done all this stuff — no one would have known how it happened. And then the Orion product, basically, would have nothing malicious in it. <Sonal: wow> So, just a, kind of like an interesting other thing that they did.

Sonal: It is. It’s a very vivid contrast to the analogy of chess, especially given the popularity of Queen’s Gambit, when you see them recording their moves, and the spectators watching — it’s a real contrast to this idea that you’re literally making the move, peeling it back, making the move, peeling it back — it’s really stunning.

Open questions for the experts

Okay. So my next question before we talk about some things we can expect to see moving forward — what are some of the open questions still on the table? Like, we know SolarWinds was compromised, but the big open question there is obviously we don’t know how. Then the second big thing in the Microsoft post that I saw (and Steven Sinofsky pointed this out), which is, you know, they do this outline, but we still don’t know how the signed code was signed, so that whole idea of “sign the code” is a bit of a mystery still.

I want to hear from you guys, what are your open questions — or what are the open questions the industry is still looking at, or that people should or shouldn’t look at?

Steven: Sure, yeah. So, how was SolarWinds compromised? Obviously one of the open questions. You could spend as much time and resources as you want, you could use infinite resources, and you may never be able to answer that question because that system is gone. It was wiped. The logs are gone, or that was never logged, or it happened five years ago. So, I would say the scariest part of this is that people are finding out about this in December, for something that was operationally live in May. They had a looong head start on breaking into different organizations, doing that shopping list. And there are going to be, and there have been, from this very group, and as a result of the SolarWinds compromise, more supply-chain breaches.

Some people are breathing a sigh of relief, “Ahh! I didn’t run, you know, SolarWinds Orion software. I’m safe.” That’s not necessarily true. We’re not trying to sow fear, uncertainty, and doubt that everything is untrusted (at which point, arguably, you need to go to a typewriter and send pigeons now), but it’s IT companies, it’s security companies, it’s managed service providers, it’s managed security service providers. There are these different organizations that were running SolarWinds that then had this level of access to either directly get into networks, get into email, get into authentication systems, or to provide software or software updates or software downloads. The attackers 100% certainly had access to numerous networks and systems that would allow them to rinse-and-repeat SolarWinds, probably on numerous different scales, in numerous different ways. It doesn’t have to be through a build-time compile. It could be that they change a download, or they change an update process. They took keys, or secrets, or remote access protocols, or passwords that got them into other networks or other systems.

So, the scary part is that the supply-chain compromise here is just causing a chain reaction that’s probably already impacting other organizations that have no idea. I think that’s one of the biggest questions: who else was victimized that we don’t know about, and what do they do?

Sonal: So what you’re basically describing is, like, this complex, adaptive system — like, everyone sort of networked and connected trying to tease apart the scope and ripples of this is gonna take ages. And we might never, ever get to the bottom of all of that, because of that connectivity.

It’s interesting because General Paul Nakasone, or Nakasone — I’m not quite sure how to pronounce it — he heads both the NSA, the National Security Agency, and the military’s U.S. Cyber Command. One of the things that they talked about is that developing a coherent, unified picture — what you just described, Steven, of the extent of the breaches — has been difficult. The challenge is that, “He’s expected to know how all the dots are connected, but he doesn’t know how many dots there are, or where they all are” — which is kind of a distillation of what you just described. What are the other open questions that are on the table?

Joel: For me, the big open question — and with all of these really sophisticated breaches, the first is, how many stupid things led up to this? Like, how many ridiculously, easy-to-solve problems, like applying security patches, or using two-factor authentication — like, how many of those kinds of things we know we should always do are responsible for this — is always front of mind when we see this.

Because I think when you double-click on these, a lot of the times it starts off in a fairly innocuous way, which is, like, someone guessed an account, or someone got access to some account. But as this event shows you, if you give a sophisticated actor a toehold in your organization, they’re just gonna run through it. So, that’s the first one.

And then the second one is, we think of these breaches — because of just the way the media covers them, and the fact that they kind of show up sporadically — we think of them as, like, events in time that have a start and finish. But, in reality, these groups are still running, and we’re still facing them. You don’t know the implications of any of this stuff for a while. Like, you don’t know if they were getting into the Department of Energy to read, you know, Rick Perry’s old emails, or if they were getting in there to steal futuristic bomb designs. Maybe there’s gonna be some new weapon that pops up in 15 years and it’s, like, linked to this breach. And we’ve seen from these breaches — like, if you go all the way back to some of the first ones that have been publicly reported — you know we’ve often seen that the goal of these is either to spy on individuals and get some intelligence there, or to steal the designs for things that people want to go recreate.

Sonal: Right. And don’t forget that oftentimes — I think we often forget to talk about, when we talk about intelligence — it’s often in the form of blackmail, right? Like, we’re not just talking about stealing IP and obvious secrets.

Because a lot of people dismiss this as, “Oh, email. I just book events and share, like, photos with the family in my email.” I don’t think they realize that it’s such a vector to all these ways of really exposing who you are. It’s your identity, in many ways. So, that’s another way to think about that too.

Joel: Absolutely.

Sonal: Anything else on the open question side?

Joel: So, a bunch of other secondary breaches are now being reported on. Some of the Microsoft stuff, you saw that there were people creating reseller accounts, or trying to get reseller access to people’s Office 365 enterprises. And then there were certificates that were compromised for things like Mimecast, and maybe perhaps other services that are out there.

And so, like, this picture starts to emerge that there’s these — lots of fires just started burning. And it’s always really difficult to tell if it’s one fire massing together, or just a bunch of different people that are acting independently.

Sonal: That’s actually something I wanted to really quickly touch on before we go into the rest of this. Because the thing that was confusing to me is, okay, so I read the Microsoft post. You know, like, there are some intrusions. There was a partner for Microsoft, actually, that handles cloud access services. We don’t know how connected or not connected it is. Then you have a reseller gaining access to Microsoft customers’ Azure accounts. Then you have this reported Russian state-sponsored effort exploiting a VMware flaw that the NSA warned about last month, that takes advantage of a recently announced vulnerability in VMware Workspace One Access, Access Connector, Identity Manager, etc. And, this is according to the NSA, they’ve had at least one case where attackers successfully accessed protected systems by exploiting the flaw.

And then you have, like, you know, one after another, and they issued a patch. I mean, I am reading all these at the same time and I’m like, is it all the same thing or not? I think that’s what you’re saying, Joel, about — we don’t know if it’s all one fire, or a bunch of fires. And do you guys have any thoughts on how to connect those dots, if at all?

Steven: So, as a general statement, I would say what we know about this attacker that we call Dark Halo, the people behind the SolarWinds hack, is that they’re extremely adept in methods that allow them to gain access to email or systems involved with email. So, things like trying to get access to an Office 365 or Azure AD environment through a partner organization. Or by stealing some, you know, SAML tokens or some kind of authentication mechanism. Or, trying to get access through some other party, possibly through a vendor, to get access to that same data, to email data, essentially by any means necessary. I would say all of those are very on par with what we’ve seen this attacker do and focus on, and what others have seen. There’s a very good chance that they are related.

But even if they weren’t, it just kind of underscores that there’s a lot of people trying to get access to this data. And now you need to focus a lot more on the cloud, on the technologies that are used to secure the cloud or that have access into it. And the things and places where people don’t always look — because it’s new to them, or they never looked at it, or they didn’t know to look at it — so, I think this event will actually end up advancing security in many ways, because it’s causing people to think about and do things that they weren’t realizing before. And as you can see, the bar’s been set higher to where they can’t walk right in the front door anymore, right? They’re not easily able to get right into these organizations by compromising, you know, the core network or the system administrator and the other ways which you could get there.

So, in some ways, it’s a sign that security has improved a lot — but also that there’s a massive amount of work to do at the same time.

Sonal: It makes me again think of the chess analogy, and when you have a player that comes to the table that has a set of moves, like, patterns that are well beyond what the human mind can even comprehend. And that makes me think a little bit of even, like, AlphaGo playing Go with a real Go player in Korea. And how, you know, the system made moves that they considered very alien, that a human being would never have made, but that still followed the rules of the game (the constraints of the game, that is) and yet were completely novel. And if you just keep seeing more and more moves kind of grow and become more and more sophisticated on both sides, even as we may improve, like, there are gonna be alien moves at some point.

Steven: Well, to be completely honest, they’re undoubtedly highly skilled and disciplined — which, if you think about it — okay, if we go back to the chess analogy. You know, are they a master, are they a grandmaster? In some ways you can say, okay, they’re a grandmaster — but most of their opponents are unranked. So, they have this kind of lower skill, and their strategy is easier. But then they’ve been able to go to these people who maybe their security defenses are much higher ranked, and they’re using that skill set, that knowledge, and that kind of cat-and-mouse, to still get into those organizations. But to have to do that, it shows that people have leveled up quite a bit — which is a good thing for these companies and the security industry.

But at the end of the day, they still managed to either capture that king or get them to knock it down. I guess no one’s really thrown in the towel. No one has surrendered, that I’ve seen so far. But, I would say they’re winning a lot of matches, and they’re playing a lot of them simultaneously.

Sonal: Right. But they are not (to be clear, to your point), an alien player, like an AlphaGo. They’re still moves that are human, just very skilled.

Steven: At least from what we’ve seen, but who knows, like, what we’re missing though, right?

Broader security trends

Sonal: Right. Okay, so now the big picture questions. We’ve covered what happened, how it happened, the details. We talked about this, you know, phenomenon of supply chain attacks, chain-of-chains, what it means. I would love to hear what you think about this, when you think about the broader trends at play.

Joel: Yeah, absolutely. I think on the podcast several times — I know I sound a bit like a broken record — but we’ve talked about the biggest challenge being securing the supply chain. And how all these businesses that are becoming software businesses are actually becoming reliant on other people’s software. And so, it’s not just a matter of the stuff that you write to run your company, it’s also the matter of the stuff that your suppliers are writing.

And as everyone knows, security is really difficult, and it’s hard to secure your own things — and then having to worry about the security of your suppliers is adding an additional layer of complexity.

And so, over the last couple of years, there’s been a lot of investment in trying to understand third-party risk management, vendor-risk management. How to glue these things together. There are several different approaches, everything from private systems that will look for vulnerabilities and report on the risk. There are publicly available standards, different trade groups are trying to develop their own standards for security — and then certain vendors are trying to come up with their own standards. There is no easy answer, and so what you’ve got is a lot of different approaches that are being tried, and a lot of experimentation that’s taking place. This is probably the first breach at such a size, scale, and scope. So this is kind of the watershed moment for that third-party risk management.

And there’s any number of other suppliers that are out there that are in very similar positions, right, and it could be a company like SolarWinds, or it could be an open-source repository that a bunch of people are building into their applications. There are any number of different ways.

The thing that’s really difficult for me — based on where I sit and what I see — is if you play through all the different potential solutions that are out there, it’s really hard to know which one of them would have actually prevented this? So, like, if I went to any of SolarWinds’ customers and said, “Hey, what’s your vendor risk-review report on SolarWinds?” You know, before the breach, I’m sure they would have said it was a wonderful company, it was doing everything, they passed our review, they answered our questionnaire. You know, they’ve got the people hired, they have a program. And so, it really comes down to how do you actually measure these things, and how do you measure the risk in that third party, and how do you effectively mitigate against it?

Steven: The third-party risk or the vendor-risk management or how that someone evaluates this — it can only go so far, right? Like, how would you evaluate SolarWinds and the Orion product any differently than you would Microsoft Windows and Defender and how it updates and things like that, right? So there’s limitations to what you can do. I mean, you could audit them, or find out their code-review process and all that stuff — and they could have passed that all with flying colors. Or does your checklist say, “Are you looking for advanced adversaries, you know, injecting themselves into your build process at the highest levels of sophistication and espionage?” But even if they check yes to that, which they might not, they probably aren’t having an effective way or mechanism to do that.

Sonal: One of the things that Alex Stamos — people tend to over-quote him, but he did have a good tweet about this, which is — “There is no good reason for most enterprise software products to talk to random internet hosts all day. It might be time to move on to an outbound network-permission model for Windows servers, so connections only allowed to domains and signed manifest plus internet as defined in GPO.” Is that the right thing to do? Should people be air-gapping? Like what should people be doing?

Steven: We deal with sophisticated breaches all the time, and this can even apply for, like, crimeware and other stuff, but that is a recommendation that Volexity has been giving for years and years and years to organizations. And it’s often in an incident that we say, “Hey, your domain controller, for example, doesn’t need to be able to talk to the internet.” There’s obviously exceptions to the rules and everything, but usually those can be defined, especially with next-generation firewalls or modern firewalls — you can define what is actually needed, and allow them to do those things, and not allow them to do anything they’re not explicitly required [to do].

And that’s a least-privilege model, it’s like the least-access type model. That’s a little bit harder, depending on your organization, to enforce for users and workstations where they need to browse the web and do all this stuff. And that’s what content filters and certain restrictions are for. You know, unless you’re in, like, a DOD environment or something where it’s a lot more locked down. But that’s usually accepted in a lot of, like, commercial organizations.

And a server is where — an attacker, if they’re gonna install malware and do things — usually go for it, because that’s where the supply chain, that’s one of the big areas to get to it. Or those are the machines that are not at home, or requiring a VPN. They’re always on, you know, they don’t get rebooted frequently. That’s where malware gets installed a lot, because it’s something that they can count on, and it’s regular. Being able to prevent that, and limit what those can do — that model, if that had been put in place for organizations with SolarWinds — in this specific instance, it would have mitigated that threat.

Now, if I start thinking outside the box, and this attacker used DNS — but what if they had done command and control activity, and issued commands, and had done that all over DNS? So, the SolarWinds server talks to its local DNS server, your local DNS server goes out to the internet. If they had modified this malware and actually did all the command and control over DNS, instead of doing it over this connection, that paradigm and that shift would have been a lot more difficult to mitigate.

But that’s the type of issue and security item we need to think about. You could proactively try to address that, or just say, “Hey that’s a lower likelihood, and I’ll address it if that happens.” But by and large, it’s a best practice with regards to minimal access, specifically for servers connecting to the internet and different resources.
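The least-access model Steven describes for servers can be sketched as a default-deny outbound check. This is only an illustration under stated assumptions: the server roles, the `EgressRule` type, and the `.example` destinations are hypothetical and do not come from any real firewall product or from Volexity's guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    host_suffix: str  # destination domain this server role explicitly needs
    port: int


# Hypothetical per-role allow lists (illustrative names only).
ALLOWED_EGRESS = {
    "domain-controller": [],  # needs no direct internet access at all
    "monitoring-server": [EgressRule("updates.vendor.example", 443)],
}


def is_allowed(server_role: str, dest_host: str, dest_port: int) -> bool:
    """Default deny: permit only destinations explicitly listed for the role."""
    dest = dest_host.lower().rstrip(".")
    for rule in ALLOWED_EGRESS.get(server_role, []):
        suffix = rule.host_suffix
        host_matches = dest == suffix or dest.endswith("." + suffix)
        if host_matches and dest_port == rule.port:
            return True
    return False
```

In practice this policy would live in a firewall or proxy rather than in application code. As Steven notes, a check like this would not by itself stop command and control carried entirely over DNS, because the server only ever talks to its local resolver.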

Joel: It’s funny talking about this, because it’s like the history of the security industry is the history of unreasonable requests.

I know that a lot of people are jumping up and down talking about, like, don’t let production talk directly to the internet. And if you worked at a bank you know for the last 20 years, that’s been the case, right. Like, highly-regulated industries, and people that have invested heavily on security, have always focused on doing these rather idiosyncratic things that don’t make a lot of sense — but made a lot of sense to people who either come from an incident-response or a deep security background.

You know, back in the 90s, I remember being involved in strenuous debates about why you need to encrypt traffic moving within your data center. And everyone thought it was the most asinine thing because it’s a private link. You’ve got MPLS, no one’s gonna listen to you — and then Snowden released his documents. And it became really obvious why you want to encrypt your data within your data center.

So, this is just another example where people have been giving best-practice advice, saying, “Hey, you need to make sure that random servers, random production systems, can’t just talk arbitrarily to the internet.” And the response to that has generally been well, that’s an unreasonable request, that takes a lot of work, I don’t know that we necessarily wanna do it. And, there was never a particularly great reason or piece of evidence to point you to say “well this is why”. So, this is why — why you wanna limit that access. And there’s probably a list of other things that are equally unreasonable requests that security people would ask you to do, and eventually they’re gonna have their “this is why” moment.

Steven: Something that Joel mentioned earlier, which I think is really important, is — a lot of organizations aren’t doing blocking and tackling. They don’t have two-factor authentication on the remote access to their network. They’re using weak passwords, they’re not patching. They don’t know where their assets even are, and their build process is not secured. They don’t even do code auditing or check-in their code. I mean there’s a lot of low-hanging fruit for most organizations. They haven’t even been able to kind of get into some of the basics.

But I think a big problem that a lot of organizations are facing (whether that’s a government, a commercial organization, or really anyone, whether they’re a small company or these massive companies with huge budgets) is that if you had certain security data, you could immediately and very easily answer, “Did I have a problem?” One, did I run vulnerable software? Because maybe you patched. You know, I don’t know. Maybe I never ran it, and I skipped a version. If you had all your DNS queries logged, and the responses, you could ask, did I get a CNAME? Did I even call out to that command and control infrastructure? There are certain logs from the endpoints that SolarWinds has instrumented in this event-log data. If you had been capturing that data, you could answer that question.
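If query-and-response DNS logs had been retained, the "did I get a CNAME?" question could be answered with a short search. The sketch below assumes a hypothetical log format (one JSON object per line with `timestamp`, `client`, `query`, and `answers` fields) and uses a placeholder domain; real indicators would come from published advisories, and real resolver logs will look different.

```python
import json

# Placeholder indicator (assumption): substitute domains from published advisories.
SUSPECT_SUFFIXES = ("c2-domain.example",)


def matches(name, suffixes=SUSPECT_SUFFIXES):
    """True if the DNS name equals a suspect domain or is a subdomain of one."""
    name = name.lower().rstrip(".")
    return any(name == s or name.endswith("." + s) for s in suffixes)


def find_hits(log_lines, suffixes=SUSPECT_SUFFIXES):
    """Return records whose query or CNAME answers touch a suspect domain."""
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        cnames = [a["data"] for a in rec.get("answers", [])
                  if a.get("type") == "CNAME"]
        if matches(rec["query"], suffixes) or any(matches(c, suffixes) for c in cnames):
            hits.append({
                "timestamp": rec["timestamp"],
                "client": rec["client"],
                "query": rec["query"],
                "cnames": cnames,
            })
    return hits
```

Keeping the responses as well as the queries is what makes the CNAME check possible; a query-only log could show that a lookup happened but not what came back.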

Sonal: Most companies do capture that data, don’t they?

Steven: It depends. If you went into SMBs and mid-sized businesses, even some large businesses, I would say a lot of them aren’t actually logging or keeping DNS logs. And if they are keeping DNS data, it may not be query-and-response. And event logs — the vast majority of organizations don’t have a centralized and long-running retention policy for event logs. But even if they do, their data retention of how long they were keeping this data did not go back far enough.

They actually have data going back 30 days, or 60 days, or 90 days, so they’re finding out in December about a breach, about activity that happened and was potentially initiated in May. And, “Oh, I kept all this great data, but I can only go back three months.” Three months from December, it’s September. And for a breach that happened in June or July, that’s, in some respects, useless. That’s a scary place to be in, to not know if you were compromised, or, if you were, when it started, or what happened, or where they went, or how they pivoted. It’s a missed opportunity, and probably a bit scary for some of these companies: “I was collecting all the right data, but I didn’t have it for long enough, so I don’t actually know.”
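The retention arithmetic here can be worked through directly. The dates below are assumptions chosen for illustration (a mid-December discovery and attacker activity starting in early June); only the 30, 60, and 90 day windows come from the conversation, and the 365-day window is added for contrast.

```python
from datetime import date, timedelta


def earliest_retained(discovery: date, retention_days: int) -> date:
    """Oldest day still present in the logs on the day of discovery."""
    return discovery - timedelta(days=retention_days)


def covers(discovery: date, retention_days: int, activity_start: date) -> bool:
    """True if retained logs reach back to the start of attacker activity."""
    return earliest_retained(discovery, retention_days) <= activity_start


discovery = date(2020, 12, 13)      # assumed discovery date in December 2020
activity_start = date(2020, 6, 1)   # assumed start of attacker activity

for days in (30, 60, 90, 365):
    oldest = earliest_retained(discovery, days)
    print(f"{days:>3} days: logs start {oldest}, "
          f"covers activity: {covers(discovery, days, activity_start)}")
```

Under these assumed dates, none of the 30, 60, or 90 day windows reaches back to June, which is the gap Steven is describing; only the year-long window does.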

Sonal: Wow.

Steven: We’re helping a lot of companies right now to see what resources they have. You know, we specialize in memory forensics. We’re acquiring memory from their SolarWinds server, acquiring disk artifacts, or full disk images, you know, any log sources. And we have some stuff that we can potentially go in and say “doesn’t look like it” or “definitely, yes you were.” You know, we see these items that clearly indicate that you got a second-stage breach, and you need to expand this out. But we can’t give anyone, if they’re on limited data, a confirmed clean bill of health.

Sonal: It’s a little bit like going to the doctor and having, like, maybe a continuous glucose monitor for the last year — but you only have the data for the last three weeks stored. And it’s sort of like, “Okay, here’s what’s happening. I’m getting sick, but I only have the three weeks.” It’s just, like, a really tough thing to figure out.

Advice for businesses and consumers

I wanna break this down by advice for big companies — like, large enterprises — advice for small and medium-sized businesses, and advice for consumers. So, let’s start with the big companies, because the best threat actors, they understand the reality of modern enterprise IT. What are pieces of advice — or mindsets, even — that you have to offer for how chief security officers, CEOs, leaders should be thinking about the implications of this for their business?

Joel: I mean, I’ve spent a lot of my career in big companies, and I think the thing to do right now is to think about strategy. Like, the tactics are great, and there’s gonna be a lot of people chasing a lot of actions over the next days, weeks, months. But I think the strategic view of how an organization wants to think about security — as we start to understand what happened, and how it happened — we’ll consistently see in some organizations that security either wasn’t funded, it wasn’t empowered, it didn’t have a remit to act. It may have been under assault. People often view security as being a cost center, as something that you know contributes to the lack of performance in a business. And that is an attitude that is still quite popular.

So, I would say that, like, it’s really gonna be about figuring out strategically, where does security sit, what’s the right amount to spend on it, how do you effectively empower it, and then how do you partner and build security into your business so that it’s something that helps enable it, versus something that holds it back.

Steven: Yeah. Generally, no one really thinks like security is not important. I don’t think we ever hear that. Now, action may speak louder than words sometimes. But I think a lot of people think about, “Oh, it’s an afterthought. I’m gonna add it later,” or “Oh, yeah, yeah, well, you know, we’ll do that one day.”

And I think, like, our main advice to a lot of these different organizations (whether it’s a startup or a midsize company, or a company that’s growing really rapidly) is not necessarily that they need to come out of the gate and have every imaginable security product, that they need to be auditing all their source code on day one, that they need to have everything locked down, and the latest firewalls, and this filter and all these EDR products. But it’s like, think about that stuff. Are you doing the two-factor? Are you lazy, like, “Ah, I don’t need to put, you know, two-factor on my Salesforce account where all the most sensitive contacts and information in my organization are.” Or, “Ahh, I don’t really need to put it on email. It’s like, it’s easier if everyone can just log straight in.” Or, “I’m just gonna share this root, you know, Amazon key to get into AWS, because that’s just how our organization’s growing, and we’re not formal.” There are things that people can do, best practices, actions that organizations can take. See what you can do now, see what you can do along the way, and put that on your radar, so you’re not in a position where you’re starting from scratch, or trying to investigate a breach, or figure out if you even had a breach. We all knew [what] we should have done, and we knew that two years ago. And we run into that, a lot.

Sonal: Don’t wait till later. And now advice for consumers, like, just day-to-day people like family members, etc. What would your advice be for how to think about things like this?

Joel: We wrote a really excellent blog post last year called the “16 Things You Can Do to Protect Yourself,” and I would strongly recommend that people do all of those 16 things. It’s all really basic stuff, and it starts with two-factor authentication, patching your systems, and goes all the way down to how you want to think about securing your potential social media accounts, etc. So.

Steven: Yeah, we issued some guidance, and it’s a couple of intersections of prevention and detection, and then remediation, if you have an actual threat or concern.

From the prevention side, prevent unnecessary access from your servers (like your SolarWinds server, other devices) to the internet. That’s a prevention mechanism. You know, monitor your assets, see where they’re logging in from, if you have that centralized logging or, like, a SIEM, same thing. Make sure that, either from event logging or from your endpoint security products, the actual commands being run on the system are being logged. Because that can be pivotal and critical to detection. And even if you’re not actively monitoring it, you can go back and say, “Hey, what commands are running on this server that aren’t consistent with what our system admin or the typical activity would do?”
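Going back through logged commands to find activity that is inconsistent with normal administration can be sketched as a simple baseline comparison. Everything named here is hypothetical: the event fields (`host`, `user`, `command_line`), the baseline set, and the sample executables are illustrative, and real event-log or EDR exports will need their own parsing.

```python
# Hypothetical baseline of executables the admins normally run on this server.
BASELINE = {"powershell.exe", "ping.exe", "ipconfig.exe"}


def executable_name(command_line: str) -> str:
    """Crude extraction of the program name (breaks on paths containing spaces)."""
    parts = command_line.split()
    if not parts:
        return ""
    return parts[0].replace("\\", "/").rsplit("/", 1)[-1].lower()


def unusual_commands(events, baseline=BASELINE):
    """Return logged events whose executable is not in the known baseline."""
    flagged = []
    for ev in events:
        exe = executable_name(ev["command_line"])
        if exe and exe not in baseline:
            flagged.append(ev)
    return flagged
```

A baseline of program names is a blunt filter, since a skilled attacker can use the same administrative tools the admins do. It mainly narrows down what a person then has to review.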

But take a look at your mail server, look at where your email is going, because that’s where the attackers, I believe — they’re way ahead of the game with regards to the things that they can do in Office 365 and Azure AD, where they are so familiar with the administrative commands and what to do from a sys-admin aspect. They’re able to do a bunch of things and hide in ways that people have never even thought about and encountered. And it’s not necessarily, like, they’re ghosts or they can’t be found, people just don’t know to even look for it.

And then, just from a general remediation perspective — once a device has been backdoored or compromised, it’s an untrusted system now. Don’t just, like, roll back to an earlier version, or, I’m just gonna upgrade to the new version. We say, hey — blow that whole system away. Start with a fresh, clean install. If you’re putting SolarWinds Orion back on it, download the newest version that’s not backdoored, and start everything from scratch.

Anything you used on that server, if your SolarWinds setup for Orion had credentials, change all those passwords, and make sure those passwords aren’t similar to, like, old passwords that were used. You know, another thing, too, is any sensitive API keys, integrations, and things like that. We saw a two-factor bypass to get into email by this threat actor, because they had taken a secret key, and would generate cookies and skip into the email system while not actually being challenged for two-factor.

You’ve got to think about the stuff that someone could steal if they’re in your network, related to this — but also that advice extends well beyond this threat actor and SolarWinds specifically.

Sonal: That’s great. I’ll include links to Volexity’s blog posts as well as the “16 Things That You Can Do to Secure Yourself” in the show notes. Bottom-line it for me — what’s your takeaway?

Joel: It’s consistent with what we’ve been saying for a while now. The hardest problem to solve is third-party risk, and this is probably the most significant third-party breach that we’ve seen in history. And so, I think it’s gonna take us months to really understand what happened, and probably years to fix it.

Sonal: Thank you so much, you guys, for joining this episode of 16 Minutes, which is a 3X 16 minutes.

Steven: Definitely, thanks for having me.

Joel: Yeah, thank you so much. And, Steven, it seems that we’re always catching up when the world is burning down.

Posted January 31, 2021