Causality 52: Colonial Pipeline

10 December, 2023


In 2021 many of Colonial Pipeline's IT systems were locked by malware and, out of caution, the company shut down the fuel pipelines feeding nearly half of the Eastern US, leading to chaos at the gas pump and a state of emergency being declared. We look at how poor off-boarding hygiene led to an easily preventable cyber-attack.

Transcript available
(soft music) - Chain of events, cause and effect. We analyze what went right, what went wrong, as we discover that many outcomes can be predicted, planned for, and even prevented. I'm John Chidgey, and this is Causality. Causality is entirely supported by you, our listeners. If you'd like to support us and keep the show ad-free, you can by becoming a premium supporter. Premium supporters have access to high quality versions of episodes, as well as bonus material from all of our shows not available anywhere else. Just visit to learn how you can help this show to continue to be made. Thank you.
Colonial Pipeline. The Colonial Pipeline Company, based in Alpharetta, Georgia, in the United States of America, owns, operates, and maintains the largest pipeline system for refined oil products in the United States, and has approximately 950 employees in total. The pipeline system consists of a primary trunk line with multiple spurs as it traverses between Houston, Texas, and the Port of New York and New Jersey. The pipeline actually consists of three separate main pipelines run in parallel, with construction beginning in 1962, to its fully finished length of 5,500 miles, that's 8,850 kilometers in total. The pipeline connects Texas, Louisiana, Mississippi, Alabama, Tennessee, Georgia, South and North Carolina, Virginia, Maryland, Delaware, Pennsylvania, and New Jersey. The three main lines are 1,000 millimeter or 40 inch and 910 millimeter or 36 inch in diameter, one primarily for gasoline, with the other two carrying different blends, including diesel, jet fuel, and heating oil. The pipeline connects 15 oil terminals, including airports along its route, with an aggregate storage of more than 4.5 gigaliters, that's 1.2 billion US gallons of fuel, which provides approximately 45 days of local supply under normal consumption rates. About half of all fuel consumed on the East Coast of the United States is conveyed by this pipeline system.
Colonial is also the largest refined petroleum pipeline operator in the United States by volume.
Now let's talk about the incident itself. At approximately 5 a.m. US Eastern Daylight Time on Friday, the 7th of May, 2021, a ransomware note was observed on a business network computer by a Colonial control room operator in the Colonial Pipeline's Alpharetta, Georgia control room. It demanded a ransom of 75 Bitcoin, which at the time was worth 4.4 million US dollars, to be paid in exchange for a decryption key to unlock multiple IT systems the hackers had encrypted. In addition, it claimed to have stolen 100 gigabytes of data and threatened to release it onto the internet if the ransom wasn't paid. The message was reported immediately to the control room supervisor and, after a lengthy consultation with the Colonial IT department, the decision was made under Colonial's stop work authority to begin shutting down the pipeline at approximately 6 a.m. local time. At the time, the organization had separate IT and OT systems. However, the CEO stated that, and I quote, "We did not know the point of origination of the attack nor the scope of it, so bringing the entire system down was the surest way to contain any potential damage." End quote. Joseph Blount, the president and CEO of Colonial Pipeline, authorized the payment of the ransom as demanded in Bitcoin late that day. However, it was not actually paid until the following day. Once paid, the hackers provided a decryption tool that could be used to restore machines that had been locked out by the malware. As news of the shutdown spread, it triggered panic buying and, with fuel supply now constrained, pushed fuel prices above $3 US a gallon. Five airports were impacted, with some flights canceled and others redirected with additional fuel stops now required.
On Sunday, the 9th of May, 2021, the Department of Transportation, Federal Motor Carrier Safety Administration, issued an emergency declaration covering 17 states as well as Washington, DC to ensure all available fuel supply lines remained open. Also on Sunday, the US president, Joe Biden, declared a state of emergency that allowed the lifting of restrictions on how much fuel could be transported by road, rail, and sea to alleviate the fuel supply constraints.
Now back to Colonial. With the decryption tool at hand, on Saturday, Colonial and Mandiant began restoring IT systems and gradually confirmed that the OT systems were clean. The decryption tool provided by DarkSide was found to take longer to execute than a restore from backups, and hence by this time, since the IT and OT backup systems had been determined as being clean from any malware, they chose not to rely on the decryption tool after all. During the shutdown period, large numbers of staff moved some product manually from the field terminals where supported by the system in some locations. Approximately 12,000 gas stations were affected directly by the shutdown, with 71% of stations in Charlotte running out of fuel on the 11th of May, four days after the shutdown, and 87% in Washington, DC on the 14th of May, one week later. Once Colonial were confident that there was no remaining compromise to the OT systems, they began a restart process on the 12th of May, 2021. During the restart process, which took over 24 hours, the company added more air surveillance, as well as many personnel on the ground to visually inspect the pipeline performance, traveling some 29,000 miles in total for all personnel. Operations were fully restored on the 13th of May, 2021, though supply levels took several days to settle back to nominal levels once again.
Let's talk a bit about the investigation.
On the 7th of May, 2021, on the morning of the attack, Mandiant were engaged by Hunton Andrews Kurth LLP on behalf of the Colonial Pipeline Company. The investigators found the earliest evidence of compromise to the business network occurred on the 29th of April, 2021, only eight days before the attack. The attackers had gained access to the Colonial Pipeline's business network by logging into a virtual private network or VPN appliance. A former Colonial Pipeline employee's account was used, and this particular profile was a legacy VPN profile on an older VPN appliance that did not require multi-factor authentication. The former employee had used the same password for accessing the VPN account on a different website whose account passwords had been compromised and shared amongst many others on an underground or dark web forum. An interesting point of note, however, was that the password actually did comply with Colonial's strict password complexity requirements, including having mixed case, symbols, and being of a sufficient length. The legacy VPN profile was disabled as part of Colonial Pipeline's remediation process. Several IT server machines were impacted. However, there were no reports anywhere of OT systems being impacted.
One of the issues with this incident is that there is no publicly available report on the exact details of the incident that I can find. I'm friends with people inside different related companies and have asked to see if there's an internal copy of that report that I could look at, but there isn't. Even if there was a private report that I could get my hands on, Causality is based on publicly available information that everyone can learn from. It's not about leaking internal reports. As best I can tell, the Colonial Pipeline Company have not released nor authorized the public release of the official report into the exact details behind how the hack progressed specifically.
Everything in this episode is based on public testimony, lawsuits, news interviews, and information that has been made public. When we look at the prepared statement by Charles Carmakal, SVP and CTO of FireEye Mandiant, before the United States House Committee on Homeland Security on the 9th of June, 2021, it was so high level and vague as to tell me practically nothing. Reading through the hearing before the Committee on Homeland Security, House of Representatives, where both Carmakal and Blount faced questions from a committee with, shall we say, a somewhat less than technical background, was borderline farcical insofar as there were about 300 useful words out of the 33,500 spoken. Save yourself and don't read that transcript, just don't.
So what went wrong? There are three components I'd like to explore here. User account practices, service migration, and IT/OT segregation.
Let's start with user accounts. The account that was used was that of a former employee. They'd left the company sometime earlier and yet their account remained active. When an employee leaves a company, there is usually a checklist you follow as a manager. It's your responsibility to ensure the now former employee turns in their company issued equipment like phones, laptops, and so on. IT then archive their emails, documents, and so on. And then someone cancels all of their accounts, access passes, and so on. Where systems are linked into an Active Directory system, which these days is almost everywhere, disabling accounts and locking them, even with a basic idle timeout for their profile, would be enough. Of course, without more detail, we can't be certain if an idle profile timeout would have come into play here since we don't know how long this ex-employee had been away from the company. Having said that, truly organized companies don't rely on timeouts.
They will either manually trigger a disable script for that login at the end of their last day, or they'll use an integration system like Saviynt or another similar product in the identity and access management or IAM space. IAM tools allow workflows for adding and removing access privileges between groups with email approval and cancellation workflows. And it makes that very easy to achieve and to do it in a very timely manner. In this case, it's clear that such a system was not in use and neither the IT department nor the ex-employee's manager pulled any manual triggers or processes, or if they did, those processes failed to off-board the ex-employee.
Let's talk a bit about service migration. Whilst it's not precisely clear from Mandiant's public commentary whether or not a new remote access system was migrated to or whether an external two-factor authentication system was integrated into the existing remote access system, it's still useful to consider service migration. In my time at the current employer, we shifted from a non-2FA VPN to two different MFA systems between IT and OT. When we migrated from the old system to the new system, the use of MFA was only just becoming popularized around the world, and there was some resistance from some users being annoyed that they needed to always have their phone if they wanted system access remotely. They should have tried the RSA key rotating codes. They were fun. Ultimately, though, the migration took several weeks before all access was forced via MFA, and the complaining died down. Everyone got used to it. No mess, no fuss. We move on. In the case of Colonial, it's clear that a non-MFA system was in use in the year or so before the incident, so adding MFA was relatively recent for them. That alone speaks volumes, considering this was 2021.
Whether it was a new system or an extension to the existing VPN system, just tying it in and enforcing MFA use, to complete that cutover I would have expected all user accounts to have been forced to use MFA on the new system, no exceptions. If it was executed as a script, it's possible there was an item that excluded unused accounts, but then if there had been a timeout policy, the account would have been disabled. Irrespective of what actually happened in this specific situation, to complete a migration, all accounts must be migrated or disabled, no exceptions, and when you're done, if you've gone from old to new, kill the old one, kill it now. Once it's just sitting there idle, it becomes a very likely unmonitored backdoor into your system, and that's not what you want.
Let's talk a little bit about IT and OT segregation. It's been an interesting decade I've witnessed with OT systems, traditionally kept completely separate from IT systems, with IT departments sticking with IT, and OT being run by engineers usually. There are two sets of machines in an operational control room, IT on the one side, OT on the other, separate keyboards, mice, monitors, and never the twain shall meet, or something like that. However, with the push for more remote work and more remote access, and with the cost of labor increasing all the time, there's been two pushes that have been dragging IT and OT together, and not always in a good way. The IT department generally have more experience in dealing with operating systems, patching, computer equipment. Although it's not a hard and fast rule, it's a generally fair statement, but with less understanding of the risks and pressures of dealing with mission-critical systems that must function 24/7. The OT department have become more strapped for cash and labor, and hence these groups are increasingly brought together, and the pressure for OT systems to look and act more like IT systems is increasing every year.
More connections between the systems means more firewall rules. There's even control system software running in some cloud environments via direct connects and such into virtual private clouds that extend the OT IP address ranges far beyond the traditional end of the formerly standalone OT system. This solves some problems, but creates all new ones in the process, particularly around segregation, if it's not done properly. What's unclear in the Colonial case is how far their systems were in fact segregated. It's not clear if DarkSide chose to leave the OT alone, or whether they were unable to penetrate the OT system laterally. Beyond firewalling, there should at least be two separate login systems, one for each, such that one user account should have a different password and potentially also a different username, with access via independent portals. Again, not clear if this was the case, or even if the ex-employee was an OT engineer, but irrespective, it's still good advice.
So who was behind this? DarkSide was, that's a possible spoiler there, a ransomware as a service, or what is becoming referred to as an RaaC, ransomware as a corporation, that enables a network of different groups to conduct cyber intrusions operating under the name DarkSide. Like many other financially motivated threat actors, the criminals affiliated with the DarkSide service conducted multifaceted extortion schemes to coerce their victims into paying large amounts of money. They extract data from the target, deploy DarkSide ransomware encryptors, and threaten to publish stolen data as an added incentive for payment. Since initially surfacing in August 2020, DarkSide had launched multiple attacks impacting organizations in at least 15 countries that are known of, and attacked multiple industry types.
Some have seen that DarkSide are different from many other ransomware groups where they state on their website that they, and I quote, "Will never target critical and vulnerable bodies such as schools, hospitals, or even governments," end quote, further stating that they will focus on targeting for maximum financial revenue. Well, that's nice. DarkSide created target-specific malware for an affiliate to then carry out the attack in exchange for a percentage of funds earned from the attack. It's an interesting business model. There's a few other concerns, however, that have come to light as a result of this incident. Yet the problem has been that not all private companies are reporting openly and honestly for a myriad of reasons, both political or perceived. Transparency versus secrecy is a problem. In the United States, the Transportation Security Administration, or TSA, set about establishing a security directive and requirements for the US pipeline industry. Additionally, they offered review and consultation to many pipeline operators in the years leading up to this incident, including to Colonial. The TSA had contacted Colonial multiple times to undertake both a validated architecture design review, or a VADR, and a critical facility security review, CFSR. However, Colonial stated that it had been, and I quote, "Simply a function of timing on when to do the assessment," and that there had never been a refusal, end quote, to meet with the TSA. According to the TSA, Colonial had postponed a CFSR multiple times since March of 2020, and a VADR since October 2020 as well. Whilst these requests came during the height of the COVID-19 pandemic, virtual assessments were suggested by the TSA. However, these were also postponed by Colonial. Their next planned VADR was in July of 2021, some two months after the incident had occurred. During the hearing, upon Colonial's CEO repeatedly denying that Colonial had declined the assessments, Mrs. 
Bonnie Watson Coleman of New Jersey put it quite plainly, and I quote, "Delaying these assessments for so long amounts to declining them," end quote. Yep, pretty much.
Colonial also had no CISO, Chief Information Security Officer. CISOs have been a formalized role since about 1995, when Citigroup appointed Mr. Steve Katz, whose role was to maintain the security of information and operations within the company. The CISO would be a common point of contact in such a scenario. However, it was unclear who that common point in Colonial should have been. CISOs also help to focus, coordinate, and seek funding for cybersecurity improvements, educational campaigns, and a set strategy to address emerging cybersecurity threats. It's becoming increasingly rare for large organizations not to have a CISO.
Colonial also did have an emergency response process, which had pre-planned responses to various threats and potential situations. However, it did not specifically call out a cyber attack with a ransom. I'm not entirely sure that's a specific issue at the time for this specific case. However, as this sort of scenario is becoming more and more prevalent, as it has recently, it might be worthwhile extending your ERPs to include that scenario if you haven't done it already.
There's a lot of equipment in the private sector. And during the hearing, a member of the committee stated that there's an estimated 85% of the United States' critical infrastructure being operated and maintained by the private sector, with most of those companies having OT equipment connected in some manner to the internet. Going back 50 years, most of it would have been government-run, in whole or in part, at least. But the push for privatization changed that mix considerably. But then it wouldn't have been connected to the internet 50 years ago 'cause it didn't exist. I digress.
Why that's a potential concern is that public companies are often listed on the stock market or have investors, whereby cyber attacks represent negative publicity. Either way, sharing that information publicly is against their business's best interests. Mind you, not getting into that situation in the first place could avoid bad publicity too, but oh well. Hence, governments need to introduce more legislation that requires disclosure. Otherwise, attacks like this will only get more frequent and much worse. Having said that though, going back to a government-run set of infrastructure departments, I'm reasonably confident that bureaucracy would also impede any free sharing of information too. So maybe that'd probably be no better.
Let's talk a bit about the aftermath. On the 7th of June, 2021, the Department of Justice in the United States announced that they had managed to recover 63.7 Bitcoin from the ransom payment. Unfortunately for Colonial, the value of Bitcoin had plummeted in that intervening period to almost half of its value at the time of the ransom payment, meaning they only recovered 2.3 million US dollars in the end. On the 9th of May, two days after they demanded a ransom from Colonial, the DarkSide group posted a statement which said, and I quote, "Our goal is to make money and not creating problems for society. From today, we introduce moderation and check each company that our partners want to encrypt to avoid social consequences in the future." End quote. Following the security incident at Colonial Pipeline and the FBI's public attribution to DarkSide, Mandiant has observed multiple actors cite an announcement on the 13th of May, 2021, which appeared to be shared with DarkSide affiliates by the operators of the service less than one week after the attack. And the announcement stated, "A couple of hours ago, we lost access to the public part of our infrastructure, in particular to the blog, payment server, CDN servers.
At the moment, these servers cannot be accessed via SSH and the hosting panels have been blocked." The hosting company stated the reason they disabled the server access was at the request of law enforcement authorities. "In addition, a couple of hours after the seizure, funds from the payment server belonging to us and our clients were withdrawn to an unknown account." Whilst DarkSide haven't been found under that name since, it's not like they're an incorporated company with a page on LinkedIn with named employees. It's likely that the group simply lifted and shifted and renamed themselves. Whatever their ultimate fate, they clearly poked the wrong bear.
During the hearing, Colonial claimed to have spent approximately 200 million US dollars on cybersecurity hardening over the prior decade. However, it was not specific about those areas where that money was spent, nor about where additional funds would be directed in response to this incident. In late May, 2021, the TSA released a security directive requiring critical pipeline owners and operators to do the following. One, report confirmed and potential cybersecurity incidents to CISA. Two, designate a cybersecurity coordinator to be available 24 hours a day, seven days a week. Three, review current practices. And four, identify any gaps and related remediation measures to address cyber-related risks and report the results to TSA and CISA within 30 days. This was amended in July, 2021, again in May, 2022, and again on the 28th of July, 2023, and remains in force today.
So what do we conclude from all of this? In many ways, Colonial didn't do a heck of a lot wrong, really. They paid the ransom, which was against FBI recommended guidance, and they copped a fair bit of flak for that one. But if you're trying to get your system back online after something like this, and you're not sure in that moment if your backups are clean, well, I might've made the same decision to get that decryption tool myself.
It highlighted the Eastern United States' dependence on the Colonial Pipeline infrastructure, that's for sure. But it also highlighted the non-mandatory policies for disclosure at the time of the incident. People have mixed feelings about government oversight, and I understand why that is. When you're responsible for running critical infrastructure in your country, you have to accept that there's going to be government bodies you need to deal with for compliance and for the ongoing right to operate. Get used to it. If you can't, take up cross-stitch and sell stuff at the markets. The changes the TSA made afterwards were long overdue, and this incident was the trigger to bring them into existence, and they needed to exist because given the choice of transparency versus secrecy, most companies will tend to secrecy, fearing reputational damage, which leads to loss of income. So if you don't mandate it, you'll just get meetings postponed for all eternity, like Colonial did to the TSA for almost a year before this happened.
I love how the decryption tool was slower than just restoring it from backups because it highlights just how important backups are, regular, networked, and offsite. And while you're at it, some clean automated builds from a vanilla ISO, that'd be great too. Those backups significantly reduced the duration of the restoration of the pipeline and ultimately negated the need for the ransom to be paid, or at least to get the decryption tool. But all of this could have been avoided if only they'd disabled that ex-employee's account. A single checkbox, and none of this would have happened. So if you're in IT or OT, and you're a manager and you're offboarding someone, take it seriously, please. They say a chain is only as strong as its weakest link, and in this case, failing to disable an ex-employee's account was that weak link, and it was all over.
If you're enjoying "Causality" and you'd like to support us and keep the show ad-free, you can by becoming a premium supporter. Just visit to learn how you can help this show to continue to be made. Thank you. A big thank you to all of our supporters, a special thank you to our silver producers, Mitch Biegler, Lesley, Shane O'Neill, Jared Roman, Joel Maher, Katharina Will, Chad Juehring, Dave Jones, Kellen Frodelius-Fujimoto, and Ian Gallagher. And an extra special thank you to both of our gold producers, Steven Bridle, and our gold producer known only as R. "Causality" is heavily researched and links to all materials used for the creation of this episode are contained in the show notes. You can find them in the text of the episode description of your podcast player or on our website. "Causality" is a Podcasting 2.0 enhanced show, and with the right podcast player, you can choose to stream value and boost with a message if you like. There's details on how, along with a Boostergram leaderboard on our website. You can follow me on the Fediverse at [email protected], or the network at [email protected]. This was "Causality." I'm John Chidgey. Thanks so much for listening. (gentle music)
Duration 26 minutes and 12 seconds

Show Notes


General Information:

Episode Gold Producers: 'r' and Steven Bridle.
Episode Silver Producers: Mitch Biegler, Shane O'Neill, Lesley, Jared Roman, Joel Maher, Katharina Will, Chad Juehring, Dave Jones, Kellen Frodelius-Fujimoto and Ian Gallagher.
Premium supporters have access to high-quality, early released episodes with a full back-catalogue of previous episodes.


John Chidgey

John is an Electrical, Instrumentation and Control Systems Engineer, software developer, podcaster, vocal actor and runs TechDistortion and the Engineered Network. John is a Chartered Professional Engineer in both Electrical Engineering and Information, Telecommunications and Electronics Engineering (ITEE) and a semi-regular conference speaker.

John has produced and appeared on many podcasts including Pragmatic and Causality and is available for hire for Vocal Acting or advertising. He has experience and interest in HMI Design, Alarm Management, Cyber-security and Root Cause Analysis.

You can find him on the Fediverse and on Twitter.