The BP Texas City Oil Refinery disaster in 2005 shocked the world. With safeguards in place and lost-time injury rates so low, how could so many people be killed in a wholly preventable disaster?
Chain of events, cause and effect. We analyze what went right and what went wrong as we discover that many outcomes can be predicted, planned for, and even prevented. I'm John Chidgey and this is Causality. This episode is brought to you by Extrasensory Devices and their amazing Luxi4All that no modern photographer should be without. We'll talk more about them during the episode. Causality is part of the Engineered Network. For other great shows, visit engineered.network today. So for the first episode of Causality, I wanted to talk about a disaster that has greatly affected me and my professional career. Not just because I've been working in the Oil & Gas industry for the last 3 years, but more because it's had a profound impact on the way I see safety, safety engineering, and user interface design, particularly in SCADA systems, because that played a part in this disaster. So today we're going to talk about the BP refinery at Texas City, and the disaster in 2005. Specifically, at 1:20pm on Wednesday, March the 23rd, 2005, at BP's refinery in Texas City in Texas, just outside of Galveston, there was a massive explosion in one of the ISOM units. 15 people were killed and 180 were injured, most of them seriously, requiring lengthy hospitalization. The blast shattered windows as far as 3/4 of a mile, about 1.2 kilometers, from the site. Forty-three thousand residents were forced to stay indoors until the fire could be brought under control. The investigation into the incident lasted for 2 years, and for the Chemical Safety Board in the United States it was, at that point in 2005, the largest investigation in their entire history. BP's refinery there was very, very large. It employed approximately 1,800 staff and refined about 470,000 barrels of oil a day, making it the largest BP refinery in North America at that point in time. The incident occurred in something called the ISOM unit, I-S-O-M. 
And ISOM isn't actually an acronym, it's an abbreviation, an abbreviation of isomerization. It's designed to change the configuration of a hydrocarbon from one isoform to another, or to separate it out, and this is all to do with octane ratings and octane boosting for unleaded gasoline or petrol. The idea is, of course, hydrocarbons start off with methane, which is CH4, a single carbon with four hydrogen atoms surrounding it. As we increase the number of carbon atoms we create longer and longer hydrocarbon chains, becoming ethane with two carbon atoms, and as we add more we get to higher-order hydrocarbons like Butane, Pentane, Hexane, all the way up to Octane and Nonane and so on and so forth. The specific part of the ISOM unit where the incident occurred was referred to as the raffinate splitter tower. Raffinate, called Raff for short, is essentially an unseparated or partly separated part of the crude oil refining process. The idea is you have a tall tower. In this particular case, it was about 170 feet tall, that's 52 meters high. And of that height, the majority is supposed to be empty. You inject a small amount, and I say small, about 6 feet worth of liquid in the bottom. That's about 2 meters worth of height of liquid in the bottom of that tower. As you heat it up, the lighter, more volatile hydrocarbons come out of solution and, as vapors, accumulate at the top of the tower, which is the reason the tower is so tall. It needs that additional space. So considering it's 170 feet tall but you only need 6 feet of liquid in the bottom, it seems a bit odd, but that's the way that it works. 
And anyway, the idea is that once you have the accumulation at the top, the Pentane, Hexane, whatever you're accumulating, the lighter components are then stored in the light Raffinate storage tank and the heavier components in the heavy Raffinate storage tank. The overall volume of this tower is huge, about 3,700 barrels worth, so it's quite decent. The investigation into the incident identified that there were organizational and safety deficiencies at all levels of the organization at that point. But we're getting ahead of ourselves. Where do we begin? There was an 11 hour period leading up to the explosion. Several units, including the ISOM unit, had been shut down for maintenance works. Over a thousand subcontractors were on site during these works. Now, BP had placed 10 portable offices near the Ultracracker unit and several others near other nearby process units, primarily for convenience purposes during the maintenance works. These particular portable offices, if you're not familiar with them, are typically a wooden frame with light sheet metal on the outside and some insulating paneling on the inside, generally fitted with air conditioning units to keep them cool in the summer. They essentially arrive in pieces and are assembled on site, and sometimes you'll see them on the back of a truck going down the street in pieces. These temporary offices are very popular because, while you have permanent facilities on site, during construction you need more space: more documentation, people coming and going for meetings and so on. They're basically an economical way of housing all of the maintenance staff and all of the construction staff during those works. So very popular, very commonplace. Some people refer to them in Australia as "Dongas". 
Irrespective, these particular buildings were placed where they were because they were physically close, proximate to the works being undertaken, so that people could walk, you know, several hundred feet from where they were actually doing the work back to the site office to have a look at drawings, have a meeting and then go back out to the site again. From a convenience point of view, that sounded great. Unfortunately, however, they were only 120 feet or about 37 meters away from the base of the blowdown drum, which would later become somewhat of a problem. The odd thing is, if there is an explosion, you're actually safer out in the open atmosphere than you are inside a building, because rather like a cyclone or a hurricane or a typhoon, whatever you want to call it, or a tornado in that case... it's not so much the explosion, it's the flying debris that'll end up killing you. Notwithstanding blast pressure, of course, and pressure differential, because that'll also cause kidneys and internal organs to rupture if you're close enough to the actual point of the explosion. But that won't matter whether you're inside or outside of a building. And when I say building, obviously I'm not talking about a concrete bunker. You put a concrete bunker next to an explosion and you're probably going to fare okay if you're inside it. But most of these buildings aren't bomb shelters. The guidelines at the time of the incident weren't actually strict enough to disallow temporary structures, though they were amended after this incident occurred. There was actually a management of change review when they placed the buildings in that location next to the ISOM unit, but they didn't follow up on several action items from that management of change review meeting, and they never completed a full risk assessment of a potential explosion during their review meetings. 
BP also had its own procedures requiring SIMOPs, simultaneous operations as we call it these days, and an evacuation of those buildings and nearby surrounding areas during the startup, and that wasn't followed. The resulting explosion actually impacted trailers that were 479 feet, which is about 150 meters, away, and people in those trailers were still injured. Anyway, the closest office was a double-wide wooden frame trailer. It had 11 offices in it, and they were primarily used for meetings. When the ISOM unit was ready for recommissioning, there was no notice given to any of the users of those buildings, or anyone in that area, that it was about to start up. They'd done their works, the maintenance was complete, and it was time to turn this thing back on and, well, essentially fire it up, quite literally. So at 2:15am, the overnight operators injected some Raffinate into the splitter tower, with the level transmitter mounted at the bottom of the unit to measure that 6 foot operational level I mentioned earlier. The tower had a maximum liquid operating level of about 9 feet, which is just under 3 meters, measured from the bottom of the tower. The normal operational level, as I said before, was about 6 feet or 2 meters. But during the startup process that level could fluctuate, and lower levels in the tower, the operators at the time thought, could cause damage to the furnace or the burners. So the idea is we heat up the liquid in an external set of burners, and as these burners warm up the Raffinate, it's pumped into the base of the tower and circulated through. Once it's in the tower, of course, the vapors are then extracted off the top. But the problem is the relative level of the burner was not quite low enough relative to the bottom of the base of the tower. 
So if the level at the base of the tower got too low, there was a concern that the level in the burners would also get too low, and that would then cause subsequent damage to the furnace. What's not clear in the aftermath is whether or not that was actually true. Irrespective, it was never thoroughly investigated during the lead-up as to whether it was actually a problem or just a perceived problem. It's the whole idea of the monkeys and the banana in a cage. You train 3 monkeys with a fire hose: don't go up and get the banana. Each time one walks up to the banana, you spray them all with the fire hose. You do that over and over again, and after a while they learn: we don't get the banana, or we get sprayed by the hose. Then you bring a 4th monkey into the cage, and the other 3 will stop that 4th monkey from going to get the banana. That's an institutional learning thought experiment. No monkeys were harmed in the making of that analogy. Anyway, rather than investigate, they simply deviated from the documented procedure. The documented procedure said not to fill it beyond the normal operational level of 6 feet. Instead, they went to 9 feet, because that's just the way they had learned to do it, because they were concerned it might damage the burner unit. At 3:09am, the audible high level alarm went off at the 8 foot mark. That's driven by the level transmitter, of course, mounted on the side. A level transmitter measures the level in an analog sense, so 0 would be 0%, up to 100% at the 9 feet of operational level. However, for redundancy purposes, they had fitted a secondary fixed-position high level alarm, a level switch essentially, and it was set just above the analog high-level set point, so just above that 8 foot mark. However, that high-level alarm never went off. 
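To make that redundancy scheme concrete, here's a minimal sketch in Python of how an analog transmitter alarm and an independent level switch are meant to back each other up. The function name, set points and alarm labels are illustrative assumptions, not anything from the actual ISOM unit's control system.

```python
# Hypothetical redundancy sketch: an analog level transmitter drives a
# high-level alarm, and an independent fixed-position level switch backs
# it up. Set points and names are illustrative only.

ANALOG_HIGH_ALARM_FT = 8.0   # alarm set point on the analog transmitter
                             # (mirrors the 8 foot mark described above)

def high_level_alarms(analog_level_ft, switch_tripped):
    """Return the set of active high-level alarms.

    analog_level_ft: level reported by the analog transmitter (feet)
    switch_tripped:  True if the independent level switch has tripped
    """
    alarms = set()
    if analog_level_ft >= ANALOG_HIGH_ALARM_FT:
        alarms.add("ANALOG_HIGH")
    if switch_tripped:
        alarms.add("SWITCH_HIGH")
    return alarms

# With both devices healthy, a level above both set points raises both alarms.
print(sorted(high_level_alarms(8.5, True)))   # ['ANALOG_HIGH', 'SWITCH_HIGH']
# At Texas City the switch was faulty, so only the analog alarm ever sounded:
print(sorted(high_level_alarms(8.5, False)))  # ['ANALOG_HIGH']
```

The value of the second device is exactly that it fails independently of the first; a dead switch quietly collapses the scheme back to a single point of failure.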
Now, after the incident had occurred, they found that the level switch had actually been listed as faulty as early as 2 years prior. The worst part of it: maintenance work orders had been raised to address that issue, and each time maintenance came around they identified it was faulty, but then closed the work order each time it had been reported and never actually repaired it. There's no evidence as to why they did that. By 3:30am, the level indicator was showing a full level of 9 feet of liquid, and the operators, noticing this shortly thereafter, stopped filling the tower. During their investigation, the Chemical Safety Board found that the level had actually reached as high as 13 feet, that's 4 meters, which is 4 feet above the actual maximum level. The operators were not aware of this because the level transmitter only reported a maximum of 9 feet of level. Beyond that, there was no sensing, and the high-level switch wasn't working. So they had no way of knowing if it was 9.1 feet or 900 feet. It simply wasn't possible to know. You may ask yourself, why on earth would you do that? Why would you set the operational range of a level transmitter to such a small fraction of the overall height of the tower? Normally you do that to provide better resolution. You'd say, I would like more accuracy because I'm turning this into a 16-bit integer, so I have, let's say, 0 to 32,767 discrete points with which I can indicate my range. I convert that into a percentage with maybe 1 decimal point, maybe 2 tops. Sometimes you want as much resolution as possible. If you were to take the same level and scale it over the entire 170 feet, the resolution would be significantly less. However, at least that would have the virtue of knowing exactly how much liquid was in the tower. Clearly, when they did the design, they didn't think that was a problem. 
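The resolution trade-off just described can be checked with some quick arithmetic. A sketch in Python, assuming the 16-bit count mentioned above; the figures are illustrative back-of-envelope numbers, not the actual instrument's specification.

```python
# Back-of-envelope check of the level transmitter resolution trade-off,
# assuming a signed 16-bit count used as 0..32767 across the span.
# Figures are illustrative, not from the actual instrument.

COUNTS = 32767  # discrete points available in the 16-bit range

def resolution_ft(span_ft, counts=COUNTS):
    """Smallest level change representable over the given span, in feet."""
    return span_ft / counts

narrow = resolution_ft(9.0)    # span scaled to the 9 ft operating band
full   = resolution_ft(170.0)  # span scaled to the full 170 ft tower

print(f"9 ft span:   {narrow * 12:.4f} inches per count")
print(f"170 ft span: {full * 12:.4f} inches per count")
# Scaling over the whole tower is about 19x coarser per count, but it
# would at least have shown the level climbing far beyond 9 feet.
```

Either scaling gives resolution far finer than any operator needs; the narrow span buys precision nobody uses at the cost of total blindness above 9 feet.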
So at this point in time, the startup procedure, the restart procedure, whatever you want to call it, for this particular ISOM unit was being handled by the lead operator, but they were in a secondary control room... and this was still the night shift. At 5am, the handover of the status of the restart to the operator on duty in the main control room, 1 hour before the end of the shift, was essentially done over the radio. The duty operator in the main control room did at least take the time to make a note before they left at 6am. So at 6am the day shift operator came into the main control room and there was a brief handover. Now, he was working day 30 in a row. He'd done 29 days in a row at that point, this was now day 30, and each day was a 12 hour day. This is not uncommon during restart and maintenance periods, where there's more work to be done; sometimes they'll work extended long shifts. These days, much of industry, and certainly the company I currently work for, has a limitation of a maximum of 21 days straight. You can't do more than 3 weeks straight without several days off, because they recognize that long term that becomes a fatigue issue. There are even stricter rules regarding the number of hours in a day. The note that the main control room operator, who let's not forget only had it verbally handed over to him an hour before he left, wrote in the logbook, and this is exactly what he wrote, was: "ISOM: brought in some Raff to unit to pack Raff with." Which is, yeah, a bit vague. It didn't say for how long, it didn't say what to do with the Raff, it didn't say whether the burners were on or off, or whether it had been preheated, or at what level. Nothing, nothing at all, just "brought in some Raff". Okay, not particularly descriptive. So, at 7:15am, the day shift supervisor arrived, and note that that was more than an hour late. 
He was supposed to be there at 6:00am for the handover. Ordinarily, and certainly in my experience, a verbal face-to-face handover is a requirement, such that when the day shift operator comes in, they speak face-to-face and go through the list of issues, what's going on, with the night shift operator before the night shift operator is allowed to leave. Sometimes when day shift operators are late, the night shift operator simply is not allowed to leave until there has been a handover, which makes sense, seems reasonable. The problem was that the day shift supervisor, who had the most experience, was over an hour late, so unfortunately some of the information was lost, and all they had to work with was that 1 rather vague line in the logbook. At 9:51am the operators decided it was time to resume the startup sequence. They began recirculating the Raffinate in the lower loop, and they added more liquid to the ISOM unit, thinking that it still needed to be topped up a bit more. Ordinarily, the level in the tower is controlled by a discharge valve, an automatic level control valve. That had been set to manual in the SCADA system. And because the final routing destination for the circulated Raff had not been made clear, there were actually conflicting opinions as to where it should go. The operators decided to leave the valve shut for nearly 2 hours while they continued to put more Raff into the tower. Just prior to 10:00am, the furnace was turned on to begin heating up. Just before 11:00am, the shift supervisor was called away on an urgent personal matter, leaving just the 1 operator in control. Now, this particular operator was nowhere near as experienced as the supervisor who had just had to leave, and there was no one else available to replace them. 
So this operator, on their 30th straight day of this swing, was in charge of 3 refinery units, one of which, the ISOM unit, was actually going through the startup sequence, which requires more attention than something that's already running. Now, 6 years previously, BP had bought this particular site from Amoco. When that happened, as part of the merger, they made a decision at board level to reduce headcount by 25%, essentially a 1/4 fixed cost reduction across all the refineries. There's this mentality when you have a merger between two corporate organizations that there is going to be role duplication. So, of course, what's the first thing you want to do? We're going to cut back. Why? Well, because there must be a lot of duplication, I guess. The problem is, you don't always let go the people that are perhaps not the best performers. Good people are usually the first to leave. And unfortunately, there were some people let go that had played key roles in that organization. There was a panel investigating this specific incident, independent of the CSB, called the "Baker Panel", and they concluded that restructuring following the merger resulted in a significant loss of people, expertise and experience. So at that point, they had cut back from 2 operators on the panel to 1, as part of their 25% cost reduction. A second panel operator, I think, would have definitely made a difference. Unfortunately, there was just the 1, and he was fatigued, I think it's reasonable to assume. So at this point in time, just before 11:00am, there had been no regulation of the level in the tower at all. The indicator still showed an incorrect level, and they had no way of knowing that the amount of Raffinate in that tower had, by lunchtime, actually reached 98 feet. 98. And the particular kind of level transmitter that was used was actually now starting to read backwards. 
Some level transmitters work on pressure, and what ends up happening is that they will give anomalous readings depending on the liquid that's used. What had happened is this particular meter had been calibrated, but it was calibrated against data that had been mapped out in 1975, which I presume, although it was not clear from the report, was when they originally fitted that specific level gauge. The worst part, though, was that that data was for a completely different process with a different kind of liquid, meaning the calibration from the 1975 data was completely wrong anyway. Then there's the external sight glass, which is essentially just what it sounds like: a piece of glass that allows you to see through and visibly check the level in the tower. Well, that sight glass was so dirty it was completely unreadable. It hadn't been cleaned or maintained; or perhaps it became dirty during the maintenance work but was never cleaned prior to startup. Now, the particular level transmitter was never tested against a level well above the 9 foot mark, because that wasn't normal. It wasn't normal to fill the tower that high; that space was supposed to be left free for the vapor. And now for the SCADA component. When you have a tower, a tank, any kind of vessel that can store liquid, the flow in and the flow out of that vessel are critical pieces of information. You put the two together with a level indication and you can very quickly check flow in minus flow out, and if your level is changing, you can tell if one of those three does not agree. So if you're putting in lots and lots of fluid, and there's no fluid coming out, and the level isn't changing, you know something is wrong. Unfortunately, the SCADA screens were designed such that the flow in and the flow out of this tower were shown on completely different screens. You couldn't have them up at the same time. So it was very, very difficult to tell if anything was actually wrong. 
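The flow in / flow out / level cross-check just described can be sketched in a few lines of Python. The vessel area, tolerance and example flows here are hypothetical, purely to show the principle; only the barrels-to-cubic-feet conversion is a real constant.

```python
# Minimal sketch of a flow-in / flow-out / level mass-balance sanity check.
# Vessel geometry and tolerance are hypothetical, for illustration only.

TANK_AREA_FT2 = 120.0   # illustrative cross-sectional area of the vessel
TOLERANCE_FT = 0.5      # allowable disagreement before raising an alarm

def level_discrepancy_ft(flow_in_bbl, flow_out_bbl, level_change_ft,
                         area_ft2=TANK_AREA_FT2):
    """Difference between the level change implied by the flows and the
    level change the transmitter actually reported, in feet."""
    BBL_TO_FT3 = 5.6146  # one oil barrel is about 5.6146 cubic feet
    net_volume_ft3 = (flow_in_bbl - flow_out_bbl) * BBL_TO_FT3
    expected_change_ft = net_volume_ft3 / area_ft2
    return expected_change_ft - level_change_ft

# Hours of feed in, the outlet valve shut, yet the indicated level barely
# moves: the discrepancy alone shows the instrument can't be trusted.
mismatch = level_discrepancy_ft(flow_in_bbl=1000, flow_out_bbl=0,
                                level_change_ft=0.5)
if abs(mismatch) > TOLERANCE_FT:
    print(f"ALARM: level disagrees with flows by {mismatch:.1f} ft")
```

Nothing here is exotic; it's the kind of cross-check a control system can run continuously, which is exactly the "automatic sanity calculation" idea that comes up next.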
Had they been on the same screen, it would have been glaringly obvious that something was wrong. If I'd been designing this thing, it's just normal practice to put the flow in and the flow out either horizontally opposed to each other, one on the left and one on the right hand side of the tank, or above and below each other, so that you can easily visually compare them and deduce, based on the level or volume indication, if something's wrong. In this case, they didn't have that. More advanced alarming and controls might even have included an automatic sanity calculation that tracks flow in / flow out versus volume. I've done something like that in the past. Control systems are there to help, so you could do that calculation automatically and make it even easier for the operator. Again, that didn't exist in this case. So at around about lunchtime, several of the contractors that weren't involved with the recommissioning of the ISOM unit, they were there for different projects going on at the same time, left the site for a team lunch. And that team lunch was celebrating one month LTI free. LTI is Lost Time Injury. How's that for a bit of irony? Before we go any further, I'd like to talk about Extrasensory Devices and their amazing Luxi4All. Extrasensory Devices are an innovative company based in Palo Alto, California. They've recently released their new Luxi4All: it's an incident light meter attachment for your smartphone or tablet. If you're a photographer that likes to take the best possible shot, or even if you aspire to be a better photographer, then precise control of your exposure is critical, and to figure that out you need a reliable and accurate light meter. Standalone light meters can cost upwards of $100, take up a lot of space in your camera bag, need their own batteries, and can be a pain to use. 
But Luxi4All aims to solve all of these problems by utilising the front facing camera on your smartphone or tablet, providing accurate readings using the freely available Luxi app from the App Store or Google Play Store. Luxi4All is easy and quick to calibrate on your first use, and then you can set and forget. With spot metering and incident light modes, the Luxi app will suggest ideal settings for f-stop, time and ISO for your SLR, or you can lock or unlock any of those and it will figure out the optimal values for your other settings. Luxi4All is a passive device, so there are no batteries to worry about. It's small, measuring just 1.5 x 0.5 x 1 inch, that's 38 x 13 x 25mm, and it's light: only 28g or 1oz, so it fits easily in your camera bag. It's also really easy to put on and take off quickly, and there's a nice touch: when you attach it, the Luxi app will detect that and automatically change the mode of the app to light monitoring for you. As I suggested before, it's not as expensive as the standalone meters, it's only $29.95 USD, and if you'd like to check one out, just head over to https://esdevices.com/engineered to learn more and enter the coupon code ENGINEERED for 15% off your Luxi4All. Photographers always want to take better pictures, and taking better pictures starts with your Luxi. Thank you to Extrasensory Devices for supporting The Engineered Network. At this point now, in our chain of events, we're up to 12:41pm, just after midday. At that point, a high pressure alarm went off at the top of the tower, because the liquid level had now become so high that the gases escaping at the top were being compressed. That high pressure should have been an indicator that something was really not right. However, the problem was the level reading was still incorrect. 
It was still showing about 8-1/2 feet, and it was dropping, because the calibration on the meter was completely wrong, despite the fact that there was over 100 feet of liquid in the tower. So because the reading was incorrect, the operator was confused. Why would they be getting a high pressure alarm at the top of the tower? It made no sense. So they decided to do the rather obvious thing, which is to open the manual valve that's designed to vent high pressures to the emergency relief system. Now, the actual emergency relief system was a little bit antiquated. It's what we refer to as a cold vent, in this particular case a blowdown drum, same idea. The point is that it's not flared. A flare is essentially a cold vent with a torch on the end, and all we do is, as we vent hydrocarbons, we put a match to them. It may sound a bit crazy, but it's actually, oddly, a safer thing to do, because if you have hydrocarbons escaping cold into the atmosphere, and the concentration of those hydrocarbons is just right and they meet an unintended ignition source, then you will have a fire and a fireball and an explosion, and people will probably get killed. This blowdown drum was actually built in the 1950s, and they had been considering fitting a flare to the top of it, but it was not fitted at this point, so it remained a cold vent. So opening up the manual release valve gave the high pressure gases a secondary path to escape, out to the blowdown drum, and reduce the pressure at the top of the tower. What they also did was turn off 2 of the burners in the furnace, the idea being that by reducing the temperature in the furnace it would reduce the temperature of the Raffinate, which would then, of course, ultimately reduce the pressure at the top of the tower as well. But that was going to make little or no difference, essentially, because the volume of hot Raffinate in that tower was huge at that point. 
There had been so much in there that it just wasn't going to make any difference. At this point in time, they knew something wasn't quite right. They didn't go and check the level at the sight glass, and they couldn't have if they wanted to, because it was so dirty you couldn't tell the difference. They trusted their instrument. But then they started to check their flows, and they noticed that the flow out of the tower at that point was not right. Clearly there was something not quite right. So what they decided to do was open a drain valve to some of the heavy Raffinate storage tanks. But the problem was that those drain valves were not designed to operate with the Raffinate at that sort of temperature. There was a heat exchanger designed to reduce the temperature, but because this liquid was essentially coming off straight after the furnace, or not far after that, it ended up actually preheating some of the other liquids by as much as 141 degrees Fahrenheit or 60 degrees Celsius. So unfortunately, that then had a cumulative effect, which we'll see in a minute. At 1:00pm, the workers that had gone to their LTI lunch returned and started a meeting in the demountable that was right next to the ISOM unit blowdown drum. At this time, the temperature of the liquid was so high it began to boil at the top of the tower. The boiling and all of that turbulence then spilled liquid into the vapor line. Outside, essentially outside the cylinder of the tower, there was a smaller vapor pipe, and that pipe was supposed to take the Hexane and higher, the denser hydrocarbon vapors, and extract them from the top. That was the design, that was the whole idea of it. But now, because the liquid was so high, it boiled over into that vapor line, going back down the outside of the tower and into the vent system. At 1:14pm, the three pressure relief valves at the base of the tower opened. 
And that was the final step in the chain. It released the high temperature liquid and vapor into the blowdown drum. The liquid started to fill the blowdown drum and very, very quickly overflowed through an overflow line into the process sewer. That then set off more alarms in the control room. Another point of interest: in the blowdown drum there was yet another level switch, and that level switch indicates a high level alarm. Ordinarily it should never trip, but it's there as a last resort. It did not activate; again, it was broken, just like the other high level switch. The liquid at that point had nowhere else to go but up and out the vent stack. It was now 1:20pm. Liquid was spraying out of the top of the blowdown stack. Several staff at ground level described it as a geyser of boiling gasoline and vapor. The vapor cloud spilled essentially an entire gasoline tanker truck's worth of gasoline out of the top. It took about 90 seconds to find an ignition source. By that time, the vapor cloud encompassed the entire ISOM unit, the demountable trailers nearby, as well as a parking area for vehicles adjacent to the ISOM unit. And that was where it found its ignition source. There were 2 workers parked only 25 feet, about 7.6 meters, away from the base of the blowdown drum in a diesel powered pickup truck. The engine began to race, because it was now sucking in all of this gasoline vapor through the air intake, in addition to the diesel being injected into the engine through the fuel injectors in the normal combustion process. They tried to turn the vehicle off, but they couldn't, so they got out of the vehicle and they ran, which is exactly what I'd do. The problem with diesels, I say problem, but one of the reasons the engine wouldn't shut off is that diesels use a high compression ratio inside the combustion chamber to ignite the fuel. There are no spark plugs. 
So the air intake was now bringing in the gasoline vapor as well as the air, and that was mixing with the diesel from the injectors. If you turn the ignition off, that turns off the fuel pump and the injectors, but since the fuel was no longer coming in just from the fuel pump and the injectors, it was actually coming in through the air intake, which has no cut-off, the engine will theoretically never stop running. And that's exactly what happened. They couldn't turn off the engine. So they ran. In a situation like this, the amount of fuel coming into that engine was so great that the engine raced and essentially went beyond its maximum rev limit. The combustion sequencing was no longer working properly, and some uncombusted fuel escaped down the exhaust. A backfire is when that fuel combusts outside of the combustion chamber, and it was a backfire from that engine that ultimately lit the gasoline vapor cloud. When that happened, the force of the explosion killed 12 of the people in the nearest trailer. Another 3 people were killed in a nearby trailer. The devastation caused by that explosion took two years to rebuild, not to mention the lives that were lost. There had been internal BP reports in the years leading up to the incident, one of which was tabled to executives in London, stating that they had serious concerns about the potential for a major site incident. There were 80, that's 8...0... hydrocarbon releases in the year prior alone. That's 80, and that's a lot. In other words, those were situations where they had released hydrocarbons in an unplanned fashion for unplanned reasons. During 2004, there were 3 major accidents at BP Texas City. On March 30th, one of the incidents caused $30M USD worth of damage, but no fatalities. The most disturbing part, though: the other 2 incidents each had 1 fatality, and that was in 2004. 
I mean, when I read that, I couldn't believe it. But the worst part, beyond the fatalities themselves, is that the lost time injury statistics specifically excluded fatalities. I have no idea how that makes any sense, because if someone's dead, they're not showing up to work, so surely that's lost time. But I suppose the lost time would go on for an infinite period, so maybe they decided to have a separate measure for casualties at work, or just didn't include them in the statistics. That's just mind-blowingly terrible. They also did some digging into maintenance budgets. We mentioned earlier the 25% cuts. Well, they did increase their funding for maintenance in the 2003-2004 year, but that money was primarily directed to environmental compliance retrofitting and accident response, because they kept having accidents, not to preventative maintenance. Of course, preventative maintenance would have helped. One of their own internal safety surveys just prior to the 2005 disaster noted that production and budget compliance got recognized and rewarded above everything else. And that's more than a little disturbing. The only safety metric used to calculate executive bonuses was the personal injury rate, the lost time injury rate. So there was no recognition given for preventative maintenance plans and strategies, or for a reduction in the number of unplanned venting events, none of that. And certainly not for fatalities either, which is insane. They also noted the high executive turnover rates in the decade leading up to the incident: an average stay of only 18 months for an executive at that facility. Just 18 months. A year and a half. And since executives were rewarded based on profit, and that's driven by production and reducing spending, there were no major preventative maintenance initiatives implemented during that decade.
And BP essentially at that point had developed a culture that many industrial sites do these days of focusing on personal safety, like slips, trips, falls, paper cuts of all things, rather than process safety, which is to say the automated process safety systems, and risk analysis and hazard analysis from a process point of view, as in the plant process. What are we processing? Well, in this case, oil, hydrocarbons. Are they explosive? Yes. That's the problem from that point of view. And the trouble is that over time the culture erodes: you erode away the employees that are focused on process safety and they get taken over by people that are far more focused on personal safety, like slips, trips and falls, because in many respects that's easier for a lot of people to understand than process safety, which can be quite complicated. If you're not a chemical process engineer or a mechanical or electrical engineer, sometimes process safety is very hard to get your head around because it's more complex. Anyway, over the years these people were eroded out of the organization, leaving the majority of the high level executives blind to the process safety risks that were present at that facility. Beyond that, BP had an internal culture that didn't reward or acknowledge reporting of potential safety risks, what we sometimes refer to as near misses. The Texas City maintenance manager noted in an email to executives prior to the incident that "...BP has a ways to go to becoming a learning culture and away from a punitive culture..." I've worked in a few different places where, if you raise an issue, a defect or a potential safety issue, you sometimes, in extreme cases, get laughed out of the room. In other cases, you get told, "You're just being a nitpicky so-and-so. We have other things to worry about.
We have real problems to worry about." That sort of punitive culture, one that discourages people from flagging issues, and process safety issues in particular, is poisonous. I'm happy to say that my current employer is not like that; they're actually very good at taking these things seriously, which is great. In the 19 previous startups of the ISOM unit at that facility, in the vast majority of instances the operators were running the unit beyond the range of the level transmitter, so above the 9 foot mark, but it was never investigated as to why, because these weren't considered near misses, and it became normal operational procedure. Like I said, the monkeys, the banana and the fire hose. These days, we investigate near misses as thoroughly as though they were an actual serious incident where someone was injured, because near misses are an indicator of a future problem, one where it will no longer be a near miss. Finally, how could we have prevented this accident? Accident? Disaster. First of all, in safety systems we do a safety study. Safety studies are designed to look at all the risk factors and the consequences if things go wrong. They incorporate reaction times, damage to the environment, personal injuries, process damage. All of these factors and several more are rolled up into a series of what we call Safety Integrity Levels, SIL ratings, and the SIL ratings drive the level of protection that's required. In particular, this sort of application would require redundant level transmitters. A single level transmitter with a level switch would not be sufficient. We would have two or possibly three identical level transmitters mounted nearby, independently reporting the level. They would require 6-12 monthly calibration with an independently verified calibration source. But that was not what was installed. Safety interlocking to prevent overfilling would also be a common thing.
The pumps that circulated the raffinate could have been cut off by the high pressure, sorry, the high level indicators, voted 1 out of 2 (1oo2) or 2 out of 3 (2oo3), built into a safety interlocking system. That's what we would have installed had it been designed and built today, or even in the last 10 years. Flaring seems environmentally questionable, but in terms of safety, because it consumes the hydrocarbons as they exit to atmosphere, you prevent the buildup that could lead to an explosion. Whether flaring versus cold venting is environmentally questionable one way or the other is a separate discussion, not fit for this show perhaps, but from a safety point of view there's no question that flaring, done safely in a controlled fashion, is the better option by far. Now, a flare was actually proposed on five separate occasions in the 2 years leading up to the accident, but due to production pressures and cost it was never fitted, because you'd have to take the unit offline to fit it, plus it would cost money. Production was king, so they didn't. Prior to the startup, all of the alarms and indications were supposed to be checked; that's part of an internal BP procedure that existed at the time. But those checks were not performed for most of those 19 prior startups that the Chemical Safety Board investigated. They just skipped them. Then again, I'm not sure how you would validate against the level without a functioning sight glass, an independently calibrated level transmitter, or a test liquid. All right, beyond that: handover, another big problem. In the industry, a lot of companies have a policy that there must be a face to face handover, certainly we do, and detailed notes when changing shifts or swings. That changeover period is crucial. Training budgets had been cut to the point where training was delivered via a standard computerized system rather than face to face.
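As an aside, the 1oo2 and 2oo3 voting logic just described is simple enough to sketch in a few lines of Python. This is only an illustration under assumed names and values; the trip setpoint and function names here are hypothetical, not taken from the actual Texas City design:

```python
def vote_1oo2(trips):
    """1-out-of-2 voting: trip if either of two independent
    transmitters reads high. Hardest to miss a real trip,
    but a single faulty transmitter causes a spurious trip."""
    assert len(trips) == 2
    return any(trips)

def vote_2oo3(trips):
    """2-out-of-3 voting: trip when any two of three independent
    transmitters agree. Tolerates one failed transmitter in
    either direction (stuck-high or stuck-low)."""
    assert len(trips) == 3
    return sum(trips) >= 2

# Hypothetical high-level trip setpoint in feet.
HIGH_LEVEL_TRIP_FT = 7.6

def interlock_trip(levels_ft, scheme=vote_2oo3):
    """Compare each independently reported level against the
    setpoint, then vote. True means cut the raffinate feed pumps
    (in a real system, de-energize the pump circuit)."""
    trips = [level >= HIGH_LEVEL_TRIP_FT for level in levels_ft]
    return scheme(trips)
```

In a real safety instrumented system this logic lives in dedicated, certified hardware with regular proof testing, but the voting arithmetic itself is exactly this small: the point of 2oo3 is that no single broken instrument, like the failed switches in this incident, can silently defeat the protection.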
And whilst computerized systems have their place for consistency and simplicity, and budgetarily as well of course, they're essentially a poor replacement for face to face training, which allows question and answer and probing. Simulation, process simulation. Now, 5 years before this incident, simulators were recommended for operator training purposes. Unfortunately, due to cost, they were never implemented. They could have had a simulation system, much like my current employer does, where you can train operators in how to safely start up and shut down a plant. You can inject errors and faults into the system during startup and shutdown, and test the operators to see how they respond by following the appropriate procedures, proving that they're paying attention during their training. Now, the impacts of mergers and cost reductions, as you can see from this incident, aren't felt immediately. These things have a habit of taking time before they're felt. The same with preventative maintenance. The reason we do preventative maintenance is to make sure that on that 1-in-1,000th time, when all of the other elements line up, the safety systems that are designed to protect us will function correctly. We have a thing called CFTs, short for Critical Function Testing. CFTs are run on a 6-12 monthly basis on most mining and oil and gas plants. Why? Because you need to know that your safety system is going to operate when it should, how it should. And they have to be done regularly, every time. Production doesn't matter; you have to make sure your critical functions work. Fatigue prevention is another problem. There was no fatigue prevention policy at BP at the time, or in the industry as a whole at the time, but now there definitely is. So in closing, it may or may not be of interest, but on the 1st of February, 2013, BP sold the refinery at Texas City to Marathon Petroleum Corporation.
That ended BP's 15 years operating that facility. It is now called the Marathon Galveston Bay Refinery and it's still in operation today. I think it's important to reflect with one last comment about this disaster, about organizational learnings, the value of experience and the cost of cutbacks. There's a statement made by a man called Trevor Kletz in 1993 in his book, "Lessons from Disaster". He says, "Organizations have no memory. Only people have memory." We write reports, we do investigations, we talk about everything that went wrong and how these people, who were essentially innocent, were killed because of an accident. I hate that word "accident" because it suggests it was accidental, but it wasn't; it could have been prevented. They don't call traffic accidents...accidents anymore, they call them traffic collisions, because that's what it was: a collision. Was this an accident? No, it was a disaster, and the disaster could have been prevented. I'd like to thank Extrasensory Devices and their Luxi4All for sponsoring this episode. Visit https://esdevices.com/engineered for more information about their Luxi4All and use the coupon code ENGINEERED for 15% off, exclusively for Engineered Network and Causality listeners. Taking better pictures starts with your Luxi. This was Causality. I'm John Chidgey. Thanks for listening.