﻿1
00:00:00,000 --> 00:00:15,160
Chain of events, cause and effect. We analyze what went right and what went wrong as we

2
00:00:15,160 --> 00:00:20,480
discover that many outcomes can be predicted, planned for, and even prevented.

3
00:00:20,480 --> 00:00:25,680
I'm Jon Chidgey and this is Causality. Causality is part of the Engineered Network. To support

4
00:00:25,680 --> 00:00:29,600
our shows including this one, head over to our Patreon page and for other great shows

5
00:00:29,600 --> 00:00:33,400
visit https://engineered.network today. For the first episode of

6
00:00:33,400 --> 00:00:38,160
Causality, I wanted to talk about a disaster that has

7
00:00:38,160 --> 00:00:43,280
greatly affected me and my professional career. Not because

8
00:00:43,280 --> 00:00:45,840
I've been working in the oil and gas industry for the last

9
00:00:45,840 --> 00:00:50,840
three years, but more from the point that it's had a profound

10
00:00:50,840 --> 00:00:55,960
impact on the way I see safety, safety engineering, and user

11
00:00:55,960 --> 00:00:58,480
interface design, particularly in SCADA systems, because that

12
00:00:58,480 --> 00:01:01,560
played a part in this disaster.

13
00:01:01,560 --> 00:01:07,120
So today we're going to talk about the BP refinery at Texas

14
00:01:07,120 --> 00:01:10,240
City, the disaster in 2005.

15
00:01:10,240 --> 00:01:16,600
Specifically at 1:20pm on Wednesday, March 23, 2005, BP's

16
00:01:16,600 --> 00:01:19,920
refinery in Texas City in Texas just outside of Galveston.

17
00:01:19,920 --> 00:01:23,720
There was a massive explosion in the ISOM unit in one of the

18
00:01:23,720 --> 00:01:31,720
ISOM units. 15 people were killed. 180 were injured, most of them seriously injured requiring

19
00:01:31,720 --> 00:01:40,080
lengthy hospitalization. The blast was felt about three quarters of a mile, or about 1.2 kilometers,

20
00:01:40,080 --> 00:01:46,320
away from the site, shattering windows at that distance. 43,000 residents were forced

21
00:01:46,320 --> 00:01:50,880
to stay indoors until the fire was able to be brought under control. Investigation into

22
00:01:50,880 --> 00:01:54,480
the incident lasted for two years and the Chemical Safety

23
00:01:54,480 --> 00:01:57,840
Board in the United States at that point in 2005, it was the

24
00:01:57,840 --> 00:02:05,520
largest investigation in their entire history. So BP had an oil

25
00:02:05,520 --> 00:02:08,400
refinery, it was very, very large, it employed approximately

26
00:02:08,400 --> 00:02:14,880
1,800 staff and it refined about 470,000 barrels of oil a day.

27
00:02:14,880 --> 00:02:18,040
So it was actually the largest BP refinery in North America at

28
00:02:18,040 --> 00:02:24,120
that point in time. The incident occurred in something called the ISOM unit.

29
00:02:24,120 --> 00:02:28,720
ISOM isn't actually an acronym, it's an abbreviation.

30
00:02:28,720 --> 00:02:33,320
It's an abbreviation of Isomerisation and that's designed to change the "ISO"

31
00:02:33,320 --> 00:02:37,080
configuration of a hydrocarbon from one isomer to another or to separate it out.

32
00:02:37,080 --> 00:02:41,920
This has all got to do with Octane ratings and Octane

33
00:02:41,920 --> 00:02:47,760
boosting for unleaded gasoline or petrol. The idea is of course when we have

34
00:02:47,760 --> 00:02:53,200
hydrocarbons starting off with Methane which is CH4, single carbon, four hydrogen atoms

35
00:02:53,200 --> 00:02:59,120
surrounding it. As we increase the number of carbon atoms to create longer and longer

36
00:02:59,120 --> 00:03:04,480
hydrocarbon chains becoming ethane with two carbons and as we add more we get to

37
00:03:04,480 --> 00:03:12,240
higher order hydrocarbons like Propane, Butane, Pentane, Hexane, Heptane,

38
00:03:13,520 --> 00:03:21,200
all the way up to Octane and Nonane up there and so on and so forth. The specific part of the ISOM

39
00:03:21,200 --> 00:03:27,840
unit where the incident occurred was referred to as the Raffinate tower. Raffinate is called

40
00:03:27,840 --> 00:03:34,720
"Raff" for short and what it is, is essentially either an unseparated or partly

41
00:03:34,720 --> 00:03:41,600
separated part of the crude oil process, the refining process. And the idea is you have a

42
00:03:41,600 --> 00:03:47,920
tall tower, in this particular case, it was about 170 feet tall, that's 52 meters high.

43
00:03:47,920 --> 00:03:51,760
And of that height, the majority of that height is supposed to be empty.

44
00:03:51,760 --> 00:04:00,080
You inject a small amount, and I say small, about six feet's worth of liquid in the bottom.

45
00:04:00,080 --> 00:04:04,960
It's about two meters worth of height of liquid in the bottom of that tower. And as you heat it up,

46
00:04:05,600 --> 00:04:12,000
then what tends to happen is the lower density, more volatile hydrocarbons, such as the

47
00:04:12,000 --> 00:04:18,520
Pentane, Hexane and so on, they will actually come out of solution and as vapors will then

48
00:04:18,520 --> 00:04:23,320
accumulate at the top of the tower, which is the reason why the tower is so tall.

49
00:04:23,320 --> 00:04:26,660
They need that additional space.

50
00:04:26,660 --> 00:04:32,580
So considering it's 170 feet tall, but you only need six feet of liquid in the bottom,

51
00:04:32,580 --> 00:04:35,540
It seems a bit odd, but that's the way that it works.

52
00:04:35,540 --> 00:04:38,320
And anyway, the idea is that once you have

53
00:04:38,320 --> 00:04:40,880
the accumulation at the top, the Pentane, Hexane,

54
00:04:40,880 --> 00:04:42,500
whatever you're accumulating at the top,

55
00:04:42,500 --> 00:04:43,780
the lighter components are then stored

56
00:04:43,780 --> 00:04:45,140
in the light Raffinate storage tank

57
00:04:45,140 --> 00:04:45,960
and the heavier components

58
00:04:45,960 --> 00:04:48,260
in the heavy Raffinate storage tank.

59
00:04:48,260 --> 00:04:50,260
The overall volume of this tank is huge.

60
00:04:50,260 --> 00:04:54,100
It's about 3,700 barrels worth.

61
00:04:54,100 --> 00:04:55,260
So it's quite decent.

62
00:04:55,260 --> 00:05:00,460
The investigation into the incident

63
00:05:00,460 --> 00:05:05,340
identified that there were organisational and safety deficiencies at all levels of the organisation

64
00:05:05,340 --> 00:05:12,740
at that point. But we're getting ahead of ourselves. Where do we begin? There was an 11-hour period

65
00:05:12,740 --> 00:05:18,660
leading up to the explosion. Several units, ISOM units, were shut down for maintenance

66
00:05:18,660 --> 00:05:24,900
works. Over a thousand subcontractors were on site during these works. Now BP had placed

67
00:05:24,900 --> 00:05:31,060
10 portable offices near the Ultra Cracker Unit and several others near other nearby

68
00:05:31,060 --> 00:05:37,540
process units for convenience purposes during the maintenance works.

69
00:05:37,540 --> 00:05:41,880
Now these particular portable offices if you're not familiar with them are typically a wooden

70
00:05:41,880 --> 00:05:47,660
frame and they'll have a light sheet metal on the outside and some insulating paneling

71
00:05:47,660 --> 00:05:52,660
on the inside generally fitted with some air conditioning units to keep them cool in the

72
00:05:52,660 --> 00:05:58,900
summer and they essentially arrive in pieces and are assembled on site and

73
00:05:58,900 --> 00:06:02,060
sometimes you'll see them in the back of a truck going down the street in pieces

74
00:06:02,060 --> 00:06:08,100
and these temporary offices are very popular because you have permanent

75
00:06:08,100 --> 00:06:11,180
facilities on site, but during construction you need more space because

76
00:06:11,180 --> 00:06:15,020
you have more documentation, people coming and going for meetings and so on

77
00:06:15,020 --> 00:06:20,180
and they're basically a great economical way of housing all of the

78
00:06:20,180 --> 00:06:25,380
maintenance staff and all of the construction staff during those works.

79
00:06:25,380 --> 00:06:29,740
They're very popular, very commonplace. Some people refer to them in Australia

80
00:06:29,740 --> 00:06:34,620
as "Dongas". Irrespective, these particular buildings were placed where

81
00:06:34,620 --> 00:06:37,980
they were because they were physically close, they were proximate to the works

82
00:06:37,980 --> 00:06:43,580
being undertaken, so that people could walk, you know, several hundred feet from

83
00:06:43,580 --> 00:06:46,460
where they are actually doing the work back to the site office to have a look at

84
00:06:46,460 --> 00:06:51,580
drawings, have a meeting and then go back out to the site again. So from a convenience point of

85
00:06:51,580 --> 00:06:59,580
view that sounded great. Unfortunately however they were only 120 feet or about 37 meters away from

86
00:06:59,580 --> 00:07:04,700
the base of the blowdown drum which would later become somewhat of a problem.

87
00:07:04,700 --> 00:07:14,540
The funny thing is, the odd thing is, if there is an explosion you're actually safer out in

88
00:07:14,540 --> 00:07:18,740
the open atmosphere than you are actually inside a building because rather like a cyclone

89
00:07:18,740 --> 00:07:25,780
or a hurricane or a typhoon, whatever you want to call it, or a tornado in that case,

90
00:07:25,780 --> 00:07:30,720
it's not so much the explosion, but it's the flying debris that'll end up killing you.

91
00:07:30,720 --> 00:07:33,880
Notwithstanding blast pressure, of course, and pressure differential, because that'll

92
00:07:33,880 --> 00:07:38,980
also cause kidneys and internal organs to rupture if you're close enough to the actual

93
00:07:38,980 --> 00:07:43,220
point of the explosion. But that won't matter if you're inside or outside of a building.

94
00:07:43,220 --> 00:07:46,740
I say building, obviously I'm not talking about a concrete bunker. You put a concrete bunker in an

95
00:07:46,740 --> 00:07:50,340
explosion, next to an explosion, you're probably going to fare okay if you're inside there.

96
00:07:50,340 --> 00:07:57,060
But most of these buildings aren't bomb shelters. So, anyway. The guidelines at the time of the

97
00:07:57,060 --> 00:08:02,580
incident, they weren't actually strict enough to disallow temporary structures, but these were

98
00:08:02,580 --> 00:08:09,140
amended after this incident occurred. There was actually a management of change review when they

99
00:08:09,140 --> 00:08:15,140
placed the buildings in the location next to the ISOM unit, but they didn't follow up on several

100
00:08:15,140 --> 00:08:19,220
action items from that management of change review meeting and they never completed a full risk

101
00:08:19,220 --> 00:08:26,660
assessment of a potential explosion during the review meetings. BP also had its own procedures

102
00:08:26,660 --> 00:08:33,220
requiring SIMOPs, Simultaneous Operations is what we call it these days, and an evacuation

103
00:08:33,220 --> 00:08:38,180
of those buildings in nearby surrounding areas during the startup and that wasn't followed.

104
00:08:38,180 --> 00:08:45,740
So the resulting explosion actually impacted trailers that were 479 feet, which is about

105
00:08:45,740 --> 00:08:47,380
150 meters away.

106
00:08:47,380 --> 00:08:50,540
And people in those trailers were still injured.

107
00:08:50,540 --> 00:08:52,940
They still had injuries.

108
00:08:52,940 --> 00:08:57,340
Anyway, the closest office was a double wide wooden frame trailer.

109
00:08:57,340 --> 00:09:00,360
It had 11 offices in it and they were primarily used for meetings.

110
00:09:00,360 --> 00:09:04,300
When the ISOM unit was ready for recommissioning, there was no notice given to any of the users

111
00:09:04,300 --> 00:09:09,180
of those buildings, anyone in that area, that it was about to start up. So they'd done their

112
00:09:09,180 --> 00:09:14,660
works, the maintenance was complete, and it was time to turn this thing back on and, well,

113
00:09:14,660 --> 00:09:25,100
essentially fire it up, quite literally. So at 2:15am, the overnight operators injected

114
00:09:25,100 --> 00:09:29,980
some Raffinate into the splitter tower with the level transmitter being mounted at the

115
00:09:29,980 --> 00:09:34,940
bottom of the unit to measure that 6 foot operational level I mentioned earlier.

116
00:09:34,940 --> 00:09:41,700
The tower unit had a maximum liquid operating level of about 9 feet, which is just under

117
00:09:41,700 --> 00:09:45,340
3 meters, that's measured from the bottom of the tank.

118
00:09:45,340 --> 00:09:49,740
The normal operational level, as I said before, was about 2 meters or 6 feet.

119
00:09:49,740 --> 00:09:55,620
But during the startup process, that level could fluctuate and lower levels in the tower

120
00:09:55,620 --> 00:10:01,620
were thought by the operators at the time to risk damage to the furnace or the burners.

121
00:10:01,620 --> 00:10:08,980
So the idea is we heat up the liquid in an external set of burners and these burners as

122
00:10:08,980 --> 00:10:15,060
they warm up the Raffinate, it's then pumped into the base of the tower and circulated through.

123
00:10:15,060 --> 00:10:19,780
Once it's in the tower, of course, the vapors are then extracted off the top.

124
00:10:21,060 --> 00:10:27,140
But the problem is the level of the burner was not quite low enough relative to the

125
00:10:27,140 --> 00:10:32,260
bottom of the base of the tank. So if the level of the base of the tank got too low, there was a

126
00:10:32,260 --> 00:10:37,460
concern that the level in the burners would also get too low and that that would then cause a

127
00:10:37,460 --> 00:10:45,220
subsequent damage to the furnace. What's not clear in the aftermath is actually whether or not that

128
00:10:45,220 --> 00:10:50,740
was true. Irrespective of whether it was true, it was never thoroughly investigated

129
00:10:50,740 --> 00:10:54,220
during the lead up as to whether or not it was actually a problem or if it was just a

130
00:10:54,220 --> 00:11:00,340
perceived problem. You know, the whole idea of the monkeys and the banana in a cage. You

131
00:11:00,340 --> 00:11:05,940
train three people, three of the monkeys, sorry, people, monkeys. You train three monkeys

132
00:11:05,940 --> 00:11:10,660
with the fire hose. Don't go up and don't get the banana. You walk up to the banana,

133
00:11:10,660 --> 00:11:14,020
you spray them with the fire hose, you do that, all three of them over and over again.

134
00:11:14,020 --> 00:11:17,300
And after a while they realize, "Well, we don't get the banana or we get sprayed by

135
00:11:17,300 --> 00:11:18,300
the hose."

136
00:11:18,300 --> 00:11:23,020
You bring a fourth person in, a fourth monkey into the cage, the other three will stop that

137
00:11:23,020 --> 00:11:25,940
fourth monkey from going to get the banana.

138
00:11:25,940 --> 00:11:30,280
And that's an institutional learning thought experiment.

139
00:11:30,280 --> 00:11:35,020
No monkeys were harmed in the making of that analogy.

140
00:11:35,020 --> 00:11:42,980
Anyway, so rather than investigate that, they simply deviated from the documented procedure.

141
00:11:42,980 --> 00:11:47,300
The documented procedure said not to fill it beyond the normal operational level of

142
00:11:47,300 --> 00:11:48,300
6 feet.

143
00:11:48,300 --> 00:11:51,620
Instead, they went to the 9 feet because that's just the way that they had learned to do it

144
00:11:51,620 --> 00:11:58,100
because they were concerned it may damage the burner unit.

145
00:11:58,100 --> 00:12:04,700
At 3:09am, the audible high level alarm went off at the 8 foot mark.

146
00:12:04,700 --> 00:12:07,900
That's driven by the level transmitter of course mounted on the side.

147
00:12:07,900 --> 00:12:11,220
A level transmitter measures the level in an analog sense.

148
00:12:11,220 --> 00:12:17,620
0% up to 100% probably at 9 feet of operational level.

149
00:12:17,620 --> 00:12:25,100
However, for redundancy purposes, they had fitted a secondary fixed position high level

150
00:12:25,100 --> 00:12:31,060
alarm, so a level switch essentially, and it was set just above the analog high level

151
00:12:31,060 --> 00:12:35,060
set point, so just above that 8 foot mark.

152
00:12:35,060 --> 00:12:38,580
However, that high level alarm never went off.

153
00:12:38,580 --> 00:12:46,580
Now, after the incident had occurred, they found that the level switch had actually been listed as faulty as early as two years prior.

154
00:12:46,580 --> 00:12:52,940
The worst part of it, the maintenance work orders that had been raised to address that issue,

155
00:12:52,940 --> 00:12:58,140
each time that maintenance came around, they identified it was faulty,

156
00:12:58,140 --> 00:13:04,700
but then they closed the work order each time it had been reported and never actually repaired it.

157
00:13:06,020 --> 00:13:08,740
There's no evidence as to why they did that.

158
00:13:08,740 --> 00:13:12,780
By 3:30am,

159
00:13:12,780 --> 00:13:14,920
the level indicator was showing a full level

160
00:13:14,920 --> 00:13:16,140
of nine feet of liquid.

161
00:13:16,140 --> 00:13:20,340
And the operators shortly thereafter noticing this

162
00:13:20,340 --> 00:13:22,480
stopped filling the tower at that point.

163
00:13:22,480 --> 00:13:25,220
During their investigation,

164
00:13:25,220 --> 00:13:27,780
the Chemical Safety Board found

165
00:13:27,780 --> 00:13:31,500
that the level had actually reached as high as 13 feet.

166
00:13:31,500 --> 00:13:33,020
That's four meters,

167
00:13:33,020 --> 00:13:36,700
which is four feet above the actual maximum level.

168
00:13:36,700 --> 00:13:39,720
And the operators were not aware of this

169
00:13:39,720 --> 00:13:42,260
because the level transmitter only reported

170
00:13:42,260 --> 00:13:44,760
a maximum of nine feet of level.

171
00:13:44,760 --> 00:13:46,460
Beyond that, there was no sensing.

172
00:13:46,460 --> 00:13:50,240
And the high level switch wasn't working.

173
00:13:50,240 --> 00:13:55,240
So they had no way of knowing if it was 9.1 feet or 900 feet.

174
00:13:55,240 --> 00:13:57,840
It's not possible to know.

175
00:13:57,840 --> 00:14:00,200
You may ask yourself, why on earth would you do that?

176
00:14:00,200 --> 00:14:07,700
Why would you set the operational level of a level transmitter to such a small fraction of the overall height of the tower?

177
00:14:07,700 --> 00:14:10,740
Normally you do that to provide better resolution.

178
00:14:10,740 --> 00:14:17,420
So you'd say, I would like more accuracy because I'm turning this into a 16-bit integer.

179
00:14:17,420 --> 00:14:24,500
So I have, you know, let's say 0 to 32,767 discrete points that I can indicate my range.

180
00:14:24,500 --> 00:14:29,380
You know, convert that into a percentage and it's, you know, maybe one decimal place, maybe two tops.

181
00:14:29,380 --> 00:14:33,460
I don't know, sometimes you want as much resolution as possible.

182
00:14:33,460 --> 00:14:37,540
If you were to take the same level, scale it over the entire 170 feet,

183
00:14:37,540 --> 00:14:41,340
the resolution would be much less, significantly less.

184
00:14:41,340 --> 00:14:46,060
However, at least that would have the virtue of knowing exactly how much liquid was in it.

185
00:14:46,060 --> 00:14:49,740
Clearly, when they did the design, they didn't think that was a problem.
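A rough sketch of that scaling trade-off, in Python. The 9-foot and 170-foot spans and the 0 to 32,767 count range come from the discussion above; the function names are hypothetical, and the linear scaling with clamping is an assumption about how such a signal is commonly handled, not a description of the actual system at Texas City.

```python
# Illustrative only: a linear, clamped scaling is assumed here.
RAW_MAX = 32_767  # largest positive value of a signed 16-bit integer


def resolution_feet(span_feet: float, raw_max: int = RAW_MAX) -> float:
    """Smallest level change that one raw count represents over the span."""
    return span_feet / raw_max


def raw_to_feet(raw: int, span_feet: float, raw_max: int = RAW_MAX) -> float:
    """Scale a raw count linearly to feet, clamping to the valid count range."""
    clamped = max(0, min(raw, raw_max))
    return clamped / raw_max * span_feet


def indicated_level(actual_feet: float, span_feet: float) -> float:
    """What the operator sees: anything above the span reads as full scale."""
    return min(max(actual_feet, 0.0), span_feet)


narrow = resolution_feet(9.0)    # transmitter ranged over 9 feet
wide = resolution_feet(170.0)    # transmitter ranged over the whole tower

# A 9-foot span gives finer steps, but 13 feet and 98 feet both read "9".
for actual in (6.0, 13.0, 98.0):
    print(actual, indicated_level(actual, 9.0), indicated_level(actual, 170.0))
```

The wide span is roughly 19 times coarser per count (170 divided by 9), yet it is the only one of the two that can tell a slightly overfilled tower from a nearly full one.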

186
00:14:49,740 --> 00:14:57,380
So at this point in time, the startup procedure, the restart procedure,

187
00:14:57,380 --> 00:15:02,900
whatever you want to call it, for this particular ISOM unit was being handled by the lead operator,

188
00:15:02,900 --> 00:15:04,540
but they were in a secondary control room.

189
00:15:04,540 --> 00:15:08,100
And this is still the night shift operator.

190
00:15:08,100 --> 00:15:13,020
At 5:00am, the handover and the status of the restart with the operator that was currently on

191
00:15:13,020 --> 00:15:18,900
duty in the main control room, one hour before the shift was due to finish, was

192
00:15:18,900 --> 00:15:21,940
essentially done over the radio.

193
00:15:24,020 --> 00:15:30,340
So the duty operator in the main control room did take enough time to make a note before they left

194
00:15:30,340 --> 00:15:37,140
at 6am. So at 6:00am the day shift operator came into the main control room. There was a brief handover.

195
00:15:37,140 --> 00:15:45,300
Now he was working day 30 in a row. So he'd done 29 days in a row at that point. This was now day 30

196
00:15:45,300 --> 00:15:51,460
and each day 12 hour days and this is not uncommon during restart maintenance procedures where

197
00:15:51,460 --> 00:15:53,300
there's more work to be done.

198
00:15:53,300 --> 00:15:55,800
Sometimes they'll work extended long shifts.

199
00:15:55,800 --> 00:16:00,020
Now these days, most industry standard

200
00:16:00,020 --> 00:16:02,040
and certainly the company I currently work for

201
00:16:02,040 --> 00:16:05,460
is that there's a limit of a maximum of 21 days straight.

202
00:16:05,460 --> 00:16:07,380
So you can't do more than three weeks straight

203
00:16:07,380 --> 00:16:09,380
without several days off

204
00:16:09,380 --> 00:16:12,180
because they recognize that long-term

205
00:16:12,180 --> 00:16:14,260
that becomes a fatigue issue.

206
00:16:14,260 --> 00:16:15,220
There's even stricter rules

207
00:16:15,220 --> 00:16:16,980
regarding number of hours in a day.

208
00:16:18,540 --> 00:16:27,040
The note that the main control room operator, who let's not forget only had it verbally handed over to him an hour before he left,

209
00:16:27,040 --> 00:16:34,440
basically wrote down in the logbook, ISOM, and this is exactly what he wrote, he wrote:

210
00:16:34,440 --> 00:16:44,840
"...brought in some Raff to unit to pack Raff with," which is kind of, yeah, a bit vague.

211
00:16:44,840 --> 00:16:47,640
It didn't say how long, it didn't say what to do with the Raff,

212
00:16:47,640 --> 00:16:52,300
It didn't say whether the burners are on or off or if it had been preheated or at what level.

213
00:16:52,300 --> 00:16:53,560
Nothing, nothing at all.

214
00:16:53,560 --> 00:16:54,720
Just brought in some Raff.

215
00:16:54,720 --> 00:16:58,280
Okay. Not particularly descriptive.

216
00:16:58,280 --> 00:17:08,440
So, at 7:15am, the day shift supervisor arrived and we note that that was more than an hour late.

217
00:17:08,440 --> 00:17:11,040
He was supposed to be there at six o'clock for the handover.

218
00:17:12,360 --> 00:17:18,840
Ordinarily, and certainly in my experience, verbal face-to-face handover is a requirement,

219
00:17:18,840 --> 00:17:23,960
such that when the day shift operator comes in, they speak face-to-face and go through the list

220
00:17:23,960 --> 00:17:28,360
of issues, what's going on with the night shift operator before the night shift operator is

221
00:17:28,360 --> 00:17:33,400
allowed to leave. Sometimes when day shift operators are late, the night shift operator

222
00:17:33,400 --> 00:17:38,680
simply is not allowed to leave until there is a handover, which makes sense, seems reasonable.

223
00:17:39,400 --> 00:17:43,160
The problem was that the day shift supervisor who had the most experience,

224
00:17:43,160 --> 00:17:50,280
because he was over an hour late, unfortunately some of that information was lost and all they

225
00:17:50,280 --> 00:17:54,120
had to work with was that one line that was rather vague in the logbook.

226
00:17:54,120 --> 00:18:01,080
At 9:51am operators resumed the startup sequence by starting to recirculate the liquid,

227
00:18:01,080 --> 00:18:07,080
the Raffinate in the lower loop and they added more liquid to the ISOM unit, thinking that it

228
00:18:07,080 --> 00:18:08,760
still needed to be topped up a bit more.

229
00:18:08,760 --> 00:18:14,200
Ordinarily, the level in the tank is controlled by a

230
00:18:14,200 --> 00:18:17,400
discharge valve, an automatic level control valve that had

231
00:18:17,400 --> 00:18:19,520
been set into manual in the SCADA system.

232
00:18:19,520 --> 00:18:24,000
And because the final routing destination for the circulated

233
00:18:24,000 --> 00:18:27,600
Raff was not made clear, there were actually conflicting

234
00:18:27,600 --> 00:18:29,400
opinions as to where it should go.

235
00:18:29,400 --> 00:18:34,520
The operators decided to leave the valve shut for nearly two

236
00:18:34,520 --> 00:18:37,880
hours while they continued to put more Raff in the tower.

237
00:18:37,880 --> 00:18:43,040
Just prior to 10:00am, the furnace was turned on to begin

238
00:18:43,040 --> 00:18:48,480
heating up. Just before 11:00am, the shift supervisor was

239
00:18:48,480 --> 00:18:52,720
called away on an urgent personal matter, leaving just

240
00:18:52,720 --> 00:18:54,360
the one operator in control.

241
00:18:54,360 --> 00:18:58,400
Now, this particular operator was nowhere near as experienced

242
00:18:58,400 --> 00:19:01,360
as the supervisor who had just had to leave due to personal

243
00:19:01,760 --> 00:19:08,000
concerns and there was no one else available to replace them. So this operator on their 30th day

244
00:19:08,000 --> 00:19:13,920
straight of this swing was in charge of three refinery units, one of which one of the ISOM

245
00:19:13,920 --> 00:19:19,120
units was actually going through the startup sequence, which requires more attention than

246
00:19:19,120 --> 00:19:26,240
something that's currently running. Six years previously BP had bought this particular site

247
00:19:26,240 --> 00:19:30,040
from Amoco. And when that happened, as part of the merger,

248
00:19:30,040 --> 00:19:35,440
they made a decision at a board level to reduce headcount by 25%,

249
00:19:35,440 --> 00:19:40,400
essentially a 25%, like one quarter, fixed cost reduction

250
00:19:40,400 --> 00:19:45,880
across all the refineries. There's this mentality, when you

251
00:19:45,880 --> 00:19:49,560
have a merger between two corporate organizations, that

252
00:19:49,560 --> 00:19:54,160
there is going to be duplication and role duplication. So of

253
00:19:54,160 --> 00:19:56,280
course, what's the first thing you want to do? Well, we're

254
00:19:56,280 --> 00:19:59,120
going to cut-back. Why? Well, because there must be a lot of

255
00:19:59,120 --> 00:20:04,400
duplication, I guess. Problem is, you don't always let go some

256
00:20:04,400 --> 00:20:09,180
of the people that are perhaps not the best performers. Good

257
00:20:09,180 --> 00:20:13,520
people are the first to leave usually. And unfortunately,

258
00:20:13,520 --> 00:20:16,660
there were some people that were let go that played key roles in

259
00:20:16,660 --> 00:20:22,560
that organization. There was a panel investigating it

260
00:20:22,560 --> 00:20:26,640
independent of the CSB called the Baker Panel, following this

261
00:20:26,640 --> 00:20:29,760
specific incident. And they

262
00:20:29,760 --> 00:20:32,760
concluded that restructuring following the merger resulted in

263
00:20:32,760 --> 00:20:38,280
a significant loss of people, expertise and experience. So at

264
00:20:38,280 --> 00:20:42,400
that point, they had cut back from two operators on the panel

265
00:20:42,400 --> 00:20:49,440
to one as a part of their 25% cost reduction. A second panel

266
00:20:49,440 --> 00:20:52,520
operator, I think would have definitely made a difference.

267
00:20:52,520 --> 00:20:59,320
Unfortunately there was just the one, and he was fatigued, I think it's reasonable to assume.

268
00:20:59,320 --> 00:21:09,720
So at this point in time just before 11am there had been no regulation of the level in the tank

269
00:21:09,720 --> 00:21:17,800
at all. The indicator still showed an incorrect level and they had no way of knowing that the

270
00:21:17,800 --> 00:21:25,000
amount of Raffinate in that tower had, by lunchtime, actually reached 98 feet.

271
00:21:25,000 --> 00:21:27,280
98.

272
00:21:27,280 --> 00:21:36,200
The level transmitter, the particular kind of level transmitter that was used, was actually now

273
00:21:36,200 --> 00:21:43,480
starting to go backwards. Some level transmitters that work on pressure, what ends up happening is

274
00:21:43,480 --> 00:21:47,360
that they will start to get anomalous readings based on the liquid that's used

275
00:21:47,360 --> 00:21:52,160
because the density of the liquid will change as the head pressure changes and

276
00:21:52,160 --> 00:21:53,240
that will affect the reading.

277
00:21:53,240 --> 00:21:59,280
So what's happened is this particular meter had been calibrated, but it was

278
00:21:59,280 --> 00:22:04,000
calibrated against data that had been mapped out in 1975.

279
00:22:04,000 --> 00:22:08,480
I presume, although it was not clear from the report, that's when they originally fitted

280
00:22:08,480 --> 00:22:09,960
that specific level gauge.

281
00:22:11,280 --> 00:22:16,280
The worst part of it though was that it was for a completely different process with a different kind of liquid,

282
00:22:16,280 --> 00:22:20,680
meaning that the calibration from data from 1975 was completely wrong anyway.
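To put a rough number on why calibrating against the wrong liquid matters, here is a minimal Python sketch. It assumes a simple hydrostatic-head measurement, where level is inferred from pressure divided by density times gravity. The densities and the function names are hypothetical, chosen purely for illustration; the discussion above gives neither the real figures nor the exact instrument details.

```python
# Illustrative only: densities below are made-up values, not plant data.
G = 9.80665  # standard gravity, m/s^2


def head_pressure_pa(level_m: float, density_kg_m3: float) -> float:
    """Pressure at the bottom of a liquid column of the given height."""
    return density_kg_m3 * G * level_m


def indicated_level_m(pressure_pa: float, calibration_density_kg_m3: float) -> float:
    """Level the transmitter reports, given the density it was calibrated for."""
    return pressure_pa / (calibration_density_kg_m3 * G)


actual_level = 2.0            # metres of liquid really in the vessel
actual_density = 650.0        # hypothetical density of the liquid in use
calibration_density = 800.0   # hypothetical density from an old data sheet

pressure = head_pressure_pa(actual_level, actual_density)
reading = indicated_level_m(pressure, calibration_density)
print(f"actual {actual_level:.2f} m, indicated {reading:.3f} m")
```

Under these assumptions the transmitter under-reads, because the lighter liquid produces less pressure than the calibration expects for the same height. If the liquid's density drops further, for example as it heats up, the indicated level falls even while the real level stays the same or rises.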

283
00:22:20,680 --> 00:22:29,280
The external sight glass, which is essentially just what it sounds like, a piece of glass that allows you to see through it,

284
00:22:29,280 --> 00:22:32,880
and you can visibly see the level in the tank.

285
00:22:32,880 --> 00:22:36,980
Well, that sight glass was so dirty, it was completely unreadable.

286
00:22:36,980 --> 00:22:39,280
It hadn't been cleaned or maintained.

287
00:22:40,280 --> 00:22:46,280
The other option, perhaps it became dirty during the maintenance work but was never cleaned prior to startup.

288
00:22:46,280 --> 00:22:56,320
Now, the particular level transmitter was never tested against a level well above the nine foot mark because that wasn't normal.

289
00:22:56,320 --> 00:23:02,640
It wasn't normal to fill the tower that high. That was supposed to be left as free space for the vapor.

290
00:23:02,640 --> 00:23:06,960
And now for the SCADA component.

291
00:23:08,400 --> 00:23:17,360
When you have a tower, a tank, any kind of vessel that can store liquid, the flow in and the flow out

292
00:23:17,360 --> 00:23:23,520
of that tank are critical pieces of information. You put the two together with a level indication

293
00:23:23,520 --> 00:23:29,520
and you can very quickly see flow in minus flow out and if your level is changing you can tell

294
00:23:29,520 --> 00:23:35,280
if one of those three does not agree. So if you're putting in lots and lots of fluid and

295
00:23:35,280 --> 00:23:38,160
there's no fluid coming out and the level isn't changing,

296
00:23:38,160 --> 00:23:39,920
you know something is wrong.

297
00:23:39,920 --> 00:23:44,360
Unfortunately, the SCADA screens were designed

298
00:23:44,360 --> 00:23:46,960
where the flow in and out of this tower

299
00:23:46,960 --> 00:23:49,760
was shown on completely different screens.

300
00:23:49,760 --> 00:23:52,660
You couldn't have them up at the same time.

301
00:23:52,660 --> 00:23:55,280
So it was very, very difficult to tell

302
00:23:55,280 --> 00:23:58,040
if anything was actually wrong.

303
00:23:58,040 --> 00:24:00,220
Had they been on the same screen,

304
00:24:00,220 --> 00:24:02,640
it would have been glaringly obvious

305
00:24:02,640 --> 00:24:04,280
that something was wrong.

306
00:24:04,280 --> 00:24:06,120
And if I'd been designing this thing,

307
00:24:06,120 --> 00:24:08,240
it's just normal practice to put the flow in

308
00:24:08,240 --> 00:24:11,120
and the flow out either horizontally opposed to each other,

309
00:24:11,120 --> 00:24:13,560
one on the left and one on the right-hand side of the tank,

310
00:24:13,560 --> 00:24:15,260
or to have them above or below each other

311
00:24:15,260 --> 00:24:18,200
so that you can easily visually compare them

312
00:24:18,200 --> 00:24:20,040
and deduce based on the level indication

313
00:24:20,040 --> 00:24:22,160
or volume indication if something's wrong.

314
00:24:22,160 --> 00:24:25,400
In this case, they didn't have that.

315
00:24:25,400 --> 00:24:29,640
More advanced alarming and controls

316
00:24:29,640 --> 00:24:33,280
might've even included an automatic sanity calculation

317
00:24:33,280 --> 00:24:37,280
that actually does track flow in and flow out versus volume. I've done something like that in the past.
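A minimal sketch of that kind of sanity calculation, in Python, under stated assumptions: the function name, units, vessel area and tolerance are hypothetical and not taken from the Texas City control system. It compares the level change implied by flow in minus flow out with the level change the transmitter actually reports.
```python
def mass_balance_alarm(flow_in, flow_out, level_change, area, dt, tolerance):
    """Return True when the reported level change disagrees with the net flow."""
    # Level rise implied by the flows over the interval dt (volume / cross-section).
    expected_change = (flow_in - flow_out) * dt / area
    return abs(expected_change - level_change) > tolerance
# Hypothetical numbers: liquid going in, nothing coming out, level reading slightly falling.
print(mass_balance_alarm(flow_in=20.0, flow_out=0.0, level_change=-0.1, area=10.0, dt=1.0, tolerance=0.5))
```
In practice a check like this would likely run over a rolling time window and raise an operator alarm rather than print a value.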

318
00:24:37,280 --> 00:24:43,360
You know, control systems are there to help. So you could do that calculation automatically and

319
00:24:43,360 --> 00:24:50,000
make it even easier for the operator. Again, didn't exist in this case. So at around about lunchtime

320
00:24:50,000 --> 00:24:53,920
several of the contractors that weren't involved with the recommissioning of the ISOM unit,

321
00:24:53,920 --> 00:24:58,720
they were there for different projects going on at the same time, they'd left the site for a

322
00:24:58,720 --> 00:25:06,820
team lunch and that team lunch was celebrating one month LTI free. LTI is

323
00:25:06,820 --> 00:25:12,880
Lost Time Injury. And how's that for a bit of irony for you?

324
00:25:12,880 --> 00:25:19,300
At this point now in our chain of events we're up to 12:41pm so just after

325
00:25:19,300 --> 00:25:25,200
midday. At that point a high-pressure alarm went off in the top of the tower

326
00:25:25,200 --> 00:25:30,640
because the liquid level had now become so high that the gases that were

327
00:25:30,640 --> 00:25:36,240
escaping at the top were now compressing and that high pressure should have been

328
00:25:36,240 --> 00:25:41,600
an indicator that something was really not right. However the problem was the

329
00:25:41,600 --> 00:25:45,160
level reading was still incorrect. It was still showing about eight and a half

330
00:25:45,160 --> 00:25:49,560
feet and it was dropping because the calibration on the meter was completely

331
00:25:49,560 --> 00:25:53,560
wrong despite the fact that there was over a hundred feet of liquid in the

332
00:25:53,560 --> 00:25:58,000
tower. So because the reading was incorrect, the operator was

333
00:25:58,000 --> 00:26:00,640
confused, why would they be getting a high pressure alarm at

334
00:26:00,640 --> 00:26:03,560
the top of the tower? It made no sense. So what they decided to

335
00:26:03,560 --> 00:26:08,800
do is the rather obvious thing, which is you open the manual

336
00:26:08,800 --> 00:26:12,200
valve, and that's designed to vent high pressures to the

337
00:26:12,200 --> 00:26:17,800
emergency relief system. Now, the actual emergency relief

338
00:26:17,800 --> 00:26:20,920
system was a little bit antiquated. It's what we refer

339
00:26:20,920 --> 00:26:28,000
to as a cold vent and in that particular case a blowdown drum. Same idea. The idea

340
00:26:28,000 --> 00:26:34,440
though is it's not flared and a flare essentially is a cold vent with a torch

341
00:26:34,440 --> 00:26:40,720
on the end and all we do is as we vent hydrocarbons we put a match to them. It may

342
00:26:40,720 --> 00:26:45,120
sound a bit crazy but it's actually oddly a safer thing to do because if you

343
00:26:45,120 --> 00:26:50,320
have hydrocarbons escaping cold into the atmosphere if the concentration of those

344
00:26:50,320 --> 00:26:54,560
hydrocarbons is just right and they meet an ignition source that's unintended

345
00:26:54,560 --> 00:26:59,080
then you will have a fire and a fireball and an explosion and people will probably

346
00:26:59,080 --> 00:27:04,840
get killed. In this particular case they were considering putting a flare on the

347
00:27:04,840 --> 00:27:10,680
top of this blowdown drum. It was actually built in the 1950s but it was

348
00:27:10,680 --> 00:27:19,000
not fitted at this point so it was a cold vent or blowdown drum. So in

349
00:27:19,000 --> 00:27:23,880
addition to opening up the manual release valve

350
00:27:23,880 --> 00:27:27,200
that then gave the high pressure gases a secondary path

351
00:27:27,200 --> 00:27:30,040
to escape and reduce the pressure at the top of the tower

352
00:27:30,040 --> 00:27:32,860
to the blowdown drum,

353
00:27:32,860 --> 00:27:34,480
what they also did is turn off

354
00:27:34,480 --> 00:27:36,320
two of the burners in the furnace.

355
00:27:36,320 --> 00:27:39,280
And the idea was that by reducing the temperature

356
00:27:39,280 --> 00:27:41,040
in the furnace, it'll reduce the temperature

357
00:27:41,040 --> 00:27:42,620
of the Raffinate, which would then of course,

358
00:27:42,620 --> 00:27:44,240
ultimately reduce the pressure

359
00:27:44,240 --> 00:27:46,120
in the top of the tower as well.

360
00:27:46,120 --> 00:27:48,300
I mean, that's gonna make little or no difference,

361
00:27:48,300 --> 00:27:53,900
essentially because the volume of hot Raffinate in that tower is huge at that point.

362
00:27:53,900 --> 00:27:58,460
There had been so much in there that it was just, yeah, it wasn't going to make any difference.

363
00:27:58,460 --> 00:28:03,540
At this point in time, they knew something wasn't quite right.

364
00:28:03,540 --> 00:28:08,660
They didn't go and check the level of the sight glass, they couldn't have if they wanted to,

365
00:28:08,660 --> 00:28:11,220
because it was so dirty, you couldn't tell the difference.

366
00:28:11,220 --> 00:28:12,620
They trusted their instrument.

367
00:28:13,540 --> 00:28:15,220
But then they started to check their flows.

368
00:28:15,220 --> 00:28:21,820
And then they noticed that the flow out of the tower at that point was not right.

369
00:28:21,820 --> 00:28:24,500
Clearly there was something not quite right.

370
00:28:24,500 --> 00:28:30,060
So what they decided to do was open a drain valve to some of the heavy Raffinate storage tanks.

371
00:28:30,060 --> 00:28:37,700
But the problem was that those drain valves were not designed to operate when the Raffinate was at this sort of temperature.

372
00:28:38,500 --> 00:28:41,620
It was a heat exchanger designed to reduce the temperature.

373
00:28:41,620 --> 00:28:47,060
But because this liquid essentially was coming off straight after the furnace or not far after that,

374
00:28:47,060 --> 00:28:54,340
it ended up actually preheating some of the other liquids by as much as 141°F or 60°C.

375
00:28:54,340 --> 00:29:01,380
So unfortunately, that then had a cumulative effect, which we'll see in a minute.

376
00:29:15,620 --> 00:29:22,220
At this time, the temperature of the liquid was so high, it began to boil at the top of the tower.

377
00:29:23,140 --> 00:29:28,580
The boiling and all of that turbulence then spilled liquid into the vapor line.

378
00:29:28,580 --> 00:29:35,620
Outside of the cylinder of the tank, there was essentially a smaller vapor

379
00:29:35,620 --> 00:29:42,740
pipe and that pipe was supposed to take the Hexane and higher

380
00:29:42,740 --> 00:29:48,980
the more dense hydrocarbon vapors and extract them from the top. That was the design, that was the

381
00:29:48,980 --> 00:29:54,140
whole idea of doing it. But now because the liquid was so high it boiled over

382
00:29:54,140 --> 00:29:58,780
into that vapor line going back down the outside of the tower and into the vent

383
00:29:58,780 --> 00:30:06,380
system. At 1:14pm the three pressure relief valves at the base of the tower

384
00:30:06,380 --> 00:30:13,220
opened and that then was the last thing it released the high temperature liquid

385
00:30:13,220 --> 00:30:20,340
and vapor into the blowdown drum. The liquid started to fill the blowdown drum and it very,

386
00:30:20,340 --> 00:30:26,920
very quickly overflowed through an overflow line into the process sewer. That then set off more

387
00:30:26,920 --> 00:30:32,680
alarms in the control room. But another point of interest, there was in the blowdown drum yet

388
00:30:32,680 --> 00:30:38,120
another level switch. And that level switch indicates a high level alarm ordinarily,

389
00:30:38,120 --> 00:30:46,360
it should never activate, but it's there as a last resort. But it did not activate. Again, it

390
00:30:46,360 --> 00:30:51,960
was broken just like the other high-level switch the liquid at that

391
00:30:51,960 --> 00:31:00,720
point had nowhere else to go but up out the vent stack so at this point in time

392
00:31:00,720 --> 00:31:04,080
it was now 1:20pm.

393
00:31:04,080 --> 00:31:08,180
Liquid was spraying out of the top of the blowdown stack.

394
00:31:08,180 --> 00:31:16,280
Several staff on the ground level described it as a geyser of boiling gasoline and vapor.

395
00:31:16,280 --> 00:31:26,580
The vapor cloud spilled from essentially an entire petrol, or gasoline, tanker truck's worth of gasoline out of the top.

396
00:31:26,580 --> 00:31:31,580
It took it about 90 seconds to find an ignition source.

397
00:31:31,580 --> 00:31:37,580
By the time it found an ignition source, the vapor cloud was encompassing the entire ISOM unit,

398
00:31:37,580 --> 00:31:44,580
the demountable trailers nearby, as well as a parking area for vehicles adjacent to the ISOM unit.

399
00:31:44,580 --> 00:31:46,580
And that was where it found its ignition source.

400
00:31:46,580 --> 00:31:53,580
There were two workers. They were parked 25 feet, that's only about 7 meters, away

401
00:31:53,580 --> 00:31:57,580
from the base of the blowdown drum in a diesel powered pickup truck.

402
00:31:57,580 --> 00:32:01,580
They were just sitting there idling.

403
00:32:01,580 --> 00:32:05,580
Waiting for reasons unexplained, but that's okay. They had no

404
00:32:05,580 --> 00:32:09,580
reason to suspect anything was wrong. However,

405
00:32:09,580 --> 00:32:13,580
as the engine began to race, because the engine was now sucking in

406
00:32:13,580 --> 00:32:17,580
through the air line, it was now sucking in all of this gasoline vapor in addition

407
00:32:17,580 --> 00:32:21,580
to the diesel that was being injected into the engine through the normal combustion

408
00:32:21,580 --> 00:32:26,380
process through the fuel injectors. They tried to turn the vehicle off but they couldn't.

409
00:32:26,380 --> 00:32:34,220
So they got out of the vehicle and they ran, which is exactly what I'd do. The problem with diesels,

410
00:32:34,220 --> 00:32:39,100
I say the problem, but the reason that the engine wouldn't shut off is because diesels,

411
00:32:39,100 --> 00:32:44,140
they use a high compression ratio inside the combustion chamber to ignite the fuel.

412
00:32:44,140 --> 00:32:49,660
There's no spark plugs. So the air intake however was now bringing the gasoline vapor in

413
00:32:50,220 --> 00:32:55,180
as well as the air and then that was mixing with the diesel from the injectors so if you turn the

414
00:32:55,180 --> 00:33:00,620
engine off you turn the ignition off that'll turn off the fuel pump it'll turn off your injectors

415
00:33:00,620 --> 00:33:07,500
but since the fuel was no longer coming in just from the fuel pump and the injectors it was

416
00:33:07,500 --> 00:33:12,540
actually coming in through the air intake which has no cut off the engine will theoretically

417
00:33:12,540 --> 00:33:18,060
never stop running, and that's exactly what happened: they couldn't turn off the engine.

418
00:33:19,660 --> 00:33:28,860
So they ran. In a situation like this, the amount of fuel coming into that engine was so great and

419
00:33:28,860 --> 00:33:33,220
the engine raced and essentially went beyond its maximum rev limit.

420
00:33:33,220 --> 00:33:40,100
What was happening is the sequencing was no longer going to work properly and some non-combusted

421
00:33:40,740 --> 00:33:51,940
fuel escaped down the exhaust. A backfire is when that fuel combusts outside of the combustion

422
00:33:51,940 --> 00:34:00,900
chamber and it was a backfire from that engine that ultimately lit the gasoline vapor cloud.

423
00:34:00,900 --> 00:34:10,340
When that happened, the force of the explosion killed 12 of the people in the nearest trailer.

424
00:34:10,340 --> 00:34:18,280
Another three people were killed in a nearby trailer. The devastation caused

425
00:34:18,280 --> 00:34:30,160
by that explosion took two years to rebuild, not to mention the lives that were lost.

426
00:34:30,160 --> 00:34:37,760
So the investigation.

427
00:34:37,760 --> 00:34:42,320
There were internal BP reports in the years leading up to that incident.

428
00:34:42,320 --> 00:34:46,840
And one of them was tabled to executives in London stating that they had serious concerns

429
00:34:46,840 --> 00:34:50,460
about the potential for a major site incident.

430
00:34:50,460 --> 00:34:59,540
There were 80, that's 8...0...hydrocarbon releases in the year prior to that alone.

431
00:34:59,540 --> 00:35:06,580
80 and that's a lot. So in other words they were venting situations where they had released

432
00:35:06,580 --> 00:35:15,700
hydrocarbons in an unplanned fashion for unplanned reasons. During 2004 there were three major

433
00:35:15,700 --> 00:35:25,220
accidents at BP Texas City. On March 30th, one of the incidents caused 30 million dollars worth of damage

434
00:35:25,220 --> 00:35:32,260
but no fatalities. The most disturbing part of it though, the other two incidents that occurred

435
00:35:32,260 --> 00:35:38,900
each had one fatality and that was in 2004. I mean when I read that I couldn't believe it

436
00:35:38,900 --> 00:35:49,380
but the worst part beyond the fatalities is that the lost time injuries specifically excluded

437
00:35:49,380 --> 00:35:58,020
fatalities. I have no idea how that makes any sense. Because if someone's dead, they're not

438
00:35:58,020 --> 00:36:03,060
showing up to work, so surely that's lost time. But I suppose the lost time would go on for an

439
00:36:03,060 --> 00:36:08,980
infinite period of time. So therefore maybe they decided, well, we'll have a separate measure for

440
00:36:08,980 --> 00:36:13,300
casualties at work or something, or just didn't include them in the statistics.

441
00:36:14,900 --> 00:36:18,060
That's just mind-blowingly terrible.

442
00:36:18,060 --> 00:36:24,900
So they also did some digging into maintenance budgets.

443
00:36:24,900 --> 00:36:28,300
We mentioned earlier about the 25% cuts.

444
00:36:28,300 --> 00:36:33,500
Well, they increased their funding for maintenance in the 2003-2004 year.

445
00:36:33,500 --> 00:36:38,620
But that money was actually primarily directed to environmental compliance retrofitting

446
00:36:38,620 --> 00:36:43,900
and accident response because they kept having accidents, not preventative maintenance.

447
00:36:43,900 --> 00:36:47,900
Of course preventative maintenance would have helped.

448
00:36:47,900 --> 00:36:59,900
One of their own internal safety surveys just prior to the 2005 disaster noted that production and budget compliance gets recognised and rewarded above anything else.

449
00:36:59,900 --> 00:37:02,900
And that's just a little disturbing.

450
00:37:02,900 --> 00:37:10,900
The only safety metric used to calculate executive bonuses was the personal injury rate, that is, the lost time injury rate.

451
00:37:10,900 --> 00:37:18,880
So there was no recognition given for preventative

452
00:37:18,880 --> 00:37:22,800
maintenance plans and strategies for a reduction in the number of

453
00:37:22,800 --> 00:37:28,480
unplanned venting, none of that. And certainly not with fatalities

454
00:37:28,480 --> 00:37:34,720
either, which is insane. They also noted the high executive

455
00:37:34,720 --> 00:37:37,960
turnover rates in the decade leading up to the incident,

456
00:37:38,380 --> 00:37:43,900
There was an average of only 18 months for an executive to stay at that facility.

457
00:37:43,900 --> 00:37:46,260
Just 18 months, a year and a half.

458
00:37:46,260 --> 00:37:51,740
And since executives are rewarded based on profit, and that's driven by production and reducing

459
00:37:51,740 --> 00:37:57,340
spending, there were no major preventative maintenance initiatives implemented during that

460
00:37:57,340 --> 00:38:04,380
decade. And BP essentially at that point had developed a culture, as many industrial sites do

461
00:38:04,380 --> 00:38:09,500
these days, of focusing on personal safety like slips, trips, falls and

462
00:38:09,500 --> 00:38:11,500
paper cuts of all things.

463
00:38:11,500 --> 00:38:17,100
And they are focusing on that rather than process safety, which is to say

464
00:38:17,100 --> 00:38:22,420
the automation process safety systems, risk analysis, hazard

465
00:38:22,420 --> 00:38:27,060
analysis from a process point of view, as in the plant process.

466
00:38:27,060 --> 00:38:30,980
What are we processing? Well, in this case, oil, hydrocarbons.

467
00:38:31,060 --> 00:38:34,900
Are they explosive? Yeah, that's, they're a problem from that point of view. Yeah.

468
00:38:34,900 --> 00:38:42,820
And the problem is that, in time, what happens to the culture is you get an erosion.

469
00:38:42,820 --> 00:38:53,460
You erode away the employees that are focused on process safety, and they get taken over by people that are far more focused on personal safety, like slips, trips and falls.

470
00:38:53,500 --> 00:38:56,340
Because in many respects, that's easier for a lot of people to

471
00:38:56,340 --> 00:38:58,940
understand than process safety, which can be

472
00:38:58,940 --> 00:39:00,980
quite complicated. If you're not a

473
00:39:00,980 --> 00:39:04,660
Chemical, Process Engineer, or a Mechanical or Electrical Engineer,

474
00:39:04,660 --> 00:39:07,780
sometimes process safety is very hard to get your head around,

475
00:39:07,780 --> 00:39:12,860
because it's more complex. Anyway, over the years, these

476
00:39:12,860 --> 00:39:18,520
people sort of were eroded out of the organization, leaving the

477
00:39:18,520 --> 00:39:21,660
majority of the high level executives blind to the process

478
00:39:21,660 --> 00:39:26,000
safety risks that were present at that facility. Beyond that,

479
00:39:26,000 --> 00:39:28,560
BP had an internal culture that didn't reward or acknowledge

480
00:39:28,560 --> 00:39:31,380
reporting of potential safety risks. Sometimes we refer to

481
00:39:31,380 --> 00:39:35,720
them as near misses. And the Texas City maintenance manager

482
00:39:35,720 --> 00:39:41,520
noted, prior to the incident, in an email to

483
00:39:41,520 --> 00:39:44,840
executives that BP has a ways to go to becoming a learning

484
00:39:44,840 --> 00:39:50,160
culture and away from a punitive culture. I've worked in a few

485
00:39:50,160 --> 00:39:55,880
different places where if you raise an issue, it's a defect or something's wrong or potential safety issue,

486
00:39:55,880 --> 00:40:03,240
where you get sometimes in extreme cases laughed out of the room. In other cases, you get told you're just

487
00:40:03,240 --> 00:40:10,520
being a nitpicky so and so. We have other things to worry about. We have real problems to worry about, you

488
00:40:10,520 --> 00:40:17,680
know. And that sort of punitive culture that discourages people from flagging issues and process

489
00:40:17,680 --> 00:40:23,320
safety issues in particular. You know, that's poisonous. I'm

490
00:40:23,320 --> 00:40:26,560
happy to say that my current employer is not like that. My

491
00:40:26,560 --> 00:40:29,040
current employer is actually very good at taking that

492
00:40:29,040 --> 00:40:35,960
seriously, which is great. In the 19 previous startups for

493
00:40:35,960 --> 00:40:40,200
ISOM units at that facility, in the vast majority of instances,

494
00:40:40,200 --> 00:40:43,280
the operators were running the ISOM units beyond the capable

495
00:40:43,280 --> 00:40:47,160
range of the level transmitter, so above the nine foot mark. But

496
00:40:47,160 --> 00:40:53,420
it was never investigated as to why, because these weren't considered near misses and it

497
00:40:53,420 --> 00:40:55,360
became normal operational procedure.

498
00:40:55,360 --> 00:41:00,240
Like I said, monkeys and the banana and the fire hose.

499
00:41:00,240 --> 00:41:05,120
These days we investigate near misses as thoroughly as though they were an actual serious incident

500
00:41:05,120 --> 00:41:07,600
where someone was injured.

501
00:41:07,600 --> 00:41:14,080
Because near misses are an indicator of a future problem, where it will no longer be

502
00:41:14,080 --> 00:41:17,000
a near miss.

503
00:41:17,000 --> 00:41:21,740
So finally, how can we prevent this accident?

504
00:41:21,740 --> 00:41:24,740
How could we have prevented this accident?

505
00:41:24,740 --> 00:41:26,460
Accident?

506
00:41:26,460 --> 00:41:27,300
Disaster?

507
00:41:27,300 --> 00:41:31,160
First of all, in safety systems,

508
00:41:31,160 --> 00:41:34,740
we do a safety study,

509
00:41:34,740 --> 00:41:36,820
and the safety studies are designed

510
00:41:36,820 --> 00:41:38,960
to look at all the risk factors and consequences

511
00:41:38,960 --> 00:41:40,440
if things go wrong.

512
00:41:40,440 --> 00:41:42,540
They incorporate reaction times,

513
00:41:42,540 --> 00:41:44,420
damage to the environment, damage,

514
00:41:44,420 --> 00:41:49,780
you know, personal injuries, you know, process damage. All of

515
00:41:49,780 --> 00:41:54,460
these factors and several more are all rolled up into a series

516
00:41:54,460 --> 00:41:59,060
of what we call Safety Integrity Levels, SIL ratings.

517
00:41:59,060 --> 00:42:03,120
And the SIL ratings drive the level of protection that's

518
00:42:03,120 --> 00:42:07,760
required. In particular, this sort of an application would

519
00:42:07,760 --> 00:42:10,520
require redundant level transmitters. A single level

520
00:42:10,520 --> 00:42:13,460
transmitter with a level switch would not be sufficient. We

521
00:42:13,460 --> 00:42:20,620
would have two or possibly three identical level transmitters mounted nearby

522
00:42:20,620 --> 00:42:26,660
independently reporting their positions. They would require 6 to 12

523
00:42:26,660 --> 00:42:30,980
monthly calibration with an independently verified calibration

524
00:42:30,980 --> 00:42:37,420
source. But that was not what was installed.

525
00:42:37,420 --> 00:42:42,140
Safety interlocking to prevent overfilling would also be a common thing.

526
00:42:42,140 --> 00:42:50,140
The pumps that circulated the Raffinate could have been cut off by the high level indicators.

527
00:42:50,140 --> 00:42:56,140
If 1oo2 or 2oo3 voted, they could have done that. Built into a safety interlocking system.

528
00:42:56,140 --> 00:43:00,140
That's what we would have installed had it been designed and built today.

529
00:43:00,140 --> 00:43:03,140
Even in the last 10 years.
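The voting idea can be sketched as follows, assuming hypothetical function names and one boolean high-level trip signal per transmitter; a real safety interlock would run in a dedicated safety logic solver, not a script. 1oo2 trips when either of two channels demands it, and 2oo3 trips when any two of three do.
```python
def vote(trips, required):
    """MooN voting: True when at least `required` channels demand a trip."""
    return sum(1 for t in trips if t) >= required
def pump_permitted(high_level_trips, required=2):
    """The feed pump may run only while the high-level vote has NOT tripped."""
    return not vote(high_level_trips, required)
# Hypothetical 2oo3 case: two of three transmitters report high level, so the pump is cut off.
print(pump_permitted([True, True, False]))
```
With `required=1` and two channels the same function behaves as a 1oo2 vote, which trips more readily at the cost of more spurious shutdowns.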

530
00:43:03,140 --> 00:43:10,140
Flaring seems environmentally questionable, but in terms of safety, because it consumes the hydrocarbons

531
00:43:10,140 --> 00:43:16,140
as they exit to atmosphere, you prevent that build up that could lead to an explosion.

532
00:43:16,140 --> 00:43:19,180
So whether it's environmentally questionable one way or the other, flaring versus cold

533
00:43:19,180 --> 00:43:22,680
venting, that's a separate discussion, not fit for this show perhaps, but ultimately

534
00:43:22,680 --> 00:43:27,180
from a safety point of view, there's no question that flaring done safely in a controlled fashion

535
00:43:27,180 --> 00:43:30,260
is the better option by far.

536
00:43:30,260 --> 00:43:34,960
Now flaring was actually proposed on five separate occasions in the two years leading

537
00:43:34,960 --> 00:43:41,640
up to the accident. But due to production pressures and cost, they were never fitted.

538
00:43:41,640 --> 00:43:45,100
Because you'd have to be offline in order to fit them. Plus they would cost money to

539
00:43:45,100 --> 00:43:52,360
fit. Production was king. So they didn't. Prior to the startup, all of the alarms and

540
00:43:52,360 --> 00:43:57,680
indications were supposed to be checked. That's part of an internal BP procedure that existed

541
00:43:57,680 --> 00:44:04,580
at the time. But they were not performed. Pretty much for most of those 19 prior startups

542
00:44:04,580 --> 00:44:11,300
that the Chemical Safety Board investigated. They just skipped it. Then again, not sure

543
00:44:11,300 --> 00:44:16,280
how you would validate against the level without a functioning sight glass or an independently

544
00:44:16,280 --> 00:44:22,420
calibrated level transmitter or a test liquid. Alright, beyond that, handover. Another big

545
00:44:22,420 --> 00:44:27,300
problem. In the industry, a lot of companies have a policy that there must be a face-to-face

546
00:44:27,300 --> 00:44:33,020
handover, certainly we do, and detailed notes from changing shifts or swings. And that changeover

547
00:44:33,020 --> 00:44:37,340
period is crucial.

548
00:44:37,340 --> 00:44:40,900
Training budgets had been cut to a point where the training was delivered via a standard

549
00:44:40,900 --> 00:44:44,440
computerized system rather than face-to-face.

550
00:44:44,440 --> 00:44:48,220
And whilst computerized systems have their place for consistency and simplicity in many

551
00:44:48,220 --> 00:44:54,780
cases, budgetarily as well of course, but no, face-to-face allows question and answer

552
00:44:54,780 --> 00:45:02,940
and probing. Computerized training is essentially a poor replacement for face-to-face training.

553
00:45:02,940 --> 00:45:12,220
Simulation. Process simulation. Now five years before this incident, simulators were recommended

554
00:45:12,220 --> 00:45:17,580
for operator training purposes. Unfortunately, due to cost, they were never implemented.

555
00:45:17,580 --> 00:45:23,500
They could have had a simulation system, much like my current employer does,

556
00:45:24,060 --> 00:45:29,260
where you can train operators in how to safely start up and shut down a plant. You can inject

557
00:45:29,260 --> 00:45:35,420
errors, faults into the system during startup and shutdown and test the operator to see how

558
00:45:35,420 --> 00:45:41,580
they respond by following the appropriate procedures and proving that they are paying

559
00:45:41,580 --> 00:45:48,060
attention during their training. Now the impacts of mergers and cost reductions as you can see

560
00:45:48,060 --> 00:45:55,920
from this incident, they aren't felt immediately. These things have a habit of taking time before

561
00:45:55,920 --> 00:46:01,740
they're felt. The same with preventative maintenance. The reason we do preventative maintenance

562
00:46:01,740 --> 00:46:08,560
is to make sure that when there is that one in 1,000 time when all of the other elements

563
00:46:08,560 --> 00:46:17,100
line up, these safety systems that are designed to protect us will function correctly.

564
00:46:17,100 --> 00:46:24,940
We have a thing called CFTs and that's short for critical function testing. CFTs are

565
00:46:24,940 --> 00:46:32,540
run on a 6 to 12 month basis on most mining, oil and gas plants. Why? Because you

566
00:46:32,540 --> 00:46:36,300
need to know that your safety system is going to operate when it should, how it

567
00:46:36,300 --> 00:46:43,900
should, and they have to be done all the time, regularly. Production doesn't matter,

568
00:46:43,900 --> 00:46:45,900
you have to make sure your CFT works.

569
00:46:45,900 --> 00:46:49,900
Fatigue prevention is another problem.

570
00:46:49,900 --> 00:46:52,900
There was no fatigue prevention policy at BP at the time,

571
00:46:52,900 --> 00:46:56,900
or in the industry as a whole at the time, but now there definitely is.

572
00:46:56,900 --> 00:47:00,900
So in closing,

573
00:47:00,900 --> 00:47:04,900
it may or may not be of interest, but

574
00:47:04,900 --> 00:47:08,900
on the 1st of February 2013, BP sold the refinery

575
00:47:08,900 --> 00:47:12,900
at Texas City to Marathon Petroleum Corporation.

576
00:47:12,900 --> 00:47:17,060
That ended BP's 15 years operating that facility.

577
00:47:17,060 --> 00:47:21,660
It is now called the Marathon Galveston Bay Refinery

578
00:47:21,660 --> 00:47:23,460
and still in operation today.

579
00:47:23,460 --> 00:47:31,500
I think it's important to reflect on one last comment

580
00:47:31,500 --> 00:47:36,500
about this disaster and about organisational learnings

581
00:47:36,500 --> 00:47:42,360
and the value of experience and the cost of cutbacks.

582
00:47:42,360 --> 00:47:54,400
There was a statement made by a man called Trevor Kletz in 1993 in his book "Lessons from

583
00:47:54,400 --> 00:47:56,240
Disaster."

584
00:47:56,240 --> 00:48:01,840
He says, "Organizations have no memory.

585
00:48:01,840 --> 00:48:04,640
Only people have memory."

586
00:48:04,640 --> 00:48:06,480
We write reports.

587
00:48:06,480 --> 00:48:09,420
We do investigations,

588
00:48:09,420 --> 00:48:14,940
talk about everything that went wrong and how these people who are

589
00:48:14,940 --> 00:48:19,620
essentially innocent were killed because of an accident.

590
00:48:19,620 --> 00:48:21,740
I hate that word "accident".

591
00:48:21,740 --> 00:48:29,940
Because it suggests that, you know, it was

592
00:48:29,940 --> 00:48:35,060
accidental, it was, you know, but it wasn't, it could have been prevented.

593
00:48:36,700 --> 00:48:42,300
Like they don't call traffic accidents...accidents, I call them traffic collisions because that's what it was, it was a collision.

594
00:48:42,300 --> 00:48:45,260
Was this an accident? No, it was a disaster.

595
00:48:45,260 --> 00:48:47,380
And the disaster could have been prevented.

596
00:48:47,380 --> 00:48:54,020
If you're enjoying Causality and want to support the show, you can, like one of our backers, Chris Stone.

597
00:48:54,020 --> 00:48:58,020
He and many others are patrons of the show via Patreon

598
00:48:58,020 --> 00:49:01,980
and you can find it at https://patreon.com/johnchidgey all one word.

599
00:49:01,980 --> 00:49:06,340
So if you'd like to contribute something, anything at all, it's very much appreciated.

600
00:49:06,340 --> 00:49:13,340
This was Causality. I'm John Chidgey. Thanks for listening.

601
00:49:13,340 --> 00:50:32,980
(gentle music)

