Causality 17: Three Mile Island

3 June, 2017

CURRENT

On March 28, 1979, an incident at Unit 2 of the Three Mile Island Nuclear Plant in the United States of America led to a partial reactor core meltdown. Many blamed the operators for stopping the reactor cooling system, but the real root causes were a known flaw in the design and alarm flooding that had blinded the operators to what was actually happening.

Transcript available
Chain of events. Cause and effect. We analyze what went right, and what went wrong, as we discover that many outcomes can be predicted, planned for and even prevented. I'm John Chidgey and this is Causality. Causality is part of The Engineered Network. To support our shows including this one, head over to our Patreon page and for other great shows visit https://engineered.network/ today. "Three Mile Island" This is the first in a series of episodes with a focus on control system contributions to disasters. Built on a sandbar in Pennsylvania in the middle of the Susquehanna River between 1968 and 1970, the Three Mile Island Nuclear Plant consisted of 2 reactor cores, both being a Pressurized Water Reactor design. The reactors themselves were designed and built by Babcock & Wilcox, who had many reactors installed around the United States at that time, and the plant was operated by Metropolitan Edison, whose parent company was General Public Utilities. The energy shortages and the energy crisis of the early 1970s, where oil prices jumped from $3USD a barrel to $30USD a barrel, led to fuel shortages across the United States and that had driven utilities to the lure of cheap nuclear energy. A large number of reactors were built in a relatively short period of time and the Nuclear Regulatory Commission had difficulty keeping up with the demand for certification and compliance of all of these new reactors. The designers had been producing several proof-of-concept plants in the hundred-megawatt range and then, once they'd proven them, scaled them up to nearly 1GW with essentially the same design and with little proof of operation at full size. Nuclear reactors are essentially big steam engines. The nuclear fuel rods sustain a chain reaction that is slowed down by control rods that absorb the neutrons released from fission in those fuel rods, and heat is withdrawn (or extracted) from the reactor core by passing very clean water through the reactor.
The name explains the basis of a Pressurised Water Reactor design. The primary coolant is kept under a higher pressure to stop the cooling water from boiling and turning into steam. Hence pressure control is vital, and it is essential that the safety systems that prevent over-pressurization function correctly to ensure the cooling remains under control. The water in the clean water circulation loop needs to be kept extremely clean or it will damage and prematurely wear the pipework inside the high-pressure and high-temperature sections of the boiler. In this design each unit had 8 condensate polishers that filtered the clean water condensate before it was circulated back through the high-temperature section of the boiler, or more specifically the steam generator section of the loop. The secondary cooling loop's purpose was to exchange heat with the primary loop, with waste heat evaporated through huge cooling towers, which are commonplace in any thermal electricity generating plant that isn't alongside the ocean. The first unit at Three Mile Island was capable of generating 852MW and it came online on the 19th of April, 1974, followed by a second unit capable of generating 906MW on the 30th of December, 1978. Three Mile Island's Unit 2 had been operating for close to a year but had only been online commercially for about 3 months when, on Wednesday the 28th of March, 1979, it would have a partial meltdown. Unit 1 at the time was offline and shut down for refueling. At approximately 5:30pm on Tuesday, the day before, plant operators had attempted to rectify a blockage in one of the aforementioned condensate polishers.
The usual practice for clearing the resin from the filter was to use compressed air, however in this case the blockage was severe enough that this was unsuccessful, so the operators instead chose to connect the compressed air to a water line and then use the additional water pressure generated by the air's back pressure to force the resin out. This turned out to be successful. The unit was returned to normal operation and no one thought any more of it. At 4:37am Eastern Standard Time, Unit 2's secondary loop, which is the second of 3 steam water loops, lost its circulating water flow following a series of valves that had tripped shut. This led to an increase in the temperature of the primary coolant beyond a safety shutdown temperature setpoint. This then caused the primary reactor to shut down with a S.C.R.A.M. and the pilot-operated relief valve opened as designed, to briefly reduce the pressure inside the vessel. The high-pressure injection pumps then automatically injected top-up water into the reactor's primary coolant system, as per design. Operators noticed from the level indicator in the pressuriser that the level was rising. This was the only indication of the reactor's cooling water level, and it was not a direct measurement but rather an indirect measurement from a system of pipework that was normally hydraulically linked. Operators were trained to ensure the reactor coolant wasn't overfilled because if it was, there was a possibility of vessel rupture. By this time the primary coolant pumps were trying to pump both steam and water due to the incident and since they can only pump liquid, cavitation became severe, resulting in loud knocking and vibration of the primary coolant pumps.
For these reasons the operators decided to override and stop the primary coolant pumps from circulating water, believing that the level in the pressurizer was a correct reading of the reactor coolant water level, and to protect the coolant pumps from any damage. This ended forced cooling of the reactor core as the decay heat continued to build following the S.C.R.A.M. Refer to Episode 3 of this show about Fukushima for the discussion about decay heat and S.C.R.A.M.s. By 6:00am there were about 50 people in the control room trying to figure out what had happened. One of the operators who had arrived at that time was called into the control room, examined the readings and concluded that the pilot-operated relief valve had not closed: it was only supposed to open momentarily, but for some reason must still be open. At 6:22am a block valve was manually closed to stop the loss of coolant water through the stuck-open pilot-operated relief valve. In the intervening 105 minutes, so much water and steam had been lost through the pilot-operated relief valve that high-pressure steam had formed, creating gas locks in sections of pipework and preventing convection cooling of the reactor core. With no forced cooling occurring the reactor temperature continued to climb. At 6:57am a plant supervisor declared a site area emergency shortly after radiation was detected in the control room. At 7:25am Station Manager Gary Miller declared a general emergency, which is defined as the potential for serious radiological consequences to the general public. There were only 2 phone lines into Unit 2's control room, both of which were constantly in use during the incident with a huge quantity of incoming calls, and no direct line was actually available to the Emergency Response Center or to the engineers that had designed the plant.
Instead, representatives from Babcock & Wilcox had been unable to get through to the control room of Unit 2 but they were able to get through to Unit 1's control room, and they had a runner carrying messages between the two units' buildings: relaying instructions, getting printouts, and then running those printouts back to Unit 1 to the phone connection that was open to them. By mid-afternoon operators gradually recommenced high-pressure injection of water into the reactor cooling system at Babcock & Wilcox's direction, in an attempt to increase pressure and force any steam and gases back into solution. Without this step the primary cooling pumps would not be able to pump and would cavitate as they had earlier in the day. At 7:50pm that day, some 16hrs after the incident had begun, the designers instructed the operators to begin circulating water through the reactor once again. Once the operators did this, the temperatures began to drop...then the pressures began to drop as well. Over the following 2 days the gas build-up from the incident incrementally accumulated in the make-up tank of the auxiliary building, and the operators used a combination of compressors and pipe reconfiguration to move as much of that gas as possible to the waste gas decay tanks. Unfortunately the compressors did not reliably seal and a quantity of radioactive gas was released. The following morning it was reported that there was a radioactive gas release and an evacuation plan was suggested for the immediately affected area. It wasn't until 10:00am that the governor was informed of the actual amount of gas released. The governor recommended that pregnant women and school-aged children evacuate a 5mi radius from the Three Mile Island plant. This set off somewhat of a panic. Before reaching the environment the gases had passed through a high-efficiency particulate air filter, sometimes called a HEPA filter, as well as an activated carbon charcoal filter set.
This filtration captured all of the radionuclides with the only exception being noble gases. The quantity of gas released was not metered directly. Estimates following the incident, however, ranged from as little as 1.6PBq (petabecquerels) to a maximum of 480PBq. 1 petabecquerel is approximately 27,000 curies. These figures count radioactive decay events, not absorbed dose. The average dose from this gas release was estimated after the incident at 8 millirems per person, with a maximum likely individual dose of 100 millirems or 1 milliSievert. An average background radiation dose in the United States is about 360 millirems per person per year, or 3.6 milliSieverts per person per year. The noble gases released had very short half-lives, weren't absorbed by plants or animals (so-called biologically inert) and did not cause an increase to the background radiation dosage levels in the immediate or extended area around the Three Mile Island plant. Between the 30th of March and the 1st of April an increase in pressure, caused by the reaction of the exposed Zircaloy at higher temperatures (again refer to Episode 3), created a Hydrogen bubble above the reactor on top of the containment vessel. On Saturday morning some of the calculations suggested that a Hydrogen explosion was an imminent possibility and these were being seriously discussed by the response personnel. By late Saturday afternoon the possibility of an explosion was leaked to the press, setting off a new wave of panic. Operators however bled off the Hydrogen build-up gradually by briefly opening vent valves on the pressurizer, periodically over several days, until the pressure had subsided. At the time there were great fears the bubble could cause an explosion, however the pressure was never allowed to get high enough, and the amount of Oxygen present was nowhere near the level required to reach the Lower Explosive Limit for an explosion to take place.
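As a quick check on the unit conversions quoted in this passage (petabecquerels to curies, and millirems to milliSieverts), here is a minimal Python sketch. The conversion constants are the standard unit definitions rather than figures from the episode, and the function names are made up for illustration.

```python
# Standard unit definitions (assumptions brought in for this sketch,
# not figures taken from the episode).
BQ_PER_CURIE = 3.7e10   # 1 curie = 3.7e10 becquerels
MREM_PER_MSV = 100.0    # 1 milliSievert = 100 millirems


def pbq_to_curies(pbq: float) -> float:
    """Convert an activity in petabecquerels to curies."""
    return pbq * 1e15 / BQ_PER_CURIE


def mrem_to_msv(mrem: float) -> float:
    """Convert a dose in millirems to milliSieverts."""
    return mrem / MREM_PER_MSV


if __name__ == "__main__":
    # Activity figures quoted in the transcript: 1.6 PBq and 480 PBq.
    for pbq in (1.0, 1.6, 480.0):
        print(f"{pbq} PBq is about {pbq_to_curies(pbq):,.0f} Ci")
    # Dose figures quoted in the transcript: 8, 100 and 360 millirems.
    for mrem in (8.0, 100.0, 360.0):
        print(f"{mrem} mrem = {mrem_to_msv(mrem):.2f} mSv")
    # Estimated average dose as a share of the quoted yearly background dose.
    print(f"Share of yearly background: {8.0 / 360.0:.1%}")
```

The two conversions are kept separate on purpose: becquerels and curies describe how active the released gas was, while millirems and milliSieverts describe the dose a person received, and one cannot be turned into the other without a dose model.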
In an attempt to calm panic, the President of the United States at the time, Jimmy Carter, toured the facility 4 days after the incident had occurred. The tour group he was a part of was protected only by radiation boots, to prevent radioactive water from being absorbed into their shoes and feet. Following this incident, lead bricks were brought in to surround the base of the reactor and the Hydrogen build-up was gradually bled off and contained. The pressure vessel's pressure was reduced to normal operating conditions. By the 27th of April the decay heat had subsided enough that natural convection flow of cooling water was now possible and the plant was in a cold shutdown, with water now below boiling point at standard atmospheric pressure. It wasn't until 3 years after the incident that a camera was able to be lowered safely into the reactor core to determine the full extent of the damage from the incident. They found that 5ft from the top of the reactor core had melted away. That's about 1.5m. Nearly half of the reactor had partly or fully melted down and had...pooled at the bottom head of the pressure vessel in the reactor, where it now lay, solidified. Approximately 19 tonnes of core material in total had melted and flowed to the bottom. 62 tonnes had partly or fully melted, which is 45% of the entire reactor core. The reactor core of Unit 2 was within 30min of a complete meltdown. Had a full meltdown occurred it would have become so hot that the entire core would have become a molten blob of metal with self-sustaining heat, melting its way through the vessel, concrete foundations and bedrock. Had it progressed to a full meltdown there's little doubt that the sand and water layer beneath the plant would have turned into superheated radioactive steam, sending a huge amount of radiation through the water table and the local area and atmosphere surrounding the plant.
Some disaster projections suggested it had the potential to wipe out an area from Washington DC to New York City, although that eventuality is hotly debated by the nuclear industry. So what went wrong at Three Mile Island? There were both technical errors and human errors. The trigger event was actually a mistake introduced the previous night. In the late afternoon of the preceding day, when the operators had attempted a non-standard procedure to clear the resin blockage in one of the filters, the position of the air line and the water line was very difficult to physically access. It's not entirely clear if its long-term connection was intended or accidental, however the process had allowed an amount of water to enter the instrument air line. Instrument air is used to actuate control valves for a multitude of reasons. The primary one being that air can be directed at a valve manifold, and a very low current and low voltage relay can signal to open or close the valve, or move it to a position, using the air as the primary motive force to move the valve physically. It's cleaner and simpler than hydraulic valves because it doesn't leak in the same way and leave mess on the floor, and it doesn't get as hot, nor does it require thick cabling or take up as much physical space as an all-electric actuator. Unfortunately instrument air has a rather fatal flaw, and that is moisture. If too much moisture enters the valve manifolds they will either actuate without being directed to do so or they will cease to actuate when they are commanded to do so. In the case of Three Mile Island a series of valves on pipework connecting the feedwater pumps, condensate pumps and the condensate booster pumps all failed in quick succession, with several key valves all slamming shut...quickly. And this caused a cessation of the secondary cooling water flow into the primary vessel and initiated the chain of events.
Once the chain of events had been set in motion though, there were automated systems designed to prevent a loss of cooling to the reactor, as you'd expect: it's a nuclear reactor! Basically the plant operators and managers overrode the automatic safety equipment. It was those overrides that led to the reactor core meltdown. Superficially though it's easy to blame plant operators for the Three Mile Island incident. "Blame the operator" right? The truth is that there was a long list of reasons why they got it wrong. The actual root causes included contributions from: the utility company (Met-Ed), the reactor vendor Babcock & Wilcox, the architect engineer and the Nuclear Regulatory Commission. They were all responsible, either in whole or in part, for deficiencies in training, control room design, instrumentation and equipment selection, the overall plant design and emergency and evacuation procedures. All we'll be looking at is the control system and equipment selection. The control system in use was a Bailey 855 Process Control Computer, which had been widely used by Babcock & Wilcox in their designs for nearly a decade at that point. The Bailey 855 was configured with visual annunciator lights as well as a computer printout from 1 of 2 printers: 1 for on-request plant status and the other for system alarms. Due to limited physical space in the annunciation system, many alarms that were deemed to be less critical only appeared on the computer printout. The printers themselves were electric typewriters and they were not high-speed. Though in this day and age we'd refer to these as printers, they were technically "Computer Typewriters." These computer typewriters could print at most 14 alarms every minute. When the alarm rate was greater than the printing rate the system had a memory buffer that would hold those alarms until the printer could catch up.
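To get a feel for how quickly a 14-alarms-per-minute printer falls behind, here is a minimal Python sketch of the buffer described above. Only the print rate comes from the episode. The per-minute alarm arrivals are made-up numbers for illustration, and the function names are hypothetical; nothing here reflects how the Bailey 855 was actually implemented.

```python
PRINT_RATE_PER_MIN = 14  # from the episode: at most 14 alarms printed per minute


def simulate_backlog(arrivals_per_minute, rate=PRINT_RATE_PER_MIN):
    """Return the number of alarms still waiting in the buffer after each minute."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_minute:
        # New alarms join the buffer, the printer removes up to `rate` of them.
        backlog = max(0, backlog + arrivals - rate)
        history.append(backlog)
    return history


def minutes_to_clear(backlog, rate=PRINT_RATE_PER_MIN):
    """Minutes the printer needs to work through a backlog if nothing new arrives."""
    return backlog / rate


if __name__ == "__main__":
    # Hypothetical burst: 100 alarms a minute for 3 minutes, then 10 a minute.
    arrivals = [100, 100, 100, 10, 10]
    history = simulate_backlog(arrivals)
    print("Backlog after each minute:", history)
    print(f"Minutes of printing still owed: {minutes_to_clear(history[-1]):.1f}")
```

With these made-up arrivals the printer is still almost 18 minutes behind after the burst has died down, so an alarm raised in minute five would not reach paper until long after an operator needed it.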
During routine plant trips the alarm printer, as configured by the designers, could actually take an hour to fully print off all of the alarms that had occurred. The plant operators knew about this from their experiences in Reactor 1, and regularly ignored the alarm printout and instead relied solely on the on-demand system status printouts and alarm annunciator lights. High Water Level in the containment sump was one such alarm that only appeared on the printout and not on its own annunciator. Had the operators received this alarm in a timely and clear fashion, they would have realized much earlier that a large amount of water was escaping containment, and it's likely that the block valve would have been closed much, much sooner, preventing such a big loss of primary coolant and most likely preventing the meltdown entirely. The unit had 1,200 alarms configured. A few hundred went off in the first minutes of the incident alone. After the incident some operators went on record stating alarms were "...not very helpful..." and they simply "...got in the way." They went on to say they had concluded prior to the incident that "...the alarms would provide little, if any immediate assistance..." when trying to diagnose and prioritize actions during an event. Poor instrument selection. The reactor coolant drain tank indicators weren't directly visible to the plant operators from the main console in the main control room. Worse than that, there were no strip chart recorders (this was in the days before graphical trend displays on a computer screen) for the reactor coolant drain tank conditions, which included pressure, temperature and water level. There were no instruments that directly measured the water level in the reactor vessel.
The level was intended to be surmised from the water level in the pressuriser, which during the incident could not have been expected to give an accurate reading due to the plant conditions at the time, and it didn't. Instrumentation selection and ranging for temperature and pressure limits were designed primarily around the normal operational envelope, rather than extreme operating conditions like those experienced during the incident. As a result of this choice most of the instrumentation was flat-lined at either maximum or minimum values and operators essentially had no useful information from which to attempt to diagnose or resolve the situation. The pilot valve. The pilot-operated relief valve was found to have previously failed on 11 occasions in the life of that specific reactor. In 9 of those failures the valve had failed in the open position. Every failed-open event had resulted in a coolant leak within the containment vessel. The exact failure chain of events had in fact occurred 1-1/2yrs before the incident at Three Mile Island at another Babcock & Wilcox reactor of exactly the same design, the Davis-Besse Nuclear Power Station. In that instance however operators identified the failed-open condition within 20min, compared to 80min for Three Mile Island. Davis-Besse was only operating at 9% power at the time its valve failed open, unlike the near full power (97%) that Unit 2 at Three Mile Island was producing at the time of its incident. Babcock & Wilcox did not clearly communicate this risk to all of the customers that utilized their reactor designs, and did not communicate it to Three Mile Island prior to the incident. In addition the valve itself did not have an independent and direct indicator of its position, either open or closed. Its position instead was inferred based on its commanded output: that is to say, the control system commanded the valve to open and it displayed that the valve was open on the control system.
That is technically control and indication by inference, rather than control by feedback or control by fact. In programming control systems for decades I've learned it is always better to program based on fact, not presumption. Relying on timers waiting for events that might or might not happen, or assuming valves open or pumps start without independent evidence verifying that they have, is potentially dangerous. Modern safety systems require direct indication of safety equipment position, and loss of that indication when the plant is in use leads to alarm conditions and in some extreme cases will even trip the plant. In my experience lack of equipment feedback is predominantly driven by cost. Whether it's an I/O count reduction with less test burden, a simplification of the design or, more commonly, just the cost of the limit switches themselves on a valve being considered too exorbitant and unnecessary. It's not clear what drove Babcock & Wilcox's decision to not provide position feedback in this instance. In the aftermath of Three Mile Island the exact radiation dosages that individuals present at the Three Mile Island facility received during the incident are unknown, since only 2 of the 7 radiation monitors in the plant were actually functioning. In addition, many of the personal dosage meters handed out weren't correctly recorded during the lead-up to the incident, and they weren't regularly changed out. Hence their state when people carried them in wasn't known, and so the relative reading when they came out wasn't known either. Whilst the maximum dose officially was estimated at 1 milliSievert, a 100 milliSievert dose increases the probability of radiation-induced cancer by 0.8%. A 1 to 2 Sievert dose will increase the probability of a fatality due to radiation dosage to up to 5%. An 8 to 30 Sievert dose will increase the probability of fatality to an essential certainty.
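To make the earlier distinction between indication by inference and indication by fact concrete, here is a minimal Python sketch. It assumes a valve fitted with separate open and closed limit switches, which is an illustrative arrangement and not a description of the Three Mile Island hardware; all names are hypothetical, and valve travel time is ignored for simplicity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ValveState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    UNKNOWN = "unknown"


@dataclass
class ValveSignals:
    commanded_open: bool                  # what the control system asked for
    open_limit_switch: Optional[bool]     # None means the signal is missing
    closed_limit_switch: Optional[bool]   # None means the signal is missing


def indicated_by_inference(signals: ValveSignals) -> ValveState:
    """Presume the valve did whatever it was last commanded to do."""
    return ValveState.OPEN if signals.commanded_open else ValveState.CLOSED


def indicated_by_feedback(signals: ValveSignals) -> ValveState:
    """Report position only from independent limit switch evidence."""
    if signals.open_limit_switch is None or signals.closed_limit_switch is None:
        return ValveState.UNKNOWN
    if signals.open_limit_switch and not signals.closed_limit_switch:
        return ValveState.OPEN
    if signals.closed_limit_switch and not signals.open_limit_switch:
        return ValveState.CLOSED
    # Both or neither switch made: the valve is mid-travel or a switch is faulty.
    return ValveState.UNKNOWN


def discrepancy_alarm(signals: ValveSignals) -> bool:
    """Alarm whenever the evidence does not confirm the commanded position."""
    return indicated_by_feedback(signals) != indicated_by_inference(signals)


if __name__ == "__main__":
    # A relief valve commanded closed that has physically stuck open.
    stuck_open = ValveSignals(commanded_open=False,
                              open_limit_switch=True,
                              closed_limit_switch=False)
    print("Inference says:", indicated_by_inference(stuck_open).value)
    print("Feedback says: ", indicated_by_feedback(stuck_open).value)
    print("Alarm raised:  ", discrepancy_alarm(stuck_open))
```

In the stuck-open case the inferred indication reports the valve as closed while the feedback indication reports it as open, and the mismatch itself becomes the alarm. A lost switch signal is treated as unknown and also raises the alarm, rather than quietly falling back to the commanded state.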
The fallout from the incident has not shown a significant increase in the number of cancers or infant mortality rate in the area surrounding Three Mile Island. One of the interesting coincidences surrounding the Three Mile Island incident was that the movie "The China Syndrome," which was about a nuclear meltdown, had opened in the local movie theater in Harrisburg on the day of the incident. 2,000 gallons of contaminated water was released into the Susquehanna River as a result of the incident. Radioactive rat droppings were also found scattered throughout the building following the events. General Public Utilities said this wasn't an issue because: "...none of the rats had left the island." Showing some indifference to the fact there was no way to know that for sure. Following the incident legal interventions were undertaken against Met-Ed and GPU across multiple areas including management competency. The Atomic Safety and Licensing Board stated the interventions had wasted time and money. That said, 1 week after the Atomic Safety and Licensing Board issued General Public Utilities with a clean bill of health for management competency, 2 operators were caught cheating on their licensing tests and 4 operators in fact failed the tests entirely. The hearings were reopened to determine more stringent tests and all operators re-sat these more rigorous tests, and still half of them failed. Evacuation plans were required to be drawn up in full detail, and far more thoroughly reviewed including correcting oversights such as putting the nearby city...halfway across its bridge. New safety and training measures were introduced following the incident for nuclear reactors throughout the United States. The clean-up following Three Mile Island Unit 2 took just under 12yrs to complete at a cost of approximately $973M USD. On the 22nd of October, 2009 the US Nuclear Regulatory Commission renewed the operating license for Three Mile Island Unit 1 until the 19th of April, 2034.
Unit 2 however remains mostly disassembled, with its generator moved in 2 parts, refurbished and reused at the Shearon Harris Nuclear Plant in New Hill, North Carolina. So what do we learn from this? No matter what process plant you're designing, it's critically important to think about what to show an operator under normal operating conditions naturally, but more importantly, what to show them under abnormal operating conditions, and that includes critical system events. Having an up-to-date training simulator with regular training and refresher training sessions for all operators is crucial to ensure operators know the right way to respond when critical events occur. Critical events generally and hopefully don't happen very often, so people need regular re-visits of how to handle them correctly or we, as humans under pressure, will forget and make mistakes. Beyond that, controlling by fact and not by inference is crucial. And finally having an alarm system is one thing, but filling it with nuisance alarms, incorrectly prioritising those alarms so they all appear to be equally important, not conditionally muting alarms...and having incorrectly set up consequential alarms: all of these things contribute to rendering the alarm system completely useless. Just like at Three Mile Island. They were operating a nuclear reactor that could kill tens of thousands of people if it went wrong, with confusion, misunderstanding and trying to control a plant whose design had essentially left them blind. It's a miracle we got off that lightly. If you're enjoying Causality and want to support the show you can, like some of our backers: Eivind, Daniel Dudley and Chris Stone. They and many others are Patrons of the show via Patreon and you can find it at https://patreon.com/johnchidgey so if you'd like to contribute something, anything at all, it's all much appreciated.
Causality is part of The Engineered Network and you can find it at https://engineered.network/ and you can follow me on Mastodon @chidgey@engineered.space or our shows on Twitter at @Engineered_Net. This was Causality. I'm John Chidgey. Thanks so much for listening.
Duration 32 minutes and 31 seconds

Show Notes

Episode Gold Producer: 'r'.
Episode Silver Producers: Carsten Hansen, Eivind Hjertnes and Daniel Dudley.
Premium supporters have access to ad-free, early released episodes with a full back-catalogue of previous episodes.

People


John Chidgey

John is an Electrical, Instrumentation and Control Systems Engineer, software developer, podcaster, vocal actor and runs TechDistortion and the Engineered Network. John is a Chartered Professional Engineer in both Electrical Engineering and Information, Telecommunications and Electronics Engineering (ITEE) and a semi-regular conference speaker.

John has produced and appeared on many podcasts including Pragmatic and Causality and is available for hire for Vocal Acting or advertising. He has experience and interest in HMI Design, Alarm Management, Cyber-security and Root Cause Analysis.

You can find him on the Fediverse and on Twitter.