My trip to Kurdistan reminded me how critical a reliable source of electricity is for ensuring peace and prosperity. The rich, as one reader pointed out, can afford to run personal generators, but the poor have no such luxury and are lucky to get a few hours of power per day. Without a reliable source of energy, small businesses (the foundation of most booming economies) have a difficult time surviving. Too often people think oil = energy, including electrical energy, but that is not true. Oil=transportation. Coal, natural gas, wind, and nuclear energy=electrical power. Because of environmental concerns, people (even environmentalists like Stewart Brand) are reconsidering the benefits of generating more electricity using nuclear power (see my earlier post on Brand’s position — Natural or Manmade Environmentalism?).
Other environmentalists, of course, are worried that more nuclear power plants will bring with them a whole new range of challenges, including how to dispose of nuclear waste. Since starting Enterra Solutions a few years ago, I have used a nuclear power example in some of our literature to demonstrate that compliance, security, and performance challenges must now be addressed holistically in order to make them more resilient. In the Brand post mentioned above, I noted that Enterra Solutions has been working with the nuclear industry to figure out how rule set automation can make constructing and operating nuclear plants safer and more efficient. Blog reader Robert Johnson sent me (via Tom Barnett) a real-world example that eerily parallels the illustrative scenario we have been using. The Nuclear Regulatory Commission (NRC) report of the incident reads like a movie script:
“On August 19, 2006, operators at Browns Ferry, Unit 3, manually scrammed the unit following a loss of both the 3A and 3B reactor recirculation pumps. Plant procedures following the loss of recirculation flow required the manual scram. Immediate loss of the recirculation flow placed the plant in a high power, low flow condition where core thermal hydraulic stability problems may exist at boiling-water reactors (BWRs). Generally, intentional operation in this condition, high power and low flow, is not permitted.”
It reminds me a bit of the Holiday Inn Express commercial about problems in a power plant control room. The NRC report goes on to note that this emergency was created by a number of simultaneous failures.
“The initial investigation into the dual pump trip found that the recirculation pump variable frequency drive (VFD) controllers were nonresponsive. The operators cycled the control power off and on, reset the controllers, and restarted the VFDs. The licensee also determined that the Unit 3 condensate demineralizer controller had failed simultaneously with the Unit 3 VFD controllers. The condensate demineralizer primary controller is a dual redundant programmable logic control (PLC) system connected to the ethernet-based plant integrated computer system (ICS) network. The VFD controllers are also connected to this same plant ICS network. Both the VFD and condensate demineralizer controllers are microprocessor-based utilizing proprietary software.”
In other words, the plant experienced a computer problem rather than a physical plant problem. But what caused it?
“The licensee determined that the root cause of the event was the malfunction of the VFD controller because of excessive traffic on the plant ICS network. Testing by site personnel performed on the VFD controllers confirmed that the VFD control system is susceptible to failures induced by excessive network traffic.”
The plot thickens. Why was there excessive traffic on the ICS network? The report doesn’t say; it simply provides a primer on why excessive traffic can cause a problem.
“Ethernet is one technology used for local area networking (LAN) of many different types of digital devices such as computers, process controls, modems and PLCs. This allows many of these devices to transfer data over a common communications cable, typically coaxial cable, or special grades of twisted pair wire. It is the most widely used LAN technology today. A data packet is a basic unit of data in a networked environment. In basic networks, data packets are broadcast, meaning sent to each network device, rather than to one specific device. To function properly, a device must be able to effectively handle the broadcast data packets it receives. A key point is that all network devices must allocate time and resources to read and interpret each broadcasted data packet, even if the packet is not intended for that particular device. Excessive data packet traffic on the network may cause connected devices to have a delayed response to new commands or even to lockup, thereby, disrupting normal network operations. This excessive network traffic is sometimes called a broadcast (or data) storm. A firewall is a mechanism used to control and monitor data traffic to and from a network, or device, for the purpose of protecting devices on a network. In effect, it is a filter that blocks unwanted network traffic and limits the amount and type of communication flow. A firewall can act as an intrusion detection system by identifying data packets that are denied access, recognizing data packets specifically designed to cause problems, or reporting unusual (including excessive) traffic patterns, and many other security-based features. The reason the licensee at Browns Ferry investigated whether the failure of one device, the condensate demineralizer PLC, may have been a factor in causing the malfunction of the VFD controllers is that there is documentation of such failures in commercial process control. 
For instance, a memory malfunction of one device has been shown to cause a data storm by continually transmitting data that disrupts normal network operations resulting in other network devices becoming ‘locked up’ or nonresponsive. A network found to be operating outside of normal performance parameters with a device malfunctioning can effect devices on that network, the network as a whole, or interfacing components and systems. The effects could range from a slightly degraded performance to complete failure of the component or system. Major contributors to these network failures can be the addition of devices that are not compatible, network expansion without a procedure and a overall network plan in place, or the failure to maintain the operating environment for legacy devices already on the network.”
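To make the primer's idea of "excessive traffic" concrete, here is a minimal sketch of how a monitor might flag a possible broadcast storm by counting packets in a sliding time window. The class name, the threshold, and the window length are all hypothetical; nothing here reflects how the Browns Ferry network or any real plant system is actually monitored.

```python
from collections import deque


class StormDetector:
    """Hypothetical monitor: flags a possible broadcast storm when more than
    max_packets broadcast packets arrive within window_seconds."""

    def __init__(self, max_packets, window_seconds):
        self.max_packets = max_packets        # assumed threshold, not a real plant value
        self.window_seconds = window_seconds  # assumed window length
        self.timestamps = deque()

    def observe(self, timestamp):
        """Record one broadcast packet seen at `timestamp` (in seconds).

        Returns True if the packet count inside the window exceeds the threshold.
        """
        self.timestamps.append(timestamp)
        # Discard packets that have aged out of the sliding window.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_packets


# Usage: allow at most 3 broadcast packets per second.
detector = StormDetector(max_packets=3, window_seconds=1.0)
readings = [detector.observe(t) for t in (0.0, 0.1, 0.2, 0.3)]
```

In this usage the first three packets stay under the limit and the fourth trips the alarm. A detector like this only reports that traffic is excessive; it says nothing about why, which is exactly the gap the report leaves open.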
So was the network problem caused by an internal failure or an outside attack? That’s a question that Congress’ Committee on Homeland Security and its Subcommittee on Emerging Threats, Cybersecurity, and Science and Technology want answered. In a letter to the NRC, the chairmen of the committee and the subcommittee expressed “deep reservations about the NRC’s hesitation to conduct a special investigation into this incident.” They wrote:
“Unless and until the cause of the excessive network load can be explained, there is no way for either the licensee (power company) or the NRC to know that this was not an external distributed denial-of-service attack.”
Robert Lemos, writing in SecurityFocus, noted that this is not the first “cyber challenge” that has affected the nuclear industry [“‘Data Storm’ blamed for nuclear-plant shutdown,” 18 May 2007].
“In January 2003, the Slammer worm disrupted systems of Ohio’s Davis-Besse nuclear power plant, but did not pose a safety risk because the plant had been offline since the prior year. However, the incident did prompt a notice from the NRC warning all power plant operators to take such risks into account. In August 2003, nearly 50 million homes in the northeastern U.S. and neighboring Canadian provinces suffered from a loss of power after early warning systems failed to work properly, allowing a local outage to cascade across several power grids.”
Congress is justified in its concerns. I believe, however, that automated rule sets can be used to detect attacks; alert appropriate managers; alert appropriate local, state, and federal agencies; identify affected or at-risk systems and shift them to alternative control mechanisms; trace the attack through communications systems; and perhaps, in the case of a cyber attack, even spoof the attack back to the attacker in order to keep him in communication and make him traceable by law enforcement. Operating rules govern this series of steps, which is why they can be automated and invoked consistently every time the core asset comes under attack. Nuclear power will have to play a more significant part in America’s (and the globe’s) future. We can (and must) make it as safe and reliable as possible.
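For readers who like to see ideas in code, the response sequence described above can be sketched as an ordered rule set, where each rule pairs a condition with an action and the whole list runs the same way every time. This is only an illustration under my own assumptions: the `Incident` fields, the rule names, and the logged messages are hypothetical and do not describe Enterra Solutions' products or any plant's actual procedures.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical record of an event affecting a core asset."""
    asset: str
    suspected_attack: bool
    log: list = field(default_factory=list)


def always(incident):
    """Condition that applies to every incident."""
    return True


def is_suspected_attack(incident):
    """Condition that applies only when an attack is suspected."""
    return incident.suspected_attack


def step(message):
    """Build an action that records one response step in the incident log."""
    def action(incident):
        incident.log.append(f"{message}: {incident.asset}")
    return action


# Ordered (condition, action) pairs; the order mirrors the steps in the text.
RESPONSE_RULES = [
    (always, step("alert managers")),
    (always, step("alert local, state, and federal agencies")),
    (always, step("shift to alternative control")),
    (is_suspected_attack, step("trace attack through communications systems")),
]


def respond(incident):
    """Apply every rule whose condition holds, in order, and return the log."""
    for condition, action in RESPONSE_RULES:
        if condition(incident):
            action(incident)
    return incident.log


# Usage: an internal failure triggers three steps; a suspected attack adds tracing.
failure_log = respond(Incident("VFD controller", suspected_attack=False))
attack_log = respond(Incident("VFD controller", suspected_attack=True))
```

Because the rules live in a plain list rather than in someone's head, the same steps fire in the same order on every invocation, which is the consistency argument made above.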