Thursday, July 20, 2017

Chemical Plants and Ransomware

There has been an interesting and ongoing discussion on TWITTER® about how chemical plants may be affected by ransomware like WannaCry. It was prompted by the publication of two DHS-OCIA FOUO documents about WannaCry (here and here). They were published by PublicIntelligence.

The ongoing TWITTER discussion was really based upon one entry in a chart in the second document described above: (U) Table 1—Ransomware Targeting and Susceptibility by Sector. The entry for the Chemical Sector contained the statement: “Chemical plants have manual overrides in place to ensure the safe containment of chemical processes in case cyber defenses fail. In some cases, it may be possible to run the chemical plant independently of cyber controls, otherwise the plant will most likely shut down.”

Most of the discussion has been about where the supporting data for that statement comes from (short answer: no one knows) and how accurate the statement is. I cannot provide any information on the first question, but a reasonable answer to the second will take more than 140 characters to explain.

Chemical Plant Automation


There is a great deal of variety in the level and sophistication of automation in chemical manufacturing processes. I have worked in a plant where there was absolutely no automation. Sensors were either analog or digital with no connections beyond a power supply. All operations were directly controlled by the operator manually operating various valves and power switches. Plants like this are unusual in this day and age. They are small plants typically running experimental processes on a shoestring budget. They are going to be essentially unaffected by ransomware except on the business process side of the house.

The most sophisticated facilities (and I have seen some of these, but never worked in one) have almost completely automated their chemical manufacturing processes. The extensive and complicated control system requires limited operator oversight. It takes a wide mix of sensor data (temperature, pressure, flow rates and valve states, for example) and processes that data, via a complex process control algorithm, to develop commands to various operating devices (transfer valves, heating, cooling and vacuum controls, for example) that control the manufacturing process. Operator actions are largely limited to starting or stopping the process, making small manual additions of chemicals to the process and watching for process upset conditions.

Most specialty chemical manufacturing (batch processes) has a level of automation somewhere between these two extremes. An operator typically watches sensor data on a human machine interface (HMI) display and operates controls via the same HMI, guided by a written set of instructions, training and experience. There may be some manual valve movements made by the operator or his assistants, but most valves are remotely operated via electrical or pneumatic actuation.

Safety systems are in use (hopefully) in all plants regardless of the level of automation. They may be simple mechanical devices such as pressure relief valves or rupture disks. They could be process alarms that require operators to take manual corrective actions. They could be simple interlocks where a specific sensor output generates a direct command to operate a specific valve. Or they could be complex algorithmic responses to a variety of sensor readings resulting in a number of automatic operational changes to the process. These automated safety systems can reside in a stand-alone computer system with dedicated sensors and valves that are not in any way connected to the main process control system (the safest arrangement), or various parts of (or all of) the safety system could reside on the same computer system that runs the chemical manufacturing process.
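To make the "simple interlock" idea concrete, here is a minimal sketch in Python. It is only an illustration: the class name, the 500 kPa trip point and the command strings are all hypothetical, and a real interlock would be hard-wired or implemented in a safety-rated controller, not in a script.

```python
from dataclasses import dataclass


@dataclass
class Interlock:
    """Hypothetical high-pressure interlock: one sensor drives one valve."""

    trip_pressure_kpa: float  # assumed setpoint, for illustration only
    tripped: bool = False

    def evaluate(self, pressure_kpa: float) -> str:
        # Latch the trip: once the setpoint is reached, the interlock stays
        # tripped until someone deliberately resets it.
        if pressure_kpa >= self.trip_pressure_kpa:
            self.tripped = True
        return "CLOSE_FEED_VALVE" if self.tripped else "NO_ACTION"

    def reset(self) -> None:
        # Manual reset after the upset condition has been investigated.
        self.tripped = False


interlock = Interlock(trip_pressure_kpa=500.0)
# Three successive pressure readings; the second one exceeds the setpoint.
commands = [interlock.evaluate(p) for p in (320.0, 510.0, 400.0)]
print(commands)
```

The latching behavior is a design choice in this sketch: the valve command does not clear just because the pressure falls back below the trip point.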

In a perfect world, what determines the level of sophistication (and thus cost) of the safety system is the potential outcome of the process upset that it controls. The more serious the potential consequence of the process upset (again, in a perfect world), the more complex and involved the safety system becomes. Where there are potentially catastrophic, off-site consequences, one would like to expect to see sophisticated stand-alone safety systems to prevent those catastrophic results.

Ransomware Effects


For purposes of this discussion I am going to assume that the ransomware has affected all networked control system computers and that any stand-alone safety systems remain operational. These would include sophisticated stand-alone systems, mechanical devices and most electro-mechanical interlocks (those not controlled through a PLC).

For the least automated systems the effects would be mainly cosmetic; operators would still be controlling the process, but it would be more physical control, with the operator going out and manually operating controls instead of using the HMI. This assumes that there are still sensor readouts that do not go through the HMI. That would require either analog gauges or 4-20 mA transmitters wired to old-style displays.
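For readers not familiar with 4-20 mA loops, the idea is that the loop current maps linearly onto the instrument's range: 4 mA is the bottom of the range and 20 mA is the top. A minimal sketch of that scaling follows; the function name and the 0 to 200 range in the example are hypothetical, and rejecting anything outside 4-20 mA is a simplification I am assuming here.

```python
def loop_current_to_value(current_ma: float, low: float, high: float) -> float:
    """Linearly map a 4-20 mA loop current onto an engineering range.

    `low` is the reading at 4 mA and `high` is the reading at 20 mA.
    """
    # Simplifying assumption: treat any current outside 4-20 mA as an error
    # (for example, a broken wire reads close to 0 mA).
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError(f"loop current {current_ma} mA is outside 4-20 mA")
    return low + (current_ma - 4.0) / 16.0 * (high - low)


# Hypothetical gauge spanning 0 to 200 units; 12 mA is mid-scale.
reading = loop_current_to_value(12.0, 0.0, 200.0)
print(reading)
```

An old-style panel display does this same scaling in hardware, which is why it keeps working when the HMI computers do not.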

Double displays with their associated wiring are a pain to maintain and frequently are considered a wasteful duplication of resources. The absence of analog gauges or non-computer sensor-output displays would mean that the operator would have no view of the key process control variables, and thus, no control of the process.

The consequences of going to full operator manual control of processes would be immense. I made the transition from full manual to semi-automated process control. We were able to add more sensors to better understand the process variables, and those new sensors were in locations that were not readily accessible by the operator. Just those additional sensors decreased process times (and thus process costs) significantly, as well as reducing product variability and off-spec products. We also significantly reduced the number of operators needed to run the multiple processes that specialty chemical plants typically operate. Some plants would be able to operate at significantly reduced capacity, but increased product variability could have downstream quality effects on customer operations.

For fully automated chemical facilities (typically found in continuous process facilities like refineries) an instantaneous change to manual operation would not be possible. The lack of analog gauges and local sensor readouts and the relatively inaccessible manual controls would make it physically impossible for operators to coordinate the operation of the connected portions of the process in real time.

Safety Effects


Again, properly designed and implemented safety systems would be expected to stop any catastrophic consequences of sudden loss of control in chemical manufacturing systems. There were a number of very important qualifiers in that previous sentence. The major problem with designing safety systems is that it is very difficult to completely understand catastrophic failure modes in a manufacturing environment.

Typically, one has to use lab-scale data to understand the physical parameters of those failure modes (NO ONE wants to do FULL SCALE testing of such failure modes) and then apply various models to try to scale up those test results in order to plan preventive actions that stop or mitigate the failures. No matter how sophisticated the modeling efforts, they are, in the end, based upon educated guesses as to how the system will behave. Systems are then designed to try to best control those failure modes. And it is not generally acceptable to really test those systems to see how they actually work in practice (in the emergency environment).

The OCIA Statement


The OCIA statement that started this discussion is almost certainly not based upon any survey of the chemical industry. It is a reasonable, brief attempt by outsiders with a non-chemical manufacturing background to categorize the potential consequences of a non-chemical emergency event on generic chemical manufacturing.

If I were to attempt to reword this statement from a chemical manufacturing process point of view, it would read something like this:

“Chemical manufacturing facilities should have safety systems in place to contain catastrophic consequences in the event of loss of control. The efficacy of those systems and their operation in an instantaneous loss of computer control situation would have to be evaluated on a case-by-case basis. Continued commercial production without replacing/fixing affected computer-based process controls could be possible in some unknown number of facilities. It would be difficult to accurately predict which facilities could continue commercially viable production.”

