There has been an interesting and on-going discussion on
TWITTER® related to how chemical plants may be affected by ransomware like
WannaCry. It was the result of the publication of two DHS-OCIA FOUO documents
about WannaCry (here and here).
They were published by PublicIntelligence.
The on-going TWITTER discussion was really based upon one
entry in a chart in the second document described above; (U) Table 1—Ransomware
Targeting and Susceptibility by Sector. The entry for the Chemical Sector
contained the statement: “Chemical plants have manual overrides in place to ensure
the safe containment of chemical processes in case cyber defenses fail. In some
cases, it may be possible to run the chemical plant independently of cyber
controls, otherwise the plant will most likely shut down.”
Most of the discussion has been on where the supporting data
for that statement comes from (short answer: no one knows) and how accurate that
statement is. I cannot provide any information on the first, but a reasonable
answer to the second will take more than 140 characters to explain.
Chemical Plant Automation
There is a great deal of variety in the level and
sophistication of automation in chemical manufacturing processes. I have worked
in a plant where there was absolutely no automation. Sensors were either analog
or digital with no connections beyond a power supply. All operations were
directly controlled by the operator manually operating various valves and power
switches. Plants like this are unusual in this day and age. They are small
plants typically running experimental processes on a shoestring budget. They
are going to essentially be unaffected by ransomware except on the business
process side of the house.
The most sophisticated facilities (and I have seen some of
these, but never worked in one) have almost completely automated their chemical
manufacturing processes. The extensive and complicated control system requires
limited operator oversight. It takes a wide mix of sensor data (temperature,
pressure, flow rates and valve states, for example) and processes that data
(via a complex process control algorithm) to develop commands to various
operations devices (transfer valves, heating, cooling and vacuum controls, for
example) that control the manufacturing process. The operator actions are fairly
limited to starting or stopping the process, making small manual adds of
chemicals to the process and watching for process upset conditions.
Most specialty chemical manufacturing (batch processes) has
a level of automation somewhere between these two extremes. An operator
typically watches sensor data on a human machine interface (HMI) display and
operates controls via the same HMI in response to a written set of
instructions, training and experience. There may be some manual valve movements
made by the operator or his assistants, but most are remotely operated via
electrical or pneumatic operations.
Safety systems are in use (hopefully) in all plants
regardless of the level of automation. They may be simple mechanical devices
such as pressure relief valves or rupture disks. They could be process alarms
that require operators to take manual corrective actions. They could be simple
interlocks where a specific sensor output generates a direct command to operate
a specific valve. Or they could be complex algorithmic responses to a variety
of sensor readings resulting in a number of automatic operational changes to
the process. These automated safety systems can reside in a stand-alone
computer system with dedicated sensors and valves that are not in any way connected
to the main process control system (the safest system) or various parts (or all
of) the safety system could reside on the same computer system running the
chemical manufacturing process.
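
To make the difference between a process alarm and a simple interlock concrete, here is a minimal sketch in Python. It is not taken from any real plant or vendor system; the names (`SafetyAction`, `evaluate_pressure`) and the two pressure limits are hypothetical values picked only for illustration.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
HIGH_PRESSURE_ALARM_KPA = 400.0  # above this, the operator is alerted
HIGH_PRESSURE_TRIP_KPA = 500.0   # above this, the interlock acts on its own


@dataclass
class SafetyAction:
    alarm: bool             # operator must take manual corrective action
    close_feed_valve: bool  # direct interlock command, no operator involved


def evaluate_pressure(pressure_kpa: float) -> SafetyAction:
    """Map one pressure reading to an alarm and/or an interlock command."""
    alarm = pressure_kpa >= HIGH_PRESSURE_ALARM_KPA
    trip = pressure_kpa >= HIGH_PRESSURE_TRIP_KPA
    return SafetyAction(alarm=alarm, close_feed_valve=trip)
```

In this sketch a reading between the two limits only raises the alarm, while a reading at or above the trip limit also commands the valve closed. Where logic like this runs (a dedicated stand-alone system or the main process control computer) is exactly the design question described above.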
In a perfect world, what determines the level of
sophistication (and thus cost) of the safety system is the potential outcome of
the process upset that it controls. The more serious the potential consequence
of the process upset (again in a perfect world) the more complex and involved
the safety system becomes. Where there are potentially catastrophic, off-site
consequences, one would hope to see sophisticated stand-alone safety
systems to prevent those catastrophic results.
Ransomware Effects
For purposes of this discussion I am going to assume that
the ransomware has affected all networked control system computers and that any
stand-alone safety systems remain operational. These would include
sophisticated systems, mechanical devices and most electro-mechanical interlocks
(those not controlled through a PLC).
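
That assumption can be written down as a simple filter. The sketch below is only an illustration of the assumption, not a model of any real facility; the `SafetyLayer` type, the example layer names and the flag `depends_on_networked_computer` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SafetyLayer:
    name: str
    # Assumption for this discussion: anything that relies on a networked
    # control system computer is lost when the ransomware hits.
    depends_on_networked_computer: bool


def surviving_layers(layers: List[SafetyLayer]) -> List[str]:
    """Return the names of the layers that stay operational under the assumption."""
    return [layer.name for layer in layers if not layer.depends_on_networked_computer]


# Hypothetical example plant.
example_plant = [
    SafetyLayer("pressure relief valve", False),
    SafetyLayer("hard-wired electro-mechanical interlock", False),
    SafetyLayer("stand-alone safety system", False),
    SafetyLayer("interlock logic in the process control computer", True),
]
```

Calling `surviving_layers(example_plant)` drops only the last entry, which is the whole point of the assumption: what survives is whatever never depended on the affected computers.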
For the least automated systems the effects would be mainly
cosmetic. Operators would still be controlling the process, but it would be more
physical control, with the operator going out and manually operating controls
instead of using the HMI. This is assuming that there are still sensor readouts
that do not go through the HMI. This would require either analog gauges or
4-20 mA gauges wired to old-style displays.
Double displays with their associated wiring are a pain to
maintain and frequently are considered a wasteful duplication of resources. The
absence of analog gauges or non-computer sensor-output displays would mean that
the operator would have no view of the key process control variables, and thus,
no control of the process.
The consequences of going to full operator manual control of
processes would be immense. I made the transition from full manual to
semi-automated process control. We were able to add more sensors to better
understand the process variables and those new sensors were in locations that
were not readily accessible by the operator. Just those additional sensors
decreased process times (and thus process costs) significantly as well as
reducing product variability and off-spec products. We also significantly
reduced the number of operators that were necessary to operate multiple
processes that typically run at specialty chemical plants. Some plants would be
able to operate at significantly reduced capacity, but increased product variability
could have downstream quality effects on customer operations.
For fully automated chemical facilities (typically found in
continuous process facilities like refineries) an instantaneous change to
manual operation would not be possible. The lack of analog gauges and local
sensor readouts and the relatively inaccessible manual controls would make it
physically impossible for operators to coordinate the operation of the
connected portions of the process in real time.
Safety Effects
Again, properly designed and implemented safety systems
would be expected to stop any catastrophic consequences of sudden loss of
control in chemical manufacturing systems. There were a number of very
important qualifiers in that previous sentence. The major problem with
designing safety systems is that it is very difficult to completely understand
catastrophic failure modes in a manufacturing environment.
Typically, one has to use lab scale data to understand the
physical parameters of those failure modes (NO ONE wants to do FULL SCALE
testing of such failure modes) and then apply various models to try to scale up
those test results to be able to plan for preventive actions to stop or
mitigate the failures. No matter how sophisticated the modeling efforts, they
are, in the end, based upon educated guesses as to how the system will behave.
Then systems are designed to try to best control those failure modes. And it
is not generally acceptable to really test those systems to see how they
actually work in practice (in the emergency environment).
The OCIA Statement
The OCIA statement that started this discussion is almost
certainly not based upon any survey of the chemical industry. It is a
reasonable, brief attempt by outsiders with a non-chemical manufacturing
background to categorize the potential consequences of a non-chemical emergency
event on generic chemical manufacturing.
If I were to attempt to reword this statement from a
chemical manufacturing process point of view, it would read something like
this:
“Chemical manufacturing facilities
should have safety systems in place to contain catastrophic consequences in the
event of loss of control. The efficacy of those systems and their operation in
an instantaneous loss of computer control situation would have to be evaluated
on a case-by-case basis. Continued commercial production without
replacing/fixing affected computer-based process controls could be possible in
some unknown number of facilities. It would be difficult to accurately predict
which facilities could continue commercially viable production.”