Thursday, July 14, 2011

ICS Attacks as Process Problem

There is an interesting post over at Digital Bond's SCADA Security blog about a subject that I have been discussing since Ralph Langner started describing the operation of the Stuxnet malware. Dale looks at an issue recently raised by Michael Toecker: should the search for a root cause of unexplained process problems include a look at possible ICS attacks?

Data Historians and Root Cause Analysis

As a process chemist in a specialty chemical manufacturing facility for many years, I have spent a great deal of time looking at various process upsets to determine the root cause so that the facility could correct those problems before subsequent batches were run. Process upsets normally lead to off-spec material being produced, a very large cost for any manufacturing facility. In many chemical facilities process upsets could lead to catastrophic consequences. So, root cause analysis is very important.

The addition of process historians to control systems made the root cause analysis task of chemical engineers and process chemists much easier. Chemical manufacturing processes are very complex and are influenced by a wide variety of factors: temperature, pressure, heat transfer, the ratio of reactants, and even the rate at which raw materials are added to the process can play critical roles in the modern chemical manufacturing process. Detailed tracking of all of these variables (and more), and identifying which ones are most critical at various places in the process, was not possible before the advent of data historians. The high productivity and quality of products made in modern chemical plants can be directly traced to the detailed use of process historians.

The ability to use data historians to track process variables and conduct root cause analysis for process upsets is closely dependent on the quality of the information being exported to these systems. This is one of the reasons that the maintenance folks in a modern chemical manufacturing facility spend so much time doing testing and calibration of sensors. But, this still assumes that what the sensor detects is accurately reported and recorded in the control system.

Compromised Control Systems

The point that Toecker was trying to make, and that Dale was highlighting, is that since the advent of Stuxnet, process people are going to have to ask whether their system has been stuxed (sounds better than ‘attacked by a Stuxnet-like malware’) if they start to see process equipment failing at unexpected frequencies and in unexpected failure modes. Since this is what happened in the much-publicized Stuxnet attack, one would hope that control systems engineers would consider this possibility when confronted with unusual equipment failures.

DEFINITION: Stux: Verb. To attack an electronic control system in such a way as to remotely change the output of one or more pieces of production equipment while making the equipment appear to be functioning properly by simultaneously spoofing the control system data.

I have been maintaining for almost a year now that this is not the real problem of Stuxnet. The deliberate destruction of process equipment is certainly possible (even the Iranians have admitted as much), but it does require a significant understanding of the particular equipment and its failure modes. This means that developing an attack on any particular facility will require detailed malware tweaking that will be time consuming and will require a relatively high level of expertise. This is certainly possible and will almost as certainly be seen in the near future, but the instances will be relatively few and far between.

A much easier way to attack a modern manufacturing facility will be to randomly stux the system. This would cause random changes in the manufacturing process while hiding those changes from the process control team. Some of the changes would have no significant effect. A larger number would cause process problems that would result in increased production times or off-spec products, both very costly. A small number of situations would result in serious safety problems like chemical releases, over-pressure vessel failures, or fires.

I am much more concerned with this type of attack. Randomly stuxing a manufacturing facility would be much harder to detect in the normal process of root cause analysis. Random problems would have to occur at a very high frequency before even the most suspicious process control engineer would start to question whether the facility had been stuxed. Such outside-the-box thinking would not be found at most facilities because of a standard focus on solving each problem in turn and ignoring a more holistic approach.
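As a rough illustration of what a more holistic look might involve, the sketch below compares the number of process upsets in a recent run of batches against the facility's historical upset rate. It is only a minimal sketch under assumptions of my own: the function names, the batch counts, and the 1% threshold are hypothetical, and treating upsets as independent events with a constant rate (a Poisson model) is a simplification that a real facility would need to check against its own historian data.

```python
import math


def poisson_tail(k, lam):
    """Probability of seeing k or more events when lam are expected (Poisson model)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def upsets_look_unusual(baseline_upsets, baseline_batches,
                        recent_upsets, recent_batches, alpha=0.01):
    """Flag a recent window whose upset count is improbably high.

    All inputs are hypothetical counts pulled from batch records;
    alpha is an assumed significance threshold, not a recommendation.
    """
    historical_rate = baseline_upsets / baseline_batches   # upsets per batch
    expected = historical_rate * recent_batches            # expected in window
    p_value = poisson_tail(recent_upsets, expected)
    return p_value < alpha, p_value


# Hypothetical numbers: 10 upsets in the last 500 batches historically,
# then 6 upsets in the most recent 50 batches.
flagged, p_value = upsets_look_unusual(10, 500, 6, 50)
print(flagged, p_value)
```

A flag from a check like this says nothing about the cause; it only suggests that the individual upsets deserve to be looked at together rather than one at a time.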
