A Prototype Forensic Toolkit

for Industrial-Control-Systems Incident Response

Nicholas B. Carr (a), Neil C. Rowe (b)*

(a) Dept. of Homeland Security, Washington, DC, USA; (b)* Dept. of Computer Science, U.S. Naval Postgraduate School, Monterey, CA 93943 USA

Abstract

Industrial control systems (ICSs) are an important part of critical infrastructure in cyberspace. They are especially vulnerable to cyber-attacks because of their legacy hardware and software and the difficulty of changing it. We first survey the history of intrusions into ICSs, the more serious of which involved a continuing adversary presence on an ICS network. We discuss some common vulnerabilities and the categories of possible attacks, noting the frequent use of software written a long time ago. We propose a framework for designing ICS incident response under the constraints that no new software be required and that interventions not impede the continuous processing that is the norm for such systems. We then discuss a prototype toolkit we built using the Windows Management Instrumentation Command-Line tool for host-based analysis and the Bro intrusion-detection software for network-based analysis. Particularly useful techniques we used were learning the historical range of parameters of numeric quantities so as to recognize anomalies, learning the usual addresses of connections to a node, observing Internet addresses (usually rare), observing anomalous network protocols such as unencrypted data transfers, observing unusual scheduled tasks, and comparing key files through registry entries and hash values to find malicious modifications. We tested our methods on actual data from ICSs including publicly-available data, voluntarily-submitted data, and researcher-provided "advanced persistent threat" data. We found instances of interesting behavior in our experiments. Intrusions were generally easy to see because of the repetitive nature of most processing on ICSs, but operators need to be motivated to look.

 

This paper appeared in the Proc. of the 2015 SPIE Defense+Security Conference, Baltimore, MD, April 2015.

 

Keywords: Industrial control systems, cyberattacks, intrusion detection, testing, incident response, critical infrastructure, WMIC, Bro

 

1.       INTRODUCTION

The reliable operation of modern society's critical infrastructure depends on industrial control systems (ICS), the embedded software systems that allow an operator or device to monitor and control industrial processes. These automation systems are ubiquitous and heterogeneous; they were designed with much implicit trust and are not very compatible with most modern security solutions. In 2010, Stuxnet demonstrated that traditional network protections like segmentation and intrusion detection systems (IDS) alone are insufficient in securing control systems [1]. Insecurity of these devices can lead to severe consequences, and an attacker can use malicious hacking tools against them effectively without much difficulty. In using these tools, an adversary will leave specific forensic artifacts and indicators of ICS malicious activity that can be found if sought.

 

ICS cyber-incident response processes are not well developed, and we lack tools built specifically for identifying adversary presence within the critical systems domain. Few published efforts reveal actionable technical solutions for ICS security practitioners, and none focus on reliably identifying malicious, persistent access within live data from production ICS devices. The primary motivation for this paper is the technical gaps observed and reported [2] during the course of ICS security assessments and cyber incident response.

 

Previous studies have shown the need for forensic collection within the ICS environment. The Department of Homeland Security (DHS) Industrial Control System Cyber Emergency Response Team (ICS-CERT) has concluded that traditional forensic tools are not suitable for ICS networks [3].  However, the ICS-CERT best practice documents only recommend that the capability should be created and do not offer specific tools and techniques to implement incident response. No available forensic toolkits have been designed for ICS networks [4]. Security teams require a repeatable, tailored response methodology employing host and network data collection and analysis techniques to identify malicious pathways and adversary presence on ICS networks.

 

This paper proposes a structured methodology to identify malicious activity by using host-based forensics and network analysis to identify anomalous client-side attack vectors during ICS assessments and incident response. To develop the most robust and reliable methodology, real data from ICS networks was used in this study. Some experimentation was conducted through the course of regular assessments and incident response with critical infrastructure asset-owner approval. Additional experiments were conducted within a closed network that replicates real ICS architectures.

2.       BACKGROUND

2.1    Industrial Control Systems

Components of ICS networks are designed to allow embedded logic to control a process efficiently without constant human intervention, and thus these components have specific roles and constraints to operate as low-level building blocks for industrial automation. Supervisory control and data acquisition (SCADA) systems extend the capabilities of an automated system so that it can be monitored and controlled from a remote location [5]; SCADA is often incorrectly used as a synonym for ICS.

 

In the late 1990s, vendors started to embrace commercial off-the-shelf computer systems and networking hardware. They adapted their legacy proprietary protocols for field devices so that systems could communicate with the Internet protocol suite TCP/IP. Microsoft Windows became the operating system of choice for human-machine interfaces (HMIs), engineering workstations, and several ICS servers. Vendors used modified versions of Windows that, once initially tested for compatibility with vendor equipment, were rarely updated. These were supplemented with networked devices like remote terminal units (RTUs), programmable logic controllers (PLCs), and HMIs that allowed for customized implementations. RTUs are field devices that transmit telemetry data (information collected at remote or inaccessible points) to master systems, and accept commands from those master systems to control the connected objects.  PLCs are also field devices that execute a wide array of programmable functions such as vibration monitoring, catalyst loading, area monitoring, and product loading/unloading. These field devices enabled the system-integrator and critical-infrastructure companies to make their own additions to an existing network infrastructure. HMIs are the devices which present process data to their human operators for control and monitoring. Common HMIs in industry include Wonderware, Siemens WinCC, Rockwell RSView, and Areva's e-terra [6].

 

A distributed networked architecture is necessary for industrial applications that require distributed monitoring points, such as electric-power transmission and distribution, pipeline operations, chemical-production operations, and water-utility operations. Companies have continued to develop networked ICS architectures that allow for streamlined performance tracking, accurate billing, and off-site backup capabilities, despite the growing security concerns that this interconnection has introduced.  At least they do have firewalls. Inside the firewall, an ICS control center network houses the supervisory services that reside on the engineering workstation, the SCADA input/output (I/O) server, and the HMI. These supervisory control systems typically connect using ICS Ethernet protocols to the field devices such as PLCs or RTUs that are located at distributed sites. The field devices receive commands from the supervisory systems that instruct the field devices to subsequently control or acquire data, often over direct I/O, from the physical ICS assets like mechanical valves, circuit breakers, voltage regulators, digital temperature sensors, or other smart devices. The data acquired from the physical ICS assets is transferred from the PLCs and RTUs to a central control center where it is displayed to the operator using an HMI.

 

Traditional networks for business and manufacturing operations are often connected to their ICS networks for several reasons. The ICS network contains valuable billing and financial data, equipment trending, and operational reports. Interconnections to external networks may have been introduced to allow for remote vendor support or, in the case of the inter-control center communications protocol (ICCP), for utilities to share electrical power status for grid stability [7]. Remote access technology allows for access to corporate software from field locations and provides capability to manage devices that are difficult to access. Low-cost, easy-to-install, and easy-to-maintain wireless connectivity has also been added to field devices, allowing possible direct connection of field devices to the Internet.

 

ICS inter-domain networking can be architected using a variety of methods such as explicit direct connections, firewall-controlled connections, demilitarized zone (DMZ) connections, and data diodes. Direct connections can be hard to trace as they are constructed uniquely for the site, usually with standard protocols such as secure shell (SSH), Telnet, file transfer protocol (FTP), virtual private network (VPN), or with dial-up connections. This means that not all the direct connections may be known to the security team. Firewalls and access control lists (ACLs) are often used for this interconnection and only allow certain types of connections, but the software for them usually provides only limited support for ICS protocols. DMZ interconnections are common for ICS, though they are not usually constructed to have access to the Internet like traditional web server DMZs. They enable isolated networks to communicate with the DMZ but not with each other, making the DMZ good for storage servers.  Data diodes enforce unidirectional network flow, making them ideal for the ICS environment, but they do not allow for acknowledgement or response and thus are not compatible with the Internet's standard TCP/IP protocols.

 

Since the development of ICS technology was vendor-driven with a wide variety of competing hardware, software, and capabilities, communication lacks standards. Common communication protocols include Modbus and the distributed network protocol revision 3 (DNP3) as well as proprietary protocols like ANSI X3.28, CDC Types 1 and 2, Conitel 2020/2000/3000, DCP 1, Gedac 7020, IBM 3707, ICCP, IEC 61850, Landis & Gyr 8979, OPC, Redac 70H, Tejas 3 and 5, TRW 9550, and UCA. There are 150 to 200 different ICS protocols [8], with Modbus, DNP3, and Ethernet/IP the most used [9]. Most protocols are primitive, and field devices cannot be queried with most of them to see what protocols they support.  All these factors result in a highly complex forensic and incident-response process. A recent trend toward routable, industry-standard protocols may someday replace these legacy vendor-specific proprietary protocols [10], but the long product lifecycle and high cost for replacing ICS components mean this is a long way off.

 

ICS protocols generally use master-slave communication. The master polls for data, controls slave devices, and maintains a repository of data. The slaves, transmitting either by polling or reporting by exception, respond to master commands. Slaves can have more than one master, and a device can be a master in one environment and a slave in another in a tiered architecture. Master-slave relationships save bandwidth and reduce the poll cycle. Most modern protocols communicate with TCP/IP for subprotocols, which allows for some level of passive network-traffic parsing.

 

2.2    Industrial Control Systems Security

As ICS systems have become increasingly connected to other networks, they have become more susceptible to malicious attacks for several reasons [11]. Field devices are low capability, designed for performance rather than security, and operate on a large number of unique protocols [8]. Thus they have little provision for security. In addition, ICS systems were built to be highly available, and integrity and confidentiality were afterthoughts because they were originally deployed in isolated environments. System age is also a problem: 20-year-old ICS systems are common, compared to 3-to-5-year-old traditional IT systems, and they tend to have well-known vulnerabilities. It is costly to replace insecure legacy equipment, much of which is specialized. Vendor support is limited compared to that for traditional computers, with few support styles and often a single vendor supporting many systems. Forensic analysis is also impeded on most ICS equipment since field devices do not generally store logs and do not run intrusion-detection systems.

 

Protocols are another weakness. To communicate in environments with both legacy and modern equipment, proprietary ICS protocols have been modified to be used in IP-based networks. The result is already weak protocols transformed into cleartext packets wrapped in TCP/IP layers, which has only increased attack opportunities. With prevalent cleartext communications and limited authentication or validation, ICS networks provide ample opportunity for tampering, interception, and injection of data. With persistent access to the ICS supervisory local area network, an adversary could conduct an eavesdropping attack on the SCADA server's communication with PLCs, since this is often in cleartext and could be manipulated. This could enable stealing of secrets, injection of data, or sabotage. Modern solutions like encryption are often not supported within legacy ICS protocols. In fact, the popular Modbus and DNP3 protocols currently do not support authentication, integrity checking, authorization, or encryption. Third-party security solutions often cannot be used since the components often do not have enough computing resources or memory to support additional capabilities [12].

 

Major limitations on security measures are that they must not introduce latency or overhead into the network traffic as that would impede operations, must generally ensure systems are operational 24 hours a day for 365 days a year, and must meet strict timing deadlines [7]. Real-time availability requirements were historically addressed with redundancy but full-time redundant backups are no longer the industry standard. Simple IT functions like rebooting may also be impossible due to system-availability requirements.

 

ICS components tend to be inadequately patched compared to conventional systems because it is harder to do so on relatively isolated systems [10]. This is compounded by the lack of vendor support and the difficulty of upgrading within the highly volatile and sometimes unstable environment in which control systems reside. Months of planning are often required in order to take ICS systems offline and apply patches. Furthermore, even if patches are scheduled and applied properly, they can introduce instability into the ICS domain if not thoroughly tested, and vendors are under competitive pressures to avoid testing.  This means that an accumulating number of vulnerabilities are being recognized in older control systems [13].

 

Our thesis [14] reports a range of security-violation incidents within ICS. The studied intrusions often started within the business network, where the business network was made a reconnaissance point for follow-up intrusions into the ICS networks. By compromising the underlying operating systems of the workstations hosting ICS software, adversaries were able to abuse trust relationships between those compromised systems' software applications and ICS hardware. In most studied incidents, the adversary could maintain a presence on the target ICS and could communicate orders. In rare cases, malicious access was achieved directly to the field devices over wireless or wired connection method to the endpoints. Unauthorized access generally persisted for a significant amount of time due to the difficulties of auditing these systems and concerns of introducing instability into the environment.

 

3.       TOOLKIT DESIGN

Our proposed malicious-activity identification methodology consists of collection, analysis, and decision components for host-based and network-based ICS artifacts. The framework is modeled after modern intrusion-detection techniques employed in traditional networks; however, these solutions are rarely deployed correctly in critical infrastructures [10] and most lack ICS protocol support as well as the signatures and behavioral-anomaly data necessary to identify ICS attacker tactics [4]. The purpose of this technique is to quickly analyze an ICS environment's systems and their communications, attempt to interpret abnormalities with a minimal required baseline, and isolate possibly malicious activity and pathways on critical networks in support of situational awareness, security assessments, and cyber-incident response.

 

3.1   Toolkit Constraints

ICS networks require that specific technical constraints be understood and adhered to for both host-based and network-based tools. Host-based tools are limited to those installed and already available on legacy operating systems. There exists no centralized view of system security to draw data from, and field devices do not always store logs. Any security-monitoring commands executed on these systems should be run at the lowest priority level to not interfere with critical processes. Furthermore, there are temporal challenges with ICS forensic data because process and state information is often overwritten at a rate that makes collection impossible [3].

 

Port scanning and automated device interrogation techniques used to check security on traditional networks can crash ICS hardware by scanning too fast or by sending null or malformed packets. For modern SCADA services using internal Web servers, HTTP GET and POST requests can cause actual physical actions, so even Web-based automated tools should not be used actively on ICS networks. Industry best practices recommend remaining entirely passive for network traffic analysis. That analysis must also include the highly specialized and often proprietary ICS communication protocols. Although minimal "noise" (irrelevant traffic) should exist on ICS networks, data volume is a challenge because the large amount of real-time telemetry data and other traffic makes it hard to find malicious packets. Furthermore, ICS domain-interconnection methods and some regulator restrictions limit the ability to tap critical networks at multiple locations. Several traditional penetration test techniques can be modified to work on ICS networks. For example, instead of sending packets, a penetration tester should verify open ports on each host in the environment without generating network traffic.

 

As an example of why these constraints are important, a ping sweep was performed on an active ICS network that controlled 9-foot robotic arms. A controller for an arm that was in standby mode received the ping sweep, and the arm abruptly swung around 180 degrees, luckily missing the person nearby [15].  In another instance in the same study, a ping sweep was performed to enumerate all hosts on an ICS network and it caused an integrated circuit fabrication system to fail, destroying wafers worth $50,000. In a third instance, a penetration test at a gas utility locked up the SCADA system and the utility was unable to send gas through its pipelines, causing a loss of service to customers for four hours. Lastly, in August 2006, operators at the Browns Ferry nuclear plant had misconfigured products from two vendors, causing excessive traffic on the control network and resulting in a high-power low-flow condition where the recirculating water could not be properly cooled [16]. This event took the plant offline for two days and cost nearly $600,000 in revenue, and demonstrated the fragility of these networks when exposed to unexpectedly heavy network traffic.

 

So security-tool implementation should focus on agentless built-in commands that generate minimal network traffic. Popular toolkits such as Microsoft's Sysinternals are undesirable because they require detailed configuration and have not been thoroughly tested on ICS systems. For host-based querying for Windows artifacts, only built-in command-line utilities should be used such as the Windows Management Instrumentation Command-line (WMIC) tool. Special care should be taken to ensure that the toolkit queries using only command-line utilities of that hardware's possibly-modified operating system. When possible, scripts should be written for the legacy Microsoft Disk Operating System (MS-DOS) 16-bit command.com processor that preceded the 32- and 64-bit cmd.exe found on Windows, to minimize system loading.

 

Tools should comply with the North American Electric Reliability Corporation (NERC) Critical Infrastructure Protection (CIP) standards' scanning policies [17], which ensures that the toolkit can be used not only at electrical utilities but also at many other critical-infrastructure sites. These tools can run at user privileges but offer the most functionality when run by an administrator. For critical real-world networks, the toolkit can run the commands locally with no network traffic, with the manual export of results collected on a separate closed network.

 

Our experiments used the Bro platform for network-based analysis [18]. It is an open-source network solution composed of signature detection, anomaly detection, and a programming language designed to work with network traffic. The signature detection generates logs for which a protocol analyzer abstracts details in real-time. The programming language defines the actions the platform takes based on logic and structured programming. The Bro programming language was updated in November 2013 to support protocol parsing of the two most popular ICS protocols, Modbus and DNP3. Bro analysis can be automated through the creation of customized scripts.

 

3.2    Identification of Generally Suspicious Tactics

Since ICS traffic is often predictable because critical processes rely on predictable outputs for given inputs, anomalous traffic is a strong clue to tampering or malicious behavior, stronger than with other computer systems.  Master and slave roles can be easily extracted for IP-based protocols because the source and destination addresses for a specific message type imply its role. Communication channels are defined by device function, so an HMI should only talk to a PLC, RTU, or SCADA I/O server and an RTU should probably not be sent non-ICS protocol traffic. Device characteristics should not change and devices are rarely added or removed from the network. ICS field-device communication generally occurs on set polling intervals, so that provides additional regularity to the traffic.
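The regularity described above lends itself to simple baselining. The following Python sketch is illustrative only and is not the toolkit's code: the record layout (timestamp, source, destination), the function names, and the 25% tolerance on polling gaps are all assumptions. It learns the usual peers of each node from baseline traffic, reports pairs never seen before, and reports polling gaps that stray from the typical interval.

```python
from collections import defaultdict
from statistics import median

def learn_peers(records):
    """Map each source address to the set of destinations seen in baseline traffic."""
    peers = defaultdict(set)
    for _timestamp, src, dst in records:
        peers[src].add(dst)
    return peers

def new_peers(baseline, records):
    """Return the (src, dst) pairs in records that never appeared in the baseline."""
    return sorted({(src, dst) for _timestamp, src, dst in records
                   if dst not in baseline.get(src, set())})

def irregular_polls(timestamps, tolerance=0.25):
    """Return gaps between polls that differ from the median gap by more
    than the given fraction (hypothetical default of 25%)."""
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return []
    typical = median(gaps)
    return [gap for gap in gaps if abs(gap - typical) > tolerance * typical]
```

A site would need to tune the tolerance to its own polling behavior, since devices that report by exception will not show a fixed interval.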

 

Since ICS networks should not connect to the Internet directly, all connections to external IP space are suspicious including HTTP, FTP (file transfer), Telnet, SSH (secure shell), SNMP (network management), and wireless protocols. Failed connections are also suspicious. For instance, both a half-open TCP handshake and a connection that has been rejected by an external IP issuing a reset packet to deny it would not be considered a connection in most traditional networks, but are suspicious in ICS networks where most traffic is very predictable.
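As a rough illustration of this check, the Python sketch below flags any connection record with an endpoint outside private address space, and labels attempts that never completed. It is a sketch under assumptions rather than the toolkit's Bro script: the records are plain dictionaries whose keys mimic the Bro connection-log fields id.orig_h, id.resp_h, and conn_state, and treating the states S0 (attempt seen, no reply) and REJ (attempt rejected) as the failed cases is our simplification.

```python
import ipaddress

# Connection states treated here as failed attempts (an assumption of this sketch).
FAILED_STATES = {"S0", "REJ"}

def is_external(address):
    """True if the address lies outside private, loopback, link-local, and multicast space."""
    ip = ipaddress.ip_address(address)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_multicast)

def suspicious_connections(records):
    """Return (originator, responder, reason) for every record touching external IP space."""
    findings = []
    for record in records:
        orig, resp = record["id.orig_h"], record["id.resp_h"]
        if is_external(orig) or is_external(resp):
            failed = record.get("conn_state") in FAILED_STATES
            reason = "failed external connection" if failed else "external connection"
            findings.append((orig, resp, reason))
    return findings
```

Sites that use public address ranges internally would need to replace is_external with a check against their own address plan.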

 

In some cases the only indication of malicious access is the anomalous operation of the ICS field devices. Stuxnet resided on the WinCC HMI and, using a known hardcoded password, modified rotating motor spinning frequencies in Siemens S7-315 programmable logic controllers and valve settings in S7-417s. This suggests among other things that cleartext protocol data should be examined and ICS protocol datagrams should be inspected for known default passwords and other vulnerabilities. Obtaining and validating high and low field-device register values against site-specific expectations and equipment-tolerance values may help identify overclocked or maliciously manipulated devices. ICS protocol operations should also be used to automatically create a catalog of devices, and responses to these packets can be passively inspected for field device characteristics. To extract high and low values, the maximum and minimum values for each device register should be calculated and stored. This should be subcategorized by the make and model of equipment and what process it is controlling. Organizationally unique identifier (OUI) bits can be extracted from device addresses to assist in identifying devices.
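The range-learning idea can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: the class name, the (device, register) key, the optional margin, and the sample device names are assumptions, and a real deployment would also key the ranges by make, model, and controlled process as described above. The helper at the end extracts the OUI, which is the first three octets of a MAC address.

```python
class RegisterRanges:
    """Track the lowest and highest value observed for each device register."""

    def __init__(self):
        self.ranges = {}  # (device, register) -> [low, high]

    def learn(self, device, register, value):
        """Widen the stored range so that it includes the observed value."""
        low_high = self.ranges.setdefault((device, register), [value, value])
        low_high[0] = min(low_high[0], value)
        low_high[1] = max(low_high[1], value)

    def is_anomalous(self, device, register, value, margin=0.0):
        """True if the register is unknown or the value falls outside the
        learned range, widened by margin (a fraction of the range width)."""
        key = (device, register)
        if key not in self.ranges:
            return True
        low, high = self.ranges[key]
        slack = margin * (high - low)
        return value < low - slack or value > high + slack

def oui(mac):
    """Return the organizationally unique identifier of a MAC address."""
    return ":".join(mac.replace("-", ":").upper().split(":")[:3])
```

Learned ranges should still be compared against equipment-tolerance values, since a baseline captured during an intrusion would already contain the manipulated readings.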

 

Additional logic should check for interactive commands or reprogramming that suggests malicious activity. For instance, Modbus function code 0x7D initiates firmware replacement; DNP3 code 0x1B deletes a file; and DNP3 codes 0x0D and 0x0E restart a device. Similarly, a Modbus slave's exception code of 0x03 means an illegal data value was received, and 0x0B indicates that a target device failed to respond or may not be present on the network; DNP3 uses code 0x21 for an authentication error and 0x82 for an unsolicited response.  All of these are suspicious, but they may also occur with misconfigurations.
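The codes listed above can be kept in a small watch list that a parser consults for each decoded message. The Python sketch below uses only the codes named in the preceding paragraph; the table layout, the function names, and the (protocol, kind, code) tuple format are assumptions made for illustration, and decoding the messages themselves is left to the protocol analyzer.

```python
# Watch list built only from the codes discussed in the text.
SUSPICIOUS_CODES = {
    ("modbus", "function"): {0x7D: "firmware replacement"},
    ("modbus", "exception"): {
        0x03: "illegal data value",
        0x0B: "target device failed to respond",
    },
    ("dnp3", "function"): {
        0x0D: "device restart",
        0x0E: "device restart",
        0x1B: "file deletion",
        0x21: "authentication error",
        0x82: "unsolicited response",
    },
}

def classify(protocol, kind, code):
    """Return a description if the code is on the watch list, otherwise None."""
    return SUSPICIOUS_CODES.get((protocol.lower(), kind), {}).get(code)

def review(messages):
    """Return one alert string for each (protocol, kind, code) tuple on the watch list."""
    alerts = []
    for protocol, kind, code in messages:
        reason = classify(protocol, kind, code)
        if reason is not None:
            alerts.append(f"{protocol} {kind} 0x{code:02X}: {reason}")
    return alerts
```

Because these codes also occur with misconfigurations, the output is better treated as a list of items for an analyst to review than as confirmed detections.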

 

3.3    Other Suspicious Tactics to Look For

More specific clues for malware can also be identified. The Mariposa malware used custom UDP datagrams for communication and hardcoded domain names, both of which can be detected. The Monju incident's malware redirected from the compromised streaming media service's website to testqeasd.tk, so domain name system (DNS) lookups over TCP port 53 would be observed [19]. The DNS cache, cookies, and the hosts file can be examined for previous outbound attempts to identify planned routes or dormant malware. Web browser history can also be inspected.

 

Comparing the hosts' registries against each other and also against a clean operating system baseline may be possible in ICS networks, unlike most networks, since significantly less software should be installed and user behavior is restricted. In the absence of those registry baselines, several common locations were used for malicious registry persistence in the case studies we examined. These locations, as well as the "shutdown" key value with a dynamic-link library (DLL) loaded, should be checked in the registry.

 

Technical incident data can reveal multiple methods of scheduling Windows tasks for both persistence and privilege escalation. It is necessary to check tasks scheduled using a variety of services, such as AT jobs, BITSADMIN jobs, and jobs assigned with SCHTASKS. To further separate potentially malicious tasks from innocuous administrative scheduled jobs, tasks with executable actions, explicit command line variables, and logic triggers such as time should be displayed prominently in the script's output.

 

It is important to inspect all processes and services whose name or display name is unknown. Special attention should be paid to limit processes that are installed and listening on multiple boxes which, if compromised, provide access to multiple hosts [20]. With fewer systems and processes running than traditional IT networks, it should be possible to identify anomalous processes running in control centers using WMIC with the /node switch for remote hosts and a well-structured central query.

 

The malware used at the Monju plant hid itself with double file extensions, padding .dll files as .tmp and .pdf files. Running a file-type classifier such as the Linux "file" command or detecting extension mismatches in system or temp directories can help to identify previously-unknown malicious files. Not only can analysis be conducted on the hosts for these extension mismatches, but also within the network traffic captures.
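One way to detect such mismatches offline, for example on files exported to an analysis workstation, is to compare each file's leading bytes with its extension. The Python sketch below is an assumption-laden illustration and not the toolkit's script: it relies on the fact that Windows executables and DLLs begin with the two bytes "MZ", and the extension list and function names are ours.

```python
import os

# Extensions under which an "MZ" header is expected (an illustrative list).
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".sys", ".ocx", ".scr", ".cpl", ".drv"}

def looks_like_pe(path):
    """True if the file begins with the 'MZ' signature of Windows executables."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"MZ"

def has_double_extension(filename):
    """True for names such as 'report.pdf.dll' that carry two extensions."""
    parts = filename.split(".")
    return len(parts) >= 3 and all(parts[1:])

def find_mismatches(directory):
    """Return paths of executables whose extension does not say so."""
    findings = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            extension = os.path.splitext(name)[1].lower()
            try:
                if looks_like_pe(path) and extension not in EXECUTABLE_EXTENSIONS:
                    findings.append(path)
            except OSError:
                continue  # unreadable files are skipped, not treated as clean
    return findings
```

The double-extension test is only a heuristic, since legitimate names such as archive.tar.gz also match it.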

 

The Mariposa malware copied binaries into the C:\RECYCLER folder. Finding undeleted files in the RECYCLER or newer $Recycle.Bin directories is suspicious. The recycling bin may also contain interesting recently-deleted data, so it should be checked for interesting files with exe, rar, zip, and txt extensions.

 

Automated file listing and property checks can also include the auditing of certain applications like Windows Sticky Keys (sethc.exe) and Windows Utility Manager (utilman.exe) that are often replaced for privilege escalation and malicious persistence. Another file-listing trick that malicious users can exploit is the way Windows handles unquoted executable paths. If a path includes spaces, such as 'C:\Program Files\' or 'C:\Documents and Settings\', and a reference to it is missing the quotes around the full path, Windows splits the path at the spaces and tries each truncated prefix as an executable name first, so a malicious file placed at a location such as C:\Program.exe could be run instead of the intended program.
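To make the unquoted-path problem concrete, the Python sketch below lists the extra executable names that Windows may try before reaching the intended program when a command line is unquoted. It is an illustration under assumptions: the function name is hypothetical, the vendor path in the usage note is invented, and the command line is assumed to name a file ending in .exe.

```python
def hijack_candidates(command):
    """List the truncated paths Windows may try first for an unquoted
    command line whose executable path contains spaces."""
    command = command.strip()
    if command.startswith('"'):
        return []  # a quoted path is not ambiguous
    # Keep only the executable path, dropping any trailing arguments.
    end = command.lower().find(".exe")
    path = command if end == -1 else command[:end + 4]
    candidates = []
    for index, char in enumerate(path):
        if char == " ":
            candidates.append(path[:index] + ".exe")
    return candidates
```

For the hypothetical service command C:\Program Files\Vendor App\service.exe -k run, the function returns C:\Program.exe and C:\Program Files\Vendor.exe, which are the locations an auditor would check for unexpected files.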

 

Adversaries attempt to transition through internal networks to gain control or access on a system inside the target security zone from a network presence on a lesser security zone. Stuxnet accomplished this lateral movement by propagating over network shares, exploiting a printer-spool vulnerability, and through Windows remote procedure calls (RPCs). Internal lateral movement can appear as many flows from a single workstation to others that normally do not communicate, especially over ports 139 and 445. If logs are available, Event ID 552 in the Security event log may help identify accounts and systems used to conduct this activity, and these can then be filtered to isolate malicious behavior and lateral movement.
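The fan-out pattern described above can be approximated by counting, for each source, the distinct destinations it contacts on ports 139 and 445. The Python sketch below is illustrative only: the (source, destination, port) tuple format, the function name, and the threshold of three destinations are assumptions, and in practice the result should be compared against the site's baseline of normal communication pairs.

```python
from collections import defaultdict

# Ports named in the text as typical for lateral movement.
SMB_PORTS = {139, 445}

def fan_out(flows, ports=SMB_PORTS, threshold=3):
    """Return sources that contacted at least `threshold` distinct
    destinations on the watched ports, with the destinations sorted."""
    targets = defaultdict(set)
    for src, dst, port in flows:
        if port in ports:
            targets[src].add(dst)
    return {src: sorted(dsts) for src, dsts in targets.items()
            if len(dsts) >= threshold}
```

A legitimate file server or domain controller will also show high fan-in and fan-out, so known servers would need to be excluded before the result is useful.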

 

ICS network hosts should be checked to ensure that AutoRun is disabled in the registry for portable media and unknown drives or devices. These would be unusual in an ICS network and so are suspicious. Many operating systems also provide queryable records of inserted portable media devices. Custom scripts can be created to check USB device connectivity, make and model of previous devices, and other metadata to determine if ICS network security policies are being followed and if malicious activity has occurred.

 

As with traditional computer systems, host-based artifacts and the corresponding log events should be analyzed for locked-out accounts, users who have never logged on, passwords that never expire, guest accounts belonging to a group, blank administration passwords, the creation of new user accounts and their naming convention, and simultaneous sessions across multiple machines, in order to discover possible compromised user accounts and intrusion pathways. Another classic behavior to identify is suspicious activity outside of working hours.

4.       EXPERIMENTS

We built a toolkit to check for most of the clues described in the last section. Tests of the toolkit were conducted on a variety of data sources, including voluntarily-submitted and publicly-available ICS network traffic as well as researcher-provided sophisticated advanced persistent threat (APT) malware and packet captures. When sufficient technical data was unavailable on adversary methods, actions were recreated on an array of virtual machines configured with a variety of operating systems and software representative of a typical ICS installation.

 

4.1   Toolkit Implementation

The prototype toolkit consisted of eight substantial command-line utility scripts representing each adversary tactic researched, with several supporting batch scripts for adhering to ICS device limitations and formatting results.  Several Bro network programming-language scripts extend the coverage and capability of the methodology. Host and network-forensic artifact collection has been implemented for all specific adversary tactics described.

 

For research and testing purposes, a new shared administrator account with strong authentication was created on every test workstation to centrally query the hosts. To operate within the ICS domain without affecting ongoing operations, all host-based tools were run at the lowest priority level to allow process control functions to operate at full availability. This was accomplished by using the built-in Windows function SetPriority to force the process and thread priority to the lowest possible level. Attempts to lower the host script's priority from within by referencing its own process ID produced inconsistent results. The most successful technique was to launch all host-based scripts from within a separate wrapper function that set the priority for all spawned processes. Set at idle priority, the scripts only use free processing cycles and do not interrupt any functions. Idle priority is normally used for lightweight software such as desktop screensavers and applications that only require periodic updating. Disk reads and writes were minimized through similar process monitoring to safeguard against retrieving too much data, which could overwhelm a resource-constrained ICS system.
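The wrapper approach can be illustrated in Python, with the caveat that this is only an analogue for an analysis or test workstation: the toolkit itself used built-in Windows batch commands, and Python would count as new software on an ICS host. The function name is hypothetical. On Windows the sketch starts the child in the idle priority class, and elsewhere it raises the child's niceness to the maximum before the command starts.

```python
import os
import subprocess
import sys

def run_at_lowest_priority(command):
    """Run a command to completion at the lowest scheduling priority
    and return its standard output as text."""
    options = {"capture_output": True, "text": True, "check": True}
    if sys.platform == "win32":
        # Start the child in the idle priority class.
        options["creationflags"] = subprocess.IDLE_PRIORITY_CLASS
    else:
        # POSIX analogue: raise the child's niceness before it executes.
        options["preexec_fn"] = lambda: os.nice(19)
    return subprocess.run(command, **options).stdout
```

Setting the priority when the child is created, rather than asking a running script to lower its own priority, mirrors the wrapper technique that the text reports as the most reliable.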

 

The command-line utilities employed on the hosts were selected for their support of legacy operating systems. The logic built into the scripts refrained from executing on unsupported versions by using the VER command and checking the %OS% and %COMSPEC% environment variables. The WMIC command, which allows detailed system management from the command line, can be run with the "/node" option to run the host scripts across multiple systems and centrally collect the data while generating minimal network traffic.
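The version gate can be sketched as follows, assuming the output of VER has already been collected as text. The allow-list SUPPORTED_VERSIONS and the function names are hypothetical; the actual scripts performed this check in batch logic, and the set of versions they accepted is not reproduced here.

```python
import re

# Hypothetical allow-list of (major, minor) Windows versions, for illustration.
SUPPORTED_VERSIONS = {(5, 1), (5, 2), (6, 0), (6, 1), (6, 2)}

# VER prints a line such as "Microsoft Windows XP [Version 5.1.2600]".
VER_PATTERN = re.compile(r"\[Version (\d+)\.(\d+)(?:\.\d+)*\]")


def parse_ver(output):
    """Return (major, minor) parsed from VER output, or None if unrecognized."""
    match = VER_PATTERN.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(output):
    """True only when the parsed version is on the allow-list."""
    return parse_ver(output) in SUPPORTED_VERSIONS


xp = "Microsoft Windows XP [Version 5.1.2600]"
newer = "Microsoft Windows [Version 10.0.19045.3086]"
print(parse_ver(xp), is_supported(xp), is_supported(newer))
```

Refusing to run when the version string is unrecognized is the cautious default for a production ICS host.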

 

The network-based toolkit successfully functioned without the need for active interaction with devices. This was confirmed by replaying historical network traffic through the scripts to produce the results, while running tcpdump to intercept and display packets on the system running the network-based scripts. Although not a requirement, the network-based toolkit performed with no delay while running on a consumer-grade laptop, demonstrating that the toolkit is flexible enough to run on any platform compatible with Bro. Custom Bro scripts should work under all possible constraints in the ICS network because they are passive by design.

 

The Microsoft tool WMIC was heavily used by our toolkit, although executing it on remote machines requires some additional steps. REG was used in all scripts as a quick, compatible method of querying the Windows registry for specific keys related to the eight adversary tactics. The FIND command and, when necessary, the more demanding FINDSTR command were used to incrementally search for values and regular-expression strings within the output of several other commands. WEVTUTIL was included for its ability to quickly isolate relevant event log files, but its use is optional since the kit was designed with the expectation that WEVTUTIL would not be supported and that the target system's event logs would be insufficient. The ICS field device script used ARP, IPCONFIG, and NETSH to extract MAC addresses from the local network, host machine, and in-range wireless devices, respectively. The external connection host-based script used IPCONFIG, NETSTAT, and TYPE to display the DNS cache, to correlate a host process with observed network behavior, and to output cookies from file paths determined by Windows version, respectively.
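As an illustration of the post-processing that such command output needs, the following Python sketch extracts address pairs from text in the layout printed by the Windows "arp -a" command. The sample text and the function name parse_arp_table are assumptions made for illustration; the toolkit itself filtered this output with FIND and FINDSTR rather than Python.

```python
import re

# One table row: IPv4 address, dash-separated MAC address, entry type.
ARP_ROW = re.compile(
    r"^[ \t]*(\d{1,3}(?:\.\d{1,3}){3})[ \t]+"
    r"((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})[ \t]+(\w+)",
    re.MULTILINE,
)


def parse_arp_table(text):
    """Return (ip, mac, entry_type) tuples, with MACs in lowercase colon form."""
    return [
        (ip, mac.lower().replace("-", ":"), kind)
        for ip, mac, kind in ARP_ROW.findall(text)
    ]


# Illustrative sample only; these addresses are not from the study data.
SAMPLE = """
Interface: 192.168.10.20 --- 0xb
  Internet Address      Physical Address      Type
  192.168.10.1          00-1A-2B-3C-4D-5E     dynamic
  192.168.10.255        ff-ff-ff-ff-ff-ff     static
"""

entries = parse_arp_table(SAMPLE)
for entry in entries:
    print(entry)
```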

 

While registry persistence was checked entirely with the core REG command, the startup portion of the persistence script employed TYPE to output contents of each host's startup folders and SC to query services that automatically restart or trigger based on failure conditions. The scheduled task portion of the persistence script primarily used WMIC but also applied the SCHTASKS and BITSADMIN utilities to output their respective scheduled job data to a central report for identification of malicious dormancy and privilege escalation attempts. The process injection and hijacking script mainly used the core WMIC tool's querying language as well as FIND to identify processes running from %TEMP% or %LOCALAPPDATA% and to compare processes and DLL imports with ICS artifact whitelists and blacklists. The process injection script also leveraged DRIVERQUERY to display details about loaded unsigned drivers as well as a consolidated list of certificates used for the drivers that were signed for easy analysis.
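The check for processes running from temporary or per-user directories can be sketched in Python as below. The process records, directory names, and whitelist are hypothetical; in the toolkit the equivalent data came from WMIC queries filtered with FIND. The sketch uses the standard ntpath module so that Windows path rules apply even when the analysis runs on another platform.

```python
import ntpath


def is_under(path, directory):
    """True if `path` lies inside `directory` (Windows rules, case-insensitive)."""
    path = ntpath.normcase(ntpath.normpath(path))
    directory = ntpath.normcase(ntpath.normpath(directory))
    return path.startswith(directory.rstrip("\\") + "\\")


def flag_processes(processes, watched_dirs, whitelist):
    """Return (name, path) records that run from a watched directory
    and whose image name is not on the whitelist."""
    flagged = []
    for name, exe_path in processes:
        if name.lower() in whitelist:
            continue  # name-only whitelisting is a simplification
        if any(is_under(exe_path, d) for d in watched_dirs):
            flagged.append((name, exe_path))
    return flagged


# Hypothetical records in (image name, executable path) form.
processes = [
    ("hmi.exe", r"C:\Program Files\Vendor\hmi.exe"),
    ("svch0st.exe", r"C:\Users\op\AppData\Local\Temp\svch0st.exe"),
    ("updater.exe", r"C:\Users\op\AppData\Local\Vendor\updater.exe"),
]
watched = [r"C:\Users\op\AppData\Local\Temp", r"C:\Users\op\AppData\Local"]
flagged = flag_processes(processes, watched, whitelist={"updater.exe"})
print(flagged)
```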

 

The file system sabotage script harnessed WMIC to parse the cim_datafile class for compressed archives, along with TREE and DIR with special parameters to check for unquoted executables, recently created binaries, and anomalous files in the RECYCLER. It also used REG to query SOFTWARE\Classes and report anomalous debugger and image file execution keys, and it combined the results of ASSOC and FTYPE to output file extensions and their linked programs for anomalous extensions that do not exist in the baseline operating system. The internal lateral movement pathway script relied on WMIC to list various shared resources, such as printers, that could span network zones. The same script also used NETSTAT to check potential services in use and the NBTSTAT tool to display cached historical NetBIOS connections. The portable-media script employed only the core utilities to extract currently-connected and previously-connected portable devices, extracting uniquely-identifying data from various registry entries to correlate their usage across all hosts. The script also extracted AutoRun settings from various forensic locations to determine portable media propagation risk. The user behavior and remote access abuse script largely used WMIC and WEVTUTIL (if available) to first search for specific Event IDs, then to query TerminalServices, RemoteConnectionManager, and LocalSessionManager for RDP artifacts in the absence of security event log data. The script was built with input parameters to whitelist certain time periods so as to narrow results to specified anomalous off-hours. It also used DOSKEY to extract explicit command-line usage from memory.
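The time-period whitelist can be sketched in Python as follows. The working window (weekdays, 07:00 to 18:00), the event tuples, and the function names are assumptions chosen for illustration; the toolkit took its windows as script input parameters.

```python
from datetime import datetime, time


def is_off_hours(timestamp, start=time(7, 0), end=time(18, 0),
                 workdays=(0, 1, 2, 3, 4)):
    """True if `timestamp` falls outside the whitelisted working window.
    Weekdays are numbered from Monday=0, as in datetime.weekday()."""
    in_window = (timestamp.weekday() in workdays
                 and start <= timestamp.time() < end)
    return not in_window


def off_hours_logons(events, **window):
    """Keep only events whose timestamp lies outside the whitelist."""
    return [event for event in events if is_off_hours(event[0], **window)]


# Hypothetical remote-logon events: (timestamp, account, source address).
events = [
    (datetime(2014, 3, 3, 9, 15), "operator1", "10.1.1.20"),  # Monday, day shift
    (datetime(2014, 3, 3, 2, 40), "operator1", "10.1.1.99"),  # Monday, 02:40
    (datetime(2014, 3, 1, 10, 0), "vendor", "10.1.1.50"),     # Saturday
]
suspicious = off_hours_logons(events)
for event in suspicious:
    print(event)
```

Sites that run continuous shifts would supply their own window, for example by passing workdays=range(7) together with different start and end times.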

 

4.2   Results

The network-based scripts identified external communication paths from the ICS network quite well, providing an understanding of how the network was configured. This included such things as a company's interconnection of an ICS network with the DNS and network time protocol (NTP) servers on its business network (Figure 1). Other systems attempted outbound connections to the Internet over NetBIOS and other non-ICS protocol ports. In addition, pathways and network traffic fragments were identified between several devices and 143.127.102.40 in Cupertino, CA, Symantec's LiveUpdate server for virus definitions; this was further validated when the host-based scripts extracted securityresponse.symantec.com and liveupdate.symantecliveupdate.com from the DNS cache. A separate ICS network traffic sample showed systems attempting to connect to guru.avg.com and bguru.avg.cz to update the Anti-Virus Guard (AVG) antivirus product. In yet another sample data set, the host-based and network-based scripts quickly revealed other attempts to reach product update Web sites for multiple ICS equipment vendors. Although the sporadic attempts to reach external networks in this data were due to system misconfigurations and attempted product updates, the same analysis could equally have identified beaconing malware and malicious external command-and-control attempts.
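A minimal Python sketch of the external-address check is given below. It assumes that connection records have already been reduced to (source, destination) address pairs, and it treats any destination that is not private, loopback, link-local, or multicast as external. The source addresses are hypothetical; the toolkit performed the equivalent analysis in Bro scripts.

```python
import ipaddress


def external_destinations(connections):
    """Return the sorted unique destination addresses that lie outside
    private, loopback, link-local, and multicast ranges."""
    external = set()
    for _source, destination in connections:
        addr = ipaddress.ip_address(destination)
        if not (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_multicast):
            external.add(destination)
    return sorted(external)


# Hypothetical connection pairs; 143.127.102.40 is the address named above.
connections = [
    ("10.1.1.5", "10.1.1.20"),
    ("10.1.1.5", "143.127.102.40"),
    ("10.1.1.7", "224.0.0.251"),
    ("10.1.1.7", "143.127.102.40"),
]
print(external_destinations(connections))
```

Because Internet addresses are usually rare on an ICS network, even this simple filter tends to produce a short list for an analyst to review.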

 

Figure 1: Identified communication paths on one ICS network.

HTTP protocol analysis revealed interesting characteristics unique to ICS networks. Built-in Bro IDS alerts were triggered because some traffic included invalid HTTP requests in which vendors had configured their product to use a [VENDORNAME]_POST method and used HTTP as a convenient transport protocol.  Other analysis showed HTTP access to DLL files and posting values directly as variables (/[vendor]/[vendor].dll?v=update). These ICS device HTTP traits will be added to the growing whitelist of atypical vendor implementations of traditional network protocols.

 

USB drive insertion is of immediate interest during incident response and is critical for sites whose network security policies disallow it.  A host-based script reported all USB drive insertions. The toolkit also provided remote querying of an endpoint to identify connected equipment (Figure 2), including in-range wireless access points and previous wireless connections.

 

 

Figure 2: Example extraction of host characteristics.

Information about the machines found in our data is shown in Figure 3 (omitting some sites for privacy reasons). Our script pulls MAC (physical) addresses and IP (logical) addresses from ARP requests and replies, and extracts MACs, IPs, and hostnames from DHCP request, inform, and discovery packets. MAC addresses were passively observed for 64% of devices on the LAN, and the hardware vendor was identified for all of those MAC addresses. All of the internal device IP addresses were collected, which is to be expected for TCP/IP to function properly, and none were identified as dual-homed (a system having multiple network interface cards and thus multiple IP addresses per one MAC address). Only 12% of hostnames were observed in DHCP traffic, likely due to the prevalence of static IP assignments in critical networks. This was anticipated during creation of the network toolkit, so the script includes an event handler that triggers when a DNS address (type A) reply resolves to an internal IP address that does not yet have a known hostname; the subdomain portion of the corresponding query is then used as the hostname. Using both DHCP and internal DNS extraction, 41% of hostnames were identified. According to Bro's built-in operating system analyzer, the majority of operating systems in the data were outdated and unsupported versions of Microsoft Windows. The toolkit provides some initial ICS protocol and supervisory device trait-matching. For instance, based on publicly-available vulnerability data released for Wonderware's SuiteLink service, any Windows OS host listening on UDP port 5413 with connections to one or more systems that speak other ICS protocols is flagged by the script as likely a Wonderware InTouch HMI.
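The DNS-based hostname fallback can be sketched in Python as follows. The sketch assumes that "internal" means an address in a private range and that the hostname is the leftmost label of the query; both are simplifying assumptions, and the names and addresses are hypothetical. The toolkit implemented this logic as a Bro event handler.

```python
import ipaddress


def update_hostnames(known, dns_replies):
    """Add hostnames inferred from DNS type-A replies.

    known: dict mapping IP address to hostname (for example, from DHCP).
    dns_replies: iterable of (query_name, answer_ip) pairs.
    Only internal addresses with no hostname yet are filled in.
    """
    for query, answer in dns_replies:
        if ipaddress.ip_address(answer).is_private and answer not in known:
            known[answer] = query.split(".")[0].lower()
    return known


# Hypothetical inventory: one host already named from DHCP traffic.
inventory = {"10.0.0.5": "hmi01"}
replies = [
    ("HIST02.plant.example", "10.0.0.9"),   # internal, unnamed: added
    ("hmi-old.plant.example", "10.0.0.5"),  # already named: unchanged
    ("www.example.com", "93.184.216.34"),   # external: ignored
]
print(update_hostnames(inventory, replies))
```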

 

 

Figure 3: Statistics on ICS hosts observed in our study data.

For proper comparison, 24-hour periods of Modbus and DNP3 network traffic captures from several critical infrastructure sectors were analyzed and correlated. One of our Bro scripts attempted to identify ICS protocol abnormalities while another script attempted to fingerprint devices' specific roles in the ICS network based on their register values and their function and exception codes. The first script's analysis indicated that the ingested data contained no protocol-convention anomalies or suspicious ICS function-code usage, either of which might indicate malicious tampering or field device modification. All examined ICS protocol traffic contained only register reads and writes, which was confirmed through packet analysis outside of the toolkit, so the extraction of potentially malicious activity based on anomalous protocol events was not tested.

 

Analyzing the same source data, the second Bro script did not find a significant correlation between ICS protocol function-code patterns or register-boundary values and the field device's role (e.g., HMI, RTU, PLC).  This may reflect the limited scope of the data. Frequency analysis of observed Modbus and DNP3 function codes is shown in Table 1. Across all samples, slave field devices had an average of 5.5 registers. Register ranges varied significantly even on a single device: in one case, one register of a device varied within a range of only one over a 24-hour period while another register of the same device had a range of 65,369. The average low boundary value for registers was 8,594 and the average high boundary was 37,248. However, the protocols are device-agnostic, so more meaningful anomalies or further indications of malicious behavior could not be extracted from the field devices' numeric register values without knowing the specific processes they controlled. Expected limits and metrics for field device registers likely exist, so alerts could be created from manually entered expectations. Alternatively, future iterations of the Bro script may record register changes as a percentage of the current values. The extraction and correlation of ICS protocol functions did, however, confirm that ICS networks tend to have a high signal-to-noise ratio, so anomalous function codes, exception codes, and malformed packets should be readily distinguishable.
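Learning the historical range of each register, and expressing changes as percentages, can be sketched in Python as follows. The device name, register numbers, and tolerance parameter are hypothetical, and the sketch stands in for logic that the toolkit expressed in Bro scripts.

```python
def learn_ranges(observations):
    """Build {(device, register): (low, high)} from (device, register, value)."""
    ranges = {}
    for device, register, value in observations:
        low, high = ranges.get((device, register), (value, value))
        ranges[(device, register)] = (min(low, value), max(high, value))
    return ranges


def out_of_range(ranges, device, register, value, tolerance=0.0):
    """True if the value lies outside the learned range, widened by
    `tolerance` times the span, or if the register was never observed."""
    if (device, register) not in ranges:
        return True
    low, high = ranges[(device, register)]
    margin = (high - low) * tolerance
    return value < low - margin or value > high + margin


def percent_change(previous, current):
    """Change relative to the previous value, in percent (None if undefined)."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / abs(previous)


# Hypothetical baseline: one nearly constant register, one wide-ranging one.
baseline = [("plc1", 40001, 100), ("plc1", 40001, 101),
            ("plc1", 40002, 12), ("plc1", 40002, 65381)]
ranges = learn_ranges(baseline)
print(ranges)
print(out_of_range(ranges, "plc1", 40001, 150), percent_change(100, 150))
```

A nearly constant register makes even a small deviation conspicuous, whereas a register that sweeps most of its 16-bit range offers little to alarm on without knowledge of the underlying process.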

 

 

Table 1: Observed ICS protocol frequency counts.

Protocol Function (Code)                    DNP3       Modbus
DNP3 Confirm (0)                            0.02%      0.00%
DNP3 Read (1)                               99.98%     0.00%
Modbus Read Coils (1)                       0.00%      56.26%
Modbus Read Holding Registers (3)           0.00%      25.90%
Modbus Read Input Registers (4)             0.00%      13.57%
Modbus Write Single Coil (5)                0.00%      3.01%
Modbus Write Multiple Registers (16)        0.00%      1.26%
Grand Total                                 100.00%    100.00%
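Frequency counts like those in Table 1 can serve as a baseline against which later traffic is compared. The following Python sketch computes percentage shares per function code and lists codes that never appeared in the baseline. The sample code lists are invented for illustration, with proportions only roughly resembling the Modbus column.

```python
from collections import Counter


def function_code_shares(codes):
    """Return {function_code: percentage of all observed requests}."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}


def unseen_codes(baseline_codes, new_codes):
    """Function codes present in new traffic but absent from the baseline."""
    return sorted(set(new_codes) - set(baseline_codes))


# Invented Modbus baseline of 100 requests (function codes 1, 3, 4, 5, 16).
baseline = [1] * 56 + [3] * 26 + [4] * 14 + [5] * 3 + [16] * 1
shares = function_code_shares(baseline)
print(shares)

# Codes 8 (Diagnostics) and 43 (Encapsulated Interface Transport) are new.
print(unseen_codes(baseline, [1, 3, 8, 43]))
```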

 

4.3    Toolkit limitations

One significant limitation of our approach is its inability to calculate cryptographic hashes to verify authenticity of files.  Hashes should be computed to compare against manufacturer data provided in documentation or online.
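Where files can be copied to an analysis workstation, the hash comparison could be done there without adding software to the ICS hosts. The following Python sketch assumes that arrangement and assumes the manufacturer publishes SHA-256 values; the function names are hypothetical and the example file is a throwaway created only for the demonstration.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration with a temporary file whose content is b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
known_good = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
matches = verify(tmp.name, known_good)
os.remove(tmp.name)
print(matches)
```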

 

Several studied adversary attack techniques produced accurate forensic artifacts only when the responder queried the host with the same command-line utilities that the attacker had used. This suggests the need to investigate command-line utility outputs before claiming full coverage.  For example, within the task-scheduling persistence identification script, simulated malicious tasks for privilege escalation and reboot persistence created on a Windows XP host with the AT command were revealed only by WMIC, and tasks created with the SCHTASKS command were revealed only by querying SCHTASKS.  On Windows Vista and later, a third built-in tool, BITSADMIN, allows scheduling of malicious tasks that are revealed only by listing from BITSADMIN and by no other tool.
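One way to reason about this coverage problem is to merge the listings from every tool and record which tool reported each task. The Python sketch below assumes the listings have already been parsed into sets of task names; the task names and function names are hypothetical.

```python
def coverage_report(listings):
    """Map each task name to the sorted list of tools that reported it.

    listings: dict mapping a tool name to the set of tasks it listed.
    """
    report = {}
    for tool, tasks in listings.items():
        for task in tasks:
            report.setdefault(task, []).append(tool)
    return {task: sorted(tools) for task, tools in report.items()}


def single_source_tasks(listings):
    """Tasks visible to exactly one tool, mapped to that tool."""
    return {task: tools[0]
            for task, tools in coverage_report(listings).items()
            if len(tools) == 1}


# Hypothetical parsed listings from the three utilities.
listings = {
    "WMIC": {"At1"},
    "SCHTASKS": {"VendorBackup"},
    "BITSADMIN": {"transfer-job"},
}
print(single_source_tasks(listings))
```

A task that appears in only one listing is not necessarily malicious, but the report makes explicit which utility would have to be dropped before that task went unseen.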

 

Separating malicious access from trusted access was still difficult in the ICS environment despite the high signal-to-noise ratio. While the toolkit reliably distinguished automated device activity from human-initiated activity, further separating trusted user behavior from malicious attacker behavior was challenging with limited historical visibility and inconsistent logging on the systems analyzed. Some promising data was found outside of command-line tools and parsed event logs, such as within the *.rdp files on the host and within parsed remote access network traffic, but the initial scripts were unable to automatically extract that data in a meaningful way that applied to multiple sites. Similarly, malicious internal lateral movement was difficult to identify in this environment due to the number of accounts with administrative privileges and ICS vendors' pervasive usage of server message block (SMB) and NetBIOS for access to shared resources. This should diminish with the development of more robust vendor-specific whitelists.

 

False positives occur when non-malicious activity is identified as malicious, and we had a number of them. They were more likely to occur when only host data or only network data was available. Ideally, toolkits should try to determine where data that left a host went, or which running process was using a suspicious protocol. In these experiments, some host-based activity that was flagged as malicious turned out to be SCADA engineers moving laptops between network zones. The toolkit assumed that a host remained in one environment, so laptops moved between zones generated several suspicious traits. Moving laptops between zones is, however, not an advisable practice from a security standpoint anyway.

 

False negatives can occur because many ICS systems were observed using embedded HTTP functionality for diagnostics and monitoring. The analysis of unexpected HTTP user-agent strings requires a more comprehensive understanding of the ICS software that uses HTTP. False negatives can also occur due to the lack of built-in Bro support for ICCP, OPC, and some other observed proprietary ICS protocols.

 

5.       Conclusions

This paper described the need for a forensic capability tailored to industrial control systems. It then described and tested several techniques for identifying and analyzing potential malicious activity and adversary access pathways within ICS networks. We then gave some criteria for useful tools for live production ICS forensics and suggested some approaches to building high-confidence indicators of unauthorized access. ICS networks do provide good signal-to-noise ratios for behavioral anomaly detection. The need for tailored tools to identify potentially malicious access pathways has been suggested by this work and the feasibility of such tools has been established with the prototype host- and network-based toolkit.

 

We then described a prototype toolkit for ICSs that we built. This toolkit can be run on each host, transferring the HTML-formatted output to a write-once CD or other protected media. The toolkit can also be run on multiple hosts with the results collected centrally, but this has been tested only on small, closed networks, where it already generates significant amounts of data.  If both host and network data are to be analyzed centrally, the data will require improved structures to keep performance acceptable. Central collection and analysis also makes additional features possible, such as rootkit identification by comparing host and network data. For example, if a port is communicating in network traffic but does not appear in a host's NETSTAT output, a rootkit may be hiding it from the Windows API.
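The host-versus-network comparison reduces to a set difference once both views are expressed as (protocol, port) pairs for the same host. The Python sketch below assumes that parsing has already been done; the port numbers are hypothetical examples, and a mismatch is only a lead for investigation, since timing differences between the two captures can also cause one.

```python
def hidden_ports(network_observed, host_reported):
    """Ports seen communicating on the wire for a host but missing from
    that host's own NETSTAT listing."""
    return sorted(set(network_observed) - set(host_reported))


# Hypothetical views of one host: (protocol, local port).
seen_on_network = {("tcp", 502), ("tcp", 3389), ("tcp", 4444)}
reported_by_host = {("tcp", 502), ("tcp", 3389)}
print(hidden_ports(seen_on_network, reported_by_host))
```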

 

Security assessment and incident response teams can use the approach to analyze critical devices in an environment with minimal irrelevant data, then trace those findings back into the business network instead of the other way around. The toolkit should assist in making more confident intrusion judgments when malicious activity is suspected in the ICS environment. With the careful selection of compatible tools, and a technique that is passive and forensically cautious, these methods can be easily incorporated into many organizations' security processes.

 

Due to the relatively long lifecycle of systems in the ICS domain and the current compatibility with Windows 8 systems, this technique is expected to be functional for a while on Microsoft operating systems. However, Linux compatibility is needed and work has already begun on translating the host-based examination commands for Linux. The WMI core application has been ported to Linux, which presents a desirable host-platform shift since Bro runs on Linux. If the host and network tools are executed from the same system, compatibility should expand and the ability to cross-reference host and network data should improve greatly. Adding agentless querying of real-time operating systems such as VxWorks, Windows CE, QNX and embedded Linux should also be pursued.

 

 

6.       REFERENCES

 

[1]    Falliere, C., [W32.Stuxnet Dossier Whitepaper], Symantec Corporation, 2011.

[2]    DHS Industrial Control Systems Cyber Emergency Response Team, "ICS-CERT Monitor: April/May/June 2013," <https://ics-cert.us-cert.gov/sites/default/files/Monitors/ICS-CERT_ Monitor_Apr-Jun2013.pdf>, 27 June 2013.

[3]    DHS Industrial Control Systems Cyber Emergency Response Team, "Recommended Practice: Creating Cyber Forensics Plans for Control Systems," August 2008.

[4]    Taveras, P., "SCADA Live Forensics: Real Time Data Acquisition Process to Detect, Prevent or Evaluate Critical Situations," Proc. 1st Annual International Interdisciplinary Conference, Azores, Portugal, 2013.

[5]    Centre for the Protection of National Infrastructure (CPNI), "Securing the Move to IP-Based SCADA/PLC Networks," 2011.

[6]    Schwartz, A., "Control System Devices: Architectures and Supply Channels Overview," Sandia National Laboratories, Albuquerque, NM, 2010.

[7]    Lowe, J., "The Myths and Facts behind Cyber Security Risks for Industrial Control Systems," Proc. VDE Kongress, 2004.

[8]    Igure, V., "Security Issues in SCADA Networks," Computers and Security, 2006, pp. 498-506.

[9]    Clarke, E., "Practical Modern SCADA Protocols: DNP3, 60870.5 and Related Systems,” Burlington, MA: Newnes, 2004.

[10]  Stouffer, K., Falco, J., and Scarfone, K., "Guide to Industrial Control Systems (ICS) Security," Special Publication 8800-82, National Institute of Standards and Technology, Gaitherburg, MD, 2011.

[11]  North American Electric Reliability Corporation, "Control systems security working group (CSSWG)," <www.nerc.com/comm/CIPC/Pages/Control Systems Security Working Group CSSWG/Control-Systems-Security-Working-Group-CSSWG.aspx>, 2013.

[12]  Dzung, D., Naedele, M., Von Hoff, T., and Crevatin, M., "Security for Industrial Communication Systems," Proceedings of the IEEE, vol. 93, no. 6, pp. 1152-1177, 2005.

[13]  Townsend, A., [Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia], New York: W. W. Norton & COmpany, 2013, pp. 268-269.

[14]  Carr, N. "Development of a Tailored Methodology and Forensic Toolkit for Industrial Control Systems Incident Response,” M.S. thesis, U.S. Naval Postgraduate School,
<faculty.nps.edu/ ncrowe/oldstudents/19jun_carr_nicholas_with_changes.htm>, June 2014.

[15] Sandia National Laboratories, "Penetration Testing of Industrial Control Systems," <energy.sandia.gov/wp/wp-content/gallery/uploads/sand_2005_2846p.pdf>, March 2005.

[16]   Tofino Security, "Cyber Security Incident Case Profile: TVA/Browns Ferry," <www.tofinosecurity.com/sites/default/files/CP-101-Case_Profile-Browns_Ferry-rev1.pdf>.

[17]   Digital Bond, Inc., "NERC CIP Scan Policies," <www.digitalbond.com/tools/bandolier/nerc-cip-scan-policies>.

[18]   Paxon, V., "Bro: A System for Detecting Network Intruders in Real-Time,” Proc. 7th USENIX Security Symposium, San Antonio, TX, 1998.

[19]   Context Information Security, "Context Threat Advisory: The Monju Incident,"  <contextis.co.uk/research/blog/CTI_TA-monju-incident>, 27 January 2014.

[20]  Centre for the Protection of National Infrastructure (CPNI), "Process Control and SCADA Security: Good Practice Guide," <www.cpni.gov.uk/documents/publications/2008/2008031-gpg_scada_ security_good_practice.pdf>, 2008.

 

 

 



[*] ncrowe@nps.edu, 1-831-656-2462