MONTEREY, CALIFORNIA
THESIS
METAMORPHIC AND POLYMORPHIC TECHNIQUES FOR OBFUSCATION OF K-ARY MALICIOUS CODES
by
George T. Harter
March 2020
Thesis Advisor: Neil C. Rowe
Second Reader: Alan B. Shaffer
ABSTRACT
K-ary malicious codes are a form of obfuscated malware in which the malicious code is distributed across K-distinct files; detecting these codes is computationally very difficult. Encryption, metamorphism, and steganography are techniques that are traditionally used to obfuscate malware. We present a proof-of-concept K-ary program that incorporates these techniques. We also test a statistical technique to correlate pieces of K-ary malware, which may make their detection easier. Our results show that this technique is moderately successful. Future research may reveal better-performing statistical techniques.
ACKNOWLEDGMENTS
I would like to thank Dr. Neil Rowe for his guidance as advisor for this thesis and Dr. Alan Shaffer for his support. I would also like to thank Dr. Cynthia Irvine for supporting me and the other Scholarship for Service students through our journey at the Naval Postgraduate School. This material is based upon work supported by the National Science Foundation under Grant No. DUE-1241432. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
The menace of malware (malicious code) is widespread and pervasive. It affects businesses, governments, hospitals, and schools around the world. The goals of malware range from petty crime to espionage and warfare. Individual malware attacks have caused billions in damages. For instance, the ILOVEYOU worm has caused over $10 billion in damages (Ingram, 2000), and the MyDoom worm is estimated to have caused over $38 billion in damage worldwide (Palmer, 2019). More menacing malware such as the Triton malware which targets industrial safety-control systems can endanger human lives (Giles, 2019). The Stuxnet worm sabotaged industrial uranium-enrichment facilities (Zeter, 2014).
Because these threats, the anti-malware industry provides automated software to prevent and mitigate digital threats such as malware infections. This industry, which includes such major companies as McAfee, Symantec, Bitdefender, Kaspersky, and IBM, has achieved $36.3 billion in global revenue in 2018 (Liu, 2020).
The products created by anti-malware vendors are useful defensive tools, but they are inevitably imperfect. An early paper argued that detection of computer viruses, one malware, is an undecidable problem, and that a perfect virus detector can therefore never be implemented (Cohen, 1987). (Chess & White, 2000) formally proved that not only is it impossible to make a perfect virus detector, but a virus can be created for which no automated detector can be implemented without producing false positives. These results can extend to other malware. This suggests that those who wish to defend digital systems against malware infections have a challenging task.
A difficulty of malware detection is the subjective element in what defines malware. Effective malware is designed so that any traits a malware file or process may have could also exist in a benign program. Some programs may even be considered either malicious or benign depending on the context in which they are found. It is the difficulty of determining purpose that makes malware difficult to detect. Malware authors aim to use this difficulty to make their products more resilient.
Malware authors have used many techniques to evade detection. Examples are encryption or steganography to hide their malicious payloads, disguising the malware as legitimate programs, and detecting analysis programs to switch tactics. By learning more about the evasion methods used by a malware, defenders can produce better ways to detect it.
One method of making malware less detectable is to split it across multiple files, a K-ary design (Moubarak, Chamoun, & Filiol, 2018). This aids code obfuscation in that the pieces are less likely to show suspicious features than the whole. The detection of malicious features of K-ary code has been proven to be NP-complete which means it is computationally very difficult (Filiol, 2007). Additionally, K-ary code can be enhanced with other stealthy techniques.
K-ary program designs can complicate malware detection, but little research has been done on them and few examples have been found outside of laboratory environments. The development of this technique may follow the pattern of previous techniques. Early work analyzed a use of asymmetric encryption in a malicious tool to encrypt user files and hold them for ransom (Young & Yung, 1996). However, it was not until 2009 that the Archievus ransomware implemented a matching design (O'Kane, Sezer, & Carlin, 2018). Today attacks using this design have become common. Similarly, significant K-ary malware might not be seen for many years. If it does show to be advantageous, defenders against malware should be prepared for its use.
The next chapter is an overview of malware detection and anti-malware evasion techniques. Chapter III will outline the methodology of the experiments in this thesis. Chapter IV will describe the results. Chapter V will give concluding remarks and suggestions for future research.
Automated malware detection by anti-malware and anti-virus programs seeks to recognize malware and prevent it from damaging systems. “Anti-virus” is somewhat a misnomer as viruses are only one kind of malware these systems defend against. Anti-malware programs use both static techniques, examining binary code as it is stored in secondary storage, or dynamic techniques, examining the malware’s behavior or memory as it is executed.
In response to anti-malware tools, malware authors have devised techniques to evade them. These techniques include obfuscation, anti-debugging, and attacks on the anti-malware programs themselves. Although other malware evasion techniques will be briefly addressed, this thesis will concentrate on the obfuscation techniques of polymorphism, metamorphism, and steganography in the context of K-ary code.
Malware detection can be either signature-based or anomaly-based (Desai & Stamp, 2010). Signature-based detection is the primary method used by anti-malware tools (Dalla Preda & Giusto, 2011). It recognizes specific artifacts of a file or process which uniquely identify it as malware. The benefits of signature-based detection are efficiency and a low false-positive rate. Its weakness is that it cannot detect new malware and it may not recognize obfuscated malware such as polymorphic and metamorphic (Saeed, Selamat, & Abuagoub 2013; You & Yim, 2010).
Anti-malware tools may also use statistical analyses of anomalies (Gaikwad, Motwani, & Shinde, 2015; Bazrafshan et al., 2013; Desai & Stamp, 2010). Anomaly-detection techniques classify files as malware by recognizing statistical characteristics of known malware. The weakness of anomaly detection is that it produces more false positives than traditional signature-based methods (Wong & Stamp, 2006).
Signature-based detection techniques usually match a known malware signature against the bytes of a file or process under inspection. Usually a malware signature is a string of bytes that appear in the malware’s code although it may be other artifacts as well (Idika & Mathur, 2007). Signatures are made sufficiently specific that a positive match indicates a high probability of malware. Signature-based scanning may do decoding or unpacking (Guo, Ferrie, & Chiueh, 2008). However, signature-based detection is not very effective against malware that uses encryption or specialized compression to obfuscate its own code.
A signature in a file can be matched exactly, but more advanced matching can use “wildcard” bytes, mismatch tolerances, regular expressions, or graph-based signatures (Szor, 2005). To increase the speed of signature-based detection, detectors may examine only the parts of a file that are more likely to contain the signature such as the beginning, end, or entry point of an executable. Anti-malware software may also use checksums (noncryptographic hashes) or cryptographic hashes to do static signature-based detection (Bachaalany & Koret, 2015). Some examine the cyclic redundancy check (CRC) (Peterson & Brown 1961). Signature-based detection does have some ability to detect new variants of malware. Significant code from previously identified malware is often reused by malware authors (Keen, n.d.). Detecting malware variants in this way is generic detection.
Hashes may be applied to the entire file under inspection or to some portion of it to get signatures. More advanced signature-detection techniques such as fuzzy hashes and graph-based signatures avoid some limitations of traditional hashes. Fuzzy-hash algorithms are designed so that the difference between the hashes of two files will be somewhat proportional to the amount of difference between the files themselves (Bachaalany & Koret, 2015). Anti-malware tools using fuzzy hashes can ignore trivial changes made to try to obfuscate code.
Executable files with encrypted or compressed bodies offer few signatures. Some anti-malware tools dynamically examine these files by emulating their execution in a sandboxed environment that protects the host system (Wong & Stamp, 2006). Then the files may decrypt or unpack themselves into a state that can be examined with signature-based detection techniques (You & Yim, 2010).
Packers are programs that compress, encrypt, or otherwise obscure the appearance of a target binary without changing its execution (Guo et al., 2008). Packers can make executable files smaller but are now often for obfuscating malware (Arntz, 2017). If the packer used on a file under inspection can be identified and an unpacker for that packer is available, anti-malware analysis can unpack the file to do a signature-based scan on it. Otherwise, dynamic inspection can do signature-based scanning or a new unpacker can be built.
Anomaly-detection techniques can use methods such as data mining, machine learning, and statistical analyses to recognize file traits that are suspicious. Often anomaly detection uses the consensus of an ensemble of detection techniques since any single technique tends to result in significantly more false positives than signature-based techniques (Desai & Stamp, 2010). Statistical and machine-learning approaches to anomaly malware detection that try to identify characteristics indicative of malware using data analysis exist.
Most anomaly engines use a rule-based or weight-based system to estimate the likelihood that a file or process under inspection is malicious (Gaikwad, et al., 2015). In a weight-based system, indicators of malware are given weights proportional to how strong they are as indicators. A file or process under inspection is analyzed for these characteristics and the weighted sum over all characteristics is taken; if the sum exceeds a set threshold, the file or process is labelled as malware. A rule-based system uses a set of if-then rules on a file or process (Schmall, 2002). If a pattern of rules applies, the file or process is classified as malware. If other patterns of rules apply, the file or process can be classified as non-malware. Specialized clues for anomaly detection include:
· System-call sequences: Application-programmer interface (API) calls are used by a program to interact with its host system. These calls may read, create, delete, or modify files on a system, or access a network. Malicious files often use these calls to carry out malicious actions (Bazrafshan et al., 2013). Permutations of call sequences are used by metamorphic malware to obfuscate themselves from detection (Lee, Jeong, & Lee, 2010), but these permutations can be detected too.
· Control flow and call graphs: Although some aspects of a polymorphic malware may change between instances, the execution flow often remains the same (Bachaalany & Koret, 2015). Consequently graph-based detection could detect metamorphic malware (Eskandari & Hashemi, 2011). Graph-based detection uses call graphs or flow graphs, and parses the executable code, usually after disassembly. .However, graph-based detection can be slow and computationally expensive.
· Other clues to malware are code sections marked as writable but not readable, control-flow redirection to a section that does not typically contain code, gaps between sections of a program, multiple file headers in a single file, and signs of a patched import address table (Szor, 2005).
Metamorphic malware applies code-transformation techniques to the malware body. Transformations require that functionality be preserved without requiring runtime adjustments. These techniques include (Desai and Stamp, 2010; You and Yim, 2010):
· Dead code insertion: Code which does not affect the state of the process is inserted between legitimate instructions.
· Independent code permutation: Nearby instructions that are independent are permuted.
· Code transposition using added control flow: Segments of code are distributed to different parts of the file. Control flow instructions are added to jump to them.
· Register renaming: Registers used as operands in instructions are changed.
· Equivalent code substitution: Blocks of code are replaced by different blocks that do the same function.
· Subroutine permutation: The order and position of subroutines in a file are changed.
· Subroutine inlining: A call to a subroutine is replaced by that subroutine’s code.
· Subroutine outlining: Portions of code are made into subroutines and replaced by calls to those subroutines.
Metamorphism aims to resist traditional signature-based detection as it eliminates many signatures. It resists emulation techniques for encrypted malware as emulating a metamorphic malware’s execution will not find any signatures. Metamorphic malware can be vulnerable to other dynamic analysis techniques that examine the stack during execution or examine its behavior, or that use graph-based detection techniques and Markov models (Lee, Jeong, & Lee, 2010; Eskandari & Hashemi, 2011).
Monomorphic malware often contains a payload of encrypted malicious code and a built-in decryptor. The decryption key can be changed between instances to make it harder to decrypt. However, this key itself must remain unencrypted and may be a detectable signature. Furthermore, emulation can see the malware decrypt itself, exposing the malware’s payload for analysis.
Oligomorphic malware uses several decryptors by making trivial changes to its decryption code between instances, or by carrying code for several distinct decryptors (Schiffman, 2010). Although oligomorphic malware is more challenging to detect than monomorphic malware, its decryptors still can be detected.
93% of the malware seen in 2017 by one study used polymorphism, which improves on oligomorphism by using a much larger number of encryptors and decryptors (Webroot, 2019). This is often done by metamorphic code-transformation techniques that allow the decryption code itself to change into many variants while preserving its functionality. However, under emulation a polymorphic malware will decrypt its body, revealing signatures. “On-demand” polymorphism decrypts its code in memory as needed (Alvarez, 2018) so the code is never entirely exposed in memory for signature checking at one time. On-demand polymorphism can evade certain anti-malware detection techniques, as well as slow down and deter manual analysis.
Many malware detection and analysis techniques use emulators and virtual machines to do dynamic analysis. Malware could possibly recognize these techniques with emulator fingerprinting and route its execution to non-malicious code or end its execution. Fingerprinting techniques include (Bulazel et al., 2017):
· Environmental artifacts: Usernames, files, environment variable values, and others.
· Timing: Inconsistencies or skews which indicate a process is virtualized.
· Process introspection: Recognition of libraries or hooks that have been injected into the malware code, indicating potential analysis.
· Processor anomalies: Discrepancies in execution that indicate emulation is occurring, since it is difficult for an emulator to perfectly replicate a processor or operating system.
· Reverse Turing tests: Lack of signs of use of a system such as Web-browser history, login times, and the presence of input devices.
· Network artifacts: Discrepancies in what would typically be seen in a network.
Anti-malware tools can use debugging techniques to monitor the behavior of running processes. By detecting these techniques, malware can route execution to benign code or end execution altogether to evade inspection. Alternatively, malware can remove debugging instructions and continue to execute maliciously (Bachaalany & Koret, 2015). On Windows hosts, the functions IsDebuggerPresent and CheckRemoteDebuggerPresent can tell whether a process is being debugged (Gao, Lu, & Luo, 2014). Similarly, on Windows NT, flags stored in the NtGlobalFlag variable in the system registry indicate whether a process is being debugged (Kulchytskyy & Kukoba, 2019). However, references to these things themselves are good signatures for anti-malware tools (Chen, 2018).
Malware can also inspect its own memory state to detect the presence of a debugger (Bulazel & Yener, 2017). Debuggers often operate by injecting debugging libraries into a process under inspection and modifying function calls to route execution to debugger routines before carrying out the originally indicated operation. By retrieving the original libraries and writing them to memory, malware can overwrite code injected for debugging purposes. Just as anti-malware tools use signature detection to detect malware, malware itself can use the same technique to detect anti-malware programs (Gao, Lu, & Luo, 2014). Anti-debugging techniques specific to certain debuggers such as the manipulation of the handle to the SoftICE kernel-level debugger or the modification of the MODULEENTRY32 structure’s value to disrupt the functionality of the Dump monitoring tool also exist.
Disassembly refers to reconstructing source code from executable code and is used by many malware analyses. Finding an acceptable disassembly is a hard computational problem in the semi-decidable class of complexity. In fact, even the best disassemblers cannot correctly disassemble 40-60% of a program’s executable code (Popov, Debray, & Andrews, 2007). Malware can exacerbate the difficulty of disassembly by several anti-disassembly techniques: indirect jumps, system exception handling, and anti-emulation techniques like those mentioned previously.
Some disassemblers use “recursive traversal” to follow the control flow of a program through and produce a more accurate analysis. “Flower instructions” can exploit this to confuse the disassembler. They use control-flow instructions which point to sections of valid code but at incorrect offsets, causing the disassembler to misinterpret those sections of code. However, when the malware is executed, it is designed so that the incorrect offsets are not executed. Many machines support runtime-determined control flow as indirect jumps that can route control flow based on the value they are passed in their operand at runtime. Malware can use this to obfuscate because determining what this value may be during disassembly is a difficult computational problem.
Operations can also be obfuscated with exception handling that uses custom signal handlers to redirect execution. Calls and jumps in the program can then be replaced by instructions that are guaranteed to cause exceptions, obfuscating the control-flow change.
Covert channels can also conceal the existence of information. Steganography is a covert channel which hides private information (Rowe & Rrushi, 2016). It often involves embedding sensitive data within an image, video, or audio file so that the data is not detectable by humans using it (z3roTrust, 2018). A popular steganography technique is to use the least-significant bits of values in some media’s data to hide information. Such small changes to the media file may not be detected by human observation.
Malware can use steganography to hide malware code or the data it intends to exfiltrate (Yoon, 2019). Malware that uses steganography is sometimes called stegomalware (Suarez-Tangil, Tapiador, & Peris-Lopez, 2014). Examples are AdGholas, which embeds malicious JavaScript code in images and other files (SentinelOne, 2019), and ZeroT, which embeds its payload in a set of images (Yoon, 2019). Many anti-malware tools will only check specific files like executables for malware, overlooking most steganographic embedding techniques (Ramill & Bishop, 2010).
Packers are file compressors that encode data to reduce its size but allow its complete reconstruction later (Arntz, 2017). Packers can also be used as obfuscation tools since packing eliminates signatures. An estimated 80% of malware uses some form of packer (Guo et al., 2008). Because packers often use well-known methods, anti-malware tools and malware analysts often have unpacker programs for them. Even if no unpacker is available for a specific packed program, emulation can wait for the program to unpack itself before analysis.
Another malware-obfuscation technique is dividing malicious code into multiple files (Ramilli, Bishop, & Sun, 2011). It may then be possible then that no subfile holds a suspicious signature (Ramill & Bishop, 2010). Three possible methods of multifile malware design using K files are:
1. Centralized: A single initiating process locates the subfiles, assembles them in its own memory, and executes them (Ramilli et al., 2010).
2. Sequential actors: K processes are activated in a sequence to do their malicious actions (Filiol, 2007).
3. Parallel actors: K separate processes act simultaneously to carry out the intended malware actions.
All three designs can obfuscate against static signature-based detection without encryption or metamorphism. A malware with a centralized design is still susceptible to dynamic analysis as its malicious code will be exposed in memory at times as it executes. The sequential and parallel-actor designs will likely raise fewer alerts in an anomaly-analysis engine. The latter can also maintain persistence as the malware’s multiple processes can be designed to restore any individual file or process that is removed by anti-malware (Filiol, 2007).
K-ary malware design may not provide sufficient concealment for all signatures because a signature may be as little as 16 bytes (Szor, 2005). Concealment may be especially difficult for the sequential and parallel-actor designs because each process requires a minimum amount of code to execute.
Although multifile malware design reduces signatures, it can introduce new signatures in the additional code that coordinates processes. However, in the centralized design the initiating process can be written with only a small amount of very common code, making it difficult to extract a useful static signature (Ramilli et al., 2010). To further obfuscate the links between processes, covert channels can coordinate them (Ramilli et al., 2011). A blockchain can secure interprocess coordination (Moubarak et al., 2018), and processes can coordinate operations without direct references to one another. An advantage of K-ary malware designs is that deciding which files or processes should be examined together is an NP-complete problem. Though such problems can be addressed by heuristics to correlate their constituent parts, the combination of K-ary code with encryption, steganography, and metamorphism can make malware very difficult to detect.
This thesis focuses on K-ary malware designs. Multifile and multiprocess designs for K-ary malware can complicate analyzing it because anti-malware tools must examine different combinations of files together. Exhaustively examining every combination that might lie on a system would be impractical and could result in many false positives and false negatives. On the other hand, because signature patterns can be relatively small, determining how a malware should be partitioned requires significant research by malware authors (Ramilli & Bishop, 2010). Antivirus vendors may try to make this harder by defining even shorter patterns with probabilistic inference methods. K-ary designs do give some protection from behavior-based detection as discussed earlier, however (Ramilli et al., 2011).
Statistical analyses of executable code may be especially useful for correlating K-ary files or processes. Executable code that have related functionality likely have related features , as might code written by the same author or with the same tools. Hidden-Markov models and similarity-based comparisons could correlate code obfuscated by the same metamorphic engine.
Previous work has shown that K-ary malware program designs can evade both static and dynamic detection by anti-malware software (Ramilli & Bishop, 2010; Ramilli, Bishop, & Sun, 2011). Signatures may be divided when K-ary malware are partitioned, making them harder to identify without gathering the pieces. Our experiments tested techniques that could make K-ary programs easier to detect by correlating their pieces. To test our hypotheses, we created a parallel K-ary program “ManyEyes”. To simulate malicious actions, ManyEyes is designed to be a keylogger. A keylogger is a program that monitors and logs the keystrokes made by a user. Similar programs have been used by criminal hackers to steal passwords and other sensitive information (Solairaj, Prabanand et al., 2016). To exemplify ways that traditional obfuscation techniques might be used in K-ary programs, ManyEyes used steganography, encryption, and metamorphism.
We did two sets of experiments. The first applied statistical analyses to the program pieces to try to correlate them. These statistical analyses used bigram histograms, cosine similarity, and a clustering algorithm. Although the files used are static, they were processed to simulate aspects of them being loaded into memory. This set of experiments was done several times with a variety of malware and benignware combinations. The second set of experiments tried to correlate K-ary memory contents during runs. Memory dumps of a running instance of ManyEyes were compared to those of other processes using the same statistical techniques.
ManyEyes is made up of three forms of subprograms: code-gathering, key-monitoring, and log-aggregating. There may be many instances of each subprogram in an instance of ManyEyes. The code-gathering subprograms were Windows executable programs that gathered parts of the partitioned executable from secondary storage and executed them in memory. These programs were written in C++ and compiled for x86 architecture systems using the Microsoft C/C++ compiler. Unlike a fully centralized design, there is a separate code-gathering program for each key-monitoring subprogram to be run. The key-monitoring subprograms were also Windows executable programs and monitor keypresses and writes logs of them. These were written in C and compiled for the 32-bit x86 architecture using the Microsoft C/C++ compiler. The log aggregators were Python scripts which concatenate log files. The code-gathering subprograms were 23KB. The size of the key-monitoring programs varied because they are generated by a metamorphic engine; originally they were 12KB, but when generated by our metamorphic engine they tended to be around 45KB. The log aggregators tended to be around 5KB.
A configurable Python script is used to build an instance of ManyEyes. ManyEyes may be configured with many key-monitoring subprograms or just a few. The processing resources used by ManyEyes also varied. If the key-monitoring programs are given a long delay between keyboard queries, processing time will be high. Our experiments used a delay of 10ms between each set of queries, which caused the Windows Task Manager to classify the power usage of these programs as “Very low”.
ManyEyes functions in five stages: metamorphic code generation, steganographic storage, in-memory execution, keylogging, and log aggregation. Although some log aggregation and keylogging may occur simultaneously, each keylogger process must hand logs off to an aggregator. The in-memory execution and keylogging stages explicitly involve parallel K-ary code design and the gathering of code portions of code from steganographic storage emulating the centralized design. In this way, ManyEyes uses aspects of all three of the K-ary malware designs although it is primarily a parallel-design K-ary program.
Before ManyEyes is deployed, it applies metamorphic transformations to the executable code pieces. For key monitoring code, we used subroutine reordering and dead-code insertion, and we randomized strings and symbols that could be modified without changing the functionality of the program. These transformations were done at the assembly level before compilation. For log aggregators, we just changed the names of functions and variables.
A goal of ManyEyes’ design was to prevent key-monitoring code from being present in an unobfuscated form in storage since many anti-malware tools recognize many keylogging methods. To do this, we also used steganography and encryption to obfuscate the keylogger code. The ManyEyes steganography encoded the key-monitoring programs within 24-bit bitmap pictures. The least significant bit of each of the red, green, and blue values for each pixel in the pictures were used for storing the code, which caused only minor changes to the appearance of the pictures that were not visible to the human eye. Additionally, the code stored in these least-significant bits was encoded using an exclusive-or with a key between four and twenty-nine bytes in length. Each bit of the key was applied to the least significant bits of each pixel in a sequential fashion and the key was repeated until exhausted.
The key-monitoring programs were 45 KB and usually the entire keylogger payload could be stored in five bitmap files in our experiments. Three of these images were 128 by 209 pixels and consumed approximately 80 KB in storage; the other two were 148 by 197 pixels and consumed about 86 KB in storage. This means that the key-monitoring programs used approximately one ninth of the data in the pictures.
The key-monitoring programs were invoked by the code-gathering subprograms. The code-gathering subprograms extracted the encrypted key-monitoring programs, decrypted them, and executed them in memory without allowing them to appear in secondary memory in an unobfuscated form. This required challenging workarounds in the Windows operating system. We used a method “RunPE” in which a host process spawned a child process in a suspended state, and then wrote code in the Windows portable executable (PE) format to the new process’ memory before releasing its suspension. The cross-process memory writing is done using the Windows API with calls to WriteProcessMemory and VirtualAllocEx. Our implementation of the RunPE method is a modified version of one available online (Zer0Mem0ry, 2016). It would have been simpler to concatenate the pieces, write them to a file, and then execute the file from static memory, but that would reveal the unencoded key-monitoring programs in secondary storage where antimalware programs could analyze them easily.
In the third stage, the key-monitoring programs put into memory were executed to monitor keystrokes and write them to log files. The GetAsyncKeyState function in the Windows API was queried repeatedly for each of the keys being monitored. Anti-malware tools could potentially detect this keylogging by looking for frequent calls to the GetAsyncKeyState, but this was obfuscated by splitting the calls over parallel key-monitoring programs that reduced the number of calls by each process.
Each keylogger process wrote to log files that were then aggregated by processes in the fourth stage. Each aggregator was assigned multiple logs to monitor. The aggregators concatenated logs and wrote the combined logs to a new location. A hierarchy of log aggregators was created with some reading the logs produced by other aggregators, to lower the chance of all the key-monitoring programs being discovered in examining the files any single aggregator read. Additionally, each aggregator program tried to read several random files in the filesystem to confuse attempts to inspect its output.
Our experiments tested the correlation of our K-ary programs using statistical techniques and applied them to live processes as well as static code. To do this, we clustered code pieces based on similarity. The similarity metric was the cosine similarity of the histograms of byte sequences of length N (N-grams) in each. N-gram statistical analyses are often used in natural-language processing and deep packet inspection for cybersecurity. We then clustered all the malware and benignware pieces and measured how often malware pieces were placed in the same cluster.
Our first experiment counted 2-byte N-grams (bigrams) in 1665 benign and 1235 malicious PE executable-file samples. The benign samples were collected from a Windows computer with the permission of the owner. The malicious samples were selected from a corpus of malware previously used in academic work (Garfinkel, Farrell, Roussev, & Dinolt, 2009). There were 32,4252,414 bigrams in the benign samples and 139,696,687 bigrams in the malicious samples. Both sets of data contained 65536 unique bigrams which means that every possible bigram was represented. The average malicious file was 218KB, and the average benignware file was 407KB.
We used cosine similarity to quantify the difference
between two histograms (Han, Pei, & Kamber, 2011). Its advantage is that histograms
of different total counts can be compared. The cosine similarity formula is
shown in Figure 1 below. The cosine similarity of histograms A and B
is represented in Figure 1 where  ,
,  , is a bigram,
, is a bigram,  is the frequency of bigram i
in A, and
 is the frequency of bigram i
in A, and  is the frequency of that bigram
in B.
 is the frequency of that bigram
in B.

Figure 1. Cosine similarity formula
For each of 1,000 runs of each experiment, a malware program and 200 benign programs were randomly selected. This was to simulate the proportion of malware to benignware that is likely to run simultaneously. For each run, the histograms for each of the K pieces of the malware being tested and the 200 benign samples were paired and their cosine similarity was calculated. The benignware were not split as we expect they would not be on a real system. We then used the HDBSCAN clustering algorithm correlate the malware parts (McInnes, Healy, & Astels, 2017). HDBSCAN is a hierarchical density-based clustering algorithm based on the DBSCAN clustering algorithm. HDBSCAN can discover clusters of varying densities and requires less prior knowledge about the nature of the clusters than related algorithms. Although HDBSCAN may identify a variable number of clusters, the success of our experiments is only determined by the number of clusters which contain malware pieces.
As previously noted, few K-ary malware samples exist. To test our hypotheses at a significant scale, 1235 malware samples were used and their executable code was split into five parts for the first 1000 runs of this experiment and fifteen parts for the second 1000 runs of this experiment. In each experiment run a single one of these malware will be selected.
Most modern operating systems use the security technique of address-space layout randomization (ASLR) which causes memory references and virtual-memory placements to differ between each instance of a program. Because of ASLR we expect real-world parallel K-ary malware implementations will have varying memory-references between their constituent processes. To prevent common memory references from biasing the experiment, we added a random integer to memory references marked for modification in relocation table for each malware part. This simulates the process of “memory mapping” by an operating system in which a program’s code is placed in memory and memory references in the code are updated to reflect where it is placed. In our experiments, this preparation is done before bigrams are counted. For consistency, the benign programs also underwent this simulated memory mapping.
Success of our K-ary malware partitioning can be measured by the number of malware pieces clustered together. Because it is most likely to contain a malware signature, we consider the cluster that contains the most pieces to be the classification of the malware. True positives are the malware pieces in that cluster. False positives are the benignware in that cluster. True negatives are the benignware in other clusters. False negatives are the malware pieces in the other cluster. Using these measures, we can calculate accuracy, precision, recall, and the F-score for the identification of malware pieces. Accuracy measures how many benign programs are not placed into clusters with malware parts and how many malware parts are. Recall measures how many malware parts are in clusters with other malware parts. Precision measures how many few benign programs are clustered with malware parts. The F-score combines the recall and precision into a single value.
Our second set of experiments correlated the running processes of ManyEyes on a Windows system. For these experiments we captured 120 memory dumps of benign processes and two of ManyEyes. The dumps captured were only of memory regions marked as executable. One ManyEyes version was composed of five code-gathering subprograms, five key-monitoring programs, and a single log aggregator. Since this configuration results in six parallel processes, we refer to it as the 6-piece ManyEyes instance. The second version was composed of fifteen code-gathering subprograms, fifteen key-monitoring subprograms, and five log aggregators. Because this results in twenty parallel processes, we refer to it as the twenty-piece ManyEyes instance.
Collecting data from live processes has its challenges. Some data could be from uninitialized portions of the memory with only zeros and other uninitialized data may include data from previous processes. That data can add noise to our analyses. Some of it can be eliminated by examining its entropy. Very low or very high entropy may indicate the presence of data that would not be useful to our analyses. We examined sixteen byte chunks of data using Shannon entropy and removed portions that had entropies higher than 0.7 or lower than 0.3 which would remove especially low or high entropy data.
To produce the memory dumps, we created a tool named “MemoryScanner” to capture executable regions of memory from processes running in a Windows system. This tool uses Windows ReadProcessMemory API. Some memory regions are protected by PAGE_NOACCESS and PAGE_GUARD permissions. To maintain system stability, those regions are avoided by the scanner. The tool does not address potential image “smear” from reading memory while it is changing. We chose to build our own tool rather than use another so that we would have greater control of what data it accessed in case changes would need to be done to the experiment.
Once the memory dumps were collected, the analyses used in the experiments described in Section A of this chapter were used to attempt to correlate the dumps of the ManyEyes subprograms among the benign memory dumps. This experiment will be repeated without entropy analysis, without metamorphism, and without both.
We ran two sets of experiments with statistical analysis static executable files, one in which the malware files were divided into five pieces and one where they were divided into fifteen pieces. Each experiment was run with a randomly selected malware and two hundred randomly selected benign executables. Each set of experiments did 1000 runs to ensure that a breadth of malware/benignware combinations had been tested.
Table 1 shows the average results of our experiments with malware split into five parts. High values of the statistics of cluster accuracy, precision, recall, and the F-score indicate clusters have more malware parts. We see that on average about half of malware pieces were correlated into a single cluster.
Table 1. Average Statistics for 5-split Malware Experiments
| Average Accuracy | 0.9827 | 
| Average Precision | 0.6927 | 
| Average Recall | 0.51924 | 
| Average F-score | 0.5935 | 
Table 2 shows the results of the experiments using malware split into fifteen pieces. The precision increased significantly, but the recall decreased as well. These results show that our statistical techniques can associate some K-ary code pieces, but this correlation was imperfect and worsens as the pieces become smaller in size and larger in number.
Table 2. Average Statistics for 15-split Malware Experiments
| Average Accuracy | 0.9488 | 
| Average Precision | 0.7935 | 
| Average Recall | 0.3593 | 
| Average F-score | 0.4946 | 
Figure 2 shows accuracy, precision, recall, and F-score for comparison of memory dumps of processes using a 20-piece instance of ManyEyes malware with fifteen keylogger processes and five log aggregators. When metamorphism was enabled and extreme-entropy code was removed from consideration, the recall was much higher than the other runs. This is surprising because the metamorphic engine tries to make correlation harder. This may have been because of the limited variety of its dead-code instructions. When metamorphism was disabled (by using identical code in the parallel key-monitoring subprograms), the recall and F-score increased slightly when extreme entropy was removed.
 
Figure 2. Performance of Clustering of 20-Piece ManyEyes Processes
Figure 3 shows statistics for the 6-piece ManyEyes experiments. Precision was much higher than in the 20-piece experiments. F-score and recall were highest when metamorphism was disabled and extreme entropy was not removed. These tests gave expected results with the disabling of metamorphism and removing extreme entropy, both resulting in better correlation of ManyEyes’ processes. The precision of each of these experiments was perfect except for when metamorphism was disabled, and extreme entropy was removed which was the worst performing of the 6-piece tests. Unlike with the 20-piece tests, removing extreme entropy did not improve performance. Uninitialized memory can contain code from other processes, which means that entropy analysis alone may not suffice to isolate a live process’ executable code. These tests show that the correlation of K-ary processes using statistics is possible, but also not guaranteed.

Figure 3. Statistics of Clustering of 6-Piece ManyEyes Processes
This thesis explored the potential stealth of K-ary malicious code. We demonstrated a K-ary code design which combined traditional code-obfuscation techniques. Through this implementation we saw that steganography, encryption, and metamorphism can increase its stealth. Metamorphism both protects malware by reducing signatures and preventing correlation of pieces of malware.
Because correlating pieces of K-ary code can make it easier to detect, this thesis explored a statistical method using cosine-similarity comparison of bigram histograms on running processes. These experiments showed that it was effective. Memory dumps of ManyEyes were also compared to memory dumps of other benign programs. These tests did show some success, but the correlation was lower than with the static images.
One possible future direction for a K-ary malware design is that parts of the malware could monitor the others and restore any that are removed by a system administrator or antimalware program (Filiol, 2007). Shifting or permuting code across files could also protect K-ary files against signature-based detection. Grammars can automatically generate K-ary codes (Gueguen, 2010). Techniques that would allow K-ary code to collaborate stealthily is another are another potential area for future research.
The degree to which K-ary code will be used by malware authors to obfuscate their malware remains to be seen and its potential threat is unclear. Our results suggest that statistical analysis can decrease the threat of K-ary code should it ever be weaponized. However, the analytical techniques used in this thesis were imperfect and need to be improved for malware detection.
Table 3. Statistics for Correlation of 6-Piece ManyEyes Processes with Extreme Entropy Removed
| Accuracy | .9603 | 
| Precision | 1 | 
| Recall | .1667 | 
| F-score | .2857 | 
Table 4. Clustering of 6-Piece Malware Processes with Extreme Entropy Removed
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 2 | 0 | 
| 2 | 2 | 0 | 
| 3 | 2 | 23 | 
Table 5. Statistics for Correlation of 20-Piece Malware Processes with Extreme Entropy Removed
| Accuracy | 0.9357 | 
| Precision | 1.0000 | 
| Recall | 0.5500 | 
| F-score | 0.7096 | 
Table 6. Clustering of 20-Piece Malware Processes with Extreme Entropy Removed
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 12 | 0 | 
| 2 | 3 | 0 | 
| 3 | 3 | 27 | 
| 4 | 1 | 3 | 
Table 7. Statistics for Correlation of 6-Piece Malware Processes
| Accuracy | .9603 | 
| Precision | 1.0000 | 
| Recall | .1667 | 
| F-score | .2857 | 
Table 8. Clustering of 6-Piece Malware Processes
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 2 | 0 | 
| 2 | 2 | 0 | 
| 3 | 1 | 27 | 
| 4 | 1 | 2 | 
Table 9. Statistics for Correlation of 20-Piece Malware Processes
| Accuracy | 0.9111 | 
| Precision | 1.0000 | 
| Recall | 0.2000 | 
| F-score | 0.3333 | 
Table 10. Clustering of 20-Piece Malware Processes
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 4 | 0 | 
| 2 | 2 | 0 | 
| 3 | 2 | 0 | 
| 4 | 2 | 0 | 
| 5 | 4 | 0 | 
| 6 | 2 | 0 | 
| 7 | 2 | 0 | 
| Accuracy | .8175 | 
| Precision | .0952 | 
| Recall | .3333 | 
| F-score | .1481 | 
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 3 | 0 | 
| 2 | 3 | 19 | 
| Accuracy | 0.6963 | 
| Precision | 0.0667 | 
| Recall | 0.1333 | 
| F-score | 0.0889 | 
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 3 | 28 | 
| 2 | 2 | 0 | 
| 3 | 2 | 0 | 
| 4 | 3 | 0 | 
| 5 | 3 | 0 | 
| 6 | 2 | 0 | 
| 7 | 2 | 0 | 
| 8 | 2 | 0 | 
| 9 | 2 | 3 | 
Table 15. Statistics for Correlation of 6-Piece Malware Processes, Metamorphism Disabled
| Accuracy | .9683 | 
| Precision | 1.0000 | 
| Recall | 0.3333 | 
| F-score | 0.5000 | 
Table 16. Clustering of 6-Piece Malware Processes, Metamorphism Disabled
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 3 | 0 | 
| 2 | 2 | 0 | 
| 3 | 1 | 20 | 
Table 17. Statistics for Correlation of 20-Piece Malware Processes, Metamorphism Disabled
| Accuracy | 0.6714 | 
| Precision | 0.0667 | 
| Recall | 0.1000 | 
| F-score | 0.0800 | 
Table 18. Clustering of 20-Piece Malware Processes, Metamorphism Disabled
| Cluster | Number of Malware Parts | Number of Benign Processes | 
| 1 | 3 | 25 | 
| 2 | 2 | 0 | 
| 3 | 2 | 0 | 
| 4 | 3 | 0 | 
| 5 | 3 | 0 | 
| 6 | 4 | 0 | 
| 7 | 2 | 0 | 
| 8 | 2 | 5 | 
Al-Kuwari, S., Davenport, J. H., & Bradford, R. J. (2011). Cryptographic hash functions: recent design trends and security notions. IACR Cryptology ePrint Archive, 2011, 565.
Alvarez, R. (2018, March 26). Dissecting a Metamorphic File-Infecting Ransomware (Raul Alvarez). Retrieved from https://www.youtube.com/watch?v=vJ08_6CCd6g.
Arntz, P. (2017). Explained: Packer, Cryptor, and Protector. Retrieved from https://blog.malwarebytes.com/cybercrime/malware/2017/03/explained-packer-crypter-and-protector/.
Bachaalany, E. and Koret, J. (2015). The Antivirus Hacker's Handbook. Wiley. Retrieved from https://learning.oreilly.com/library/view/the-antivirus-hackers/9781119028758/.
Bazrafshan, Z., Hashemi, H., Fard, S. M. H., & Hamzeh, A. (2013, May). A survey on heuristic malware detection techniques. In 5th IEEE Conference on Information and Knowledge Technology (pp. 113-120).
Bulazel, A. (2016, November 22). AVLeak: Fingerprinting Antivirus Emulators for Advanced Malware Evasion. Retrieved from https://www.youtube.com/watch?v=a6yOwvFds78.
Bulazel, A., & Yener, B. (2017, November). A survey on automated dynamic malware analysis evasion and counter-evasion: PC, mobile, and web. In Proceedings of the 1st ACM Reversing and Offensive-oriented Trends Symposium (p. 2).
Chen, W. (2018, October 9). Encapsulating Antivirus (AV) Evasion Techniques in Metasploit Framework. Retrieved from https://www.rapid7.com/globalassets/ _pdfs/whitepaperguide/rapid7-whitepaper-metasploit-framework-encapsulating-av-techniques.pdf.
Chen, X., Andersen, J., Mao, Z. M., Bailey, M., & Nazario, J. (2008, June). Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware. In IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN) (pp. 177-186).
Chess, D. M., & White, S. R. (2000, September). An undetectable computer virus. In Proceedings of Virus Bulletin Conference (Vol. 5, pp. 1-4).Cohen, F. (1987). Computer viruses: theory and experiments. Computers & Security, 6(1), 22-35.
Dalla Preda, M., & Di Giusto, C. (2011, August). Hunting distributed malware with the κ-calculus. In International Symposium on Fundamentals of Computation Theory (pp. 102-113). Springer, Berlin, Heidelberg.
Desai, P., & Stamp, M. (2010). A highly metamorphic virus generator. International Journal of Multimedia Intelligence and Security, 1(4), 402-427.
Eskandari, M., & Hashemi, S. (2011). Metamorphic malware detection using control flow graph mining. Int. J. Computers and Scientific Network Security, 11(12), 1-6.
Filiol, E. (2007). Formalisation and implementation aspects of k-ary (malicious) codes. Journal in Computer Virology, 3(2), 75-86.
Gaikwad, P., Motwani, D., & Shinde, V. (2015). Survey on malware detection techniques. International Journal of Modern Trends in Engineering and Research, 21(7), 1-25.
Gao, Y., Lu, Z., & Luo, Y. (2014, August). Survey on malware anti-analysis. In Fifth IEEE International Conference on Intelligent Control and Information Processing (pp. 270-275).
Garfinkel, S., Farrell, P., Roussev, V., & Dinolt, G. (2009, August). Bringing science to digital forensics with standardized forensic corpora. Digital Investigation 6 , S2-S11
.Giles, M. (2019, March 9). Triton is the world’s most murderous malware, and it’s spreading. Retrieved from https://www.technologyreview.com/2019/ 03/05/103328/cybersecurity-critical-infrastructure-triton-malware/.
Guo, F., Ferrie, P., & Chiueh, T. C. (2008, September). A study of the packer problem and its solutions. In International Workshop on Recent Advances in Intrusion Detection (pp. 98-115). Springer, Berlin, Heidelberg.
Idika, N., & Mathur, A. P. (2007). A survey of malware detection techniques. Purdue University, 48, 2007-2.
Ingram, M. (2000). “Love-But” virus damage estimated at $10 billion. Retrieved from https://www.wsws.org/en/articles/2000/05/bug-m10.html.
Keen, S. (n.d.). What Is a Variant in a Computer?. Retrieved from https://smallbusiness.chron.com/variant-computer-32671.html.
Kulchytskyy, 0. & Kukoba, A. (2019, May 23). Anti Debugging Protection Techniques With Examples. Retrieved from https://www.apriorit.com/dev-blog/367-anti-reverse-engineering-protection-techniques-to-use-before-releasing-software.
Kumar, P. (2017). An Introduction to N-grams: What Are They and Why Do We Need Them?. Retrieved from https://blog.xrds.acm.org/2017/10/introduction-n-grams-need.Lastline. (2015).
Labs Report at RSA: Evasive Malware’s Gone Mainstream. Retrieved from https://www.lastline.com/labsblog/labs-report-at-rsa-evasive-malwares-gone-mainstream/.
Lee, J., Jeong, K., & Lee, H. (2010, March). Detecting metamorphic malwares using code graphs. In Proceedings of the 2010 ACM symposium on applied computing (pp. 1970-1977).Liu, S. (2020, January 9). Security software – Statistics & Facts. Retrieved from https://www.statista.com/topics/2208/security-software/.
Moubarak, J., Chamoun, M., & Filiol, E. (2018, April). Developing a Κ-ary malware using blockchain. In NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium (pp. 1-4).
Newman, L. H. (2017, July 26). Hacker Lexicon: What is Steganography?. Retrieved from https://www.wired.com/story/steganography-hacker-lexicon/.
O'Kane, P., Sezer, S., & Carlin, D. (2018). Evolution of ransomware. IET Networks, 7(5), 321-327.
Palmer, D. (2019, July 26). MyDoom: The 15-year-old malware that’s still being used in phishing attacks in 2019. Retrieved from https://www.zdnet.com/article/mydoom-the-15-year-old-malware-thats-still-being-used-in-phishing-attacks-in-2019/.
Peterson, W. W., & Brown, D. T. (1961). Cyclic codes for error detection. Proceedings of the IRE, 49(1), 228-235.
Popov, I. V., Debray, S. K., & Andrews, G. R. (2007, August). Binary Obfuscation Using Signals. In USENIX Security Symposium (pp. 275-290).
Ramilli, M., & Bishop, M. (2010, October). Multi-stage delivery of malware. In 5th IEEE International Conference on Malicious and Unwanted Software (pp. 91-97).
Ramilli, M., Bishop, M., & Sun, S. (2011, October). Multiprocess malware. In 6th IEEE International Conference on Malicious and Unwanted Software (pp. 8-13).
Rowe, N. C. (2015, July). Finding contextual clues to malware using a large corpus. Proceedings of the ISCC-SFCS Third International Workshop on Security and Forensics in Communications Systems, Larnaca, Cyprus.
Rowe, N. C., & Rrushi, J. (2016). Introduction to Cyberdeception (pp. 100-102). Springer.
Saeed, I. A., Selamat, A., & Abuagoub, A. M. (2013). A survey on malware and malware detection systems. International Journal of Computer Applications, 67(16).
Schiffman, M. (2010). A Brief History of Malware Obfuscation: Part 1 of 2. Retrieved from https://blogs.cisco.com/security/a_brief_history_of_malware_obfuscation_ part_1_of_2.
Schmall, M. (2002). Heuristic Techniques in AV Solutions: An Overview. Retrieved from https://www.symantec.com/connect/articles/heuristic-techniques-av-solutions-overview.
SentinelOne. (2019, July 4). Hiding code inside images: How malware uses steganography. Retrieved from https://www.sentinelone.com/blog/hiding-code-inside-images-malware-steganography/.
Skylight. (2019, July 18). Cylance, I Kill You. Retrieved from https://skylightcyber.com/2019/07/18/cylance-i-kill-you/.
Solairaj, A., Prabanand, S. C., Mathalairaj, J., Prathap, C., & Vignesh, L. S. (2016, January). Keyloggers software detection techniques. In 10th IEEE International Conference on Intelligent Systems and Control (ISCO) (pp. 1-6).
Suarez-Tangil, G., Tapiador, J. E., & Peris-Lopez, P. (2014, December). Stegomalware: Playing hide and seek with malicious components in smartphone apps. In International Conference on Information Security and Cryptology (pp. 496-515). Springer, Cham.
Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley. Retrieved from
Webroot. (2019). 2019 Webroot Threat Report. Retrieved from https://www-cdn.webroot.com/9315/5113/6179/2019_Webroot_Threat_Report_US_Online.pdf.
Wong, W., & Stamp, M. (2006). Hunting for metamorphic engines. Journal in Computer Virology, 2(3), 211-229.
Yoon, S. (2019, April 9) Steganography in the Modern Attack Landscape. Retrieved from https://www.carbonblack.com/2019/04/09/steganography-in-the-modern-attack-landscape.
You, I., & Yim, K. (2010, November). Malware obfuscation techniques: A brief survey. In Proceedings of the 2010 IEEE International Conference on Broadband, Wireless Computing, Communication, and Applications (pp. 297-300).
Young, A., & Yung, M. (1996, May). Cryptovirology: Extortion-based security threats and countermeasures. In Proceedings 1996 IEEE Symposium on Security and Privacy (pp. 129-140).
Z3roTrust. (2018, August 30). Digital Steganography as an Advanced Malware Detection Evasion Technique. Retrieved from https://medium.com/@z3roTrust/digital-steganography-as-an-advanced-malware-detection-evasion-technique-40d4eeb19830.
Zer0Mem0ry. (2016). RunPE. Retrieved from https://github.com/Zer0Mem0ry/RunPE/blob/master/RunPE.cpp.
Zeter, K. (2014, November 3). An Unprecedented Look at Stuxnet, the World’s First Digital Weapon. Retrieved from https://www.wired.com/2014/11/countdown-to-zero-day-stuxnet/.