REPORT DOCUMENTATION PAGE

Form Approved OMB
No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503.

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE

September 2015

3. REPORT TYPE AND DATES COVERED

Master's thesis

4. TITLE AND SUBTITLE

SIMILARITIES AND DIFFERENCES IN PATTERNS AND GEOLOCATION OF SSH ATTACK DATA

5. FUNDING NUMBERS

N/A

6. AUTHOR(S) Macy, Jeffry P.

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Naval Postgraduate School

Monterey, CA 93943-5000

8. PERFORMING ORGANIZATION REPORT NUMBER

N/A

9. SPONSORING /MONITORING AGENCY NAME(S) AND ADDRESS(ES)

N/A

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

N/A

11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. IRB Protocol number ____N/A____.

12a. DISTRIBUTION / AVAILABILITY STATEMENT

Insert distribution statement

12b. DISTRIBUTION CODE

13. ABSTRACT (maximum 200 words)

Cyber attacks are becoming more prevalent across all sectors of government, business, and academia. Academic networks can be more vulnerable to attack because of their lack of resources and funding. This thesis analyzed unsuccessful Secure Shell (SSH) login attempts with data extracted from the DenyHosts service on Naval Postgraduate School's (NPS) network, and compared it to SSH logon data from a Kippo SSH honeypot independent from the NPS Network to determine patterns in activity associated with geolocation. Additionally, this thesis analyzed frequency of the originating IP address, tried to determine if proxies are being used and how regularly. We identified similar characteristics of attacking hosts for both networks, and noted a preponderance of use of vulnerable platforms and ports.

14. SUBJECT TERMS

SSH, Kippo, DenyHosts, honeypot

15. NUMBER OF PAGES

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT

Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE

Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT

Unclassified

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)

Prescribed by ANSI Std. 239-18

THIS PAGE INTENTIONALLY LEFT BLANK

SIMILARITIES AND DIFFERENCES IN PATTERNS AND GEOLOCATION OF SSH ATTACK DATA

Jeffry P. Macy II

Lieutenant, United States Navy

B.A., Piedmont College, 2005

Submitted in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE IN CYBER SYSTEMS AND OPERATIONS

from the

NAVAL POSTGRADUATE SCHOOL

September 2015

Approved by: Neil C. Rowe

Thesis Advisor

J. D. Fulp

Second Reader

Cynthia Irvine

Chair, Department of Cyber Academic Group

ABSTRACT

Cyber attacks are becoming more prevalent across all sectors of government, business, and academia. Academic networks can be more vulnerable to attack because of their lack of resources and funding. This thesis analyzed unsuccessful Secure Shell (SSH) login attempts with data extracted from the DenyHosts service on the Naval Postgraduate School's (NPS) network, and compared it to SSH logon data from a Kippo SSH honeypot independent from the NPS Network to determine patterns in activity associated with geolocation. Additionally, this thesis analyzed the frequency of the originating IP address, then tried to determine if proxies were being used and how regularly. We identified similar characteristics of attacking hosts for both networks, and noted a preponderance of use of vulnerable platforms and ports.

Our methodology was unable to ascertain if the attacks were automated, but we have high confidence that the remote sites were compromised because of their preponderant use of vulnerable software. We also identified common use of ports 5060 and 8080, suggesting possible botnet activity associated with these sites.

TABLE OF CONTENTS

I. INTRODUCTION

A. BACKGROUND

B. PURPOSE

C. BENEFITS OF STUDY

D. SCOPE AND METHODOLOGY

E. ORGANIZATION OF STUDY

II. SIMILAR WORK IN SSH HONEYPOT GEOLOCATION ANALYSIS

A. INTRODUCTION TO LITERATURE REVIEW

B. WHAT IS A HONEYPOT?

C. HONEYPOT EXPERIMENTS

D. COMPARISON OF SSH ATTACKS ACROSS DIFFERENT UNIVERSITY NETWORKS

E. POST-ATTACK BEHAVIOR

F. CHAPTER SUMMARY

III. TEST ENVIRONMENT, DATA ORIGINATION, AND TOOL DESCRIPTIONS

A. DESCRIPTION OF NETWORKS

B. DESCRIPTION OF SYSTEMS

1. Tools

a. Kippo

b. DenyHosts

c. MaxMind

d. Nmap

e. Shodan

f. IP2Location.net

IV. FORMATTING AND ANALYSIS OF DATA

A. INTRODUCTION

B. DATA COLLECTION AND ORGANIZATION

1. DenyHosts Data

2. Kippo SSH Honeypot Data

3. Data Consolidation

4. Filtering for Duplicate IP Addresses

V. DATA COMPARISON RESULTS

1. Results

a. Geolocation Patterns

b. Hardware

c. Operating Systems

d. Common Ports

e. SSH Version

f. Anonymous Proxy

g. Session Data

h. Downloaded Files

2. Conclusion

VI. CONCLUSION

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST

LIST OF FIGURES

Figure 1. NPS network

Figure 2. Honeypot network

Figure 3. Post compromised human activity

Figure 4. Top 10 input overall

Figure 5. Top 10 successful inputs

Figure 6. Top 10 failed inputs

Figure 7. Latest "passwd" commands entered by attackers

Figure 8. Latest "wget" commands entered by attackers

Figure 9. Latest scripts executed by attackers

Figure 10. Kippo TTY log

Figure 11. Total IP activity gathered from the honeypot

Figure 12. DenyHosts daily logs

Figure 13. Raw data from DenyHosts

Figure 14. Grep command using regular expressions

Figure 15. IPs_Only.txt output after grep command with regular expressions

Figure 16. IP activity gathered from the honeypot

Figure 17. Example of Kippo Graph csv file

Figure 18. MaxMind file upload page

Figure 19. MaxMind csv file

Figure 20. DenyHosts and honeypot data

Figure 21. COUNTIF equation

Figure 22. COUNTIF equations results

Figure 23. Filtering for IP address matches

Figure 24. Ip2location.net demo tool

Figure 25. Nmap output for single IP address

Figure 26. Shodan IP address search location information

Figure 27. Shodan IP address search ports and services information

Figure 28. 31 IP addresses with number of sessions

Figure 29. Final compilation of IP address data

Figure 30. IP geolocation distribution

Figure 31. Device types

Figure 32. Commonly used operating systems

Figure 33. Vulnerability search results

Figure 34. Percentage of commonly used ports for all hosts

Figure 35. SSH version distribution

Figure 36. Session count

LIST OF TABLES

Table 1. Hardware Specifications

LIST OF ACRONYMS AND ABBREVIATIONS

ACK Acknowledgement

ASN Autonomous System Number

CSV Comma Separated Value

CVE Common Vulnerabilities and Exposures

DMZ Demilitarized Zone

GMT Greenwich Mean Time

IP Internet Protocol

ISP Internet Service Provider

NPS Naval Postgraduate School

SIP Session Initiation Protocol

SSH Secure Shell

SYN Synchronize

UDP User Datagram Protocol

VPN Virtual Private Network

ACKNOWLEDGMENTS

I would like to thank my wife, Megan, for her support and understanding through this entire process. I would also like to thank my two sons, Jack and Maddox, for giving up a lot of their Daddy time so that I could complete my research.

II. SIMILAR WORK IN SSH HONEYPOT GEOLOCATION ANALYSIS

A. INTRODUCTION TO LITERATURE REVIEW

According to RFC 4252, the Secure Shell protocol (SSH) supports secure remote login over an insecure network. The SSH protocol consists of three major components: the transport-layer protocol, the user-authentication protocol, and the connection protocol [1]. The transport-layer protocol "provides server authentication, confidentiality, and integrity with perfect forward secrecy" [1]. "The user authentication protocol is used for authentication between the client and the server" and "the connection protocol multiplexes the encrypted tunnel into several logical channels" [1].

The authentication part of SSH can be implemented by three different methods: "public-key, password, and host-based client authentication" [2]. In public-key authentication, the user creates an asymmetric key pair on the client and then uploads the public key to the server. During logon, the client sends the server a signature created with the user's private key, and the server verifies that signature with the corresponding public key. If the signature is valid, the user is granted access [2]. The second method uses a password. The user issues a command such as ssh user@x.x.x.x, and the server responds by asking for the password. The user enters the password and, if it is correct, is given access [2]. Host-based authentication "works by having the client send a signature created with the private key of the client host, which the server checks with that host's public key" [2]. Once the host's identity has been verified, access is granted.

The authentication, confidentiality, and integrity established by the SSH protocol make it the preferable way for users to safely interact with remote hosts. However, if not properly configured, SSH can become insecure, giving attackers access to systems otherwise thought to be secure. To study the methods attackers use to gain access to remote hosts via SSH, many security researchers have begun testing with honeypots.

Security researchers have published many papers on SSH honeypot analysis using various honeypots to analyze malicious activity. However, less research has been conducted analyzing attacks on different networks in order to determine if the attacks are discriminatory. This thesis uses data gathered from the Kippo honeypot and the DenyHosts program across two unrelated networks in an attempt to determine if the attackers are specifically targeting networks with certain affiliations.

B. WHAT IS A HONEYPOT?

According to the SANS Institute, "Honey Pot Systems are decoy servers or systems setup to gather information regarding an attacker or intruder into your system" [3]. Honeypots can be installed anywhere on a network depending on the desired data to be gathered. Since honeypots are not meant to be used for legitimate purposes, any connection to them is deemed "at best an accidental error or, more likely, an attempt to attack the machine" [4].

There are two main reasons to install a honeypot. The first is to "learn how intruders probe and attempt to gain access to your systems" [3]. Since honeypots typically log all interactions with the system, system owners can study attack methodologies to better protect their systems from future attacks. The second is to "[g]ather forensic information required to aid in the apprehension or prosecution of intruders" [3]. The research summarized in this chapter covers different honeypot implementations and their results.

C. HONEYPOT EXPERIMENTS

One study used a high-interaction honeypot [4]. The data was collected over a six-month period and was based on "the lessons learned from the observation of the attackers when logged on a compromised machine" [4]. The honeypot was a "standard Gnu/Linux installation, with kernel 2.6, with the usual binary tools. No additional software was installed except the http Apache server" [4]. The honeypot ran as a virtual machine in VMware 11, using the same version of Linux as the host.

Once the honeypot was installed, the researchers modified the tty_read, tty_write, and exec system calls to enable them "to intercept the activity on all the terminals of the system. The modification of the exec system call [enabled them] to record the system calls used by the intruder" [4]. The "captured information [was] logged directly into a buffer of the kernel memory of the honeypot itself." Once captured, the information was organized into an SQL database, which was used to identify "i) the IP address of the attacking machine, ii) the login and the password tested, iii) the date of the connection, iv) the terminal associated (tty) to each connection, and v) each command used by the attacker."

D. COMPARISON OF SSH ATTACKS ACROSS DIFFERENT UNIVERSITY NETWORKS

Based on data collected from the SSH daemon over a four-month period, another paper analyzed SSH attacks against hosts in the Computer Science Department at the College of William and Mary [5]. An interesting outcome of this research "was the discovery that the behavior of malicious hosts, or bots, is surprisingly deterministic" [5]. The author identified the specific "time[s] that a bot sleeps between attacks, or the inter-arrival time of failed logins from a source," and concluded that this time was "nearly constant across all hosts in a suspected botnet" [5]. The research was also able to identify "if an attack source is a bot" based on "the number of parallel login attempts from a source and the average number of failed attempts per day" [5].
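To make the idea of a "nearly constant" inter-arrival time concrete, the following minimal Python sketch computes the gaps between failed logins from one source and flags the source when those gaps barely vary. It is an illustration only, not the method used in [5]: the function names, the coefficient-of-variation test, the 0.1 threshold, and the sample timestamps are all assumptions made for this example.

```python
from statistics import mean, pstdev


def inter_arrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive failed logins."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_deterministic(timestamps, max_cv=0.1):
    """Flag a source whose gaps between attempts are nearly constant.

    max_cv is a hypothetical threshold on the coefficient of variation
    (standard deviation divided by mean); it is not taken from [5].
    """
    gaps = inter_arrival_times(timestamps)
    if len(gaps) < 2:
        return False  # too few attempts to judge regularity
    average = mean(gaps)
    if average == 0:
        return True  # identical timestamps: the gaps have no spread at all
    return pstdev(gaps) / average <= max_cv


# Made-up timestamps, in seconds since the first observed attempt.
bot_like = [0, 30, 60, 91, 120, 150]
human_like = [0, 5, 140, 150, 900, 2000]

print(looks_deterministic(bot_like))
print(looks_deterministic(human_like))
```

A script that sleeps a fixed interval between attempts produces gaps with almost no spread, whereas manual activity tends to be irregular; a real classifier would likely combine this with the other features the paper mentions, such as parallel login attempts.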

A third paper analyzed real-world SSH attack data obtained from Quarantainenet, "a Dutch company that develops network management and security tools and provides admission control and malware detection for their customers, including more than half of Dutch universities" [6]. The data was then input into GeoPlugin, which uses the MaxMind database for geolocation. Most IP address geolocation is currently done at the country level. According to MaxMind, which tests its databases periodically, "their databases were 99.8% accurate on a country level, 90% accurate on a state level in the US, and 81% accurate for cities in the US within a 50 kilometer radius" [7].

The authors attempted to use geolocation at the city level to answer their main research question, "Which cities in the world are responsible for most of the security incidents?" [6]. The results listed the top 20 cities by number of attacks per city. The top three cities responsible for the most attacks between October 29, 2010 and November 4, 2014, were Seoul, Taipei, and Beijing, with 735, 618, and 563 attacks, respectively.

Another research experiment examined brute-force attacks conducted against eight different Kojoney honeypots on six university campuses. These networks "were completely separated and had no explicit or logical links to interconnect them" [8]. Additionally, each network used a different ISP. Each honeypot was installed on "low-end PCs with CentOS Linux operating system[s]" [8]. The researchers altered the Kojoney software on each PC to include the following functionality [8]:

• Add password logging to the authentication mechanism to log the passwords used in all login attempts.

• Add user-agent detection to find out what client software was used by attackers.

• Add support for XMPP [9] to create a warning system that could alert the system administrator about ongoing attack activities.

• Add support of P0f as an OS fingerprinting tool.

• Upgrade the IP geolocation function to provide accurate information about attackers' origin.

• Upgrade the shell-prompt mechanism to make the system more realistic.

In addition, scripts were written to extract attack data from the honeypot log files and insert them into a local database. For aggregation and analysis, the local databases were regularly synchronized with a central database server.

The honeypots were active for 47 days, from August 20, 2011 through October 6, 2011 [8]. During that time, the eight honeypots received "nearly 98,180 connection requests which were originated from 1153 IP addresses and 79 countries" [8]. The test isolated three of the originating 1153 IP addresses, which were used against six of the honeypots. Also, on more than half of the honeypots, 50% of the IP addresses were involved in the attacks [8]. Of all the login attempts, 66.42% tried "root" as the username, and 19% used the username and password combination "root:root".

The top five sources of these attacks were the United States, China, Poland, Canada, and Argentina, with frequencies of 17.9%, 10%, 9.1%, 6.6%, and 6.1%, respectively. The researchers also found that "more than 82% of connections were established from a Linux system and only 3% was from [a] Windows machine" and that the most common user agent, SSH-2.0libssh-0.1, was used 85.3% of the time [8].

The researchers concluded the study by asserting that because Linux is such a widely used operating system, it has become a "bigger target to hackers in general" but "in terms of overall security, it is still far superior to windows" [8]. They defended this opinion by explaining that "[t]he open source nature of Linux allows for more peer review of the code to find and fix the code before zero day hacks can be done" [8].

E. POST-ATTACK BEHAVIOR

Another paper analyzed SSH attacks in a different way. Instead of exploring how to keep attackers out of a network, its authors studied "post-compromise attack behavior" [10]. They set up four honeypots, all running a slimmed-down Fedora Core 3 text-mode environment updated as of October 10, 2006 [10]. "[A] modified OpenSSH server [was used] to collect attempted passwords, syslog-ng to remotely log important system events, including logins and password changes, Strace to record all system calls made by incoming SSH connections, and the Honeynet Project's Sebek tool [2] to secretly collect all keystrokes on incoming SSH connections" [10]. The only other modification to the honeypot was code used to record all passwords tried during the attempted logins.

Before configuring the honeypots, the researchers ran tests to determine which usernames were most common. The usernames admin, mysql, oracle, sarah, and louise were then configured on the honeypots, admin as the root user and the other four as non-privileged user accounts [10]. The tests "also revealed that the most commonly tried passwords were '(username)', '(username)123', 'password', and '123456', where (username) represents the username being tried" [10]. The researchers rotated these passwords among the honeypots; after each compromise, the next password in the list was used [10]. Finally, to encourage attackers to enter the non-root accounts, two of the honeypots were set up with strong root passwords. The other two honeypots had root accounts that rotated through the passwords "root", "root123", "password", and "123456" [10].

The data collection was facilitated by two dedicated servers, one to collect syslog data and the other to collect "Sebek data, Strace data, and hourly snapshots of the .bash_history and wtmp files" [10]. To ensure the honeypots were not used for malicious activity once they were compromised, the researchers used pre-built images which were reloaded following each compromising attack.

All four honeypots were run for a "24-day period from November 14 to December 8, 2006" [10]. During that period, "attackers from 229 unique IP addresses attempted to log in a total of 269,262 times (an average of 2,805 attempts per computer per day). Out of these, 824 logged in successfully, and 157 changed an account password" [10]. The researchers found that even though commonly used usernames and passwords were configured on the honeypots, only about 0.31 percent of the login attempts were successful [10]. This key observation led the researchers to believe that in most, if not all, of the attacks a "low-skill[ed] attacker is using scripts to attack dozens of systems at once" [10].
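The two derived figures in this paragraph can be checked directly from the counts reported in [10]. The short Python sketch below redoes the arithmetic; the variable names are ours, and the only inputs are the numbers quoted above.

```python
# Counts quoted from [10].
attempts = 269_262        # total login attempts
honeypots = 4             # number of honeypots
days = 24                 # length of the observation period
successful_logins = 824   # attempts that logged in successfully

# Average attempts per honeypot per day.
avg_per_host_per_day = attempts / (honeypots * days)

# Share of attempts that succeeded, as a percentage.
success_rate_percent = 100 * successful_logins / attempts

print(f"{avg_per_host_per_day:.0f} attempts per honeypot per day")  # 2805
print(f"{success_rate_percent:.2f}% of attempts succeeded")         # 0.31%
```

Both results agree with the values stated in the text: roughly 2,805 attempts per computer per day and a success rate of about 0.31 percent.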

To gather more detailed information about the attacks on the honeypots, the researchers developed a group of seven states that would be monitored for each honeypot [10]:

1. CheckSW: 'Check software configuration' allows the attacker to gain more information about the system's software or its users.

2. Install: 'Install a program.' This refers to new software being installed by an attacker.

3. Download: 'Download a file.' This refers to remote file downloads by the attacker.

4. Run: 'Run a rogue program.' This refers to the attacker running a program that was not originally part of the system.

5. Password: 'Change the account password.' This refers to changing the password of the compromised account.

6. CheckHW: 'Check the hardware configuration.' This refers to actions that allow the attacker to gain more information about the system's hardware (uptime, network, CPU speed/type).

7. ChangeConf: 'Change the system configuration.' This refers to attacker activity that permanently changes the state of the system.

The data collected about the state definitions indicated no difference between the attacks on root and user accounts. The data did, however, disclose the most popular course of action, which "was to check the software configuration, change the password, check the hardware and/or software configuration (again), download a file, install the downloaded program, and then run it" [10].

The researchers believe their results contribute in two ways [10]. First, they concluded that administrators should not use any of the usernames and passwords tested in the experiment and that "[d]irect remote root logins should be disabled, only allowing select users to 'su' into the root account once logged on" [10]. Second, administrators can use the findings to choose "security tools to combat the most common attacker actions," which include downloading, installing, and running rogue software and checking the software configuration [10].

F. CHAPTER SUMMARY

The research summarized above indicates strong interest in improving the security of hosts that offer the SSH service. The attacks studied show few clear patterns: all appear to be scripted in some form, but none appears to be directed at a specific kind of target. This thesis extends this research by examining whether attacks on networks of different affiliations reveal further details that could help us better understand the motives of attackers.

III. TEST ENVIRONMENT, DATA ORIGINATION, AND TOOL DESCRIPTIONS

In this chapter we describe the test environment, origin of the two data sets being analyzed, and the tools used for analysis. Two networks gathered data for our experiments. One was the NPS network that we used as our control. SSH login data from this network came from the DenyHosts server which collects login data from the servers running the SSH service. Figure 1 shows the layout of the NPS network.

A. DESCRIPTION OF NETWORKS

The NPS network has two outward-facing DNS servers located behind a firewall in the DMZ. Then another firewall separates the DMZ from the intranet.

Figure 1. NPS network.

The DenyHosts daemon runs on every server in the DMZ and intranet offering the SSH service. Each of those servers then communicates with the central DenyHosts server that maintains the SSH logs for the entire network. At regular intervals, the DenyHosts server updates the other servers with newly blocked IP addresses.

The gateway router of the honeypot network was fed from an AT&T T-1 line that ran to the NPS campus but was not connected through the NPS firewall. Figure 2 below is a logical representation of the honeypot network.

Figure 2. Honeypot network.

As shown above, the honeypot was connected to a hub. Table 1 gives the hardware specification for the honeypot.

B. DESCRIPTION OF SYSTEMS

The honeypot host (Dell OptiPlex 745) used the Ubuntu 14.04 LTS operating system as a platform for our honeydrive3 virtual machine.

Table 1. Hardware Specifications.

Honeypot (OptiPlex 745)
Processor	Intel(R) Pentium(R) 3.4 GHz
Memory	4 GB
HDD	Seagate 160 GB
NIC	NetXtreme BCM5754 Gigabit Ethernet PCI Express

1. Tools

The tools we used for our experiment included the Kippo SSH honeypot, DenyHosts, and the MaxMind geolocation database. For our experiment, a honeydrive3 virtual machine was created in VirtualBox to use the Kippo SSH honeypot [11]. Honeydrive3 is a Linux honeypot distribution built as an Open Virtual Appliance (OVA) with Xubuntu Desktop 12.04.4 LTS installed.

a. Kippo

The Kippo SSH honeypot is a tool included in the honeydrive3 distribution. It is designed to mimic a real Debian 5.0 file system with the ability to add and remove files. Kippo also has fake file contents to allow an attacker to "cat" files like /etc/passwd [12]. Kippo saves all downloaded files for later inspection. The Kippo data acquired from each session is viewable on the Kippo Graph Web page. Kippo Graph is a script used to view all of the honeypot statistics in an organized fashion, providing the ability to monitor the current status of the honeypot remotely as well as download the SSH data.

Three of the seven Web pages in Kippo Graph were used for the analysis of our data: Kippo Input, Kippo Playlog, and Kippo IP. The Kippo Input page summarizes overall post-compromise activity, human activity inside the honeypot, top 10 inputs (overall), top 10 successful inputs, top 10 failed inputs, passwd commands (password-change attempts), wget commands, and executed scripts. Examples of each metric are displayed in Figures 3 through 9.

Figure 3. Post compromised human activity.

Figure 4. Top 10 input overall.

Figure 5. Top 10 successful inputs.

Figure 6. Top 10 failed inputs.

Figure 7. Latest "passwd" commands entered by attackers.

Figure 8. Latest "wget" commands entered by attackers.

Figure 9. Latest scripts executed by attackers.

Clicking a play button on the Kippo Input page (Figures 7, 8, and 9) redirects the user to the Kippo Playlog page. The Playlog page allows for the replay of an attacker's actions once inside the honeypot. An example of the playlog is shown below in Figure 10.

Figure 10. Kippo TTY log.

The Kippo-IP page displays all of the IP activity gathered from the honeypot. The last five sessions are shown in Figure 11.

Figure 11. Total IP activity gathered from the honeypot.

b. DenyHosts

DenyHosts is a Python-based script designed for Linux system administrators to defend against SSH dictionary and brute-force attacks [13]. It allows administrators to monitor all SSH failed and successful login attempts, the usernames and passwords used in each attempt, and the source and destination IP addresses in each attempted connection. It "can be run from the command line, cron or as a daemon" [13]. Based on the data collected from each login attempt, the administrator can elect to blacklist malicious host IP addresses so that any future traffic is immediately dropped at the firewall.
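The general idea of counting failed logins per source and blacklisting repeat offenders can be sketched in a few lines of Python. This is not DenyHosts' actual code or configuration: the function name, the threshold of three failures, and the sample log lines are assumptions for illustration, and the pattern covers only the common OpenSSH "Failed password" syslog message.

```python
import re
from collections import Counter

# Matches the common OpenSSH syslog message for a failed password login.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def hosts_to_block(log_lines, threshold=3):
    """Return the source IPs with at least `threshold` failed logins.

    `threshold` is a hypothetical value chosen for this illustration.
    """
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return sorted(ip for ip, count in failures.items() if count >= threshold)


# Made-up log lines that use documentation-reserved IP addresses.
sample_log = [
    "Sep  1 10:00:01 host sshd[101]: Failed password for root from 203.0.113.7 port 4321 ssh2",
    "Sep  1 10:00:04 host sshd[102]: Failed password for invalid user admin from 203.0.113.7 port 4322 ssh2",
    "Sep  1 10:00:07 host sshd[103]: Failed password for root from 203.0.113.7 port 4323 ssh2",
    "Sep  1 10:02:00 host sshd[104]: Failed password for alice from 198.51.100.9 port 5000 ssh2",
    "Sep  1 10:02:09 host sshd[105]: Accepted password for alice from 198.51.100.9 port 5001 ssh2",
]

print(hosts_to_block(sample_log))
```

In this sample, only the address with three failures reaches the threshold; the address with a single mistyped password followed by a successful login is left alone.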

c. MaxMind

According to the MaxMind website, the "GeoIP2 Precision Insights service provides our most accurate information about the location of an IP address to the zip or postal code level, includes confidence factors for geolocation data, describes the ISP/Organization, and provides insight into the type of user behind the IP" [14]. Its key IP address categories used in our analysis were country, city, postal code, time zone, latitude/longitude, ISP/organization, domain, Autonomous System Number and organization, accuracy radius, confidence factors, and user type.

d. Nmap

"Network Mapper is a free and open source utility for network discovery and security auditing" [15]. It has the capability to identify key characteristics about hosts on a network such as the services offered, the operating systems used, and the firewall used. Nmap was used to gain operating-system and port information for each IP address analyzed.

e. Shodan

Shodan is a search engine designed for the Internet of Things (IoT). Much like other search engines, it crawls the internet, but instead of only indexing websites, it queries every IP address for host information including location, hardware type, operating-system type and version, associated domain, open ports, and versions of services being offered over those ports [16]. Our methodology used Shodan to validate information gathered from the other tools.

f. IP2Location.com

The IP2Location website aids users in finding geolocation information of an IP address, using type information without violating the Internet user's privacy [17]. We used this tool for identifying anonymous proxies in our set of data.

IV. FORMATTING AND ANALYSIS OF DATA

A. Introduction

In this chapter we will explain our methodology for analyzing the data from DenyHosts and our honeypot.

B. Data Collection and Organization

The data collected from the NPS network was extracted from the DenyHosts server as discussed in Chapter 3. The DenyHosts daemon creates a zipped log file for each day in operation as shown in Figure 12.

Figure 12. DenyHosts daily logs.

1. DenyHosts Data

To efficiently organize the data in the logs, the logs were unzipped using the gunzip command. Figure 13 represents the raw data provided by the DenyHosts program. The first column is the date in year, month, and day format (YYYY-MM-DD) followed by the time (GMT). The next column distinguishes whether the information is coming from the DenyHosts daemon locally on the server, labeled "DenyHosts", or whether it is coming from another machine running DenyHosts, labeled "sync". The last column indicates whether the server received new hosts to add to the blocked list, whether it sent new hosts to add to the blocked lists on other machines running DenyHosts, and what specific IP addresses were added to the blocked list.

Figure 13. Raw data from DenyHosts

Next, the grep command was used in conjunction with regular expressions to extract all of the IP addresses in each log file. Each line containing an IP address was then written to a new file called IPs_Only.txt. Figure 14 shows the command used to grep through the log files and collect the results into a single file, and Figure 15 illustrates an example of the output [18].

Figure 14. Grep command using regular expressions.

Figure 15. IPs_Only.txt output after grep command with regular expressions
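The same extraction step can be sketched in Python. This is a minimal illustration rather than the command shown in Figure 14: the regular expression, the function names, and the sample log lines in the usage note are assumptions, and the sketch reads the zipped logs directly instead of running gunzip first. Unlike the grep step, it writes only the addresses, not the whole matching line.

```python
import gzip
import re
from pathlib import Path

# Loose IPv4 pattern (an assumption, not the expression from Figure 14):
# four dot-separated groups of one to three digits.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_valid_ipv4(candidate):
    """Reject matches such as 999.1.1.1 that the loose pattern lets through."""
    return all(0 <= int(octet) <= 255 for octet in candidate.split("."))


def extract_ips(lines):
    """Return every valid IPv4 address found in an iterable of log lines."""
    found = []
    for line in lines:
        for match in IPV4_PATTERN.findall(line):
            if is_valid_ipv4(match):
                found.append(match)
    return found


def extract_ips_from_logs(log_dir, output_path="IPs_Only.txt"):
    """Scan every .gz log in log_dir and write one address per line."""
    addresses = []
    for log_file in sorted(Path(log_dir).glob("*.gz")):
        # "rt" decompresses and decodes the log as text in one step.
        with gzip.open(log_file, "rt", errors="replace") as handle:
            addresses.extend(extract_ips(handle))
    Path(output_path).write_text("\n".join(addresses) + "\n")
    return addresses
```

Calling extract_ips on a list of log lines returns the addresses in the order they appear; duplicates are kept so that later steps can count or remove them as needed.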

2. Kippo SSH Honeypot Data

A second set of data was pulled from the Kippo Graph Web page as a comma-separated values (csv) file, as shown in Figure 16. The first column lists the IP address of the host attempting to access the honeypot. The second column shows how many attempted connections were made by that IP address. The third column lists the number of times the login attempts were successful, and the last column is the date of the last attempted connection by the IP address.

Figure 16. IP activity gathered from the honeypot.

An example of this file showing the top 10 highest number of sessions per IP address is shown in Figure 17. The downloaded csv file only includes two columns, the IP address and session count.

Figure 17. Example of Kippo Graph csv file.

3. Data Consolidation

The honeypot IP addresses were then copied into the IPs_Only.txt file originally containing the DenyHosts IP addresses. Then the file containing all 8,161 IP addresses was uploaded to the MaxMind GeoIP2 Precision Insights Batch Lookup Service as shown in Figure 18.

Figure 18. MaxMind file upload page.

Once uploaded, the text file is analyzed with the MaxMind database and it returns a csv file. The csv file contains information for each IP address for continent_code, continent_name, country_iso_code, country_name, subdivision_iso_code, subdivision_name, city_name, metro_code, postal_code, latitude, longitude, registered_country_iso_code, represented_country_iso_code, represented_country_type, is_satellite_provider, autonomous_system_number, autonomous_system_organization, domain, ISP, organization, user_type, country_confidence, subdivision_confidence, city_confidence, postal_confidence, and accuracy_radius. With our methodology, we only used ip_address, country_name, subdivision_name (state), city_name, latitude, and longitude. An example of the csv file from MaxMind after the removal of the unwanted categories is shown in Figure 19.

Figure 19. MaxMind csv file.
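Removing the unwanted categories can also be done programmatically. The sketch below is illustrative only: it assumes the returned csv has a header row using the column names listed above, and the function name and sample row in the usage note are hypothetical.

```python
import csv
import io

# The six columns retained in our methodology.
KEPT_COLUMNS = [
    "ip_address",
    "country_name",
    "subdivision_name",
    "city_name",
    "latitude",
    "longitude",
]


def keep_columns(csv_text, columns=KEPT_COLUMNS):
    """Return one dict per row containing only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Missing columns become empty strings instead of raising an error.
    return [{name: row.get(name, "") for name in columns} for row in reader]
```

Passing the full text of the MaxMind csv file to keep_columns yields rows that can be written back out with csv.DictWriter or loaded into a spreadsheet.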

The MaxMind csv file was then converted to xlsx format. Next, the honeypot data was copied into a separate sheet called "Honeypot", leaving the DenyHosts data in its own separate sheet, which was renamed "DenyHosts". Figure 20 illustrates the changes made to the MaxMind csv file.

Figure 20. DenyHosts and honeypot data.

4. Filtering for Duplicate IP Addresses

To find the IP addresses present in both sets of data, a COUNTIF equation was used, as shown in Figure 21. The COUNTIF equation, written in the Honeypot sheet under the column Match(1=yes, 0=no), compares a range of data from the DenyHosts sheet, DenyHosts!$A$2:$A$1881, with each cell in column "A" on the Honeypot sheet. If any of the IP addresses from the DenyHosts sheet matches an IP address on the Honeypot sheet, a "1" is produced beside that IP address. If no match is found, a "0" is produced. Figure 22 shows an example of the results from the COUNTIF equation.

Figure 21. COUNTIF equation.

Figure 22. COUNTIF equations results.

To filter out all of the "0" entries, the filter function was applied to the Match column so that only rows with the value "1" were shown. Figure 23 shows the results of applying the filter.

Figure 23. Filtering for IP address matches.
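The matching and filtering steps amount to a membership test between the two address lists. The following is a minimal sketch of that idea, not the spreadsheet formula itself; the function names are hypothetical, and it reports 1 or 0 for membership rather than the raw count that COUNTIF would return if the DenyHosts range contained duplicates.

```python
def flag_matches(honeypot_ips, denyhosts_ips):
    """Pair each honeypot address with 1 if DenyHosts also saw it, else 0."""
    denyhosts_set = {ip.strip() for ip in denyhosts_ips}
    flagged = []
    for ip in honeypot_ips:
        cleaned = ip.strip()
        flagged.append((cleaned, 1 if cleaned in denyhosts_set else 0))
    return flagged


def shared_addresses(honeypot_ips, denyhosts_ips):
    """Keep only the honeypot addresses flagged with 1 (the filter step)."""
    return [ip for ip, flag in flag_matches(honeypot_ips, denyhosts_ips) if flag == 1]
```

Using a set for the DenyHosts addresses keeps each lookup fast even when the lists contain thousands of entries.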

A total of 31 IP addresses were in both sets of data. Each of these IP addresses was checked with the ip2location.com demo tool to identify whether any were known anonymous proxies. Each IP address was also analyzed with the Shodan website and Nmap to identify any open ports on each host. Figure 24 shows an example of the output from the ip2location.com demo tool with the red arrow pointing to the anonymous proxy results [17]. Figure 25 shows the Nmap output for a single IP address and Figure 26 shows the output for an IP address search in Shodan.

Figure 24. IP2Location.com demo tool.

Figure 25. Nmap output for single IP address.

The -v option stands for verbose and will display additional information on the terminal. The -O option initiates an operating system (OS) scan, which checks the Nmap database for known OS signatures and tries to find the best match for the host using its signatures. Finally, the -Pn option is used in case the host is blocking ping probes; Nmap skips host discovery, sends SYN packets to 1,000 commonly used ports on the host, and waits for a SYN/ACK response [15].
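When many addresses need to be scanned, the same three options can be assembled in a small script. The sketch below is an assumption about how such automation might look, not the procedure we followed; the function names and timeout are hypothetical, Nmap must be installed separately, and OS detection with -O generally needs elevated privileges.

```python
import ipaddress
import subprocess


def build_nmap_command(ip_address):
    """Return the argument list for a verbose OS scan that skips host discovery."""
    # Validate first so that arbitrary text is never passed to the command line.
    target = str(ipaddress.IPv4Address(ip_address))
    return ["nmap", "-v", "-O", "-Pn", target]


def run_scan(ip_address, timeout_seconds=600):
    """Run the scan and return Nmap's text output."""
    completed = subprocess.run(
        build_nmap_command(ip_address),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as text
        timeout=timeout_seconds,
        check=False,          # leave error handling to the caller
    )
    return completed.stdout
```

Passing the arguments as a list, rather than as a single shell string, avoids shell interpretation of the target value.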

The results of an example scan are shown in Figure 25. The host had three ports open, 80, 5060, and 8080. There was no exact operating-system match but Nmap states there is a 91% probability of being Linux 2.6.32. Figures 26 and 27 show the results of a Shodan website query of the same IP address.

Figure 26. Shodan IP address search location information.

Figure 27. Shodan IP address search ports and services information.

The search revealed information about the country, organization, ISP, Autonomous System Number (ASN), open ports, and services offered, along with the last time this data was updated. This information was then compared with the MaxMind and Nmap results to ensure the most up-to-date data was used for our analysis.

Lastly the complete list of IP addresses encountered with the number of sessions was downloaded from the Kippo honeypot as shown in Figure 16. Then we filtered for all 31 IP addresses to acquire the session counts. Figure 28 shows the results after filtering.

Figure 28. 31 IP addresses with number of sessions.
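Looking up the session counts for the shared addresses can be sketched as follows. The sketch assumes the downloaded csv has the two columns described earlier (IP address, then session count); the function name and sample values in the usage note are hypothetical.

```python
import csv
import io


def session_counts_for(csv_text, wanted_ips):
    """Return {ip: session_count} for the addresses of interest."""
    wanted = set(wanted_ips)
    counts = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        ip, sessions = row[0].strip(), row[1].strip()
        # A header row is skipped because its second field is not a number.
        if ip in wanted and sessions.isdigit():
            counts[ip] = int(sessions)
    return counts
```

Addresses that appear in wanted_ips but not in the csv are simply absent from the result, which makes missing session data easy to spot.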

The results described in this chapter were then combined with the number of sessions per IP address to finalize our data for analysis. Figure 29 shows the final compilation of the data gathered on all 31 IP addresses.

Figure 29. Final compilation of IP address data.


V. Data Comparison Results

In this chapter we will discuss the results from our tests outlined in Chapter 4, and identify the similarities and differences in the patterns and geolocation of the data analyzed from NPS's network and the Kippo honeypot. We also tried to determine if the attackers used proxies to route their attacks, if the attacks were automated, and if the hosts with IP addresses associated with NPS were attacked more than the Kippo honeypot. Finally we analyzed files downloaded to our honeypot from IP addresses appearing in both sets of data.

1. Results

a. Geolocation Patterns

Our methodology identified 31 individual IP addresses in both data sets. The distribution of the IP addresses is shown in Figure 30.

Figure 30. IP geolocation distribution.

The top four originating countries, by percentage of IP addresses, were China with 29%, India with 13%, and Brazil and Germany with 10% each. Because of the high percentages from China and India, we looked deeper to determine whether the addresses originated from the same cities and Internet Service Providers (ISPs). Two of the IP addresses from China originated in Shanghai, and one each from Beijing, Kunming, Lanzhou, and Xi'an. Upon deeper inspection, the two IP addresses in Shanghai belonged to different ISPs, Shanghai University and Oriental Cable Network Co., Ltd. The four IP addresses from India included one each from Anchal, Bhagwat, New Delhi, and Noida, but we could not identify their ISPs with any of our tools.
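The percentages follow directly from the per-country counts; for example, the four addresses from India out of 31 correspond to the 13% reported. A minimal sketch of the calculation is shown below, with a hypothetical function name and with all non-Indian addresses grouped under a placeholder label in the usage example.

```python
from collections import Counter


def country_shares(countries):
    """Map each country to its rounded percentage of all addresses."""
    counts = Counter(countries)
    total = sum(counts.values())
    # most_common() orders the result from the largest share to the smallest.
    return {country: round(100 * n / total) for country, n in counts.most_common()}
```

For a list containing "India" four times among 31 entries, country_shares reports 13 for India.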

b. Hardware

The hardware analysis used information collected by MaxMind, Shodan, and Nmap. While 61% of the devices were unknown, we could identify key attributes of the remaining devices. Figure 31 shows the breakdown of hardware devices found in our data.

Figure 31. Device types.

Two of the devices were Apple Airport Extremes, which suggests that the attempted logins came from a home user or at least a compromised home computer. One device was a combination of a Web and mail server based on its open ports and because it was running Apache and Sendmail. Another group of devices was five HP Procurve 7102dl secure routers. The popularity of this router in our data could mean that a vulnerability allows malicious users to access this router as a pivot point for malicious activity.

The other four devices were two Virtual Private Network (VPN) routers, a server running Apache and Bind, and a W422G wireless router. A VPN router is an excellent way to ensure the anonymity of an attacker attempting access to a remote system. These could be infected with malware creating another pivot for malicious activity.

c. Operating Systems

Next we identified the operating systems of all of the hosts. Linux accounted for 74% of the operating systems used. The others included AVtech and Apple embedded operating systems and three which were not identified. Figure 32 shows a breakdown of the operating systems.

Figure 32. Commonly used operating systems.

We believe the popularity of the Linux 2.6.x versions indicates multiple vulnerabilities in those versions, since the 2.6.9 version was originally released on 19 October 2004 [20]. Our opinion was supported by the National Vulnerability Database, which yielded 159 Common Vulnerabilities and Exposures (CVE) entries associated with Linux 2.6.x. Figure 33 shows a portion of the search results.

Figure 33. Vulnerability search results.

d. Common Ports

Further analysis looked for common port usage among all 31 IP addresses. Figure 34 shows the percentage of ports open across all 31 IP addresses.

Figure 34. Percentage of commonly used ports for all hosts.

Initially, not all hosts had port 22 open, possibly indicating deliberate use of the port only at certain times. The majority of all hosts had ports 80, 5060, and 8080 open. Port 80 appears to be open on the devices for Web access, but ports 5060 and 8080 are usually unnecessary and seem suspect. An article written by Lenny Zeltser called "Targeting VoIP: Increase in SIP Connections on UDP port 5060" attributes an increase in port 5060 activity to SIP brute-forcing activities by botnets [22]. Port 8080 is typically an alternate to port 80, and is used for proxies. It is possible, whether intentional or unintentional, that devices using it could be acting as proxies for malicious activity.

e. SSH Version

Figure 35 shows the distribution of SSH versions used on the 31 hosts. According to the OpenSSH website, all these possess vulnerabilities allowing attackers to gain access to these devices [21].

Figure 35. SSH version distribution.

f. Anonymous Proxy

Although we used the IP2Location website, we were unable to identify any IP addresses as being anonymous proxies.

g. Session Data

We analyzed the session information of the 31 IP addresses to identify patterns in activity. The DenyHosts daemon does not log any SSH login attempts after an IP address has been blocked, so we used the Kippo login data for the IP addresses; we believe that the session information for the NPS network is very similar to our Kippo results. Figure 36 shows the number of login attempts for each IP address over the seven-month period the data was collected. IP addresses with fewer than three sessions have been removed since they are likely not deliberate login attempts.

Figure 36. Session count.
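The three-session cutoff can be expressed as a simple filter. The sketch below is illustrative; the function name, parameter name, and sample counts in the usage note are hypothetical, and only the threshold of three comes from our methodology.

```python
def drop_low_activity(session_counts, minimum_sessions=3):
    """Remove addresses with fewer than minimum_sessions sessions.

    Returns (ip, count) pairs sorted from most to fewest sessions,
    which is a convenient order for a chart such as Figure 36.
    """
    kept = {ip: n for ip, n in session_counts.items() if n >= minimum_sessions}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)
```

An address with exactly three sessions is kept, since the cutoff removes only those with fewer than three.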

Several IP addresses have numerous attempted logons; however there does not appear to be any brute forcing, which would be indicated by several hundreds if not thousands of sessions. This data suggests that attackers are selectively trying to gain access without raising any suspicion. For example, 54 login attempts over a span of months may not trigger any alerts on a system, but if conducted within a week would trigger further analysis and could lead to the blacklisting of the offending IP address.

h. Downloaded Files

None of the 31 IP addresses were successful in downloading any files to the Kippo honeypot.

2. Conclusion

Based on the results produced from our methodology, it is unclear if the attacks are automated, but we have high confidence that the remote sites were compromised because of their preponderant use of vulnerable software. Use of ports 5060 and 8080 suggests botnet activity associated with these sites. Unfortunately, the design of the DenyHosts daemon prevented us from determining if the NPS network was attacked more often than our honeypot because of its affiliation.

VI. Conclusion

When attempting to profile attack behavior, it is important to analyze the data gathered with multiple tools to ensure its accuracy. Our methodology was successful in identifying similarities in patterns and geolocation information in the data collected from both networks. It identified several contributing factors that may have caused the 31 hosts to be compromised and therefore used to conduct malicious login attempts against the NPS and honeypot networks.

The results of our methodology could be improved if both networks employed honeypots. A drawback in comparing honeypot and DenyHosts data was the latter's inability to log IP addresses after they have been blocked. However, DenyHosts is an invaluable tool for thwarting SSH brute force attacks and should be used on any host offering the SSH service.

Future work related to this topic should include multiple data sets from multiple network affiliations to ensure the lowest occurrence of bias possible. Each network should try to use identically configured honeypots for the best data comparison. If the DenyHosts daemon is used it should be in conjunction with IPTables or similar software capable of logging failed login attempts of blocked IP addresses. Other possible research could involve tracking the use of user names and passwords across multiple networks.

List of References

[1] T. Ylonen and C. Lonvick, "The secure shell (SSH) protocol architecture," 2006 [Online]. Available: https://tools.ietf.org/html/rfc4251. [Accessed: 31-Mar-2015]

[2] T. Ylonen and C. Lonvick, "The secure shell (SSH) authentication protocol," 2006 [Online]. Available: https://tools.ietf.org/html/rfc4252. [Accessed: 31-Mar-2015]

[3] "SANS: Intrusion Detection FAQ: What is a Honeypot?" [Online]. Available: http://www.sans.org/security-resources/idfaq/honeypot3.php. [Accessed: 31-Mar-2015]

[4] E. Alata, V. Nicomette, …, M. Dacier, and M. Herrb, "Lessons Learned from the deployment of a high-interaction honeypot," arXiv [cs.CR], arxiv.org, 06-Apr-2007 [Online]. Available: http://arxiv.org/abs/0704.0858. [Accessed: 10-Apr-2015]

[5] C. Kenna, "Analysis of and Response to SSH Brute Force Attacks," 2010 [Online]. Available: http://www.cs.wm.edu/~kearns/710-papers.d/Kenna.pdf. [Accessed: 10-Apr-2015]

[6] M. G. T. van Polen, G. C. M. Moura, and A. Pras, "Finding and Analyzing Evil Cities on the Internet," in Managing the Dynamics of Networks and Services, Springer Berlin Heidelberg, 2011, pp. 38–48 [Online]. Available: http://link.springer.com/chapter/10.1007/978-3-642-21484-4_4. [Accessed: 10-Apr-2015]

[7] "MaxMind - Frequently Asked Questions." [Online]. Available: https://www.maxmind.com/en/faq#accurate. [Accessed: 10-Apr-2015]

[8] E. Kheirkhah, S. M. P. Amin, H. A. Sistani, and H. Acharya, "An Experimental Study of SSH Attacks by using Honeypot Decoys," Indian J. Sci. Technol., vol. 6, no. 12, pp. 5567–5578, Dec. 2013 [Online]. Available: http://www.indjst.org/index.php/indjst/article/view/43618. [Accessed: 10-Apr-2015]

[9] "The XMPP Standards Foundation." [Online]. Available: https://xmpp.org/. [Accessed: 10-Apr-2015]

[10] D. Ramsbrock, R. Berthier, and M. Cukier, "Profiling Attacker Behavior Following SSH Compromises," in Dependable Systems and Networks, 2007. DSN '07. 37th Annual IEEE/IFIP International Conference on, 2007, pp. 119–124 [Online]. Available: http://dx.doi.org/10.1109/DSN.2007.76. [Accessed: 20-May-2015]

[11] "HoneyDrive 3 Royal Jelly edition," BruteForce Lab's Blog, 26-Jul-2014. [Online]. Available: https://bruteforce.gr/honeydrive-3-royal-jelly-edition.html. [Accessed: 23-Jun-2015]

[12] Desaster, "desaster/kippo," GitHub. [Online]. Available: https://github.com/desaster/kippo. [Accessed: 01-Jul-2015]

[13] "Welcome to DenyHosts." [Online]. Available: http://denyhosts.sourceforge.net/. [Accessed: 24-Jun-2015]

[14] "MaxMind - Frequently Asked Questions." [Online]. Available: https://www.maxmind.com/en/faq#accurate. [Accessed: 10-Apr-2015]

[15] "Nmap: the Network Mapper - Free Security Scanner." [Online]. Available: https://nmap.org/. [Accessed: 06-Aug-2015]

[16] "Shodan." [Online]. Available: https://www.shodan.io/. [Accessed: 24-Aug-2015]

[17] "Free Product Demo | IP2Location.com." [Online]. Available: https://www.ip2location.com/demo. [Accessed: 17-Jul-2015]

[18] "Regex - Extracting IP address from a line from ifconfig output with grep - Stack Overflow." [Online]. Available: https://stackoverflow.com/questions/11482951/extracting-ip-address-from-a-line-from-ifconfig-output-with-grep. [Accessed: 16-Jul-2015]

[19] "HP 7000 dl Router Series - HP Networking, HP 7102 dl Router, J8752A, HP 7203 dl Router, J8753A." [Online]. Available: http://pro-networking-h17007.external.hp.com/us/en/products/routers/HP_A7000dl_Router_Series/index.aspx. [Accessed: 18-Aug-2015]

[20] "LinuxVersions - Linux Kernel Newbies." [Online]. Available: http://kernelnewbies.org/LinuxVersions. [Accessed: 20-Aug-2015]

[21] "OpenSSH Security." [Online]. Available: http://www.openssh.com/security.html. [Accessed: 21-Aug-2015]

[22] S. I. S. Center, "InfoSec Handlers Diary Blog - Targeting VoIP: Increase in SIP Connections on UDP port 5060," SANS ISC. [Online]. Available: https://isc.sans.edu/diary/Targeting+VoIP%3A+Increase+in+SIP+Connections+on+UDP+port+5060/9193. [Accessed: 20-Aug-2015]


Initial Distribution List

1. Defense Technical Information Center

Ft. Belvoir, Virginia

2. Dudley Knox Library

Naval Postgraduate School

Monterey, California