Analyzing Attacks on Client-Side Honeypots from Representative Malicious Web Sites

 


Brian A. Foley
Department of Computer Science
Naval Postgraduate School
Monterey, CA, USA
brian.foley@nps.edu
Neil C. Rowe
Department of Computer Science
Naval Postgraduate School
Monterey, CA, USA
ncrowe@nps.edu
Thuy D. Nguyen
Department of Computer Science
Naval Postgraduate School
Monterey, CA, USA
tdnguyen@nps.edu



Abstract: Client-side cyberattacks are becoming more common relative to server-side cyberattacks. This work tested the ability of the honeyclient software Thug to detect malicious or compromised servers that secretly download malicious files to clients, and tested its ability to classify these exploits. We tested Thug's analysis of delivered exploits in different configurations. Results on randomly generated Internet addresses found a low rate of maliciousness of about 5.5%, and results on a blacklist of 83,667 suspicious Web sites found 163 unique malware files. Thug demonstrates the usefulness of client-side honeypots in analyzing harmful data presented by malicious Web sites.[1]

Keywords: honeypots, client-side, cyberattacks, signatures, malware

This paper appeared in the Proc. International Conference on Computational Science and Computational Intelligence, Research Track on Cyber Warfare, Cyber Defense, and Cyber Security, December 2022.

I.    Introduction

Client-side honeypots ([3], [6]) are tools for detecting and analyzing malicious Internet sites. They connect to a site, record all the files it transmits, and then analyze those files for malicious signatures. Usually, client honeypots are Web crawlers that visit a list of sites to find and characterize the malicious ones. Unlike traditional honeypots, client-side honeypots are active, initiating the connections and collecting the data. This work studied Thug, an open-source client-side honeypot from the Honeynet Project (www.honeynet.org). The Project is an international nonprofit security research organization investigating attacks and exploits and providing open-source tools to improve cybersecurity.

A.    Client-Side Attacks Through Drive-By Exploits

Client-side Web attacks ("drive-by exploits") occur when Web users visit a webpage that delivers an HTML document containing malicious code [5]. Early work [7] found references to malicious Web sites in many otherwise legitimate Web pages and email messages despite blacklisting services like SiteAdvisor. The malicious code can exploit vulnerabilities in the Web browser, browser plugins, or operating system to compromise the user's Web browser. Then the malicious actors can download and execute additional malware, causing further compromise of the user's system. This process can occur without the knowledge of a user who simply visited a Web page.

A drive-by exploit is a four-stage process [15]. Attackers first load malicious code into the HTML documents of a website. They then lure visitors by sending spam email with links to their servers, abusing search engine optimization so that search engines report their pages, publishing their links on social media, or modifying legitimate Web servers. Users then visit malicious or compromised servers and unknowingly retrieve the attackers' malware. Usually, attackers target a specific browser or operating system vulnerability, which they detect by sending crafted packets. In the final stage, the vulnerable software on the client (victim) is exploited, often to gain control of the victim's machine.

Client-side attacks can be detected in unrequested and unnecessary files downloaded by a Web browser [1]. Anti-malware scanning can be done on them, and suspicious behaviors like downloading non-picture files from unrelated sites can be observed. Client-side malicious activity can be more effectively detected if a client carefully acts like a normal browser using deception methods [10].

B.    Previous Work

A honeypot client, Monkey-Spider, analyzed webpages through search engine seeding [4]. This low-interaction honeypot client separated Web crawling from the analysis of webpages for malicious code, which used ClamAV. For starting sites for crawling, the researchers used the Web Services APIs for Google, Yahoo, and MSN search, collected the first 1,000 results from five search keywords, and added URLs from commercial blacklists (lists of known malicious sites). These sites were their initial Web crawler visits ("seeds"). The researchers used 20,457 seeds downloaded from 20,005,756 URLs during the Web crawling. They found that 1.0% of the sites they visited showed malicious activity. Similar projects are described in [5] and [12].

Other previous work compared low-interaction and high-interaction honeypot clients [9]. It tested seven kinds of malware on 20,000 malicious sites. It obtained a 1.8% rate in finding malware with the low-interaction honeypot HoneyC, 2.3% with high-interaction Capture-HPC, and 4.6% when sending files to the SiteAdvisor malware-analysis site. SiteAdvisor appears to use additional information beyond the results of website visits to determine maliciousness, such as spam and phishing site reports. A 2007 report from SiteAdvisor identified 4.1 percent of websites as malicious [14]. This discrepancy with client-side honeypots could be due to Web crawlers focusing on extracting URLs over downloading their content, which meant fewer malicious files downloaded. Another reason could be duplicate visits by crawlers to popular websites like YouTube and Amazon that were unlikely to have malware.

Another project studied the effectiveness of client-side honeypots in identifying malicious webservers that used fingerprinting and bot-detection techniques to complicate analysis [8]. They used two honeypot clients and two kinds of analysis software. One client was the low-interaction client Thug, and the other was the high-interaction honeypot client Cuckoo Sandbox. The tools used were Lookyloo, which captures webpages and redirection data, and VirusTotal, a common cybersecurity tool for identifying malicious code. The project created a custom Web server with the Django Python Web framework using 21 cloaking methods to prevent analysis by clients. All four tools had difficulty pulling malicious content from the Web server using cloaking techniques. However, most tools did not support newer technologies, API links, and browser versions. These results suggest modern websites can effectively cloak their malicious-download techniques from honeypot clients and other analysis tools.

II.    Experimental Setup

A.    Thug

Thug is a Python-based low-interaction honeypot client that does both static and dynamic analyses to inspect suspicious malware [2]. Thug uses the Google V8 JavaScript engine wrapped through STPyV8 to analyze malicious JavaScript code and uses Libemu wrapped through Pylibemu to detect and emulate shell codes. Thug currently emulates 42 different browser types and provides 90 vulnerability plugins for analysis. After analyzing the specified URLs, Thug stores its results in a NoSQL MongoDB database running on the host machine (Fig. 1). We chose this client-side honeypot based on its capabilities, open-source license, and active maintenance and support. The study in [13] analyzes the methods and speed of Thug but does not measure its accuracy or test Thug in real-world environments.

Fig. 1. Thug operation

B.    Supporting Tools Used

Thug stores data from its interactions with websites in MongoDB, an open-source document-oriented NoSQL database. MongoDB records data as field-name and field-value pairs, similar to JSON objects, and a document can include other documents, arrays, or arrays of documents. Thug records the URLs analyzed, analysis type, connections made, behaviors observed, shell codes identified, cookies downloaded, connection graphs, locations, and certificates. It stores collected file samples as GridFS chunks within MongoDB.
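As a rough illustration of the document model described above, the following sketch builds a nested record as a plain Python dictionary and serializes it to JSON. The field names and values are hypothetical and chosen only for illustration; they are not Thug's actual schema, and no MongoDB server is involved.

```python
import json

# Hypothetical analysis record; field names are illustrative, not Thug's schema.
analysis = {
    "url": "http://example.test/index.html",
    "analysis_type": "dynamic",
    "connections": [  # an array of embedded documents
        {
            "source": "http://example.test/index.html",
            "destination": "http://example.test/payload.js",
            "method": "GET",
        },
    ],
    "behaviors": ["ActiveX control use"],  # a simple array of values
    "certificate": {"issuer": "Example CA", "valid": False},  # an embedded document
}


def count_embedded_documents(doc):
    """Count dictionaries nested anywhere below the top-level document."""
    total = 0
    for value in doc.values():
        if isinstance(value, dict):
            total += 1 + count_embedded_documents(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    total += 1 + count_embedded_documents(item)
    return total


serialized = json.dumps(analysis, indent=2)
print(serialized)
print(count_embedded_documents(analysis))
```

The record mixes scalar fields, an array of values, an embedded document, and an array of embedded documents, which are the nesting forms the paragraph lists.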

Scrapy is an open-source Web-crawling [11] and Web-scraping framework supporting data mining, monitoring, and automated testing. We chose Scrapy for its speed and suitability for our needs. We used Scrapy to crawl random IP addresses and record responses from Web servers for further analysis with Thug. Random IPv4 addresses were generated using the Python package Faker. The Web crawl flow chart is shown in Fig. 2. Commercial Web scrapers like Googlebot and Bingbot scrape information in addition to crawling IP addresses. However, we only needed the secondary data sent by Web pages, not the pages themselves, and Scrapy sufficed.
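The generation of random candidate addresses can be sketched as follows. This is a minimal illustration that swaps in Python's standard `ipaddress` and `random` modules in place of Faker, and it assumes that non-routable addresses (private, multicast, reserved) should be skipped; the paper does not say whether such filtering was applied. The function names are hypothetical.

```python
import ipaddress
import random


def random_public_ipv4(rng):
    """Draw random 32-bit values until one is a globally routable unicast address."""
    while True:
        candidate = ipaddress.IPv4Address(rng.getrandbits(32))
        # Assumption: skip private, multicast, and reserved ranges.
        if candidate.is_global and not candidate.is_multicast and not candidate.is_reserved:
            return candidate


def candidate_urls(count, seed=None):
    """Build a list of HTTP URLs for randomly chosen public IPv4 addresses."""
    rng = random.Random(seed)
    return [f"http://{random_public_ipv4(rng)}/" for _ in range(count)]


urls = candidate_urls(5, seed=1)
for url in urls:
    print(url)
```

A list like this could then be handed to a crawler to check which addresses actually host a Web server; passing a seed makes a run repeatable.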

 

Fig. 2. Web-crawling process

We also used ClamAV (www.clamav.net), an open-source antivirus toolkit developed by Cisco Systems. ClamAV has a quick and lightweight interface from the command line. We used ClamAV to scan sample files downloaded during Thug's analysis to determine if they were malicious.
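Scan results of this kind can be post-processed with a few lines of code. The sketch below assumes the usual `clamscan` report style, in which each detection appears as `path: signature FOUND`, and separates potentially unwanted applications (signature names beginning with `PUA.`) from other detections. The sample lines and file paths are made up for illustration, and the helper names are hypothetical.

```python
def parse_clamscan_line(line):
    """Return (path, signature) for a detection line, or None for any other line."""
    line = line.strip()
    suffix = " FOUND"
    if not line.endswith(suffix):
        return None
    path, _, signature = line[: -len(suffix)].rpartition(": ")
    if not path:
        return None
    return path, signature


def split_malware_and_pua(lines):
    """Separate detections into malware and PUA lists based on the signature prefix."""
    malware, pua = [], []
    for line in lines:
        parsed = parse_clamscan_line(line)
        if parsed is None:
            continue
        if parsed[1].startswith("PUA."):
            pua.append(parsed)
        else:
            malware.append(parsed)
    return malware, pua


# Illustrative report lines; the paths are invented for this example.
sample_output = [
    "samples/a.bin: OK",
    "samples/b.html: PUA.Html.Trojan.Agent-37074 FOUND",
    "samples/eicar.com: Win.Test.EICAR_HDB-1 FOUND",
    "----------- SCAN SUMMARY -----------",
    "Infected files: 2",
]

malware, pua = split_malware_and_pua(sample_output)
print(malware)
print(pua)
```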

C.    Functional Testing of Thug

We tested the functions of Thug using publicly available exploits and exploit tools. The Metasploit Framework (www.metasploit.com) is an open-source penetration-testing tool to find, exploit, and validate vulnerabilities. Metasploit provides modules for exploiting different operating systems, applications, and platforms. For this research, we used Metasploit's browser-exploit modules to test Thug and assess its ability to identify browser and browser-plugin exploits used for drive-by downloads. The Exploit Database collects public exploits and the corresponding vulnerable software for penetration testers and vulnerability researchers. The database allows searching for exploits by title, CVE number, type, platform, and other parameters. When an exploit module was unavailable in Metasploit, we searched the Exploit Database and tried to add the exploit's source code as an HTML Web page on the malicious test server.

D.   Source Sites

Our first experiments used random IP addresses. Further experiments used a sample of source sites obtained from commercial threat intelligence vendors by our school. These sites had been "blacklisted" by various sources for sending data with known malicious signatures or otherwise showing suspicious behavior. However, they had not necessarily been observed to send drive-by downloads, and some may have been blacklisted for just happening to host malicious users. Nonetheless, they were a richer source of malicious client activity than the random IP addresses we tested first.

E.    Test Environment

We ran the Thug client on a DigitalOcean cloud platform outside our campus network. DigitalOcean offers cloud computing services and virtualized resources. They call their Linux virtualized platform a "droplet". Our DigitalOcean machine ran the Oracle VirtualBox hypervisor to host a Linux virtual machine on which Thug and the supporting tools ran. The Thug virtual machine (VM) contained Thug, Scrapy, MongoDB, ClamAV, the malicious test server, and Metasploit, as shown in Fig. 3. We connected to our servers through SSH and established SSH tunnels and virtual network computing (VNC) services for remote configuration and control. DigitalOcean does not implement firewall rules or intrusion-detection rules that could stop or impede HTTP responses to Thug. We also did not change the firewall configuration on our virtual machines and used the default settings. We configured both machines with Ubuntu Server 20.04 LTS. The DigitalOcean droplet machine had 16 GB of memory and 200 GB of secondary storage.

Fig. 3. Test environment setup

F.    Experimentation Plan

We ran three experiments with Thug's default configuration, emulating Windows XP running Internet Explorer 6.0 with Adobe Acrobat Reader 9.1.0, JavaPlugin 1.6.0.32, and Shockwave 10.0.64.0 plugins enabled. Experiment 1 was a functional test of Thug's ability to identify and classify drive-by exploits. This experiment analyzed a test Web server under our control. The server either ran the Metasploit modules by HTTP redirects or served HTML Web pages with malicious code from the Exploit Database. The Eicar anti-malware test file (www.eicar.org) served as the malicious payload for all the exploits. We selected the exploits based on the vulnerability identification modules in Thug's source code and several other exploits without corresponding modules. We assessed Thug's ability to identify which exploit was used in the drive-by download and successfully retrieve the anti-malware test file.

Our assessment criteria of Thug's performance were based on the amount of information Thug provided for each exploit.

•  A correct result meant that Thug correctly identified the exploit by name or provided an associated CVE number.

•  A partially correct result meant Thug did not identify the vulnerability by name but noted evidence of malicious activity by identifying the suspicious behavior or retrieving the exploit's malicious shell code. These clues were then used to identify the exploit based on its signatures.

•  A non-functional result meant we could not configure the exploit to work correctly, either through misconfiguration of the Metasploit software, the HTML code of the exploit, or an incompatible environment.

•  An incorrect result meant that Thug could not identify the exploit or provide evidence of malicious activity.
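The four assessment criteria can be expressed compactly as a decision function. This is only a sketch of the criteria as stated above: the three boolean inputs and the order in which they are checked are assumptions made for illustration, not a description of how the assessment was actually automated.

```python
from enum import Enum


class Result(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    NON_FUNCTIONAL = "non-functional"
    INCORRECT = "incorrect"


def assess(exploit_worked, named_or_cve, behavior_or_shellcode):
    """Map observations about one test exploit to an assessment category.

    exploit_worked: the exploit could be configured to run (hypothetical flag).
    named_or_cve: Thug reported the exploit name or an associated CVE number.
    behavior_or_shellcode: Thug flagged suspicious behavior or extracted shell code.
    """
    if not exploit_worked:
        return Result.NON_FUNCTIONAL
    if named_or_cve:
        return Result.CORRECT
    if behavior_or_shellcode:
        return Result.PARTIALLY_CORRECT
    return Result.INCORRECT


print(assess(True, False, True).value)
```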

Experiments 2 and 3 were a four-step process using Thug to analyze IP addresses. The first step found IP addresses or URLs with running Web servers. Experiment 2 used addresses identified using the Scrapy Web crawler to scan random IP addresses and check if a Web server was running on that machine. Experiment 3 used addresses obtained from a commercial blacklist. Experiments 2 and 3 both then fed the received URLs and IP addresses to Thug for its analysis. Then the data stored on MongoDB was reviewed for malicious activities and Thug's analysis. The final step extracted the collected sample files obtained in the analysis and scanned them for malware with ClamAV. The obtained malware was then categorized into different groups based on the analysis description provided by ClamAV.

III.    Results

Experiment 1 showed that Thug could identify our malicious Web server and its artifacts. Overall, Thug recognized 85 malicious exploits from a group of 99 known exploits (Table 1 and Table 2). Thug correctly identified 45 by name or by CVE number. It identified another 40 as malicious from general behavior observations such as ActiveX control abuse and malicious pixel IFrames, or by extracting and logging the malicious shell code of the exploit. In 13 other cases, the exploit failed to deposit malicious files on our victim machine due to either misconfiguration of the exploit or an incompatible environment. In only one case did Thug fail to identify the malicious nature of the site at all. It appears that Thug has kept up to date with obfuscations and other deception associated with drive-by downloads.

Experiment 2 directed Thug to test random IP addresses. As expected, most results were uninteresting since malicious sites are statistically rare. Minor anomalies of these sites were flagged on occasion. Overall, Experiment 2 crawled and analyzed 37,415 Web sites. Thug identified 2,054 Web servers with suspicious behavior, all of which was abuse of ActiveX control GET and POST methods. During the analysis, 146,768 file samples were downloaded, of which ClamAV identified 18 as infected with malware and another 230 as potentially unwanted applications (PUA).

Table 1. Cyberattacks tested on our malicious Web server by Thug. The "AA?" column marks instances for which additional analysis was required to identify the exploit, by analyzing signatures based on either extracted shell code ("SC") or malicious behavior ("MB").

Exploit                                   CVE         AA?
AIM goaway                                2004-0636   MB
AOL Radio AmpX                            N/A
Adobe "Collab.getIcon()"                  2009-0927   MB
Adobe "Doc.media.newPlayer"               2009-4324   MB
Adobe Flash Player "newfunction"          2010-1297   MB
Adobe "util.printf()"                     2008-2992   MB
Creative Software AutoUpdate Engine       2008-0955   SC
MS DirectShow "msvidctl.dll"              2008-0015   SC
IBM Lotus Domino DWA Upload Module        2007-4474
IBM Lotus "inotes6.dll"                   2007-4474   SC
EnjoySAP SAP GUI                          2008-4830
Facebook Photo Uploader                   2008-5711
Gateway WebLaunch                         2008-0220   SC
GOM Player                                2007-5779
ICQ Toolbar                               2008-7136
MS14-064                                  2014-6342   MB
MSXML Memory Corruption                   2012-1889   SC
Macrovision Installshield                 2007-5660
Macrovision FlexNet                       2008-4586
MS IE XML                                 2006-5745
NCTAudioFile2                             2007-0018
RealPlayer "ierpplug.dll"                 2007-5601   SC
Apple QuickTime                           2007-6166   SC
Shockwave rcsL                            2010-3653   SC
MS Silverlight                            2013-3896   MB
Microsoft Access                          2008-2463
SonicWALL NetExtender                     2007-5603
MS OWC Spreadsheet                        2009-1534   SC
BaoFeng Storm                             2009-1612
Symantec AppStream                        2008-4388
Symantec BackupExec                       2007-6016
MS Visual Studio                          2008-3704   SC
MS Media Encoder                          2008-3008
MS Internet Explorer Unsafe Scripting     N/A
MS IE WebViewFolderIcon                   2006-3730
WinZip FileView                           2006-5198
Winamp Playlist UNC Path                  2006-0476   SC
HP LoadRunner                             2007-6530
Yahoo! Messenger                          2007-4515
Zenturi ProgramChecker                    2007-2987
Adobe CoolType                            2010-2883   MB
Adobe Flash Player                        2011-2110   MB

Table 2. Further cyberattacks tested.

Exploit                                   CVE         AA?
Adobe Flash Player copyPixels             2014-0556   MB
IBM Lotus Notes Client                    2012-2174   SC
Java 7 Applet                             2012-4681   MB
ADODB.Recordset                           2006-3354
AnswerWorks                               2007-6387
Baidu Search Bar                          2007-4105   SC
BitDefender Online Scanner                2007-6189
ChinaGames "CGAgent.dll"                  2009-1800
GlobalLink 2.7.0.8                        2007-5722   SC
DivX Player 6.6.0                         2008-0090   SC
D-Link MPEG4 SHM Audio Control            2008-4771
Xunlei Web Thunder                        2007-5064   SC
Lycos FileUploader                        2008-0443   SC
Ourgame "GLIEDown2.dll"                   N/A
HP Compaq Notebooks                       2007-6333
Clever Internet                           2007-4067   SC
Java Deployment Toolkit                   2010-0886
jetAudio 7.x                              2007-4983
Move Networks                             2008-1044
MS Rich Textbox                           2008-0237   SC
MySpace Uploader                          2008-0659
Sejoong Namo                              2008-0634
NeoTracePro 3.25                          2006-6707   SC
Nessus Delete File                        2007-4031
Nessus Command Execution                  2007-4062
Office Viewer OCX                         2007-2588   SC
Xunlei XPPlayer                           N/A         SC
Cisco Linksys PTZ Cam                     2012-0284   SC
Move Networks Quantum Streaming Player    2008-1044
Qvod Player 2.1.5                         2008-4664   SC
Rediff Bol Downloader                     2006-6838   SC
Rising Scanner                            N/A         SC
Sina DLoader Class                        2008-6442
StreamAudio Chaincast                     2008-0248
Toshiba Surveillance                      2008-0399
UUSee "Update"                            2008-7168   SC
VLC Remote Bad Pointer                    2007-6262
Firefox "WMP"                             2010-2745   SC
MS IE Remote Wscript                      2004-0549
Yahoo! JukeBox                            2008-0625
Yahoo! Messenger CYFT                     2007-5017   SC
Yahoo! Messenger "YVerinfo.dll"           2007-4515

 

The PUAs collected consisted of trojans, adware, and trackers. One of the PUAs that ClamAV identified was "PUA.Html.Trojan.Agent-37074". A VirusTotal scan showed that 24 of 61 antivirus tools flagged this file as malicious. Microsoft and Trend Micro security blogs identified the PUA as Exploit:HTML/Phominer.A and Trojan.HTML.IFRAME.FASGU respectively [16][17]. Their reports indicate this PUA is dropped onto victim computers by other malware or unknowingly downloaded when visiting malicious Web sites. This PUA can steal information from the victim's computer and embed malicious IFrames to redirect users to other malicious sites.

Fig. 4 shows a typical snippet of malicious JavaScript code that was collected by Thug and used to download one malware sample. In this snippet, function calls hide the download of the malware and run its startup executable after download completion.

Fig. 4. Malicious JavaScript code snippet

Experiment 3 used Thug to analyze and collect data from a commercial blacklist of IP addresses and domain names. The results of this experiment were more interesting, showing fewer observed malicious behaviors in website interaction (e.g., content delivery by ActiveX) but more kinds of malware. We analyzed 83,667 Web servers, of which Thug identified 953 with malicious activity. As seen in Experiment 2, all malicious behaviors observed on the blacklist Web servers used ActiveX control abuse with GET and POST methods. During the analysis, 602,731 file samples were collected, of which ClamAV identified 2,043 as malware and an additional 869 as PUA. The lower number of observed malicious behaviors may indicate that the blacklist only contains Web servers that hold malicious payloads and not the exploits used to initiate the drive-by download. Table 3 shows the results of Experiments 2 and 3. None of the random Web servers scanned in Experiment 2 was in the blacklist for Experiment 3.

Table 3. Results from Experiments 2 and 3. Percentages are relative to the total number of sites in each sample.

 

Source        Sites     Malicious Behavior   Malware         PUA
Blacklist     83,667    953 (1.14%)          2,043 (2.44%)   869 (1.03%)
Random Web    37,415    2,045 (5.47%)        18 (0.04%)      230 (0.61%)
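The percentages in Table 3 are each count divided by the number of sites in the corresponding sample. The sketch below recomputes them from the counts in the table; the dictionary layout and function names are hypothetical. Note that the malware and PUA counts are files, so those columns express files per hundred sites, and a recomputed value can differ from the table in the last digit depending on whether it is rounded or truncated.

```python
# Counts copied from Table 3; percentages are recomputed, not copied.
TABLE3 = {
    "Blacklist": {"sites": 83667, "malicious_behavior": 953, "malware": 2043, "pua": 869},
    "Random Web": {"sites": 37415, "malicious_behavior": 2045, "malware": 18, "pua": 230},
}


def percent_of_sites(count, sites):
    """Express a count as a percentage of the number of sites analyzed."""
    return 100.0 * count / sites


def rates(table):
    """Compute the percentage for every non-site column of every row."""
    result = {}
    for source, row in table.items():
        sites = row["sites"]
        result[source] = {
            key: percent_of_sites(value, sites)
            for key, value in row.items()
            if key != "sites"
        }
    return result


for source, row in rates(TABLE3).items():
    print(source, {key: f"{value:.2f}%" for key, value in row.items()})
```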

IV.    Discussion

Although Thug performed very well in categorizing and identifying different drive-by exploits on our malware server in Experiment 1, we did not observe the same range of exploits used in Web servers obtained through our random IP address samples or the servers in the blacklist. ActiveX controls are a common method many websites use to display information or provide interactivity, but they can be used maliciously to collect information or install malware. Our results may indicate that drive-by downloads using methods besides ActiveX controls are rare.

While there was little variation in the type of drive-by exploits used to deliver malware to our test system, the collected malware did vary considerably. Experiment 2 provided 13 unique malware files, while Experiment 3 provided 163 unique malware files. We classified the malware into 12 broad categories based on the description of the malware provided by ClamAV, shown in Fig. 5. Malware listed in the "Other" category contained a variety of malware targeted at specific applications, such as Microsoft Office document macros, HTML, and Java applications.
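One way such a grouping could be automated is to split ClamAV signature names into their dotted fields. The sketch below assumes names of the form `Platform.Category.Name-id`, optionally prefixed with `PUA.`, and tallies detections by platform and by category. It is an illustration only and does not reproduce the 12 categories used here; apart from the PUA name reported in this paper, the signature names in the sample list are invented.

```python
from collections import Counter


def split_signature(name):
    """Split a signature name into (is_pua, platform, category).

    Names with fewer than three dotted fields are reported as Unknown/Other.
    """
    parts = name.split(".")
    is_pua = parts[0] == "PUA"
    if is_pua:
        parts = parts[1:]
    if len(parts) < 3:
        return is_pua, "Unknown", "Other"
    return is_pua, parts[0], parts[1]


def tally(names):
    """Count detections by target platform and by category."""
    platforms, categories = Counter(), Counter()
    for name in names:
        _, platform, category = split_signature(name)
        platforms[platform] += 1
        categories[category] += 1
    return platforms, categories


# Hypothetical signature names, except the PUA name reported in the paper.
detections = [
    "Win.Trojan.Example-1",
    "Win.Ransomware.Example-2",
    "Unix.Trojan.Example-3",
    "PUA.Html.Trojan.Agent-37074",
    "Unparsed-Name",
]

platforms, categories = tally(detections)
print(platforms)
print(categories)
```

Tallying by platform in this way would also support comparisons of the operating systems targeted in each sample pool.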

Fig. 5. Malware frequency by malware type and sample pool

In Experiment 2, about 5.5% of random Web servers showed signs of malicious activity, which is similar to previous experiments [4][9][14]. However, not all of the 2,054 malicious interactions resulted in the download and execution of malware. This suggests many malicious actors find it more profitable to collect information on unsuspecting users than to download malware to their machines. This suspicion is further supported by the fact that the random Web servers delivered more PUAs than malware files, as PUAs often are spyware, adware, or trackers.

Experiment 3 showed interesting results about the amount of malicious behavior observed compared to the amount of malware retrieved. The significant difference between malicious behaviors observed and malware downloaded is likely due to the types of servers included in the blacklist, which typically host the malware payload instead of using malicious redirects or pulling contents from malicious servers. Legitimate Web servers may be compromised by malicious advertisements (malvertising) or attackers embedding malicious redirects. The goal of the blacklist is to prevent the malware from reaching the unsuspecting user without limiting access to legitimate websites. The comparatively lower number of PUAs may indicate that blacklisted Web servers have more ambitious malicious intentions than most Web servers.

Comparisons of the malware collected in Experiments 2 and 3 show interesting results. The only ransomware seen was from the random Web servers used in Experiment 2. Ransomware is particularly damaging to victims, but the random Web server sample yielded mostly PUAs and only a few instances of serious ransomware. Another interesting observation was the difference in the operating systems targeted by the malware in the two experiments. Experiment 2 showed malware almost entirely targeted at Windows operating systems. However, most malware in the blacklist targeted Unix and Linux systems. Our test environment uses a Linux-based operating system to host Thug, but we configured Thug to emulate a Windows XP system. The extensive occurrence of Unix-targeted malware suggests that the blacklisted Web servers may be using more sophisticated techniques to identify the host operating system of the victim machine, while the malicious Web servers from the random sample do not. This observation suggests Thug's deception capabilities could be improved to better fool servers.

V.    Conclusions

We demonstrated methods for finding and collecting malware for analysis using client-side honeypots such as Thug. We tested Thug's functionality, and our experiments confirmed its usefulness in collecting empirical data in a real-world scenario. Its benefit can only be seen when examining the relatively rare malicious sites since it finds many uninteresting anomalies on randomly chosen sites. However, further analysis of our experiment data could yield interesting trends in the types of malware collected or the interactions between the client and Web server that led to the download of the malware. Hence, client-side honeypots definitely provide added value to standard anti-malware tools.

Acknowledgment

Opinions expressed are those of the authors and do not represent the U.S. Government.

References

[1]       Clementson, C. (2009). Client-side threats and a honeyclient-based defense mechanism, Honeyscout [Linköpings Universitet, Institutionen för systemteknik]. Retrieved June 27, 2022, from http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-20104

[2]       Dell'Aera, A. (2022). Thug Documentation (3.14) [Python]. https://thug-honeyclient.readthedocs.io/en/latest/genindex.html.

[3]       Grimes, R. A. (2017). Honeypots. In Hacking the Hacker (pp. 107-110). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119396260.ch19

[4]       Ikinci, A., Holz, T., & Freiling, F. (2008). Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients. https://madoc.bib.uni-mannheim.de/27368/

[5]       Invernizzi, L., Comparetti, P. M., Benvenuti, S., Kruegel, C., Cova, M., & Vigna, G. (2012). EvilSeed: A Guided Approach to Finding Malicious Web Pages. 2012 IEEE Symposium on Security and Privacy, 428-442. https://doi.org/10.1109/SP.2012.33

[6]       Joshi, R. C., & Sardana, A. (2011). Honeypot: A new paradigm to information security.

[7]       Obied, A., & Alhajj, R. (2009). Fraudulent and malicious sites on the Web. Artificial Intelligence, 30, 223-120.

[8]       Pinoy, J., Van Den Broek, F., & Jonker, H. (2021). On the awareness of and preparedness and defenses against cloaking malicious web content delivery. M. S. thesis, Open University of the Netherlands, August 2021. www.open.ou.nl/hjo/supervision/2021-jeroen-pinoy-msc-thesis.pdf.

[9]       Qassrawi, M. T., & Zhang, H. (2011). Detecting Malicious Web Servers with Honeyclients. Journal of Networks, 6(1), 145-152. https://doi.org/10.4304/jnw.6.1.145-152

[10]     Rowe, N. C., & Rrushi, J. (2016). Introduction to Cyberdeception. Springer International Publishing. https://doi.org/10.1007/978-3-319-41187-3

[11]     Velkumar, K., & Thendral, P. (2020). Web crawlers and Web crawler algorithms: a perspective. International Journal of Engineering and Advanced Technology (IJEAT), 9(5), 203-205. https://doi.org/10.35940/ijeat.E9362.069520

[12]     Wang, Y.-M., Beck, D., Jiang, X., & Roussev, R. (2005). Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities. Proc. NDSS Symposium 2006. https://www.ndss-symposium.org/ndss2006/automated-web-patrol-strider-honeymonkeys-finding-web-sites-exploit-browser-vulnerabilities/ .

[13]     Zulkurnain, N., Rebitanim, A., & Malik, N. (2018). Analysis of THUG: A Low-Interaction Client Honeypot to Identify Malicious Websites and Malwares. 2018 7th International Conference on Computer and Communication Engineering (ICCCE), 135-140. https://ieeexplore.ieee.org/servlet/opac?punumber=8510540

[14]     Keats, S., Nunes, D., & Greve, P. (2007). Mapping the Mal Web. McAfee SiteAdvisor, 12.

[15]     Le, V. L., Welch, I., Gao, X., & Komisarczuk, P. (2013). Anatomy of drive-by download attack. 138, 49-58.

[16]     Exploit:HTML/Phominer.A threat description - Microsoft Security Intelligence. (2017). Retrieved October 16, 2022, from https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Exploit%3AHTML%2FPhominer.A

[17]     Fuentebella, C. Trojan.HTML.IFRAME.FASGU - Threat Encyclopedia. (2021). Retrieved October 16, 2022, from https://www.trendmicro.com/vinfo/us/threat-encyclopedia/malware/trojan.html.iframe.fasgu/


 



[1] This work was supported by the U.S. Department of Energy and the Defense Intelligence Agency.