Global Internet threats are undergoing a profound transformation, from attacks designed solely to disable infrastructure to attacks that also target people and organizations. This alarming new class of attacks directly affects the day-to-day lives of millions of people and endangers businesses and governments around the world. At the centre of many of these attacks is a large pool of compromised computers located in homes, schools, businesses, and governments around the world. Attackers use these zombies as anonymous proxies to hide their real identities and to amplify their attacks. Bot software enables an operator to remotely control each system and to group the systems together to form what is commonly referred to as a zombie army or botnet, that is, a network of compromised machines that can be remotely controlled by an attacker. In this paper we propose an approach that uses honeynet data collection mechanisms to detect IRC- and HTTP-based botnets. We have evaluated our approach using real-world network traces.
‘Bot’ is a shortened derivative of ‘robot’: a program that operates as an agent, enabling a user or another program to simulate human activity. An attacker can control a large number of bots across a botnet with a single command. Botnets (or networks of zombies) are recognised as one of the most serious security threats today. Fig 1 depicts the communication flow in a botnet.
Our research makes several contributions. First, we propose a behaviour-based approach that identifies both IRC and HTTP C & C in a port-independent manner by extracting command sequences from network traffic. Second, we develop a system based upon our behaviour-based algorithm. The rest of the paper is organised as follows. In section 2, we provide background on botnet C & C and the motivation for our botnet detection approach. In section 3, we describe the usefulness of honeypots in our detection approach. In section 4, we present the architecture of our system and describe its detection algorithm in detail. In section 5, we present experiments and results, and we conclude in section 6.
Background and Motivation
Bot
A bot is a piece of malware that installs itself on a weakly protected computer by exploiting vulnerabilities present on the machine. By converting the victim into a zombie computer, a bot adds the machine to a network of zombies called a botnet, which is remotely controlled by a set of masters known as botnet controllers.
A botnet is a network of compromised computers maintained and controlled by a set of bot masters. These masters use bots to increase and control the number of zombies in the network. Bot masters control the botnet through a command-and-control (C&C) mechanism. Therefore, they are often referred to as C&C servers. C&C servers often liaise with other C&C servers to achieve greater redundancy [Bot].
In the last few years, the botnet phenomenon has attracted the attention of the security research community. One of the first systematic studies was published in March 2005 by the Honeynet Project, which studied about 100 botnets over a period of four months. A more methodical approach was introduced by Freiling et al., who used the same amount of botnet data for their study. Cooke et al. outlined the origins and structure of botnets and presented some results based on a distributed network sensor system and honeypots. They do not give detailed results that characterize the extent of the botnet problem. In contrast to these studies, our solution is the automated analysis of binary samples in a honeynet environment. We propose a completely automated prototype to detect C & C servers in IRC and HTTP botnets. We can observe trends and long-term effects of the botnet phenomenon, such as the average lifetime of a botnet, that could not be observed in previous studies. A transport-layer-based botnet detection approach was introduced by Karasaridis et al. They use passive analysis based on flow data to characterize botnets and were able to detect several hundred controllers over a period of seven months. However, such a flow-based approach cannot provide insight into the botnet and the control structure itself. In our study, we can also observe the commands issued by botherders, the malware binary executables used, and similar internal effects of a botnet. Canavan, as well as Barford and Yegneswaran, presented an alternative perspective on IRC-based botnets based on in-depth analysis of bot source code. We also analyzed the source code of several bot families, such as SdBot and Agobot, which can be freely downloaded from the Internet, to get a better understanding of some of the effects we monitored during our observations. In our study, we focus only on the C & C server and the commands exchanged between the bot and the botnet.
Our Proposed Approach
We have developed an algorithm based on the run-time network behaviour of bots and the corresponding command sequences used in the conversation between a bot and its C & C server. We propose an approach that uses network-based anomaly detection to identify C & C command sequences. Fig 1 shows our malware collection prototype. The goal of malware collection is to collect as many binaries as possible. However, developing a scalable and robust infrastructure to achieve this goal is a challenging problem in its own right and has been the subject of numerous research initiatives (e.g., [10, 11]). In particular, any malware collection infrastructure must support a wide array of data collection endpoints and should be highly scalable. Our collection prototype pursues this goal by varying the services and configurations of the honeypots. We have established the Distributed Honeynet Prototype using three different Internet service providers. Before we can discover what the risks in the network are, we need to discover how attack code interacts with the system. To this end, we propose a collection system that gathers malware for dynamic analysis. This system also provides protection against significant involvement in attacks after a bot has been run on it: it uses firewall and intrusion prevention techniques, such as limiting or dropping packets leaving the protected network. Our proposed architecture systematically collects malware over the Internet.
Fig. 1. Malware Collection Framework
Architecture and Algorithm
In this section we discuss the components of the algorithm without any specific tools in mind. Any tool that can perform the tasks described here can be used as part of the solution. Fig 2 illustrates the logical structure of the proposed solution. The input to our system is the set of bot binaries collected via the honeynet system, a malware collection platform. There are three main components: the Honeynet Based Execution Environment, the Payload Parser, and the Correlation System Component. We collected malware through a distributed deployment of the malware collection framework; the samples were then passed to the Symantec Anti-Virus engine to classify them as bot or non-bot samples. The bot binaries are then automatically passed to the honeynet-based open analysis environment.
Honeynet Based Execution Environment
We have developed a honeynet-based open analysis execution environment in which bot binaries are executed for 30 minutes at a time using different timestamps. We set up a VMware environment on a server with an Intel processor, running a fully patched instance of Windows XP that is assigned a static, public IP address and infected with one bot for a period of 30 minutes. Our code reads the bot binaries from a file one by one and executes each of them in the open analysis environment; the results generated, including the complete payload, are sent to the central server as a file.
Fig. 2. System Architecture
Our open analysis system provides a connection to the Internet. The honeynet-based execution environment allows us to inject a malicious bot sample into a system and let it connect back to its original destination. This enables us to isolate the bot from the network and monitor its traffic in a more controlled way, instead of waiting to be infected and then monitoring the traffic passively. Its traffic is then segregated based on application content so that we can observe network behaviour in terms of source IP address, destination IP address, source port, destination port, and the command sequences of each flow.
In this section, we discuss how we record the bot network traffic that our system requires in order to analyse it for the presence of response activity. Since no other applications run and generate traffic, the bot accounts for all network traffic under its host VM’s IP address. Once the response activity is located, we can extract a snippet from the network traffic that precedes the start of the response and thus likely contains the corresponding command. Moreover, we can collect behaviour profiles, which describe the properties of the bot response behaviour. Note that we have made the deliberate decision to observe the behaviour of the bots while they are connected to the actual botnet. This allows us to detect command and control traffic without any prior knowledge of the protocol and commands that are used between the bot and the botmaster.
However, at the same time, we do not want the bots that we are analysing to engage in serious and destructive malicious activity such as denial-of-service attacks. Thus, we use a firewall that rate-limits all outbound network traffic. After every 30-minute data capture period, the virtual machine is recreated in a clean state before the next bot sample is executed.
After the bot binary has been executed, the complete payload is extracted and sent to the central server so that we can parse it with the payload parser to extract the command token signatures for IRC and HTTP botnets. To capture all network traffic generated by the virtual environment, we use a Honeywall. The Honeywall is able to capture all network packets that are sent and received by the image. These packets are merged into a PCAP file and sent to the central server.
We identified a need for a stepwise reduction of the data set to a meaningful subset of flows. Selecting the cut-off for quick filtering for data reduction requires both quantitative statistical information and human judgement. The first filter selects TCP-based packets only. The second filter removes packets containing the SYN and RST flags. Flows containing only TCP packets with SYN and RST flags indicate that communication was never established, and so provide no information about botnet C & C flows. No application-level data was transferred by these flows. Unfortunately, on today’s Internet, probes for system vulnerabilities are commonplace. While SYN-RST exchanges indicate suspicious activity that may be worth investigating, they do not assist with characterising botnet C & C flows.
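The two filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our system: packets are assumed to be already decoded into simple records, and the names `Packet`, `flow_key`, and `reduce_flows` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical decoded packet record (field names are assumptions)."""
    proto: str            # "TCP", "UDP", "ICMP", ...
    src: str
    sport: int
    dst: str
    dport: int
    flags: frozenset      # TCP flag names, e.g. frozenset({"SYN", "ACK"})
    payload_len: int = 0


def flow_key(pkt):
    """Direction-independent key so both halves of a conversation group together."""
    a, b = (pkt.src, pkt.sport), (pkt.dst, pkt.dport)
    return (a, b) if a <= b else (b, a)


def reduce_flows(packets):
    """Filter 1: keep TCP packets only.
    Filter 2: drop flows in which every packet carries SYN or RST,
    i.e. flows where communication was never established."""
    flows = defaultdict(list)
    for pkt in packets:
        if pkt.proto == "TCP":
            flows[flow_key(pkt)].append(pkt)

    kept = {}
    for key, pkts in flows.items():
        never_established = all(pkt.flags & {"SYN", "RST"} for pkt in pkts)
        if not never_established:
            kept[key] = pkts
    return kept
```

A flow survives the second filter as soon as it contains one packet without SYN or RST, for example the final ACK of the handshake or a data segment.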
Command Token Based Payload Parser
We feed the composite payload corresponding to a bot binary to our payload parser, which extracts the activity response (e.g., scanning, spamming, binary update) and message response (e.g., IRC PRIVMSG) command sequences from the payload. The payload parser uses a port-independent protocol matcher to find suspicious IRC and HTTP traffic. This port-independent property is important because many botnet C & C channels may not use the regular ports. IRC and HTTP connections are relatively simple to recognise. For example, an IRC session begins with connection registration (defined in RFC 1459), which usually consists of three messages: PASS, NICK, and USER. We can easily recognise an IRC connection using lightweight payload inspection, e.g., by inspecting only the first few bytes of the payload at the beginning of a connection. The HTTP protocol is even easier to recognise because the first few bytes of an HTTP request are a method token such as "GET", "POST", or "HEAD".
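The lightweight inspection described above can be illustrated with a short sketch. It assumes that the first payload bytes of a connection are available as a byte string; the function name `classify_payload` and the 64-byte inspection window are illustrative choices, not details of our parser.

```python
# IRC connection registration messages (RFC 1459) and the HTTP methods
# named in the text. Ports are never consulted.
IRC_REGISTRATION = {b"PASS", b"NICK", b"USER"}
HTTP_METHODS = {b"GET", b"POST", b"HEAD"}


def classify_payload(payload: bytes) -> str:
    """Return "IRC", "HTTP", or "UNKNOWN" from the first token of a connection."""
    tokens = payload[:64].split(None, 1)  # split on ASCII whitespace
    if not tokens:
        return "UNKNOWN"
    first = tokens[0]
    if first in HTTP_METHODS:             # HTTP methods are case-sensitive
        return "HTTP"
    if first.upper() in IRC_REGISTRATION:  # IRC commands are case-insensitive
        return "IRC"
    return "UNKNOWN"
```

Because only the leading token is examined, the check costs a few byte comparisons per connection, regardless of which port the C & C server listens on.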
Correlation System Component
Our next stage, correlation, looks for relationships between two or more bot binaries that suggest that they are part of the same botnet. The question of whether one bot is correlated with another only makes sense if the two are connected to the same C & C server. There are several temporal correlation algorithms for this purpose, but all are equally computationally expensive. We have therefore decided to apply our own algorithm, which groups related flows into the same cluster. We use the payload command signatures and the network fingerprints of the flows of the bot binaries. If two bots are connected to the same C & C server and receive the same type of command sequence, they are clustered into one group.
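A simplified sketch of this grouping rule is given below. It assumes that each executed sample has already been reduced to a C & C server address and an ordered command sequence, and it treats "same type of command sequence" as an exact match, which is a simplifying assumption. The name `cluster_bots`, the sample identifiers, and the addresses are hypothetical.

```python
from collections import defaultdict


def cluster_bots(observations):
    """Group samples that share a C & C server and a command sequence.

    observations: iterable of (sample_id, cc_server, command_sequence).
    Returns a dict mapping (cc_server, command_tuple) -> list of sample ids.
    """
    clusters = defaultdict(list)
    for sample_id, cc_server, commands in observations:
        clusters[(cc_server, tuple(commands))].append(sample_id)
    return dict(clusters)


# Hypothetical observations using documentation-range addresses.
observations = [
    ("sample_a", "192.0.2.10:6667", ["NICK", "USER", "JOIN", "PRIVMSG"]),
    ("sample_b", "192.0.2.10:6667", ["NICK", "USER", "JOIN", "PRIVMSG"]),
    ("sample_c", "198.51.100.7:80", ["GET", "POST"]),
]
clusters = cluster_bots(observations)
```

A single pass with a dictionary keyed on server and sequence avoids pairwise comparison of samples, at the price of missing bots whose sequences differ only slightly.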
Experiment and Analysis Results
In our experimental set-up, VMware Workstation is used to create the default installation of Windows XP. Network traffic and system traces are captured and analysed in a live execution environment. To capture all the data generated by the virtual environment, we use a Honeywall. The Honeywall is able to capture all network packets that are sent and received by the image. These packets are merged into a PCAP file and sent to the central server every hour. We have found it useful to split the PCAP files into 1-hour segments, since segmenting makes it easier to locate the suspicious data. We feed the PCAP data to our payload parser, which filters out the unused data and extracts the command sequences of the conversation between bot and C & C server. Our algorithm is able to detect IRC- and HTTP-based C & C servers. With the help of our malware collection framework we collected 650 unique malware samples between November 2009 and July 2010. Of these, 59 malware samples were classified as bots by the AV engine. Most of the bots that we have actively examined use some type of systematic scan, presumably for propagation. ICMP ping scans were used most often. Approximately 37.5% of the samples performed ICMP ping scans on different subnets. Fig. 3 is a snapshot captured using Wireshark.
Applying to IRC:
Most of the IRC communication is with specific IP addresses; one of the samples downloads abc.exe from 22.214.171.124 using port number 5751. Fig. 4 is a snapshot of the TCP follow stream, which includes the USER, NICK, MODE, JOIN, and USERHOST commands. With the help of Sebek traces, we also observed that most of the samples ran Cmd.exe, ping.exe, svchost.exe, HelpSvc.exe, explorer.exe, cndrive32.exe, and msvmiode.exe.
Most of the propagation scan activity is performed with the asc command. The syntax of the propagation scan command is .asc <port#> <threads> <delay> <time> <switches>. For example, .asc exp_all 25 5 0 -b -r -e corresponds to a randomised (-r switch), Class B (-b switch) subnet scan using 25 threads with a 5-second delay for an infinite amount of time. Malware rarely designates a time for the scan to finish, so 0 is used to express an infinite amount of time.
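To make the field layout concrete, the following sketch parses the example command into its parts. It is an illustration only: the `AscCommand` record and `parse_asc` function are hypothetical names, and only the fields described above are interpreted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AscCommand:
    target: str                # first argument, e.g. "exp_all" or a port number
    threads: int
    delay_seconds: int
    duration: Optional[int]    # None means no time limit (0 in the command)
    switches: Tuple[str, ...]  # e.g. ("-b", "-r", "-e")


def parse_asc(line: str) -> AscCommand:
    """Parse '.asc <port#> <threads> <delay> <time> <switches>'."""
    tokens = line.split()
    if len(tokens) < 5 or tokens[0] != ".asc":
        raise ValueError(f"not an .asc command: {line!r}")
    target, threads, delay, duration = tokens[1:5]
    return AscCommand(
        target=target,
        threads=int(threads),
        delay_seconds=int(delay),
        duration=None if int(duration) == 0 else int(duration),
        switches=tuple(tokens[5:]),
    )


cmd = parse_asc(".asc exp_all 25 5 0 -b -r -e")
```

For the example command, `cmd.threads` is 25, `cmd.delay_seconds` is 5, and `cmd.duration` is `None` because the time field is 0.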
Fig. 3. ICMP Ping scans
Applying to HTTP:
In the case of HTTP botnets, we observed the communication of bots with web-based C & C servers by identifying the GET, HEAD, and POST parameters. Most of our results show HTTP-based C & C communication together with the download of some executable. The following snapshots show the HTTP communications.
Fig. 4. IRC Bot Communication
Below are the snapshots captured using the Wireshark tool.
Below is the result for one of the samples, showing that it downloads an executable from 208.053.183.222:
126.96.36.199.01044-208.053.183.222.00080: GET /0calc.exe HTTP/1.1
188.8.131.52.01043-208.053.183.124.00080: GET /mjsn.exe HTTP/1.1
208.053.183.222.00080-184.108.40.206.01044: HTTP/1.1 200 OK
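Flow records of the form source.port-destination.port followed by the first payload line, as in the output above, can be split into endpoints and request fields with a short parser. This is a sketch under the assumption that every record follows exactly this layout; `parse_flow_line` and the two regular expressions are illustrative and not part of our tool chain.

```python
import re

# "<src ip>.<src port>-<dst ip>.<dst port>: <first payload line>"
FLOW_LINE = re.compile(
    r"^(\d+(?:\.\d+){3})\.(\d+)-(\d+(?:\.\d+){3})\.(\d+):\s*(.*)$"
)
REQUEST = re.compile(r"^(GET|POST|HEAD) (\S+) HTTP/\d\.\d$")


def parse_flow_line(line: str):
    """Return a dict describing one flow record, or None if it does not match."""
    match = FLOW_LINE.match(line.strip())
    if match is None:
        return None
    src, sport, dst, dport, message = match.groups()
    record = {
        "src": src, "sport": int(sport),
        "dst": dst, "dport": int(dport),
        "message": message, "method": None, "path": None,
    }
    request = REQUEST.match(message)
    if request is not None:
        record["method"], record["path"] = request.groups()
    return record


record = parse_flow_line(
    "126.96.36.199.01044-208.053.183.222.00080: GET /0calc.exe HTTP/1.1"
)
```

Filtering the parsed records for paths ending in `.exe` gives a quick list of candidate secondary downloads per destination address.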
Figure 5 shows the snapshot of secondary infection using HTTP communication.
From our experimental results and analysis, we conclude that most IRC-type C&C servers use ICMP scans. The IRC tokens found in the payload are PING, PONG, JOIN, USER, MODE, PRIVMSG, and NICK, and the attack-specific commands found in the payload are DDOS and VSCAN. Most HTTP-type C&C servers also use ICMP scans. The HTTP tokens found in the payload are HTTP, GET, and POST, and the observed activity includes downloading executable files such as /rbf.exe and /0calc.exe and sending spam mails.
Fig. 5. HTTP snapshot
Botnets have become one of the most serious threats to Internet security, and many cybercrimes are botnet related. Botnet detection is a relatively new and very challenging research area. Our results show that the botnet problem is of global scale. In this paper, we presented a system and architecture for network-based botnet detection. As future work, we are moving towards P2P botnets.