Research Statement

Introduction

The Internet is where people now conduct much of their daily life: banking, shopping, reading news and many forms of social interaction. Many businesses depend on Internet-hosted services to operate. Growth in Internet use has been accompanied by growth in Internet-based security threats, forcing organizations to respond with advanced protection mechanisms.

One such mechanism is the deployment of intrusion detection devices: network devices that scan traffic for patterns of known viruses and malicious activities (Scarfone and Mell, 2007). An emerging problem is that the encryption used to protect the confidentiality of communications also masks them from inspection. Consequently, malware is less likely to be detected, which increases the likelihood of computers being exposed to attacks (Koch et al., 2014).

It is therefore essential to develop the capabilities of intrusion detection systems so that they can operate effectively even when communications are encrypted. Techniques for analysing data sets without breaking encryption have already been shown to be effective (Wright et al., 2006). I believe these techniques can be used to improve traditional intrusion detection systems and enhance their capability to detect malicious data flows.

Research Positioning

Intrusion detection systems are typically deployed to analyse network traffic at the edge of network segments, and utilise a combination of detection methodologies including signature-based detection, anomaly-based detection and stateful protocol analysis (Scarfone and Mell, 2007). There have been advances in effectiveness and performance to cope with increased traffic loads: Mukherjee and Sharma (2012) developed techniques to reliably reduce the data set needed for malicious content detection, and Masud et al. (2011) took this further by developing machine-learning toolsets that correlate events to detect more complex behavior, e.g. botnet[1] communications.

When encryption is used to protect network communications, malicious activity can be masked from intrusion detection systems (Liao et al., 2013). A number of approaches have been investigated to mitigate this impact.

Goh (2010) proposed a system in which user computers send a copy of their network communications to an intrusion detection system as well as to the intended recipient. To maintain privacy, the communication packets are split into fragments, which are securely transmitted to individual devices within a detection cluster. In this way no single device has all the information, but the cluster is able to correlate findings to detect nefarious activities.

Other techniques have focused on analysing the size, frequency and concurrent volume of encrypted packets. Koch and Rodosek (2010) proposed identifying the sessions associated with known users from their keystrokes as they authenticated to Internet systems, and Wright et al. (2006) presented a technique to identify networking protocols from within an encrypted traffic stream. Koch et al. (2014) took this further by incorporating additional protocol detection techniques developed by Bar-Yanai et al. (2010) to construct a framework of processing modules; the aim being to detect malware accurately without the need to build large knowledge bases.

Current research appears to focus on approaches that circumvent encryption by working out who a user is or what they are doing. Instead, the aim should be to detect malicious activities.

Research Overview

There are traits within data streams that provide useful information for detection purposes and do not impact the privacy of users. Foroushani et al. (2008) and Yamada et al. (2007) suggested that in communications between users and websites, request packets are typically smaller than the responses. Masud et al. (2011) showed that ‘machine-to-machine’ communications tend to have smaller packets, and that the latency of a response from a machine is typically lower than that from a human. These traits would not be masked by encryption and an anomalous occurrence could signify malicious activity.
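The two traits described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the `Packet` record, the `flow_traits` function and the field names are hypothetical, and the sketch assumes that packet size, direction and timestamp can be observed for each flow without decrypting the payload.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    # Hypothetical record holding only metadata that stays visible under encryption.
    timestamp: float  # seconds since the start of the capture
    size: int         # packet size in bytes
    outbound: bool    # True for a request, False for a response


def flow_traits(packets):
    """Summarise one flow without inspecting any payload."""
    requests = [p for p in packets if p.outbound]
    responses = [p for p in packets if not p.outbound]
    if not requests or not responses:
        raise ValueError("flow needs traffic in both directions")

    # Latency: gap between each request and the first response that follows it.
    latencies = []
    for req in requests:
        later = [r.timestamp for r in responses if r.timestamp >= req.timestamp]
        if later:
            latencies.append(min(later) - req.timestamp)

    return {
        # Above 1.0 when responses are larger than requests on average.
        "size_ratio": mean(p.size for p in responses) / mean(p.size for p in requests),
        "mean_latency": mean(latencies) if latencies else None,
    }
```

A flow whose size ratio or mean latency departs markedly from what is typical for its kind of traffic would be a candidate for closer examination. What counts as typical is one of the questions this research would need to answer empirically.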

The aim of this research is to investigate mechanisms for accurate detection of malicious events, without compromising the confidentiality of the communications. Data mining techniques combined with neural networks are already used as part of the intrusion detection toolset (Masud et al., 2011). Part of this research is to evaluate and understand how such systems are affected and sometimes limited by the use of encryption. It is also necessary to understand the effectiveness of the intrusion detection neural network system and how it can be improved utilising different approaches to detection.

Research has demonstrated a capability to detect potentially malicious activities from the relative size, shape and frequency of network packets (Wright et al., 2006). Piccitto et al. (2007) developed a technique to detect a specific phrase from an encrypted Skype conversation and Craddock et al. (2014) detected anomalous computer behavior by analysing encrypted network traffic. These are examples of information gathering from data structures rather than content; similar techniques and approaches could be used by intrusion detection systems to detect a compromised computer or malicious activities without the need for invasive encryption breaking.
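As an illustration of detection from structure rather than content, the following Python sketch scores a flow's size and frequency features against a baseline. It uses a plain z-score comparison, which is a simple stand-in and not the method of any of the works cited above. The function names, feature names and baseline format are assumptions made for this example, and packet "shape" is not modelled.

```python
from statistics import mean, pstdev


def flow_features(sizes, timestamps):
    """Size and frequency features for one flow; payload is never inspected."""
    duration = max(timestamps) - min(timestamps)
    return {
        "mean_size": mean(sizes),
        "size_spread": pstdev(sizes),
        # Fall back to the packet count if all packets share one timestamp.
        "packets_per_second": len(sizes) / duration if duration > 0 else float(len(sizes)),
    }


def anomaly_scores(features, baseline):
    """Z-score of each feature against a baseline of (mean, std) pairs.

    The baseline is assumed to have been estimated from traffic known to be benign.
    """
    scores = {}
    for name, value in features.items():
        base_mean, base_std = baseline[name]
        scores[name] = abs(value - base_mean) / base_std if base_std > 0 else 0.0
    return scores
```

High scores indicate only that a flow is unusual relative to the baseline. Whether such a flow is actually malicious would need to be established by a trained classifier or further analysis.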

To achieve the research aim I will evaluate existing machine learning toolsets used within intrusion detection systems, the approaches used to encrypt communications and the associated impacts. I will investigate sources of data that could indicate anomalous behavior. Using the results from this work, I will propose enhancements to the toolsets so that they are capable of detecting malicious activity with an acceptable level of accuracy and without breaking encryption.

Research Methodology

This research will deliver enhancements to the machine learning toolsets used by intrusion detection systems. The evaluation of existing systems through secondary research will provide a basis for the design of laboratory experiments to identify potential sources of anomalous behavior. To maintain ethical standards, training data will be taken from academic sources, and datasets will be generated using scenarios where required, e.g. to test the impact associated with encryption. To generate the data required to formulate enhancements to the machine learning toolsets, further laboratory and in situ experiments will be designed.

[1] The term botnet refers to a network of compromised computers (or drones).