Cyber Security Using Machine Learning: SNORT
With so much of daily life moving online and the volume of data generated on the web growing massively, some obvious questions arise: Is the data on the web safe? How secure is our cyberspace? What exactly is cybersecurity? Can machine learning algorithms help secure it? We will also see how Snort, an open-source intrusion detection tool, is used toward the same goal.
Computer security or IT security is the protection of computer systems from theft or damage to their hardware, software or electronic data, as well as from disruption or misdirection of the services they provide. The field is of growing importance due to increasing reliance on computer systems, the Internet and wireless networks such as Bluetooth and Wi-Fi, and due to the growth of “smart” devices, including smartphones, televisions and the various tiny devices that constitute the Internet of Things. Due to its complexity, both in terms of politics and technology, it is also one of the major challenges of the contemporary world. (Source: Wikipedia)
The threats lurking in this vast cyberspace are well known, and protecting it remains an active research topic. In computers and computer networks, an attack is an attempt to expose, alter, disable, destroy, steal, or gain unauthorized access to or make unauthorized use of an asset. Two terms come up constantly in discussions of cybersecurity: Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS).
An IDS detects attacks; an IPS goes further and tries to prevent them. Detecting an attack is easier than preventing it completely. Machine learning can make these security methods more reliable, and here we focus on how it is used in intrusion detection systems.
IDS can be classified into two main categories based on operational logic:
- Signature-based IDS
- Anomaly-based IDS
A signature-based IDS works from definitions (signatures) of known vulnerabilities and attack patterns. Its logic is a basic classification problem: each incoming event is compared with the stored signatures, and if a match is found an alert is raised; otherwise the event is considered benign. The approach is not very flexible and relies on simple matching or low-level learning models. It is highly accurate for known attacks but fails against new ones (zero-day attacks), for which no signature exists yet.
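To make the matching step concrete, here is a minimal Python sketch, assuming each signature is a hypothetical regular expression over the raw payload bytes (real rule sets, such as Snort's, are far richer). The last event is a new attack that no signature covers, so it passes silently, which is exactly the zero-day weakness described above.

```python
import re

# Hypothetical signature table: name -> byte pattern of a known attack.
SIGNATURES = {
    "path-traversal": re.compile(rb"\.\./\.\./"),
    "sql-injection": re.compile(rb"(?i)union\s+select"),
}

def match_signatures(payload: bytes) -> list:
    """Return the names of all signatures found in the payload (empty list = no alert)."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]

events = [
    b"GET /index.html HTTP/1.1",
    b"GET /../../etc/passwd HTTP/1.1",
    b"id=1 UNION SELECT password FROM users",
    b"name=x; rm -rf /tmp/data",  # a new attack with no signature yet
]
for event in events:
    hits = match_signatures(event)
    print("ALERT" if hits else "no match", hits)
```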
An anomaly-based IDS models the usual behavior of the traffic and raises an alarm whenever that behavior deviates from the norm. It is more flexible and relies on more sophisticated (higher-level) machine learning models.
A lot of research in this area uses both supervised and unsupervised algorithms, and many datasets are publicly available on the web for academic use. The most popular is KDD99 (KDD Cup 1999), a well-known benchmark in intrusion detection research. Much of the work focuses on improving detection strategies, but the data used to train and test detection models deserves equal attention, because better data quality improves offline intrusion detection.
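The KDD99 files are plain comma-separated text: each line carries 41 connection features (`duration`, `protocol_type`, `service`, `flag`, `src_bytes`, and so on) followed by a label such as `normal.` or `smurf.` with a trailing period. Below is a minimal Python sketch of reading one record; the sample line is written in that format for illustration, and the names `KDDRecord` and `parse_kdd_line` are hypothetical.

```python
from typing import List, NamedTuple, Union

# 0-based positions of the three symbolic (non-numeric) KDD99 features.
SYMBOLIC_COLUMNS = {1, 2, 3}  # protocol_type, service, flag

class KDDRecord(NamedTuple):
    features: List[Union[float, str]]  # the 41 feature values
    label: str                          # "normal" or an attack name, e.g. "smurf"

    @property
    def is_attack(self) -> bool:
        return self.label != "normal"

def parse_kdd_line(line: str) -> KDDRecord:
    """Split one KDD99 line into its 41 features and its label."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    *raw, label = fields
    features = [v if i in SYMBOLIC_COLUMNS else float(v) for i, v in enumerate(raw)]
    return KDDRecord(features, label.rstrip("."))  # drop the trailing period

sample = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
          "0.00,0.00,0.00,0.00,normal.")
record = parse_kdd_line(sample)
print(record.features[:4], record.label, record.is_attack)
```

A binary `is_attack` flag like this is a common first simplification; the original attack names remain available in `label` for multi-class experiments.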
The supervised approach usually deals with known attacks, which makes it the machine-learning counterpart of signature-based IDS. The dataset contains definitions of various malicious activities, derived from network flow data, and the system learns from these labeled network events. When the trained model, for example an artificial neural network, encounters an event that matches a known intrusion, an alarm is raised. If the event does not match any known definition, it is not flagged.
This approach recognizes well-known malicious activities with very high accuracy, and its false alarm rate is very low. Bayesian networks and Support Vector Machines (SVMs) are among the models used to detect attacks in a supervised way. AI-based antivirus tools of this kind are used where high security and very low false alarm rates are required, such as computers holding military information or controlling missiles. Many institutions in the USA have turned to the supervised approach to secure documents and critical information.
However, this approach fails against zero-day attacks.
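The text names Bayesian networks and SVMs as supervised detectors. As a stand-in, here is a minimal Python sketch of Gaussian naive Bayes, one of the simplest Bayesian-network classifiers, written from scratch. The two features (connections per second and fraction of SYN-only packets) and all training values are made-up toy numbers, not drawn from any dataset.

```python
import math
from collections import defaultdict

def fit(samples, labels):
    """Estimate each class's prior and per-feature mean/variance (Gaussian naive Bayes)."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        # A tiny constant keeps the variance positive if a feature never varies.
        variances = [sum((v - m) ** 2 for v in c) / len(rows) + 1e-9
                     for c, m in zip(cols, means)]
        model[y] = (len(rows) / len(samples), means, variances)
    return model

def log_posterior(params, x):
    """Log prior plus the sum of per-feature Gaussian log likelihoods (up to a constant)."""
    prior, means, variances = params
    return math.log(prior) + sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(x, means, variances))

def predict(model, x):
    """Pick the known class with the highest posterior; unseen classes are impossible."""
    return max(model, key=lambda y: log_posterior(model[y], x))

# Hypothetical labeled flows: (connections per second, fraction of SYN-only packets).
X = [(5, 0.05), (8, 0.10), (6, 0.08), (900, 0.95), (1200, 0.98), (1000, 0.97)]
y = ["normal", "normal", "normal", "syn_flood", "syn_flood", "syn_flood"]
model = fit(X, y)
print(predict(model, (7, 0.07)), predict(model, (1100, 0.96)))
```

Because `predict` can only return a label it was trained on, a zero-day attack is either mislabeled as a known class or passed off as `normal`, which is the failure mode described above.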
The unsupervised approach to detecting cyber attacks is used when the dataset contains no attack definitions: the class of the attack, its features, and everything else about it are unknown. The approach assumes that a large change in network flow happens only when a malicious agent has entered the system. The network's behavior is monitored continuously, a threshold is set, and an alarm is raised whenever the deviation from normal behavior crosses this threshold.
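The threshold idea can be sketched in a few lines of Python. This is a simple statistical stand-in, not the neural network the text mentions: it assumes a single hypothetical metric (packets per second), learns its mean and standard deviation from traffic assumed to be attack-free, and raises an alarm when a new observation lies more than `threshold` standard deviations away.

```python
import statistics

def build_baseline(normal_values):
    """Learn 'usual' behavior (mean, sample std) from traffic assumed to be attack-free."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, baseline, threshold=3.0):
    """Alarm when the observation's z-score exceeds the threshold."""
    mean, std = baseline
    return abs(value - mean) / std > threshold

# Hypothetical packets-per-second readings from a quiet period.
baseline = build_baseline([120, 130, 125, 118, 132, 127, 122, 129])
for reading in (131, 400, 95):
    print(reading, "ALARM" if is_anomalous(reading, baseline) else "ok")
```

Lowering `threshold` catches subtler deviations but raises more false alarms; raising it does the opposite.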
In this approach the neural network works directly on the network data rather than on class definitions, which makes it very effective at detecting zero-day attacks. However, an attacker who carefully crafts traffic to look normal can evade it. It also produces many false alarms. This is a major issue, and research continues to improve these algorithms.
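How costly false alarms are depends on traffic volume. The sketch below computes the two figures usually reported for an IDS, the detection rate (true-positive rate) and the false-alarm rate (false-positive rate), from toy labels where 1 means attack and 0 means normal. As a back-of-the-envelope example with a hypothetical volume: at 1,000,000 normal events per day, even a 1% false-alarm rate means 10,000 false alarms daily.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection rate, false-alarm rate) for binary labels: 1 = attack, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic left alone
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate

# Toy example: 8 normal events and 2 attacks; the detector flags both attacks and 2 normals.
truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
flags = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(detection_metrics(truth, flags))
```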
Both techniques have advantages and disadvantages. Hybrid approaches try to combine the advantages efficiently while reducing the disadvantages: one part of the detection mechanism uses a supervised algorithm and another part uses an unsupervised one. In recent years, most research has focused on hybrid detection approaches.
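One way to wire the two parts together is sketched below in Python, under simplifying assumptions: a hypothetical signature table stands in for the supervised part, so known attacks get a precise name, and a z-score test on a hypothetical packets-per-second metric stands in for the unsupervised part, catching unfamiliar behavior. It only illustrates the structure and is not a specific published method.

```python
import re
import statistics

# Hypothetical known-attack signatures (stand-in for the supervised part).
KNOWN_ATTACKS = {"sql-injection": re.compile(rb"(?i)union\s+select")}

# Hypothetical baseline of packets per second (stand-in for the unsupervised part).
NORMAL_RATES = [120, 130, 125, 118, 132, 127, 122, 129]
BASELINE = (statistics.mean(NORMAL_RATES), statistics.stdev(NORMAL_RATES))

def hybrid_verdict(payload: bytes, rate: float, threshold: float = 3.0) -> str:
    # Stage 1: known attacks are identified by name.
    for name, pattern in KNOWN_ATTACKS.items():
        if pattern.search(payload):
            return f"known attack: {name}"
    # Stage 2: anything that matched no signature is checked for abnormal behavior.
    mean, std = BASELINE
    if abs(rate - mean) / std > threshold:
        return "anomaly: possible unknown attack"
    return "normal"

print(hybrid_verdict(b"id=1 UNION SELECT pw FROM users", 126))
print(hybrid_verdict(b"GET / HTTP/1.1", 400))
print(hybrid_verdict(b"GET / HTTP/1.1", 126))
```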
Snort is a free and open-source network intrusion detection system (NIDS) and network intrusion prevention system (NIPS) used around the world. It can perform real-time traffic analysis and packet logging on Internet Protocol (IP) networks, including protocol analysis, content searching, and content matching. These basic services have many uses, including application-aware triggered quality of service, which de-prioritizes bulk traffic when latency-sensitive applications are in use.
Snort can be configured to run in three main modes: sniffer, packet logger, and network intrusion detection. In sniffer mode, the program reads network packets and displays them on the console. In packet logger mode, the program records packets to disk. In intrusion detection mode, the program monitors network traffic and analyzes it against a rule set defined by the user, then performs a specific action based on what it has identified (Source: Wikipedia).
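In intrusion detection mode, each user-defined rule is one line of text: a header (action, protocol, source address and port, direction, destination address and port) followed by options in parentheses such as `msg`, `content`, `sid`, and `rev`. The Python helper below, `snort_rule`, is a hypothetical convenience and not part of Snort; it only assembles that text so the anatomy of a rule is visible. By convention, locally written rules use `sid` values of 1000000 and above, and `$HOME_NET` is a variable defined in the Snort configuration.

```python
def snort_rule(action, proto, src, sport, dst, dport, msg, sid, content=None, rev=1):
    """Assemble a Snort rule string: header, then 'key:value;' options in parentheses.

    Hypothetical helper for illustration: it performs no escaping, so msg and
    content must not contain characters Snort treats specially, such as a
    semicolon or a double quote.
    """
    options = [f'msg:"{msg}"']
    if content is not None:
        options.append(f'content:"{content}"')  # payload bytes to search for
    options += [f"sid:{sid}", f"rev:{rev}"]      # unique rule ID and its revision
    header = f"{action} {proto} {src} {sport} -> {dst} {dport}"
    return f"{header} ({'; '.join(options)};)"

print(snort_rule("alert", "icmp", "any", "any", "$HOME_NET", "any", "ICMP test", 1000001))
print(snort_rule("alert", "tcp", "any", "any", "$HOME_NET", 80,
                 "Possible SQL injection", 1000002, content="UNION SELECT"))
```

Rules like these are placed in a rules file that Snort loads when it runs in intrusion detection mode.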
Cyber attack detection is like a game between the attacker and the detection system, and there is no ultimate winner. Whenever an attack is detected, attackers come up with new techniques to evade detection; whenever an attack evades detection, new and more efficient detection algorithms are developed. It is a never-ending cycle. Machine learning has greatly improved detection algorithms, but skilled attackers keep exploiting loopholes to evade them, and intense research is under way to close these loopholes and build better algorithms.