What is a Confusion Matrix?
The confusion matrix is one of the most popular and widely used performance measurement techniques for classification models. While it is super easy to understand, its terminology can be a bit confusing.
This article therefore aims to clear the “fog” around this model evaluation technique.
To get things started, I have included a working example of this matrix to keep things simple and easy to understand. The terms below are described for two classes, but the technique can also be applied to problems with more than two classes.
Understanding Accuracy in a Confusion Matrix
Accuracy is the number of correct predictions (true positives plus true negatives) divided by the total number of predictions (true positives, true negatives, false positives, and false negatives).
\[
\text{Accuracy} = \frac{\text{no. of true positives} + \text{no. of true negatives}}{\text{no. of true positives} + \text{no. of true negatives} + \text{no. of false positives} + \text{no. of false negatives}}
\]
Basic Terms to Keep in Mind
- True Positive (TP): These are the events that were correctly predicted by the model as “occurred = Yes.”
- True Negative (TN): These are the events that were correctly predicted by the model as “not occurred = No.”
- False Positive (FP): These are the events that the model predicted as “occurred = Yes,” but that in reality were “not occurred = No.”
- False Negative (FN): This is the opposite of FP, i.e., events that the model predicted as “not occurred = No,” but that in reality were “occurred = Yes.”
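To make the four terms and the accuracy formula concrete, here is a minimal sketch in Python. The function names `confusion_counts` and `accuracy`, and the sample labels, are made up for illustration (1 stands for “occurred = Yes,” 0 for “not occurred = No”); they do not come from any particular library.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = Yes, 0 = No)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted Yes, really Yes
        elif p == 0 and a == 0:
            tn += 1  # predicted No, really No
        elif p == 1 and a == 0:
            fp += 1  # predicted Yes, really No
        else:
            fn += 1  # predicted No, really Yes (labels assumed to be 0 or 1)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical data, for illustration only
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```

Here six of the eight predictions are correct, so the accuracy is 6 / 8 = 0.75.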
What is Cybersecurity?
Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks, such as stealing information or spying on another system. Cybersecurity experts work to protect users and prevent these attacks, which are also known as cybercrime.
Cybercrime is increasing day by day. Here are some examples from 2021 that show how important cybersecurity is:
- Australian broadcaster Channel Nine was hit by a cyberattack on 28th March 2021, which rendered the channel unable to air its Sunday news bulletin and several other shows.
- In March 2021, the London-based Harris Federation suffered a ransomware attack and was forced to “temporarily” disable the devices and email systems of all 50 secondary and primary academies it manages. This resulted in over 37,000 students being unable to access their coursework and correspondence.
- A cybercriminal attempted to poison the water supply in Florida and managed to increase the amount of sodium hydroxide to a potentially dangerous level.
- Acer suffered a ransomware attack and was asked to pay a ransom of $50 million, which set the record for the largest known ransom demand to date.
How Machine Learning Benefits Cybersecurity
Humans are slow compared to machines, so to detect cybersecurity threats quickly we need to deploy machines to discover them. To do this, we need to teach the machine how to spot a security breach and similar incidents. We therefore train it on historic attacks so that it can detect new threats.
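As a rough illustration of training on historic attacks, the sketch below uses a nearest-centroid classifier, a deliberately simple technique chosen here for brevity and not a recommendation for production threat detection. The two features (failed logins per minute, megabytes sent out), the sample events, and the function names `train_centroids` and `predict` are all assumptions made up for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def train_centroids(samples, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical historic events: [failed logins per minute, MB sent out]
history = [[1, 5], [2, 8], [0, 3], [40, 300], [55, 450], [35, 280]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = attack, 0 = normal activity

model = train_centroids(history, labels)
print(predict(model, [50, 400]))  # 1 (looks like the past attacks)
print(predict(model, [1, 4]))     # 0 (looks like normal activity)
```

The predictions of such a model on labeled test events are exactly what a confusion matrix summarizes.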
Steps to Improve the Model
Forensic analytics — the combination of advanced analytics, forensic accounting and investigative techniques — is making breakthroughs every day in identifying rare events of fraud, corruption and other schemes. To meet rising regulatory and customer demand for fraud mitigation, forensic analytics can reveal signals of emerging risks months — or sometimes even years — before they happen. Of course, predicting anomalous events can also create false positives.
To reduce false positives in fraud investigations, careful attention should be paid to steps including:
- Create an analytics repository — Consolidate and integrate data from disparate sources so analytical models can take an enterprise-wide approach to anomalous activity detection.
- Employ network mapping and analysis — Explore fraudsters’ networks, affinities and relationships, as well as others committing similar illicit acts.
- Leverage both supervised and unsupervised modeling — Supervised modeling employs algorithms to sift through data, applying historical fraud patterns and digital fingerprints of fraudsters to new data and scoring the level of risk involved in new events based on historical data. Unsupervised modeling uses algorithms to sift through data independent of patterns relating to known historical cases, looking for new events following unprecedented patterns.
- Use natural language processing (NLP) — Sift through unstructured data, including emails, messaging, audio and video files to unearth unexpected nuance to communication or connections otherwise unclear in structured, text-only data. For example, the ability of NLP to analyze word choice, tone and possible stress levels expressed in a voicemail can sometimes offer more insight during investigations than text on page alone could offer.
- Training and self-learning — One of the most important steps: train the ML model to learn from a variety of data sources, such as risk issues the organization has confronted in the past. The resulting models can adapt over time to future risks.
- Feedback and continuous improvement — Incorporate feedback from results of each investigation, from the continually growing body of forensic accounting and investigation knowledge and insight and from the input of stakeholders across the enterprise in an effort to continuously improve forensic analytics solution effectiveness.
Together, these steps help reduce false positives and increase the model's accuracy.
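To give a feel for the unsupervised modeling mentioned in the steps above, here is a minimal sketch that flags unusual values without using any labeled fraud cases. It uses a simple z-score rule as a stand-in; real forensic analytics tools use far richer models. The function name `flag_anomalies`, the threshold of 2.0, and the transaction amounts are assumptions for illustration only.

```python
from statistics import mean, pstdev


def flag_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts, for illustration only
amounts = [120, 135, 110, 128, 140, 125, 5000, 118, 132, 127]
print(flag_anomalies(amounts))  # [6]
```

A flagged event is only a candidate for investigation. Whether it turns out to be real fraud (a true positive) or not (a false positive) is the kind of feedback the last step in the list feeds back into the model.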