KDD & NSL+ Dataset - Canadian Institute for Cybersecurity
About the Canadian Institute for Cybersecurity
The Canadian Institute for Cybersecurity (CIC) is a comprehensive multidisciplinary training, research and development, and entrepreneurial unit that draws on the expertise of researchers in the social sciences, business, computer science, engineering, law and science.
Based at the University of New Brunswick in Fredericton, the institution is the first of its kind to bring together researchers and practitioners from across the academic spectrum to share innovative ideas, create disruptive technology and carry out groundbreaking research into the most pressing cybersecurity challenges of our time. CUI is the sub-department of CIC that guides new researchers and provides material.
Dataset Information
In our recent dataset evaluation framework (Gharib et al., 2016), we have identified eleven criteria that are necessary for building a reliable benchmark dataset. None of the previous IDS datasets could cover all of the 11 criteria. In the following, we briefly outline these criteria:
Complete Network configuration: A complete network topology includes Modem, Firewall, Switches, Routers, and presence of a variety of operating systems such as Windows, Ubuntu and Mac OS X.
Complete Traffic: Generated by a user profiling agent and 12 different machines in the Victim-Network, with real attacks from the Attack-Network.
Labelled Dataset: Section 4 and Table 2 show the benign and attack labels for each day. The details of the attack timing will also be published in the dataset document.
Complete Interaction: As Figure 1 shows, we covered communication both within and between internal LANs by using two different networks, as well as Internet communication.
Available Protocols: All common protocols are present, such as HTTP, HTTPS, FTP, SSH and email protocols.
Attack Diversity: Included the most common attacks based on the 2016 McAfee report, such as Web-based, Brute force, DoS, DDoS, Infiltration, Heartbleed, Bot and Scan.
Heterogeneity: Captured the network traffic from the main switch, plus memory dumps and system calls from all victim machines, during attack execution.
Feature Set: Extracted more than 80 network flow features from the generated network traffic using CICFlowMeter and delivered the network flow dataset as a CSV file. See our PCAP analyzer and CSV generator.
MetaData: The published paper fully explains the dataset, including the time, attacks, flows and labels.
All of these criteria were applied to generate the + version of the data, which consists of the following files:
KDDTrain+.ARFF: The full NSL-KDD train set with binary labels in ARFF format
KDDTrain+.TXT: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format
KDDTrain+_20Percent.ARFF: A 20% subset of the KDDTrain+.arff file
KDDTrain+_20Percent.TXT: A 20% subset of the KDDTrain+.txt file
KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format
KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format
KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21
KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21
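The file descriptions above can be made concrete with a minimal Python sketch for reading the .TXT files. It rests on assumptions that are not stated on this page: that the files have no header row, and that the attack-type label and the difficulty level are the last two fields of each record. The names Record, parse_records, to_binary and drop_level are hypothetical helpers introduced for illustration, and the sample rows are made up rather than taken from the dataset.

```python
import csv
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    features: List[str]  # all fields before the label, kept as raw strings
    label: str           # attack-type label, e.g. "normal"
    difficulty: int      # difficulty level; 21 is the maximum per the file list


def parse_records(lines: Iterable[str]) -> List[Record]:
    """Parse headerless CSV lines.

    Assumption: the last two fields are the label and the difficulty level.
    """
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        *features, label, difficulty = row
        records.append(Record(features, label.strip(), int(difficulty)))
    return records


def to_binary(label: str) -> str:
    """Collapse attack-type labels into two classes.

    Assumption: the class names "normal" and "anomaly" are illustrative only.
    """
    return "normal" if label == "normal" else "anomaly"


def drop_level(records: List[Record], level: int = 21) -> List[Record]:
    """Keep only records whose difficulty differs from `level`,
    mirroring the description of the KDDTest-21 subset."""
    return [r for r in records if r.difficulty != level]


# Made-up rows with three features each, for demonstration only.
sample = [
    "0,tcp,http,normal,21",
    "0,udp,private,neptune,18",
    "2,tcp,ftp_data,normal,15",
]
records = parse_records(sample)
harder = drop_level(records)
print(len(records), len(harder))
print([to_binary(r.label) for r in records])
```

To apply this to a real file, pass an open file handle instead of the sample list, for example parse_records(open("KDDTrain+.txt", newline="")), after checking that the column layout matches the assumption above.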
Download the archive from here.
Join the Canadian Institute for Cybersecurity
Membership to CIC is available to researchers and post-graduate students, industry and government professionals, public institutions and government departments.