Information gain score computation for N-grams using multiprocessing model

Darshan, S.L.S.; Kumara, M.A.A.; Jaidhar, C.D.

Please use this identifier to cite or link to this item: https://idr.l2.nitk.ac.in/jspui/handle/123456789/8315

Title:	Information gain score computation for N-grams using multiprocessing model
Authors:	Darshan, S.L.S.; Kumara, M.A.A.; Jaidhar, C.D.
Issue Date:	2017
Citation:	ISEA Asia Security and Privacy Conference 2017, ISEASP 2017, 2017
Abstract:	Currently, the Internet faces a serious threat from malware, whose propagation may cause great havoc on computers and network security solutions. Several existing anti-malware defensive solutions detect known malware accurately. However, they fail to recognize unseen malware, since most of them rely on signature-based techniques, which are easily evaded using obfuscation or polymorphism techniques. Therefore, there is an immediate need for new techniques that can detect and classify new malware. In this context, heuristic analysis is found to be promising, since it is capable of detecting unknown malware and new variants of existing malware. The N-Gram extraction technique is one such heuristic method commonly used in malware detection. Previous work has shown that shorter N-Grams are easier to extract. To identify and remove noisy N-Grams, this work uses a popular Feature Selection Technique (FST), namely Information Gain (IG), which computes a score for each N-Gram (feature) in the dataset. N-Grams with the highest IG scores are considered the best features, while the remaining N-Grams are discarded. The IG-FST (Information Gain-Feature Selection Technique) is computationally demanding and time-consuming when generating IG scores for larger N-Gram datasets if the processing is performed sequentially. To address this issue, this work presents a multiprocessing model that computes IG scores rapidly for larger N-Gram datasets. The proposed model has been designed, implemented, and compared with the sequential mode of IG score computation. The experimental results demonstrate that the proposed multiprocessing model is 80% faster than the sequential model of IG score computation. © 2017 IEEE.
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/8315
Appears in Collections:	2. Conference Papers
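
Illustrative sketch (not the authors' implementation): the abstract describes scoring each N-Gram with Information Gain and splitting that work across processes, but gives no code. The Python example below is a minimal sketch under stated assumptions: each N-Gram is treated as a binary feature (present or absent in a sample), IG is computed as H(class) minus the weighted entropy of the class given presence or absence, and the per-class occurrence counts are already available. All function names, the chunking scheme, and the toy counts are hypothetical.

```python
import math
from multiprocessing import Pool


def entropy(counts):
    """Shannon entropy in bits of a sequence of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(present, class_totals):
    """IG of a binary N-Gram feature.

    present[k] is the number of class-k samples that contain the N-Gram;
    class_totals[k] is the number of class-k samples overall.
    """
    n = sum(class_totals)
    absent = [t - p for t, p in zip(class_totals, present)]
    n_present = sum(present)
    n_absent = n - n_present
    conditional = (n_present / n) * entropy(present) + (n_absent / n) * entropy(absent)
    return entropy(class_totals) - conditional


def _score_chunk(args):
    """Worker: score one chunk of (ngram, per-class counts) pairs."""
    chunk, class_totals = args
    return [(gram, information_gain(present, class_totals)) for gram, present in chunk]


def ig_scores_sequential(counts, class_totals):
    """Baseline: score every N-Gram in a single process."""
    return dict(_score_chunk((list(counts.items()), class_totals)))


def ig_scores_parallel(counts, class_totals, processes=4):
    """Split the N-Grams into roughly equal chunks and score them in a process pool."""
    items = list(counts.items())
    size = max(1, math.ceil(len(items) / processes))
    chunks = [(items[i:i + size], class_totals) for i in range(0, len(items), size)]
    with Pool(processes) as pool:
        parts = pool.map(_score_chunk, chunks)
    return {gram: score for part in parts for gram, score in part}


def select_top_k(scores, k):
    """Keep the k N-Grams with the highest IG score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy data (assumed): 4 benign and 4 malware samples, counts are (benign, malware).
    class_totals = (4, 4)
    counts = {"4d5a": (4, 4), "e8ff": (0, 4), "9090": (1, 3)}
    scores = ig_scores_parallel(counts, class_totals, processes=2)
    for gram in select_top_k(scores, 2):
        print(gram, round(scores[gram], 4))
```

Because each N-Gram's score depends only on its own counts and the class totals, the chunks can be scored independently, which is what makes the computation easy to distribute. Any speedup depends on dataset size and core count; for small inputs, process start-up and pickling overhead can outweigh the gain.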
