|
Abstract
|
Machine learning approaches to malware detection rely on a variety of features, such as opcodes, bytecodes, and system calls. Each of these feature sets provides a distinct semantic view of a file, and considering all of them together gives more reliable detection than any single view. Malware can disguise itself in some views, but disguising itself in all views at once is much more difficult. Motivated by this observation, multi-view learning (MVL) considers multiple views of a problem to improve overall performance. In this paper, two approaches are proposed that incorporate several feature sets and exploit their complementary information to identify the category of a file. To reduce the complexity of the problem, sparse representation is employed to build the base classifier. Two ways of combining the views are then introduced. In the first, the consensus of multiple views is used to minimize the overall error of a single classifier. In the second, independent classifiers are learned, one per view, and weighted voting produces the final decision. To assess how well the proposed methods generalize, they are evaluated on several datasets. Experimental results indicate that, in addition to being simple and achieving high performance, the proposed methods are able to handle imbalanced datasets, owing to the selected base classifier.
|
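The second combination strategy, weighted voting over independent per-view classifiers, can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `weighted_vote`, the per-view class scores, and the view weights are all hypothetical values chosen for illustration, and the per-view classifiers themselves (sparse-representation based in the paper) are assumed to have already produced the scores.

```python
def weighted_vote(view_scores, view_weights):
    """Combine per-view class scores by weighted voting.

    view_scores:  one entry per view; each entry is a list of per-sample
                  score lists, i.e. view_scores[v][i][c] is the score that
                  view v assigns to class c for sample i.
    view_weights: one non-negative weight per view (hypothetical here; they
                  could, for example, come from validation accuracy).
    Returns the predicted class index for each sample.
    """
    total = sum(view_weights)
    if total <= 0:
        raise ValueError("view weights must sum to a positive value")

    n_samples = len(view_scores[0])
    n_classes = len(view_scores[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted average of the scores each view gives to each class.
        combined = [
            sum(w * scores[i][c] for w, scores in zip(view_weights, view_scores)) / total
            for c in range(n_classes)
        ]
        # The class with the highest combined score wins the vote.
        predictions.append(max(range(n_classes), key=combined.__getitem__))
    return predictions


# Illustrative scores for two files; class 0 = benign, class 1 = malware.
opcode_view = [[0.9, 0.1], [0.6, 0.4]]    # second file looks benign in this view
bytecode_view = [[0.8, 0.2], [0.3, 0.7]]
syscall_view = [[0.7, 0.3], [0.2, 0.8]]

predictions = weighted_vote(
    [opcode_view, bytecode_view, syscall_view],
    [0.2, 0.3, 0.5],  # assumed weights, not taken from the paper
)
print(predictions)
```

In this toy input the second file appears benign in the opcode view alone, yet the weighted vote across all three views labels it as malware, which is the intuition behind combining views.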