Biblio
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that -given the appropriate tools and methods-may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
This paper presents our results from identifying anddocumenting false positives generated by static code analysistools. By false positives, we mean a static code analysis toolgenerates a warning message, but the warning message isnot really an error. The goal of our study is to understandthe different kinds of false positives generated so we can (1)automatically determine if an error message is truly indeed a truepositive, and (2) reduce the number of false positives developersand testers must triage. We have used two open-source tools andone commercial tool in our study. The results of our study haveled to 14 core false positive patterns, some of which we haveconfirmed with static code analysis tool developers.
Advanced Persistent Threat (APT) attacks became a major network threat in recent years. Among APT attack techniques, sending a phishing email with malicious documents attached is considered one of the most effective ones. Although many users have the impression that documents are harmless, a malicious document may in fact contain shellcode to attack victims. To cope with the problem, we design and implement a malicious document detector called Forensor to differentiate malicious documents. Forensor integrates several open-source tools and methods. It first introspects file format to retrieve objects inside the documents, and then automatically decrypts simple encryption methods, e.g., XOR, rot and shift, commonly used in malware to discover potential shellcode. The emulator is used to verify the presence of shellcode. If shellcode is discovered, the file is considered malicious. The experiment used 9,000 benign files and more than 10,000 malware samples from a well-known sample sharing website. The result shows no false negative and only 2 false positives.