Science of Security Paper - Developing Scientific Foundations for the Operational Cybersecurity Ecosystem
Hi folks - I wanted to share an advance copy of my new Science of Security paper with the community. In many ways our research "connected the dots" between existing cybersecurity efforts across different government agencies that the government has identified as the way forward: from the Science of Security championed by the National Security Agency and the National Science Foundation, to the Cybersecurity Measurement and Management Architecture championed by the Department of Homeland Security and the Department of Defense, to the revolutionary intelligence methodologies (object-based production / activity-based intelligence) championed by the Intelligence Community and particularly the National Geospatial-Intelligence Agency.
While each of these cybersecurity and intelligence efforts can stand on its own and provide great benefit, bringing them together demonstrates how each agency's investments can see a greater return by working together to develop scientific foundations for the operational cybersecurity ecosystem. Just as we need infrastructure in areas like meteorology to understand and predict the weather, we need operational cybersecurity science infrastructure to understand and predict events in the cyber ecosystem of the future.
The paper can be downloaded from either of these links:
LinkedIn's Slideshare:
http://www.slideshare.net/shawnriley2/cscss-science-of-security-developing-scientific-foundations-for-the-operational-cybersecurity-ecosystem
or from the CSCSS web site:
http://cscss.org/wp-content/uploads/2015/08/CSCSS-Science-of-Security-Developing-Scientific-Foundations-for-the-Operational-Cybersecurity-Ecosystem.pdf
or from ResearchGate:
https://www.researchgate.net/publication/280949461_Science_of_Cybersecurity_-_Developing_Scientific_Foundations_for_the_Operational_Cybersecurity_Ecos...
The attached discusses how, when attackers, analysts and users are all considered, U.S. Military Joint Doctrine http://www.dtic.mil/doctrine/new_pubs/jointpub.htm provides a better approach than Semantic eScience to the Science of Security.
Attachment | Kind | Size
---|---|---
Semantic_eScience_observations_R._Zager.pdf | PDF document | 446.93 KB
I thought this might be of interest.
Shawn
Conference: Proceedings of the 14th European Conference on Cyber Warfare and Security (ECCWS)
The Semantic Approach to Cyber Security Towards Ontology Based Body of Knowledge
Adiel Aviad, Krzysztof Wecel and Witold Abramowicz
Poznan University of Economics, Poznan, Poland
aaviad@iai.co.il
k.wecel@kie.ue.poznan.pl
w.abramowicz@kie.ue.poznan.pl
Abstract:
Cyber defence must cope with a wide variety of possible attacks, appearing each day at increasing pace. In addition, an organization should master defence technologies and prioritize what defence should be taken, given that resources are limited. Evaluation of possible attacks and risks is therefore crucial. Since the relevant knowledge is complex and rapidly changing, ontology may be useful in integrating and sharing the knowledge required for evaluation of cyber security and for prioritizing defences.
Keywords:
cyber security, semantic web technology, attacks, threats, ontology
The Semantic Approach to Cyber Security. Towards Ontology Based Body of Knowledge. Available from: https://www.researchgate.net/publication/280090856_The_Semantic_Approach_to_Cyber_Security._Towards_Ontology_Based_Body_of_Knowledge [accessed Aug 28, 2015].
The attached comment discusses Semantic eScience in the context of the Philosophy of Science and Joint Doctrine. In summary,
- The Philosophy of Science suggests that there are significant limitations to the predictive power of Semantic eScience;
- Semantic eScience could be more strongly aligned with Joint Doctrine.
The effectiveness of Semantic eScience relative to other methods is not discussed because the effectiveness of Semantic eScience as a cybersecurity tool can only be established by field studies.
Attachment | Kind | Size
---|---|---
Semantic eScience Joint Doctrine | PDF document | 534.62 KB
I apologize that you are still having issues understanding some of the key points of the paper. It might help with your understanding if you brushed up on Web Science, Knowledge Engineering, and AI with a focus on semantic technology and knowledge representation languages such as OWL/RDF. I tried to write it at a level that would be understandable by undergraduates, but I'm not from the academic world; my experience is all DoD cyber operations and my view is from an operational perspective. If you are in the DC metro area and have access to a room and a whiteboard, I'd be happy to try to explain it further in person. Sorry the paper wasn't clearer and that it is difficult to understand.
Best,
Shawn
Hi-
I just received an email indicating a free MOOC will be offered on Knowledge Engineering & Semantic Web Technologies at open.hpi.de. I thought this might be of interest, and it's free for everyone who wants to take it. I have no affiliation with the school but wanted to share the information about this educational opportunity.
Knowledge Engineering with Semantic Web Technologies - start on November 2nd
The knowledge contained in the World Wide Web is available in interlinked documents written in natural language. To make use of this knowledge, technologies such as natural language processing, information retrieval, data and knowledge mining must be applied. In this MOOC, you will learn the fundamentals of Semantic Web technologies and how they are applied for knowledge representation in the World Wide Web. You will learn how to represent knowledge with ontologies and how to access and benefit from semantic data on the Web. Furthermore, you will also learn how to make use of Linked Data and the Web of Data, currently the most popular applications based on Semantic Web technologies. Could we catch your interest? Here you can sign up for the course. https://open.hpi.de/courses/semanticweb2015
Monday, November 02, 2015 08:00 (UTC) to Monday, December 14, 2015 23:30 (UTC)
Language: English
The course is free and open to everyone!
Course information
The web has become an object of our daily life and the amount of information in the web is ever growing. Besides plain texts, especially multimedia information such as graphics, audio or video have become a predominant part of the web's information traffic. But, how can we find useful information within this huge information space? How can we make use of the knowledge contained in those web documents? Traditional search engines for example will reach the limits of their power, when it comes to understanding information content. The Semantic Web is an extension of the traditional web in the sense that information in the form of natural language text in the web will be complemented by its explicit semantics based on a formal knowledge representation. Thus, the meaning of information expressed in natural language can be accessed in an automated way and interpreted correctly, i.e. it can be 'understood' by machines.
Semantic Web technologies enable the explicit representation of knowledge and its further processing to deduce new knowledge from implicitly hidden knowledge. Thus, information access and information search will be more precise and more complete compared to today's traditional information retrieval technology. Previously heterogeneous data can be mapped and combined based on common knowledge representation and schemata easily extended in a dynamic way.
In this MOOC, you will learn the fundamentals of Semantic Web technologies and how they are applied for knowledge representation in the World Wide Web. You will learn how to represent knowledge with ontologies and how to access and benefit from semantic data on the Web. Furthermore, you will also learn how to make use of Linked Data and the Web of Data, currently the most popular applications based on Semantic Web technologies.
Requirements for this course:
- a basic knowledge of the foundations of mathematical logics, i.e. propositional logics and first order logics
- a basic understanding of web technologies, such as URL, http, HTML, and XML-based technologies
- a basic knowledge of database technology, esp. relational databases and SQL query language
You'll find additional video lecturing material on www.tele-task.de.
More Information - https://open.hpi.de/courses/semanticweb2015
Happy learning!
Shawn
Since Activity-Based Intelligence isn't a well known area like Web Science or Semantic Technology is, I wanted to share a couple of links that might help expand your understanding of ABI and why Director Clapper gets so excited about it.
Activity-Based Intelligence: Revolutionizing Military Intelligence Analysis
NGA Activity Based Intelligence slide deck
http://usgif.org/system/uploads/3357/original/ABI_Slides_Approved_for_Public_Release_13-231_1_.pdf
Activity Based Intelligence: Understanding the Unknown
http://www.afio.com/publications/LONG_Tish_in_AFIO_INTEL_FALLWINTER2013_Vol20_No2.pdf
Best,
Shawn
Here is another link I thought might help advance understanding. It's a vendor perspective on ABI/OBP and looks at Graph Databases and In Memory Processing.
http://www.kmimediagroup.com/gif/424-articles-gif/organizing-the-knowns/6361-organizing-the-knowns
Shawn
Here is a link to DIA's Modernizing Defense Intelligence: Object Based Production and Activity Based Intelligence slide deck. I know it was in the paper references but thought I should include it in the discussion list here.
Robert,
I wasn't sure if you realized that one of the graphics you keep labeling as Semantic eScience is actually a graphic showing 3 different but related areas of Web Science. Data Science, Informatics, and Semantic eScience are each their own areas of Web Science with different focuses.
https://www.linkedin.com/pulse/data-science-informatics-semantic-escience-shawn-riley
Data Science - The focus is on creation, sorting, gathering of data. Heavy focus on statistical analysis, very little focus on context.
Informatics - The focus is on presentation and organization of the information. Heavy focus on context, little focus on experience.
Semantic eScience - The focus is integration and conversation of knowledge. Heavy focus on context and experience.
My observations of the web sciences have found Data Science to be the easiest of the three areas. Data Science requires the least amount of domain-specific knowledge to produce results. Informatics requires more domain-specific knowledge because the person has to further consider context, presentation, and organization of the data and information. Semantic eScience requires the deepest domain knowledge because it moves past statistical analysis and mathematical algorithms to introduce logic-based reasoning and inference capabilities. Semantic eScience provides a greater range of tools to test, measure, and validate the knowledge. This requires deep domain knowledge to understand the data and information and to produce reliable knowledge.
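To make the "logic-based reasoning and inference" point a bit more concrete, here is a minimal sketch of my own (not code from the paper) of the kind of class-hierarchy inference a semantic stack gives you that a purely statistical approach does not. It assumes Python with the rdflib library and uses a made-up example.org namespace and class names.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace and class names, for illustration only.
EX = Namespace("http://example.org/cyber#")
g = Graph()
g.bind("ex", EX)

# Tiny class hierarchy: a Keylogger is a kind of Spyware, which is a kind of Malware.
g.add((EX.Keylogger, RDFS.subClassOf, EX.Spyware))
g.add((EX.Spyware, RDFS.subClassOf, EX.Malware))

# One observed artifact, typed only as a Keylogger.
g.add((EX.sample42, RDF.type, EX.Keylogger))

# A SPARQL 1.1 property path walks the hierarchy, so sample42 is returned
# for a question about Malware even though nobody asserted that triple directly.
q = """
SELECT ?artifact WHERE {
  ?artifact rdf:type/rdfs:subClassOf* ex:Malware .
}
"""
for row in g.query(q, initNs={"ex": EX, "rdf": RDF, "rdfs": RDFS}):
    print(row.artifact)   # -> http://example.org/cyber#sample42
```

This is the kind of answer statistics alone won't give you: the result follows from the domain model, which is exactly why the deeper domain knowledge matters.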
Just as we see entire university curriculums for Data Science and Informatics degrees today, Semantic eScience will be the next area of Web Science to become a degree field.
Anyway, I hope this helps to provide greater clarity to the graphic in question.
Shawn
Robert,
At the bottom of page 3 of your feedback you say:
Similarly, the Principles of Joint Intelligence, including the Attributes of Intelligence Excellence, should be used as the measure of effectiveness, replacing the concepts of Measurable Security and Human Factors in Science of Cybersecurity Developing Scientific Foundations for the Operational Cybersecurity Ecosystem.
I'm not sure you understand that the concept of Measurable Security in the context of the CSCSS paper is specifically the U.S. Government-sponsored Making Security Measurable effort. Are you recommending we (Government, Industry, and Academia) replace all the machine-readable cybersecurity measurement and management formats and languages such as STIX, CYBOX, MAEC, CVE, CWE, CAPEC, OVAL, etc. with the principles of Joint Intelligence? The languages, formats, and knowledge repositories developed over the past decade through U.S. Government sponsorship are helping the community significantly speed up response and understanding. All of the cybersecurity information being shared across government cyber centers is in the Measurable Security formats and languages.
I think anyone who has ever spent time working vulnerability management understands why the Measurable Security effort known as Common Vulnerabilities and Exposures (CVE) is important and the value it provides.
The Measurable Security efforts for Cyber Threat Intelligence include STIX, CYBOX, and TAXII. You can see more about them on US CERT's page. https://www.us-cert.gov/Information-Sharing-Specifications-Cybersecurity
Just to show community adoption of these Measurable Security languages and formats, HailATAXII.com serves up free TAXII feeds of STIX content to the community. Since inception, almost every month has been a record breaker. However, the month of August represented a new milestone.
HailATAXII.com August stats:
> 22,000 TAXII visits, of which > 2,000 were unique TAXII clients, for a total of > 9,000,000 TAXII requests.
This shouldn't deter us from further improving the Measurable Security specs, but it does show that STIX/TAXII can work and is currently working as part of the Making Security Measurable effort.
Under the Enhanced Shared Situational Awareness (ESSA) Information Sharing Architecture (ISA), all the data and information being shared and exchanged across the U.S. Government is entirely in the languages and formats developed under the Measurable Security effort. I'm pretty sure you'll find these work hand in hand with Joint Intelligence as a matter of process.
Best,
Shawn
Ultimately, the point of using a Semantic eScience technology stack configured for cybersecurity and able to read the various machine-readable languages of the cybersecurity measurement and management architecture is to help humans understand the data and information being shared faster and more accurately.
If we look at the Enhanced Shared Situational Awareness (ESSA) Information Sharing Architecture (ISA) that came out of the CNCI-5 we can see the various information sharing functions and information sharing exchanges. (Slides 7 & 8 http://csrc.nist.gov/groups/SMA/ispab/documents/minutes/2013-12/essa_isa_intro_requirements_overview.pdf )
Currently, that information is all stored and exchanged using a traditional Informatics approach where we throw humans at it to connect the dots and figure out what is going on. Humans have to have the knowledge to read and understand the data and how to piece it all together to understand the bigger picture of activity so they can then determine and recommend courses of action to leadership. Introducing Semantic eScience adds a layer of technology, designed to work with the existing informatics approach, that takes ALL the data and organizes and connects the dots so people can find things faster.
An example from our analyst training class involved a simple summary task: take 50 malware analysis reports for the same APT campaign and summarize the knowledge by answering a standardized set of questions. Analysts were also asked to produce a summary visualization showing the relationships between key observables in the malware across different stages of the attack lifecycle (Installation, Command & Control, and Objectives). The average time for a human analyst was 7 hours. The same set of questions and the same malware analysis reports using the research prototype Semantic eScience technology took on average 5 minutes.
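For a sense of what this looks like under the hood, here is a hedged toy sketch of my own in Python with rdflib, using made-up example.org property names rather than the actual STIX/CybOX ontology terms, showing how observables tagged with a lifecycle stage can be summarized in one query instead of hours of manual collation.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical namespace and property names -- the actual research ontologies
# (STIX/CybOX/MAEC based) use different IRIs; this only sketches the idea.
EX = Namespace("http://example.org/cti#")
g = Graph()
g.bind("ex", EX)

# A few observables extracted from malware analysis reports, each tagged with
# the attack lifecycle stage it belongs to and the campaign it was observed in.
observables = [("obs-dropper", "Installation"),
               ("obs-beacon-domain", "Command and Control"),
               ("obs-rar-staging", "Objectives")]
for name, phase in observables:
    obs = EX[name]
    g.add((obs, RDF.type, EX.Observable))
    g.add((obs, EX.killChainPhase, Literal(phase)))
    g.add((obs, EX.observedIn, EX.campaignX))

# One query answers "what was observed for campaign X, grouped by lifecycle stage?"
q = """
SELECT ?phase (GROUP_CONCAT(STR(?obs); SEPARATOR=", ") AS ?observables)
WHERE {
  ?obs ex:observedIn ex:campaignX ;
       ex:killChainPhase ?phase .
}
GROUP BY ?phase
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.phase, "->", row.observables)
```

The same graph, grown to hold all 50 reports, answers the same question with no additional analyst effort; the analyst's time shifts from collating results to judging them.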
The bottom line is that Semantic eScience of Security is just there to help the human analysts do the same tasks we ask them to do manually today. Sort of like giving an accountant accounting software to do the accounting instead of a ledger book and calculator.
We're working on getting the ontologies that we've developed for STIX/MAEC/CybOX/et al. as part of this research made available on GitHub as a means to share with the community. In addition, we're also going to make available the ontology documentation generation tool (SpecGen) that we've been updating so that it generates diagrams of the ontologies in UML notation. I'll follow up once we have everything in the public domain.
Shawn, could you please explain the divergence between your proposed Semantic eScience approach and the approach used on DARPA's Crash-Safe and High Assurance Cyber Military Systems programs towards establishing a science of security? Could you explain why you believe Semantic eScience will prevail in becoming the foundation for the science of security?
Feel free to respond in detail. I have a background in OWL/RDF and UML as well as constructive logic.
--
Rick
Hi Rick,
Great questions! I've not worked on either the DARPA CRASH or the DARPA High-Assurance Cyber Military Systems (HACMS) programs before but having a quick read of them allowed me to understand the problem these programs are trying to solve. Here is the summary information from their respective program pages.
Clean-slate design of Resilient, Adaptive, Secure Hosts (CRASH)
The Clean-Slate Design of Resilient, Adaptive, Secure Hosts (CRASH) program is focused on the design of new computer systems that are highly resistant to cyberattack, can adapt after a successful attack to continue rendering useful services, learn from previous attacks how to guard against and cope with future attacks, and can repair themselves after attacks have succeeded. This program addresses computer architectures from processors and instruction sets, includes operating systems and programming languages, and extends up to application level tools.
High-Assurance Cyber Military Systems (HACMS)
The goal of the HACMS program is to create technology for the construction of high-assurance cyber-physical systems, where high assurance is defined to mean functionally correct and satisfying appropriate safety and security properties. Achieving this goal requires a fundamentally different approach from what the software community has taken to date. Consequently, HACMS will adopt a clean-slate, formal methods-based approach to enable semi-automated code synthesis from executable, formal specifications. In addition to generating code, HACMS seeks a synthesizer capable of producing a machine-checkable proof that the generated code satisfies functional specifications as well as security and safety policies. A key technical challenge is the development of techniques to ensure that such proofs are composable, allowing the construction of high-assurance systems out of high-assurance components.
Both CRASH and HACMS, in my mind anyway, are focused on solving similar problems. Both are focused on the host or system and the security of that specific host or system. Both of these science of security research areas are ultimately aimed at improving the security, resiliency, and design of the technology (host or system) in the operational cybersecurity ecosystem.
The CRASH and HACMS research are focused on host & system security to advance the state of the art in host and system security design, engineering and architecture. CRASH and HACMS are aimed at making the technology in the operational cybersecurity ecosystem more resilient and secure than with traditional technology with all the science and evidence to support the new approach.
CRASH and HACMS are both great research areas where we need to make improvements.
My Semantic eScience research project is focused on a different problem entirely. It is not focused on advancing the security of the host or system; rather, it is focused on analytic tradecraft transformation, making the humans analyzing the security of the operational cybersecurity ecosystem more effective and efficient by giving them a technology stack that can read, understand the meaning of, and organize the operational cybersecurity data coming from across cyber operations.
I spent the past couple of decades in the operational cybersecurity ecosystem, and for the majority of the past decade my time has been spent doing on-the-job mentoring and training of analysts to understand the data coming from host and network sensors, data from security audits and evaluations, malware analysis and incident data, threat intelligence and indicators of compromise, etc. Tracking APT campaigns, counter-cybercrime operations, incident response, and other activities are all data driven in modern cyber operations.
In many ways cyber defenders are a lot like scientists, but without the formal scientific methods and procedures. Consider the following definition of "science" from our favorite personal assistant, Google.
Science - the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.
"the world of science and technology"
Cyber defense, in a lot of ways, is the intellectual and practical activity encompassing the systematic study of the structure and behavior of cybersecurity in the operational ecosystem through observation and experimentation.
When you consider the cybersecurity technology in the operational ecosystem, much of what it is doing is checking the structure or observing the behavior of the wider ecosystem technology, the communications coming from that technology, and the humans using it. When we deploy signatures (AV, IDS, etc.), rules (firewall, SIEM, etc.), or checklists for assessing structure (such as OVAL checklists), it's a form of experimentation. Snort rules, Yara signatures, OVAL checklists, etc. are called detection test mechanisms under the Measurable Security languages and formats. We deploy these and then analyze the resulting observations to better understand the security of the ecosystem and what activity is happening in it.
Humans have to understand all the security data and what it means; they have to know how to assemble and organize the data to reveal the bigger picture of activity based on their observations and experimentation on the cybersecurity data.
Google also defines 'Science' as "a systematically organized body of knowledge on a particular subject" such as the Science of Security or the Science of Criminology.
The Semantic eScience research resembles this definition in that the research aimed to identify big data technology that could systematically organize a body of knowledge on the cybersecurity of the organization's operational ecosystem.
Technology that would collect, analyze, and organize the operational cybersecurity data and information coming from sensors (AV, IDS, etc), humans (incident reports, threat intelligence, etc), machines (vulnerability testing, security configurations, etc), and knowledge repositories (CVE, CWE, CAPEC, etc).
Below is some of the typical cybersecurity data and information that needs to be collected, analyzed, and organized into a body of operational cybersecurity knowledge.
- Configuration/Anomaly Reporting: Infrastructure Information, Risk Posture, Anomalies
- Knowledge of Threat Actors: Threat Actor Infrastructure, Threat Actor Personas, Collected Threat Actor Indicators, Threat Actor Attribution, Trend Analysis, Victim Information
- Incident Awareness: Incident Information, Incident Data, Infrastructure Impact and Effects, Investigations/Cases, Alerting Indicators, Victim Information
- Indications and Warnings: Events and Alerts, Tipping and Cueing, Warnings, Impact Assessments, Potential Indicators
- Vulnerability Knowledge: Vulnerabilities, Exploits, Potential Victim Information
- Mitigation Strategies: Coordinated Action Plans, Courses of Action, Understanding of Achievable Mitigation Effects
- Mitigation Actions and Responses: Computer Network Operations (CNO) Awareness, Action Tasking and Status, Effectiveness Reporting, After Action Reporting and Lessons Learned
The Semantic eScience research then looked at how operational cybersecurity data and information is exchanged today in the operational cybersecurity ecosystem and identified the DHS-sponsored Making Security Measurable effort, which includes machine-readable formats and languages to represent and exchange the information outlined above: CVE, CWE, CAPEC, OVAL, MAEC, CYBOX, STIX, TAXII, etc. We also looked at existing cybersecurity scoring systems, methodologies, vocabularies, and frameworks designed to work with these Measurable Security languages and formats, such as CVSS, CWSS, CWRAF, TARA, and Cyber Effects. You can read about most of these efforts here: http://msm.mitre.org/directory/categories/ (all items mentioned are also referenced in the research paper).
The Semantic eScience research presented a Semantic eScience technology stack to collect all that cybersecurity data and information in order to apply object-based production. Object-based production allowed the Semantic eScience technology stack to systematically organize a body of cybersecurity knowledge for the operational ecosystem so it could better enable the human cyber defenders to be more efficient and effective in the intellectual and practical activity encompassing the systematic study of the structure and behavior of cybersecurity in the operational ecosystem.
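As a very small, hypothetical sketch of what object-based production looks like at the data level (again my own illustration in Python with rdflib, using placeholder example.org IRIs rather than the ontologies from the paper): every real-world thing gets one object in the graph, and data from different producers converges on it.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# Placeholder IRIs for illustration; the published ontologies define their own
# classes and properties for CVE entries, OVAL results, STIX threat reports, etc.
EX = Namespace("http://example.org/obp#")
g = Graph()
g.bind("ex", EX)

# One "object" per real-world thing -- here, a single (fictitious) vulnerability.
vuln = EX["cve-example-0001"]
g.add((vuln, RDF.type, EX.Vulnerability))

# A vulnerability-scan finding reported by one producer...
scan = EX["oval-finding-17"]
g.add((scan, RDF.type, EX.ScanFinding))
g.add((scan, EX.detectedVulnerability, vuln))
g.add((scan, EX.onAsset, EX["host-web01"]))

# ...and a threat report from a completely different producer.
report = EX["threat-report-42"]
g.add((report, RDF.type, EX.ThreatReport))
g.add((report, EX.exploits, vuln))
g.add((report, EX.attributedTo, EX["threat-actor-A"]))

# Everything known about the vulnerability, regardless of who reported it,
# hangs off the one object -- no per-repository searching required.
for subject, predicate in g.subject_predicates(object=vuln):
    print(subject, predicate)
```

The object is the "bucket": producers keep reporting in their own formats, and the stack keeps converging their statements onto the same nodes.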
Where CRASH and HACMS are primarily focused on improving the security of the technology, Semantic eScience is primarily focused on improving the analytic and analysis tradecraft of the cyber defenders.
Happy to discuss this in more detail or in person if you are in the DC metro area. I've had the past 4-5 years to wrap my brain around this research and know it's not the easiest thing to comprehend. Keep asking questions and I'll do my best to answer.
Shawn
Just to follow up on this with a binning example. If DARPA CRASH and DARPA HACMS were being presented at the 2015 NSA Information Assurance Symposium, they would have gone into Track 3, Defense at Cyber Speed, since this research was focused on advancing the technology. If Semantic eScience were being presented at the symposium, it would have gone into Track 2, Analytic Tradecraft and Mitigations, since this research is focused on advancing analytic tradecraft.
Track 1: Cyber Security Solutions
This track will explore technology trends and use of commercial solutions to deliver innovative, flexible architectures while strengthening critical partnerships. Advancing the state of cyber security solutions to meet demand hinges on the rapid production of scalable, robust, and secure enterprise solutions. We must implement emerging technology and leverage available commercial products to mobilize our network infrastructures and develop effective tools that meet our complex cybersecurity challenges.
Track 2: Analytic Tradecraft and Mitigations
This track will explore advances in analytics, big data, cloud security, mitigations and related topics. Evolving tactics, techniques, and procedures advance cyber threats within our mission-critical networks. Advancing our analytic tradecraft and producing mitigations is essential, as is understanding the environment.
Track 3: Defense at Cyber Speed
This track will explore the progress in technology and techniques used to automate network defenses and establish trusted computing environments. As the adversary becomes more agile and develops increasingly sophisticated toolsets, an active cyber defense strategy to intercept, evaluate, and mitigate attacks in near-real-time or at mission speed is an imperative. The ability to continuously monitor the health of the network and respond quickly to incidents as they occur is essential.
Track 4: Building the Nation's Capacity
This track will explore the framework in which leadership shapes forward leaning cross-community strategies, operational decision-makers franchise processes, and implementers apply essential best practices, evolving today's and posturing tomorrow's missions that prepare for and respond to the cyber threat. The rapid and ever changing cybersecurity challenges call for a technically diverse foundation of people, processes, and policies. We must maintain the strategic advantage of our intellectual capital by actively developing and recruiting our Nation's future Information Assurance leaders and innovators.
Shawn, I understand that you differentiate the purpose of your Semantic eScience research project from the intended outcomes of DARPA's CRASH SAFE and HACMS programs. While the placement of subjects in tracks lends little insight into technical understanding, the evidence shows that at least one software assurance session was planned for Track 2. See June 30, Track 2 BUILDING ASSURED SOFTWARE Room 203A/B Briefer: R. Kris Britton.
Rick,
I believe the Science of Security discipline supports research where we look to advance the design, engineering, and architecture of the security in the technology with science. I also think the discipline supports the cyber defense scientists in the field (operations) studying cybersecurity in the operational ecosystem.
For me personally, I observed a growing, critical problem: the analytic tradecraft knowledge of individuals and the overall cyber defense tradecraft of different industries vary widely. Consider the recent hot topic of threat intelligence. There are whole industries just starting to understand the important role threat intelligence plays in cyber defense, but other industry segments have understood and practiced this for 15 years or more.
Most university cybersecurity / information assurance degree programs do not yet have a strong concentration on analytic tradecraft for cybersecurity data sets despite this being a well understood aspect of modern cyber defense. I think the Intelligence and National Security Alliance (INSA) Cyber Intelligence Task Force captured some great points in its recent paper "Cyber Intelligence: Preparing Today's Talent for Tomorrow's Threats." I highly recommend giving it a read.
http://www.insaonline.org/i/d/a/b/CyberIntel_PrepTalent.aspx
Given the widely inconsistent analytic tradecraft in the wild and the varied analytic tradecraft training, I wanted to self-fund research to find a solution that would help increase both individual and organizational analytic tradecraft for cyber defense while enabling fewer defenders to do more, to address the critical shortage of qualified cyber defenders available.
I think the science of security discipline is strong enough with enough momentum to support both technology & people focused research.
Best,
Shawn
Shawn, I am getting a 404 when dereferencing the following:
http://cscss.org/?dt_portfolio=cyber-science-white-paper
Once I read your paper it may be useful to talk. I am in DC twice a month, or thereabouts.
Recall that I have a background in OWL/RDF and UML as well as constructive logic, so there's no heavy lifting involved for me. Maybe this http://videolectures.net/iswc09_hayes_blogic/ would be a good starting point for a conversation.
--
Rick
Happy to discuss the paper more. I do apologize for the link issue. Please find tested and working links below to download / read the research paper.
LinkedIn's Slideshare:
http://www.slideshare.net/shawnriley2/cscss-science-of-security-developing-scientific-foundations-for-the-operational-cybersecurity-ecosystem
or from the CSCSS web site:
http://cscss.org/wp-content/uploads/2015/08/CSCSS-Science-of-Security-Developing-Scientific-Foundations-for-the-Operational-Cybersecurity-Ecosystem.pdf
or from ResearchGate:
https://www.researchgate.net/publication/280949461_Science_of_Cybersecurity_-_Developing_Scientific_Foundations_for_the_Operational_Cybersecurity_Ecos...
Shawn
I thought this presentation slide deck might be of interest to those looking for more information on how semantic technology such as that found in the Semantic eScience of Security stack might help with things like Logical Theory (strong ontology): Axioms, Inference Rules, Theorems, Theory.
http://stids.c4i.gmu.edu/presentations/STIDS2013_Tutorial2_p2a_Obrst.pdf
I think everyone realizes the potential for developing really cool mathematical formulas that can take advantage of the logic, reasoning, inference, and other mathematical aspects to reveal new insights and produce new knowledge from the observed structures and activities in the threat and risk data.
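As a small, hedged illustration of what a "strong ontology" axiom buys you in practice, here is a sketch in Python using rdflib plus the owlrl reasoner package (both assumed to be installed; the property name and IRIs are made up for the example): one OWL axiom declaring a property transitive lets the reasoner materialize relationships nobody asserted directly.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF
import owlrl  # OWL 2 RL reasoner for rdflib graphs (assumed installed)

# Hypothetical property and IRIs, purely for illustration.
EX = Namespace("http://example.org/cti#")
g = Graph()
g.bind("ex", EX)

# Axiom: "part of campaign" is transitive.
g.add((EX.partOfCampaign, RDF.type, OWL.TransitiveProperty))

# Asserted facts from two different reports.
g.add((EX.incident1, EX.partOfCampaign, EX.intrusionSetA))
g.add((EX.intrusionSetA, EX.partOfCampaign, EX.campaignX))

# The reasoner materializes what the axiom entails: incident1 is part of
# campaignX, even though no one asserted that triple.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
print((EX.incident1, EX.partOfCampaign, EX.campaignX) in g)  # True
```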
Just to help "connect the dots" between the Semantic eScience of Security research and the recommendations contained in the NSF sponsored Cybersecurity Experimentation of the Future (CEF) Report, it might be worth having read of the CEF or at least doing a keyword search for items like semantic, ontology, provanence, etc. Not surprising many of the CEF recommendations for mid-term and long-term research align very closely with research focus and direction we presented in the Semantic eScience of Security project. It might be worth having a read of both documents to understand the many connections between the recommendations of CEF and the research we presented in our Semantic eScience of Security focused paper.
The ontologies that were developed as part of this Semantic eScience of Security research are now public and licensed under Creative Commons Share-Alike International on GitHub.
https://github.com/daedafusion/cyber-ontology
We have a couple more things to put up, but I wanted to go ahead and share these Making Security Measurable themed ontologies with the SoS community.
Hi folks,
Now that the cross government, industry, and academia Making Security Measurable (MSM) community has stood up a Cyber Threat Intelligence Ontology Working Group to develop the first of the modular RDF/OWL2 ontologies for STIX/CYBOX, I wanted to give a quick reminder that the free MOOC on Knowledge Engineering and Semantic Web Technologies starts on Monday, Nov 2 and runs through Monday, Dec 14. Details on the MOOC are further up in this thread.
I hope some of the SoS community take the opportunity to learn more about knowledge engineering and semantic technologies as we move forward with developing cyber ontologies for the international community to use in the operational cyber ecosystem.
Best,
Shawn
Shawn & All:
Here's [1] an article called "Increasing Assurance Levels Through Early Verification with Type Safety" published in the Journal of Cyber Security and Information Systems. The article presents a computational perspective on software assurance derived from the approach used on CRASH-SAFE and HACMS. Note the approach to technology transition diverges from that included in CRASH-SAFE. CRASH-SAFE proposed a new language called Breeze that was intended to be easier to learn than higher order functional languages. "Increasing Assurance Levels" proposes an approach in "mainstream" architecture such as UML. RDF and OWL do not allow parametricity (a.k.a. parametric polymorphism).
I will be in DC and available on November 15 and 16.
I enjoyed reading your paper and hope we will have the opportunity to meet. You can contact me at rick@rickmurphy.org or richard.murphy@gsa.gov.
--
Rick
Hi Rick,
Thank you for the link to the paper. I enjoyed reading it this morning with my coffee. While I don't focus on software assurance and I'm not a software architect, developer, or engineer, I do follow advancements in this area since software is eventually transitioned to production in the operational ecosystem, where my focus is.
For those who haven't read both papers: Rick's paper is aimed at software architects, developers, and engineers who are building software programs and outlines a process/methodology for increasing assurance levels through early verification with type safety before the software is transitioned to production. This process/methodology is aimed at helping the software architect, developer, or engineer build safer, more secure software before it is transitioned to production in the operational ecosystem.
My paper is entirely focused on the production environment in the operational ecosystem. My paper is not aimed at the software architect, developer, or engineer but rather is aimed at the operators, analysts, and scientists who are studying, analyzing, and defending the production systems in the operational ecosystem. While my paper does touch on software assurance it is specifically focused on the operational ecosystem AFTER software has transitioned to production.
Let us have a look at Software Assurance from the Making Security Measurable effort to further our understanding.
Software Assurance
Software assurance begins with code quality and evidence of that quality. You can assume a software defect found during the development of a product may require $1 to remedy. If the defect escapes the development phase and enters the independent testing phase the cost will be approximately $100 to remedy. If the defect escapes the independent testing phase and makes it into production the cost will be approximately $1,000 to remedy. If sensitive data is lost or attackers make the software do things they are not supposed to do, through exploitation of a known software weakness, the costs of this defect may exceed many thousands of dollars to repair -- if repair is even possible -- and the impact could go well beyond anything that money can represent.
Software assurance (SwA) is defined as the level of confidence that software is free from vulnerabilities, either intentionally designed into the software or accidentally inserted at any time during its life cycle, and that the software functions in the intended manner.
The first step in gaining software assurance is to improve the various aspects of quality of the applications/software you directly control and have evidence to support your confidence in the quality of that software. This segment can be further refined into three sections, all of which can include software weaknesses that could be exploitable when the software is in operation:
The software you write directly.
The software that you contract someone else to write for you (typically a domain expert).
The software you, or your developing contractor, includes into your software applications from third-party libraries, open-source, licensed or purchased software utilities.
Organizing, managing and presenting evidence to support assertions about the quality of software can be approached in many ways. Within some organizations this is one aspect of system Certification and Accreditation, but another approach, which is fairly new but promising, is an Assurance Case.
The Software Assurance Program of the Department of Homeland Security's National Cyber Security Division co-sponsors Software and Supply Chain Assurance (SSCA) Forums semi-annually with organizations in the Department of Defense and the National Institute for Standards and Technology. DHS also provides online resources to support the industry-wide community and working groups with their Software Assurance Community Resources and Information Clearinghouse (SwA CRIC) and Build Security-In Web site.
Lacking common characterization of exploitable software constructs presented one of the major challenges to realizing software assurance. As part of its Software Assurance public-private collaboration efforts, DHS has continued to provide the sponsorship for the Common Weakness Enumeration (CWE™) to provide the requisite characterization of exploitable software constructs. CWE better enables the needed education and training of programmers on how to eliminate all-too-common errors before software is delivered and put into operation. Mitigation practices are associated with each CWE identifier, along with common attack patterns (CAPEC™). This aligns with the DHS "Build Security In" approach for software assurance so that software is developed more securely on the front end, avoiding security issues in the longer term. CWE, along with the Common Weakness Risk Analysis Framework (CWRAF™) and Common Weakness Scoring System (CWSS™), provides a standard means for understanding residual risks and thus enabling more informed decision-making by suppliers and consumers about the security and resilience of software.
Where Rick's paper is focused on increasing the assurance of the software being developed through early verification with type safety in the pre-production software development lifecycle, my paper focuses on the fusion and organization of the data in CWE, CAPEC, CWRAF, CWSS, etc languages and formats so they can be connected to other cyber security data sets like incident data, threat intelligence, etc.
While CWE and CAPEC are used in Software Assurance, they are also used in activities like penetration testing. Pen-testers use TTPs (CAPEC, MAEC) to look for known vulnerabilities (CVE), weaknesses (CWE), and security configuration issues (CCE). CVE, CWE, and CCE are also used in the Structured Threat Information eXpression (STIX) language for cyber threat intelligence under the "Exploit Target" area, where they describe the exploit target (CVE, CWE, or CCE) of a threat actor's TTP. In the same way, CAPEC is used in STIX under the TTP area to identify the "attack pattern" or blueprint the attacker is following. The TTP area also uses the Malware Attribute Enumeration and Characterization (MAEC) and CYBOX languages to describe the threat actor's tools and malicious software.
Rick wants to build more secure and safe software before it goes into production; I want to build a knowledge base in production where all the standards-based data coming from different operational areas (Software Assurance, Pen-Testing, Incident Response, Threat Intelligence & Information Sharing, etc.) is organized and fused together with provenance, and where the semantic meaning of the objects, associations, attributes, and activity is clearly defined based on the Making Security Measurable standards.
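As a rough, hypothetical sketch of what "fused together with provenance" can look like at the triple level (my own toy example in Python with rdflib named graphs; the IRIs and property names are placeholders, not the Making Security Measurable ontology terms): each producer's assertions live in their own named graph, so every statement stays tied to its source.

```python
from rdflib import Dataset, Namespace, URIRef
from rdflib.namespace import RDF

# Placeholder IRIs for illustration only.
EX = Namespace("http://example.org/cti#")
ds = Dataset()

# Assertions from a pen-test land in one named graph...
pentest = ds.graph(URIRef("http://example.org/source/pentest-2015-10"))
pentest.add((EX["ttp-phish"], RDF.type, EX.TTP))
pentest.add((EX["ttp-phish"], EX.usesAttackPattern, EX["capec-example"]))

# ...while a vendor threat-intel report lands in another.
intel = ds.graph(URIRef("http://example.org/source/vendor-intel-report"))
intel.add((EX["ttp-phish"], EX.targetsExploit, EX["cve-example-0001"]))

# Ask not only *what* is known about the TTP, but *which source said it*.
for s, p, o, source in ds.quads((EX["ttp-phish"], None, None, None)):
    print(p, o, "asserted by", source)
```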
I thought this article might be of interest to members of community following this thread.
Use Semantics to Keep Your Data Lake Clear
In today's evolving world of big data, many businesses have been inspired by the potential of the data lake approach. A data lake offers advantages to storing large volumes of heterogeneous data and, for the majority of organizations that need to analyze complex data (structured and unstructured), a data lake delays the need to integrate the data with a data warehouse.
But constructing a usable data lake with native formats presents a number of challenges that must be addressed for the data lake to fulfill its promise of making it easier and less costly to extract actionable information from data.
According to Gartner Research Director Nick Heudecker, "Data lakes typically begin as ungoverned data stores. Meeting the needs of wider audiences requires curated repositories with governance, semantic consistency, and access controls."
It's important not to overly constrain the data, but without sensible governance, users soon will find that accessing what they have stored is surprisingly challenging. Idle and overgrown, the data lake quickly will become a stagnant data swamp. But organizations can avoid data swamps by adding semantics to a data lake.
Why Semantics?
Semantics brings a powerful, yet highly flexible structure to unstructured and structured data, in a model that is sustainable over time. It allows users to see relationships between data without first forcing that data into a schema straightjacket and supports ad hoc and unanticipated analytic uses. It affordably breaks down data silos and, more simply, just makes sense. Semantics is intrinsically about logic and rules, and was developed to organize information in a comprehensible fashion. So it should come as no surprise that semantics provides us with a highly usable and consistent taxonomy model for data lakes.
For example, variances between words and their uses within big data sets can be highly problematic. Think of the last time you searched for a word with multiple meanings - perhaps you wanted information on Python programming, but ended up with returns on massive snakes. Think too about the way we use different spellings and abbreviations for the same words - (i.e., California, Calif., CA.). Context matters also: Are the data referring to the city of New York, or the state?
Born in response to issues like these was the semantic data model: A way to extract and define the meaning of the data in a logical way that makes sense both to people and machines. Still, it's important to note that even though semantic models make sense to humans, they primarily are intended to allow software to extract and assign meaning to data independently.
Using a semantic data model, you represent the meaning of a data string as binary objects - typically in triplicates made up of two objects and an action. For example, to describe a dog that is playing with a ball, your objects are DOG and BALL, and their relationship is PLAY. In order for the data tool to understand what is happening between these three bits of information, the data model is organized in a linear fashion, with the active object first - in this case, DOG. If the data were structured as BALL, DOG, and PLAY, the assumption would be that the ball was playing with the dog. This simple structure can express very complex ideas and makes it easy to organize information in a data lake and then integrate additional large data stores.
Semantic Data Models in the Swamp
A workable semantic data model can be created by anyone with an understanding of logic and taxonomies. But when it comes to integrating disparate data sets, the fastest route to success is the use of a common language (nomenclature) across an entire audience and, often, industry.
Semantic data models, in combination with semantic graph databases, bring clarity, relationships, and structure to unstructured information and are designed explicitly to share data, discoveries, and answers. The data sources used for analytics can be - and often are - both internal and external, such as Linked Open Data, a graph of interlinked data sets. The standard for a Linked Open Data set is RDF, the Resource Description Framework, which is a model for describing things and their relationships. Tim Berners-Lee, inventor of the World Wide Web, is fond of describing linked data as "the semantic Web done right."
Developing semantic data models for key industries is currently underway, with healthcare at the forefront, spearheaded by Montefiore Medical Center and its partners, which have created the first semantic data lake for healthcare. Semantic data models act as a translator, enabling variances in industry terminologies and words to be easily integrated with other internal and public data sets.
A semantic data lake is incredibly agile. The architecture quickly adapts to changing business needs, as well as to the frequent addition of new and continually changing data sets. No schemas, lengthy data preparation, or curating is required before analytics work can begin. Data is ingested once and is then usable by any and all analytic applications. Best of all, analysis isn't impeded by the limitations of pre-selected data sets or pre-formulated questions, which frees users to follow the data trail wherever it may lead them.
See more at: http://data-informed.com/use-semantics-to-keep-your-data-lake-clear/
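To make the article's DOG / PLAY / BALL illustration concrete, here is a minimal sketch of that same statement as an RDF triple in Python with rdflib (the example.org namespace is a placeholder, not a standard vocabulary):

```python
from rdflib import Graph, Namespace

# Placeholder namespace for the toy example.
EX = Namespace("http://example.org/demo#")
g = Graph()
g.bind("ex", EX)

# Subject (the active object) first, then the relationship, then the object.
g.add((EX.Dog, EX.playsWith, EX.Ball))

# Serializing to Turtle shows the same statement in a form readable by both
# people and machines.
print(g.serialize(format="turtle"))
```

Swapping the subject and object would assert the opposite relationship, which is exactly the ordering point the article makes.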
Just a quick update to share that Cyber Intelligence Ontologies developed as part of this research have been forked on GitHub by the Defense Security Information Exchange (DSIE) as part of transitioning this research into practice.
https://github.com/DSIE/cyber-ontology
I wanted to share an article about the Pacific Northwest National Lab and their CHAMPION solution. PNNL applied semantic technology so that it could understand massive amounts of network data to help cyber security analysts find activity of interest in near real time. This is very similar to my Semantic eScience of Security research, which focused on significantly more complex cyber threat intelligence data instead of network data. The key enabler in both approaches is the use of advanced semantic web technology and descriptive logic to enable the technology to reason about the data it was seeing.
Shawn
Pacific Northwest National Laboratory scientists win five R&D 100 awards
Technologies that impact cyber security, increase our ability to detect trace amounts of chemicals, convert sewage into fuel, view energy processes under real-world conditions and forecast future electric needs are among the newest R&D 100 award winners. R&D Magazine honored five advancements developed by researchers at the Department of Energy's Pacific Northwest National Laboratory at an event in Las Vegas on Nov. 13.
R&D Magazine selects the 100 most innovative scientific and technological breakthroughs of the year from nominations spanning private, academic and government institutions. These honors bring PNNL's total to 98 since the awards' inception in 1969.
Cybersecurity software that knows its stuff
If you're a hacker aimed at stealing credit card information from a retail company and you want to evade detection, massive amounts of network data are your ally. Analysts have the know-how to sort through this digital mess, but they often identify attacks too late. Analytical software developed at PNNL can help find these and other threats in near-real-time. That's because the software, called Columnar Hierarchical Auto-associative Memory Processing in Ontological Networks or CHAMPION, has the knowledge to sort through data like an analyst, but on a much greater scale.
Scientists designed CHAMPION to use human analysts and historical data to learn about the company it's protecting. Starting with advanced Semantic Web technologies, which translate human knowledge into something that's machine readable, CHAMPION then uses descriptive logic to reason whether activity is suspicious. For example, if a retail company's HVAC data back-up account tries to access the point-of-sale system, CHAMPION could use historical data to conclude that this is unusual. Once identified, the software alerts an analyst of the suspicious activity--in time to potentially thwart an attack.
Cybersecurity isn't CHAMPION's only trick. Change its diet of knowledge and the software can learn to analyze financial services or health care data. PNNL licensed the software to Champion Technology Company Inc. to pursue all three applications.
The CHAMPION development team includes PNNL's Shawn Hampton, Rick Berg, Katya Pomiak and Patrick Paulson; Champion Technology Company's Ryan Hohimer, Alex Gibson and Peter Neorr; and former PNNL scientist Frank Greitzer.
Read more at: http://phys.org/news/2015-11-pacific-northwest-national-laboratory-scientists.html
Hi Folks,
There was another really good article discussing Object-Based Production (OBP) in the news. As you might be aware, I've been focused on applying Object-Based Production to cyber security / cyber threat intelligence since BlackHat 2010 in Vegas. It's the heart of what the Semantic eScience of Security paper was about and how it supports the science of security core themes while modernizing and advancing analytic tradecraft. Here is a snippet of information about OBP and a link to the article.
Shawn
Object-based production is a concept being implemented as a whole-of-community initiative that fundamentally changes the way the IC organizes information and intelligence. Reduced to its simplest terms, OBP creates a conceptual "object" for people, places, and things and then uses that object as a "bucket" to store all information and intelligence produced about those people, places, and things. The object becomes the single point of convergence for all information and intelligence produced about a topic of interest to intelligence professionals. By extension, the objects also become the launching point to discover information and intelligence. Hence, OBP is not a tool or a technology, but a deliberate way of doing business.
While simple, OBP constitutes a revolutionary change in how the IC and the Department of Defense (DOD) organize information, particularly as it relates to discovery and analysis of information and intelligence. Historically, the IC and DOD organized and disseminated information and intelligence based on the organization that produced it. So retrieving all available information about a person, place, or thing was primarily performed by going to the individual repository of each data producer and/or understanding the sometimes unique naming conventions used by the different data producers to retrieve that organization's information or intelligence about the same person, place, or thing. Consequently, analysts could conceivably omit or miss important information or erroneously assume gaps existed.
OBP aims to remedy this problem and increase information integration across the IC and DOD by creating a common landing zone for data that cross organizational and functional boundaries. Furthermore, this business model introduces analytic efficiency; it reduces the amount of time analysts spend organizing, structuring, and discovering information and intelligence across the enterprise. By extension, OBP can afford analysts more time for higher orders of analysis while reducing how long it takes to understand how new data relate to existing knowledge. A central premise of OBP is that when information is organized, its usefulness increases.
A concrete example best illustrates the organizing principle of OBP and how it would apply to the IC and DOD. Consider a professional baseball team and how OBP would create objects and organize information for all known people, places, and things associated with the team. At a minimum, "person" objects would be created for each individual directly associated with the team, including coaches, players, the general manager, executives, and so forth. As an example of person-object data, these objects would include characteristics such as a picture, height, weight, sex, position played, college attended, and so forth. The purpose is to create, whenever possible, objects distinguishable from other objects. This list of person-objects can be enduring over time and include current and/or past people objects or family or previous team relationships.
In a similar fashion, objects could be created for the physical locations associated with the team, including the stadium, training facility, parking lots, and players' homes. The same could be done for "thing" objects associated with the team, such as baseballs, bats, uniforms, training equipment, team cars/buses/planes, and so forth.
With the baseball team's objects established, producers could report information to the objects (for example, games, statistics, news for players, or stadium upgrades), which would serve as a centralized location to learn about activity or information related to the team. Also, relationships could be established between the objects to create groupings of objects that represent issues or topics. For example, a grouping of people-objects could be created to stand for the infield or outfield, coaching staff, or team executives. Tangential topics/issues such as "professional baseball players involved in charity" could be established as well. Events or activities (such as games) and the objects associated with them could also be described in this object-centric data construct. Moreover, the concept could expand to cover all teams in a professional baseball league or other professional sports or abstract concepts that include people, places, or things.
Similar to the example above, the IC and DOD will create objects for the people, places, things, and concepts that are the focus of intelligence and military operations. Topics could include South China Sea territorial disputes, transnational criminal organizations, Afghan elections, and illicit trade. Much like the sports example, IC and DOD issues have associated people, places, and concepts that could be objects for knowledge management.
Read the whole article here: https://www.govtechworks.com/transforming-defense-analysis/#gs.MnGchY0
Here is a link to the latest Cyber Security Measurement and Management Architecture poster that shows common security management processes such as vulnerability analysis, threat analysis, intrusion detection, incident management, etc., along with the associated Making Security Measurable standards (STIX, CYBOX, MAEC, CVE, CWE, CAPEC, etc.) that align to the various areas.
http://makingsecuritymeasurable.mitre.org/docs/MSM_Measurement_and_Architecture_diagram_handout.pdf
How Data Standards Provide for Measurable and Manageable Security
The measurability and manageability of security are improved through registries of baseline security data, standardized languages for accurately communicating the information, definitions of proper usage, and community approaches to standardized processes.
I thought this might be of interest. This DARPA-funded research project focused on using cyber security ontologies to apply reasoning to incident response. The research took the same path that we did in our CSCSS SoS research: developing ontologies based on existing data sets like SCAP, STIX, MAEC, CYBOX, etc. In essence, it builds a "knowledge graph" of the cyber security data, over which expert knowledge can then be captured in the form of reasoning rules.
TAPIO: Targeted Attack Premonition using Integrated Operational Data Sources
Invincea Labs
http://www.slideshare.net/Invincea/invincea-tapio-ontology-csaw-2014-short-ppt
When the TAPIO slide deck came out, people tried for months to get DARPA to share the cyber security ontologies developed as part of the TAPIO project. Because the ontologies were considered the project's "intellectual property," DARPA didn't allow them to be shared with the wider community.
Since TAPIO and our CSCSS Semantic eScience of Security research took the same basic approach to developing ontologies that can fuse data from the full spectrum of cyber threat intelligence (CTI) with the specific enterprise security details from areas like continuous diagnostics and mitigation (CDM), the CSCSS team made our ontologies freely available on GitHub: https://github.com/daedafusion/cyber-ontology
While giving away the "intellectual property" might not be DARPA's style, our team at CSCSS, being a non-profit cyber think tank, felt the community would benefit from concrete examples of knowledge engineering, and specifically knowledge representation, of key cyber security data sets. Sharing the 100 or so modular cyber security ontologies from our CSCSS Science of Security research and providing a technology stack blueprint that enables people to build the system themselves should go a long way towards moving the community to using cyber defense knowledge graphs.
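As a hedged sketch of how someone might start experimenting with the published ontologies: the snippet below assumes the GitHub repository has been cloned locally and that the modular ontologies are RDF/XML files with an .owl extension (the local path and file extension are my assumptions, not guarantees about the repo's layout). It simply loads every module into a single rdflib graph so they can be queried together.

```python
import glob
from rdflib import Graph

# Assumption: https://github.com/daedafusion/cyber-ontology has been cloned to
# ./cyber-ontology and its modular ontologies are RDF/XML files ending in .owl.
combined = Graph()
ontology_files = glob.glob("cyber-ontology/**/*.owl", recursive=True)

for path in ontology_files:
    combined.parse(path, format="xml")   # load each modular ontology into one graph

print(f"Loaded {len(ontology_files)} ontology modules, {len(combined)} triples total")

# Once loaded, the combined graph can be queried, e.g. to list some declared classes.
for row in combined.query(
    "SELECT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> } LIMIT 10"
):
    print(row.cls)
```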
If you are not familiar with "ontologies" they are a form of knowledge representation.
Knowledge representation is the field of artificial intelligence that focuses on designing computer representations that capture information about the world that can be used to solve complex problems. The justification for knowledge representation is that conventional procedural code is not the best formalism to use to solve complex problems. Knowledge representation makes complex software easier to define and maintain than procedural code and can be used in expert systems.
For example, talking to experts in terms of business rules rather than code lessens the semantic gap between users and developers and makes development of complex systems more practical.
Knowledge representation goes hand in hand with automated reasoning because one of the main purposes of explicitly representing knowledge is to be able to reason about that knowledge: to make inferences, assert new knowledge, and so on. Virtually all knowledge representation languages have a reasoning or inference engine as part of the system.
You can read more about knowledge representation and reasoning on Wikipedia here:
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
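For anyone who prefers a running example to the Wikipedia definition, here is a minimal, hypothetical sketch using the rdflib Python library. The namespace and class names are invented for the example; it shows knowledge represented as subject-predicate-object facts, with a simple subclass inference carried out by a SPARQL property path rather than a full inference engine.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/cyber#")   # hypothetical namespace for the example
g = Graph()

# Knowledge base: explicit facts about classes and one individual.
g.add((EX.Trojan, RDFS.subClassOf, EX.Malware))
g.add((EX.ZeusSample, RDF.type, EX.Trojan))
g.add((EX.ZeusSample, RDFS.label, Literal("Zeus banking trojan sample")))

# The query follows rdf:type / rdfs:subClassOf* to conclude the sample is
# also an instance of Malware, even though that fact was never stated.
query = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label WHERE {
  ?thing rdf:type/rdfs:subClassOf* <http://example.org/cyber#Malware> .
  ?thing rdfs:label ?label .
}
"""
for row in g.query(query):
    print(row.thing, row.label)
```

A production system would hand the subclass reasoning to an OWL reasoner instead of a hand-written query, but the division of labor is the same: explicit facts plus machinery that derives new facts from them.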
The link in posting #33 doesn't appear to work.
Hi Robert, Thanks for letting me know. For some reason an extra blank character space was at the end of the URL which prevented this forum from allowing the URL to work. Probably from copy and pasting the URL from the browser window into the forum comment section. URLs on this forum have to stop flush at the end or they don't work. My bad for not testing it. I've removed the extra single space and it now works fine. Best, Shawn
It looks like AI is one of the top tech predictions for 2016. Glad to see AI, like that in our CSCSS research and ontologies, going mainstream as we move forward into 2016.
Artificial intelligence goes mainstream
Every second interview at the Web Summit this year, from Facebook to Deloitte and Accenture, mentioned artificial intelligence, and 2016 will be a pivotal year in mainstreaming the technology. From Facebook doing incredible things with Messenger and adding in artificial agents through its enigmatically titled technology 'M' so consumers can talk with brands to order goods, to questions about whether robots will take our jobs, 2016 will be very interesting. Accenture CTO Paul Daugherty claims artificial intelligence will augment our capabilities and make humans super. "It's really multiple technologies, deep learning, machine learning, semantic ontology, expert systems, video analytics, etc. A lot of different technologies coming together that allow us to create these new capabilities, either for consumers to change the way they live or employees to change the way their jobs work and the way they work in organisations."
This paper provides an excellent discussion of leveraging Joint Doctrine to develop interoperable systems:
Peter Morosoff, Ron Rudnicki, Jason Bryant, Robert Farrell, Barry Smith, "Joint Doctrine Ontology: A Benchmark for Military Information Systems Interoperability", Semantic Technology for Intelligence, Defense and Security (STIDS), 2015, CEUR.
http://ceur-ws.org/Vol-1523/STIDS_2015_T01_Morosoff_etal.pdf
Bingo! Now you're getting it. The modular ontologies we developed in our research to represent Cyber Threat Intelligence, Assets, Vulnerabilities, Weaknesses, Attack Patterns, Malware, etc. are in RDF/OWL2. The proposed Joint Doctrine ontologies in the paper were to be in RDF/OWL2, which means they could work hand in hand, as I mentioned way back in comment #2. OWL uses "UnionOf" to fuse entities at the atomic level, which enables a very fluid and straightforward way to fuse data representing the same entity from two or more different data sets. This way you could say that Joint Doctrine is my primary vocabulary, used to define the Nouns (Entities) and Verbs (Attributes, Relationships, Activity) that make up the axioms in my knowledge graph. Other data sets that mention the Nouns and Verbs contained in Joint Doctrine could then be mapped using UnionOf to fuse the terms, extending our knowledge graph to include the new data set while still connecting it into the primary vocabulary of Joint Doctrine.
You might want to build a knowledge-based system that creates a knowledge graph spanning Joint Doctrine using the ontologies mentioned in the paper, Cyber Threat Intelligence and Continuous Diagnostics and Mitigation from the CSCSS ontologies, and Geospatial / Map data using the Geospatial Semantics and Ontology from http://cegis.usgs.gov/ontology.html. This is all possible using modular ontologies and connecting them together in the same way we connected the various standards that make up the cyber security measurement and management architecture. Throw in things like Provenance from the PROV-O standard and you're well on your way.
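Here is a minimal sketch of the kind of cross-vocabulary fusion described above, again in Python with rdflib. For illustration I align two individuals with owl:sameAs rather than the class-level UnionOf construction mentioned earlier, and every namespace, term, and triple in the example is hypothetical rather than taken from the actual Joint Doctrine, CSCSS, or USGS ontologies.

```python
from rdflib import Graph

g = Graph()
# Two tiny, hypothetical fragments: one in a Joint Doctrine-style vocabulary, one in
# a CTI-style vocabulary, plus a single alignment triple that fuses the two entities.
g.parse(format="turtle", data="""
@prefix jd:   <http://example.org/joint-doctrine#> .
@prefix cti:  <http://example.org/cti#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

jd:Target_42  a jd:Target ;
              rdfs:label "High-value target 42" .

cti:Infra_42  a cti:AdversaryInfrastructure ;
              cti:observedIP "203.0.113.7" .

# Both vocabularies are describing the same real-world entity.
jd:Target_42  owl:sameAs cti:Infra_42 .
""")

# Query across the fused vocabularies by following the owl:sameAs link explicitly.
query = """
PREFIX jd:   <http://example.org/joint-doctrine#>
PREFIX cti:  <http://example.org/cti#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?ip WHERE {
  ?t a jd:Target ;
     rdfs:label ?label ;
     owl:sameAs ?same .
  ?same cti:observedIP ?ip .
}
"""
for row in g.query(query):
    print(row.label, row.ip)   # "High-value target 42" "203.0.113.7"
```

With a reasoner in the technology stack, the owl:sameAs link would be applied automatically instead of being followed explicitly in the query.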
All of this would still need a big data technology stack similar to the one described in the Semantic eScience of Security paper.
The "What is Security Science" post in this forum says "Security Science - is taken to mean a body of knowledge containing laws, axioms and provable theories relating to some aspect of system security."
If we wanted to capture that knowledge and use it operationally, our thought was to leverage the field of Artificial Intelligence to build a type of Knowledge-Based System known as an Expert System designed to support eScience, a Semantic eScience system. These are systems where we use A.I. knowledge representation languages to represent knowledge in the same way humans think about knowledge. This is done so that we can then capture expert-level human reasoning in the form of IF THEN statements. An expert system, such as the Semantic eScience of Security platform discussed in the paper, is divided into two sub-systems: the inference engine and the knowledge base. The knowledge base represents facts and rules. The inference engine applies the rules to the known facts to deduce new facts. Inference engines can also include explanation and debugging capabilities.
We had hoped to eventually develop a catalog of human expert based reasoning rules that could be developed by the SoS research community as well as the operational cyber security community.
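To make the knowledge base / inference engine split and the idea of a catalog of reasoning rules concrete, here is a deliberately tiny, self-contained forward-chaining sketch in Python. The facts, rule, and predicate names are all hypothetical, and the actual platform described in the paper uses semantic technologies rather than Python tuples; the point is only to show IF THEN rules deducing new facts from known facts.

```python
# Knowledge base: facts are simple (subject, predicate, object) tuples.
facts = {
    ("host-17", "runs", "OpenSSL 1.0.1f"),
    ("OpenSSL 1.0.1f", "has_vulnerability", "CVE-2014-0160"),
}

# Rules: IF every condition matches THEN assert the conclusion.
# Terms beginning with "?" are variables.
rules = [
    {
        "if": [("?h", "runs", "?sw"), ("?sw", "has_vulnerability", "?cve")],
        "then": ("?h", "exposed_to", "?cve"),
    },
]

def match(pattern, fact, bindings):
    """Unify a pattern with a fact, extending the bindings; return None on failure."""
    extended = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if extended.get(p, f) != f:
                return None
            extended[p] = f
        elif p != f:
            return None
    return extended

def substitute(pattern, bindings):
    """Replace variables in a conclusion pattern with their bound values."""
    return tuple(bindings.get(t, t) if t.startswith("?") else t for t in pattern)

def infer(facts, rules):
    """Inference engine: forward chaining until no new facts are produced."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            bindings_list = [{}]
            for condition in rule["if"]:
                new_bindings = []
                for bindings in bindings_list:
                    for fact in facts:
                        extended = match(condition, fact, bindings)
                        if extended is not None:
                            new_bindings.append(extended)
                bindings_list = new_bindings
            for bindings in bindings_list:
                conclusion = substitute(rule["then"], bindings)
                if conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
    return facts

print(infer(facts, rules))   # deduces ("host-17", "exposed_to", "CVE-2014-0160")
```

An explanation capability, as mentioned above, would simply record which rule and which matching facts produced each deduced fact.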
Another future aspect we wanted to pursue was adding Natural Language Query (think Apple Siri, Google Now, Amazon Echo, etc.) so humans can ask the system questions and get answers back based on the knowledge represented in the system.
NSA constantly challenged me over my 20 years supporting the cyber and IA missions to push the boundaries. Perhaps this is still too far out but after investing the past 5 years on this, I can't shake the belief that this type of expert system applied to security science is needed.
Anyone who has followed my activity here on the SoS VO or on LinkedIn over the past few years knows I've been an outspoken advocate for applying Object-Based Production and Activity-Based Intelligence to cyber threat intelligence and security operations, and for how this can support security science in the operational ecosystem. I thought the following might be of interest to others who are also advocating for advanced analytic capabilities.
On December 8, the Air Force Research Laboratory posted the following broad agency announcement for Object Based Production and Activities Based Intelligence Technology Development for Indications and Warning (BAA NUMBER: BAA AFRL-RIK-2016-0001). For best consideration in FY16, the agency recommends that interested parties submit white papers by March 1, 2016.
The focus of this broad agency announcement (BAA) is to research, develop, demonstrate, integrate and test innovative technologies for Object Based Production (OBP) and Activities Based Intelligence (ABI) tradecraft in support of multi-domain Automated Indications and Warnings (AI&W).
This BAA seeks capabilities for the discovery and analysis of emerging activities across multiple domains (ground, air, maritime, space and cyberspace) in order to provide tactical and strategic level indications and warning and to reduce the time needed to search and correlate data from multiple sources on multiple systems. Methods are needed to:
- Provide richer and more robust patterns of life to support timely, effective and efficient command and control decisions
- Characterize, locate and compare/contrast activities, driving rapid data exploration and discovery of significant events
- Anticipate events by building a deep understanding of the networks that give rise to specific incidents
This announcement seeks research and development which demonstrates the ability to synergistically apply OBP and ABI technologies. The specific technology areas of interest include but are not limited to:
- Algorithms for exploiting data to learn normal patterns of behavior, detect deviations from normalcy, and anticipate future behavior
- Learning algorithms which construct models of normal activity patterns at a variety of conceptual, spatial, and temporal levels to reduce a massive amount of data to a rich set of information regarding the current status of active models
- Algorithms which discover gaps in existing knowledge, create hypotheses about what the missing data might provide, and provide a recommended collection tasking
- Continuous incremental learning which enables the models of normal behavior to adapt well to evolving situations while maintaining high levels of performance.
- Algorithms for uncovering tactics, techniques and procedures (TTP) to impact ongoing operations
- Mining relevant data from overwhelmingly big data
- Cloud based data and information sharing
- OBP optimized processing and auto-association
Full information is available at https://www.fbo.gov/index?s=opportunity&mode=form&id=6dcfbc1a14982f91c3dbc0dbfbc39b10&tab=core&_cview=0
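As a toy illustration of the first technology area in the list above (learning normal patterns of behavior and detecting deviations from normalcy), the sketch below builds a per-hour baseline from historical event counts and flags observations that deviate sharply from it. The data, threshold, and z-score approach are my own simplification for illustration and are not drawn from the announcement itself.

```python
import statistics

# Hypothetical history: event counts observed in each hour of the day across
# several "normal" days, i.e. a crude pattern of life.
baseline_counts = {
    hour: [40 + (hour % 6), 42 + (hour % 6), 39 + (hour % 6)]   # toy training data
    for hour in range(24)
}

# Learn a simple model of normalcy: mean and standard deviation per hour.
model = {
    hour: (statistics.mean(samples), statistics.pstdev(samples) or 1.0)
    for hour, samples in baseline_counts.items()
}

def is_anomalous(hour: int, observed_count: int, threshold: float = 3.0) -> bool:
    """Flag a deviation from normalcy when the z-score exceeds the threshold."""
    mean, stdev = model[hour]
    return abs(observed_count - mean) / stdev > threshold

# New observations: hour 3 fits the learned pattern, hour 14 is a sharp spike.
print(is_anomalous(3, 44))    # False - within the learned pattern of life
print(is_anomalous(14, 400))  # True  - a deviation worth investigating
```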
I thought others might find this observation interesting. My August 2015 Science of Security paper on "Developing Scientific Foundations for the Operational Cybersecurity Ecosystem" has largely been ignored by the SoS research community. For example, it was NOT included in the SoS Annual Report for 2015, which includes information about all of the activities associated with the Science of Security (SoS) initiative over the past year. Yet at the same time the research paper has been embraced by the operational community, where well-respected companies such as FireEye, iSight Partners, etc. have already transitioned the research from the paper into their internal operations and engineering specifications. It's fascinating to see how divergent research and operations can be within the SoS community.
One of the main focuses of my Science of Security paper is the creation of a Unified Cyber Ontology and how this can support digitally intensive science in the operational environment. This is exactly what the DoD Cyber Crime Center (DC3) is advocating for as well. The below are comments from a DoD Civilian at DC3 made to the Cyber Threat Intelligence community.
"One of the best treatments of this issue is by Yoan Chabot, Aurelie Bertaux, Christophe Nicolle, Tahar Kechadi in "An Ontology-Based Approach for the Reconstruction and Analysis of Digital Incidents Timelines" published in the Journal of Digital Investigation (2015), Special Issue on Big Data and Intelligent Data Analysis, pp.18. <hal-01176091>
In fact, there is a strong argument for more separation in the CTI work to create a Unified Cyber Ontology (UCO) effort to abstract and express concepts/constructs that are common across the cyber domain."
DC3's point of view is somewhat forensic centric. You can learn more about their view in this paper. http://dfrws.org/2015eu/proceedings/DFRWS-EU-2015-11.pdf
Just a quick follow-up post on a paper about TAPIO from DARPA's Integrated Cyber Analysis System (ICAS) program, since this is very similar to the approach we used in our Semantic eScience for Science of Security paper. One of the main differences between DARPA's research and our Science of Security research is that we open sourced our 100+ modular Unified Cyber Ontology / Unified Object Model on GitHub to advance the Science of Security community's research against the 7 core themes, while DARPA didn't share theirs with anyone outside the ICAS program. (Link to our 100+ ontologies is here: https://github.com/daedafusion/cyber-ontology )
Enabling New Technologies for Cyber Security Defense with the ICAS Cyber Security Ontology
Abstract: Incident response teams that are charged with breach discovery and containment face several challenges, the most important of which is access to pertinent data. Our TAPIO (Targeted Attack Premonition using Integrated Operational data) tool is designed to solve this problem by automatically extracting data from across the enterprise into a fully linked semantic graph and making it accessible in real time. Automated data translation reduces the costs to deploy and extend the system, while presenting data as a linked graph gives analysts a powerful tool for rapidly exploring the causes and effects of a particular event. At the heart of this tool is a cyber security ontology that is specially constructed to enable the TAPIO tool to automatically ingest data from a wide range of data sources, and which provides semantic relationships across the landscape of an enterprise network. In this paper we present this ontology, describe some of the decisions made during its development, and outline how it enables automated mapping technologies of the TAPIO system.
Index Terms: cyber security, ontology, cyber analysis, semantic technologies, ontology patterns, forensic analysis.
Paper - http://ceur-ws.org/Vol-1523/STIDS_2015_T06_BenSalem_Wacek.pdf
Just for clarification, "Semantic eScience" is a discipline of Web Science and describes the technology and engineering approach to create a data-driven architecture that can read, write, and understand the meaning of the cybersecurity data in the operational ecosystem. It is a technology stack that brings reasoning and inference capabilities, along with mathematical scoring systems, together with semantic technology designed to answer the questions humans have about the data. Semantic eScience, the web science and technology stack, is NOT a replacement or competition for Joint Doctrine. In fact, it should fully support it from a technology standpoint, depending of course on the data model and datasets being added.
If it helps, you can read about Semantic eScience being applied to other data-intensive sciences such as Heliophysics in the following papers. Perhaps this will help with understanding Semantic eScience as the infrastructure and web science that it is.
https://www.researchgate.net/publication/254258631_From_science_to_e-Science_to_Semantic_e-Science_A_Heliophysics_case_study
https://www.researchgate.net/publication/252874781_Progress_toward_a_Semantic_eScience_Framework_building_on_advanced_cyberinfrastructure