Science of Security Semantic eScience Infrastructure for the Operational Cyber Ecosystem
In the modern world, every organization has its own cyber ecosystem that is connected to the larger, global cyber ecosystem. The cyber ecosystem is made up of people, processes, technology, and data. Each organization's cyber ecosystem is unique, shaped by its individual requirements, resources, and technology choices.
For over a decade, government, industry, and academia have developed a number of information security standards that are increasingly being adopted by vendors and are forming the basis for security operations management and measurement activities across wide groups of industry and government. The basic premise of the Cyber Security Measurement and Management Architecture is that for any enterprise to operate, measure, and manage the security of its cyber assets, it will have to employ automation. For an enterprise of any reasonable size, that automation will have to come from multiple sources. To make the finding, sharing, and reporting of issues consistent and composable across different tools and partners, there has to be a set of standardized definitions of the things being examined, reported, and managed by those tools and described by different information sources. That standardization is the core of the "Making Security Measurable" efforts.
Information security operation, measurement, and management, as originally practiced, is complex, expensive, and fraught with one-off activities and tailored approaches. Solving the variety of challenges facing enterprises with regard to incident and threat analysis and management, patching, application security, and compliance management required fundamental changes in the way vendor technologies are adopted and integrated, including the way enterprises organize and train to utilize these capabilities. Likewise, to support organizational discipline and accountability objectives while enabling innovation and flexibility, the security industry needed to move to a vendor-neutral security operations, management, and measurement strategy: one neutral to specific solution providers yet flexible enough to work with several different solutions simultaneously. Finally, the new approach had to eliminate duplicative and manual activities, improve resiliency, and increase the ability of organizations to leverage outside resources and collaborate with other organizations facing the same threats and risks.
These objectives are being met by bringing architecturally driven standardization to the scoping and organization of the information security activities that our enterprises practice. By acknowledging the "natural" groupings of activities, or domains, that all information security organizations address -- independent of the tools and techniques they use -- a framework has been established within which organizations can organize their work independent of their current technology choices, one flexible enough to adapt to tomorrow's offerings. This framework is called the Cyber Security Measurement and Management Architecture. (http://makingsecuritymeasurable.mitre.org/docs/Cyber_Security_Measurement_and_Management_Poster.pdf)
All of the individual languages (STIX, CybOX, MAEC, CVE, CWE, CAPEC, etc.) used in the Cyber Security Measurement and Management Architecture are built on the W3C XML standard, making the structured information machine-readable to support automation and standardization. One of the key aspects of the architecture is the use of knowledge repositories and information sharing. The knowledge repositories store information in a linear fashion; simply put, they store the information in the order in which it was received. You can think of these knowledge repositories as silos of structured information. Each type of data generally has its own silo or multiple silos. For example, the National Vulnerability Database is a "silo" of vulnerability information in the CVE standardized format. Over the past few years an increasing number of threat intelligence silos have emerged that leverage the STIX and CybOX standards.
The Cyber Security Measurement and Management Architecture enables automation and interoperability in the cyber ecosystem. This information-oriented approach was designed to sit on top of the cyber ecosystem. You can think of the cyber ecosystem itself as the Data Layer and the Cyber Security Measurement and Management Architecture as the Information Layer on top of it. While the architecture includes knowledge repositories, the knowledge sits in silos, stored in a linear manner. Humans generally have to do the work of connecting the dots between different pieces of structured information coming from different sources and silos. In fact, connecting the dots on malicious cyber activity is the key purpose of the Cyber Threat Intelligence Integration Center (CTIIC) announced by the White House in February of this year.
The Cyber Security Measurement and Management Architecture directly supports the 7 core themes of the Science of (Cyber) Security discipline. The architecture provides Common Languages that express security in a precise and consistent way. The architecture provides the fundamental definitions of key security concepts and Core Principles. The architecture supports cyber Attack Analysis, Measurable Security, and analyzing Risk while providing Agility in the form of automation and interoperability. The architecture also supports the core theme of Human Factors with information about threat actors, their motivations, and their TTPs.
What is missing from the cyber ecosystem is the basic scientific infrastructure to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration, application composition, and integrated knowledge discovery and data analysis of the cyber security measurement and management architecture. The Science of Security discipline is data intensive and as such it could benefit from what is known as Semantic eScience infrastructure. Semantic eScience infrastructure provides data integration, fusion, and mining; workflow development, orchestration, and execution; capture of provenance, lineage, and data quality; validation, verification, and trust of data authenticity; and fitness for purpose.
A Science of Security Semantic eScience infrastructure would work seamlessly with the Cyber Security Measurement and Management Architecture and could federate all the silos of structured cyber security information into an organized, cohesive body of knowledge. If a key characteristic of the Cyber Security Measurement and Management Architecture is that all the information is machine-readable, then a key characteristic of the Science of Security Semantic eScience infrastructure would be that it can read the structured information and understand the meaning of the objects, attributes, associations, and activity contained in it. The semantic eScience infrastructure is able to understand the meaning of the information because of the semantic technology included in the eScience infrastructure.
Semantic technology provides a means to capture the actual semantics of the data within the data itself. It also enables capturing a meta-description of the different kinds of objects, their attributes, associations, and activity in a conceptual model which can then be populated with instances of actual data. Using the industry-standard Resource Description Framework (RDF), it is possible to capture both the conceptual model, referred to as an "ontology", and the data itself in a single, consistent manner.
Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relationships between the objects. They are meant to represent information coming from all sorts of heterogeneous data sources. This makes ontologies ideal for dealing with all the different structured, semi-structured, and unstructured data that comes in various formats and languages from across cyberspace.
The OWL/RDF data model is similar to classical conceptual modeling approaches such as entity-relationship or class diagrams, as it is based upon the idea of making statements about resources in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource being described, and the predicate denotes a single semantic trait or aspect of that resource; the object is either a literal value or another resource that is the target of the relationship.
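To make the triple structure concrete, here is a minimal sketch using plain Python tuples rather than an RDF toolkit; every identifier below (incident-42, apt-group-1, sample-9f3a, and so on) is hypothetical.

```python
# Subject-predicate-object statements sketched as plain tuples.
# A real deployment would use an RDF library and proper IRIs.
EX = "http://example.org/cyber#"  # hypothetical namespace

triples = [
    (EX + "incident-42", EX + "attributedTo", EX + "apt-group-1"),
    (EX + "incident-42", EX + "usedMalware",  EX + "sample-9f3a"),
    (EX + "sample-9f3a", EX + "exploits",     EX + "CVE-2014-0160"),
    # The object of a statement can also be a literal value:
    (EX + "sample-9f3a", EX + "md5",          "d41d8cd98f00b204e9800998ecf8427e"),
]

for s, p, o in triples:
    print(s, p, o)
```

Each tuple is one complete statement, so a collection of them already forms a graph keyed by the shared subject and object identifiers.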
You can think of each RDF statement as a piece of evidence that represents an analytic "pivot". Nearly all cyber security analysts understand the concept of analytic pivoting since it is a key part of cyber attack analysis. A collection of RDF statements intrinsically represents a directed multi-graph and it can federate data from multiple databases and sources about the same object. There is virtually nothing that can't be described using this technique.
Each RDF statement also includes full provenance based on the W3C Provenance (PROV) standard and the integrated W3C PROV ontology. Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the cyber ecosystem.
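As a rough illustration of how PROV-O facts might ride along with each evidence statement, the sketch below records provenance as plain triples. The property names (wasGeneratedBy, wasDerivedFrom, wasAttributedTo) are real PROV-O terms, while the evidence, ingest-run, report, and team identifiers are hypothetical.

```python
# Provenance facts attached to a single evidence statement, using
# PROV-O property names. All instance identifiers are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/cyber#"

provenance = [
    (EX + "evidence-7", PROV + "wasGeneratedBy",  EX + "stix-ingest-run-3"),
    (EX + "evidence-7", PROV + "wasDerivedFrom",  EX + "threat-report-2015-04"),
    (EX + "evidence-7", PROV + "wasAttributedTo", EX + "analyst-team-a"),
]

def lineage(subject, graph):
    """Return every provenance fact recorded for a subject."""
    return [(p, o) for s, p, o in graph if s == subject]

print(lineage(EX + "evidence-7", provenance))
```

Because provenance is just more triples about the same subject, assessing quality or trustworthiness reduces to walking these edges.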
This means that the Science of Security Semantic eScience infrastructure can automatically connect the dots between all the data coming from the Cyber Security Measurement and Management Architecture. It transforms all the information into evidence statements that can be assembled into directed multigraphs so that humans can visualize the data the way they naturally think about it: as objects (people, places, things, events, etc.) connected by their attributes, associations, and activity. Using semantic eScience doesn't require any changes to the Cyber Security Measurement and Management Architecture or additional changes to vendor solutions beyond their support of the XML formats used in the architecture. Just as the Cyber Security Measurement and Management Architecture was designed to sit on top of the people, processes, technology, and data in the cyber ecosystem, the Science of Security Semantic eScience infrastructure would sit on top of that architecture so it can take all the information from all the silos and "connect the dots" into a living body of knowledge built from individual pieces of evidence with full provenance.
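A toy sketch of this "connecting the dots" across silos: two hypothetical silos that both mention the same CVE identifier are federated by simple union, after which a graph query can traverse from vulnerability data to threat-actor data. The silo contents are illustrative, not real feeds (though CVE-2014-0160 is a real CVE).

```python
# Two hypothetical silos of structured statements that share a CVE id.
vuln_silo = [
    ("CVE-2014-0160", "affects", "OpenSSL 1.0.1"),
    ("CVE-2014-0160", "cvssScore", "5.0"),
]
threat_silo = [
    ("apt-group-1", "exploits", "CVE-2014-0160"),
    ("apt-group-1", "targets", "financial-sector"),
]

# Federation of triple stores is just the union of their statements.
graph = vuln_silo + threat_silo

def neighbors(node):
    """All outgoing and incoming edges for a node in the merged graph."""
    out = [(p, o) for s, p, o in graph if s == node]
    inc = [(s, p) for s, p, o in graph if o == node]
    return out, inc

out_edges, in_edges = neighbors("CVE-2014-0160")
# The CVE node now bridges the vulnerability silo and the threat silo.
print(out_edges, in_edges)
```

The shared identifier does the work a human analyst would otherwise do by hand: once both silos speak the same names, the join is automatic.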
In the next week or so I'll be releasing a white paper on the Science of Security Semantic eScience infrastructure to provide more details to those who are interested in developing a strong, rigorous foundation for their cyber ecosystem. This information is not tied to any specific vendor but rather seeks to provide insight into the direction the wider cyber security community is heading.
In my previous post, A Strong, Rigorous Scientific Foundation to the Operational Cyber Ecosystem, we discussed the cyber ecosystem and the Cybersecurity Measurement and Management Architecture that was developed over the past decade through government, industry, and academia collaboration. We discussed that what is missing from the cyber ecosystem is the basic scientific infrastructure to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration, application composition, and integrated knowledge discovery and data analysis of the cyber security measurement and management architecture.
We discussed that the Science of Security discipline is data intensive and as such it could benefit from what is known as Semantic eScience infrastructure. Semantic eScience infrastructure provides data integration, fusion, and mining; workflow development, orchestration, and execution; capture of provenance, lineage, and data quality; validation, verification, and trust of data authenticity; and fitness for purpose. A Semantic eScience of Security infrastructure would work seamlessly with the Cyber Security Measurement and Management Architecture and could federate all the silos of structured cyber security information into an organized, cohesive body of knowledge.
The Semantic eScience of Security technology stack choices were dictated by what was needed to support the 7 core themes of the science of security discipline while enabling us to develop an organized, cohesive body of knowledge. The goal was to prove that Semantic eScience could be customized to support the 7 core themes of the Science of Security discipline within the operational cyber ecosystem. Here is how each of the core themes is supported within the Semantic eScience of Security solution.
Common Language - This theme is about expressing security in a precise and consistent way. By integrating the enumerations, languages and formats, and knowledge repositories from the Cybersecurity Measurement and Management Architecture, the solution supports the common languages developed through government, industry, and academia collaboration to express security in a precise and consistent way. Additionally, the information coming from the various heterogeneous data sets (STIX, MAEC, CVE, etc.) is processed into evidence-based statements with clear semantics in the industry-standard RDF format.
Core Principles - This theme is focused on foundational principles and fundamental definitions of concepts. The solution supports this theme by leveraging the principles and definitions of concepts contained within the Cybersecurity Measurement and Management Architecture. Additionally, the solution can collect actual instances of data that aligns to those principles and concepts allowing for evidence based knowledge to be produced. The Cybersecurity Measurement and Management Architecture includes guidance on practical application.
Attack Analysis - This theme is focused on analyzing cyber attacks and understanding both the threat actor's actions as well as the actions taken by the defenders. This theme also includes sharing the results of attack analysis in the form of cyber threat intelligence. This theme is supported by being able to collect data in all the formats from across the Cybersecurity Measurement and Management Architecture including STIX with all extensions, CybOX, MAEC, CAPEC, CWE, CVE, and the rest. The solution is able to take the information and produce evidence-based statements with clear semantics and full provenance information. Each evidence-based statement represents an analytic pivot that the solution can automatically assemble and fuse together. The solution provides a number of visualization methods for discovering and examining data, including directed graphs, timeline/temporal analysis, and geographic analysis.
The solution also includes the MITRE developed vocabulary for characterizing effects on the cyber threat actor []. The vocabulary allows for stating claims or hypotheses about the effects of cyber mission assurance decisions on threat actor behavior. Cyber mission assurance decisions include choices of cyber defender actions, architectural decisions, and selections and uses of technologies to improve cyber security, resiliency, and defensibility (i.e., the ability to address ongoing threat actor activities). The vocabulary enables claims and hypotheses to be stated clearly, comparably across different assumed or real-world environments, and in a way that suggests evidence that might be sought but is independent of how the claims or hypotheses might be evaluated. The vocabulary can be used with multiple modeling and analysis techniques, including Red Team analysis, game-theoretic modeling, attack tree and attack graph modeling, and analysis based on the cyber-attack lifecycle (also referred to as cyber kill chain analysis or cyber campaign analysis).
Measurable Security - This theme is about techniques to measure security. It is supported by the solution's coverage of the entire Making Security Measurable collection that makes up the Cybersecurity Measurement and Management Architecture. The solution can consume and produce data in the XML formats of the architecture, and it includes logic-based reasoning and inference capabilities as well as the mathematical scoring systems required to support the range of measurable security activities.
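As a rough illustration of the kind of logic-based inference mentioned above, the sketch below forward-chains a single transitive rule over a tiny set of statements until no new facts appear; the "variantOf" predicate and the sample names are hypothetical, and a production system would use a full reasoner.

```python
# Starting facts: sample-c is a variant of sample-b, which is a
# variant of sample-a. All identifiers are hypothetical.
facts = {
    ("sample-c", "variantOf", "sample-b"),
    ("sample-b", "variantOf", "sample-a"),
}

def infer_transitive(facts, pred):
    """Forward-chain (x pred y) and (y pred z) => (x pred z) to a fixed point."""
    derived = set(facts)
    while True:
        new = {(x, pred, z)
               for (x, p1, y) in derived if p1 == pred
               for (y2, p2, z) in derived if p2 == pred and y2 == y}
        if new <= derived:          # nothing new was inferred; done
            return derived
        derived |= new

closed = infer_transitive(facts, "variantOf")
# The engine derives (sample-c, variantOf, sample-a) automatically.
print(sorted(closed))
```

The same fixed-point pattern underlies richer rule sets (class hierarchies, property chains) in OWL/RDFS reasoners.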
Risk - This theme is focused on making risk assessments more consistent and less subjective. Much of the work in this field has focused on process and methodology, but risk assessment is still based on individual expertise. The Semantic eScience of Security solution enables a number of standardized, repeatable ways to score and measure risk at the application, system, and enterprise level using its built-in mathematical scoring systems. The scoring systems, risk analysis framework, and threat assessment and risk remediation analysis methodologies below, all supported by the Cybersecurity Measurement and Management Architecture, have been integrated into the solution, allowing a much deeper, evidence-based technical understanding of risk that can be customized for each organization based on its specific business mission and risk tolerance.
Common Weakness Scoring System (CWSS)
CWSS provides a mechanism for scoring weaknesses in a consistent, flexible, open manner while accommodating context for the various business domains. It is a collaborative, community-based effort that is addressing the needs of its stakeholders across government, academia, and industry. CWSS is a part of the CWE project, co-sponsored by the Software and Supply Chain Assurance program in the Office of Cybersecurity and Communications (CS&C) of the US Department of Homeland Security (DHS).
Common Weakness Risk Analysis Framework (CWRAF)
CWRAF provides a framework for scoring software weaknesses in a consistent, flexible, open manner, while accommodating context for the various business domains. It is a collaborative, community-based effort that is addressing the needs of its stakeholders across government, academia, and industry. CWRAF is a part of the Common Weakness Enumeration (CWE) project, co-sponsored by the Software Assurance program in the office of Cybersecurity and Communications of the U.S. Department of Homeland Security (DHS).
Threat Assessment & Remediation Analysis (TARA)
TARA is a methodology to identify and assess cyber threats and select countermeasures effective at mitigating those threats. When applied in conjunction with a Crown Jewels Analysis (CJA) or other means for assessing mission impact, CJA and TARA together provide for the identification, assessment, and security enhancement of mission critical assets, which is the cornerstone of mission assurance.
The focus of TARA is to enable an understanding of all the attack vectors based on the assets an organization has, and of how to remediate those attack vectors with countermeasures in a standardized, repeatable, evidence-based manner.
Agility - This theme is focused on being more agile to reflect the more dynamic environment that systems now reside in. The solution automates many of the time-critical, labor-intensive, and highly skilled tasks that must occur in an effective cyber intelligence program, resulting in overall cost reduction, time savings, and better utilization of scarce resources. In short, it can act as a "force multiplier", enabling less-skilled analysts to be more productive and more highly skilled analysts to focus on the identification of unknown threats to the enterprise.
The evolutionary Semantic eScience of Security solution enables non-disruptive, continuous evolution of data ingestion, enrichment, fusion, and analysis capabilities as new cyber tradecraft techniques are developed or adopted, allowing the solution to remain at the forefront of the science of security discipline in the operational cyber ecosystem. The solution's design is based on a data-driven architecture, in which data drives everything in the solution, and leverages an extensible set of orchestrated services that can be added to, augmented, replaced, or removed in order to keep pace with the rate of change.
As with the solution architecture, the data models used by the platform are designed for evolution through the use of extensible ontologies - semantic models of data and how it interconnects - and by representing data as a set of N-Triples (subject-predicate-object) statements. N-Triple statements allow virtually anything to be described, naturally form a graph that makes interlinking data easy, and are simple to augment and update, so new aspects of existing data can be added without the traditional export-transform-import cycle caused by storage schema changes.
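A minimal sketch of what N-Triples serialization of such statements might look like; the namespace and property names are hypothetical, and a production system would use a proper RDF serializer that also handles datatypes and escaping.

```python
# Serialize statements into W3C N-Triples lines. Each line is a
# self-contained <subject> <predicate> object . statement, so new
# facts can simply be appended - no schema migration needed.
EX = "http://example.org/cyber#"  # hypothetical namespace

def to_ntriple(s, p, o):
    """IRIs go in angle brackets; anything else is treated as a literal."""
    obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."

stmts = [
    (EX + "incident-42", EX + "attributedTo", EX + "apt-group-1"),
    (EX + "incident-42", EX + "firstSeen", "2015-03-01"),
]
for t in stmts:
    print(to_ntriple(*t))
```

Because every line stands alone, appending, merging, and diffing N-Triples files is trivial compared with updating a fixed relational schema.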
The Semantic eScience of Security solution provides automation and an infrastructure designed to adapt and keep pace with a constantly changing environment at a dramatically reduced cost, while continuously providing new capabilities to customers. The automation allows organizations that lack maturity or have insufficiently skilled staff to still take advantage of intelligence that provides the insights required to anticipate potential attacks, while allowing them to continue to increase their maturity. The automation of cyber tradecraft helps bridge the gaps left by insufficiently skilled resources and increases the effectiveness of existing staff, whilst keeping costs down.
Human Factors - This theme tackles factors affecting people's security-relevant behavior. This includes both defenders as well as threat actors. Defender human factors could range from secure coding to phishing employees to various response times and activities. Where the core theme of attack analysis focuses primarily on the tactical behaviors and actions of the threat actor over time in cyberspace, the human factors theme focuses primarily on operational measurements of the human(s) running the operations.
Operational measurements are generally time-based and help us understand how fast a particular Threat Actor's operations cycle is. Here are a few examples to further our understanding.
- Measure the number of days between attacks/sightings/incidents attributed to a threat actor. In other words, how long is the time between attack cycles on average over time? If the APT campaign attacks every 55 days, that knowledge can help the defender plan to ensure they have the right resources in play.
- Measure the time between known events and a specific Threat Actor's activity. For example, if a Threat Actor always attacks within 3 days of Microsoft Patch Tuesday, the defender can use this to plan for the attack. Look to see if there are any recurring events, holidays, etc. that act as trigger events for specific Threat Actors.
- Measure the time differential between the Threat Actor's observed Exploit Target (0-day or known vulnerability, configuration issue) and the date it was publicly disclosed. In other words, if the Threat Actor's exploit was a 0-day, how many days was the Threat Actor observed using it before the vulnerability in the platform was publicly disclosed? This would give us a negative number. This is also useful for measuring the Threat Actor's level of sophistication and resources, and can help us rank Threat Actors based on the risk they pose.
- Measure the time between version number changes over time in the Threat Actor's tools and malware to gain insight to the Threat Actor's engineering and development cycle speed and resources.
- Measure how long a Threat Actor will use a Legend / Sock Puppet / Fake Persona account created for registering domains or social engineering before abandoning it for another persona.
There are a significant number of operational measurements that can be taken and captured in the solution if the attack data and information is attributed to a Threat Actor.
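The first measurement above, the average number of days between attack cycles, can be sketched in a few lines; the sighting dates below are hypothetical and happen to produce the 55-day cycle used in the example.

```python
from datetime import date

# Hypothetical sighting dates attributed to one threat actor.
sightings = [date(2015, 1, 10), date(2015, 3, 6), date(2015, 4, 30)]

# Days between consecutive sightings, then the average attack cycle.
gaps = [(b - a).days for a, b in zip(sightings, sightings[1:])]
avg_cycle = sum(gaps) / len(gaps)
print(f"average attack cycle: {avg_cycle:.0f} days")
```

The other measurements (offset from trigger events, disclosure deltas, persona lifetimes) are the same pattern: subtract dates attached to attributed evidence and aggregate.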
Just as we need infrastructure in areas like meteorology to understand and predict the weather, we need operational security science infrastructure to understand and predict events in the cyber ecosystem of the future. The Semantic eScience of Security infrastructure provides the basic scientific infrastructure needed to provide data integration, fusion, and mining; workflow development, orchestration, and execution; capture of provenance, lineage, and data quality; validation, verification, and trust of data authenticity; and fitness for purpose.
The next post in this series on Semantic eScience of Security infrastructure in the operational cyber ecosystem is from my good friend Paul Patrick. In the linked article, Paul discusses the "Blueprint of a Semantic eScience of Security Technology Stack".
https://www.linkedin.com/pulse/blueprint-semantic-escience-security-technology-stack-paul-patrick
The next post in the series will be out next week and in it we will discuss how the technology stack enables Object-Based Production (OBP) and Activity-Based Intelligence (ABI) for cyberspace and specifically the cyber security discipline.
Enjoy!
In our continuing Science of Security series on LinkedIn, here is the next post. This post looks at Object-Based Production and Activity-Based Intelligence for cybersecurity and looks at the "human on the loop" activities.
https://www.linkedin.com/pulse/object-based-production-activity-based-intelligence-shawn-riley
I had a few people ask me about the difference between Data Science and Semantic eScience, and why we selected Semantic eScience for applying the Science of Security to the operational cyber ecosystem. I created a short post with a graphic to help people understand the difference.
https://www.linkedin.com/pulse/data-science-informatics-semantic-escience-shawn-riley