Spoiler alert! In this post I answer the first question of my quest: what should we spend on cybersecurity? To do this, we need a consistent way to quantify risk before we can even begin making spending decisions. I propose that the model for this is the Factor Analysis of Information Risk (FAIR).
We started this journey trying to answer two simple questions: what should we spend on cybersecurity, and what should we spend it on? As I noted in my first post, most people use ROI to justify cybersecurity spending. A good example is the Booz Allen ROI model. In my second post I showed why ROI (or Return on Security Investment (ROSI)) is not a good metric for justifying cybersecurity spending, or, in fact, any type of spending. We need to take our economics discussion up a notch and focus on NPV (Net Present Value) and/or IRR (Internal Rate of Return) rather than ROI/ROSI.
Unlike ROI/ROSI, NPV/IRR are forward-looking, taking into account the risk of the investment (through the discount rate, K). The tricky part is translating “high” risk into a number. For example, let’s say we’re looking at a $1M investment that generates an annual net benefit of $1.8M. On the surface, this sounds pretty good, but check out what happens to NPV as we vary what “high” risk means.
From Table 1, how we define “high” risk has a huge impact on NPV. I realize that 15x is extreme, but it makes a point: we have to find a way to nail down risk and put quantitative meaning behind qualitative terms like “low,” “medium,” and “high.”
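A minimal sketch of the same effect follows. It assumes a three-year horizon and three illustrative values of K (10%, 50% and 150%, a 15x spread). Only the $1M investment and the $1.8M annual net benefit come from the example above, and the `npv` helper is hypothetical.

```python
def npv(rate: float, initial_investment: float, annual_benefit: float, years: int) -> float:
    """Net present value of an upfront cost followed by equal annual net benefits."""
    discounted = sum(annual_benefit / (1 + rate) ** t for t in range(1, years + 1))
    return discounted - initial_investment


INVESTMENT = 1_000_000      # upfront cost, from the example
ANNUAL_BENEFIT = 1_800_000  # annual net benefit, from the example
YEARS = 3                   # hypothetical horizon

# Hypothetical discount rates standing in for "low" through "very high" risk.
for k in (0.10, 0.50, 1.50):
    print(f"K = {k:>4.0%}: NPV = ${npv(k, INVESTMENT, ANNUAL_BENEFIT, YEARS):,.0f}")
```

Under these assumptions, NPV falls from roughly $3.5M at K = 10% to roughly $0.12M at K = 150%. A swing of that size is why pinning down “high” risk matters so much.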
Tell me Yoda, What is Risk?
Before we can quantify risk, we must define risk.
There is no one agreed-upon definition of risk. For example, according to ISO 31000, risk is the “effect of uncertainty on objectives.” NIST defines risk as “a function of the likelihood of a given threat-source’s exercising a particular potential vulnerability, and the resulting impact of that adverse event on the organization.” Of the two, I really like the ISO definition because it focuses on the inherent uncertainties associated with risk (we return to uncertainty later in the post). In other words, if we are certain, is there risk? If we have nothing to lose, is there risk? On a related note, I like to think of risk as the negative consequences of one’s reality (a topic for another day ;-).
As I mentioned in my last post, calculating risk occurs at the intersection of loss, threats, vulnerabilities, costs, benefits and sound business judgement. Or, a more generic list from the NIST cybersecurity framework is “threats, vulnerabilities, likelihoods, and impacts.”
A FAIR Approach to Risk
What’s amazing to me is that even if we can agree upon a basic definition of risk, there is no standard way to quantify/qualify the risk components: vulnerabilities, threats, loss, etc. To illustrate this point, in “Measuring and Managing Information Risk: a FAIR Approach,” Jack Jones talks about the risks associated with a bald tire. Of course, a bald tire is a vulnerability in the rain. Right? But what if the bald tire is hanging from a rope tied to a tree branch? What’s the vulnerability now? What if the rope is frayed? Is the rope now a vulnerability? Or is it a threat? What if the tree branch extends out over a 200-foot-high cliff? How has my risk calculation changed?
This is such a simple example, and yet when Jack talks to people about this scenario there is no consensus on even the most basic principles, such as what’s a vulnerability versus what’s a threat! As Mary Chapin Carpenter sings, “sometimes you’re the windshield and sometimes you’re the bug…” This is like two chemists not agreeing on the definition of a reactant versus a catalyst versus a product (yes, Mrs. Nittywonker, I did pay some attention during chemistry class).
To Be FAIR
Factor Analysis of Information Risk (FAIR) was first developed by Jack Jones based on his experience as a CSO at Fortune 100 companies. It is a methodology for quantifying and managing risk and it is now a public standard supported by The Open Group: Open FAIR.
In Open FAIR, risk is defined as the probable frequency and probable magnitude of future loss. That’s it! A few things to note about this definition:
- Risk is expressed as a probability rather than as an ordinal (high, medium, low) rating. This helps us deal with the ambiguity of our “high” risk situation mentioned above.
- Frequency implies measurable events within a given timeframe. This takes risk from the unquantifiable (e.g. our risk of breach is 99%) to the actionable (e.g. our risk of breach is 20% in the next year).
- Probable magnitude takes into account the level of loss. It’s one thing to say our risk of breach is 20% in the next year. It’s another thing to say our risk of breach is 20% in the next year resulting in a probable loss of $100M.
- Open FAIR is future-focused. As discussed below, this is one of its most powerful aspects. With Open FAIR we can project future losses, opening the door to quantifying the impact of investments to offset these future losses.
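A small sketch shows how frequency and magnitude combine. It assumes, for illustration only, that loss events arrive as a Poisson process. That assumption is mine and is not part of the Open FAIR definition. Under it, a “20% chance of a breach in the next year” translates into an annual frequency, and then into an expected annual loss at the $100M magnitude used above. Both helper functions are hypothetical.

```python
import math


def prob_at_least_one(annual_frequency: float) -> float:
    """P(at least one loss event in a year), assuming Poisson-distributed events."""
    return 1 - math.exp(-annual_frequency)


def expected_annual_loss(annual_frequency: float, loss_per_event: float) -> float:
    """Probable frequency (events/year) times probable magnitude ($/event)."""
    return annual_frequency * loss_per_event


# "20% chance of a breach in the next year" -> equivalent annual frequency.
freq = -math.log(1 - 0.20)  # about 0.223 events/year under the Poisson assumption
print(f"P(>=1 breach) = {prob_at_least_one(freq):.0%}")
print(f"Expected annual loss at $100M/event: ${expected_annual_loss(freq, 100e6):,.0f}")
```

The expected loss comes out at about $22M, slightly above 20% × $100M. The gap exists because the Poisson assumption allows for more than one breach in a year.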
As shown in Figure 1, the Open FAIR ontology is pretty extensive, and this post isn’t the place to get into all the inner workings. I urge everyone to go to The Open Group to learn more about Open FAIR.
At the top of the ontology, risk is the combination of Loss Event Frequency (LEF) (the probable frequency within a given timeframe that loss will materialize from a threat agent’s actions) and Loss Magnitude (LM) (the probable magnitude of primary and secondary loss resulting from a loss event).
To give a frame of reference, an example LEF might be “between 5 and 25 times per year, with the most likely frequency of 10 times per year.” In comparison, Loss Magnitude (LM) is a discrete number (e.g. $35M in the next year).
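One common way to work with a (min, most likely, max) estimate like this LEF is to treat it as a Beta-PERT distribution, which gives the most likely value more weight than the extremes. The sketch below assumes that choice, and the `RangeEstimate` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeEstimate:
    """A (minimum, most likely, maximum) estimate for an Open FAIR factor."""
    minimum: float
    most_likely: float
    maximum: float

    def __post_init__(self) -> None:
        if not self.minimum <= self.most_likely <= self.maximum:
            raise ValueError("expected minimum <= most_likely <= maximum")

    def pert_mean(self) -> float:
        """Mean of a Beta-PERT distribution: the most likely value gets 4x weight."""
        return (self.minimum + 4 * self.most_likely + self.maximum) / 6


lef = RangeEstimate(minimum=5, most_likely=10, maximum=25)  # events/year, from the example
print(f"PERT mean LEF: {lef.pert_mean():.1f} events/year")  # 11.7
```

The range is skewed toward higher frequencies, so the mean (about 11.7) sits above the most likely value of 10.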
Teasing out Vulnerability and Threat
As I wrote in my last post, one of my concerns with applying the Gordon-Loeb Model of cybersecurity economics to cybersecurity spending decisions is that it lumps vulnerabilities and threats together into a single risk-of-bad-stuff-happening axis. The great news is that, in Open FAIR terms, the Gordon-Loeb Model’s “vulnerability/threat” equates to Loss Event Frequency (LEF). This allows us to treat vulnerabilities and threats as two distinct, but related, entities.
Open FAIR defines Threat Event Frequency (TEF) as the probable frequency within a given timeframe that threat agents will act in a manner that may result in loss. For example, about once a week I drive up to an empty (no other cars and no cops) four-way-stop intersection near my house, making my Contact Frequency (CF) approximately 50 times per year. My Probability of Action (PofA) (blowing through the stop sign) is extremely low since I’m a creature of habit (stop sign = stop). This makes my TEF very low. My wife, on the other hand…
The operative word here is “may,” and determining which threat events will turn into loss events is a function of Vulnerability (V). As shown in Figure 1, Vulnerability (V) is a function of Threat Capability (TCap) and Resistance Strength (RS).
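The stop-sign example can also be expressed numerically. In the sketch below, only the 50 contacts per year comes from the text. The Probability of Action and the Vulnerability are made-up values, where Vulnerability is the chance that blowing through the stop sign actually leads to a loss, such as a ticket or a collision. The function names are hypothetical.

```python
def threat_event_frequency(contact_frequency: float, prob_of_action: float) -> float:
    """TEF = CF x PofA, in threat events per year."""
    return contact_frequency * prob_of_action


def loss_event_frequency(tef: float, vulnerability: float) -> float:
    """LEF = TEF x V, where V is the probability a threat event becomes a loss event."""
    return tef * vulnerability


CONTACT_FREQUENCY = 50   # empty four-way stops per year (from the text)
PROB_OF_ACTION = 0.01    # hypothetical: "extremely low"
VULNERABILITY = 0.1      # hypothetical: 1 in 10 run stop signs ends in a loss

tef = threat_event_frequency(CONTACT_FREQUENCY, PROB_OF_ACTION)
lef = loss_event_frequency(tef, VULNERABILITY)
print(f"TEF = {tef:.2f}/yr, LEF = {lef:.2f}/yr")
```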
Case Study – SysAdmins Accessing PII
Still with me? To get an idea of how this works with cybersecurity, let’s evaluate the Loss Event Frequency (LEF) for my System Administrator (SysAdmin) team exploiting Personally Identifiable Information (PII), starting with its Threat Event Frequency (TEF).
1. Estimate the TEF = Contact Frequency (CF) x Probability of Action (PofA)
- Since we’re talking about SysAdmins we can assume a high CF given their access to the network and applications running on the network.
- The PofA is probably low since most SysAdmins are good people and trusted employees.
- From Table 2, we therefore estimate a Low TEF.
2. Estimate the Vulnerability (V) from Threat Capability (TCap) and Resistance Strength (RS)
- To estimate TCap we need to assess the skill (knowledge and experience) and resources (time and materials) the SysAdmins bring to bear, relative to the overall threat actor community. SysAdmins are generally highly skilled, with the time and materials to do great damage. In addition, my SysAdmins have all gone through SANS training, so they are quite astute when it comes to security vulnerabilities, controls and exploits. Therefore, I’m estimating my SysAdmins’ TCap to be Very High (VH), equivalent to the top 2% of the threat actor population.
- To estimate RS we need to evaluate the controls in place to resist negative action taken by the threat community (SysAdmins). On my network all PII is stored encrypted with strong key management. In addition, all users with access to PII must use two-factor authentication with a one-time password token (Google Authenticator). Because of this, I estimate my RS is also Very High (VH), protecting against all but the top 2% of the threat actor population.
- As shown in Table 3, we therefore estimate a Medium V.
You’re probably wondering why a Very High TCap and a Very High RS result in a Medium Vulnerability (V)? This was my first thought, too. However, in this example a Very High RS means the SysAdmins must jump through some significant hoops to catch PII in the clear. Yes, the SysAdmins have the contact, knowledge and skills to do this, but the risk of being detected while stealing the PII is very high because of what’s needed to overcome the RS. In the end, a Very High RS trumps the Very High TCap, resulting in a Medium Vulnerability (V).
3. Finally, estimate Loss Event Frequency (LEF) = Threat Event Frequency (TEF) x Vulnerability (V)
So the end result of this short analysis is that the LEF for my SysAdmins exploiting my PII data is Low.
How does this help? Once I compute the Loss Magnitude (LM), I can calculate the risk I face from my SysAdmins.
The beauty of this model is that we can assign any values we want to the categories (e.g., Low could mean $1, $100,000, $1M, etc.). The challenge of this model is that we can assign any values we want to the categories! This makes it really hard to quantify estimated risk.
The good news is we can quantify estimated risk. As discussed above, most Open FAIR factors are distributions (Min, Max, Average and Mode) rather than discrete numbers, because of the level of uncertainty in each factor. For example, above I said my SysAdmins have a Very High Threat Capability (TCap). In reality, most are very capable (Mode), but newer hires might be much less capable (Min) and my long-time employees might be extra-extra-capable (Max). Similarly, the Probability of Action (PofA) might be extra-extra-low for the long-time employees and much higher for the most recent hires.
So, how do we deal with data that has significant uncertainty? We can use Monte Carlo (they speak French there, don’t ya know?) simulations to quantify our Open FAIR factors. For those not familiar with Monte Carlo simulation (or for those of us who learned it once and quickly forgot it), it is a means to analyze data with significant uncertainty by repeatedly sampling the inputs at random. The Monte Carlo process analyzes thousands of scenarios to “create a more accurate and defensible depiction of probability given the uncertainty of the inputs.”
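The three qualitative lookups in this case study chain together, and a small sketch makes that chain explicit. It fills in only the three combinations stated above. The full Table 2 and Table 3 matrices, and the LEF matrix used in step 3, are not reproduced, so any other combination raises a `KeyError`. The dictionary and function names are hypothetical.

```python
# (CF, PofA) -> TEF, per Table 2 (only the cell used in this case study)
TEF_MATRIX = {("High", "Low"): "Low"}
# (TCap, RS) -> V, per Table 3 (only the cell used in this case study)
V_MATRIX = {("Very High", "Very High"): "Medium"}
# (TEF, V) -> LEF (only the cell used in this case study)
LEF_MATRIX = {("Low", "Medium"): "Low"}


def derive_lef(cf: str, pofa: str, tcap: str, rs: str) -> str:
    """Chain the qualitative lookups: TEF and V first, then LEF from the pair."""
    tef = TEF_MATRIX[(cf, pofa)]
    vulnerability = V_MATRIX[(tcap, rs)]
    return LEF_MATRIX[(tef, vulnerability)]


print(derive_lef(cf="High", pofa="Low", tcap="Very High", rs="Very High"))  # Low
```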
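Treating factors as distributions also gives a quantitative reading of the TCap versus RS comparison. Open FAIR defines Vulnerability as the probability that a threat event becomes a loss event. One common way to derive it is to estimate how often a sampled TCap exceeds a sampled RS. The sketch below does that by random sampling (the Monte Carlo technique described next), with both factors expressed as percentiles of the threat community. The triangular ranges are invented for illustration rather than calibrated estimates, and the function names are hypothetical.

```python
import random
from typing import Callable

# A sampler draws one value: a percentile (0-100) of the threat community.
Sampler = Callable[[random.Random], float]


def estimate_vulnerability(tcap: Sampler, rs: Sampler, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of trials in which sampled Threat Capability beats sampled Resistance Strength."""
    rng = random.Random(seed)
    successes = sum(tcap(rng) > rs(rng) for _ in range(trials))
    return successes / trials


def sysadmin_tcap(rng: random.Random) -> float:
    # Hypothetical: most SysAdmins near the 98th percentile, newer hires lower.
    return rng.triangular(80, 99.9, 98)


def pii_controls_rs(rng: random.Random) -> float:
    # Hypothetical: encryption + 2FA stop all but roughly the top 2%, with some spread.
    return rng.triangular(90, 99.5, 98)


print(f"Estimated V ~ {estimate_vulnerability(sysadmin_tcap, pii_controls_rs):.2f}")
```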
The output of the Monte Carlo analysis looks something like this:
A few key points about Table 5:
- This is one loss event scenario. For example, this might be the above case of a SysAdmin exploiting PII. We would run other analyses for other employees (Executives, Staff, etc.) exploiting PII.
- In this scenario, we’re looking at a minimum of one primary loss event every twenty years, a maximum of about once every two years and a most likely frequency of about once every seven years. Similarly, we’re looking at a minimum primary loss magnitude of approximately $70K/event, a maximum of approximately $780K/event and a most likely magnitude of approximately $440K/event.
- On an annualized basis we’re looking at a most likely Total Loss Exposure (Primary and Secondary) of approximately $170K/year.
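Here is a minimal Monte Carlo sketch of the primary-loss side of such an analysis. It assumes Beta-PERT distributions and reuses the approximate frequency and magnitude ranges above. It ignores secondary loss and treats frequency and magnitude as independent, so it will not reproduce the approximately $170K/year Total Loss Exposure in Table 5. It reports rounded percentiles rather than a single figure. The function names are hypothetical.

```python
import random
import statistics


def sample_pert(rng: random.Random, low: float, mode: float, high: float) -> float:
    """Draw one value from a Beta-PERT distribution given min, most likely and max."""
    span = high - low
    alpha = 1 + 4 * (mode - low) / span
    beta = 1 + 4 * (high - mode) / span
    return low + rng.betavariate(alpha, beta) * span


def simulate_annual_primary_loss(trials: int = 50_000, seed: int = 7) -> list[float]:
    """Annualized primary loss = sampled LEF x sampled loss magnitude, per trial."""
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        lef = sample_pert(rng, 1 / 20, 1 / 7, 1 / 2)      # loss events per year
        lm = sample_pert(rng, 70_000, 440_000, 780_000)   # dollars per event
        losses.append(lef * lm)
    return losses


losses = simulate_annual_primary_loss()
p10, *_, p90 = statistics.quantiles(losses, n=10)
p50 = statistics.median(losses)
for label, value in (("10th", p10), ("50th", p50), ("90th", p90)):
    # Round to the nearest $10K to avoid implying false precision.
    print(f"{label} percentile: ~${round(value, -4):,.0f}/year")
```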
We’ve done it! We’ve converted our qualitative assessment to a quantitative assessment.
You’re probably wondering whether I’m being lazy by listing approximate numbers when the table shows nice discrete numbers. As Jack Jones drives home in his book, the problem with discrete numbers is that they imply a highly unrealistic level of precision.
FAIR Thee Well!
We’re almost there! Before pulling this all together, it’s important to emphasize how Open FAIR differs from other risk approaches. From my perspective, there are three key differences:
- It’s an ontology of the fundamental risk factors. It establishes a lingua franca of risk for comparing different risk situations on a common plane. For example, it allows us to discuss and compare cybersecurity risk, financial risk, health risk, etc.
- It is a means to estimate the probability of future loss. Just as ROI/ROSI is ineffective at projecting future returns, checklist-based risk assessments are not effective predictors of future risk.
- It’s reproducible and transparent, and its underlying assumptions are defensible.
Giving The Gordon-Loeb Model a FAIR Shake
The way I see it, calculating potential loss is a means to an end for the Gordon-Loeb Model versus being the end in itself for Open FAIR. In other words, the power of the Gordon-Loeb Model is its impact on making cybersecurity investment decisions and the power of Open FAIR is its impact on estimating cybersecurity loss.
What Professor Gordon stressed to me is the potential value of applying the fundamentals of the Gordon-Loeb Model to Open FAIR to determine optimum investment. These fundamentals are (please go here for background):
- Focus on the underlying assumptions. Like the Gordon-Loeb Model, Open FAIR has a number of underlying assumptions. If we can’t rationally explain the assumptions then the outcomes are suspect.
- Invest, but invest wisely. As discussed in my last post, the value of cybersecurity investment increases at a decreasing rate. The Gordon-Loeb Model (after going through some significant math gymnastics) shows that, for the broad classes of breach functions it considers, we should invest no more than about 37% (1/e) of expected loss.
- Related to the above point, cybersecurity investments have productivity functions: the first million invested is more effective than the second, which is more effective than the third, etc. This leads to the following:
- There is no such thing as perfect security.
- The optimal level of investment does not always increase with the level of vulnerability/threat. The best payoff often comes from investments at mid-level vulnerability/threat.
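The sketch below shows the diminishing returns and the 37% rule in numbers. It uses Gordon and Loeb’s “class I” breach function, S(z, v) = v / (αz + 1)^β, where z is the security investment, v the vulnerability, L the loss, and α and β are productivity parameters. The optimum z* comes from setting the derivative of the expected net benefit, [v − S(z, v)]L − z, to zero. The input values are hypothetical, and this is an illustration of the model, not a calibrated recommendation.

```python
import math


def breach_probability(z: float, v: float, alpha: float, beta: float) -> float:
    """Gordon-Loeb class I breach function: S(z, v) = v / (alpha*z + 1)**beta."""
    return v / (alpha * z + 1) ** beta


def expected_net_benefit(z: float, v: float, loss: float, alpha: float, beta: float) -> float:
    """Reduction in expected loss bought by investing z, minus z itself."""
    return (v - breach_probability(z, v, alpha, beta)) * loss - z


def optimal_investment(v: float, loss: float, alpha: float, beta: float) -> float:
    """z* from the first-order condition (alpha*z + 1)**(beta + 1) = v*beta*alpha*loss."""
    z = ((v * beta * alpha * loss) ** (1 / (beta + 1)) - 1) / alpha
    return max(z, 0.0)  # if even the first dollar does not pay off, invest nothing


# Hypothetical inputs: 50% vulnerability, $1M loss, alpha = 1e-5, beta = 1.
V, LOSS, ALPHA, BETA = 0.5, 1_000_000, 1e-5, 1
z_star = optimal_investment(V, LOSS, ALPHA, BETA)
print(f"z* = ${z_star:,.0f}; 1/e cap = ${V * LOSS / math.e:,.0f}")

# Diminishing returns: each extra $100K buys a smaller cut in expected loss.
for start in (0, 100_000, 200_000):
    cut = (breach_probability(start, V, ALPHA, BETA)
           - breach_probability(start + 100_000, V, ALPHA, BETA)) * LOSS
    print(f"${start:,} -> ${start + 100_000:,}: expected loss falls ${cut:,.0f}")
```

With these made-up inputs, z* comes out near $124K, under the 1/e cap of roughly $184K. Successive $100K tranches cut expected loss by about $250K, $83K and $42K.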
This is very exciting! We have a model and process to determine what we should spend on cybersecurity. Yeah! However, we still need to figure out what we should spend this money on. In my next two posts I’m going to lay out both a qualitative and a quantitative approach to answering this question. In the first post I will discuss how to use the SANS 20 controls to evaluate (qualitative) potential control investments. In the second post (the last post in this series) I’ll walk through an example using the Open FAIR methodology/ontology to evaluate (quantitative) potential control investments.
For those readers sticking with me until this point, thank you so much! I know this was a really long post, but I couldn’t find a logical point to break it into multiple posts and still retain its flow and value. Please add your comments so we can turn this from a not-so-quick read into an ongoing and engaging discussion.