Sunday, July 12, 2015

Simply Explained: Why Do We Need So Many Security Test Things?

Software is like entropy. It is difficult to grasp, weighs nothing, and obeys the second law of thermodynamics; i.e. it always increases.
- Norman Ralph Augustine

Why is security testing so complicated? Why do we need so many kinds of security tests for web applications, including both dynamic analysis (DAST) and static analysis (SAST), in addition to threat modeling, whatever that is?

During the recent controversy over airplane security, I finally thought of an analogy to explain why the software security lifecycle involves so many different kinds of activities. One of the first questions typically asked about software security is something to the effect of, "Should we use this vulnerability scanner or that one?" The answer, of course, is that you need more than a simple vulnerability scan to test modern web applications.

In the vulnerability management world, things are simpler - scanners interrogate listening ports and test binary services for known vulnerabilities using a catalogue of known flaws. It's a simple test because there is a catalogue; the list of vulnerabilities you're looking for has been provided in advance by security researchers and/or vulnerability management vendors. These assumptions break down in the web application space because web applications tend to be unique; they've not been tested or assessed before, so there is no catalogue or list of known flaws to look for. There are generic tests for well-known classes of issues, but their effectiveness varies widely, and no single test or "scanner" can thoroughly assess any web application running on any framework.
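To make the catalogue-driven approach concrete, here is a minimal sketch of how that style of scanning works: connect to a listening port, read the service banner, and look it up in a list of known-vulnerable versions. The product names, CVE identifiers, and host below are hypothetical placeholders, not real advisories.

# Minimal sketch of catalogue-driven scanning: connect to a listening port,
# grab the service banner, and match it against known-vulnerable versions.
# The catalogue entries and CVE identifiers here are hypothetical examples.
import socket

KNOWN_FLAWS = {
    "ExampleFTPd 2.1": "CVE-0000-0001 (hypothetical): buffer overflow in USER command",
    "ExampleSSHd 5.3": "CVE-0000-0002 (hypothetical): weak key-exchange configuration",
}

def banner_scan(host, port, timeout=3):
    """Read a service banner and look it up in the catalogue of known flaws."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        banner = sock.recv(1024).decode(errors="replace").strip()
    findings = [advisory for product, advisory in KNOWN_FLAWS.items()
                if product in banner]
    return banner, findings

banner, findings = banner_scan("192.0.2.10", 21)   # placeholder host
print("Banner:", banner)
for advisory in findings:
    print("Known flaw:", advisory)

The whole test reduces to a lookup, which is exactly why it works for well-known binary services and falls apart for one-of-a-kind web applications.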

An analogy with airplane manufacturing is one way to think about this. If dynamic analysis is analogous to the wind tunnel test for an airplane, think of static analysis as a sort of X-ray.

In the aviation world, manufacturers perform complex X-ray imaging techniques like computed tomography (CT). These tests are used to find subtle defects in materials, or cases where precision tolerances are out of spec. Such defects in materials and workmanship might never be detected in a wind tunnel test, because the necessary conditions for failure aren't quite right or the test doesn't last long enough (you can't keep the plane in the wind tunnel for years in order to simulate its entire life cycle). In the software world, we perform static analysis as a sort of "X-ray" to look for subtle flaws and imprecise tolerances in the code that might fail at some point in the future if conditions become just right.
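As a toy illustration of the "X-ray" idea, the sketch below inspects a program's syntax tree without ever running it and flags calls to eval() - a construct that can behave perfectly in every normal test run yet fail badly once an attacker controls the input. The sample source and the single check are illustrative assumptions; real static analyzers track data flow across far more patterns.

# Toy static-analysis "X-ray": walk a program's syntax tree without running it
# and flag calls to eval(), a construct that may pass every normal test run
# yet fail badly once an attacker controls the input.
import ast

SAMPLE_SOURCE = '''
def lookup(user_supplied):
    return eval(user_supplied)  # latent flaw: never triggers in normal use
'''

class EvalFinder(ast.NodeVisitor):
    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # Flag direct calls to the built-in eval()
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            self.findings.append(f"eval() call at line {node.lineno}")
        self.generic_visit(node)

finder = EvalFinder()
finder.visit(ast.parse(SAMPLE_SOURCE))
for finding in finder.findings:
    print("Static finding:", finding)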

Perhaps better known is the so-called "wind tunnel" test, where the aircraft is exposed to in-flight conditions to test performance and find out what it takes to create failure. Airplane engine testing includes putting high-velocity water and hail, and even dead birds, into a running engine to see whether it can handle unwanted input without failure. This is similar to dynamic software analysis, where a running application is exposed to inappropriate input in order to see if it fails or if it recovers and keeps running. Instead of hail and dead birds, during security testing we feed the software various kinds of unexpected or unwanted data and input.
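A minimal sketch of that idea in code: send malformed input - our hail and dead birds - to a running application and watch whether it fails or recovers. The target URL and parameter name below are hypothetical, and a real DAST tool would use far larger payload sets and smarter failure detection.

# Minimal sketch of dynamic testing: send malformed input (our "hail and
# dead birds") to a running application and watch whether it fails or
# recovers. The target URL and parameter name are hypothetical.
import requests

TARGET = "http://localhost:8080/search"   # hypothetical endpoint
PAYLOADS = [
    "A" * 10000,                  # oversized input
    "' OR '1'='1",                # SQL metacharacters
    "<script>alert(1)</script>",  # HTML/script injection attempt
    "\x00\xff\xfe",               # raw bytes the app may not expect
]

for payload in PAYLOADS:
    try:
        resp = requests.get(TARGET, params={"q": payload}, timeout=5)
        verdict = "server error" if resp.status_code >= 500 else "handled"
        print(f"{verdict}: {payload[:20]!r} -> HTTP {resp.status_code}")
    except requests.RequestException as exc:
        # A dropped connection or timeout can also signal a crash
        print(f"no response (possible crash): {payload[:20]!r} ({exc})")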

The reason these automated tests are accompanied by manual testing - the so-called "pentesting" - is that the machines can't think, and the test designers can't anticipate every possible failure state in every application. In the airplane world, there are only a few things that can reasonably be expected to interfere with a running engine - clouds, rain, hail, fog, birds, lightning, possibly supercooled liquid water, and various atmospheric gases. There just aren't that many kinds of things in the air. In the software world, abuse cases are practically innumerable - there are millions and millions of permutations of unwanted data inputs that can be thrown at a running application. While not all of them make sense to test, and only a handful may actually cause failure, sometimes only a human can figure out just the right input to produce a failure state that is "exploitable" - meaning it creates a failure that yields control of the program.
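To give a feel for how quickly that permutation space grows, the sketch below takes one malicious string and layers a few encodings on top of it. The specific transforms are arbitrary examples, but every added encoding, injection point, or protocol quirk multiplies the number of cases an automated tool would have to try - which is where human judgment earns its keep.

# Sketch of combinatorial growth in abuse cases: one malicious string
# multiplied by a few encodings already yields many distinct variants.
from itertools import product
from urllib.parse import quote

BASE = "<script>alert(1)</script>"   # one example of unwanted input
TRANSFORMS = [
    lambda s: s,                 # as-is
    quote,                       # URL-encoded
    lambda s: quote(quote(s)),   # double URL-encoded
    lambda s: s.swapcase(),      # case-mixed
]

# Chain any two transforms; real testers also vary injection points,
# parameters, headers, and timing, so the space explodes far beyond this.
variants = {t2(t1(BASE)) for t1, t2 in product(TRANSFORMS, repeat=2)}
print(f"{len(variants)} distinct variants from one payload and 4 transforms")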

Lastly, the activity known as threat modeling is also poorly understood. Threat modeling is a sort of systematic design review that seeks to uncover design flaws - often design decisions based on assumptions made during the design phase that, while convenient or expedient, prove to be flawed upon closer examination. In other words, something is working as designed, but the design is insecure. An example in the aviation world would be reviewing the design of the onboard networks to see if there are any potential interconnections between control and entertainment systems that could be abused by a malicious passenger under the right conditions. Security design flaws, and their underlying assumptions, cannot usually be found by any automated test, at least not until the machines can think.

The output of threat modeling is more likely to be design flaws (things that are working as designed, but whose design is insecure) than security defects (things that are insecure because they fail when exposed to unexpected conditions).

Hopefully this explains why we have so many kinds of test cycles. Each test activity finds things the others cannot, and the only way to thoroughly test and assess product security is to use all three techniques. When the machines can think, and can create their own code, we may have code with security quality indexes an order of magnitude higher than we have today. It seems likely that the first thought articulated by the first sentient program, if and when it arrives, will be a critique of the quality of the human-written code it was created with.
