Friday, January 22, 2010

Analyzing the Google Attacks - Plenty of Room for Mistakes

SecureWorks has posted an analysis of the malicious code alleged to have been used to attack Google and other companies, collectively referred to as "Operation Aurora". SecureWorks' posting is one of the first pieces of evidence and technical analysis that goes beyond simple speculation.

The analysis centers on an unusual implementation of an error-detecting code (a cyclic redundancy check, or CRC) that appears to have been developed in China and published only in Chinese-language papers.
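For readers unfamiliar with CRCs, here is a minimal sketch of the idea. It is not the Hydraq code, nor the algorithm SecureWorks describes. It assumes the widely used CRC-16/XMODEM parameters (polynomial 0x1021, initial value 0), and the function names are hypothetical. It computes the same checksum in two structurally different ways, which is roughly why one particular implementation of a common checksum can look distinctive to an analyst.

```python
# Illustrative only: CRC-16/XMODEM parameters are an assumption,
# not the parameters or code found in the malware.
POLY = 0x1021  # generator polynomial x^16 + x^12 + x^5 + 1


def crc16_bitwise(data: bytes) -> int:
    """Compute the CRC one bit at a time (MSB first, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def _build_nibble_table() -> list:
    """Precompute the CRC contribution of each possible 4-bit value."""
    table = []
    for nibble in range(16):
        reg = nibble << 12
        for _ in range(4):
            if reg & 0x8000:
                reg = ((reg << 1) ^ POLY) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


NIBBLE_TABLE = _build_nibble_table()


def crc16_nibble(data: bytes) -> int:
    """Compute the same CRC four bits at a time using a 16-entry table."""
    crc = 0
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            index = (crc >> 12) ^ nibble
            crc = ((crc << 4) & 0xFFFF) ^ NIBBLE_TABLE[index]
    return crc


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc16_bitwise(message)), hex(crc16_nibble(message)))
```

Both functions return the same value for any input. Only the structure of the code differs, and that structure is the kind of detail an analyst can compare against published implementations.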

It is great that some good technical analysis is starting to come out, and I recommend that those interested read the posting. It has some technical information but the main points should be accessible to non-technical readers.

However, from an investigator's point of view, there are some shortcomings in this type of analysis and it might prove interesting to discuss a few of these.

Most technical analysis focuses on answering what happened and how an incident occurred. Technicians can reverse engineer (malicious) code, analyze network traffic patterns and review logs of system activity to understand how someone gained access to a system and what they did. This analysis is a very necessary and important step. However, just knowing how an incident occurred is not enough for security professionals.

Understanding the risk from these types of attacks requires more information. If our response to an attack is based solely on how it occurred, then we risk wasting resources by overreacting or by applying controls that are ineffective (one of the most common problems in information security). The current Google incident is a perfect example.

Based on the current public information there is tremendous speculation that this may be sponsored or directed by the Chinese government for the purposes of espionage. If that is true, it requires a significant reaction both in terms of spending by (potential) targets and by action from other governments. However, if this is being carried out by a group of teenagers in Romania (just using Chinese systems as a front) simply for the technical challenge, our response can and should be completely different. Therefore, understanding who the adversaries are and their motives changes the risk equation and our response to it (additionally, we need to understand capabilities but that's another topic).

We need to answer not only what happened and how but also by whom and why.

Here we often hit a brick wall: due to the virtual nature of data and the Internet, it can be very difficult to clearly identify who and why - yet it is not impossible. Unfortunately, many technicians take the what and how information and try to infer answers to who and why - often with poor results.

Inference chains, or inference concatenates, are used by intelligence analysts, investigators and prosecutors to link data points and evidence to develop a conclusion based on what is known or to prove guilt based on evidence. Inference chains can be either weak or strong. Unfortunately, most inferences used to determine "who" perpetrates a cyber attack are weak.

With this in mind, let's go back and look at the technical analysis and where it might have some shortcomings or problems.

One example is the following quote from the SecureWorks posting:
"...outside of the fact that PRC IP addresses have been used as control servers in the attacks, there is no "hard evidence" of involvement of the PRC or any agents thereof."
It is great to see such caution in analysis but it needs to go a little further: How do we know the PRC (People's Republic of China) IP addresses "prove" any involvement by anyone in China or their agents? It might be someone outside of China using a Chinese system. Until we know exactly who the perpetrators are, we don't know what their affiliation with the PRC is. Therefore, we would say that to infer that the perpetrator is Chinese based solely on the use of a Chinese IP address is weak: It might be true but it might not.

Another example of this problem is the conclusion that because the (legitimate) CRC used in the malicious code appears to have been developed in China, the perpetrators must be Chinese (again, using what information to infer who).

The post describes the CRC code and notes that it appears to have been created in China and published only in simplified Chinese (a form of written Chinese promoted by the PRC).

The inference chain then goes like this:
  1. A specialized implementation of a 16-bit CRC (CRC-16) was created in China for legitimate purposes;
  2. A simple Google search returns only references to the CRC code in simplified Chinese papers;
  3. Malicious code was developed that, in part, uses a CRC that "matches the structural implementation" of the CRC-16 code;
  4. The malicious code was used to attack Google (and others);
  5. Therefore, the "use of this unique CRC implementation in Hydraq [the malicious code] is evidence that someone from within the PRC authored the Aurora codebase".
Is this a strong inference chain? Not if other reasonable conclusions could be drawn. For example, simplified Chinese is also used in Singapore. We could equally conclude (based solely on the inference chain above) that the perpetrator was in Singapore. Or perhaps a Chinese emigrant living in France. Within reason, we could come up with several other possibilities.
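One way to make this test concrete is to list the candidate conclusions and check which of them each piece of evidence is consistent with. The following is a rough sketch, not a formal method used by investigators. The hypothesis labels come from the examples above, while the data structure, the function names and the judgments about which evidence is consistent with which hypothesis are my own assumptions for illustration.

```python
from typing import Dict, Set

# Candidate conclusions taken from the examples in the text.
HYPOTHESES: Set[str] = {
    "author inside the PRC",
    "author in Singapore",
    "Chinese emigrant living in France",
}

# For each link in the chain, the hypotheses it is consistent with.
# These assignments are illustrative assumptions, not findings.
EVIDENCE: Dict[str, Set[str]] = {
    "CRC-16 code created in China": set(HYPOTHESES),
    "search returns only simplified Chinese papers": set(HYPOTHESES),
    "malware CRC matches the structural implementation": set(HYPOTHESES),
    "malware used against Google and others": set(HYPOTHESES),
}


def surviving_hypotheses(evidence: Dict[str, Set[str]]) -> Set[str]:
    """Return the hypotheses that every piece of evidence is consistent with."""
    remaining = set(HYPOTHESES)
    for consistent_with in evidence.values():
        remaining &= consistent_with
    return remaining


def is_strong(evidence: Dict[str, Set[str]]) -> bool:
    """Treat a chain as strong only if exactly one hypothesis survives."""
    return len(surviving_hypotheses(evidence)) == 1


if __name__ == "__main__":
    for hypothesis in sorted(surviving_hypotheses(EVIDENCE)):
        print("still possible:", hypothesis)
    print("strong chain:", is_strong(EVIDENCE))
```

Under these assumptions, none of the four links eliminates any candidate, so the chain does not point to a single conclusion. It would take additional evidence that rules candidates out before the chain could be called strong.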

Again, it may be true, but it may not. Does this level of analysis, common in technical cyber crime studies, give us the information we need to react appropriately (technically, legally or politically) to the threat?

One additional problem with the analysis is that it relies solely on a simple Google search and treats the result as an exhaustive search of everywhere that articles related to the CRC-16 code could have been published.

I don't want to be overly harsh on this particular analysis. As I said earlier, I think it answers some important what and how questions and, at a technical level, is an excellent piece of work: We need more like it. Likewise, it does provide some very interesting data points that can begin to be used to build the circumstantial evidence needed to answer the who and why questions. However, that will require more information (both technical and non-technical) to build strong inference chains that point to a single, reasonable conclusion. This can be done but with large international cyber cases it requires significant time, data collection and analysis of literally thousands and thousands of data points. It also requires intelligence and analysis of more than just technical information.

Unfortunately, this rarely happens.

We need to be very careful in how we infer the who and why of international cyber crimes. The consequences of making a mistake could be disastrous.


Operation Aurora: Clues in the Code
