How is Malware Researched?


You may wonder what goes on behind the scenes between the time researchers get malware samples and the time virus definitions magically appear on your machine. All researchers do things a little differently, but some basic techniques are common to a lot of them. Which technique a researcher uses as his or her first or primary method of analyzing a file depends on a lot of things. Some might seem obvious (such as how complicated or armored the threat is, or how quickly we need to analyze it), while others might seem a bit odd (how old is the researcher?).

There are two basic types of research: static and dynamic. Static analysis is a bit like dissecting a frog in a laboratory, where the sample isn’t moving around doing froggy things but you can see what it’s designed to do. Dynamic analysis is like setting up a lab for live frogs, but all you can see is frogs doing their lively froggy things.

Part 1: Static Analysis – Under the Microscope


Text View

There are a few ways to statically analyze malware, all of which are different ways to translate machine code (the language a computer's processor actually speaks) into something a human can make sense of. There are three levels at which you can view any file: text, hex and assembler. Text is…well, what you’re viewing right now. It’s an imperfect and difficult way to view a file, because not every byte value has a visible character, and many of them show up as weird-looking characters like squiggly lines and happy faces. If you’ve ever opened a file that isn’t, strictly speaking, a text file in a text viewer, you know what I mean.
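
To make the idea concrete, here is a minimal Python sketch of a text view. It assumes we substitute a dot for any byte with no visible ASCII character, as many hex editors do; an old-fashioned text viewer would show squiggles and happy faces instead. The function name `text_view` and the sample bytes are made up for illustration.

```python
import string

# Byte values treated as "visible": ASCII letters, digits, punctuation and space.
PRINTABLE = set((string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii"))


def text_view(data: bytes) -> str:
    """Render raw bytes as text, using '.' for bytes with no visible character."""
    return "".join(chr(b) if b in PRINTABLE else "." for b in data)


# Hypothetical sample: a few non-printable header bytes followed by a readable message.
sample = b"MZ\x90\x00\x03\x00\x00\x00This program cannot be run in DOS mode."
print(text_view(sample))  # → MZ......This program cannot be run in DOS mode.
```

Even in this tiny example, the readable sentence jumps out while the surrounding bytes collapse into dots. That is the quick-and-dirty value of a text view.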

If a researcher understands what a sequence of characters represents in machine code, or is just looking for words that might give clues to a file’s function, text view can be a very quick and dirty way to figure out where to go next. The only researchers I’ve ever met who look at a file only in text view are people who worked on computers way back when programs had to be fed in on huge stacks of paper punch cards. The rest of us watch researchers using this technique as if we’re watching a mystic who reads the future by casting bones or reading tea leaves.
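
Hunting for clue words is usually automated. The sketch below is written in the spirit of the Unix `strings` tool. It pulls out runs of at least four printable ASCII characters. The function name `extract_strings`, the four-character threshold and the sample data are assumptions chosen for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (0x20-0x7E) that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical sample: binary noise around an API name and a URL, plus a too-short run ("ab").
sample = b"\x00\x01GetProcAddress\x00\xff\xfehttp://example.com/a\x00ab\x00"
print(extract_strings(sample))  # → ['GetProcAddress', 'http://example.com/a']
```

A Windows API name like `GetProcAddress` or an embedded URL is exactly the kind of hint that tells a researcher where to look next. The minimum length filters out the many short, accidental "words" that random binary data produces.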

In the image below, you can see a file that’s shown in both text and hex view.


Hex View

The next way of viewing the file is in hex. This is a representation of machine code in hexadecimal, which looks a little weird if you’re used to counting with only the digits 0 to 9, because it’s base sixteen instead of base ten. If you ever did alternate-base math in school, this will probably give you horrible flashbacks. For anyone who needs a refresher: in hex, you count from 0 to 9 as usual, but instead of stopping at ten you keep going from A to F (A is ten, F is fifteen). After F comes 10, which equals sixteen. The last digit then counts up through 0 to F again (11, 12 … 1F) before rolling over to 20, which equals thirty-two. Conveniently, every byte (0 to 255) fits in exactly two hex digits, 00 to FF.
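
Here is a short Python sketch of the counting rules above. The helper names are hypothetical. `hex_to_decimal` does the conversion by hand so the place values are visible; Python's built-in `int(text, 16)` would do the same job.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_count(start: int, stop: int) -> list:
    """Write the numbers start..stop-1 in base sixteen."""
    return [format(n, "X") for n in range(start, stop)]


def hex_to_decimal(digits: str) -> int:
    """Convert hex to decimal: each place is worth sixteen times the place to its right."""
    value = 0
    for d in digits.upper():
        value = value * 16 + HEX_DIGITS.index(d)
    return value


print(" ".join(hex_count(0, 34)))
# 0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21
print(hex_to_decimal("10"), hex_to_decimal("1F"), hex_to_decimal("FF"))  # → 16 31 255
```

For example, 1F works out to 1 × 16 + 15 = 31. FF, the largest two-digit value, is 255, which is also the largest value a single byte can hold.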

As with viewing a file in text, you need to understand what those characters represent. Because hex shows you everything, not just printable characters, it’s a little easier. But to a lot of researchers, viewing a file in hex alone is still a bit mystical. If there is encryption or other sorts of obfuscation, it’ll look like a lot of gobbledygook. You’re probably not going to get much by way of hints in readable words.
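
Below is a minimal Python sketch of a classic hex-dump layout: offset, hex bytes, then the text view of the same bytes. It also shows how even trivial obfuscation hides readable words. Single-byte XOR is used here only as a simple stand-in for obfuscation. The key `0x5A`, the function names and the sample string are all hypothetical.

```python
def hexdump(data: bytes, width: int = 16) -> list:
    """Format bytes as 'offset  hex bytes  text' lines, like a typical hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Pad the hex column so the text column lines up on a short final row.
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text_part}")
    return lines


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the original."""
    return bytes(b ^ key for b in data)


plain = b"download http://example.com/payload"
hidden = xor_bytes(plain, 0x5A)

print("\n".join(hexdump(plain)))   # the URL is readable in the text column
print("\n".join(hexdump(hidden)))  # same length, but the text column is scrambled characters
```

Both dumps contain the same number of bytes. In the second one, though, the URL has turned into a jumble of digits and punctuation, which is the "gobbledygook" a researcher has to see through.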

Assembler View

Last and most definitely not least is assembler. This is the point at which the file starts to look like recognizable programming instructions that mere mortals can read. But a program doesn’t run in a straight line the way you read an article like this one, so you can’t start at the top of a file in assembler view, read down to the end and understand what’s going on. There’s a lot of jumping and looping around that can be hard to follow when you’re looking at a flat listing of tens of thousands or even millions of instructions. But it’s a good place to start, and it’s the jumping-off point for dynamic analysis.
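
Real disassemblers handle thousands of instruction encodings, but a toy version shows both ideas from the paragraph above. First, bytes turn into readable instructions. Second, a jump means the listing order isn't the execution order. This Python sketch is hypothetical and recognizes only a handful of x86-64 opcodes. Anything else is printed as a raw `db` byte.

```python
def disassemble(code: bytes, base: int = 0x1000) -> list:
    """Decode a tiny subset of x86-64 into 'address: bytes  instruction' lines."""
    lines = []
    i = 0
    while i < len(code):
        addr = base + i
        op = code[i]
        if op == 0x55:
            text, size = "push rbp", 1
        elif op == 0x5D:
            text, size = "pop rbp", 1
        elif op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif code[i:i + 3] == b"\x48\x89\xe5":
            text, size = "mov rbp, rsp", 3
        elif code[i:i + 2] == b"\x31\xc0":
            text, size = "xor eax, eax", 2
        elif op == 0xEB and i + 1 < len(code):
            # Short jump: the signed offset is relative to the *next* instruction's address.
            rel = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            text, size = f"jmp 0x{addr + 2 + rel:x}", 2
        else:
            # Unknown opcode: emit it as a raw data byte.
            text, size = f"db 0x{op:02x}", 1
        lines.append(f"0x{addr:x}: {code[i:i + size].hex(' ')}  {text}")
        i += size
    return lines


# Hypothetical function body: the jump skips over the nop.
code = bytes.fromhex("55 48 89 e5 eb 01 90 31 c0 5d c3")
print("\n".join(disassemble(code)))
```

The `nop` at 0x1006 sits right below the `jmp 0x1007` in the listing. When the code actually runs, however, the jump sends execution past it, so it never executes. Multiply that by millions of instructions and you can see why a flat listing is hard to follow.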

In a Nutshell

You can quickly look at the outside of the metaphorical frog to get an idea of what sort of animal you’re looking at and what its basic structure and function are. Then you can dissect it to get a better idea of how those structures are linked together. Static analysis lets you do the same with a potential malware sample. While looking at a specimen’s structure alone is seldom enough, it can give you a lot of clues about how to set up an environment for dynamic analysis, or which areas are interesting enough for more in-depth analysis. In How Is Malware Researched – Part 2, we will look at different types of dynamic analysis, which is what researchers use to understand how all those instructions come together to do their nefarious deeds.

photo credit: MuseumWales via photopin cc