That Anti-Virus Test You Read Might Not Be Accurate, and Here’s Why

Posted on January 9th, 2013 by Lysa Myers

Here’s a little-known fact within the anti-virus industry: it’s a very inbred industry. People move from one vendor to another, from one aspect of the industry to another, and because of this we’re all much more friendly and cooperative than competitive. Who knows which colleague will be your boss in 5 years’ time? And most of us are in this industry because we want to “save the world,” so to speak, not just to make the best widget. The ideal way to do this is to view the products we make as a customer would, so that we can make them better and more effective.

The vast majority of us in the security industry participate in cross-disciplinary groups (combining researchers, ISPs, domain registrars, law enforcement, academia, etc.) to help improve the overall quality of products, or in information-sharing groups to help protect customers and bring cybercriminals to justice. Participation in one of these groups led to one of my own career moves, to an independent testing lab. I had been loudly griping about the quality of tests, and I was given the opportunity to change methodology directly and, indirectly, to help move the testing industry from a very outdated view to one that is generally more fair, balanced, and representative.

From that time in a very vendor-neutral environment, I gained not only a nuanced view of how hard it is to make a truly good test, but also a much more realistic view of the capabilities of different types of security products. You will never hear me promise a product is bulletproof or perfect. Humans make software, and humans make errors. This is also why we say the best defense is layered: each layer complements the others to help bolster your overall protection. Independent testing is there to help make products better. But bad tests do nothing to help. They give people a skewed version of reality that can cause them to make poor decisions because they feel either overly safe or overly endangered.

In November, we discussed an article about a test published by Imperva, which “proved” that Windows anti-virus programs are effective only 5% of the time. The test was utterly meaningless: it used fewer than 100 samples out of the millions of existing Windows malware samples, and it compared detection only in VirusTotal reports rather than in the actual products. Even VirusTotal says this is a bad idea.

Thomas’ Tech Corner did a test too, a few months back, on Mac anti-virus products. It wasn’t as wildly ridiculous as the Imperva test; it claimed only that Mac anti-virus products had detection rates closer to 70%. But it shares some of the Imperva test’s problems. When you start digging further into the methodology and results, things start to fall apart quickly.

Once again, we run into the problem of sample selection. Testing 51 samples out of thousands of Mac malware samples is a whole lot less absurd than testing 82 out of millions, but the sample set is still very small. It’s customary for testing labs to use at least several hundred samples in reviews of Mac anti-virus products. AV-Comparatives’ recent comparative used 477 samples, for instance.

Another aspect of sample selection is the quality or “relevance” of samples. If you’re going to make statements about whether a product will effectively protect customers, the sample set should ideally include only things that have been verified to have infected customers. This excludes strange corrupted corner cases, rare malware that exists only in one company’s zoo, ancient malware, harmless remnants, etc. In the Tech Corner case, several of the samples fall into the category of rare cases that customers are highly unlikely ever to encounter. Extrapolating current AV protection from tests on samples that pose no danger simply makes no sense.

The next important aspect of testing is the methodology – as much as possible, does the method of testing put the products into a real-world scenario? Does the test try to put the products on roughly equal footing, or will time, settings, and features bias the results? Are the results recorded and repeatable? Is there a way for the software vendors to communicate with the tester to verify the results prior to publishing? Testers are human and testing is incredibly costly, complicated, and difficult. It’s not possible to make a test that makes everything 100% equal, fair, and equivalent to what a customer would experience, but there are tests that come close and those that do not.

When we first looked into Thomas Reed’s Tech Corner review and methodology, we discussed the results with some folks who are involved with the Anti-Malware Testing Standards Organization (AMTSO). While we all applauded that Reed published and included his methodology, we came to the conclusion that the testing methodology used in these tests was flawed in a number of ways that pretty much invalidated the published results.

In terms of replicating a real-world scenario, there are definitely problems. The samples are not representative of what a user would encounter, which is decidedly not ideal. And the author states that he is testing without the on-access scanner, which is how detection would happen in most real-world situations. Skipping on-access testing is common even among the most highly regarded testing labs, as running on-access tests is unbelievably time-consuming. On Macs in particular, it is extra difficult because OS X has its own countermeasures against running malware, which could interfere with results.

In terms of creating a level playing field between the products, we run into more issues. Reed chose 19 different products, some free and some for-pay, some geared towards business users and some towards home users. The feature sets of these products will naturally be fairly different: free products are generally less feature-rich than paid products, and business users’ needs are not the same as home users’. He also chose to run these products with mostly, but not entirely, default settings, which again skews the results. And the tests were performed over the course of several days, which favors the products that were tested (and updated) later.

The results Reed gets, based on his stated methodology, are a mixed bag. He stated that some products made recording results difficult, and he went through some impressive gyrations to do his best to record things anyway, which is commendable. The difficulty here lies in repeatability. We tried in our own research labs to replicate the results Reed got, based on the published methodology. Granted, our view is biased, but we were able to reproduce only some of his findings. When we spoke with the author about this discrepancy after the test was published, he too found that he couldn’t repeat the published results, and he could not explain why. He notes this in his updates after the test, saying it was likely because he had not properly updated the virus definitions before testing.

As you may have inferred from the last paragraph, Reed made no attempt to contact the software vendors to share the results with them, so we were unable to confer with the tester to verify the results prior to publishing. It’s understandable that reviewers don’t want to be bullied by vendors out of publishing valid, if unfavorable, reviews. But testers can proudly stand by their results if their methodology can withstand scrutiny.

For better or worse, anyone can write anything they want and publish it. Product companies have to deal with this all the time, which is why we always follow up with reviewers and ask lots of questions. Typically companies are provided the testing results prior to publication for just this reason, because reviewers really want to get it right. None of that happened in this case.

Anti-virus is a difficult space to be in, in part because there is always a ton of controversy, especially over products’ effectiveness. But those of us in the anti-virus industry genuinely want to continue making a better product, and we feel independent testing is a good way to help us do this. The flip side of the lack of privacy on the Internet is that it makes everyone’s voice a little louder. It can be hard to discern which are the reputable, scientifically valid sources of information and which information does not stand up to scrutiny.

Entire websites exist solely to debunk false information and outright scams. That is part of why AMTSO came to exist – its guidelines provide criteria for both testers creating methodology and people reading tests to apply to reviews to decide what is useful and what is not. Hopefully this information will be more widely used and tests will continue to improve so that we can then use that information to improve our products. And then we’ll all live happily ever after.

