Maliciousness in Top-ranked Alexa Domains
by Paul Royal, Research Consultant
For the infographic associated with this post, see http://www.barracudalabs.com/goodsitesbad.
At Barracuda Labs, we use a variety of research technologies to identify and study maliciousness on the web. One of these tools is an automated system that forces a web browser inside a Windows virtual machine to visit a URL to see what happens to the browser, its plugins, and the operating system. The resulting network-level actions of the virtual machine help us determine, without prior knowledge of specific exploits served to the browser or its extensions, whether a URL serves malicious content.
A few months ago we began using this system to examine the 25,000 most popular domains as ranked by Alexa. Because these sites are popular and long-lived, many people assume that they are safe to visit. However, automated examination of the Alexa top 25,000 each day during February 2012, which found 58 sites serving drive-by download exploits, shows that this assumption does not always hold.
While Alexa does not publish the total number of page views it uses to determine site rankings, there is enough information available to estimate that number. As an example, Wikipedia, which represented ~0.54% of total Alexa views in February 2012, reported ~15.75 billion views for the previous month. Working backwards (dividing the monthly views by the 29 days in February 2012 and then by Wikipedia's share of total views), we can calculate that Alexa used an average of (15,756 * 1,000,000)/(29 * (0.5416/100)) = ~100.31 billion views each day to rank the popularity of websites.
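The back-calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text; the function and constant names are our own and are not part of any Alexa or Wikipedia API.

```python
# Figures quoted in the text; the names are illustrative assumptions.
WIKIPEDIA_MONTHLY_VIEWS = 15_756 * 1_000_000  # views reported by Wikipedia
DAYS_IN_FEBRUARY_2012 = 29                    # 2012 was a leap year
WIKIPEDIA_SHARE_PERCENT = 0.5416              # Wikipedia's share of total Alexa views


def estimate_total_daily_views(site_monthly_views, days, site_share_percent):
    """Estimate total daily views from one site's views and its share."""
    site_daily_views = site_monthly_views / days
    return site_daily_views / (site_share_percent / 100)


total_daily_views = estimate_total_daily_views(
    WIKIPEDIA_MONTHLY_VIEWS, DAYS_IN_FEBRUARY_2012, WIKIPEDIA_SHARE_PERCENT
)
print(f"Estimated total: {total_daily_views / 1e9:.1f} billion views per day")
```

The result lands at roughly 100.3 billion views per day, in line with the figure quoted above.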
Using the above number, we can calculate the affected views for a given site in a 24-hour period. As an example, free-tv-video-online[.]me, which served visitors malicious content via an ad network on February 13, represented ~0.0053% of the total Alexa views, which yields 5,366,895 affected views for that day. However, to estimate how many users were served exploit content, this number must be adjusted to account for the average number of views per user. Fortunately, Alexa makes this information available. Continuing with the example, free-tv-video-online[.]me has an average of 7.2 views per user. Thus, for this site, 5,366,895 views equate to 745,402 users served malicious content on February 13. Across all 58 sites that (directly or indirectly) served malicious content, there were 44,160,016 affected views from 10,541,379 users.
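The two steps in this example (share of total views, then views per user) can be sketched as follows. The helper names are hypothetical. Note that the rounded share of ~0.0053% yields about 5.3 million views rather than exactly 5,366,895, presumably because the published percentage is rounded, so the user count below starts from the view count given in the text.

```python
def affected_views(total_daily_views, site_share_percent):
    """Views a site received in one day, given its share of all views."""
    return total_daily_views * (site_share_percent / 100)


def affected_users(views, views_per_user):
    """Convert a view count into an approximate number of distinct users."""
    return views / views_per_user


# Approximate check using the rounded figures from the text.
approx_views = affected_views(100.31e9, 0.0053)

# User estimate using the exact view count quoted in the text.
users_served = int(affected_users(5_366_895, 7.2))
print(f"Approximate views from rounded share: {approx_views:,.0f}")
print(f"Users served malicious content: {users_served:,}")
```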
Of course, not every user served malicious content was compromised. To estimate the number of successfully exploited users, we used several different sources, including Wikipedia’s browser statistics. To begin, if we examine platform and browser popularity, only about half (50.81%) of users run Windows with IE or Firefox, the combination most conducive to exploitation.
To convert the number of possibly compromised users into those probably compromised, we conservatively adjusted according to the most popular mechanism of exploitation: the Java plugin. According to Adobe, 73% of PC users have the Java plugin installed. According to Qualys, 42% of users with the Java plugin installed have versions vulnerable to exploitation. Thus, of 10,541,379 users served malicious content, 42% (insecure Java) of 73% (Java installed) of 50.81% (Windows and Firefox/IE), or 1,642,172, were likely compromised.
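The chain of adjustments above is a product of three fractions applied to the number of users served malicious content. A minimal sketch of that arithmetic, with constant names of our own choosing:

```python
# Figures quoted in the text; the names are illustrative assumptions.
USERS_SERVED = 10_541_379          # users served malicious content
WINDOWS_IE_OR_FIREFOX = 0.5081     # users on Windows with IE or Firefox
JAVA_INSTALLED = 0.73              # users with the Java plugin installed
JAVA_VULNERABLE = 0.42             # installed Java versions open to exploitation

likely_compromised = int(
    USERS_SERVED * WINDOWS_IE_OR_FIREFOX * JAVA_INSTALLED * JAVA_VULNERABLE
)
print(f"Likely compromised users: {likely_compromised:,}")
```

Multiplying the fractions treats the three properties as independent, which is a simplification implied by the calculation in the text rather than a measured fact.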
In addition to the statistical analysis above, which estimates the number of users compromised by visiting Alexa top-ranked domains that served malicious content, we offer the following summary observations:
- On average, two of the Alexa top 25,000 domains serve malicious content each day. That means that on a typical day, at least one popular website is serving malicious content.
- Alexa top-ranked domains served malicious content on 23 (or 79%) of the 29 days in February. That means this problem is not isolated and occurs on a regular basis.
- Alexa top-ranked domains that served malicious content spanned 18 different countries. That means this problem has no geographic barrier.
- Over 97% of sites that served visitors malicious content were at least one year old; over half were more than five years old. That means attackers use well-established, long-lived websites for their drive-by download campaigns.
A table listing the 58 sites that served visitors drive-by download exploits is available for download here. It includes each site’s Alexa rank, when exploit content was served by the site, the number of affected views and users, and the subset of users that were likely compromised. An archive containing packet capture (PCAP) files showing the exact sequence of events that led to system compromise can be requested through the Barracuda Labs Contact Form.