Cyber Criminals Defraud Display Advertisers with TDSS

We have previously shown how malware-driven traffic across websites costs display advertisers millions of dollars per month [1]. We have also shown how easy it is to generate this type of fake traffic, with fewer than 100 lines of C++ code [2]. In this post we provide the first case study to show how a well-known malware rootkit is being used by cyber criminals today specifically to defraud online display advertisers. The case study is a display advertising analogue of a click-fraud study by Miller et al. [3].

In our investigations into the origins of malware-driven traffic across websites we discovered a TDSS rootkit with dll32.dll and dll64.dll payloads. TDSS has been described by Kaspersky as “the most sophisticated threat today” [4]. In this post we show how hijacked PCs controlled by these TDSS payloads impersonate real website visitors across target webpages on which display ad inventory is being sold. We show in this post how this fake traffic is being sold to publishers today through the ClickIce ad exchange. We show further in this post that some unscrupulous publishers are not just knowingly buying this fake traffic. They are in fact optimising their webpage layouts for this fake traffic.

We recorded activity on a hijacked PC controlled by one of these payloads. We have included this below.

TDSS Rootkits

Four versions of the TDSS rootkit have been developed to date. The first version of the rootkit, TDL-1, was developed in 2008 [4,5]. TDSS is known for its resilience. Not only does it hook into drivers and the master boot record, enabling it to be executed early in the startup process, it also contains its own anti-virus system for stripping out competing malware [4,6].

TDSS comprises three core components: the dropper, the rootkit and a payload DLL. The dropper contains an encrypted version of an infector. On execution the dropper decrypts the infector and replaces the original dropper thread [5,6]. The infector creates a hidden file system, decrypts the rootkit and copies it to the master boot record [6]. Finally the infector removes all trace of the dropper and the newly infected system is rebooted.

The third core component of TDSS is the payload. TDSS payloads take the form of a DLL module, which is injected into a user-level process to avoid detection. TDSS’s default payload is TDLCMD. This can connect to a hard-coded TDSS command-and-control server and download further specialised payloads [4,7]. These specialised payloads can be used to perform DDoS attacks, redirect search results and open popup browser windows.

Impersonating Real Website Visitors

Today’s display advertisers use increasingly sophisticated algorithms to target the right banner or rich-media ads at the right website visitors at the right time. These algorithms consider the cookies of individual website visitors and they analyse the browsing history, purchase history, ad-viewing history and ad-engagement history associated with each cookie in real time to determine whether an ad slot should be bought and some specific ad creative should be shown to this specific visitor at this specific time [8].

In this section we describe the activity of PCs infected with TDSS rootkits in a controlled environment. The activity is governed by dll32.dll and dll64.dll payloads. We describe how these payloads impersonate real website visitors across target webpages to the extent that display advertisers mistakenly target their ads at the TDSS bots.

Opening multiple hidden Internet Explorer browser windows
The payload controlling the infected PC opens multiple hidden browser windows. Each browser window is an embedded instance of the version of Internet Explorer already installed on the infected PC. In the screenshot below we show that the hidden browser windows are not listed by Task Manager. We used Nir Sofer’s WinLister to reveal the hidden windows, the names of which start with “clk” [9].

Sharing the PC owner’s cookies
Each hidden browser instance uses the default Internet Explorer cookie store of the infected PC. This would enable the payload to access the PC owner’s Facebook account and to access his/her emails. This particular payload, however, simply borrows the cookies of the unwitting PC owner and then visits target ad-supported webpages impersonating this person. The botnet herder is in effect selling the rich browsing and purchase history of the PC owner to publishers who, in turn, sell these cookies to advertisers. If advertisers are willing to pay more to target these cookies, then publishers will earn more. And, in turn, the suppliers of this type of traffic will also earn more. Retargeting advertisers, for example, will often pay 10 times, 100 times or even 1000 times more to target website visitors with the appropriate cookie history [10].

Requesting target sites from a command-and-control server
The TDSS payload requests target URLs from the command-and-control server by making an HTTP request of the following form:

http://######.com/script.php?sid=ID&q=keywords&ref=http://spoofed-referer&ua=spoofed-user-agent&lang=language

Spoofing the Referring URL
The payload spoofs referring URLs when target webpages are requested. These URLs are never visited by the bot, but it will appear to the target website as though the bot had just come from them. The following are examples of spoofed referring URLs used by the payload:

http://duckduckgo.com/?q=auto+insurance+quote+in+hudson+florida
http://blekko.com/ws/?q=online+college+courses+bay+area
http://www.iseek.com/iseek/search.html?query=business+insurance+osha

NB The DuckDuckGo website is served over HTTPS by default, so the spoofed DuckDuckGo referring URL above, which uses plain HTTP, is inconsistent with a genuine referral from the DuckDuckGo website.
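
The inconsistency noted above suggests one simple check that a publisher or advertiser could apply to incoming referrers. The following is a minimal sketch only: the function names are illustrative, and the list of hosts assumed to serve HTTPS by default is supplied by the caller (the only such host named in this post is duckduckgo.com).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative helper: lower-case an ASCII string.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true when the referrer claims plain HTTP for a host that the
// caller-supplied list assumes to be served over HTTPS by default.
// Hosts are compared exactly; subdomains are not expanded.
bool isSchemeInconsistentReferrer(const std::string& referrer,
                                  const std::vector<std::string>& httpsByDefaultHosts) {
    const std::string prefix = "http://";
    const std::string url = toLower(referrer);
    if (url.compare(0, prefix.size(), prefix) != 0) return false;

    // The host runs from the end of the scheme to the next '/', '?', '#' or ':'.
    const std::size_t start = prefix.size();
    const std::size_t end = url.find_first_of("/?#:", start);
    const std::string host =
        url.substr(start, end == std::string::npos ? std::string::npos : end - start);

    return std::find(httpsByDefaultHosts.begin(), httpsByDefaultHosts.end(), host) !=
           httpsByDefaultHosts.end();
}
```

A check of this kind would flag the first spoofed referrer listed above but not the other two, so at best it is one weak signal among many.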

Spoofing the User-Agent header
Regardless of the actual browser or operating system version, the payload controlling the infected PC sets the User-Agent header reported by each hidden browser instance to be Internet Explorer 10 on Windows 7.

Spoofing mouse traces and click events
Once the target webpage has been loaded in a hidden browser window, the payload spoofs engagement with the webpage by spoofing mouse traces and click events across the webpage and across any ads embedded within the webpage. The screenshot below illustrates spoofed mousemove events.

Spoofing geometric viewability
It is a common misconception that if a display ad impression is served to a malware bot, then today’s viewability measurement companies will report the ad impression as not being viewable. These TDSS payloads expose this view as being mistaken.

The traditional (geometric) approach to measuring display ad viewability treats an ad as being viewable if it is enclosed within the boundaries of the browser window (taking scrolling and resizing into account) [13]. In the case of the TDSS payload, payload-controlled browser windows are treated by the infected PC as being maximised, despite the fact that these browser windows are hidden. That is to say, each browser window is treated as being the size of the screen and the top-left corner of each window is positioned at the top-left corner of the screen. All display ads enclosed within these maximised browser windows will be reported as being viewable according to the traditional approach to viewability measurement.
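
To make the geometric test concrete, the following is a minimal sketch of an enclosure check of the kind described above. The Rect type and the function name are our own illustrations, not taken from any measurement vendor's code, and scrolling is assumed to have already been folded into the ad's coordinates.

```cpp
#include <cassert>

// Axis-aligned rectangle in screen pixels (illustrative type).
struct Rect {
    int left;
    int top;
    int width;
    int height;
};

// Geometric viewability as described above: the ad counts as viewable
// when it lies entirely within the bounds of the browser window.
// Nothing here asks whether the window is actually painted on screen.
bool isGeometricallyViewable(const Rect& ad, const Rect& window) {
    assert(ad.width >= 0 && ad.height >= 0);
    return ad.left >= window.left &&
           ad.top >= window.top &&
           ad.left + ad.width <= window.left + window.width &&
           ad.top + ad.height <= window.top + window.height;
}
```

For a hidden window that reports itself as maximised at (0, 0) on an assumed 1366x768 screen, a 300x250 ad positioned at (100, 100) passes this check even though no pixel of it is ever displayed.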

Selling Fake Traffic through the ClickIce Ad Exchange

All traffic generated by the TDSS payloads is sold through a pay-per-click ad network called ClickIce, which ostensibly offers publishers the opportunity to buy traffic (in the form of pay-per-click text ads) from “thousands of small search sites and traffic partners.”

The following diagram illustrates the process:

Each time the TDSS bot makes an HTTP request to its command-and-control server for a target webpage, the command-and-control server uses the information in this HTTP request, together with the IP address of the bot, to trigger an ad auction on ClickIce. If a publisher chooses to buy the visitor, then the details of a text ad (never to be shown) are sent to the command-and-control server and these details are relayed directly to the bot. The following is an example of such a text ad:

<?xml version="1.0" encoding="UTF-8"?>
<records>
<query>attorneys that have successfully sure auto owners insurance company</query>
<record>
        <title><![CDATA[Sexy Shapewear To Hide Those Holiday Calories]]></title>
        <description><![CDATA[For all of you ladies who have been exercising willpower with food over the holidays, 
        we come bearing good news. Turns out, you can have your cake and eat it to.]]></description>
        <url><![CDATA[D######ion.com]]></url>
        <clickurl><![CDATA[http://#######.209.115/click.php?c=e47#####73856]]></clickurl>
        <bid>0.000399</bid>
</record>
</records>

When the bot receives the text ad it then opens the target webpage of the publisher by following the click URL provided in this text ad. To the publisher it will appear as though a text ad has been shown on the webpage that is listed in the spoofed referring URL and that a website visitor has clicked on this ad. The botnet herder will earn a share of the revenue earned by ClickIce.
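
For analysts inspecting captured responses like the one above, the following is a minimal sketch of how the bid and click URL fields might be pulled out. It is a naive string search rather than a real XML parser, the function name is hypothetical, and it assumes each element appears at most once per response.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text of the first <tag>...</tag> element, with a CDATA
// wrapper stripped if present, or an empty string if the element is absent.
std::string elementText(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t begin = xml.find(open);
    if (begin == std::string::npos) return "";
    const std::size_t start = begin + open.size();
    const std::size_t end = xml.find(close, start);
    if (end == std::string::npos) return "";

    std::string text = xml.substr(start, end - start);

    // Strip <![CDATA[ ... ]]> when it wraps the whole element body.
    const std::string cdataOpen = "<![CDATA[";
    const std::string cdataClose = "]]>";
    if (text.size() >= cdataOpen.size() + cdataClose.size() &&
        text.compare(0, cdataOpen.size(), cdataOpen) == 0 &&
        text.compare(text.size() - cdataClose.size(), cdataClose.size(), cdataClose) == 0) {
        text = text.substr(cdataOpen.size(),
                           text.size() - cdataOpen.size() - cdataClose.size());
    }
    return text;
}
```

Applied to the example above, elementText(response, "bid") would return the string 0.000399; the currency and unit of that bid are not stated in the response itself.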

Only a fraction of the TDSS-generated traffic sold by ClickIce is sold directly to publishers—around 12% in our analysis. Typically TDSS-generated traffic is passed on by ClickIce to other ad networks through which the traffic is then sold to publishers. Three networks previously reported to be supplying suspicious traffic were seen in our analysis to be selling TDSS traffic supplied by ClickIce. These networks were AdKnowledge, Findology and Jema Media [11,12].

Publishers Optimising for this Fake Traffic

As seen in our screencast, many of the webpages visited by TDSS bots were from “normal” websites—for example TheRisingHollywood.com and Fox News’s uReport.FoxNews.com. This said, many webpages with spam content were also visited—webpages full of display ads and links, with no CSS, styling, images, or content.

Webpages with this spam content are not publicly accessible. It is very difficult to find this spam content by simply browsing the web. When a redirect URL in a ClickIce text ad is followed, a first-party PHP session cookie, PHPSESSID, is set that marks the visitor as a bot and causes the website to show only spam content to this bot. Without a bot PHPSESSID normal webpages are shown to the visitor. This means that some publishers are not just knowingly buying malware-driven traffic. They are in fact optimising their webpage layouts for this malware-driven traffic.

The URLs that link to webpages optimised for TDSS traffic take either of two forms, where spam-domain is one of over forty domains and n is a number:

http://spam-domain.com/landing/?count=n
http://spam-domain.com/gw/?p=n
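
The two URL forms above are regular enough to be matched mechanically, for example when scanning referrer or redirect logs. The following is a minimal sketch using the standard regex library; the function name is illustrative, and the pattern encodes only the two forms listed in this post.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the URL has one of the two bot-marking forms
// described above: /landing/?count=n or /gw/?p=n, where n is a number.
bool looksLikeBotMarkingUrl(const std::string& url) {
    static const std::regex pattern(
        R"(^http://[^/]+/(landing/\?count=\d+|gw/\?p=\d+)$)");
    return std::regex_match(url, pattern);
}
```

The pattern deliberately ignores the domain, so it would also match unrelated sites that happen to use the same paths.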

The sites are hosted on sequential IP addresses in nine distinct ranges. The sites are niche Flash game and video sites following the same layout with slight variations in styling.

64.120.163.166    boardgameman.com
64.120.163.167    dressupenjoygames.com
64.120.163.169    educationgeniusgames.com
 
64.120.175.242    actiongamesteam.com
64.120.175.243    fightingforcegames.com
64.120.175.244    worldstrategygames.com
64.120.175.245    shootingarmsgames.com
64.120.175.246    drivingenergygames.com
 
67.213.218.61     egametrailer.com
67.213.218.61     frag-movies.com
67.213.218.61     gladlygames.com
67.213.218.61     mototestdrive.com
67.213.218.61     travelmoviesguide.com
67.213.218.63     recentcartoons.com
 
109.206.178.226   darlingcartoons.com
109.206.178.227   motobikeshow.com
109.206.178.228   shootermovies.com
109.206.178.229   traveldeluxeguide.com
109.206.178.230   vgametrailer.com
 
109.206.179.91    animeslashmanga.com
109.206.179.92    wildanimalearth.com
109.206.179.110   friendlyanimalvideos.com
109.206.179.112   monstertrucksvideos.com
109.206.179.113   popularmusicmovies.com
 
173.214.248.84    puzzleovergames.com
173.214.248.86    mysterypuzzlegames.com
173.214.248.87    senserhythmgames.com
173.214.248.88    madrhythmgames.com
173.214.248.89    worldtrendvideos.com
173.214.248.90    wunderwaffemechanism.com
 
184.22.217.2      victoryboardgames.com
184.22.217.3      dressuppartygames.com
184.22.217.4      smarteducationgames.com
 
184.82.130.179    dodancing.com
184.82.130.180    extremesportman.com
184.82.130.181    fightduel.com
184.82.130.182    fitnessvideoscentre.com
184.82.149.123    sportsgroundgames.com
184.82.149.124    actiongamesplace.com
 
198.7.56.67       drivingforcegames.com
198.7.56.68       shootingfiregames.com
198.7.56.71       fightingpowergames.com
198.7.56.72       globalstrategygames.com

Some example webpages are shown below, contrasting the content that is shown to bots with the content that is shown to humans. All the links on the bot-optimised pages are internal links within the spam website, so bot traffic cannot leave these spam websites other than by ultimately clicking on a display ad.

Links to the web content shown in the screenshots above are, respectively, as follows.

http://senserhythmgames.com/game/7ac2695362bb755f (Content for human visitors)
http://senserhythmgames.com/landing/?count=1 (Redirect URL to mark the visitor as a bot)

http://traveldeluxeguide.com/video/FhNw_sclgBE (Content for human visitors)
http://traveldeluxeguide.com/gw/?p=0 (Redirect URL to mark the visitor as a bot)

http://vgametrailer.com/movie/2U_qO_c3y2I (Content for human visitors)
http://vgametrailer.com/gw/?p=0 (Redirect URL to mark the visitor as a bot)

Concluding Thoughts

In our previous post we showed how easy it is to use malware to defraud display advertisers [2]. In this post we showed how cyber criminals are using malware to defraud display advertisers in practice.

In this post we showed how PCs infected with TDSS rootkits and with dll32.dll and dll64.dll payloads impersonate real website visitors across ad-supported websites. We showed how this fake traffic is being sold to publishers through the ClickIce ad exchange. We also showed how publishers are not just knowingly buying this fake traffic. They are in fact optimising their webpage layouts specifically for this fake traffic.

References

[1] Discovered: Botnet Costing Display Advertisers over Six Million Dollars per Month – spider.io
[2] How to Defraud Display Advertisers with Zeus – spider.io
[3] What’s Clicking What? Techniques and Innovations of Today’s Clickbots – Miller et al.
[4] TDL4 – Top Bot – Kaspersky
[5] TDSS – Kaspersky
[6] Threat Advisory: TDSS.rootkit – McAfee Labs
[7] TDSS botnet: full disclosure – No Bunkum
[8] Introduction to Computational Advertising: Targeting – Broder and Josifovski
[9] WinLister – Nir Sofer
[10] What Does A Display Advertising, Retargeting Campaign Look Like? – AdExchanger
[11] The Six Companies Fueling an Online Ad Crisis – AdWeek
[12] Companies battle to end ‘click fraud’ advertising – CTV News
[13] The Right Way to Measure Display Ad Viewability – spider.io

How to Defraud Display Advertisers with Zeus

In this post we show how easy it is to use Zeus malware to impersonate real website visitors and visit ad-supported websites of our choosing. Unscrupulous publishers are buying this type of malware-driven traffic today and they are then selling this traffic on to unsuspecting display advertisers [1]. Through better understanding, we believe that this problem can be solved.

We recorded activity generated by our custom Zeus malware. We have included this below.

Zeus is arguably the most infamous rootkit malware, originally developed for banking fraud and spread via drive-by-download [2,3,4,5]. It is infamous because the Zeus source code was leaked publicly in 2011, and a deeply disturbing, mushrooming crimeware ecosystem was born out of this leak [6]. Zeus has been reported as infecting at least 3.6 million PCs across the US [7].

The Zeus source code remains available on GitHub and has been widely studied [8,7,9]. In this post we show how easy it is to write and deploy a custom Zeus payload of fewer than 100 lines of C++ code to impersonate the unwitting owner of an infected PC whilst visiting ad-supported websites. The payload borrows the cookies of the PC owner whilst replaying real mouse traces and real click events across target webpages.

This post is a follow-on from an article featured by Wired, Online advertisers: unwittingly funding cybercriminals since 2011. The post also serves as a prelude to our next post, in which we will provide a detailed case study showing how publishers are buying this type of malware-driven website traffic today and display advertisers are being defrauded as a result.

Rootkits
Rootkits operate by sitting between a user application and the OS kernel system calls. They are able to hide themselves and perform malicious activities by providing a wrapper around the OS system calls and changing the responses.
There are two main methods of implementing rootkits: kernel-level and user-level. Kernel-level rootkits work by rewriting the kernel’s lookup table of system calls to point to wrapped versions. In this way the rootkit is able to hide all its files and processes from any user-level application such as Explorer or Task Manager. The rootkit’s version of any system call, for example FindFile, calls the kernel version and then filters the response to remove any of the rootkit’s files. User programs receive a list of files with all the rootkit’s files removed.
The problem with a kernel-level rootkit is that it is relatively easy to detect. This is because the kernel’s system call table is different from the original call table. To tackle this problem user-level rootkits like Zeus inject code into running processes and a thread is created within each process to run this code. The new thread hooks into the system calls to change the behaviour [10].
These threads allow Zeus to hide its files and processes, to read and write to network sockets, to inject data into webpages, to control the infected PC (shutdown, logout, run applications), to take screenshots, to take over the desktop, to access user accounts, to access user data, and to access data for known applications such as email, FTP, SSH, Internet Explorer and Firefox.
Perhaps the most powerful capability of Zeus is that it allows the command-and-control server to specify any script, program, command or URL to execute on the user’s machine. These “payloads” can send spam emails, mine bitcoins or defraud display advertisers. Because of this functional flexibility, PCs infected with Zeus are easily traded between botnet herders. Prices for PCs infected with Zeus have been reported as starting at $15 per 1,000 infected PCs [11].
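
The reason given earlier in this section for kernel-level rootkits being relatively easy to detect can be illustrated with a toy model. The following sketch treats a system-call table as a plain list of handler addresses and compares it against a trusted copy. It is a simplification for illustration only: the type and function names are our own, and real detection tools work against actual kernel structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A system-call table modelled as a list of handler addresses (illustrative).
using CallTable = std::vector<std::uintptr_t>;

// Returns the indices at which the live table no longer matches a trusted
// copy. Only the entries common to both tables are compared.
std::vector<std::size_t> findRewrittenEntries(const CallTable& trusted,
                                              const CallTable& live) {
    std::vector<std::size_t> rewritten;
    const std::size_t count = std::min(trusted.size(), live.size());
    for (std::size_t i = 0; i < count; ++i) {
        if (trusted[i] != live[i]) rewritten.push_back(i);
    }
    return rewritten;
}
```

A user-level rootkit of the kind described above leaves such a table untouched, which is why this style of comparison does not reveal it.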

 

A Custom Zeus Payload to Impersonate Real Website Visitors
We have shown previously that malware bots already routinely mimic human engagement with webpages and display ad creatives [12]. What has not yet been shown is how malware bots are able to build up the sort of cookie-based browsing history that encourages display advertisers to pay more to target these bots with ad creatives.

In this post we show that malware bots do not need to build up this sort of browsing history. They can simply borrow the cookies of the unwitting owners of infected PCs, and then wander the web impersonating these users. The ability to borrow the cookies of a PC owner enables a whole range of malicious activity, ranging from accessing the PC owner’s Facebook account to reading his/her emails. In this post we concern ourselves solely with the cookies used to target display advertising.

To help advertisers better understand the mechanics of malware-driven advertising fraud, we wrote our own Zeus payload, GhostVisitor. The GhostVisitor payload is a Windows executable that opens an invisible Internet Explorer window, borrows the cookies of the PC owner, visits target URLs sent from the command-and-control server, and replays real mouse traces and real click events across the target webpages. The whole payload is fewer than 100 lines of C++ code and can be deployed with a couple of clicks to all machines connected to our command-and-control server. The simplicity of the whole exercise has already led to the commoditisation of Zeus-powered traffic with an established black market for this type of traffic [13].

Deploying Our Custom Zeus Payload

The following diagram describes the process of deploying our payload from the command-and-control server:

The following is a screenshot of the main command-and-control dashboard showing one connected infected PC:

The following is a screenshot of the deployment command being sent:

Creating an Invisible Browser Window which Borrows Cookies

Our GhostVisitor payload creates an invisible Internet Explorer browser window sharing the existing Internet Explorer cookie store as follows:

/*
 * Open an invisible Internet Explorer browser window
 */
CoCreateInstance(CLSID_InternetExplorer, NULL, CLSCTX_LOCAL_SERVER,
    IID_IWebBrowser2, (void**)&browser_);

A list of target URLs is passed from the command-and-control server to the infected machine. All webpage visits initiated by the GhostVisitor payload use the PC’s existing Internet Explorer cookie store. This means that advertisers will associate the PC owner’s browsing history with our GhostVisitor bot: a history of web searches, completed purchases, successful logins and social network interactions. Advertisers target their ad creatives at these high-value cookies, even on “low-value” sites. And because of the money to be made, unscrupulous publishers buy this type of malware-driven traffic. They provide botnet herders with lists of target URLs and these target URLs are then fed via command-and-control servers to the infected machines. Publishers either buy the traffic directly from the botnet herders or they buy the traffic through chains of middlemen, at least one of which is typically a cheap PPC/PPV network [1].

Replaying Real Mouse Traces and Real Click Events

When a target webpage has been opened, the GhostVisitor payload replays previously recorded mouse traces and clicks, giving the appearance of a real website visitor engaging appropriately with ads, with a realistic click-through rate.

The mouse traces and click events are sent to the browser as follows:

/*
 * The Windows API PostMessage allows both mouse movements and
 * clicks to be posted to the browser window
 */
void GhostVisitor::mouseMove(DWORD x, DWORD y) {
    PostMessage(hwnd_, WM_MOUSEMOVE, 0, MAKELPARAM((short)x, (short)y));
}
void GhostVisitor::mouseClick(DWORD x, DWORD y) {
    PostMessage(hwnd_, WM_LBUTTONDOWN, 0, MAKELPARAM((short)x, (short)y));
    PostMessage(hwnd_, WM_LBUTTONUP, 0, MAKELPARAM((short)x, (short)y));
}
/*
 * Mouse traces recorded from real users can be replayed
 * by generating mouse movements and clicks at the appropriate
 * time and coordinates
 */
...
    // encoded mouse trace, consisting of moves and clicks
    // at specific co-ordinates and times
    int mouseTrace[] ={ ... };
...
    for(int i = 0; mouseTrace[i] >= 0; i+=4) {
        type = mouseTrace[i]; time = mouseTrace[i+1];  
        x = mouseTrace[i+2]; y = mouseTrace[i+3];
        if(time > clockTime) {
            Sleep(time-clockTime);
            clockTime = time;
        }
        if(type == MOUSE_MOVE)
            mouseMove(x, y);
        else if(type == MOUSE_CLICK)
            mouseClick(x, y);
    }
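
The flat trace encoding used in the listing above can be easier to follow once it is unpacked into records. The following is a minimal sketch of a decoder that only reads the data and performs no input injection. The numeric values of MOUSE_MOVE and MOUSE_CLICK, the struct and the function name are assumptions for illustration; times are taken to be milliseconds because the listing passes them to Sleep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Event type codes; the actual values are not given above (assumed here).
enum TraceType { MOUSE_MOVE = 0, MOUSE_CLICK = 1 };

// One decoded step of a recorded trace.
struct TraceEvent {
    int type;    // MOUSE_MOVE or MOUSE_CLICK
    int timeMs;  // offset from the start of the trace, assumed milliseconds
    int x;
    int y;
};

// Unpacks {type, time, x, y} quadruples until a negative type is reached
// or fewer than four values remain.
std::vector<TraceEvent> decodeTrace(const int* trace, std::size_t length) {
    assert(trace != nullptr || length == 0);
    std::vector<TraceEvent> events;
    for (std::size_t i = 0; i + 3 < length && trace[i] >= 0; i += 4) {
        events.push_back({trace[i], trace[i + 1], trace[i + 2], trace[i + 3]});
    }
    return events;
}
```

Unlike the loop in the listing, this decoder also stops at the end of the array, so a trace with a missing terminator does not cause an out-of-bounds read.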

Concluding Thoughts

In this post we have shown how easy it is to use Zeus malware to impersonate real website visitors and visit webpages of our choosing. With fewer than 100 lines of C++ code we have shown that we can generate traffic from residential IP addresses with real user cookies and with real mouse traces and click events.

In our next post we will provide a detailed case study showing how publishers are buying this type of malware-driven traffic today and display advertisers are being defrauded as a result. We believe that through better understanding advertisers will be able to better defend themselves against this type of fraud.

References

[1] Confessions of a Fake Web Traffic Buyer – Jack Marshall
[2] Zeus (Trojan horse) – Wikipedia
[3] ZeuS Banking Trojan Now In Rootkit Form – Paul Lubic, JR.
[4] A Botnet Primer for Display Advertisers – spider.io
[5] Display Advertisers: Funding Cybercriminals since 2011 – spider.io
[6] The Russian underground economy has democratised cybercrime – Ian Steadman
[7] On the Analysis of the Zeus Botnet Crimeware Toolkit – H. Binsalleeh et al.
[8] GitHub: Zeus – Visgean Skeloru
[9] Blackhat 2012 EUROPE – Workshop: Understanding Botnets By Building One – Ken Baylor
[10] What is Zeus? – James Wyke
[11] IAmA a malware coder and botnet operator, AMA – Anon
[12] Discovered: Botnet Costing Display Advertisers over Six Million Dollars per Month – spider.io
[13] Russian Underground 101 – Max Goncharov

Display Advertisers: Funding Cybercriminals since 2011

[Originally a guest post on Wired]

Before 2011 online advertising fraud was regarded as a solved problem. Then in 2011 a mushrooming botnet ecosystem was born that changed the requirements for preventing online advertising fraud. This ecosystem makes the traditional statistical approaches to preventing online advertising fraud increasingly futile. The ecosystem was born out of the leaked source code of arguably the most infamous botnet malware, Zeus. Display advertisers are inadvertently funding this botnet ecosystem today. And the more display advertisers continue to fund this ecosystem, the more difficult it becomes to prevent online advertising fraud.

Before 2011 online advertising fraud—particularly fraud targeting online PPC advertising—was regarded as a solved problem, or at least a controllable problem. Best practices had been established and processes were in place. Let’s consider how this came to be.

2004 was the auspicious year of Google’s IPO. This was not just the first major technology IPO after the dot-com bubble burst. It was also the biggest technology IPO.

Despite the excitement over Google’s IPO, analysts at the time expressed reservations about Google’s ability to prevent advertising fraud. These reservations were addressed explicitly in Google’s SEC filing: “If we fail to detect click-through fraud, we could lose the confidence of our advertisers, thereby causing our business to suffer. We are exposed to the risk of fraudulent clicks on our ads by persons seeking to increase the advertising fees paid to our Google Network members. We have regularly refunded revenue that our advertisers have paid to us and that was later attributed to click-through fraud, and we expect to do so in the future.”

Google’s CFO, George Reyes, also confessed shortly after the IPO: “I think something has to be done about [click fraud] really, really quickly, because I think, potentially, it threatens our business model.”

Google’s concerns centred initially on automated click fraud, like Michael Bradley’s Google Clique. Then click farms became the pressing concern for Google.

Google acted in earnest on these concerns. A team was put together, headed up by Shuman Ghosemajumder, processes were established, and Google built out what continues to be regarded as the best fraud-prevention framework for online advertising. Despite Google’s efforts, however, Google, Yahoo! and nine other search engines still got caught up in a famous class action lawsuit, Lane’s Gifts and Collectibles. This happened in 2005. The lawsuit is famous because of the size of the settlement fund that was created by Google in 2006: $90m. The lawsuit is arguably more famous, though, because of the report prepared by the expert witness, Professor Alexander Tuzhilin, in which he set out what the industry should regard as best practice for identifying and preventing click fraud. Google posted Professor Tuzhilin’s report online, and the report has remained up ever since. For many in the industry this document continues to serve as the foundation for providing confidence to PPC advertisers.

2011, the Year Zeus Threw Lightning Bolts
In 2011 the landscape for online advertising fraud changed entirely. It was a watershed year.

The change was started by two infamous source-code leaks. First the source code of arguably the most infamous botnet malware, Zeus, was leaked. Soon afterwards the source code of SpyEye was also leaked. These two source-code leaks “democratised cybercrime.”

Before 2011 malware creators used their creations for their own ends—for credit-card fraud, banking fraud, email spam and denial-of-service attacks. When the source code of Zeus and SpyEye was leaked the malware creators became crimeware vendors, enabling less technically savvy criminals to use these diabolical creations and exploit hijacked PCs, phones and tablets across the world.

Today the uninitiated criminal can set up a botnet for US$595, and “you don’t need to know the first thing about coding.” The rewards for doing so are also increasingly appealing. As confessed last year by the operator of a Zeus botnet of ~10,000 hijacked machines: “Today cybercrime is already more profitable than drug dealing and it will grow even further.”

As this botnet ecosystem continues to grow in sophistication, so the traditional approach to preventing online advertising fraud becomes increasingly obsolete.

Before 2011 there were a limited number of IP addresses from which one could generate fake traffic—either fully automated fake traffic or proxied click-farm fake traffic. This made fraud detection an easier problem. With enough data one could use statistical methods to pick out the small number of IP addresses that exhibit anomalous traffic patterns.

Today’s mushrooming botnet ecosystem changes the requirements for preventing online advertising fraud. Earlier this year the proportion of PCs that have been hijacked, in the US and also worldwide, was reported as being over 30%. According to our numbers, display and video ad impressions are currently being served to hijacked PCs spanning more than 15% of the US IP addresses we monitor. These hijacked machines are predominantly on residential IP addresses, and the fake traffic from the hijacked machines is interleaved with legitimate traffic generated by the unwitting owners of the hijacked machines. Analysing the ad impressions being bought across entire ad exchanges/networks we have found that the median number of ad requests per month from a single cookie associated with a hijacked PC is typically between 1 and 2. The median number of ad requests per month from an IP address on which there is a hijacked PC is typically between 8 and 32.

Because the generation of fake website traffic can be distributed over millions of hijacked machines across the US, where these machines are on residential IP addresses and the machines are also being used by their owners to surf the web legitimately (with the same cookies potentially being used to generate both fake and legitimate traffic), it is becoming increasingly futile to apply coarse-grained statistical methods to try to pick out traffic anomalies.
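
The point can be illustrated with a toy calculation. The following is a minimal sketch of a coarse-grained volume rule of the kind that worked when fake traffic came from a small number of IP addresses. The function names are illustrative and the threshold is entirely hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty list of monthly ad-request counts
// (the mean of the two middle values when the list has even length).
double medianCount(std::vector<int> counts) {
    assert(!counts.empty());
    std::sort(counts.begin(), counts.end());
    const std::size_t n = counts.size();
    if (n % 2 == 1) return counts[n / 2];
    return (counts[n / 2 - 1] + counts[n / 2]) / 2.0;
}

// Coarse-grained rule: count the sources whose monthly ad-request
// volume exceeds a fixed threshold.
std::size_t countAboveThreshold(const std::vector<int>& counts, int threshold) {
    return static_cast<std::size_t>(
        std::count_if(counts.begin(), counts.end(),
                      [threshold](int c) { return c > threshold; }));
}
```

With illustrative per-IP counts in the range quoted above, say 8, 12, 16, 24 and 32 requests per month, the median is 16 and a rule that flags IP addresses with more than 1,000 requests per month flags none of them.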

These coarse-grained statistical methods are particularly vulnerable when applied to protect display advertisers because there are so many publishers selling display ad inventory who are financially incentivised to buy dubious traffic. Contrast this with PPC advertising. Over 80% of current search PPC spend in the US is through Google’s PPC ad network and the lion’s share of this spend—at least 70%—goes toward PPC ads which are shown on websites owned by Google. Whilst the incentive for publishers to buy traffic is so strong that some publishers buy almost all their traffic from dubious traffic sources, many publishers simply top up their traffic to satisfy direct-sales agreements with display advertisers. For example, a whistleblower came forward over the summer to show that traffic from the Chameleon botnet was inadvertently being bought by three of the web’s most high profile publishers: “[redacted] was ordering traffic to fulfill on their inventory to advertisers and were getting this traffic from [redacted]… [redacted] found evidence of bots on their site and fired [redacted]… [redacted], [redacted] are two major publishers that have consumed this type of audience from [redacted].” Coarse-grained methods are at their most vulnerable when applied to publishers who simply top up their traffic with purchased botnet traffic.

Funding Cybercriminals
Botnets cost both advertisers and premium publishers money in the short term. Advertisers are being defrauded, and because this defrauding is not transparent, advertisers cannot simply price it in. Advertising spend is also being diverted away from premium publishers to unscrupulous publishers. This makes it harder for display advertising, as a channel, to compete with other types of advertising.

Perhaps the most troubling aspect, however, is that advertisers are inadvertently funding the cybercriminals who are creating and operating the botnet malware—cybercriminals who have been identified as hosting, for example, child pornography and phishing attacks. And the more money these cybercriminals earn from their malware, the more resources they throw at developing the botnet malware. This is an arms race and without appropriate action the future will be markedly bleaker than the present.

If this malware continues to be funded and continues to grow, the implications extend far beyond display advertising. The current plight of display advertisers is part of a much larger problem with increasingly sophisticated botnet malware. The severity of this escalating problem led the White House last year to announce “new initiatives to combat botnets – a collection of computers whose security is compromised by attackers – which are believed to pose one of the biggest risks to Internet security.” The Obama Administration created the National Cyber Investigative Joint Task Force (NCIJTF) to provide a framework for these initiatives. “The National Cyber Investigative Joint Task Force is a comprehensive public/private effort [including the DOJ, FBI, NSA and U.S. Secret Service] engineered to eliminate the most significant botnets jeopardizing U.S. interests by targeting the criminal coders who create them.”

Concluding Thoughts
Display and video advertisers are today inadvertently funding the criminals who are developing increasingly sophisticated botnet malware. The more advertisers fund these criminals, the more difficult it becomes to prevent online advertising fraud. Online advertising is just one application of this botnet malware. By funding these criminals display advertisers are also improving the tools being used to commit other cybercrime.

Securing the Legitimacy of Display Ad Inventory

There is now a strengthening industry resolve to remedy systemic failures across the display advertising ecosystem and to improve the quality of the ad inventory being traded. This is being led, in particular, by the IAB’s Traffic of Good Intent Task Force.

Much needs to be done. Members of the Task Force are starting to explore the problems facing the industry, the potential solutions, industry education and the implications for international law enforcement.

In this post we would like to make suggestions to facilitate the efforts of the Task Force, in particular, and the display advertising ecosystem, more generally.

An Advertising Security Mailing List
Whilst it is encouraging that industry leaders have joined the Task Force and want to take responsibility for securing the legitimacy of the ad inventory being traded, we believe that anyone should be able to contribute to the conversations shaping the future of the industry, and that everyone should have access to any information that is provided.

The Bugtraq Mailing List provides an illustrative example of how we believe this should be done. Bugtraq is regarded as the leading general security mailing list. It is typically where information security vulnerabilities are first announced.

If a similar advertising security mailing list were created, then everyone in the ecosystem would be able to review and debate how best to secure the legitimacy of the ad inventory being traded. Everyone would also be able to disclose specific vulnerability details (like, say, this or this), in much the same way that today’s information security researchers disclose specific vulnerabilities through Bugtraq.

If vulnerability details are disclosed—whether through some advertising security mailing list or otherwise—we suggest that the disclosed details are always specific. The credibility of our collective efforts to improve inventory quality is at stake. So, as an industry we need to be vigilant for sweeping, unsubstantiated generalisations. We suggest also that all details are disclosed with due care. We believe strongly that names should not be given without culpability being clear.

No More Disingenuous Scaremongering
It is disappointing to have to call out a peer, but last week provided a frustrating example of disingenuous scaremongering. This sort of announcement to the industry cannot continue—no matter the choice of medium. If the industry continues to be fed these sorts of untruths, the true facts will almost certainly be lost. We will collectively be like The Boy Who Cried Wolf.

Let’s consider the details. The company that issued the industry announcement is a captcha company. It offers two captcha products for the web. The first of these products, called a CAPTCHA TYPE-IN™ ad, is used across websites today to protect against automated submissions and spam. Instead of showing users traditional captchas, the captcha company shows a proprietary ad format in which there is a text box and the user needs to type in an answer to some question related to the ad. This is not a traditional online display/video ad. It is a form of monetisable captcha. It is served when one would serve a captcha (say, to protect against a spam comment)—not when one would serve a traditional ad. And engagement is as one would expect of a captcha—engagement is not indicative of engagement with a traditional ad.

The second of the company’s web-focused captcha products is called a Pre-Roll Video TYPE-IN™ ad. It provides someone who wants to watch an online video the option of skipping the associated pre-roll ad by typing the advertiser’s brand message into a text box. This product aims to increase human engagement with ads. It is not a way to protect video ads from bot activity. If a user does not enter anything in the text box of a Pre-Roll Video TYPE-IN ad, then this does not mean that the user is automated.

Given the nature of these two products, it seems more than just a little disingenuous of the company to have claimed that “bot traffic patterns remained consistent in a range of 24% to 29% for web advertising” when what the company actually meant was that “bot traffic patterns remained consistent in a range of 24% to 29% for [two types of TYPE-IN ad].” CAPTCHA TYPE-IN ads are not at all like traditional display/video ads; and Pre-Roll Video TYPE-IN ads do not protect against bots. So why did the company make their bold claim? To make matters even more perturbing, the company then extrapolated in some inexplicable way from a sample of 1.4 billion served captchas over two quarters this year and made a claim about the whole of the online display advertising industry: “the global digital advertising industry is on pace to waste up to $9.5 billion in 2013 advertising to bots.” Why make their claim even bolder?

No More Talk of Suspicious Traffic
Whilst last week’s release provides an extreme example, this is part of a broader industry affliction that we believe needs to be addressed urgently. Ambiguous language/terminology is being used widely to describe the industry’s problems with illegitimate ad impressions—both by the demand side and by the supply side—and this makes solving these problems difficult. Whilst we do not doubt that constructive and interesting work is being done by the various companies, we believe that collective efforts across the ecosystem are being undermined by persistent use of imprecise terminology. This is particularly important because the incentives of the demand side run wholly counter to the incentives of the supply side, and if ambiguous language continues to be used the two sides will never agree.

What should we read into each new set of statistics on suspicious ad traffic? Do the demand side and supply side have the same definition of suspicious traffic? Should we understand that suspicious traffic is not traffic of good intent, or that it is non-intentional traffic, or that it is fraudulent traffic, or even that suspicious traffic is botnet traffic?

Let us consider some particular examples. When ad requests are made via proxied security services like ScanSafe, do different companies label these as suspicious requests? When the TalkTalk parental control and malware checker masquerades as a human visitor to websites, do different companies class these as suspicious? Many malware checkers masquerade as legitimate surfers to search for exploit kits embedded within websites, because if they did not do this in a clandestine way, then nefarious website owners would serve different content to the malware checkers than they serve to legitimate website visitors. When ads are inadvertently served to auction bots on eBay, do different companies label this as fraud, noting that the motive for using auction bots is wholly unrelated to advertising? When ads are served on webpages that have been prerendered by Chrome browsers, do different companies label this as suspicious? Certainly ads should not be served on prerendered webpages. But surely “suspicious” is not the right term. This seems more akin to when ads are mistakenly served to Google’s Web Preview bot. Illegitimate? Yes. Suspicious? No.

A Fine-Grained Taxonomy for Illegitimate Ad Requests
Our contention is that if inventory quality is to improve to the extent that it can and should, then it is imperative that the whole industry adopts more fine-grained terminology. If the demand side and the supply side are to engage in suitably constructive dialogue, then both sides need to be talking specifically about the same thing. Furthermore, if fine-grained categories are established then best-in-class solutions can be applied to each fine-grained category of problem. In today’s ambiguous world there is arguably more incentive to provide an inferior catch-all solution than a best-in-class targeted solution.

To this end, we would like to propose for industry consideration an initial taxonomy for illegitimate ad requests. These are the nine labels we currently apply to illegitimate ad requests:

(* PC/Tablet/Phone)

This taxonomy may not be complete. This taxonomy may also not yet be granular enough. We welcome suggestions from across the ecosystem.

In particular, we anticipate brand-safety specialists wanting to add site-categorisation labels to the illegitimate-request taxonomy: like webpage with hate speech and file-sharing website. However, as we are not a brand-safety company (we don’t crawl websites or analyse website content), we will not be so bold here as to propose any brand-safety labels.

We look forward to hearing your suggestions.

Sambreel is Still Injecting Ads. Video Advertisers, Beware.

Background: Sambreel’s AdWare was Ostensibly Shut Down Last Year.
On December 9, 2011, the Wall Street Journal called Sambreel out for illegitimately injecting ads into Facebook and Google webpages via adware browser plugins like PageRage and BuzzDock.

Facebook subsequently blocked its users from using Sambreel’s adware browser plugins whilst accessing Facebook webpages. Sambreel responded by suing Facebook, claiming violations of Sections 1 and 2 of the Sherman Antitrust Act. The case was thrown out of court.

With Sambreel’s adware publicly exposed, major sell-side platforms and ad exchanges like PubMatic, Rubicon Project, and OpenX dropped Sambreel as a supplier of display ad inventory in 2012.

Sambreel is Alive and Kicking.
Sambreel has two more plugins, Easy YouTube Video Downloader plugin and Best Video Downloader. These plugins are part of a software browser tool suite provided by Yontoo and Alactro. Yontoo and Alactro have been identified as subsidiaries of Sambreel.

When a user who has installed these plugins visits youtube.com, multiple display ad slots are injected across the YouTube homepage, channel pages, video pages and search results pages. These display ad slots are being bought today by premium advertisers like Amazon Local, American Airlines, AT&T, BlackBerry, Cadillac, Domino’s, Ford, Kellogg’s, Marriott, Norton, Toyota, Sprint, Walgreens and Western Union. Screenshots are shown below.

The display ad slots injected by Sambreel are also being bought today by malvertisers—advertisers who provide malicious or malware-laden advertisements with a view to spreading malware to new users. Example screenshots are included below. The first screenshot shows a fake alert, which suggests to the user that a Java update is required. If the user clicks the OK button, then the user is taken to the disreputable site shown in the second screenshot.

This sort of malvertising would be unlikely to impact YouTube users without Sambreel’s involvement. Google has strict ad-quality processes, and Sambreel’s plugins bypass these.

How are the Ads Injected?
Sambreel’s plugins inject ads by adding iframe elements to the page, hosted on the c.ztstatic.com domain:

  var ad = document.createElement('iframe');
  ad.setAttribute("width", "300");
  ad.setAttribute("height", "250");
  ad.setAttribute("border", "0");
  ad.setAttribute("scrolling", "no");
  ad.setAttribute("src", "http://c.ztstatic.com/300x250_easyyoutube2.htm" + "?t=" + time + "&id=" + this.uuid);
  adContainer.appendChild(ad);
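
Because the injected slots are ordinary iframes pointing at a known host, one simple way to spot them is to compare each iframe's source host against a watch list. The following is a minimal sketch of that idea, not a description of our own detection method. The function name `findSuspectIframeSrcs` and the sample URLs other than the c.ztstatic.com one are hypothetical.

```javascript
// Hypothetical helper: return the iframe URLs whose host is on a watch list.
function findSuspectIframeSrcs(srcs, suspectHosts) {
  const hosts = new Set(suspectHosts.map((h) => h.toLowerCase()));
  return srcs.filter((src) => {
    try {
      // URL parsing normalises the hostname to lower case.
      return hosts.has(new URL(src).hostname);
    } catch (e) {
      return false; // relative or malformed URL: treated as no match
    }
  });
}

// In a browser the input could come from the live DOM, for example:
//   Array.from(document.querySelectorAll('iframe'), (f) => f.src)
const flagged = findSuspectIframeSrcs(
  [
    'http://c.ztstatic.com/300x250_easyyoutube2.htm?t=120000&id=abc',
    'https://www.youtube.com/embed/xyz',
    'not a url',
  ],
  ['c.ztstatic.com']
);
console.log(flagged);
```

A host watch list only catches known domains, so a check like this would need to be kept up to date as the hosting domain changes.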

From within these iframes, the available ad slot is passed on to AdMatter. AdMatter describes itself as a “targeted ad cloud”. The AdMatter homepage is titled Sambreel and contains no content. This has been the case since October 2012, which is when Sambreel was reported as having been blocked by the sell-side platforms and the ad exchanges. The AdMatter DNS is hosted on a Sambreel server.

To request an ad tag, AdMatter calls out to service.amasvc.com, which resolves to an IP address in a range assigned to Sambreel Services, LLC. This leads to a tag being included through which the ad slot can be sold through one of a range of major display ad networks and display ad exchanges.

When the ad slot is passed on to the major ad networks and display ad exchanges, YouTube is listed as the domain on which the ad slot has been made available. Indications are that jeetyetmedia.com, pluralmediallc.com and redfordmediallc.com are being reported as supplying the ad slot rather than Sambreel. All three of these domain names are currently protected with whois privacy, but were previously registered in the name of Arie Trouw, founder of Sambreel: bit.ly/1ey50QO; bit.ly/15Xxmoq; bit.ly/17cW94o.

Each ad slot is reloaded every 120 seconds:

  var time = 120000;
  setTimeout(function () { window.location.reload(false); }, time);
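
To give a feel for what a 120-second reload means for inventory volume, the sketch below estimates how many times a single injected slot is loaded while a page stays open. It assumes the timer is re-armed on every reload, which is what we would expect if the snippet runs each time the iframe document loads. The function name `loadsPerSlot` and the ten-minute session length are illustrative assumptions.

```javascript
// Assumption: the timer is re-armed on every reload, so loads repeat
// at a fixed interval for as long as the page stays open.
function loadsPerSlot(sessionMs, reloadMs) {
  if (reloadMs <= 0) throw new RangeError('reloadMs must be positive');
  return 1 + Math.floor(sessionMs / reloadMs); // initial load + reloads
}

const RELOAD_MS = 120000; // reload interval from the snippet above
const tenMinutes = 10 * 60 * 1000; // hypothetical session length
console.log(loadsPerSlot(tenMinutes, RELOAD_MS));
```

Under these assumptions a single slot left open for ten minutes is loaded up to six times, and each page carries multiple injected slots.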

Video Advertisers, Beware.
Previously Sambreel’s adware only affected display advertisers. This is no longer the case. On some of the smaller video ad exchanges over 15% of the ad slots sold to video advertisers have been injected by Sambreel into YouTube. This happens as follows.

Video advertisers typically pay an order of magnitude more for each ad slot than display advertisers pay. Spotting the opportunity for arbitrage, some publishers have started to buy display ad slots from leading display ad exchanges. The publishers then immediately pass these purchased display ad slots on to leading video ad exchanges for sale to video advertisers—as though the ad slots are video ad slots.

Unfortunately this rather unscrupulous form of arbitrage has meant that the ad slots injected by Sambreel adware into YouTube are now being sold to video advertisers. We measured the extent to which these Sambreel ad slots are being bought by video advertisers by analysing close to a billion video ad slots sold through non-Google video ad exchanges. We identified over 3.5 million installations of Sambreel’s YouTube-focused adware plugins. Just over 15% of the analysed video ad slots were Sambreel ad slots injected into YouTube. Example screenshots are shown below.