A couple of weeks ago I was looking over geographic location data to inform the return on investment of an advertising campaign. A significant portion of the metropolitan data showed a large percentage of users sitting in the exact same location. It was as if they had all checked in on their apps while standing at the center of the city. That seemed unlikely. Was there something wrong with the data? Well, yes and no. More importantly: was it fraud?
Was it fraud? Welcome to the gray area of ad-tech. eMarketer shows that display advertising fraud loss is hovering around $6 billion worldwide. They likely derived this figure from very black-and-white definitions of what is and what isn’t fraud. In this post I’ll be lining up not only explicitly fraudulent behavior, but also some marginal tactics of media owners, supply-side companies, advertisers and demand-side companies to illuminate the fringes of acceptable behavior.
Read on for geo stuffing, bid caching and more…
I was trying to figure out what to do with my Sunday. My options were: build a little header bidding ad server plugin for WordPress; run, sleep and eat; or write up some blog post on a pacing algorithm, because people still seem to be producing crappy ones. Since you’re reading this, you can probably guess which choice I made. I mean, it’s not the first post I’ve written on the subject.
It showed up again last week. I didn’t expect it, but I guess I never do. A saw-tooth pattern on a chart, indicative of a capping of sorts. A chart that says, “I want a thing to happen, but only so much.” In this case it was a traffic allocation. This was a surprise.
A little (bad pacing algorithm) history
Most of the time when I run into a bad pacing algorithm it’s in the form of a campaign trying to limit itself. It only needs to acquire a few thousand impressions every five minutes, for example. So the hastily written algorithm might divvy up the impression allocation into five-minute buckets. Effectively that’s 12 buckets every hour. So it takes an hour’s worth of impression needs and divides it by twelve. One-twelfth of the impressions are purchased every five minutes. Unfortunately, at that point it switches to a simple counter that says, “for the next five minutes, buy impressions until the number purchased reaches 1/12th of what I need in this hour.”
You end up with a purchase graph that looks like this.
See that blue spiky thing? That’s the one that’ll get ya. Read on to find out how this impacts the industry and how to fix it.
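To make the saw-tooth concrete, here is a minimal Python sketch of the five-minute counter described above. The function name, the 12,000-impression hourly goal and the flat 600 impressions per minute of available supply are all made-up numbers for illustration, not figures from any real campaign.

```python
def simulate_bucket_pacing(hourly_goal, available_per_minute,
                           bucket_minutes=5, minutes=60):
    """Return impressions bought per minute under a simple per-bucket counter.

    All parameters are hypothetical; supply is assumed flat for simplicity.
    """
    buckets_per_hour = 60 // bucket_minutes
    bucket_cap = hourly_goal // buckets_per_hour  # 1/12th of the hour's need
    purchases = []
    bought_in_bucket = 0
    for minute in range(minutes):
        if minute % bucket_minutes == 0:
            bought_in_bucket = 0  # counter resets at the start of each bucket
        # Buy everything available until the bucket's cap is reached.
        buy = min(available_per_minute, bucket_cap - bought_in_bucket)
        bought_in_bucket += buy
        purchases.append(buy)
    return purchases


purchases = simulate_bucket_pacing(hourly_goal=12000, available_per_minute=600)
print(purchases[:10])  # [600, 400, 0, 0, 0, 600, 400, 0, 0, 0]
```

With these numbers the campaign hits its hourly goal, but it does so by buying hard for the first two minutes of every bucket and then going dark for the remaining three. Plot that and you get the spikes.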
…and do others see this jump at midnight in their own timezone?
This question was asked on Quora.com. Below is my answer.
I dug into this win price problem several months ago after noticing the same jump in spend at that hour. Rubicon is on Pacific time so we refer to this as the “9 O’Clock Bump” effect.
Dr. Richter pointing at a Dodo bird. “Adapt or perish”
After asking several DSPs about the problem we determined that it was, indeed, campaign budgets resetting combined with less-than-optimal pacing algorithms and, in some cases, no pacing algorithm at all.
We’re in the process of finishing up some documentation on our pacing algorithm, which does a pretty good job of pacing to the needs of the campaign while considering the fairly predictable traffic pattern throughout the day. We’ll be putting this information out in the next couple of weeks. Hopefully it will inspire some folks in the market to upgrade their systems and resolve some of this win price inefficiency. I’ll update this post with a link to the document once we release it.
UPDATE: The document is finally out the door. You can read it here.
My post was originally published on the Rubicon Project blog. It was written with contributions from Dr. Neal Richter and Jonathan Zhuang.
Pacing algorithms come in a few basic forms at the Rubicon Project. The most basic is one called “as fast as possible,” which can hardly be said to do anything that resembles pacing. A pacing controller is supposed to spread out the impressions served for a campaign throughout the day. In general it takes the number of impressions to serve in a given day as input and calculates a serving schedule for the campaign.
A naive pacing algorithm will break the day up into 24 segments; let’s call them hours. It will allocate an equal number of campaign impressions to each hour and then recommend the campaign get impressions until the hour’s allocation is exhausted. This algorithm has a couple of pretty big flaws. The first is that it tends to serve the campaign at the beginning of each hour and then, once the allocated impressions are used up, it stops serving until the next hour. The second is that it has no understanding of the traffic distribution throughout the day. The 1AM hour doesn’t have the same amount of traffic coming in as the 10AM hour. If the inventory is relatively scarce, the campaign will under-serve during the early hours, catch up during the peak hours, and then under-serve again in the later hours. Ultimately, campaigns using this algorithm may not achieve their goals by the end of the day, and they won’t serve evenly within a given hour.
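The second flaw is easy to see with a few lines of Python. This is only a sketch under stated assumptions: the traffic curve is invented, the function names are hypothetical, and the naive version here does not carry unspent allocation into later hours. The traffic-weighted split is shown purely for contrast and is not the Rubicon Project algorithm.

```python
def naive_hourly_delivery(daily_goal, hourly_traffic):
    """Equal allocation per hour; each hour delivers at most what traffic allows."""
    per_hour = daily_goal / len(hourly_traffic)
    return [min(per_hour, traffic) for traffic in hourly_traffic]


def traffic_weighted_delivery(daily_goal, hourly_traffic):
    """Allocation proportional to each hour's share of (assumed known) traffic."""
    total = sum(hourly_traffic)
    return [min(traffic, daily_goal * traffic / total)
            for traffic in hourly_traffic]


# Hypothetical available impressions per hour: quiet overnight, busy
# midday, tapering in the evening.
hourly_traffic = [400] * 6 + [1500] * 12 + [800] * 6
daily_goal = 24000

naive_total = sum(naive_hourly_delivery(daily_goal, hourly_traffic))
weighted_total = sum(traffic_weighted_delivery(daily_goal, hourly_traffic))
print(round(naive_total), round(weighted_total))
```

With this made-up curve the naive split asks for 1,000 impressions in hours that only have 400 or 800 available, so it ends the day short of the goal even though the day as a whole had enough inventory. Weighting each hour by its expected traffic avoids that, assuming the traffic forecast is reasonably accurate.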