
How to identify and filter spam referral traffic (Part 2 of 3)

Written by Ryan Brooks | December 14, 2015 at 7:17 PM

Part one covered how to identify spam referral sources within Google Analytics. This part focuses on filtering them out of future data. Historical data can’t be changed, so a filter only applies to data collected after it has been put into place. That doesn’t mean it’s impossible to get clean numbers out of a spam-ridden referral section, though.

Filtering data that’s already been collected in Google Analytics

Back in the referrals area of Google Analytics, the data table shows totals and averages for all the referral sources collected, including the junk ones. To remove those sources from the totals and averages for reporting purposes, first have a list ready of the sources you don’t want counted. Once that’s compiled, locate the search bar above the table (below the graph); to its right is a link called “advanced”. Click it and a drop-down will appear.

Here, you can choose what information is included in or excluded from the data table. Since we want to exclude sources, use the drop-down to select Exclude, leave Source and Containing as they are, then paste a domain from your list into the empty text box. If you have more than one, click + Add a dimension or metric and select Dimensions, then Source. You can repeat this as many times as you need.

Once every junk source is set to be excluded, click Apply and the data will update. Keep in mind that if you navigate away from this page, you’ll need to re-apply this advanced filter to see the cleaned data again (yes, it’s as tedious as it sounds).
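To see what the advanced exclusion does to the report totals, here is a minimal Python sketch. The `ReferralRow` type, the sample rows, and the helper names are hypothetical; matching is a plain case-insensitive substring check, which is a simplifying assumption about how “Containing” behaves, and the session-weighted bounce rate is just one illustration of how a blended average shifts once junk rows are dropped.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    # Hypothetical stand-in for one row of the referrals table
    source: str
    sessions: int
    bounce_rate: float  # fraction between 0.0 and 1.0


def exclude_sources(rows, junk_domains):
    """Keep only rows whose source contains none of the junk domains.

    Mirrors stacking several "Exclude / Source / Containing" conditions:
    a row is dropped if it matches any one of them.
    """
    junk = [d.lower() for d in junk_domains]
    return [r for r in rows if not any(d in r.source.lower() for d in junk)]


def summarize(rows):
    """Return (total sessions, session-weighted average bounce rate)."""
    total = sum(r.sessions for r in rows)
    if total == 0:
        return 0, 0.0
    avg_bounce = sum(r.sessions * r.bounce_rate for r in rows) / total
    return total, avg_bounce


rows = [
    ReferralRow("example-partner.com", 120, 0.40),
    ReferralRow("semalt.com", 300, 1.00),
    ReferralRow("www.social-buttons.com", 80, 1.00),
]
print("before:", summarize(rows))
clean = exclude_sources(rows, ["semalt.com", "social-buttons.com"])
print("after:", summarize(clean))
```

In this made-up example, the total drops from 500 sessions to 120 and the average bounce rate falls sharply, which is exactly the distortion the advanced filter removes from a report.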

Preventing spam referral sources from contaminating more data

Prevention is the most important part of purifying data. A filter applies to all data collected after it has been put in place.

Take your list of spam referral sources and go to the Admin section from the main navigation in Google Analytics. Under the VIEW column, there should be a Filters option. Click it, and either an empty list or a populated list of filters will appear. There should be a red + Add Filter button. If you don’t see it, you lack the permissions needed to create filters, and you’ll need to contact whoever owns the account.

Once you click the Add Filter button, you’ll be prompted with a series of options. Start by giving the filter a name. I use “Spam Filter #1”, but use whatever naming convention is most comfortable. You may end up needing more than one spam filter because of a character limit later in the process, so including a number is recommended.

Set the filter type to Custom; the Exclude radio button should be selected by default. Under the Filter Field drop-down, select Campaign Source. Under Filter Pattern, paste in the first referral domain you want to exclude. Separate each subsequent domain with a “|” (without the quotes). The pattern is a regular expression, and “|” means “or”.
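Because the filter pattern is a regular expression, the “|”-separated list behaves like “match any of these”. A minimal Python sketch of how an Exclude filter on such a pattern treats incoming sources follows; the helper names are hypothetical, and case-insensitive matching is an assumption for the sketch, not a description of Google’s implementation.

```python
import re


def build_filter_pattern(domains):
    """Join domains with '|', the regex alternation operator."""
    return "|".join(domains)


def is_excluded(source, pattern):
    """An Exclude filter discards a hit when the pattern matches anywhere in its source."""
    # Case-insensitive matching is an assumption for this sketch.
    return re.search(pattern, source, re.IGNORECASE) is not None


pattern = build_filter_pattern(["social-buttons.com", "semalt.com", "darodar.com"])
for source in ["semalt.com", "www.darodar.com", "google.com"]:
    print(source, "excluded" if is_excluded(source, pattern) else "kept")
```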

There’s a section called Filter Verification with a link called Verify this filter. How useful it is depends on the sample size of the data being affected, including total sessions to the website and total sessions for each referral source. If either number is too small, verification will report an error, even though the filter will still work properly.

Here’s an example of what my spam filter looks like:

social-buttons.com|semalt.com|buttons-for-website.com|4webmasters.org|darodar.com|simple-share-buttons.com|adviceforum.info|ranksonic.info|makemoneyonline.com|best-seo-solution.com|hulfingtonpost.com|7makemoneyonline.com|best-seo-offer.com

Notice none of the entries include the “www” part of the URL. This is to make sure that no source has to be entered twice. “social-buttons.com” and “www.social-buttons.com” are both filtered by “social-buttons.com” because they both contain a matching series of characters.
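The “contains” behavior comes from the pattern matching anywhere in the source string. One side effect worth knowing: in a regular expression an unescaped “.” matches any character, so “social-buttons.com” would also match an odd source like “social-buttonsXcom”. The Python sketch below illustrates both; escaping with `re.escape` is an optional tightening, not part of the filter shown above, and each added backslash counts toward the 255-character limit.

```python
import re


def matches(pattern, source):
    """True if the regex pattern is found anywhere in the source (case-insensitive by assumption)."""
    return re.search(pattern, source, re.IGNORECASE) is not None


raw = "social-buttons.com"  # as written in the filter above
escaped = re.escape(raw)    # dots escaped so they only match a literal '.'

for source in ["social-buttons.com", "www.social-buttons.com", "social-buttonsXcom"]:
    print(f"{source:24} raw={matches(raw, source)} escaped={matches(escaped, source)}")
```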

This field is where you’ll need to watch Google Analytics’ character limit of 255 characters. If you try to save the filter and an error message says “must be at most 255 characters”, remove domains until the pattern is 255 characters or fewer.

If necessary, repeat this process with another filter, perhaps called “Spam Filter #2”, and keep adding domains.
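Splitting a long domain list across “Spam Filter #1”, “#2”, and so on is a small packing problem. Here is a minimal Python sketch that greedily fills each pattern up to the 255-character limit, counting the “|” separators; the function name is hypothetical, and the sample list is the one from the filter above.

```python
MAX_PATTERN_LENGTH = 255  # Filter Pattern limit described above

DOMAINS = [
    "social-buttons.com", "semalt.com", "buttons-for-website.com",
    "4webmasters.org", "darodar.com", "simple-share-buttons.com",
    "adviceforum.info", "ranksonic.info", "makemoneyonline.com",
    "best-seo-solution.com", "hulfingtonpost.com", "7makemoneyonline.com",
    "best-seo-offer.com",
]


def split_into_filters(domains, limit=MAX_PATTERN_LENGTH):
    """Greedily pack domains into '|'-joined patterns of at most `limit` characters."""
    patterns = []
    current, current_len = [], 0
    for domain in domains:
        if len(domain) > limit:
            raise ValueError(f"{domain!r} alone exceeds {limit} characters")
        # Appending costs the domain's length plus one '|' if the pattern is non-empty.
        added = len(domain) + (1 if current else 0)
        if current_len + added > limit:
            patterns.append("|".join(current))
            current, current_len = [domain], len(domain)
        else:
            current.append(domain)
            current_len += added
    if current:
        patterns.append("|".join(current))
    return patterns


for i, pattern in enumerate(split_into_filters(DOMAINS), start=1):
    print(f"Spam Filter #{i} ({len(pattern)} chars): {pattern}")
```

With the thirteen domains above, everything fits in a single pattern; the greedy approach only starts a new filter when the next domain plus its separator would push the current pattern past the limit.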

For a list of spam referral traffic sources that I've compiled over months of analyzing data: click here.

But we’re not done yet. Two more aspects of data purification still need to be addressed. The next post will cover hostname filtering and unfiltered analytics views.