- On May 14, 2014
Updated – The blog Nabble from the Netherlands has done some further research on Semalt. Their findings suggest that the questionable traffic we detailed was just the tip of the iceberg. Read the Nabble post that details the nefarious tactics being employed.
Take a look at your website traffic logs (e.g. Google Analytics) and open the traffic report or the list of top referrers (referrers are the sites or domains that send traffic to your site or domain). Do you see an entry in there for Semalt.com or semalt.semalt.com? If so, you’re not alone, and you probably have questions like “What is this site and why is it driving so much traffic to my site?” and “Is this traffic a good thing or a bad thing?” And finally, the obvious question …
… What is Semalt, anyway?
According to their website, Semalt is “a professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.”
Note the somewhat awkward English in that value proposition. We found similar odd phrasings and sentence structure throughout their site. There are many theories as to where Semalt is based. The most credible is Ukraine, with reports of the service originating traffic from servers in Europe, South America and Africa.
According to our research and various articles, we know that Semalt generates large amounts of web traffic through spiders (site crawlers), ignores common conventions such as robots.txt (which tells spiders what they can and can’t index on your site), and offers no perceptible benefit from its index at this time.
The conclusion that we’ve come to is that they are promoting themselves and drawing traffic to their site through their indexing bots. In other words, they are crawling a massive number of websites and thereby showing up in the traffic reports of millions of website admins. Those admins browse to semalt.com to see what it’s about, are introduced to the aforementioned awkward value proposition, and are encouraged to register for a Semalt account. Where they go from here is anyone’s guess, and there’s no way to know (yet) whether the company is legitimate. The sheer brute force of their indexing effort and the fact that they don’t follow instructions from robots.txt files are concerning. However, we’ve also heard that if you request to be removed from their index, they do respect those requests. We’re trying that ourselves and will report back.
What are the Drawbacks of Semalt Indexing Your Site?
The two main negative impacts of semalt.com visiting your website are:
Skewed Analytics Data
The most noticeable impact of the traffic that originates from the Semalt crawler is how it shows up in your analytics data. If you generate a traffic report for your executive team and it includes Semalt.com, a) you’ll have to explain what Semalt is to that exec team (good luck with that) and b) your report won’t include the valid traffic source that Semalt pushed off the bottom of the list.
Another related drawback is bounce rate. If Semalt accounts for a significant amount of your website traffic and their bot is hitting one page at a time (i.e. not following links on the page), this will skew your bounce rate to a higher number.
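To make the bounce rate effect concrete, here is a small worked example. The session counts are made-up numbers chosen for illustration, not measurements from any real site, and the sketch assumes every bot visit is recorded as a single-page session.

```python
def bounce_rate(bounced_sessions: int, total_sessions: int) -> float:
    """Return the share of sessions that viewed only one page."""
    if total_sessions == 0:
        return 0.0
    return bounced_sessions / total_sessions


# Hypothetical numbers for illustration only.
human_sessions = 1000
human_bounces = 400
bot_sessions = 250  # assumption: every bot session hits exactly one page

clean = bounce_rate(human_bounces, human_sessions)
skewed = bounce_rate(human_bounces + bot_sessions, human_sessions + bot_sessions)

print(f"without bot: {clean:.0%}, with bot: {skewed:.0%}")
# prints "without bot: 40%, with bot: 52%"
```

In this made-up case the crawler alone adds twelve percentage points to the reported bounce rate, even though real visitors behaved exactly the same.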
The good news is that all analytics tracking applications worth their salt have a method for filtering out specific hosts from their reports. As you can see from this sample query, how to remove Semalt from analytics reports is a popular topic.
With Google Analytics you can filter out traffic from a domain by going to Admin and using the Filters tool (under View/Profile for non-Universal users, under View for Universal users). Here are some step-by-step instructions: http://www.wikihow.com/Create-a-Filter-in-Google-Analytics.
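If your analytics tool lets you write the exclusion as a regular expression, it is worth checking the pattern before you save the filter. The sketch below is one possible pattern, not an official recommendation from Google or anyone else. It assumes the filter is matched against the referring hostname, and the sample hostnames other than the two Semalt ones are invented for the test.

```python
import re

# Assumed pattern: matches semalt.com itself and any subdomain of it,
# but not unrelated domains that merely contain the word "semalt".
SEMALT_PATTERN = re.compile(r"(^|\.)semalt\.com$", re.IGNORECASE)


def is_semalt(host: str) -> bool:
    """Return True if the referring hostname belongs to semalt.com."""
    return SEMALT_PATTERN.search(host.strip()) is not None


# Hypothetical referrer hostnames to try the pattern against.
for host in ["semalt.com", "semalt.semalt.com", "example.com", "notsemalt.com"]:
    print(host, is_semalt(host))
```

Anchoring the pattern to the end of the hostname keeps the filter from accidentally hiding legitimate referrers whose names happen to include the same letters.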
Bandwidth Consumption
Every request that the Semalt crawler makes to your website results in data being sent back and forth from your server. If your hosting plan has a limit on bandwidth, this traffic could come at a cost. (If you are paying for bandwidth, please give us a call. Our hosting plans don’t include additional bandwidth charges.) With the volume of traffic that they are generating, those costs can add up fast.
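If you have access to your raw server logs, you can get a rough idea of how much bandwidth these requests use. The following is a minimal sketch that assumes your server writes the common "combined" access log format (status, response size, then the quoted referrer) and that Semalt requests carry a semalt.com referrer. The sample log lines are invented for the example.

```python
import re
from urllib.parse import urlparse

# Matches the quoted request, status, response size and quoted referrer
# of a combined-format access log line (assumed format).
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)"'
)


def semalt_bytes(lines):
    """Return (hits, bytes_sent) for requests referred by semalt.com."""
    hits = 0
    total = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        host = (urlparse(match.group("referer")).hostname or "").lower()
        if host == "semalt.com" or host.endswith(".semalt.com"):
            hits += 1
            if match.group("size") != "-":
                total += int(match.group("size"))
    return hits, total


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [14/May/2014:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "http://semalt.semalt.com/" "Mozilla/5.0"',
    '198.51.100.7 - - [14/May/2014:10:01:00 +0000] "GET /about HTTP/1.1" 200 2048 "http://semalt.com/" "Mozilla/5.0"',
    '192.0.2.9 - - [14/May/2014:10:02:00 +0000] "GET / HTTP/1.1" 200 9999 "-" "Mozilla/5.0"',
]

hits, total = semalt_bytes(SAMPLE_LOG)
print(f"{hits} Semalt hits, {total} bytes sent")
# prints "2 Semalt hits, 7168 bytes sent"
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG, for example `semalt_bytes(open("access.log"))`, where the file name is whatever your host uses.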
As we noted above, Semalt doesn’t honor robots.txt, which is typically the best way to keep unwanted crawlers off your site. Also, since Semalt traffic doesn’t come from a consistent location, it’s very difficult to block with htaccess directives or similar approaches that block traffic by IP address. Blocking the traffic at the server using these approaches can have an impact on server performance, so we generally don’t recommend it.
For now the best approach for mitigating this impact is to submit a request to be removed from the Semalt index.
Where do you go from here?
That depends on whether you’ve been impacted by traffic coming from the Semalt application. If you haven’t heard about it before and don’t see semalt.com in your traffic reports, it’s safe to ignore them for now. If you’re seeing an increase in bandwidth charges or simply want to get Semalt out of your traffic reports, request to be delisted with them. If that doesn’t work, implementing some server-based rules for refusing semalt.com traffic may be your best bet.
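As a rough illustration of what a server-based rule could look like, here is a minimal sketch written as Python WSGI middleware. It is not something we have deployed. It assumes your site runs behind a WSGI application and that Semalt requests arrive with a Referer header naming semalt.com, and the class and function names are made up for this example. It matches on the referrer rather than on IP address, since the IP addresses are not consistent.

```python
from urllib.parse import urlparse

BLOCKED_DOMAIN = "semalt.com"


def is_blocked_referer(referer: str) -> bool:
    """Return True if the Referer header points at semalt.com or a subdomain."""
    host = (urlparse(referer).hostname or "").lower()
    return host == BLOCKED_DOMAIN or host.endswith("." + BLOCKED_DOMAIN)


class RefuseSemalt:
    """Hypothetical WSGI middleware that answers 403 to Semalt-referred requests."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked_referer(environ.get("HTTP_REFERER", "")):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

You would wrap your existing application object once at startup, for example `application = RefuseSemalt(application)`. Requests without a referrer, or with any other referrer, are served as usual.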
Also, check back here as we’ll post updates to this blog entry as we learn more about what Semalt is actually doing with all the data they are gathering. Maybe we’ll find out it’s a great service that none of us can live without, but don’t hold your breath.
URLs of note
Semalt’s description of Semalt: http://semalt.com/what-is-semalt.php
Online forum discussing Semalt: http://www.onlinethreatalerts.com/article/2014/1/1/what-is-the-website-www-semalt-com-about/
A WordPress forum discussing the same with some staff answers / opinions regarding Semalt: http://en.forums.wordpress.com/topic/a-suspicious-website-viewing-my-blog?replies=4
Google query for how to filter Semalt from your Google Analytics data: https://www.google.com/search?q=filter+semalt+traffic+google+analytics