My original intention for a security article was to publish a post about Drupal’s Taxonomy Access Control (TAC) module. As mentioned in my Drupal Configuration post, TAC is the module I selected for access management of the content on my architectedfutures.net site. My thought was that it might be helpful to detail how I was configuring TAC to manage access to the various content items on the site for different audiences: public content, administrative content, content reserved for registered users, and so on. However, as I’ve spent time on the site recently, building up content, I’ve also been keeping an eye on how the site is being discovered and accessed by anonymous users. This is prior to any serious advertising or publication of the site URL. (Until I reach a certain critical mass of useful content, I’m not actively trying to drive people to the site.) What I’ve seen, though, is a discovery process based on web crawlers: people and software whose business is finding new sites and new content on the web. This seems to break down into a few general categories.
- First among these are the search engines. This is to be expected, and is actually appreciated: as I build up public content on the site, I want that content indexed and advertised through the search engines. In some cases the discovery was solicited, since I had notified Google and Yahoo! of the site and my ownership of it. In other cases it was a bit of a surprise, especially Baidu, the Chinese search engine. I was surprised that a Chinese search engine would be so quick to discover a brand new, out-of-the-way site such as mine.
- Next, there is a certain amount of crawler discovery by businesses whose purpose is to track and keep statistics on what types of sites exist on the web. They don’t seem to care about the content, just that the site exists and, potentially, what technology it is built with, so they can compile aggregate statistics for the web. I don’t know that this helps me much, but it is harmless.
- Obviously I can see my activity as I edit the content on the site, and there is a small amount of activity from visitors whom I have notified of some of the content. And there is an occasional hit that comes through a search engine.
- But I’ve also noticed another, less welcome group of visitors: what I would categorize as malware (malicious robots) or malicious users.
The first part of the process is awareness: identifying what is happening on the site, and determining which of the categories above a particular visitor falls into. For this I’ve primarily been using two logging mechanisms:
- The logs maintained by the Drupal software.
- The logs maintained for my site by my hosting provider.
The Drupal logs give me an internal view of what is happening on the site. The general site logs, however, give me a view of activity that never makes it to Drupal. Together, the two offer a reasonably complete survey of what is happening on the site. What becomes especially interesting in these logs are attempts to access non-existent content, some of which can clearly be distinguished as random probing on the part of the visitor. For example, I see attempts to use wp-login on my Drupal site: someone treating the site as though it were built with WordPress in order to try to log in. This is clearly an automated script at work, since nowhere on my site is there a reference to wp-login, nor is there any reason such a reference should exist anywhere else. Then there are also attempts to use the Drupal login facility to log into the site with arbitrary credentials. Happening once is mischievous; happening daily for multiple days in a row moves somewhat beyond that in my mind.
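This kind of review is easy to script. The sketch below, under the assumption of a standard Apache combined-format access log, counts how often each IP has requested common login-probe URLs such as wp-login; the log file and its entries here are synthetic stand-ins, and the real log path on a hosted site will differ.

```shell
# Hypothetical sketch: scan an Apache access log for login-probe signatures.
# The log path and entries below are made-up sample data (documentation IPs);
# point the grep at your host's real access log instead.

cat > /tmp/sample_access.log <<'EOF'
203.0.113.7 - - [10/May/2011:06:25:01 +0000] "GET /wp-login.php HTTP/1.1" 404 512 "-" "Mozilla/5.0"
198.51.100.4 - - [10/May/2011:06:25:03 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
203.0.113.7 - - [10/May/2011:06:26:10 +0000] "POST /user/login HTTP/1.1" 403 128 "-" "Mozilla/5.0"
EOF

# List each IP that requested WordPress or Drupal login URLs, with hit counts:
grep -E 'wp-login|/user/login' /tmp/sample_access.log \
  | awk '{print $1}' | sort | uniq -c | sort -rn
```

An IP that shows up repeatedly in this report, on a site that has never referenced those URLs, is a strong candidate for the block list discussed below.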
The normal conversation around web site security, such as this post on cmswire.com, focuses on various forms of hacking: SQL injection, cross-site scripting, authorization bypass, password cracking, and so on, along with the techniques to defeat them when they happen on your site. That is important, and I am in no way advocating that anyone ignore that aspect of security. Rather, this post asks a different question and addresses a different issue.
Once you have found someone whom you suspect is trying to hack your site in some way, why sit back and play a game of war with them, trying to find out if your software’s defenses are stronger than their ever-changing best means of attack on your site?
From my perspective, once someone demonstrates that they did not come to my site as a friendly visitor, they immediately lose any good will on my part to offer up access to my site or my content. I’m not interested in playing games to find out if my security software can withstand their attacks. I will pay attention to that aspect of security management, but these visitors are simply not welcome on the site. I have no wish to waste my bandwidth on them.
Blocking Malicious Visitors
Given that I’ve identified someone as a malicious visitor, what can be done about it? One answer is blocking access to the site. My site runs on an Apache web server and supports the use of .htaccess files. This gives me two options:
- Blocking visitors by IP address, and
- Blocking visitors by HTTP referrer.
Apache’s own documentation gives instructions on how to go about implementing these options.
In fact, I use both of these techniques. If, on review of my log files, I’ve identified a malicious user by IP address, I deny further access to my site for that IP. In addition, I’ve identified a series of referrer domain names which make absolutely no sense as legitimate referrers to my site, including erotic sites within the .ru domain and others. I can’t conceive of a legitimate reason for such referrers, other than that they are being used to support hacking, or that they have themselves been hacked and incorporated into a botnet that is probing my site as part of its expansion. I’m not interested in traffic from these sites, so they are also part of my block list.
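As a concrete illustration, both techniques can be expressed in a few lines of an Apache 2.2-style .htaccess file. This is a minimal sketch, not my actual block list: the IP addresses and the referrer domain are placeholders drawn from the reserved documentation ranges.

```apache
# Sketch of both blocking techniques for an .htaccess file
# (Apache 2.2 directives; mod_setenvif and mod_authz_host assumed).
# All addresses and domains below are placeholders, not real offenders.

# Flag any request whose Referer header matches a known-bad domain:
SetEnvIfNoCase Referer "badsite\.example\.ru" bad_referrer

Order Allow,Deny
Allow from all

# Block individual IPs or whole ranges identified in the logs:
Deny from 203.0.113.7
Deny from 198.51.100.0/24

# Block any request carrying a flagged referrer:
Deny from env=bad_referrer
```

With `Order Allow,Deny`, the Allow directives are evaluated first and the Deny directives last, so the blanket `Allow from all` still lets the listed IPs and flagged referrers be refused with a 403.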
Identifying Malicious Parties
When I started this IP and referrer blocking process, I based my decisions about which IP addresses and domain names to include in my block lists on the sites I had identified in my own log files. Since a number of the malicious IP addresses and referrer domain names originated in China and Russia, I was curious whether other people had built up lists of malicious addresses that could be leveraged. A search turned up www.parkansky.com, which in turn provided a pointer to wizcrafts.net. I’ve found the wizcrafts site especially useful. It provides five distinct block lists which can be copied and pasted into my .htaccess file, and it incorporates a page-change monitoring facility that notifies me automatically when any of the lists are updated. The block lists include:
- A block list for China and other Asian countries,
- A block list for former Soviet Union and Eastern European countries, Turkey and Iran,
- A South American block list,
- An African block list, and
- An exploited servers block list
All of these lists may be joined into one .htaccess file to include protection from all sites identified on any of the lists. And, since all the lists are maintained by the same person, I don’t have to worry in detail about editing the lists for overlap. (If you find the lists as valuable as I do, you may want to express your appreciation by making a PayPal contribution toward keeping this facility updated and current.) In addition, you can do what I’ve done, and add any other sites which do not appear on these lists but which you’ve identified as persona non grata for your site.
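One possible way to organize the combined file is to paste each maintained list in as a labeled section, followed by a section for local additions. This is only a layout sketch under the same Apache 2.2 assumptions as above; the addresses shown are placeholders, and the pasted list contents are elided.

```apache
# Sketch of one way to lay out a combined .htaccess block list.
Order Allow,Deny
Allow from all

# --- Maintained lists, pasted verbatim and kept as labeled sections ---
# --- so each can be replaced wholesale when an update arrives        ---
# BEGIN Chinese/Asian block list
#   (pasted Deny from lines go here)
# END Chinese/Asian block list
# (the remaining four lists follow in the same pattern)

# --- Local additions: addresses identified in my own site logs ---
Deny from 203.0.113.7
Deny from 198.51.100.0/24
```

Keeping each pasted list between labeled markers makes it easy to swap in a fresh copy when the change-monitoring notification arrives, without disturbing the local additions.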