Filed under The SysAdmin Files

WordPress Search Spam

Our blog was recently affected by a rather clever little hack, and when I went searching for ways to remove it, I couldn’t find much. Here’s a brief writeup of what happened and how I fixed it.

Our Director of Internet Marketing Strategy, Sonny Cohen, spends some of his time searching Google and other search engines for keywords relevant to our business. He began noticing that some of the results pointing to our blog were laced with keywords and links for various male enhancement drugs. When I searched our blog for these references, I couldn’t find anything.

Here’s what I was seeing when I would search our blog for the phrase “test”:

But here’s what Google was seeing when it did the same search:

You may notice that the URL in the second screenshot points to a local file. There are two ways to see what your site looks like to Google. One is to change the User-Agent in your browser to match that of Googlebot. The other is to use the “Fetch as Googlebot” Labs utility in Google Webmaster Tools. I used the latter, saved the resulting report as an HTML file, and opened that file in Chrome.
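For the first approach, a script can stand in for the browser. This is a minimal sketch, not what I actually ran: it requests the same URL twice with different User-Agent headers so the two responses can be compared. The Googlebot string is the one Google publishes for its crawler; the browser string and the function names are made up for illustration.

```python
import urllib.request

# Googlebot's published user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Arbitrary placeholder for an ordinary browser (an assumption, not a real product).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url, user_agent):
    """Return a GET request for `url` that identifies itself as `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Download the page body while presenting the given user agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()


def responses_differ(url):
    """True when the bot and browser user agents receive different bodies."""
    return fetch(url, GOOGLEBOT_UA) != fetch(url, BROWSER_UA)
```

Two caveats. Dynamic pages can differ from one request to the next for innocent reasons, so a difference is a prompt to compare the two bodies by eye, not proof of cloaking. And a hack that also checks the visitor's IP address may not be fooled by a spoofed header, which is where Fetch as Googlebot has the advantage.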

So why is Google seeing different results than anyone else who visits my site and runs that query? Something different must be happening when Google visits. I started running through the execution path of WordPress. The first file that is accessed is index.php. All this file does is turn on a theming variable and load wp-blog-header.php. So I moved on to that file. It looked like this:

if ( !isset($wp_did_header) ) {
    $wp_did_header = true;
    require_once( dirname(__FILE__) . '/temp.php' );
    require_once( dirname(__FILE__) . '/wp-load.php' );
    wp();
    require_once( ABSPATH . WPINC . '/template-loader.php' );
}

temp.php? Never heard of it, let’s see what’s inside:

eval (gzinflate(base64_decode(
'vVhtc9pGEP6emfwHRfUUmGLg9IbkhNrUJrZnEsfFOGmKXc1ZOoMmQqInYYea/Pfu'
.'nnjRG6aZzNRj0Em7++yzu3erOw5/fXM4HU9fvnj5Ym8cRnFnz77q9T/2+sPK2WBw'
...snip for length...
.'6reTZEAXdDrl4QNzE/3F3Wy+iKjPxFe0gH7G+ML1IiecBfHiY+LyWLhsVmDlrQ7g'
.'cvonDPkW65UOKh6zCWuM44kvFr6Ialmvw1/fHP4L'
)));

Now that looks evil. Obfuscated code can’t be good. I decided to see what it does by replacing the “eval” with “print” and running “php temp.php” from that directory. The results are very long, but you can see them here.
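If you would rather not run a suspicious file through PHP at all, the same unwrapping can be done from outside. The sketch below is an alternative to the eval-to-print trick, not what I did: it assumes you have already copied the quoted string pieces out of temp.php and joined them into one base64 string. The function name is my own.

```python
import base64
import zlib


def decode_payload(b64_text):
    """Reverse PHP's gzinflate(base64_decode(...)) and return the hidden source as text."""
    raw = base64.b64decode(b64_text)
    # gzinflate() reads a raw DEFLATE stream (no zlib or gzip header),
    # which zlib accepts when given a negative window-bits value.
    inflated = zlib.decompress(raw, -zlib.MAX_WBITS)
    return inflated.decode("utf-8", errors="replace")
```

Nothing is executed here; the result is only text to read. Payloads like this are sometimes wrapped in several layers, so the output may itself contain another eval(gzinflate(base64_decode(...))) that needs the same treatment.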

Basically, the program tries to determine if we are a real person or a search engine bot by looking at things like our IP address and our user agent. If it determines we are human, it goes ahead and returns the standard header. If we’re a bot, it serves the content in “theme.html” which is identical to the second screenshot above.

So to clean things up, I removed the reference to temp.php from wp-blog-header.php, deleted the file temp.php and deleted the file theme.html.
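After a cleanup like this it is worth checking whether the same trick is hiding anywhere else in the install. Here is a rough sketch of a scanner that walks a directory tree and flags PHP files containing an eval wrapped around a decoding call. The pattern list is my own guess at common variants, not an exhaustive signature set, and the function name is hypothetical.

```python
import os
import re

# eval( followed directly by a decode or decompress call is a common
# obfuscation signature. This list of inner functions is an assumption.
SUSPICIOUS = re.compile(
    rb"eval\s*\(\s*(gzinflate|gzuncompress|base64_decode|str_rot13)\s*\(",
    re.IGNORECASE,
)


def find_suspicious_php(root):
    """Return a sorted list of .php files under `root` that match SUSPICIOUS."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if SUSPICIOUS.search(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

A hit is a file to open and read, not an automatic verdict, since some legitimate plugins ship encoded code. A clean scan also says nothing about how the attacker got in, so the usual follow-ups (changing passwords, updating WordPress) still apply.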


Troubleshooting DNS issues

Yesterday we experienced an issue reaching some of our .org domains and I wanted to write a bit about the troubleshooting process I used to determine what the problem was.

At around 10:15 am I was gathered with most of our company in our main conference room (watching a stream of the inauguration activities) when my pager went off, alerting me that one of our sites was down.  The first thing I did was verify that the web server and database were up and serving content correctly.  Next I tried to reach the site from my laptop.  This particular site ordinarily answers to both http://www.financeleaders.org and http://financeleaders.org, but at the time only the second was responding.

The next step was to verify the DNS results for both addresses.  Commonly this is done simply with the ping utility.  “ping financeleaders.org” gave me the result I expected, but “ping www.financeleaders.org” returned “unknown host.”
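The same check is easy to script when there are many names to compare. This is a small sketch, not the procedure I used that morning: it asks the system resolver (the same one ping relies on) for each hostname and reports an empty list when the lookup fails. The function names are my own.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses for `hostname`, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The equivalent of ping's "unknown host".
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})


def compare(hostnames):
    """Map each hostname to the addresses it currently resolves to."""
    return {name: resolve(name) for name in hostnames}
```

Calling compare(["financeleaders.org", "www.financeleaders.org"]) during the outage would have shown addresses for the first name and an empty list for the second. Because it goes through the local resolver, it reports whatever that resolver has cached, good or bad.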

Read the rest of this entry »


Intermittent outages on .org sites (e.g. ncsasports.org)

We’re tracking a problem that’s manifesting itself as intermittent outages on .org domains.  What appears to be happening is that the .org DNS servers sometimes return an empty response instead of a referral to the authoritative servers.  This results in our local DNS servers caching a “null value” for the response, and the site appears down until the cache expires and the full recursive lookup happens again.

Here’s an example of a failed recursive lookup:

sfrazer-mbp:~ sfrazer$ dig +trace www.prairie.org

; <<>> DiG 9.4.2-P2 <<>> +trace www.prairie.org
;; global options:  printcmd
.            79601    IN    NS    l.root-servers.net.
.            79601    IN    NS    j.root-servers.net.
.            79601    IN    NS    c.root-servers.net.
.            79601    IN    NS    k.root-servers.net.
.            79601    IN    NS    i.root-servers.net.
.            79601    IN    NS    d.root-servers.net.
.            79601    IN    NS    b.root-servers.net.
.            79601    IN    NS    f.root-servers.net.
.            79601    IN    NS    a.root-servers.net.
.            79601    IN    NS    m.root-servers.net.
.            79601    IN    NS    e.root-servers.net.
.            79601    IN    NS    h.root-servers.net.
.            79601    IN    NS    g.root-servers.net.
;; Received 449 bytes from 192.168.0.21#53(192.168.0.21) in 11 ms

org.            172800    IN    NS    C0.ORG.AFILIAS-NST.INFO.
org.            172800    IN    NS    D0.ORG.AFILIAS-NST.org.
org.            172800    IN    NS    A0.ORG.AFILIAS-NST.INFO.
org.            172800    IN    NS    A2.ORG.AFILIAS-NST.INFO.
org.            172800    IN    NS    B0.ORG.AFILIAS-NST.org.
org.            172800    IN    NS    B2.ORG.AFILIAS-NST.org.
;; Received 435 bytes from 192.58.128.30#53(j.root-servers.net) in 31 ms

org.            0    IN    SOA    a0.org.afilias-nst.info. noc.afilias-nst.info. 2008502420 1800 900 604800 86400
;; Received 96 bytes from 199.19.56.1#53(A0.ORG.AFILIAS-NST.INFO) in 49 ms

sfrazer-mbp:~ sfrazer$

A0.ORG.AFILIAS-NST.INFO should have returned a list of our DNS servers, which would then be queried. Instead, it returned only the SOA record for org and the trace stopped there.
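To catch this pattern without reading every trace by eye, the output of dig +trace can be checked mechanically. The sketch below is an illustration under my own assumptions about the layout (records are whitespace-separated with the class in the third column, and each server's response ends with a ";; Received" line, as in the output above). It only inspects text that has already been captured; it does not run dig.

```python
def final_answer_types(trace_output):
    """Return the set of record types in the last response block of a dig +trace."""
    blocks, current = [], []
    for line in trace_output.splitlines():
        line = line.strip()
        if line.startswith(";; Received"):
            # dig prints this footer after each server's response.
            if current:
                blocks.append(current)
            current = []
        elif line and not line.startswith(";"):
            fields = line.split()
            # Record lines look like: name TTL class type data...
            if len(fields) >= 5 and fields[2] == "IN":
                current.append(fields[3])
    return set(blocks[-1]) if blocks else set()


def delegation_failed(trace_output):
    """True when the trace ends on a bare SOA instead of a referral or an answer."""
    return final_answer_types(trace_output) == {"SOA"}
```

A healthy trace ends with an answer (A, AAAA or CNAME records) from our own servers, so delegation_failed returns False for it. Run against the failed lookup above, the last block contains only the SOA line and the function returns True.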

In short, the issue is out of our control: our DNS servers remain healthy and are serving the correct content, and the websites themselves are still up, even though some people will be unable to get to them.

Because we set the Time To Live on our DNS zones to 5 minutes, the outages generally don’t last long (the cache expires quickly and is refilled), but lookups against the .org servers happen more often, so people are more likely to see the problem.  The alternative would be longer TTL settings, which would reduce the number of times people saw the problem but would lengthen the time until the problem resolved itself.
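The trade-off is easier to see with numbers. The sketch below is a toy model, not a measurement: it assumes a resolver repeats the full lookup once per TTL, that each lookup independently hits the bad response with some probability, and that a bad response is cached for one full TTL. The 1% failure rate in the usage note is a made-up figure.

```python
SECONDS_PER_DAY = 86400


def outage_profile(ttl_seconds, failure_probability):
    """Estimate how often and for how long one resolver sees the site as down."""
    lookups_per_day = SECONDS_PER_DAY / ttl_seconds
    return {
        # How many times per day the resolver has to ask the .org servers.
        "lookups_per_day": lookups_per_day,
        # Expected number of those lookups that come back bad.
        "expected_outages_per_day": lookups_per_day * failure_probability,
        # How long each bad answer stays cached under this model.
        "outage_length_seconds": ttl_seconds,
    }
```

With a hypothetical 1% failure rate, outage_profile(300, 0.01) describes 288 lookups a day and roughly three outages of five minutes each, while outage_profile(3600, 0.01) describes 24 lookups and about one outage every four days that lasts an hour. Under these assumptions the total downtime is about the same either way; the TTL only decides whether it arrives as many short outages or a few long ones.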

Update: The problem has apparently been resolved.  More information here.


How Not To Use Email

A few months ago I began receiving emails from Verizon regarding a cell phone account that someone set up and helpfully attached my email address to.  This account has nothing to do with me, but Verizon happily took my email address and began sending me notices such as the one below.

Note the very helpful link that says “If you are not the intended recipient and feel you have received this email in error, please click here to notify us.”  That link takes you to the generic support page at Verizon.  I’ve spoken with three different people there and there is no way to look up an account based on the information included in the email (which is why I didn’t bother to blur any of it) or by my email address.  Frankly, there’s no way they can help me, and I’ll continue to get these emails until the account is deactivated.

Whenever you accept an email address, regardless of whether you take it in via a web form or in person, ALWAYS validate the email address and give the target an option to opt out. Doing otherwise may make a potential client into a potential enemy.
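One common way to do both is to put a signed link in every message: a "confirm" link before the address is ever used, and an "unsubscribe" link afterwards. Here is a minimal sketch of the signing part, assuming a server-side secret key. The function names, the action labels and the idea of lowercasing the address before signing are my own choices, not any particular vendor's scheme.

```python
import hashlib
import hmac


def make_token(secret_key, email, action):
    """Sign (action, email) so a link can be verified later without stored state."""
    message = f"{action}:{email.strip().lower()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def check_token(secret_key, email, action, token):
    """True only if `token` was issued for this exact email and action."""
    expected = make_token(secret_key, email, action)
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

The confirmation email carries make_token(key, address, "confirm"), and nothing else is sent until that link is clicked. Every later message carries an "unsubscribe" token for the same address, so the person receiving the mail can opt out with one click even if they never owned the account.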


Using Mechanical Turk for Fun and Testing

Amazon’s Mechanical Turk (http://www.mturk.com/) is billed as “Artificial Artificial Intelligence.”  Amazon’s intent with the service is to connect tasks with workers who can perform the task.  Frequently the task is one that is simple for a human but complex to program.  One of the better examples is Amazon’s own iPhone application that allows you to take a picture of something, then creates a Mechanical Turk job to pay a person $0.10 to find a link to that thing (or the closest possible to that thing) on the Amazon store.

Other people have used Mechanical Turk for interesting little side projects, such as Andy Baio (of Waxy.org) who used Mechanical Turk to supplement data from Wikipedia in his quest to map out the years represented in the samples from this year’s Girl Talk release, Feed The Animals.  Later, Andy wondered how much it would cost to get Mechanical Turk workers to take a picture of themselves and post it to the site.  Someone else used Mechanical Turk to write a book about cats.

The particular problem I wanted to solve was testing the availability of a site from a wide range of locations.  There are services that will let you do this, but they tend to be geared towards long term testing or simple screen captures from a limited range of IP addresses.  I wanted to see proof that the site had loaded from 25 different locations around the world.  My budget was $30.

First I set up the Mechanical Turk job.  You can do these as “one-off” jobs or have code use the API to spit out repetitive jobs constantly.  I went the one-off route and created a simple webform that asked 4 questions.  The form looked like this:

Once the form was ready to go, I added $30 in funds to our account and told the Turk to offer the job to people at $1 each, with each person only able to complete the task once, and with a 2 hour time limit.  The time limit was because I was simultaneously capturing all the internet traffic sent to the website during the test, in case anyone reported an error.  If so, I’d have a little more insight into exactly what problems were occurring.  When the batch was finished, 16 people had taken me up on the offer.  The results are delivered in spreadsheet format.  An excerpt is below.

What we were able to show from the test is that people from 16 different IP addresses (spread all over the world) were able to hit the target website without problem.


Systems Administration on a Train

At around 8:00pm on Friday the pages started coming in.  A new automated SQL-Injection attack was bringing one of our servers to its knees.  The server wasn’t vulnerable to the attack, but the act of having to process so many invalid requests was putting significant load on the processor, and was filling ColdFusion’s running process limit, preventing legitimate requests from having time to run.

Ordinarily this becomes a night-killer for me.  I’m already out, on my way to dinner, and now I’ve got to go home to deal with this issue.  But tonight I get to try something different.  I’ve had an iPhone since around Christmas, and it’s a nice little device, despite a few shortcomings.  But what I’ve always wanted was a way to actively manage a server without the hassle of carrying around a laptop.  With the release of WinAdmin for the iPhone, I’ve got it.

I was able to connect to our company’s VPN, remotely log into the desktop of the affected server, download and install the free version of ISAPI_ReWrite, and block any URLs containing the offending SQL injection code.  The processor utilization dropped dramatically, regular pages were being served in a timely manner, and I was able to continue on to dinner.
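The idea behind the block is simple pattern matching on the request before the application ever sees it. The sketch below shows that idea in Python. It is not ISAPI_ReWrite syntax and it is not the rule I installed: the pattern is an assumption modelled on automated attacks that smuggle a hex-encoded T-SQL script through the query string using DECLARE, CAST and EXEC.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical signature: fragments typical of hex-encoded T-SQL injection.
INJECTION = re.compile(
    r"declare\s+@\w+|cast\s*\(\s*0x|exec\s*\(\s*@",
    re.IGNORECASE,
)


def should_block(query_string):
    """True when the URL-decoded query string matches the injection signature."""
    return bool(INJECTION.search(unquote_plus(query_string)))
```

Rejecting a request at this stage is cheap compared with letting it reach the application server, which is why the load dropped even though the site was never vulnerable. A pattern this blunt can also catch legitimate requests that happen to contain the same words, so it is worth logging what gets blocked for a while.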
