Slow server? Don’t overthink it. (And don’t forget what’s running on it.)

I’ve just spent the better part of a week troubleshooting server performance problems for one of my clients. They’re running a number of sites on a dedicated server, with plenty of RAM and CPU power. But lately the sites have been really slow, and the server has frequently run out of memory and started the dreaded process of thrashing.

Fearing that inefficient code in cms34 might be to blame, I spent a few days optimizing every last bit of code I could. That made a slight improvement, but it didn’t solve the problem.

Then I spent a few more days poring over the Apache configuration, optimizing the prefork settings and turning off unnecessary modules. Still no luck. But getting those prefork settings optimized, and thus getting Apache under control, did let me notice something I had previously overlooked: MySQL was consuming CPU like mad.
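For what it’s worth, “optimizing the prefork settings” mostly means capping how many Apache processes can exist at once, so they can’t collectively outgrow RAM and start the thrashing. A sketch of the relevant block, with purely illustrative numbers (a reasonable MaxClients is roughly the RAM you can spare for Apache divided by the size of one Apache process):

```apache
# Apache 2.2 prefork tuning (on Ubuntu, in /etc/apache2/apache2.conf)
# These numbers are examples, not recommendations.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           40     # hard cap on simultaneous Apache processes
    MaxRequestsPerChild 2000    # recycle children to contain slow memory leaks
</IfModule>
```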

Hmmm… that got me thinking. I fired up phpMyAdmin and took a look at the running processes. Much to my surprise, almost every MySQL process was devoted to an abandoned phpBB forum. Within moments I realized the forum must be the source of the trouble, which was confirmed when I found that it had over 500,000 registered users and several million posts, almost all of which were spam.
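If you’d rather skip phpMyAdmin, the same check is a query or two in the mysql client (the phpbb_ table prefix below is phpBB’s default; yours may differ):

```sql
-- Show every running MySQL thread and the query it's executing
SHOW FULL PROCESSLIST;

-- Gauge the scale of a spam-overrun phpBB install
SELECT COUNT(*) FROM phpbb_users;
SELECT COUNT(*) FROM phpbb_posts;
```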

As quickly as I discovered the problem, I was back in the Apache configuration, shutting down the forum. Then a quick restart of MySQL (and Apache, for good measure), and the sites were faster than I’ve seen them in months.

The moral of the story: if you have a web server that suddenly seems to be grinding to a halt, don’t spend days optimizing your code before first looking for an abandoned forum that’s been overrun by spammers.

A follow-up on Apache not starting on my web server

About 6 weeks ago, I wrote about a problem I was having with Apache not starting with SSLEngine on. I ended the post somewhat ominously with the following:

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.

This morning, a little before 6 AM, exactly that happened. I was awakened by notifications (with their attendant beeps and nightstand vibrations) on my iPhone that my web server was down. Great. Half-awake, I fired up my hosting provider’s handy iPhone app, tapped the “Hard Reboot” button, and tried to go back to sleep. Except the notifications kept coming. Eventually I was awake enough to realize that the server was coming back up, but Apache wasn’t. Time to get up and deal with this problem from a real computer.

SSHed in, I tried manually starting Apache, and got this:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

What the crap? After spending a half hour visually scanning log and configuration files, to no avail, I decided I needed to find out what was running on port 80. This page was helpful in that regard. I ran the command lsof +M -i4 and found that, whaddaya know, Apache was running. Apparently. But I couldn’t shut it down, and I couldn’t restart it. There were no signs of any compromise of the system’s security, so I just chalked this up to some minor problem deeply buried somewhere in a configuration file that I have yet to track down (but which is probably my fault). At any rate, lsof gave me what I really wanted: the process ID that was listening on port 80. Time for the dreaded kill -9 command.
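For anyone else groping around half-awake at 6 AM, the sequence was roughly this (the PID at the end is whatever lsof reports on your machine, not a real number):

```shell
# List IPv4 sockets and the processes holding them
sudo lsof +M -i4

# Or narrow it to whatever is actually listening on port 80
sudo lsof -i TCP:80 -sTCP:LISTEN

# Last resort: kill the orphaned process by its PID
sudo kill -9 12345
```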

After that, I tried starting Apache again, and it worked… and, as I suspected, it did ask for the pass phrases again. But now, all is well. (Except for the nagging feeling of not knowing what caused this to happen in the first place. Stay tuned…)
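If this keeps happening, the workaround I’m considering is giving Apache a pass-phrase-free copy of each key (while keeping the encrypted originals somewhere safe). Here’s a sketch using a throwaway key, since I’m obviously not pasting my real one; the file names are made up:

```shell
# Generate a throwaway encrypted RSA key to stand in for the real one
openssl genrsa -aes128 -passout pass:secret -out server.key 2048

# Write a decrypted copy that Apache can load without prompting
openssl rsa -in server.key -passin pass:secret -out server-nopass.key

# The decrypted key is only as safe as its file permissions
chmod 600 server-nopass.key
```

You’d then point SSLCertificateKeyFile at the decrypted copy. The obvious tradeoff: anyone who can read that file has your key, hence the chmod.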

My strange solution to Apache not starting on Ubuntu Linux server with SSLEngine on… (YMMV)

The situation: I’m running a web server on Ubuntu Linux using Apache 2. I have two sites on the server that need SSL. I obtained a second IP address (since, at least without SNI, you can only have one SSL certificate per IP address) and configured Apache accordingly. I was able to get regular old port 80 non-SSL pages to load just fine on virtual hosts configured to use both IP addresses.

I created my key files, got the certificates from the CA (GeoTrust, in this case), all that business. Put the files in the right places, configured the Apache files, all that jazz. Made sure mod_ssl was enabled, yes. All of that. Trust me, I did it. Don’t bother asking. And yet, whenever I tried to run Apache with SSL configured… nothing.

And I mean… nothing.

I’d restart Apache at the command line, and nothing. No error messages of any kind. But Apache wasn’t running. I checked all of the log files (and I mean all of the log files), nothing. DOA.

Eventually I tracked down the culprit as the SSLEngine on line in the Apache config file. With it in there, Apache wouldn’t start. Comment it out, Apache starts up just fine, but of course you don’t have SSL.

I’m using the arrangement of Apache config files as they’re installed in a default Ubuntu build. That means /etc/apache2/httpd.conf is actually empty, and most of its usual contents are in /etc/apache2/apache2.conf, with a few other settings dispersed into a number of adjacent files. There are some critical settings in /etc/apache2/ports.conf and then everything else is in the individual config files I’ve created for each site on the server, stored in the /etc/apache2/sites-available directory with symbolic links for the active ones in /etc/apache2/sites-enabled.

Well… that turned out to be the problem. I’m not sure why it matters, but I was putting the VirtualHost configurations for the SSL sites in the respective sites’ existing configuration files. But no… all of the SSL-related (port 443) <VirtualHost> blocks needed to be put in the 000-default file. That made all the difference.
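For the record, the blocks I moved look roughly like this. The server name, IP address, and paths here are made up; the shape is what matters. And remember that ports.conf also needs a Listen 443 line, or none of this matters:

```apache
# In /etc/apache2/sites-available/000-default
# (Illustrative names and paths only.)
<VirtualHost 192.0.2.5:443>
    ServerName secure.example.com
    DocumentRoot /var/www/secure.example.com

    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/secure.example.com.crt
    SSLCertificateKeyFile /etc/ssl/private/secure.example.com.key
</VirtualHost>
```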

Well, almost all the difference. My private key files are encrypted with pass phrases, and Apache needed me to enter them when starting up. But, funny thing… it didn’t ask me for them all right away. I had to fiddle around with starting and stopping it a couple of times (which I only bothered to do because it still wasn’t actually running), but eventually it did ask me to enter the pass phrase for both sites, and after I did that, everything worked. Both SSL sites, all of my non-SSL sites, all of it.

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.

Download those PDFs!

This post is strictly for web developers/server administrators. The rest of you can resume your daily activities, confident that nothing that was even remotely relevant to you transpired here.

PDFs. Web browsers. Both are a daily, or at least frequent, part of the lives of most computer users. But not all web browsers are created equal when it comes to the matter of handling PDFs. Some browsers (say, the ones developed by commercial OS makers) take the approach of trying to manage everything for the user. They include PDF readers that are embedded right into the browser, and PDFs load almost like another web page. Other browsers (most notably Firefox) treat PDFs as downloadable files, and when the user clicks a link to one, the file gets downloaded to their hard drive to be opened in a “helper app” — usually Adobe Reader.

Most website owners prefer the latter approach, and I suspect most users do as well. PDFs in general are not much like web pages, and it does not seem logical to view them within a web browser. Generally, when people access a PDF, it’s because they want to download the document to use offline or to print, and neither of those actions belongs in a browser window. Sure, savvy users can right-click (or on a Mac, Ctrl-click) and select “Save linked file as…” or some such nonsense from the contextual menu. But a lot of Windows users don’t even know their mouse has a right button, a lot of Mac users have no idea that you can press keys and click the mouse button simultaneously to perform special actions, and a lot of both would be confused by this entire process.

So we come to the matter of web developers (such as myself) trying to find ways to force the web browser to download the file instead of loading it directly. A trick I have used often is to link not to the file directly, but to a special PHP script that reads the file into the server’s memory, changes various aspects of the response (such as the MIME type), and then streams the content out to the browser. This is especially useful when you want to restrict access to files, say only to logged-in users, or only to users who have entered a special passcode. The PHP script is perfect for that, because it allows you to execute some code before sending the file to the browser. It even lets you store your files in a directory on the server that web browsers cannot access directly, ensuring (more or less… this article isn’t about hacking) the security of your files.
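The wrapper script itself is nothing exotic. Here’s a bare-bones sketch; the file path and the is_logged_in() check are hypothetical placeholders for illustration, not code from cms34:

```php
<?php
// download.php -- force a download of a file stored outside the web root.
// is_logged_in() and the file path are hypothetical placeholders.
if (!is_logged_in()) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

$file = '/home/sites/private/report.pdf';   // not directly reachable by URL

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="report.pdf"');
header('Content-Length: ' . filesize($file));

readfile($file);   // read the file and echo it straight to the browser
```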

But what if your files aren’t in a protected area? What if you don’t want to bother with the mucky-muck of the PHP wrapper — you just want to link directly to the (browser-accessible) file, but you still want to force the download? Well, if you’re using Apache, you’re in luck. I found this great explanation of a small block of code you need to add to your httpd.conf file to achieve the same effect.

Ultimately, what you want to do is change the MIME type of the response. Browsers that are inclined to load PDFs internally perform this magic by seeking out files that are sent with the application/pdf MIME type. But there’s a very handy, “generic” MIME type for binary files, which all browsers treat as files to be downloaded rather than displayed directly: application/octet-stream. It may sound like a group of 8 singers standing in a small river, but it really just means… a generic binary file.

Here’s the complete block of code to put into your httpd.conf file, or into the appropriate virtual host configuration, however it’s stored on your particular server. I placed the code just within the virtual host configuration for the client in question, so the change applies only to their site, and not to any others that are also running on the server:

<FilesMatch "\.(?i:pdf)$">
ForceType application/octet-stream
Header set Content-Disposition attachment
</FilesMatch>

If you’re not the server admin, you should also be able to put this in a .htaccess file in your site’s root directory, but I haven’t actually tested that to see if it works.
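One caveat about that block: the Header directive belongs to mod_headers, which isn’t necessarily enabled out of the box. On Ubuntu/Debian, turning it on should just be:

```shell
sudo a2enmod headers
sudo /etc/init.d/apache2 restart
```

If Apache refuses to start and complains about an invalid Header command, a disabled mod_headers is almost certainly why.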

Barack Obama: the open-source candidate?

Now we’re speaking close to my heart. Granted, I’m a freeloader in the open source world. I have yet to contribute a single line of code to an open source project. (OK, I guess that’s not entirely true: I did write a WordPress plugin. Sweet. I’m in the club! Sort of.) But I have wholly embraced open source software in my work. PHP FTW, baby! (Uh yeah… whatever.)

These days the only thing I’m a more enthusiastic and outspoken proponent of than open source software is Barack Obama. So I’m surprised it took me so long to research what he’s running his website on.

Linux PWS/1.3.28

*Whew* Glad to see it’s Linux. But what the heck is PWS? I was at a loss. Then I found this blog talking about the very same issue. And suddenly it made sense why I didn’t recognize the acronym. I never would have considered Microsoft’s Personal Web Server to be the web server of choice running on a Linux server. I am still scratching my head at it. The whole VM thing seems the only logical explanation, except that there’s no logic to explain it. At least it’s not so transparently ass-backwards as John McCain’s configuration:

Linux Microsoft-IIS/6.0

And the inexplicable:

Linux ECAcc (lhr)

Interestingly, though, a Google search for “ECAcc (lhr)” reveals a link to a Digg post entitled John McCain Owns VoteForTheMILF.com. Stay classy, San Diego.

Let’s be clear: I think the idea of running a web site under Windows in a virtual machine on a Linux box is the most incomprehensible, mind-bogglingly stupid arrangement you’d ever bother with. I’d have to guess that the sites were developed to run in a Windows environment, but when it came time to deal with practical server and network capacity issues, load balancing and whatnot, some sysadmin made the (probably prudent) decision to load balance on Linux boxes instead of Windows, but since the site was tied to some feeble Windows technology, they couldn’t just move it over to Linux wholesale.

But let’s take this a step further. Back in late spring I received an email from Barack Obama’s IT director soliciting applicants for web developer positions with the campaign. Even though the job was in Boston, I figured it would be insane not to apply, so I submitted my resume. (I never heard back, for what it’s worth.) And it’s from this that I happen to know that the campaign was specifically seeking PHP developers. Rock on.

With that in mind, the whole Windows-on-Linux-through-VM arrangement made even less sense. Why would they develop the site in PHP, run it on a Windows server (definitely not the optimal arrangement for a PHP-based app, though it certainly will work), and then VM that Windows environment on a Linux box, instead of just gearing the PHP app for a Linux server in the first place? And that’s when I remembered that just earlier in the day I had been looking at taxcut.barackobama.com. Of course! Separate third-level domains are all over Obama’s site. Let’s check the configuration on that domain. Now that’s much better:

Linux Apache/1.3.34 (Debian) mod_gzip/1.3.26.1a AuthMySQL/4.3.9-2 PHP/5.2.0-8+etch10

And I think it explains a lot. Campaigns start off small. Obama had to register barackobama.com and put something up there ages ago, long before he was the Democratic nominee and the hugely successful fundraiser he became along the way. So that original site, www.barackobama.com, was probably developed on a Windows box in someone’s proverbial basement, probably when he was running for the U.S. Senate or maybe even the Illinois Senate. But as the campaign has grown, its websites (plural) have grown as well, and in a decidedly open-source direction. There’s some good stuff in there. Debian (which could mean Ubuntu, too… I haven’t checked the signature on Ubuntu’s Apache package to see if it’s diverged from its Debian roots), PHP (and a reasonably up-to-date version at that), MySQL, etc.

It’s kind of fun to do this kind of research, as long as you don’t mind being distracted along the way, because there are plenty of weird sources of distraction.

Aside from the aforementioned MILF site (classy), and the somewhat interesting fact that searching on “PWS/1.3.28” brings back as its first result a reference to Obama’s hosting, I discovered that for some reason the page title on John McCain’s official store is “Independent Online Stores.” OK. No one looks at title bars. And even fewer web developers look at <title> tags. I know that from experience. But of course that’s just a transitional landing page, announcing that McCain wares are not actually sold by the campaign, but by independent, for-profit companies, and that buying these items doesn’t translate into money going back to the campaign. Huh. I can’t quite wrap my brain around that, but I’m a lifelong, union-loving Democrat, so I guess I wasn’t meant to. The only thing that comes to mind is that maybe it has to be that way, legally, now that he’s accepted public campaign financing. Anyway, the first McCain store link I found, which, as they state, is an independent operation not affiliated with the campaign, is, not surprisingly, running:

Windows Server 2008 Microsoft-IIS/7.0

I also found that the company that hosts some of Obama’s pages also hosts a site for the American Model Yachting Association. Really? Model yachting? That exists?