Fun with recycled IP addresses

OK, well that title kind of gives away the end of the story, but it’s still a good one.

So…

Earlier this week I launched a new site for a client. As part of the usual process, I submitted their sitemap.xml file to Google Search Console and Bing Webmaster Tools. Usually that’s all it takes for a new site to get indexed within 1-3 days.

But it seemed to be taking longer than usual for this client, and I decided to investigate the situation.

I should note that we did a private “soft launch” of the site about a week prior to the official launch. During that time I had a robots “noindex” directive turned on so it wouldn’t start showing up in search engines prematurely.

I went into Google Search Console to request a re-crawl. And that’s when I noticed this…

Excluded due to 'noindex'

Well, that’s… weird. Not so much that it had read a “noindex” directive, since it had, unfortunately, crawled the URL just a day before we launched (although it was a bit weird that it had crawled the URL at all). The strange part was that the Referring page was a totally different site, one that should have had no business linking to us yet.

So then I did what anyone (?) would naturally do: I visited that URL. And much to my surprise, it redirected to our site. What??

Next I used MxToolbox to do a DNS lookup, and suddenly it all made sense.

We’re hosting the site at Linode. And as it happens, the DNS entry for the referring site is set to the same IP address as our site. This is a virtual private server, so we’re the only people now using this IP address.

But there are a finite number of possible IP addresses, especially IPv4 addresses (about 4 billion). So they naturally get reused. This particular site was for a limited-use product that was only relevant in 2015, so it’s not too surprising that the owners of the domain took down their Linode server and relinquished the IP address. It’s unfortunate though that they didn’t think to remove the DNS entry from their zone file.
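A quick way to confirm this kind of collision is to compare what the two domains resolve to. Here’s a minimal sketch in Python; the hostnames are placeholders, not the actual domains involved:

```python
import socket

def same_server(domain_a: str, domain_b: str) -> bool:
    """True if both hostnames currently resolve to the same IPv4 address."""
    return socket.gethostbyname(domain_a) == socket.gethostbyname(domain_b)

# e.g. same_server("newclientsite.example", "old-product.example")
# "localhost" resolves via the local hosts file, so this demo works offline:
print(same_server("localhost", "localhost"))  # prints True
```

If both names come back with the same address, the old domain’s A record is pointing at an IP that has since been reassigned.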

At this point, we could (a) contact them and ask them to update their DNS, but that could be convoluted and time-consuming, for no real benefit to us. We could (b) set up a rewrite on our server that shunts traffic aimed at their old product site over to their main site, which would take less time but also wouldn’t really benefit us in any way. Or we could (c) leave it as-is, and let the few randos still looking for a product that was last relevant during the Obama administration wonder why they’re seeing our site instead.
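For the record, option (b) would be a small name-based redirect. A hypothetical sketch, assuming the box runs nginx and using placeholder domains:

```nginx
# Requests that arrive at our IP asking for the stale hostname get
# bounced to the vendor's main site instead of serving our client's site.
server {
    listen 80;
    server_name old-product.example;
    return 301 https://vendor-main-site.example$request_uri;
}
```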

I’m going with (c).

I’m also going with submitting re-crawl requests to both Google and Bing so we can get in the priority queue, and hopefully by this time tomorrow the site will be showing up in search results.

The day Facebook performed seppuku

I don’t have much to say about all of this, other than that I would probably, yes, be posting this on Facebook if it were affecting literally anything else in my known realm of existence.

Today Facebook killed itself. But its undead corpse will surely rise again.

The problem is some kind of colossal DNS snafu, which has, for all intents and purposes, temporarily caused facebook.com to cease to exist.

Ah… the air somehow smells fresher today. The water tastes better. The sun shines brighter.

But I know it won’t last.

Anyway… today’s the day it happened. Here’s some more in-depth information from Ars Technica which hopefully will not disappear down the Memory Hole anytime soon.

Update: This Cloudflare blog post probably provides the definitive explanation of what happened.

Hack your hosts file to prevent distracting yourself at work

I suppose it says something significant about the increasing marginalization of the computer that mine is now effectively a work-only device. I hardly ever touch my Mac at home anymore; I really only use it for work. The problem is, I’m permanently logged into Facebook and Twitter on my computer, and I’m prone to distraction.

So I made the decision today to further that marginalization, by making it impossible for me to access Facebook and Twitter on my Mac. How? It’s easy! Assuming you have administrator access, at least. But why wouldn’t you? (If your Mac is a company computer and they have things so locked down, I’d say don’t worry about blocking social media sites… spend that time working on your résumé.)

These instructions are for Mac OS X. I’m not really sure how to do this in Windows. (And, honestly, I don’t care.) Instructions for Linux would be fairly similar, but you’d do it in a Terminal window and there’d be some sudo involved. (Actually, you can do that on a Mac too. I’ll give those instructions at the end.)

Now then. Open a Finder window and press Command-Shift-G. In the box, type /etc/hosts and click Go.


This will take you to the “hidden” /etc directory (part of the Unix subsystem) and highlight the hosts file, which is what you need to edit.

But, you can’t do it here.

Files in the /etc folder are write-protected, but if you copy the hosts file to your desktop, you can edit the copy. So, drag it to the desktop. (Note that because it’s a protected system file, dragging it to the desktop makes a copy rather than moving it.)

Double-click the hosts file on your desktop. It should open in TextEdit. (If you’re asked to pick a program, pick TextEdit.)

Place the cursor at the bottom of the file and add these lines:

127.0.0.1 facebook.com
127.0.0.1 www.facebook.com
127.0.0.1 twitter.com
127.0.0.1 www.twitter.com

So what’s happening here? Well, the numbers are IP addresses, which are the true addresses of every device connected to the Internet. Domain names (like twitter.com) are essentially “aliases” for IP addresses. Normally your computer connects to a DNS server on the Internet to look up these associations. But before it does that, it checks this hosts file. If a domain is listed in there, it doesn’t bother checking any further. And 127.0.0.1 is a special “loopback” IP address, conventionally given the hostname localhost. Basically, it’s the computer’s address for itself. “Me,” in other words.
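To make that lookup order concrete, here’s a toy sketch (not the actual resolver code, just an illustration) of the hosts-file check that happens before any DNS query goes out:

```python
def hosts_lookup(hosts_text: str, name: str):
    """Toy model of the check the resolver does *before* querying DNS:
    scan hosts-file lines for the name and return its mapped address."""
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        addr, *names = line.split()        # address, then one or more names
        if name in names:
            return addr                    # found locally; DNS never consulted
    return None  # not listed: the system falls through to a real DNS lookup

entries = "127.0.0.1 facebook.com\n127.0.0.1 www.facebook.com"
print(hosts_lookup(entries, "facebook.com"))  # prints 127.0.0.1
print(hosts_lookup(entries, "example.com"))   # prints None
```

Any domain you map to 127.0.0.1 this way short-circuits straight to your own machine.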

There’s probably no web server running on your computer, so loading http://127.0.0.1 in a web browser will return… nothing. But even if you do have a web server running on your computer, it’s not Facebook or Twitter, so mission accomplished.

All right. Now that we have the hosts file updated, save it, and then drag it back into the /etc folder. You’ll get a stern warning from the system.


Click Authenticate. That gives you another annoying, but smaller, alert.


Click Replace. Now you have to enter your administrator username and password. Do that, then click OK.


You’re done. (And note this time it moved the file from the desktop back into /etc. It doesn’t copy it like it did when you moved it to the desktop.) Now try loading Facebook or Twitter in your web browser!


Want to do all of this at the command line instead? It’s actually a lot easier, now that I think about it. These instructions should work for either Mac or Linux. Open a Terminal window, type sudo nano /etc/hosts, and hit Enter. Move the cursor to the bottom of the file and enter the lines I gave earlier. Press Ctrl-X, then Y, then Enter to save your changes. That’s all! Seriously!

Note: If you’re on IPv6 (if you even know what that is), you may also want to add the same lines with ::1, the IPv6 loopback address, alongside the 127.0.0.1 ones.
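In that case, the additions follow the same pattern, just with the IPv6 loopback address:

```
::1 facebook.com
::1 www.facebook.com
::1 twitter.com
::1 www.twitter.com
```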

SPF for dummies (i.e. me)

For a while I’ve known that (legitimate) outgoing email messages originating from my web server were occasionally not reaching their intended recipients. I also knew that there was a DNS change you could make to help prevent this problem, but I didn’t know any more about it and it was a marginal enough problem that I could just put it off.

Finally today I decided to deal with it. And I was (re)introduced to the SPF acronym. No, that’s not Sun Protection Factor, or Spray Polyurethane Foam, or even Single Point of Failure (although in my case perhaps that last one is accurate). No, it stands for Sender Policy Framework, and it’s an add-on to the core capabilities of DNS that provides a way to positively identify the originating servers of outgoing email messages.

My situation is simple: I have a domain name that needs to be able to send mail from either my mail server or my web server. Most of the tutorials I found for SPF were far too convoluted to address this simple arrangement. Then I found this post by Cyril Mazur, which provided the very simple answer:

v=spf1 a mx ~all

Simply add the above as a new TXT record in your DNS zone file, and you should be set.
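For what it’s worth, here’s my reading of what each piece of that record means (an annotated breakdown, not part of the record itself):

```
v=spf1   the SPF version marker (always comes first)
a        a sender is legit if its IP matches the domain's A record (the web server)
mx       ...or if it matches one of the domain's MX hosts (the mail server)
~all     everything else gets a "soft fail" (typically accepted, but flagged as suspect)
```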