OK, well that title kind of gives away the end of the story, but it’s still a good one.
Earlier this week I launched a new site for a client. As part of the usual process, I submitted their
sitemap.xml file to Google Search Console and Bing Webmaster Tools. Usually that’s all it takes for a new site to get indexed within 1-3 days.
But it seemed to be taking longer than usual for this client, and I decided to investigate the situation.
I should note that we did a private “soft launch” of the site about a week prior to the official launch. During that time I had a robots “noindex” directive turned on so it wouldn’t start showing up in search engines prematurely.
I went into Google Search Console to request a re-crawl. And that’s when I noticed this…
Well, that’s… weird. Not so much that it had read a “noindex” directive when it, unfortunately, had crawled the URL just a day before we launched — although it was a bit weird that it had crawled it at all — but that the Referring page was a totally different site that should have had no business linking to us, yet.
So then I did what anyone (?) would naturally do, I visited that URL. And much to my surprise, it redirected to our site. What??
Next I used mxtoolbox to do a DNS lookup, and suddenly it all made sense.
We’re hosting the site at Linode. And as it happens, the DNS entry for the referring site is set to the same IP address as our site. This is a virtual private server, so we’re the only people now using this IP address.
But there are a finite number of possible IP addresses, especially IPv4 addresses (about 4 billion). So they naturally get reused. This particular site was for a limited-use product that was only relevant in 2015, so it’s not too surprising that the owners of the domain took down their Linode server and relinquished the IP address. It’s unfortunate though that they didn’t think to remove the DNS entry from their zone file.
At this point, we could (a) contact them and ask them to update their DNS, but that could be convoluted and time-consuming, for no real benefit to us, (b) set up a rewrite in our server that shunts traffic that’s trying to access their product site back over to their main site, which would take less time but also wouldn’t really benefit us in any way, or (c) leave it as-is, and let the few randos who are still looking for a product that was last relevant during the Obama administration wonder why they’re instead seeing our site.
I’m going with (c).
I’m also going with submitting re-crawl requests to both Google and Bing so we can get in the priority queue, and hopefully by this time tomorrow the site will be showing up in search results.