Find the mode of an array in PHP

For those of you who don’t remember studying statistics in math (and I barely do), the mode refers to the value that occurs most frequently in a set of data. That contrasts with the mean — what most of us call the “average” — and the median, which is the “middle” value if you sort all of the values in order.

My daughter was recently studying all of this, and it all came back to me. These are really not things I use often. But, as it happens, right now in my work I have a need for a PHP function that determines the mode in a set of data.

In this case, it’s not actually numbers; it’s dates. In short, I have an array of dates, and I need to know which date occurs the most often in the array. You’d think there would be a built-in PHP function for this, likely called array_mode() or else something long and completely illogical, or short and incomprehensible. But alas, there is no array_mode() function.

Fortunately, it’s pretty damn easy to write one. I found some examples on other sites, but they weren’t pithy enough for my tastes, so I rolled my own. Now you don’t have to:

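function array_mode($arr) {
  // Tally how many times each value occurs, using the value as the array key.
  $count = array();
  foreach ((array)$arr as $val) {
    if (!isset($count[$val])) { $count[$val] = 0; }
    $count[$val]++;
  }
  // Sort the counts from highest to lowest, keeping the keys attached.
  arsort($count);
  // arsort() resets the pointer to the first element, so key() is the mode
  // (or NULL for an empty array). Integer-like strings come back as integers.
  return key($count);
}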

Perhaps it’s excessive even to bother casting $arr as an array, but it’s a habit I picked up a long time ago and can’t seem to shake. Anyway, there you have it. (Of course, this probably breaks if $val isn’t scalar, but I’ll leave that to you to fix.)
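For what it's worth, here is one way that fix might look. This is only a sketch: the name array_mode_scalar() and the decision to silently skip non-scalar values are my own assumptions, not the only reasonable choices. It also picks the first value seen when there is a tie, rather than relying on how arsort() orders equal counts.

```php
<?php
// Hypothetical variant of array_mode(); the name and the "skip anything
// that isn't scalar" policy are assumptions made for illustration.
function array_mode_scalar($arr) {
  $count = array();
  foreach ((array)$arr as $val) {
    // Arrays, objects and NULL can't be used as tally keys, so ignore them.
    if (!is_scalar($val)) { continue; }
    // Cast to string so floats like 1.5 aren't truncated to the key 1.
    $key = (string)$val;
    $count[$key] = isset($count[$key]) ? $count[$key] + 1 : 1;
  }
  // Scan for the highest count; on a tie, the first value seen wins.
  $mode = null;
  $best = 0;
  foreach ($count as $key => $n) {
    if ($n > $best) {
      $best = $n;
      $mode = $key;
    }
  }
  return $mode; // NULL if there was nothing to count
}

$dates = array('2009-10-13', '2009-10-14', '2009-10-13', array('oops'));
echo array_mode_scalar($dates), "\n";
```

As with the original, PHP turns integer-like string keys back into integers, so a mode of "5" comes back as 5.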

From the Stupid PHP Tricks files: rounding numbers and creeping inaccuracy

This morning as I walked to the studio I was doing what geeks do best: pondering a slightly esoteric mathematical quandary.

Glass Half Full by S_nova

Ingraining the American spirit of optimism at a young age, and under dubious circumstances, our schools always taught rounding numbers in a peculiar way. You always round your decimal values to the nearest integer. That part makes sense. But what if the decimal is .5, exactly half? In my education, at least until late in high school (or was it college?), we were always taught to round up! The glass is half full. Optimism.

Eventually — far later than it should have been, I think — the concept was introduced that always rounding .5 up is not really that accurate, statistically speaking. It might be nice in the case of a single number to be an optimist and think a solid half is good as a whole, but in aggregate this thinking introduces a problem.

If you have a whole lot of numbers, and you’re always rounding your halves up, eventually your totals are going to be grossly inaccurate.

Of course, the same would happen if you were ever the pessimist and always rounded down.

The solution, I later learned, was to round halves up or down, depending upon the integer value that precedes them. Which way you go doesn’t really matter, as long as you’re consistent, but as it happens, I learned it as such: if the integer is odd, round up; if it is even, round down.
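PHP's built-in round() happens to support all four of these approaches through its third argument. The little sketch below is just an illustration (the sum_rounded() helper is a name I made up); the rule I learned, odd rounds up and even rounds down, corresponds to PHP_ROUND_HALF_EVEN. Note that for positive numbers PHP_ROUND_HALF_UP behaves as "always round up," but PHP defines it as rounding away from zero.

```php
<?php
// Hypothetical helper: round every value with the given mode and add them up.
function sum_rounded(array $values, $mode) {
  $sum = 0;
  foreach ($values as $v) {
    $sum += round($v, 0, $mode);
  }
  return $sum;
}

// Four halves whose true sum is exactly 8.
$values = array(0.5, 1.5, 2.5, 3.5);

$modes = array(
  'always up'   => PHP_ROUND_HALF_UP,
  'always down' => PHP_ROUND_HALF_DOWN,
  'to even'     => PHP_ROUND_HALF_EVEN,
  'to odd'      => PHP_ROUND_HALF_ODD,
);

foreach ($modes as $label => $mode) {
  printf("%-12s %d\n", $label, sum_rounded($values, $mode));
}
```

Always rounding up gives 10 and always rounding down gives 6, while both the "to even" and "to odd" rules land on 8, the true total.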

In my work, I write a lot of PHP code. Most of it is of the extremely practical variety; I’m building websites for clients, after all. But every once in a while I like to indulge my coding abilities in a bit of frivolous experimentation, and so today I produced a little PHP script that generates 10,000 random numbers between 1 and 100, with one decimal place, and then it shows the actual sum and average of those numbers, along with what you get as the sum and average if you go through all 10,000 numbers and round them to whole integers by the various methods described above. Try it for yourself!

Any time the rounded average is different from the “precise” (and I use that term somewhat loosely) average, it is displayed in red. Interestingly, and not at all surprisingly, when you always round halves in one direction or the other, at least one of those directions will (almost) always yield an incorrect average. Yet if you use the “even or odd” methods, both of those methods will almost always yield a correct average.
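If you'd rather run something like this locally, here is a rough sketch of the experiment. To be clear, this is not the script linked above: the function name, the optional seed parameter and the plain-text output are all assumptions I'm making for illustration.

```php
<?php
// Hypothetical re-creation of the experiment, not the original script.
// $seed is optional and only there to make a run repeatable.
function rounding_experiment($n = 10000, $seed = null) {
  if ($seed !== null) { mt_srand($seed); }

  $modes = array(
    'half up'   => PHP_ROUND_HALF_UP,
    'half down' => PHP_ROUND_HALF_DOWN,
    'half even' => PHP_ROUND_HALF_EVEN,
    'half odd'  => PHP_ROUND_HALF_ODD,
  );

  $sums = array('precise' => 0.0);
  foreach ($modes as $label => $mode) { $sums[$label] = 0.0; }

  for ($i = 0; $i < $n; $i++) {
    // A random number from 1.0 to 100.0 with one decimal place.
    $value = mt_rand(10, 1000) / 10;
    $sums['precise'] += $value;
    foreach ($modes as $label => $mode) {
      $sums[$label] += round($value, 0, $mode);
    }
  }

  $result = array();
  foreach ($sums as $label => $sum) {
    $result[$label] = array('sum' => $sum, 'average' => $sum / $n);
  }
  return $result;
}

foreach (rounding_experiment() as $label => $row) {
  printf("%-10s sum=%.1f average=%.2f\n", $label, $row['sum'], $row['average']);
}
```

Every run uses different random numbers, so the exact totals vary, but the "half even" total always falls somewhere between the "half down" and "half up" totals.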

It’s all about the aggregate.

Fun with site usage stats, part two

Back in February, I wrote about web browser usage by visitors to my site. Some of the discussion over my recent redesign has prompted me to do it again. Here we go!

Web Browsers

browser-20091021.png

Compare to last time: Firefox has jumped from 34% to 47%. That gain has come at the expense of both Safari and IE, which have dropped from 33% to 27% and from 28% to 17%, respectively. (Note, of course, that I’m rounding the actual percentages to whole numbers because talking about “16.88%” makes me feel like Spock on Star Trek, and I’m enough of a geek without that.)

Also worth noting: Chrome. It is stuck in fourth place, but its share has jumped by 4.1 percentage points, from 1.44% to 5.54%. (OK, in this instance I needed to Spock it up a bit.)
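Since I'm already Spocking it up: the 4.1 in the Chrome figures is a difference in percentage points, which is not the same thing as relative growth. A quick sketch of the distinction, using two helper names I just made up:

```php
<?php
// Hypothetical helpers; the names are assumptions for illustration.

// Difference between two percentages, in percentage points.
function point_change($old, $new) {
  return $new - $old;
}

// Growth relative to the starting value, as a percentage.
function relative_change($old, $new) {
  return ($new - $old) / $old * 100;
}

printf(
  "Chrome: %.2f points, %.1f%% relative growth\n",
  point_change(1.44, 5.54),
  relative_change(1.44, 5.54)
);
```

In other words, Chrome's share went up 4.1 points, but relative to where it started it nearly quadrupled, growing by roughly 285%.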

Operating Systems

os-20091021

Once again, as a Mac user who also (unfortunately, despite my feeble efforts at self-promotion) represents a hugely disproportionate amount of the total traffic, I’m skewing the results here a bit. Still, I have not significantly altered my own usage of the site since February, and in that time Windows has nonetheless dropped from 56% to just under 50% of my total traffic, while the Mac has gone from 29% to 43%. Interestingly, in February, iPhone/iPod represented over 12% of the traffic but now they’re just over 4%. Linux has stayed pretty even, between 2% and 3%.

OS/Browser Combinations

browser-os-20091021

In February, IE/Windows was the dominant combination, at 28%. Now it has dropped to fourth place, at 17%. Firefox/Windows has gone from #2 to the top spot, even though it just inched up from 25% to 26%. Safari/Mac and Firefox/Mac each went up a spot as well, moving into second and third, and going from 21% to 24% and from 8% to 18%, respectively.

Conclusions

This is far too small and skewed a sample to say a whole lot about trends on the Internet as a whole, but what I’m seeing here overall is that Mac usage vs. Windows is up, and Firefox usage vs. anything else is also way up. Specifically I’m seeing a significant surge in Firefox/Mac… which may suggest, I suppose, that I have been visiting the site a lot more lately than I did in February. Or maybe not.

It’s also worthwhile to look at the raw total numbers in the traffic. In the time between then and now I’ve split up room34.com into a number of separate sites. The totals back in February were across the board on room34.com; for October we’re looking at stats strictly from blog.room34.com. The date range is the same: 30 days. (The original data was from January 19 to February 18; the new data is from September 20 to October 20.)

Back in February, the data I analyzed represented 2,845 unique visits to my site. This month’s data represents 3,810 visits, an increase of 965, or 34%. Since the old stats included visits to a lot of pages that are now parts of other sites, the increase in blog traffic is even greater. So while it’s probably true that I’ve been spending more time looking at the blog myself in the past month than in February (considering I just did a redesign this weekend), the majority of the traffic increase is most likely not from me. In fact, my own percentage of the total traffic is probably quite a bit less than it was in February.

Traffic here spiked on October 13-14, when I posted a reply to Derek Powazek’s blog on SEO. Visits to that single page, just on October 13, represent more than 10% of the total traffic the entire site saw all month.

Let’s take a look at the OS/browser breakdown for just that one day, October 13, 2009:

os-browser-20091013

The traffic from this one date was likely responsible for some overall skewing of the totals. Derek Powazek’s blog appeals most strongly to Mac users, which would explain why the Mac/Safari combination is in the top spot (Safari being far more popular in general on Macs than Firefox, for the same reason IE dominates Windows: it comes with the OS).

Lessons to be learned? Well, if I want traffic, I should write about SEO. The SEO bots (both human and software) seem to love it. But beyond that, I think there probably is some valid evidence here that there’s some real movement in the directions of both Mac and Firefox. Something that sits just fine with me!

Final Thought

What’s the deal with this “Mozilla Compatible Agent” on iPhone and iPod? I haven’t seen that before, but I assume it’s one of two things:

1. A Mozilla-derived alternative to Mobile Safari, available only on “jailbroken” iPhones.
2. An embedded client in an app like Facebook, which allows you to view web pages without leaving the app.

I’m inclined to guess that #1 is correct. I’d be surprised if any Apple-approved apps were running a Mozilla-based web browser; it seems it would be far easier and more logical to develop legit apps using the official WebKit/Mobile Safari engine. I haven’t seen any hard numbers (nor do I think it would be possible to obtain them) on the percentage of iPhones in use that are jailbroken, but if this assumption is correct, and we can assume that the ratio of “Mozilla Compatible Agent” to Safari on the iPhone/iPod platform represents at least the percentage of iPhones that are jailbroken (since I’d assume some jailbroken iPhone users still use Mobile Safari), then the numbers are staggering indeed.

However… given the fact that over 8% of the total traffic on October 13 came from this user agent, and I myself visited the site numerous times on that day from my (non-jailbroken) iPhone, to monitor and respond to comments, I suspect a much more innocuous explanation. But a brief yet concerted effort to find an explanation on Google turns up nothing. Anyone in the know out there care to shed some light on the situation?

Yes, it has been colder in Minneapolis this summer… except when it wasn’t

There’s a bit of a brouhaha afoot with regard to our weather in Minnesota this summer, and whether it proves or disproves climate change.

A good summary of the “debate” appeared yesterday on Alas!

It started with a Minneapolis-based wingnut blogger relying on anecdotal evidence to prove… something.

Statistics guru Nate Silver responded with a bunch of boring old facts that dispel the argument of a colder-than-normal summer.

I just have a few comments to add to the fray:

1. If climate change is real (and it’s pretty much impossible for an honest, rational person to deny at this point), anecdotal evidence of a chilly month of July in one city doesn’t do anything to disprove it. And if you’re not looking at hard numbers, it’s easy to endure this cold July and forget just how hot it really was at the end of June.

2. Rising global temperatures associated with climate change emphatically do not mean that the resulting weather change in any particular location will manifest as a simple 2-3 degree temperature increase, and identical weather as before. In fact what it means is that global weather patterns will change significantly, and unpredictably, with some parts of the globe experiencing significantly hotter temperatures, some cooler, and more severe weather events occurring in more places than before.

Forget red state/blue state: it’s really red browser/blue browser

Sean Tevis browser stats

Anyone who’s read this blog for any period of time knows my political leanings pretty well. I’m about as liberal as they come in this country (which means I’m probably middle-of-the-road anywhere else). And the same reader(s) probably also know(s) how I feel about Internet Explorer 6.

Well, it’s interesting to see that there seems to be a correlation between political viewpoint and web browser usage. As (almost) always, this comes from Daring Fireball. We’re looking at the decidedly non-traditional campaign blog of Kansas Democrat Sean Tevis. His campaign did a survey that, among other things, discovered that users of outdated browsers like Internet Explorer 6 and AOL, along with the “Don’t Know” and “No Internet” respondents, strongly preferred his Republican opponent, while users of Firefox, Chrome, Opera and Safari preferred Tevis. Interestingly, IE 7/8 users slightly favored Tevis.

It would be interesting to see the raw numbers, rather than just percent deviation, to get a sense of the relative proportions of the electorate who fell into each category, especially considering that Tevis apparently lost, by a small margin.

It’s also interesting to look at the strength of each group’s leanings. Those who most strongly favored the Republican candidate were the AOL users and non-Internet users, a.k.a. the Luddites. Chrome users (all on Windows) were the strongest Tevis supporters, followed by Safari (presumably all or nearly all Mac) users. Firefox users were slightly weaker supporters of Tevis. This makes sense to me in that I suspect there’s a high correlation between “average” Mac users (who almost all use Safari, just like most “average” Windows users run IE) and Democratic leanings, whereas users of Firefox (and of open source software in general) are as likely (or more so) to be libertarian as liberal. Opera… well… I don’t know. Contrarians?

That IE 7/8 users slightly favored Tevis is most interesting to me. IE 7/8 represent by far the largest percentage of the Internet-using population. And the country as a whole moved slightly in the Democrats’ direction in the 2008 election. But Kansas is far more conservative than the US populace as a whole; combine that with the “No Internet” crowd, and a small margin of victory in favor of the Republican candidate makes sense.

P.S. Sean Tevis for President 2016.