The DKos visit count non-issue: Ruffini’s wrong

There is no controversy regarding the Daily Kos visit count. Patrick Ruffini is misunderstanding the difference between what SiteMeter shows you on a summary page, and what is counted as a visit or a hit. SiteMeter is not inflating DKos’ site stats. In all probability, SiteMeter is undercounting DKos traffic, as it does with everyone. Let’s take Ruffini’s primary “gotcha” moment:

Then it hit me: SiteMeter only accounts for the last 100 visitors individually. On a site like Daily Kos, the 100th most recent visitor could have been 15 seconds ago. If you are the 101st most recent visitor and you click on a new page, you are counted as a new unique visitor in SiteMeter’s all-important count. On a normal site, this wouldn’t matter, since it’s highly unlikely you’ll stick around long enough to have 100 others show up after you. On a site with hundreds of thousands of page views a day, it’s extremely likely you will.

Um, no. That’s not how it works:

When you are browsing a site, every time you follow a link, it is treated as a single “page view”. Site Meter defines a “visit” as a series of page views by one person with no more than 30 minutes in between page views.
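The 30-minute definition quoted above can be sketched in a few lines of Python. This is a minimal illustration, not SiteMeter’s code: the function name `count_visits` and the idea of keying visitors by something like an IP address are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Inactivity gap taken from the definition quoted above.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(page_views, timeout=VISIT_TIMEOUT):
    """Split (visitor_id, timestamp) page views into visits.

    A page view starts a new visit if the visitor has never been seen,
    or if more than `timeout` has passed since that visitor's previous
    page view. Returns (total_page_views, total_visits).
    """
    page_views = sorted(page_views, key=lambda pv: pv[1])
    last_seen = {}  # visitor_id -> timestamp of that visitor's latest page view
    visits = 0
    for visitor_id, ts in page_views:
        previous = last_seen.get(visitor_id)
        if previous is None or ts - previous > timeout:
            visits += 1
        last_seen[visitor_id] = ts
    return len(page_views), visits


# Usage: one reader, page views at 0, 10, 25 and 70 minutes.
start = datetime(2007, 1, 1, 12, 0)
views = [("192.0.2.1", start + timedelta(minutes=m)) for m in (0, 10, 25, 70)]
print(count_visits(views))  # (4, 2): the 45-minute gap starts a second visit
```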

You are counted by IP address, not by virtue of being on the SiteMeter “last 100 visits” page. If I go to DKos and read a post, click on the message thread, spend the next 40 minutes reading the messages, and then click on the main page again, that counts as a single visit. Ruffini wrongly thinks that a second click is counted as another unique visit. It is not. SiteMeter counts a second click as another page view, but page views are entirely different statistics from visits.

Another error in his thinking is that SiteMeter “only” counts the last 100 visitors. No, it counts them all. (Well, except for the ones it misses, which is another complaint about SiteMeter.) It only shows the last 100 visitors, and only in the default free view. When you become visitor 101, you are still tracked as if you were visitor number 15 on that block of visitors that Patrick saw on his screen. But you are no longer seen on the “Last 100 Visitors” screen. And you are just as active on SiteMeter’s radar as you were when you could see your IP address in the Visitor 15 Slot.

On Patrick’s second point, that the clickthrough rate isn’t as high as Andrew Sullivan’s, well, that’s due to a number of factors, and it’s a common problem even with high-traffic sites. I know that I have a high clickthrough rate, even though my blog isn’t a very high traffic blog, because my readers tend to be longstanding readers with similar tastes in reading materials. They also trust my opinions. Linkfests used to be a staple of this blog, and still are when it comes to Haveil Havalim.

People don’t go to DKos for linkage. They go to DKos to read what’s there. Glenn Reynolds’s entire site is about clickthrough. People read Instapundit because they want to find other bloggers or information that Glenn provides. High traffic doesn’t guarantee high clickthrough rates. I’m not surprised that most people did not click through to Patrick’s blog. The DKos readers do not like conservatives. They do not like Republicans. Of course they’re not going to click through.

The content of the link also makes a difference. My highest-traffic links from Glenn all had to do with sex. The Comic Book Superhero Dating Ratings? Through the roof. I think I got nearly 10k hits from Glenn, whereas an ordinary Instalink generated about half that number. If Patrick wants to see DKos clickthroughs in high numbers, he needs to post that he’s given up being a conservative Republican and has joined the ObamaWagon, or some such thing. Or maybe something to do with sex and Democrats.

Lastly, Patrick tries to extrapolate DKos traffic via some arcane formula he invents regarding page views and stats of similar blogs. Ah, no. Bad move. That’s like trying to calculate the traffic on the NJ Turnpike based on the traffic on I-95 in northern VA and the Long Island Expressway. Now he’s just reaching, and looking really silly while he does it.

In short, there are many reasons to criticize Daily Kos. But blaming SiteMeter for inflating DKos visits and pageviews? No. That’s just a case of Patrick not really understanding SiteMeter and server logs. I don’t have the best grasp of them either—it’s been a long time since I read the raw server logs and decoded them for my boss at Lucent—but I do know enough to know that Patrick is way off base on this one.

This entry was posted in Bloggers, Computers.

22 Responses to The DKos visit count non-issue: Ruffini’s wrong

  1. Patrick says:

    This issue has been documented to death in my post and in the comments that follow, and I encourage anyone who’s at all confused about this issue to refer back to them.

    But a couple of points here:

    1) There is no controversy over the page view counts.

    2) In re:

    You are counted by IP address, not by virtue of being on the SiteMeter “last 100 visits” page. If I go to DKos and read a post, click on the message thread, spend the next 40 minutes reading the messages, and then click on the main page again, that counts as a single visit.

    No — that’s not how it works *for extremely high traffic blogs* like Daily Kos.

    If I visit a normal blog with about 1,000 visitors a day (like mine), that is how it works. The last 100 visitor list likely encompasses the last 30 minutes of visits, so if I’m visitor #99 and become active again 29 minutes later, SiteMeter remembers that I was visitor #99, increments my page view count, and resets me to visitor #1.

    The issue is what happens when I drop off the list, and become recent visitor #101. SiteMeter doesn’t remember that I specifically was ever there. The only thing it remembers is that there was once a visit #1,255,593 (that happened to be me), but it remembers nothing about me… not my IP address, not my ISP, not my browser, so it can’t tie my next click to the previous visit *even if it was within the half hour window.* So my next click on the site is treated as both a new page view AND a new visit when in fact it should only be counted as a new page view.

    Even for blogs that cycle through their latest 100 visitors in 15-20 minutes, this is a relative non-issue because so few people hang out on a blog that long. But when the last 100 cycles through every 10-15 seconds (or even every minute) it potentially affects the vast majority of readers, and creates exactly the effect I described with a depressed visit to page view ratio and ridiculously low visit lengths.
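Ruffini’s hypothesis can be simulated by changing one thing in a visit counter: per-visitor data is kept only for the most recent 100 visits. This is a hypothetical sketch of his claim, not of SiteMeter’s actual code. `count_visits_bounded` is an illustrative name, and keeping each entry’s place from the start of its visit is an assumption (it matches what Tom Maguire reports seeing in comment 18).

```python
from collections import OrderedDict
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits_bounded(page_views, capacity=100, timeout=VISIT_TIMEOUT):
    """Count visits when per-visitor data is kept only for the most recent
    `capacity` visits, as Ruffini hypothesizes for free accounts.

    page_views: iterable of (visitor_id, timestamp) pairs.
    Returns (total_page_views, total_visits).
    """
    page_views = sorted(page_views, key=lambda pv: pv[1])
    recent = OrderedDict()  # visitor_id -> last page view, oldest visit first
    visits = 0
    for visitor_id, ts in page_views:
        previous = recent.get(visitor_id)
        if previous is not None and ts - previous <= timeout:
            recent[visitor_id] = ts  # same visit; entry keeps its place in the list
            continue
        # New visit: never seen, idle too long, or already forgotten.
        visits += 1
        recent.pop(visitor_id, None)
        recent[visitor_id] = ts
        if len(recent) > capacity:
            recent.popitem(last=False)  # forget the oldest visit entirely
    return len(page_views), visits


# Usage: one reader clicks twice, 120 s apart, while 100 other visitors arrive.
start = datetime(2007, 1, 1, 12, 0)
views = [("reader", start)]
views += [(f"other{i}", start + timedelta(seconds=i + 1)) for i in range(100)]
views.append(("reader", start + timedelta(seconds=120)))
print(count_visits_bounded(views))                 # (102, 102)
print(count_visits_bounded(views, capacity=4000))  # (102, 101)
```

With room for only 100 visits, the reader’s second click two minutes later is counted as a new visit; with room for 4,000 it is not. Whether SiteMeter really discards data this way is exactly what the rest of the thread argues about.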

  2. Matthew says:

    Meryl,

    The Site Meter Knowledge Center describes the differences between a Basic (free) account and a Premium ($6.95/month) account: http://support.sitemeter.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=8

    The basic account tracks 100 visitors. The premium account tracks the last 4000 visitors.

    You can see this in action by going to DailyKos’s recent visitor page on Site Meter here: http://support.sitemeter.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=8

    Take a look at the Visit Length column. The longest visit of ANYONE on DailyKos happens to be less than the time difference between visitor #1 and visitor #100. No one is being tracked after they drop off the top 100 list. The same goes for Instapundit and any other web page using the Basic account.

    Patrick is right. You need to man up. :)

  3. Patrick, that’s not how it works.

    SiteMeter defines a “visitor” as someone who views a series of pages with no more than 30 minutes between page views. So someone who visits your page once every 31 minutes would be counted as a new visitor.

    Okay, you’re right about one thing, and that’s SiteMeter’s 30-minute new visitor rule. If you’re going to hang out for 30 minutes and do nothing, then SiteMeter is going to count you as a new visitor when you finally activate your cursor. That’s visit inflation, and it probably doesn’t account for much traffic at all.

    That bit aside, you are totally misunderstanding the way a visit is normally tracked. The 100-visitor list you are talking about isn’t an accurate way to determine who is online on a high-traffic site. It simply tells you who is online when you display the list. You don’t disappear from the list because the traffic is high. You move to another part of it, and it’s unseen, because that’s the paid version, not the free version.

    The 30-minute rule that you keep mentioning counts YOUR visit, not the number of visits in 30 minutes.

    Really, Patrick, you’re not getting how site trackers work. You’re confusing a bell-and-whistle with a nuts-and-bolts.

  4. Matthew, see above. You are both confusing the LIST of visitors with the TRACKING of visitors.

    Time to show you a raw server log of Yourish.com, methinks.

  5. Matthew says:

    Site Meter only tracks past 100 if you pay them to. Kos isn’t paying them to. Site Meter says that in their manual.

    From SiteMeter:

    http://support.sitemeter.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=122

    Why can I only see the last 100 visitors?

    Solution – If you have a basic account, you only have access to the last 100 visitors to your site. You can increase your visitor tracking to 4000 if you upgrade to a premium account. Unfortunately, you will not be able to immediately see the details of the last 4,000 visitors to your site if you upgrade your account. Once you have upgraded your account and you get more visitors to your site, the list will grow to a maximum of 4,000 visitors.

    So Yourish, if they were tracking more than 100 all along, why can’t they show you those figures as soon as you upgrade to the Premium account?

    Because they are NOT storing past 100 if you don’t pay them to.

    Try the experiment I described above.

  6. Tom Maguire says:

    OK, I understand what Sitemeter claims as their definition of unique visits, and I understand why keeping track of only the last 100 visitors could be a big problem at high traffic sites.

    But other than Meryl recycling the Sitemeter definition, I have not seen any evidence that Sitemeter takes the steps it would need to take to accurately track traffic for a high-visit site.

    Just for example, if a basic account only tracks the last 100 visitors, why is Meryl so sure that Sitemeter tracks a longer list for purposes of tracking unique visits? We all agree that they ought to, but that is hardly the same as agreeing that they do.

    Meryl’s rebuttal seems to amount to, “Sitemeter is doing it right because any idiot can see the right way to do it.” I’m not convinced (I hope that doesn’t make me an idiot…).

    And strongly in Patrick’s favor – why is the average visit to DKos only about one second – does that really jibe with anyone’s image of how that site is used, given the multitudinous comments?

  7. Patrick says:

    You say tomato, I say toh-mah-toh…

    I think this is probably a basic misunderstanding on semantics here. For instance,

    “The 30-minute rule that you keep mentioning counts YOUR visit, not the number of visits in 30 minutes.”

    I never said anything about the aggregate number of visits in thirty minutes. Your visit is exactly what I’m talking about.

    My argument is that the actual rule for “double counting” a visit in practice is 30 minutes or however long SiteMeter can store individual visit data for, *whichever is shorter.* If you have the free version that only stores 100 visits, that can be as low as 12 seconds if you’re Kos. If you have the premium version which stores 4,000 visits, it’s however long your site takes to get 4,000 visits, whether or not it’s shorter than half an hour.
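The “whichever is shorter” rule reduces to simple arithmetic. A sketch, assuming Ruffini’s premise that per-visitor data lasts only as long as the visitor stays among the last `capacity` visits; the 30,000 visits per hour figure is Tom Maguire’s estimate from comment 14, and `effective_window_seconds` is a hypothetical name.

```python
def effective_window_seconds(visits_per_hour, capacity, timeout_minutes=30):
    """Seconds during which a returning click still joins the same visit,
    under Ruffini's assumption that per-visitor data survives only while
    the visitor is among the last `capacity` visits."""
    seconds_to_cycle_list = capacity * 3600 / visits_per_hour
    return min(timeout_minutes * 60, seconds_to_cycle_list)


print(effective_window_seconds(30_000, 100))      # 12.0  (free account, Kos daytime)
print(effective_window_seconds(30_000, 4_000))    # 480.0 (premium account, same traffic)
print(effective_window_seconds(1_000 / 24, 100))  # 1800  (about 1,000 visits a day)
```

On a site with about 1,000 visits a day, 100 visits take well over two hours, so the 30-minute rule is the binding limit, which is why Ruffini says the issue only shows up on very busy sites.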

    SiteMeter probably doesn’t specify this because the number of blogs for which this is an issue is exceedingly small, and when SiteMeter first took off, no one had a site on the network that big.

    “You move to another part of it, and it’s unseen, because that’s the pay-per-view version, not the free version.”

    There’s no evidence that on free counters (which is what Kos has) SiteMeter stores anything more than the last 100 visits. It would make no sense for them to keep the other 3,900 visits in reserve, because their whole premium product revolves around paying them for having to store 40x more data.

    Once again, if you look at Kos’s Detail screen, you won’t find a single visitor whose visit exceeds the amount of time it takes for the 100-visitor list to cycle through. If it were really a thirty minute rule, we would see visit lengths up to half an hour sprinkled through the list. No matter how many times you refresh, you never will.

    http://www.sitemeter.com/?a=stats&s=sm8dailykos&r=8

    You may have the premium version, but these are the actual limitations of the free version.

  8. Jim Hu says:

    Trackers like Sitemeter don’t know how long you are on a site. They only know the time gaps between clicks to pages in the same site with the same Sitemeter account. It’s really, really unlikely that the js is sending mouse events to Sitemeter.

    The typical visit to dKos is 0:00, which is someone goes to the site, reads the main page, and leaves. If the person leaves 29 minutes later, or an hour later, it still counts as a 0:00 visit. So that doesn’t support Patrick’s argument. It doesn’t disprove it either.

    What I think supports Meryl’s argument is that when I refresh dKos’s SiteMeter page, the 100 most recent get totally recycled in a few seconds. But some of these are showing visits longer than the refresh times, which means that SiteMeter must remember them past the scroll-off for the top 100.

    I could probably test this more rigorously by actually visiting dKos, but I have no desire to go that far!

  9. Kralizec says:

    The question is one as to whether there is a defect in Sitemeter’s tracking software. Meryl Yourish makes Sitemeter’s documentation authoritative as to how Sitemeter’s software really works; this seems such an obvious error that I disdain further comment on it. Yourish also relies on experience at Lucent as a basis for a claim as to how visitor tracking usually works. But whether Yourish’s experience at Lucent is relevant depends partly on whether Sitemeter’s software has the defect Patrick Ruffini thinks he has detected. Whether Sitemeter’s tracking works the way tracking supposedly usually works also depends on whether Sitemeter’s software has that defect. Moreover, the way software usually works is this: Software has defects. Meryl Yourish doesn’t appear to have settled anything.

  10. There’s no evidence that on free counters (which is what Kos has) SiteMeter stores anything more than the last 100 visits. It would make no sense for them to keep the other 3,900 visits in reserve, because their whole premium product revolves around paying them for having to store 40x more data.

    Um, yes, there is.

    Do you see the little number on the top of a SiteMeter summary screen? The one that says “Total” and has a number of the total visits ever tracked to your site? That’s an aggregation. So is the “average number of daily visits,” “average visit length,” and the rest of the aggregate statistics on the summary page. The SiteMeter statistics do not disappear into the ether after you refresh the screen. This is the fatal flaw in Patrick’s theory. The data is there. Just because you can’t see it doesn’t mean it isn’t being stored. What kind of site statistic tracker wouldn’t store information for customer use?

    “The latest 100 visitors” displays exactly that–the latest 100 visitors. If I go to DKos, and 100 people visit immediately after me, I am no longer on the list because THEY are the latest 100 visitors. I’m not going to magically appear on the screen again even if I click to another page. Your conclusion is incorrect because your premise is flawed.

    This is easily proven by testing it on a lower-traffic site. Whether the site is high- or low-traffic is not relevant. The question is, do you get counted twice because you move down the 100 Latest Visitors list? The answer is: No. SiteMeter specified that you get counted a second time after being inactive for 30 minutes. No other way.

    I admit I was getting a bit forgetful that SiteMeter uses JavaScript and cookies, not server logs, for data. But a JavaScript tracker can collect a hell of a lot of data with each page view, including your IP address, the last site you visited, the referrer site, your browser, what software your computer runs on, and a host of other information. Don’t think that SiteMeter lets a drop of that information out of its clutches. That’s the business model—collecting and using that information.

    Really, guys, you’re way off the mark on this one. Stick to politics and leave the web tech alone.

  11. Patrick says:

    Do you see the little number on the top of a SiteMeter summary screen? The one that says “Total” and has a number of the total visits ever tracked to your site? That’s an aggregation.

    After you drop off the 100 list or the 4,000 list (from premium accounts), it knows there was once a visitor #1,865,533 — and THAT’S IT. It doesn’t keep my IP address, my browser, or any identifying information about me or use my cookie data to reference me back to that original visit. That data (critical for assessing visits within that 30 min window) is ONLY stored for purposes of the application within the 100/4,000 most recent visitors list. Reading by rote from SiteMeter’s FAQ doesn’t resolve the pretty clear anomalies you see on the DailyKos/Gizmodo/Lifehacker/Instapundit reports.

    Matthew (#5) was also pretty definitive on this. By far the most likely reason you don’t see extra visitor info immediately upon upgrading is that SiteMeter simply does not store personal info past 100 visitors for basic accounts.

    SiteMeter specified that you get counted a second time after being inactive for 30 minutes. No other way.

    Are you a spokesperson for SiteMeter now? They couldn’t possibly have any bugs in their code, or have their software perform differently in extreme circumstances. Stuff like that never happens. :-)

    Don’t think that SiteMeter lets a drop of that information out of its clutches. That’s the business model—collecting and using that information.

    I don’t buy it. Fundamentally, they are an ad company, not a data company like Google. Look at all the advertising in the free accounts. Storing every bit of data on the third party sites they track is immaterial to their ability to sell ads on counter pages.

    Google on the other hand…

  12. tolbert says:

    Okay,

    When is somebody going to ask Den Beste how this shit really works?

  13. Mark Jaquith says:

    When is somebody going to ask Den Beste how this shit really works?

    Can’t we just take their FAQ at face value and accept that visits time out after 30 minutes?

    By far the most likely reason you don’t see extra visitor info immediately upon upgrading is that SiteMeter simply does not store personal info past 100 visitors for basic accounts.

    That may be true (that visitor “details” are only stored for display purposes). But they don’t need to store “details” to track visitors accurately. They need an IP and a timestamp. That’s it. I left more info in a comment on your site, Patrick.
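Mark’s point, that an IP and a timestamp are enough, can be illustrated with a tracker whose memory expires by time rather than by count. This is a hypothetical sketch (the `VisitTracker` class is invented for illustration) and says nothing about how SiteMeter actually stores its data, which is the very thing in dispute. It assumes page views arrive in timestamp order.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class VisitTracker:
    """Hypothetical tracker keeping only IP -> time of last page view.

    Entries are forgotten once they have been idle longer than the timeout,
    so memory is bounded by how many people clicked in the last 30 minutes
    rather than by a fixed count of 100 visits.
    """

    def __init__(self, timeout=timedelta(minutes=30)):
        self.timeout = timeout
        self.last_seen = OrderedDict()  # ip -> last page view, oldest first
        self.visits = 0
        self.page_views = 0

    def record(self, ip, ts):
        self.page_views += 1
        previous = self.last_seen.pop(ip, None)
        if previous is None or ts - previous > self.timeout:
            self.visits += 1  # first click, or idle for more than the timeout
        self.last_seen[ip] = ts  # re-insert as the most recently seen IP
        # Evict IPs idle longer than the timeout: their next click would
        # start a new visit anyway, so forgetting them loses nothing.
        cutoff = ts - self.timeout
        while self.last_seen:
            oldest_ip, oldest_ts = next(iter(self.last_seen.items()))
            if oldest_ts >= cutoff:
                break
            del self.last_seen[oldest_ip]


# Usage
start = datetime(2007, 1, 1, 12, 0)
tracker = VisitTracker()
tracker.record("192.0.2.1", start)
tracker.record("198.51.100.7", start + timedelta(minutes=1))
tracker.record("192.0.2.1", start + timedelta(minutes=20))  # same visit
tracker.record("203.0.113.5", start + timedelta(minutes=60))
print(tracker.page_views, tracker.visits, list(tracker.last_seen))
# 4 3 ['203.0.113.5']
```

Because only entries idle for more than 30 minutes are dropped, a busy site needs more memory under this design, but a returning click is never mistaken for a new visit early.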

  14. Tom Maguire says:

    A simple experiment easily performed at home indicates Patrick is right.

    During the day, Kos is getting about 30,000 visits per hour, or 100 every 12 seconds.

    So: open three copies of the Kos detail page, each listing 100 visitors.

    Refresh them roughly twenty seconds apart.

    If Sitemeter does not have the “Last 100” hidden constraint described by Mr. Ruffini, then the three lists should contain *zero* duplicates for IP addresses.

    But if Sitemeter does have the hidden constraint, then folks who click a second page after twelve seconds will appear twice.

    I did a cursory try (copying everything into Excel and sorting) and got twelve duplicate IP addresses among 118 ostensibly unique visitors over about a one minute span. And yes, the identical visits occurred more than twelve seconds apart. For example, one IP was visitor 15 on one page and visitor 2 on another.
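Tom’s copy-and-sort check can also be scripted. A minimal sketch, assuming each snapshot is a list of entries copied from the visitor page; the snapshot data below is made up (documentation-range IPs), and `looks_like_ip` and `duplicate_ips` are hypothetical names. Entries that are a bare ISP name rather than an IP are skipped, as Tom describes in comment 18. A repeat only shows that an IP reappeared on the list; as comments 15 and 16 point out, it does not by itself show that a second visit was counted.

```python
import ipaddress
from collections import Counter


def looks_like_ip(entry):
    """True if the entry is a literal IP address rather than, say, an ISP name."""
    try:
        ipaddress.ip_address(entry)
        return True
    except ValueError:
        return False


def duplicate_ips(snapshots):
    """Compare several snapshots of a 'last 100 visitors' list.

    Returns (number of distinct IPs, sorted IPs seen in more than one snapshot).
    """
    appearances = Counter()
    for snapshot in snapshots:
        for entry in set(snapshot):  # count each IP at most once per snapshot
            if looks_like_ip(entry):
                appearances[entry] += 1
    repeats = sorted(ip for ip, n in appearances.items() if n > 1)
    return len(appearances), repeats


# Hypothetical snapshots taken about twenty seconds apart.
snapshots = [
    ["192.0.2.1", "Comcast", "198.51.100.7"],
    ["192.0.2.1", "203.0.113.5"],
    ["198.51.100.7", "Optonline"],
]
print(duplicate_ips(snapshots))  # (3, ['192.0.2.1', '198.51.100.7'])
```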

    So my money is firmly on Patrick here.

  15. Tom, were the duplicate IP addresses Comcast, Verizon, AOL, Optima, or other network services? Without that information, your duplicate IP numbers are meaningless.

  16. Mark Jaquith says:

    If Sitemeter does not have the “Last 100” hidden constraint described by Mr. Ruffini, then the three lists should contain *zero* duplicates for IP addresses.

    You’re assuming that if a person moves up on the list, they’ve been counted as a new unique visit. I do know my IP address, and was able to track myself. It appears to display the last 100 unique IP addresses that accessed the site, sorted by the timestamp of their most recent page view (in descending order). So yes, if I let myself fall off the page and refresh, I’ll appear in the list again, but that does not mean that SiteMeter is counting me as a new unique visit.

  17. Mark, will you stop trying to inject logic into this debate, please? You’re making our job too hard.

  18. Tom Maguire says:

    Tom, were the duplicate IP addresses Comcast, Verizon, AOL, Optima, or other network services? Without that information, your duplicate IP numbers are meaningless.

    First of all, feel free to try this yourself – opening three pages and hitting refresh twenty seconds apart can be done in less than a minute. The copy/paste/sort is not exactly technological high frontier, either.

    Secondly, more than half the details provided by Sitemeter are a generic “Comcast,” “Optonline,” or whatever, and I dropped those. Among 118 that presented a specific IP address, I got the twelve duplicates noted.

    From Mark:

    I do know my IP address, and was able to track myself. It appears to display the last 100 unique IP addresses that accessed the site, sorted by the timestamp of their most recent page view (in descending order).

    Really? I went to a very low traffic site and my timestamp has not changed even though I clicked on a total of five pages over a span of twenty seven minutes. And two folks appeared after I did, but my place on the list remained based on my *first* appearance, not my final refresh.

    Interestingly, I have now been there a total of thirty-three minutes but it still thinks I am one unique visitor; an obvious explanation is that Sitemeter is also tracking my “Last refresh” (as well as first visit) and will not think of me as new again until a half hour elapses from my last refresh.

    More details – Sitemeter is keeping a correct count of the pages I have visited at the low traffic site, but is holding my place based on the first entry timestamp, as noted.

    So, to test Mark’s notion, one might wonder – if you drop off the Hot 100 Sitemeter page and then open a new page at the test site do you (a) show up as a new visitor with a page view count of 1, or (b) show up as a new visitor with a page count of 2?

    I have switched computers yet again but I am pretty sure that the duplicate visitors in my Kos test were scored as hitting their first page, even though the time lag was about twenty seconds.

  19. Tom Maguire says:

    OK, now I have tracked myself at a medium traffic site (my own). I went there and quickly opened five pages, which made me very easy to find in the Sitemeter log.

    After about five minutes I moved out of the top 100; I opened another page and there I was, back at number 1, and showing total page views equal to one.

    That is what the Ruffini Hypothesis would have predicted; here is what Mark said:

    So yes, if I let myself fall off the page and refresh, I’ll appear in the list again, but that does not mean that SiteMeter is counting me as a new unique visit.

    Let’s see – it shows me on the visitor page with a new timestamp and one page view, after spending five minutes correctly counting me as a one-time visitor opening multiple pages.

    OK – that does not *prove* that Sitemeter is not handling it correctly, since maybe all their stats are based on calculations to which we lack access. Of course, that amounts to a faith-based initiative that Sitemeter is doing it right.

    But what they are doing that we can see is certainly consistent with the notion that after 100 visitors you are new again regardless of elapsed time. At Kos, in prime time, folks are new after ten-fifteen seconds; at my site, it takes five minutes.

    More data – back at my test low-traffic site, I am now appearing twice on the main page as a unique visitor. I arrived at 4:43, did my next-to-last page-open at about 5:10, and just opened another page at 5:48.

    As predicted by Sitemeter fans, since a half-hour had elapsed since my last page-open, I am unique again.

    I think I am understanding this.

  20. CGHill says:

    I may have to play with this myself, since (1) I have what might be characterized as “light” traffic (600-700 daily) and (2) I am one of the nine or ten people on earth who actually pays for SiteMeter and therefore can go back 4,000 if I have to.

  21. Not unique, Tom, but a new visitor, yes. Turns out Patrick was right.

  22. Tom Maguire says:

    That’s interesting.
