- Availability calculated for the first time
- In 264,000+ minutes, 412 minutes’ downtime
- Why Dropbox had a bad Christmas
Cloud storage service Dropbox has been written about more times than any of its competitors. One fact you won’t find in any article, however, is a statement of its reliability, or availability in IT speak.
Cloud storage providers have been reluctant to commit to a guaranteed level of availability despite the millions of people who rely on them. Dropbox alone has over 50 million registered users yet none of them know how many hours each month the service is expected to stay online.
While consumers may be happy to take their chances, businesses need to know they can access their files when they need to.
BoxFreeIT has studied six months of transmission logs from network monitoring service Pingdom to calculate the monthly uptime for Dropbox from July to December last year.
In over 264,000 minutes of operation, Dropbox was offline for just 412 minutes. But this is only half the story.
Although Dropbox was offline for a total of about seven hours over six months, Pingdom recorded abnormal results for another 574 minutes. Pingdom called these hiccups “unconfirmed down” events, and they were often accompanied by notes such as 503 error, 404 error or timeout.
These were usually isolated one-minute events, unless they occurred in the lead-up to an outage.
Should “unconfirmed down” events be subtracted from Dropbox’s record of reliability? The fairest answer was to work out whether a Dropbox user could use the service as intended during an “unconfirmed down” event.
Is an Unconfirmed Down really down?
Availability is sometimes a rubbery figure. Some cloud providers don’t include scheduled downtime where they take the system offline for maintenance or upgrades. Another tactic is to exclude downtime if it doesn’t exceed a minimum threshold.
Google used to ignore downtime of under 10 minutes. When it changed its availability policy in 2011 and scrapped the 10-minute minimum, it claimed to be the first cloud provider to count all downtime against its availability.
Users are often unaware of exceptions or conditions in calculating availability. For them the metric is much simpler – does it work? If it doesn’t, the system isn’t available. Uptime is not the same as availability.
Back to those unconfirmed downs. When Dropbox showed a 503 error or a timeout, the Pingdom server couldn’t connect to Dropbox. The Pingdom server would often check again in the same minute and return an “OK” result.
But the unconfirmed downtime regularly coincided with longer connection times. In some cases where repeated unconfirmed downtime events occurred closely together, these effectively made Dropbox unusable.
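One way to separate isolated blips from the bursts described above is to group “unconfirmed down” minutes that fall close together. The sketch below is illustrative only: the function name `cluster_minutes`, the five-minute gap and the sample minute offsets are all assumptions, not values taken from Pingdom’s logs.

```python
from typing import List, Tuple


def cluster_minutes(minutes: List[int], max_gap: int = 5) -> List[Tuple[int, int, int]]:
    """Group minute offsets into clusters of nearby events.

    Returns one (first_minute, last_minute, event_count) tuple per cluster.
    Two events share a cluster when they are at most `max_gap` minutes apart.
    The default gap of 5 is an assumed threshold, not a Pingdom setting.
    """
    clusters: List[Tuple[int, int, int]] = []
    for m in sorted(minutes):
        if clusters and m - clusters[-1][1] <= max_gap:
            first, _, count = clusters[-1]
            clusters[-1] = (first, m, count + 1)  # extend the current cluster
        else:
            clusters.append((m, m, 1))  # start a new cluster
    return clusters


# Hypothetical minute offsets of "unconfirmed down" checks within one day.
sample = [12, 300, 302, 303, 307, 900]
for first, last, count in cluster_minutes(sample):
    label = "isolated" if count == 1 else "burst"
    print(f"minutes {first}-{last}: {count} event(s), {label}")
```

Under these assumptions a single failed check is treated as noise, while several failures within a few minutes are flagged as a period when the service was probably unusable.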
There was at least one incident in late August where Dropbox remained online but users were unable to access the service. The issue was serious enough for a Dropbox executive to make a public announcement.
“We’re experiencing heavy load which is resulting in intermittent slowness/downtime,” Dropbox’s chief technical officer and co-founder Arash Ferdowsi posted in the company’s forums on August 21.
“In some cases there’ll be a delay when syncing files through the desktop client but the delay shouldn’t last more than a minute or two.”
The Dropbox logs show that there were seven minutes of unconfirmed downtime on August 20 and eight minutes of unconfirmed downtime on August 21. Although Dropbox didn’t record a “down” status over that 48-hour period, the performance was poor enough to prompt coverage by tech site TechCrunch.
The forum post by the Dropbox CTO also attracted 30 comments from disgruntled users.
In short, a cloud service must not just be active; it must connect to users at a reasonable speed to be considered available. While the average response time was generally around 513 milliseconds, it could blow out, with the maximum a whopping 29,066 ms (on 23 August).
The logs still recorded that connection as up.
Was every unconfirmed down long enough to cause an issue for a user somewhere? It’s impossible to know. The safest conclusion is that a percentage of “unconfirmed down” events made the service unusable.
In the best case scenario, which includes only confirmed downtime, Dropbox’s average availability was 99.85 percent. From July to September the service had a great track record of 99.95 percent but some heavy outages towards the end of the year, especially over Christmas, dragged the results down.
Dropbox recorded 106 minutes of downtime in a 48-hour period during Christmas Eve and Christmas Day. The most likely culprit was an outage at Amazon Web Services, which stores files for Dropbox. (At the time the headlines focused on the impact on Netflix.)
In the worst case scenario, which includes downtime and unconfirmed downtime, Dropbox’s average availability was 99.63 percent. The worst month was December which scraped through at 99.04 percent.
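The best and worst cases can be roughly reproduced from the six-month totals quoted in this article. This is a minimal sketch, not the method used for the published figures: it pools all six months into one calculation, so the best case comes out at about 99.84 percent rather than 99.85 percent, presumably because the published number averages the monthly results (an assumption). The names `availability_pct`, `best` and `worst` are illustrative.

```python
TOTAL_MINUTES = 264_960    # 184 days from July to December, in minutes
CONFIRMED_DOWN = 412       # minutes of confirmed downtime
UNCONFIRMED_DOWN = 574     # minutes of "unconfirmed down" results


def availability_pct(down_minutes: float, total_minutes: float = TOTAL_MINUTES) -> float:
    """Share of the period the service was up, as a percentage."""
    return 100.0 * (1 - down_minutes / total_minutes)


# Best case counts confirmed downtime only; worst case adds unconfirmed downs.
best = availability_pct(CONFIRMED_DOWN)
worst = availability_pct(CONFIRMED_DOWN + UNCONFIRMED_DOWN)

print(f"best case:  {best:.2f}%")
print(f"worst case: {worst:.2f}%")
```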
See the table below for the month-by-month breakdown.
What does this mean for my business?
A business that expects Dropbox to function 99.85 percent of the time could see up to an hour and seven minutes of downtime in a 31-day month. When that percentage drops to 99.63 percent, the acceptable downtime jumps to two hours and 45 minutes a month.
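The conversion from an availability percentage to a monthly downtime allowance is simple arithmetic. The sketch below assumes a 31-day month, as in the paragraph above; the function names are illustrative.

```python
from typing import Tuple


def downtime_budget_minutes(availability_pct: float, days: int = 31) -> float:
    """Minutes of downtime allowed in a month of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def as_hours_minutes(minutes: float) -> Tuple[int, int]:
    """Round to the nearest minute and split into (hours, minutes)."""
    return divmod(round(minutes), 60)


for pct in (99.85, 99.63):
    hours, mins = as_hours_minutes(downtime_budget_minutes(pct))
    print(f"{pct}% availability allows about {hours} h {mins} min of downtime")
```

Passing `days=30` or `days=28` shows how the allowance shrinks in shorter months.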
That is getting close to the 3.8 hours of downtime a month recorded by the average email server, according to research by the Radicati Group (a paid report mentioned in Google’s post).
In the best light Dropbox is nearly as reliable as benchmark services from Google and Microsoft. The cloud productivity suites Google Apps and Microsoft Office 365 both claim 99.9 percent availability (downtime of 43.8 minutes a month).
Even at its worst Dropbox still comes out ahead of the average on-premise server as measured by Radicati. On-premise servers can vary enormously in uptime due to factors such as quality of equipment and the competence of the IT administrator.
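To put these benchmarks on the same scale, the downtime figures can be converted back into availability percentages. The sketch below assumes an average month of 43,800 minutes (a 365-day year divided by 12), which is the basis that yields 43.8 minutes for 99.9 percent. Applying the same month length to the Radicati figure is an assumption, and the function names are illustrative.

```python
AVG_MONTH_MINUTES = 365 * 24 * 60 / 12  # 43,800 minutes in an average month


def availability_from_downtime(hours_down: float) -> float:
    """Availability percentage implied by monthly downtime in hours."""
    return 100 * (1 - hours_down * 60 / AVG_MONTH_MINUTES)


def downtime_from_availability(pct: float) -> float:
    """Monthly downtime in minutes implied by an availability percentage."""
    return AVG_MONTH_MINUTES * (100 - pct) / 100


print(f"99.9% availability -> {downtime_from_availability(99.9):.1f} min a month")
print(f"3.8 h of downtime  -> {availability_from_downtime(3.8):.2f}% availability")
```

On that basis the average email server sits at roughly 99.48 percent, below Dropbox’s worst case of 99.63 percent.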
Some caveats for this exercise. The exact availability of a cloud service is very difficult to determine and the numbers above should be used as indications.
The availability of a cloud service depends in part on the route by which a user connects to it. Over the six-month period Pingdom measured Dropbox’s performance from servers in 12 locations including Amsterdam, England, Canada, France, Czech Republic, Germany and several states in the US.
Network issues in one country may not be experienced elsewhere.
Pingdom’s logs also appeared to be short by about 425 minutes (264,535 entries versus 264,960 minutes in the six months). It wasn’t possible to determine the cause of the discrepancy.
The missing information represented 0.2 percent of the total collected. Dropbox’s larger outages were corroborated with other sources where possible.
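The size of the gap can be checked from the calendar. This is a small sketch that assumes one log entry per minute; the variable names are illustrative.

```python
# Days in each month from July to December (the same in any year).
DAYS = {"Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31, "Nov": 30, "Dec": 31}

expected = sum(DAYS.values()) * 24 * 60   # minutes in the six-month period
logged = 264_535                          # entries found in the logs
missing = expected - logged
share = 100 * missing / expected          # missing entries as a percentage

print(f"expected {expected} minutes, logged {logged}, missing {missing}")
print(f"missing share: {share:.2f}%")
```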
Next week: Box, SkyDrive and Google Drive – as reliable as they claim?
Image credit: Design Tickle