Mike Osier, head of IT operations for Netflix, explains what caused the 3-day service disruption on the Netflix Community Blog:
Now that things are back to normal following last week’s shipping outage, I’d like to shed some light about what happened, why, and what we’re doing about it. On Monday, 8/11, our monitors flagged a database corruption event in our shipping system. Over the course of the day, we began experiencing similar problems in peripheral databases until our shipping system went down. It was going to be a long night.
We suspected hardware and moved the shipping system to an isolated environment, gradually getting DVD shipments moving again. Eventually the system was repaired and shipping returned to normal conditions.
With some great forensic help from our vendors, root cause was identified as a key faulty hardware component. It definitively caused the problem yet reported no detectable errors. We’ve taken steps to fortify our shipping system with the acquisition of additional equipment and worked with our vendors to verify we’re in good shape elsewhere.
Hope this was helpful and thanks for your patience.
Let's put this in the BS to English translator:
Input BS:
"With some great forensic help from our vendors, root cause was identified as a key faulty hardware component."
Output English:
"Database performance was terrible, Netflix had no idea why and no idea what to do. Netflix contacted vendor-x. Vendor-x suggested installing a bunch of new high-end hardware. Netflix waffled for a couple hours while their in-house database engineers struggled to find/fix the problem. Management lost patience with database engineers and decided to throw hardware money at the problem. New hardware installed, masks real problem for unknown period of time. In the near future problems will come back."
Posted by: ScottZ | August 26, 2008 at 10:34 PM
ScottZ: Did you read any more than that one sentence?
"On Monday, 8/11, our monitors flagged a database corruption event in our shipping system." That's not a performance issue. And they didn't say the hardware wasn't fast enough, they said it was "faulty."
I've encountered faulty hardware. Have you? Put in a replacement part, same make-and-model, and the system goes back to working normally.
Posted by: Tom | August 26, 2008 at 11:48 PM
SCOTT YOU IS ABSOLUT CORRECT! IS IT CONSPIRACY TO STEAL OUT MANHOODS AND SLEEP WITH OUR WOMENFOLK! YOU IS SO SMART TO SEE THROUGH THE EVIL CONSPIRASY THAT I AM TURNED ON. I TOTALLY WANT YOU! NETFLIX MADE MY WIFE EXPLODE AND MY CHILDREN ABDUCTED BY ALIENS FROM PLANET X. THEY MAKE MY PENIS SHRIVEL TO A RAISIN. I'M SURE YOU KNOW THAT FEELING, BUDDY. KEEP UP THE GOOD FIGHT. NEVER GIVE OUT, NEVER ASUNDER!
Posted by: Scott Z's New Best Buddy | August 27, 2008 at 10:45 AM
These dipshits don't have redundant systems to failover to? What the hell?
Posted by: Anonymous | August 27, 2008 at 04:15 PM
I had no movies from Netflix last weekend, so I headed down to Hollywood Video. They're promoting unlimited free blu-ray rentals. The only catch is, only one free rental out at a time. You might want to check it out.
Posted by: BuffyVee | August 27, 2008 at 05:03 PM
Anonymous: You're chasing the red herring of hardware failure. The issue isn't that a piece of hardware failed. If it just failed, no big deal -- your redundant systems can handle it, or worst case, you shut down for an hour or two and bring up systems on standby.
The problem is that the hardware failed in such a way as to result in a *database corruption*. And then the corruption started spreading as the servers replicated with each other. The downtime was how long it took to set up a fresh set of servers from scratch.
Posted by: Tom | August 27, 2008 at 07:05 PM
I don't buy it for one minute. I'll bet some disgruntled person got fired on Monday and urinated on the servers. That is why the Bobs say that you should always let people go on a Friday. It leads to less confrontation.
Posted by: leonardodicrapio | August 27, 2008 at 07:20 PM
While this explanation sounds plausible, I still stand by the theory that the issue they said keeping profiles was going to cause is what made this crash happen.
Posted by: Akbar | August 27, 2008 at 11:05 PM
Data corruption? Replace parts? Redundant system?
S**t bro, they use Micro$oft products just like everyone. Now if they'd used a Unix-based system, this never would have happened.
How do you spell "Sun Computer Systems"?
Get off the Micro$oft monkey. Go Unix.
Now if you don't buy that, I propose a hacker ripped off all their confidential data, and it took netflix.com a few days to cough up the 5 million bucks in ransom to get their data back.
Posted by: Bunhole | August 28, 2008 at 03:04 AM
I would also like to thank all of the above members of netflix.com for clarifying this complicated technical issue for me.
You are the greatest, and in the true tradition of IT cannot agree on the cause of anything.
I love you all, Bunghole
PS I really, really hate Micro$oft; and several years back when the Soviet thugs hacked into Redmond, WA, only a few miles down the road from where I live and ripped off the code for Windo$e XP, I got some really good dope and was in a stoned groove for a week.
Posted by: Bunhole | August 28, 2008 at 03:22 AM
cyborg400,
That's disrespectful. Don't go there.
Posted by: Galofree | August 28, 2008 at 11:52 AM
Inquiring minds want to know: what was the hardware component and who's the vendor of the hardware component?
Until that information is published, why wouldn't we all just think that Osier smokes hash in the computer room? Heck, maybe it was Osier's secondary hash smoke that caused the hardware failure. ;-)
Posted by: Edward R Murrow | August 28, 2008 at 12:22 PM
So, Mr. Murrow, have you had any issues getting your Netflix discs recently?
Posted by: Inquiring Mind | August 28, 2008 at 04:23 PM
Too much hash, can't remember any Netflix shipping issues. Although I did give the test pattern that I watched for 2 hours a five star rating.
Posted by: Edward R Murrow | August 28, 2008 at 04:53 PM
If you guys weren't making me laugh so much I could tell ya all a story about discs still taking 3 or 4 days to get to me, while returns only take their usual one. But, maybe it's really, really good hash...
Posted by: eviltimes | August 28, 2008 at 05:32 PM
I think Ed and I have identified possible sources of the outage, which I will repeat in a more PC manner as: water, fire, and/or smoke damage.
Of course maybe Raj and Michael Bolton took one of the servers out to the field with a baseball bat and a couple drop kicks.
Posted by: leonardodicrapio | August 28, 2008 at 06:30 PM
Is the customer service honeymoon over? After being a subscriber for about 15 months and experiencing nothing but exemplary customer service from Netflix, the tide has finally turned. There was once a time when Netflix customer support personnel would automatically send a bonus disc for the slightest oversight or delay. That is what made Netflix one of the best customer-oriented vendors nationally, ranking alongside American Express. Now a Netflix customer service employee tells me that management is curtailing the bonus DVD compensation no matter how infrequent. Take good care of your loyal subscribers because it is easier to keep your current customers than it is to gather new ones, especially in a soft economy.
Posted by: | August 28, 2008 at 10:03 PM
"Is the customer service honeymoon over?"
Yup. People who abused the policy ruined it for everyone else. The customers who rent a lot of discs and call and demand a bonus disc whenever there's a day's delay are obviously worth a lot less than the average customer.
Posted by: kh | August 29, 2008 at 08:53 PM
Customers who rent more movies will get throttled and will end up with just a few movies per month, only instead of actually holding the DVD, they'll be tracking its way throughout the country.
Unlimited service? I wish it was limited, that way I wouldn't have to keep track not to pass the 'one week cycle'.
Posted by: Ned | August 30, 2008 at 02:57 PM
Love ya cyborg400 -
You have such a straight sense of humor. Let's have a beer and I'll let ya in on my real feeling, bro.
You da man, Bunghole
PS What really fries my buns is Bill ripping off consumers for an OS that competitively should sell for around $50, and that they sell for $200, and it's a crummy system. Then, like Rockefeller, who repressed the working class proletariat thereby filling his money coffers with blood money, he becomes a philanthropist in his later years to groups who the people he ripped off will never get within 10 miles of, just like that elitist swine Bill Gates.
It's only the naive consumers who have been misled and manipulated, so they are reluctant to go Unix/Linux, and the illegal marketing of Micro$oft that eliminates any market software developers might fill with products netflix.com could use to stream to Unix/Linux.
Hell, since 1996 I've had a partition with every Slackware edition since 3.9, currently 12.0, that has a comparable open source product installed, that is free, comparable to every product Micro$oft sells. If netflix.com wasn't dependent on Windoze cause consumers are too meek to buy anything else, their whole system would be more stable, and cost less so we would pay less money. I've never had file corruption, system crashes, it's hacker and virus resistant, ya never has to defrag, and it's faster.
Posted by: Bunghole | August 30, 2008 at 03:15 PM
"PS What really fries my buns is Bill ripping off consumers for an OS that competitively should sell for around $50, and that they sell for $200, and it's a crummy system."
If Windows sold for $200, you wouldn't be able to buy a computer for $300-$400.
Posted by: leonardodicrapio | August 30, 2008 at 03:40 PM
Profiles! Didn't they just add the ability to create new profiles?
Posted by: Spimby | August 30, 2008 at 10:38 PM
Things aren't back to normal. Twice this week I had two DVDs arrive a day late, which puts one set into next week, thus saving NETFLIX additional money as a result of the Labor Day Holiday. I believe these delays are intentional as NETFLIX is trying to recover the 15% credit they are giving us.
Posted by: RAYMOND KNIGHT | August 31, 2008 at 09:36 AM
says the riddler - blame the colorful bird that's been in the nest for too long and laid no eggs.
Posted by: joker | August 31, 2008 at 06:19 PM
Funny how disc 1 of tv show ships from southeastern PA, but then disc 2 and 3 ship from the closest shipping center to me. Sure looks like they are trying to get that money back.
Posted by: wufan1981 | September 01, 2008 at 03:45 AM
"Funny how disc 1 of tv show ships from southeastern PA, but then disc 2 and 3 ship from the closest shipping center to me. Sure looks like they are trying to get that money back."
Or, they did not have disc 1 available at your local hub?
Sometimes the simplest answer really is the correct one.
Posted by: Maybe | September 02, 2008 at 11:04 PM