In case you haven't read about it yet:
On Friday afternoon Eastern Time, we started getting reports that something was seriously wrong with Steam, the digital PC games platform that serves over 125 million users.
Needless to say, this is a colossal fuck-up, which the Steam tracking site Steam Database now suggests is due to a caching issue.
Knowing a bit about how these things work, I'm inclined to believe this was indeed a major internal fuck-up rather than a hacker attack.
"Updating a caching mechanism" usually means that people deployed something to production after a month of all-nighters, against a hard deadline, because "we have to increase the robustness of our system for the Xmas spike in traffic".
The effect was of course the exact opposite - robust my ass - because the new caching mechanism wasn't properly tested. And how could it have been, since:

- The deadline wasn't realistic in the first place.
- Nobody knows all the differences between production and the QA environment.
- Caching probably wasn't the way to solve whatever the real problems were.
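To make the failure mode concrete, here's a minimal, entirely hypothetical sketch (my guess at the class of bug, not Steam's actual code) of how a response cache can leak private data when its key is too coarse - say, the URL alone, without the user's identity:

```python
# Toy in-memory response cache. All names here are made up for illustration.
cache = {}

def render_account_page(user):
    # Stands in for an expensive, personalized render we want to avoid repeating.
    return f"Account details for {user}"

def handle_request(path, user):
    # BUG: the cache key is the URL path alone. Every user shares the same
    # key for /account, so whoever's page gets rendered first is served
    # to everyone after them.
    key = path
    if key not in cache:
        cache[key] = render_account_page(user)
    return cache[key]

print(handle_request("/account", "alice"))  # Account details for alice
print(handle_request("/account", "bob"))    # still alice's page!

# The fix is to include the user (or session) in the key:
# key = (path, user)
```

A one-line mistake like this passes every single-user QA test and only blows up under real multi-user traffic - exactly the kind of thing a rushed, untested deploy ships.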
I'll bet it turned out that someone, somewhere, wrote a piece of PHP code or other crap that the whole system now relies on, plus some config file nobody knew about, and that the great mind behind it left the company five years ago.
Then they deployed the upgrade and all hell broke loose.
Information Technology is just a fancy name for "we have no fucking clue what we're doing, but look at all our security and PCI-compliance certificates issued by people who have no fucking clue what they're doing".