@TheGibson I dunno, there aren't a whole lot of orgs that could handle 50 million new users in a weekend without issues. There's always stuff you could have done better, because there's always something you didn't anticipate.
Spending a lot of cash for something that MIGHT happen doesn't always make sense either, so…
But using some deployment scripts to expand your environment in the cloud on demand seems like a play here...
My understanding is that they are self-hosted though (which I also applaud), so struggles would certainly be expected there.
Not being in the thick of it makes it hard to say what could be done better... I'm basically armchair quarterbacking... I think we all are.
That said, that team has been very busy for several days now... I do feel sorry for them.
@TheGibson the fact they recovered fairly quickly suggests to me they had a pretty good plan. I bet the retro will find holes. But until you've been through an event like this, you can't know which of your tech debt decisions are going to bite you or which assumptions you got wrong.
Scaling is hard. Resilience is hard. Half the people chucking rocks have never had to solve either problem for anything.
@TheGibson I respect the folks who clearly know how this stuff works providing their viewpoint. None of them are shitting on Signal though; their takes tend to be more measured and charitable.
@darrenpmeyer I only wanted to clarify, because I responded to a thread on it yesterday, and didn't want my comments to be misconstrued...
Providing public-facing services at all is hard... sudden scaling is a whole other level of complexity.