10 Lessons in Scalability from MySpace
I just read an interesting article called Inside MySpace.com. With 40 billion page hits a month (which they reached just three short years after launching), MySpace is currently living on the scalability frontier and thus offers unique insights into what kind of architecture and hardware it takes to service such massive demands.

Here are ten tips that I gleaned from the article.
- Master/Slave architecture is Good (used with fewer than 1 million users) - When MySpace outgrew a single database server, it first moved to a master-slave configuration where the master server received all the updates and then propagated the data to the other read-only slave databases. Unlike many applications that rely heavily upon reporting or infrequently updated data, MySpace was not able to utilize any OLAP-style optimizations because it has to deal with a high volume of updates.
- Vertical Partitioning is Better (used between 1-3 million users) - When the site grew to the point where disk IO became a bottleneck and it was taking too long to transfer data between servers, MySpace moved to a vertical partitioning model and divided the workload by logically grouping data onto different databases and servers according to function. Thus every time a new feature was added, it got its own database and server.
- Horizontal Partitioning is Best (used currently [3-27 million users]) - Vertical partitioning eventually ran into roadblocks when it came to shared data and particularly demanding features, so MySpace developers ultimately re-engineered the app so that data was logically grouped according to users. Thus a user logs on to a centralized server that stores only authentication information and the location where that user's data is stored.
- Optimize for Acceptable Data Loss When Possible - By acknowledging that they were able to tolerate a certain amount of data loss, MySpace was able to increase scalability by extending the time between database checkpoints and thus saving on significant IO in exchange for what they consider to be an acceptable risk of losing between 2 minutes and 2 hours of data.
- Microsoft is a Viable Vendor Option - When I first started working in the industry during the dotcom bubble, scalability was a big issue because everyone expected to eventually have to deal with MySpace-like growth problems. I remember encountering several Java developers who dismissed Microsoft-based solutions with the phrase “It will never scale.” Despite pushing the limits on the number of simultaneous connections supported by SQL Server, MySpace has definitely proven that Microsoft is a major player in this arena. In fact, when MySpace migrated all their apps from ColdFusion to ASP.NET a few years ago, they noted being able to handle the same load with 40% fewer servers.
- Virtualized Storage trumps a SAN - Before MySpace upgraded from a SAN to virtual storage, they required two full-time people to manually redistribute data across the SAN on a continuous basis.
- DoS (Denial of Service) safeguards are NOT so helpful - After having several of their Windows 2003 servers randomly shut down, the MySpace IT Pros realized that they had to disable the Denial of Service safeguards built into Windows 2003 because they were being inadvertently triggered by the immense amount of traffic.
- Consider a Separate Caching Tier - To save on database hits, MySpace added a layer of servers between the databases and web servers that are devoted exclusively to caching data objects.
- Who Needs Load Testing? - Since it is impossible to do realistic load testing on this scale, MySpace has dramatically streamlined their deployment cycle so they can rapidly get feedback and make corrections on their live site.
- Design Matters - In a related article, there is an interesting quote from Jakob Nielsen about how focusing on the usability aspects of the site and redesigning the flow to reduce the number of clicks required would have an even more dramatic effect on scalability than most of the technological solutions.
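To make the horizontal partitioning and caching tier lessons a bit more concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not MySpace's actual code: the class names (Shard, UserDirectory, ProfileCache), the "emptiest shard" placement rule, and the in-memory dictionaries standing in for databases are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Stands in for one database server holding a subset of users."""
    name: str
    profiles: dict = field(default_factory=dict)
    reads: int = 0  # counts how often this "database" is actually hit

    def load_profile(self, user_id):
        self.reads += 1
        return self.profiles[user_id]


class UserDirectory:
    """Central lookup: knows only which shard owns each user's data."""

    def __init__(self, shards):
        self.shards = shards
        self.location = {}  # user_id -> index into self.shards

    def register(self, user_id, profile):
        # Hypothetical placement rule: put the new user on the emptiest shard.
        index = min(range(len(self.shards)),
                    key=lambda i: len(self.shards[i].profiles))
        self.shards[index].profiles[user_id] = profile
        self.location[user_id] = index

    def shard_for(self, user_id):
        return self.shards[self.location[user_id]]


class ProfileCache:
    """Cache-aside tier sitting between the web servers and the shards."""

    def __init__(self, directory):
        self.directory = directory
        self.objects = {}

    def get_profile(self, user_id):
        # Only go to the owning shard when the object is not cached yet.
        if user_id not in self.objects:
            shard = self.directory.shard_for(user_id)
            self.objects[user_id] = shard.load_profile(user_id)
        return self.objects[user_id]


directory = UserDirectory([Shard("db1"), Shard("db2")])
directory.register("tom", {"friends": 3})
directory.register("ann", {"friends": 1})

cache = ProfileCache(directory)
cache.get_profile("tom")
cache.get_profile("tom")  # second call is served from the cache
```

The point of the central directory is that adding capacity means adding shards rather than buying a bigger server, and the cache means a repeated read never reaches a shard at all. A real system would also need cache invalidation on writes, which this sketch leaves out.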
NOTE: I just noticed that the original article on Baseline magazine seems to be inaccessible now. I spent 10 minutes fruitlessly searching on what is one of the most frustrating, advertising-laden sites I have ever seen, but all I found was the title of the article, sans content, buried in mounds of gaudy advertisements. I guess it’s a good thing that I took notes while I was reading it.



Hey Now Coder,
E4, Nice Post we sure can learn a lot from a site that has that kind of volume.
Thx 4 the info,
Catto
This appears to be the same article, but on a different site:
http://www.cioinsight.com/c/a/.....ySpacecom/
@Jeremy - Excellent! Thanks for the heads up.
[…] 10 Lessons in Scalability from MySpace - An interesting analysis of the performance tricks that MySpace have used in order to service the demands placed on their systems. […]
On #5 above, they also rewrote slow portions of the application while changing over to .Net. It is mentioned in the article but it seems to be missed by anyone who reads it.
@George - Good point. Rewrites always have the advantage of hindsight, especially if the developers doing the rewrite attempt to address existing pain points at the same time. A reliable speed comparison would probably have to be conducted in a much more controlled environment to ensure that it is apples to apples.