Archive for the 'Architecture' Category

Going Down the CQS Rabbit Hole

A few weeks ago I attended a rather thought-provoking presentation by Udi Dahan on the Command-Query Segregation Principle.

If you’re like me, then you might have confused this newer architectural principle with the other CQS (Command Query Separation), which is the classic OOP design principle that I wrote about in a recent post.

The story goes that Martin Fowler tried to warn Udi and his cohorts that this confusion would occur when they first coined this term, but they stubbornly stuck with it because it described the architectural approach so perfectly.

The driving force behind CQS is simply the belief that many of the scaling and complexity woes exhibited by the vast majority of systems today can be traced to the misguided effort to unify commands and queries into a single conceptual model.

The thought goes that once you truly break apart these divergent activities and try to solve each problem independently, then some innovative and sometimes even blasphemous ideas start to emerge.

If you’ve never heard Udi talk on the subject, then I recommend reading this fairly concise article on it.

Here is my interpretation of some of the more thought-provoking points that I remember from the talk:

Query-Side

  • This should be a read-only, de-normalized data-store with built-in latency that is separate from the command (transactional) data store.
  • Data should be not only de-normalized, duplicated, and aggregated as needed, but also stored in the exact format required by the views. This eliminates the need for DTOs and the usual array of transformations that occur on the way from the data layer to the UI layer.
  • Defining acceptable latency is key. Data should be organized according to acceptable staleness thresholds. Data may be duplicated many times to account for the fact that different pages usually have different tolerances for latency.
  • Since data is flattened out completely, there is no need for any relations, which also means that an RDBMS is probably a worse fit than one of the NoSQL databases.
  • Since these data stores only need read-only permissions, why do they need to be on a separate server behind the firewall? Udi points out that putting a document database on a web server that is in front of the firewall is actually more secure than using a caching solution. I can’t wait to have that conversation with our network guys and the PCI auditors…
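To make the query-side ideas a bit more concrete, here is a minimal sketch of a read store that holds one pre-shaped document per view and tracks a staleness threshold per view. None of this comes from the talk itself: `ReadStore`, `ViewRecord`, `project`, `read`, and the view names are all hypothetical, and an in-memory dict stands in for whatever document database you would actually use.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ViewRecord:
    payload: dict          # already shaped exactly the way the view needs it
    refreshed_at: float    # when this projection was last rebuilt


@dataclass
class ReadStore:
    """Hypothetical read-only store: one pre-built document per (view, key)."""
    max_staleness: dict = field(default_factory=dict)   # view name -> seconds
    _docs: dict = field(default_factory=dict)

    def project(self, view, key, payload, now=None):
        """Written by a background process fed from the command side, never by the UI."""
        stamp = time.time() if now is None else now
        self._docs[(view, key)] = ViewRecord(payload, stamp)

    def read(self, view, key, now=None):
        """Return (document, is_fresh) with no joins or DTO mapping on the way out."""
        record = self._docs[(view, key)]
        current = time.time() if now is None else now
        age = current - record.refreshed_at
        return record.payload, age <= self.max_staleness[view]
```

A page would call something like `store.read("order_summary", 42)` and render the returned document directly. The same data can be projected into a second view with a looser threshold, which is how the duplication by latency tolerance might play out.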

Command-Side

  • Commands should all be asynchronous. This means you fire off the command and immediately return with a message to the user saying that they’ll be notified when the transaction has been processed. It’s amazing how many things really don’t need to be synchronous when you stop to think about them. Udi makes the point that the world got along perfectly fine (and in some cases better) when it ran on paper, which is intrinsically an asynchronous process.
  • Asynchronous is a better user experience. Is it really necessary to make 99.9% of the people wait an extra 10-15 seconds just because there is a slight chance that something may go wrong? Isn’t it a better user experience to handle the exceptional case manually or by having the user revisit the website to fix the problem?
  • To be asynchronous, UIs must capture intent instead of just being dumb data entry screens. In a classic ticketing system, a user has to select the exact seats he or she wants and then retry multiple times because at least one of the seats was taken by another user since the screen first loaded. In the CQS model, the user would simply specify that they wanted five seats near each other that were all within a certain price range, and the system would later send out a notification with the results, after enough requests had been accepted and a packing algorithm had been run to satisfy the highest number of requests possible.
  • More onus is now placed upon the UI to ensure commands are valid and have a high probability of success. Once again, it is perfectly acceptable and cost efficient to handle the infrequent cases of failures by putting them in a manual intervention queue and even having a customer service person contact them.
  • Asynchronous processing through a queuing mechanism like MSMQ is much more reliable and probably the only way to meet most SLAs (Service Level Agreements).
  • Each command should be totally independent (think separate VS solutions) so that they can be versioned separately. Udi refers to the benefits of reuse as largely a fallacy and the enemy of flexible versioning.
  • Since domain models no longer have to support queries, there is no need for the web of relationships that usually just reflects the RDBMS. Entities should only contain what is needed to support the command and can be duplicated across commands.
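The ticketing example from the command-side list can be sketched roughly as follows. This is only my illustration, not code from the talk: `ReserveSeats`, `submit`, `find_adjacent`, and `process_pending` are hypothetical names, an in-process `queue.Queue` stands in for a durable queue such as MSMQ, and a simple first-fit pass stands in for a real packing algorithm.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ReserveSeats:
    """Captures intent ("3 seats together under $60"), not specific seat IDs."""
    user: str
    count: int
    max_price: float


commands = queue.Queue()   # in-process stand-in for a durable message queue
manual_review = []         # infrequent failures, handled later by a person


def submit(command):
    """Enqueue the command and return at once; the user is notified later."""
    commands.put(command)
    return "We'll notify you when your request has been processed."


def find_adjacent(seats, count, max_price):
    """First-fit: return the first run of `count` free neighbouring seats in budget."""
    run = []
    for seat in seats:
        if seat["free"] and seat["price"] <= max_price:
            run.append(seat)
            if len(run) == count:
                return run
        else:
            run = []   # the run is broken, so start counting again
    return None


def process_pending(seats):
    """Drain the queue in one batch and return the notifications to send."""
    notifications = []
    while not commands.empty():
        cmd = commands.get()
        block = find_adjacent(seats, cmd.count, cmd.max_price)
        if block is None:
            manual_review.append(cmd)   # exceptional case: manual intervention
            continue
        for seat in block:
            seat["free"] = False
        notifications.append((cmd.user, [s["id"] for s in block]))
    return notifications
```

The point of the sketch is the shape of the flow: `submit` never waits on seat allocation, and a request that cannot be satisfied lands in `manual_review` instead of forcing every user through a synchronous retry loop.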

Here’s a diagram of this model from the paper I linked to at the top:

[Diagram: CQS]

In conclusion, this is definitely one of the most thought-provoking talks I’ve ever attended.

I’ve already lobbied my boss to send as many people as possible to attend Udi’s training, which covers this topic in much more depth.

I’ve also pushed the task of investigating projects like NServiceBus and MassTransit to the top of my list.

Most of these suggestions are too radical to immediately act upon, but I have to admit that I’m sold on the concepts.

10 Lessons in Scalability from MySpace

I just read an interesting article called Inside MySpace.com. With 40 billion page hits a month (which they reached just three short years after launching), MySpace is currently living on the scalability frontier and thus offers unique insights into what kind of architecture and hardware it takes to service such massive demands.

[Figure: scalability graph]

Here are ten tips that I gleaned from the article.

  1. Master/Slave architecture is Good (used with fewer than 1 million users) – When MySpace outgrew a single database server, it first moved to a master-slave configuration where the master server received all the updates and then propagated the data to the other read-only slave databases. Unlike many applications that rely heavily upon reporting or infrequently updated data, MySpace was not able to utilize any OLAP style optimizations because it has to deal with a high degree of updates.
  2. Vertical Partitioning is Better (used between 1-3 million users) – When the site grew to the point where disk IO became a bottleneck and it was taking too long to transfer data between servers, MySpace moved to a vertical partitioning model and divided the workload by logically grouping data onto different databases and servers according to function. Thus every time a new feature was added, it got its own database and server.
  3. Horizontal Partitioning is Best (used currently [3-27 million users]) – Vertical partitioning eventually ran into roadblocks when it came to shared data and particularly demanding features, so MySpace developers ultimately re-engineered the app so that data was logically grouped according to users. Thus a user logged on to a centralized server that stored only authentication information and the location where that user’s data was stored.
  4. Optimize for Acceptable Data Loss When Possible - By acknowledging that they were able to tolerate a certain amount of data loss, MySpace was able to increase scalability by extending the time between database checkpoints and thus saving on significant IO in exchange for what they consider to be an acceptable risk of losing between 2 minutes and 2 hours of data.
  5. Microsoft is a Viable Vendor Option – When I first started working in the industry during the dotcom bubble, scalability was a big issue because everyone expected to eventually have to deal with MySpace-like growth problems. I remember encountering several Java developers who dismissed Microsoft-based solutions with the phrase “It will never scale”. Despite pushing the limits on the number of simultaneous connections supported by SQL Server, MySpace has definitely proven that Microsoft is a major player in this arena. In fact, when MySpace migrated all their apps from ColdFusion to ASP.NET a few years ago, they noted being able to handle the same load with 40% fewer servers.
  6. Virtualized Storage trumps a SAN - Before MySpace upgraded from a SAN to virtual storage, they required two full-time people to manually redistribute data across the SAN on a continuous basis.
  7. DoS (Denial of Service) safeguards are NOT so helpful – After having several of their Windows 2003 servers randomly shut down, the MySpace IT Pros realized that they had to disable the Denial of Service safeguards built into Windows 2003 because they were being inadvertently triggered by the immense amount of traffic.
  8. Consider a Separate Caching Tier – To save on database hits, MySpace added a layer of servers between the databases and web servers that are devoted exclusively to caching data objects.
  9. Who Needs Load Testing? – Since it is impossible to do realistic load testing on this scale, MySpace has dramatically streamlined their deployment cycle so they can rapidly get feedback and make corrections on their live site.
  10. Design Matters – In a related article, there is an interesting quote from Jakob Nielsen about how focusing on the usability aspects of the site and redesigning the flow to reduce the number of clicks required would have an even more dramatic effect on scalability than most of the technological solutions.
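The horizontal partitioning described in tip 3 can be sketched as a central directory that records only where each user's data lives, plus a store that routes every read and write through it. This is a rough illustration under my own assumptions, not MySpace's implementation: `UserDirectory`, `ShardedStore`, and the least-loaded placement rule are all hypothetical, the article does not say how users were assigned to databases, and authentication is left out entirely.

```python
class UserDirectory:
    """Hypothetical central lookup: remembers only which shard holds each user."""

    def __init__(self, shard_names):
        self._shards = list(shard_names)
        self._location = {}                               # user id -> shard name
        self._load = {name: 0 for name in self._shards}   # users per shard

    def register(self, user_id):
        """Place a new user on the least-loaded shard (assumed policy)."""
        shard = min(self._shards, key=lambda name: self._load[name])
        self._location[user_id] = shard
        self._load[shard] += 1
        return shard

    def locate(self, user_id):
        return self._location[user_id]


class ShardedStore:
    """Routes each read and write to the shard that owns the user."""

    def __init__(self, directory, shard_names):
        self._directory = directory
        self._data = {name: {} for name in shard_names}   # one dict per shard

    def save_profile(self, user_id, profile):
        self._data[self._directory.locate(user_id)][user_id] = profile

    def load_profile(self, user_id):
        return self._data[self._directory.locate(user_id)][user_id]
```

Because everything about one user sits on one shard, adding capacity means adding a shard name to the directory and leaving existing users where they are.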

NOTE: I just noticed that the original article on Baseline magazine seems to be inaccessible now. I spent 10 minutes fruitlessly searching on what is one of the most frustrating, advertising-laden sites I have ever seen, but all I found was the title for the article sans content buried in mounds of gaudy advertisements. I guess it’s a good thing that I took notes while I was reading it.

Excessive Abstraction can lead to the Bends

In his blog post on Old School Programming, Wesner Moise waxes nostalgic about his pre-high school experiences with writing his own disassembler and assembler for the Commodore 64. Apparently he used his homemade disassembler to decode and rewrite the entire 8k BASIC ROM back to source and then used his assembler to add his own extensions to the BASIC language to support structured programming and better graphics.

Aside from retyping a few bouncing ball programs from the owner’s manual, I believe that the only thing I ever did with my Commodore 64 was play pirated video games with my friends. For some reason all I can remember is a break dancing video game where I felt proud of getting the little block character to spin on his head. As you can imagine, the post made me feel like I belonged somewhere between plankton and an anemic protozoan on the alpha geek food chain.

By contrast I spent most of my day today in high altitude architectural abstraction. The bank I work for has hired a consulting company to do a data strategy gap analysis and we spent a good part of the day talking about BPM, SOA, and Master Data Strategy. There were several points where I had to politely suggest that we move on to another topic because we had risen a little too high in the abstraction layer and I was sure that someone walking past might have had trouble discerning whether the topic of the conversation had to do with coding, politics, or pepperoni pizza. The next two days will be more of the same.

Does anyone know if you can get decompression sickness by ascending too quickly from the depths of creating custom interrupts on the Commodore 64 in order to implement a rasterizer that bypassed hardware limits for 8 sprites to the abstract heights of macro workflows and model driven architecture? If anyone happens to observe any of these symptoms tomorrow, please sit me down and give me a steady dose of concrete examples. Thanks in advance.

[Originally Posted Monday, June 18, 2007 11:31 PM]

The Architecture Blues

A few years ago I shifted roles from a developer team lead who coded almost full time to an Architect. Although I am naturally a bit of an abstract thinker and find architectural issues interesting, I have always been hesitant about this role because of the ubiquitous Architecture Astronauts that have given the discipline a bad name. These guys have a knack for intimidating even the smartest of developers by throwing around architectural jargon that they can describe in the abstract but are completely unable to translate into concrete code.

Last night as I was lying in bed, I realized that I hadn’t written a single line of code in the last few months because of my involvement in some high level strategy initiatives, my advisory role in a number of projects, and my supervisory responsibilities over a solutions architect and a data modeler. I realized with horror that much of my time these days was spent just reading about fuzzy concepts like SOA, SaaS, BPM, EII, EAI, MDM, Data Governance, and I was sure that I had suddenly become one of those useless pieces of corporate baggage that real developers make jokes about. That thought prompted me to spend my last waking moments plotting various routes to getting back to my coding roots.

This morning I relistened to an old Hanselminutes interview with Jeffrey Snover, the PowerShell Architect, to help prepare for an upcoming .NET User group talk that I will be giving on PowerShell. It was an excellent podcast, but what caught my attention the most was when Jeffrey mentioned that in the beginning of the project he had locked himself in a room for a month and pounded out 15,000 lines of code as a proof of concept that he then used to convince people to get the project started.

THAT is the kind of architect that I want to be when I grow up! I need to figure out a way to get myself out of all these meetings about process improvement, data strategy, and regulatory compliance and start focusing on some nice juicy proof of concept work.

[Originally Posted Friday, June 15, 2007 12:10 PM]
