Archive for the 'Architecture' Category

Just Say No to Manual CRUD

I’ve been working a lot with Castle’s Active Record and Ruby on Rails in the last month and as a result have written significantly fewer basic CRUD operations and far less database access code. It’s been an addictive experience and has caused me to rethink the proper role of hand-written database code (sprocs) within an application.

Although I feel perfectly comfortable in a set-based world writing SQL, it has traditionally been one of my least favorite areas of coding. Besides being relatively repetitive and tedious, at least when it comes to basic CRUD operations, sprocs are much more difficult to handle when it comes to source control, versioning, debugging, and unit testing.

For example, at my last job we were tasked by auditors to come up with a build and deployment process that included version traceability and rollback capabilities. It was pretty easy to put together an acceptable solution for assemblies, since automatically versioning DLLs in a trusted way is simple via the AssemblyInfo, and rolling them back is trivial since everything is contained in a manageable number of DLLs that simply need to be copied from one folder to the next. When it came to sprocs, however, the best we could offer auditors was some hackery around adding comments about the version at the top of each sproc definition file, along with a big disclaimer that there was no guarantee that files were not modified by DBAs along the way.

At my current job, database code causes us even more trouble because the database is larger and contains more sensitive data (thus making restores more difficult), there is a heavier reliance on shared code, and only the deltas of database code are currently under source control. It feels like I am constantly tracking down some ghost bug that is caused by my local database schema somehow being out of sync with the codebase.

Despite these misgivings, I’ve dutifully followed the traditional Microsoft recommended best practice of funneling all data access through sprocs up until recently because sprocs were supposedly faster, more secure, and provided a beneficial layer of abstraction.

Although I have heard several arguments against sprocs in the last several years, I recently embarked on a more thorough investigation while trying to convince my team to switch over from an in-house code generation/sproc-based solution for data access to NHibernate/ActiveRecord. Here are a few resources I found that present good counterarguments to the conventional sproc wisdom.

  1. Stored procedures are bad, m’kay?: This classic 2003 post by Frans Bouma of LLBLGen fame spawned quite the flame war on the topic. Most notably, Frans counters the “Sprocs are faster because they are compiled” argument by quoting passages from the SQL Server BOL documentation that clearly suggest otherwise. He also counters security arguments by pointing out that parameterized queries prevent SQL injection just as well as sprocs do and that assigning permissions to views and roles provides just as much protection as assigning execute rights on sprocs.
  2. Who Needs Stored Procedures, Anyways?: This is also an older post, from Jeff Atwood, in which he nicely summarizes the negative aspects of SQL when compared with traditional coding languages and coins the quotable phrase “Stored Procedures should be considered database assembly language: for use in only the most performance critical situations”. I definitely think that some developers are reluctant to embrace ORMs for the same reasons that many old C++ programmers scoffed at the idea of letting the CLR garbage collector manage memory for them instead of manually doing it themselves with raw pointers.
  3. Why I do not use Stored Procedures: Jeremy Miller dismisses performance arguments by declaring them instances of premature optimization and elaborates on all the problems caused by sprocs when it comes to maintainability, testability, and architecture. He also points out that the touted benefit of allowing DBAs to make changes is actually a dangerous practice, since such a change is a breaking API change from the application’s point of view and thus should go through thorough regression testing before any DBA is allowed to make it.
  4. Foundations of Programming - Part 6 - NHibernate: Besides offering a nice introduction to NHibernate, Karl Seguin provides a nice summary of the historical sproc debate, including a counterpoint to the increased network traffic argument, which he says is moot since most traffic occurs between app and database servers sitting on the same internal GigE network, where bandwidth is fast, plentiful, and free.
  5. DotNetRocks ORM Smackdown: For a more balanced debate, listen to this podcast episode (or download the transcripts) where Oren Eini and Ted Neward face off over the value of ORMs in the software industry. Be sure to check out the commentary on the episode in the comment section of this Ayende post as well as the rebuttal in this post by Ted Neward.
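
To make the injection point from the first item concrete, here is a minimal sketch of my own (it is not taken from Frans’s post). It uses Python’s built-in sqlite3 module as a stand-in for any database driver; the users table and the function names are made up for illustration.

```python
import sqlite3


def make_demo_db():
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # String concatenation: caller-supplied text becomes part of the SQL itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the value travels separately from the SQL text,
    # so it is only ever compared against the name column.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row comes back
    print(find_user_safe(conn, payload))    # no rows come back
```

The safe version needs no stored procedure at all; the protection comes from binding the value as a parameter, which is the same mechanism a sproc call relies on.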

In my opinion, one of the strongest denunciations of traditional sproc dogma comes from Redmond itself, which seems to be straying from its original sproc recommendations in favor of a more dynamic SQL generation world-view with its recent release of LINQ to SQL and the Entity Framework.

If you follow the open source world or program in some language other than .NET, then you’re bound to feel a little smug right now because ORMs have been around for a long time. In fact, I have a vivid memory from 6 years ago of a co-worker who was fresh from the Java world being dumbfounded that Microsoft didn’t have any ORM solution. He was used to using Hibernate, and the thought of manually mapping database tables to domain objects was hard for him to grasp. Even in the .NET open source world, I’ve been reading blog posts that sing the praises of NHibernate and iBATIS.NET, two popular .NET ORM ports, for several years.

On one hand, Microsoft’s entrance into the fray is good news for ORM enthusiasts since it means that a larger audience of developers will begin to see the technology as legitimate. On the other hand, Microsoft clearly has some catching up to do in this space, so you might want to think twice about starting off with Microsoft’s offering rather than one of the more proven open source or third party alternatives.

If you are a .NET developer and new to ORMs, then I recommend starting out with Castle’s Active Record, which you can learn in less than an hour by reading this Getting Started with Active Record tutorial. My co-workers were reluctant to try NHibernate because of the perceived learning curve and the plethora of mapping files required, but they quickly agreed to use Active Record after only a short demo.

If you are a POCO purist, which means that you want to keep your domain objects free of any non-business related concerns (such as persistence), then you’ll want to follow the repository pattern using the ActiveRecordMediator class rather than inheriting from ActiveRecordBase like the tutorial shows. Some of the more experienced ORM users seem to see ActiveRecord as more of a gateway drug to NHibernate and ultimately prefer to forgo the conveniences offered by the ActiveRecord layer in favor of the increased flexibility and loosely coupled design offered by dealing directly with NHibernate instead.
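
The shape of the repository approach is easier to see in code. What follows is only a language-neutral sketch in Python, not Castle’s API: Customer and CustomerRepository are names I made up, and sqlite3 stands in for the persistence layer so the example runs on its own. In a real Castle project the repository would delegate to ActiveRecordMediator instead of issuing SQL itself; the point here is just that the domain class inherits from nothing and knows nothing about storage.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Plain domain object: no persistence base class, no database calls.
    name: str
    email: str
    id: Optional[int] = None


class CustomerRepository:
    # Hypothetical repository that owns all persistence concerns for Customer.
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )

    def save(self, customer: Customer) -> Customer:
        # Insert new objects, update ones that already have an id.
        if customer.id is None:
            cur = self.conn.execute(
                "INSERT INTO customers (name, email) VALUES (?, ?)",
                (customer.name, customer.email),
            )
            customer.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE customers SET name = ?, email = ? WHERE id = ?",
                (customer.name, customer.email, customer.id),
            )
        return customer

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self.conn.execute(
            "SELECT name, email, id FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def delete(self, customer: Customer) -> None:
        self.conn.execute("DELETE FROM customers WHERE id = ?", (customer.id,))
```

Callers work with something like `repo.save(Customer("Ada", "ada@example.com"))` and `repo.find(1)`, so swapping the storage mechanism later touches only the repository, never the domain class.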

Regardless of the approach taken, I definitely no longer believe that sprocs should play any significant role in any application. The current mandate in the software industry is to strive to lower costs by increasing developer productivity, and ORMs clearly help to do this by eliminating the need to write and maintain countless simple CRUD sprocs.

It’s definitely time for all of us .NET developers to abandon our conventional sproc wisdom and start playing catch-up with the rest of the industry when it comes to using ORMs.

10 Lessons in Scalability from MySpace

I just read an interesting article called Inside MySpace.com. With 40 billion page hits a month (which they reached just three short years after launching), MySpace is currently living on the scalability frontier and thus offers unique insights into what kind of architecture and hardware it takes to service such massive demands.

[Image: scalability graph]

Here are ten tips that I gleaned from the article.

  1. Master/Slave architecture is Good (used when less than 1 million users) - When MySpace outgrew a single database server, it first moved to a master-slave configuration where the master server received all the updates and then propagated the data to the other read-only slave databases. Unlike many applications that rely heavily upon reporting or infrequently updated data, MySpace was not able to utilize any OLAP style optimizations because it has to deal with a high degree of updates.
  2. Vertical Partitioning is Better (used between 1-3 million users) - When the site grew to the point where disk IO became a bottleneck and it was taking too long to transfer data between servers, MySpace moved to a vertical partitioning model and divided the workload by logically grouping data onto different databases and servers according to function. Thus every time a new feature was added, it got its own database and server.
  3. Horizontal Partitioning is Best (used currently [3-27 million users]) - Vertical partitioning eventually ran into roadblocks when it came to shared data and particularly demanding features, so MySpace developers ultimately re-engineered the app so that data was logically grouped according to users. Thus a user logs on to a centralized server that stores only authentication information and the location where that user’s data is stored.
  4. Optimize for Acceptable Data Loss When Possible - By acknowledging that they were able to tolerate a certain amount of data loss, MySpace was able to increase scalability by extending the time between database checkpoints, thus saving significant IO in exchange for what they consider to be an acceptable risk of losing between 2 minutes and 2 hours of data.
  5. Microsoft is a Viable Vendor Option - When I first started working in the industry during the dotcom bubble, scalability was a big issue because everyone expected to eventually have to deal with MySpace-like growth problems. I remember encountering several Java developers who dismissed Microsoft based solutions with the phrase “It will never scale”. Despite pushing the limits on the number of simultaneous connections supported by SQL Server, MySpace has definitely proven that Microsoft is a major player in this arena. In fact, when MySpace migrated all their apps from ColdFusion to ASP.NET a few years ago, they noted being able to handle the same load with 40% fewer servers.
  6. Virtualized Storage trumps a SAN - Before MySpace upgraded from a SAN to virtual storage, they required 2 full time people to manually redistribute data across the SAN on a continuous basis.
  7. DoS (Denial of Service) safeguards are NOT so helpful - After having several of their Windows 2003 servers randomly shut down, the MySpace IT Pros realized that they had to disable the Denial of Service safeguards built into Windows 2003 because they were being inadvertently triggered by the immense amount of traffic.
  8. Consider a Separate Caching Tier - To save on database hits, MySpace added a layer of servers between the databases and web servers that are devoted exclusively to caching data objects.
  9. Who Needs Load Testing? - Since it is impossible to do realistic load testing on this scale, MySpace has dramatically streamlined their deployment cycle so they can rapidly get feedback and make corrections on their live site.
  10. Design Matters - In a related article, there is an interesting quote from Jakob Nielsen about how focusing on the usability aspects of the site and redesigning the flow to reduce the number of clicks required would have an even more dramatic effect on scalability than most of the technological solutions.
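
The horizontal partitioning idea in tip 3 can be sketched in a few lines. This is my own toy illustration, not MySpace’s code: the class names are invented, plain dictionaries stand in for database servers, and the “put new users on the least-loaded shard” policy is an assumption I made just to have something to run.

```python
from dataclasses import dataclass, field


@dataclass
class UserDirectory:
    """Central lookup that stores only credentials and each user's shard."""
    shard_count: int
    credentials: dict = field(default_factory=dict)  # username -> password hash
    shard_of: dict = field(default_factory=dict)     # username -> shard index

    def register(self, username, password_hash):
        # Assumed placement policy: assign the new user to the least-loaded shard.
        loads = [0] * self.shard_count
        for shard in self.shard_of.values():
            loads[shard] += 1
        shard = loads.index(min(loads))
        self.credentials[username] = password_hash
        self.shard_of[username] = shard
        return shard

    def login(self, username, password_hash):
        # Authenticate centrally, then tell the caller where the data lives.
        if self.credentials.get(username) != password_hash:
            raise PermissionError("bad credentials")
        return self.shard_of[username]


class ShardedProfiles:
    """All of a user's data lives on exactly one shard (a dict stands in for a server)."""

    def __init__(self, directory):
        self.directory = directory
        self.shards = [dict() for _ in range(directory.shard_count)]

    def save_profile(self, username, profile):
        self.shards[self.directory.shard_of[username]][username] = profile

    def load_profile(self, username, password_hash):
        shard = self.directory.login(username, password_hash)
        return self.shards[shard][username]
```

Because the directory holds nothing but credentials and a shard pointer, it stays small even as the user count grows, and adding capacity means adding shards rather than buying a bigger central server.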

NOTE: I just noticed that the original article on Baseline magazine seems to be inaccessible now. I spent 10 minutes fruitlessly searching on what is one of the most frustrating, advertising-laden sites I have ever seen, but all I found was the title for the article sans content buried in mounds of gaudy advertisements. I guess it’s a good thing that I took notes while I was reading it.

Excessive Abstraction can lead to the Bends

In his blog post on Old School Programming, Wesner Moise waxes nostalgic about his pre-high school experiences with writing his own disassembler and assembler for the Commodore 64. Apparently he used his homemade disassembler to decode and rewrite the entire 8k BASIC ROM back to source and then used his assembler to add his own extensions to the BASIC language to support structured programming and better graphics.

Aside from retyping a few bouncing ball programs from the owner’s manual, I believe that the only thing I ever did with my Commodore 64 was play pirated video games with my friends. For some reason all I can remember is a break dancing video game where I felt proud of getting the little block character to spin on his head. As you can imagine, the post made me feel like I belonged somewhere between plankton and an anemic protozoan on the alpha geek food chain.

By contrast I spent most of my day today in high altitude architectural abstraction. The bank I work for has hired a consulting company to do a data strategy gap analysis and we spent a good part of the day talking about BPM, SOA, and Master Data Strategy. There were several points where I had to politely suggest that we move on to another topic because we had risen a little too high in the abstraction layer and I was sure that someone walking past might have had trouble discerning whether the topic of the conversation had to do with coding, politics, or pepperoni pizza. The next two days will be more of the same.

Does anyone know if you can get decompression sickness by ascending too quickly from the depths of creating custom interrupts on the Commodore 64, in order to implement a rasterizer that bypassed the hardware limit of 8 sprites, to the abstract heights of macro workflows and model driven architecture? If anyone happens to observe any of these symptoms tomorrow, please sit me down and give me a steady dose of concrete examples. Thanks in advance.

[Originally Posted Monday, June 18, 2007 11:31 PM]

The Architecture Blues

A few years ago I shifted roles from a developer team lead who coded almost full time to an Architect. Although I am naturally a bit of an abstract thinker and find architectural issues interesting, I have always been hesitant about this role because of the ubiquitous Architecture Astronauts that have given the discipline a bad name. These guys have a knack for intimidating even the smartest of developers by throwing around architectural jargon; they can describe ideas in the abstract but are completely unable to translate them into concrete code.

Last night as I was lying in bed, I realized that I hadn’t written a single line of code in the last few months because of my involvement in some high level strategy initiatives, my advisory role in a number of projects, and my supervisory responsibilities over a solutions architect and a data modeler. I realized with horror that much of my time these days was spent just reading about fuzzy concepts like SOA, SaaS, BPM, EII, EAI, MDM, Data Governance, and I was sure that I had suddenly become one of those useless pieces of corporate baggage that real developers make jokes about. That thought prompted me to spend my last waking moments plotting various routes to getting back to my coding roots.

This morning I relistened to an old Hanselminutes interview with Jeffrey Snover, the PowerShell Architect, to help prepare for an upcoming .NET User group talk that I will be giving on PowerShell. It was an excellent podcast, but what caught my attention the most was when Jeffrey mentioned that in the beginning of the project he had locked himself in a room for a month and pounded out 15,000 lines of code as a proof of concept that he then used to convince people to get the project started.

THAT is the kind of architect that I want to be when I grow up! I need to figure out a way to get myself out of all these meetings about process improvement, data strategy, and regulatory compliance and start focusing on some nice juicy proof of concept work.

[Originally Posted Friday, June 15, 2007 12:10 PM]