
Calling the author naive is, I think, uncharitable. I've also written my own ORM layer, used for large-scale, high-performance enterprise applications, and I largely agree with the original author's post, so I certainly don't think it's naive. It sounds to me like the thoughts of someone who's run into some real problems in real situations.

I don't disagree with the points you've made around caching, but I do think you're simplifying the problem a bit. Not all performance tuning in DB-intensive applications is around caching, and it often involves query tuning, indexing, and traditional DB-level stuff.

A large part of the abstraction leak around ORMs is around both the caching and that DB-level performance tuning. You have to understand what code is going to generate what queries so that, at the very least, you can tune them by adding the appropriate indexes in the database. All of a sudden, you're living in SQL land, examining query plans, etc. But if you decide that the change you need to make is to the SQL itself, the ORM layer suddenly gets in your way: you either have to bypass the ORM layer to drop into raw SQL, which at worst is hard to do and at best tends to massively reduce the value proposition of the ORM framework, or you have to try to tweak your code to get it to generate the query that you want, which is often frustrating and far more difficult than just writing the SQL yourself. I don't think I'm in much of a minority in having had an experience like, "Hmm, the query I really need to write needs an ORDER BY clause that includes a few CASE expressions . . . now how do I convince this query-generation framework to spit that out so that I don't have to pull back all the results and do the sorting in memory?" It's also worth mentioning that caching doesn't help tune writes, so if scaling your product requires scaling writes, you're probably going to be mucking around in SQL land.
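
To make the ORDER BY example concrete, here is a minimal sketch of the kind of query in question, written as raw SQL through Python's built-in sqlite3 module. The `ticket` table, its columns, and the priority ordering are all made up for illustration; the point is only that the database does the custom sort and the LIMIT, so the application never pulls back the full result set to sort it in memory.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, status TEXT, created INTEGER)"
)
conn.executemany(
    "INSERT INTO ticket (id, status, created) VALUES (?, ?, ?)",
    [(1, "closed", 10), (2, "open", 20), (3, "blocked", 30), (4, "open", 5)],
)

# Custom priority order: blocked first, then open, then everything else.
# Ties are broken by the created column. The sort and the LIMIT both run
# inside the database.
rows = conn.execute(
    """
    SELECT id, status
    FROM ticket
    ORDER BY CASE status
                 WHEN 'blocked' THEN 0
                 WHEN 'open' THEN 1
                 ELSE 2
             END,
             created
    LIMIT 3
    """
).fetchall()

ordered_ids = [row[0] for row in rows]
print(ordered_ids)
```

Whether a given ORM can be coaxed into emitting a CASE expression inside ORDER BY depends entirely on the framework; the sketch only shows what the target SQL looks like.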

There's a similar problem around query-generation layers that attempt to allow you to just write normal methods and have things executed on the database; because the code is so far removed from the SQL, it makes it really, really easy to write really terribly-performing queries or to write things that will do hugely unnecessary amounts of work.

On a more trivial point, fetching all columns when you only need a subset of them really is a problem sometimes, especially if A) you have to join across a bunch of tables, B) the columns that you want could be retrieved from indexes, rather than requiring actual row reads, or C) the columns that you care about are several steps removed from the original search table, but the ORM layer loads everything in between. (For example, Foo->Bar->Baz, my WHERE clause is on Foo, but the only columns I care about are the id on Foo, which is in the index, and a few columns on Baz . . . how do I tell my ORM layer to load nothing else from Foo and nothing at all from Bar? It's a different problem than pre-fetching, because I just don't want anything loaded.)
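
As a rough sketch of the Foo->Bar->Baz case, this is what the hand-written query looks like using Python's built-in sqlite3 module. The table layouts, column names, and sample rows are assumptions made up for the example. Bar is used only to join through, and the result carries just the id from Foo and two columns from Baz.

```python
import sqlite3

# Hypothetical three-table chain: foo -> bar -> baz.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE baz (id INTEGER PRIMARY KEY, label TEXT, score INTEGER, big_text TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, baz_id INTEGER REFERENCES baz(id), notes TEXT);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, bar_id INTEGER REFERENCES bar(id),
                      kind TEXT, payload TEXT);
    CREATE INDEX foo_kind ON foo(kind);

    INSERT INTO baz VALUES (1, 'alpha', 7, 'large'), (2, 'beta', 9, 'large');
    INSERT INTO bar VALUES (10, 1, 'n1'), (11, 2, 'n2');
    INSERT INTO foo VALUES (100, 10, 'x', 'p'), (101, 11, 'y', 'p'), (102, 11, 'x', 'p');
    """
)

# Filter on foo, join through bar, but return only foo.id and two baz columns.
# Nothing from bar, and no other foo columns, appears in the result set.
cur = conn.execute(
    """
    SELECT foo.id AS foo_id, baz.label AS label, baz.score AS score
    FROM foo
    JOIN bar ON bar.id = foo.bar_id
    JOIN baz ON baz.id = bar.baz_id
    WHERE foo.kind = ?
    ORDER BY foo.id
    """,
    ("x",),
)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)
```

How much row reading the database actually avoids here depends on the engine and the indexes available; the sketch only shows that the SQL itself has no trouble saying "just these three columns".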

Now, that's not to say that ORM layers can't be made to perform; of course they can, pretty much all of them have the sorts of hooks you describe, and there's plenty of empirical evidence to that effect. But sometimes the way you make them perform is by just bypassing them.

There's another abstraction point, which is that supporting multiple databases often leads to a least-common-denominator functionality approach; for example, if you want to use a db-specific spatial data type, the ORM has to either provide db-specific functionality, or it might just not support handling that type of data well. The same often goes for things like db-specific functions or query hints; if the ORM layer doesn't handle those things for you, you have to bypass it and drop into raw SQL if you need them.

So really, the argument is not, "ORMs are not functional and no one should use them," it's related to the value proposition of an ORM layer. The value proposition is "This tool will make your life easier, will save you from having to write SQL, and will help you work across multiple databases." If the tool makes life harder than it otherwise would be, then it's not useful, even if it's still possible to do work in it.

So the question is largely whether they make life easier. In the simple case, I think the answer is that yes, they do: they make it easier for beginners to get off the ground, they make it easy to do simple queries and writes, and the performance probably doesn't matter anyway.

When things get more complicated, though, the question becomes a lot less clear. Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query. I know of applications that have basically had to bypass ActiveRecord more and more as they scaled to just do raw SQL queries because making ActiveRecord perform was simply too hard or not possible.

So personally, I prefer an ORM approach that does the minimum needed to let me do the simple things simply (pull rows back and map them to an object, execute simple queries directly on that table), but that's designed from the ground up with the idea that dropping straight into SQL is a normal, accepted part of the workflow, rather than some one-off thing that you should rarely do. But it really depends on your project and your comfort level with SQL.
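
A minimal sketch of that style, using Python's built-in sqlite3 module and dataclasses. The helper names `query` and `get`, the `User` class, and the `users` table are all hypothetical, not taken from any real library. The simple lookup and the hand-written query go through the same row-mapping function, so writing SQL directly is the ordinary path rather than an escape hatch.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str
    visits: int


def query(conn, cls, sql, params=()):
    """Run arbitrary SQL and map each result row onto cls by column name."""
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [cls(**dict(zip(names, row))) for row in cur]


def get(conn, cls, table, id_):
    """Simple case: fetch one row by primary key.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cols = ", ".join(f.name for f in fields(cls))
    found = query(conn, cls, f"SELECT {cols} FROM {table} WHERE id = ?", (id_,))
    return found[0] if found else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, visits INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ann", 3), (2, "bob", 12), (3, "cy", 7)],
)

# Simple thing, done simply.
one = get(conn, User, "users", 2)

# Anything more involved is just SQL, mapped by the same function.
heavy = query(
    conn,
    User,
    "SELECT id, name, visits FROM users WHERE visits > ? ORDER BY visits DESC",
    (5,),
)
print(one, [u.name for u in heavy])
```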



> Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query.

The problem I have with your post is that you are repeatedly mistaking the high level idea of an ORM with the (seemingly) Spartan implementations that you have used.

Here is an ORM that provides support for automatically profiling the performance of queries over a period of time:

http://squeryl.org/performance-profiling.html

Sample output:

http://squeryl.org/profileOfH2Tests.html

I cannot begin to tell you how much time this has saved me when optimising performance of my webapp.

Note that that particular ORM also has the advantage of having type safe queries - i.e. it can tell at compile time if there's a syntax error in your query (subject to bugs in the ORM :)) - even in dynamically generated queries. In practice this is a fantastic feature as it is so much safer than building up SQL queries with string manipulation and dealing with multiple code paths that depend on user input. The test paths alone in such code (even if you have a "query builder" layer) are the stuff of nightmares.

There are many features missing from Squeryl though that I've had in other ORMs because it makes different trade-offs. But this is what you do when you choose a library, and it's important to understand what trade-offs you're making upfront... otherwise you might find yourself writing off an entire approach to software development as an anti-pattern because you picked the wrong library.


I think you're missing my point on the performance side; yes, an ORM layer can help you identify slow queries, but it's pretty much the database query plan that will tell you why it's slow. Is it doing a full table scan instead of using an index? Is it applying joins in a sub-optimal order? Are the statistics just off, which causes it to use a bad query plan? At that point you're already in SQL/DBA land, but now you have to map that knowledge back to the ORM layer to fix things.
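
For a concrete picture of the "full table scan instead of using an index" question, here is a small sketch using SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. The `orders` table and the index name are invented for the example, and the exact wording of the plan text varies between SQLite versions, so treat the output as illustrative rather than stable.

```python
import sqlite3

# Hypothetical table with no index on customer_id at first.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i) for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


sql = "SELECT total FROM orders WHERE customer_id = ?"

before = plan(sql, (7,))  # expected to report a scan of the whole table
conn.execute("CREATE INDEX orders_customer ON orders(customer_id)")
after = plan(sql, (7,))   # expected to report a search using the new index

print("before:", before)
print("after: ", after)
```

Other databases expose the same information through their own EXPLAIN variants, with different syntax and output.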

My experience with type-safe query layers is that they tend to be incomplete; they simply don't let you generate the full range of SQL queries because you're restricted by the language's type system. That said, I'm not particularly familiar with squeryl (and Scala's type system is certainly more expressive than most statically-typed languages), so I can't say what its limitations are; I can only make general statements.

Anyway, I think it's fair to say it's difficult to talk about ORM generally due to the differences between frameworks and approaches. So I'll try to phrase things more clearly, and say that I think the author's original intent, and the part I agree with, is the fundamental premise that ORM abstractions are inherently leaky and that performance needs often result in a desire to go around the ORM framework to handle something more natively in SQL. Some ORM frameworks embrace those limitations, and allow you to use them when you want to and to work around them when you don't; other frameworks fight that limitation and attempt to swallow the world such that you never have to leave the ORM framework, and those frameworks tend to be the ones that become frustrating to work with.

So if I were to attempt to charitably read the original post, I'd say that perhaps saying it's an "antipattern" is taking it too far, but saying that it's a fundamentally flawed, leaky abstraction is totally accurate, and that recognizing that it's fundamentally leaky means that you, as a developer, should probably take that into account in your application design and your library selection, and that there are some techniques that might help you to do that.


In this discussion, everyone seems to broadly assume that it's an all-or-nothing affair.

More precisely, it's a cost-benefit decision. If most queries you will make are hampered by the ORM then by all means, don't use one. But if, as in many (most?) situations, an ORM greatly abstracts and eases design and development for 90%+ of your operations and you have about 10% of queries to be either tuned or handwritten (even if they have to be handwritten against multiple database types to preserve portability), then an ORM is a net benefit.

Compare it to inline asm in C, or C modules in Python: the higher level stuff makes it efficient to work with top level concepts 90% of the time, but sometimes you have to go down a level to be actually efficient, or even simply to be able to do something, even if that means losing some form of independence (which would then mandate writing the same function for a different platform if you want to preserve portability). Not only is it not an anti-pattern, it also by no means implies that the abstraction is fundamentally flawed.

The very idea that "going around" an ORM is somehow proving that ORMs are flawed is simply wrong. There are problems that ORMs are built to solve, and there are problems they can't ever solve. "Going around" is part of the deal because it's not a "work around", it's a "work together".

This is very visible in the article, especially the moment the author states that "I claim that the abstraction of ORM breaks down not for 20% of projects, but close to 100% of them". Indeed this is the case, but for close to 100% of those close-to-100% projects where it "fails", the ORM is helpful for managing 90% of the data access implementation. The remaining 10% may need to be implemented at a lower level, but wouldn't it be silly to spend a lot of time on the 90% of code that gets used 10% of the time? This is what ORMs are about: they free up a lot of time to develop the 10% of code that is critical both in usage volume and in performance. Of course such ratios are highly project dependent, and this is what warrants a thoughtful analysis to select the right tool for each task, of which there can be multiple in a single project, or even _object_. ORM, just like NoSQL, is simply not the be-all and end-all solution, yet that does not make it any less valid a pattern.

(edit: cosmetic/typo)



