> [!note] Thoughts on the ORM (Object Relational Mapping). # On the term ORM Even the term ORM - *Object Relational Mapping* - shows its history. ORM was first created during the peak [[OOP]]-era, where OOP is the most hot paradigm. We're past the OOP craze, but ORMs are still relevant even when the underlying language is not object-oriented. For example, the [Go Programming Language Spec](https://go.dev/ref/spec) purposefully avoids the word *object*, yet we have no difficulty talking about a *Go ORM framework*. The same is true from the database's perspective. R is for *Relational* yet a *[[MongoDB]] ORM* is a perfectly sound idea. The timeless aspect is the *Mapping*. Mapping programming language types ("objects", but they can be structs, records, or anything) to a *database-aware* form is the true value of an ORM, and they strive to this date. # Scope ORMs are widespread yet people have love-hate relationship with ORMs. Some can't live without them, whereas some think it's an overbloated leaky abstraction. Unfortunately, ORMs span a large surface area, it can be quite small or encompassing. We rarely agree what an ORM is. At core, there are three different types of ORMs: - [[#Object hydration]] - only perform the *mapping* between the language and the database. Programmer is still writing SQL. In this case, an ORM is not a framework but a helper library. - [[#Query Builder]] - allows queries (either DQL or DMLs) to be programmatically described. Programmer is still primarily interacting with database-oriented entities (tables, queries) - [[#Advanced ORM]] - the ORM is in "full control". One can interact with the system without resorting to SQL (or SQL-first concepts). ## Object hydration Example: Go's `sqlx`, Rust's [sqlx](https://github.com/launchbadge/sqlx) (which is a different library) [Snippet from jmoiron/sqlx](https://github.com/jmoiron/sqlx) ```go type Person struct { FirstName string `db:"first_name"` LastName string `db:"last_name"` Email string } ... people := []Person{} db.Select(&people, "SELECT * FROM person ORDER BY first_name ASC") ``` In the above example, one is still primarily writing SQL queries. Examples: - sqlx (go) - sqlx (rust) ## Query Builder Primary job is to build typesafe queries. A lot of times, this is primarily SQL driven. Examples: * JooQ * [[kysely]] * ## Data Mapper See [Data Mapper Pattern](https://en.wikipedia.org/wiki/Data_mapper_pattern). - [[Prisma]] - SQLAlchemy - Hibernate ## Active Record Magical proxy objects represent database rows. Automatic reference traversal. `obj.ref` may trigger sql query. This is where the framework becomes complex. Framework may have concept of session and attached/detached object. Examples: - Django - [[Active Record]] ## ORMs for the [[Data Warehouse|OLAP]] use case https://docs.fiveonefour.com/moose, https://clickhouse.com/blog/moosestack-does-olap-need-an-orm # Different ORM libraries ![[ORMs.base]] | Name | Information | | --------------------------------------------- | --------------------------- | | [[Active Record]] (Ruby) | | | [[SQLAlchemy]] | Most popular [[Python]] ORM | | JooQ | Static SQL Generation | | [TypeORM](https://github.com/typeorm/typeorm) | [[TypeScript]] | | sqlx (go) | Go | # Why and why not ## Why SQL is not a convenient language for *programmatic access*. See [[SQL#What is wrong with SQL?]]. * Language is full of edge cases. * One needs to write *parameterized SQL* to avoid SQL injection attacks. However, Certain things cannot be parameterized. * For these reasons alone, almost every nontrivial application end up having an ad-hoc ORM (at least object hydration / query builder) in an application. An instance of [[Greenspun's Tenth Rule]]. ## Why Not - However, certain SQL features - aggregation, windowing function - are powerful yet quite intricate to represent in SQL. - ORMs end up adding [[Complexity of the Modern Software#Layers|Layers]]. - Specifically, there are many layers of the same data type between systems. (SQL layer, ORM layer). # Strawmans ## "no need to write SQL" In the battle around ORMs, "no need to write SQL" had been the biggest strawman on both sides of the battle. OOP is seen as "strictly better" model than the relational model. (hard to emphasize how much OOP craze we had back then) thus, abstracting out the archaic relational model to provide pristine OOP layer was seen as a "good thing". However, this is fundamentally a leaky abstraction (see "advanced SQL"). Moreover, OOP prefers certain type of design, and ORMs force you to have more verbose and inefficient design. In reality, schema design should be database / model driven (instead of OOP design driven) as data models are much harder to evolve over time. Given that even "canonical RDBMS best practices" like normalization is being criticized for creating inefficiencies, It is impossible to use ORM effectively without the good understanding of the underlying database and technology. Nowadays, No ORM claims that: * SQL is bad * one doesn't need to learn SQL and: * takes more "utilitarian" approach; this reduces boilerplate. * allows an escape hatch to SQL * less concerned with "OOP design". A lot of [[#Object mapping]] questions are not tackled. ## Abstracting over database implementations Many ORMs (Hibernate as an example) attempts to create abstraction over databases. There are two goals; * allow the library to be used against different DBs * allow writing an application that can run on top of different DBs. The second is insanely difficult: > [!quoute] https://forum.sentry.io/t/still-does-not-support-mysql/1483/4 > > Over and over again, large projects that started on a database-agnostic framework realize that they can’t really support multiple database backends. Sentry is just one example of this. GitLab is probably very close to dropping their support for MySQL too. Other projects have gone the other way and chosen to support only MySQL. The common denominator is that once you get to the point where you need to be doing special stuff that a generic query builder can’t handle, you either need to choose a single database engine or you need to put far more effort into maintaining compatibility than it is worth. # Features and Concepts ## Object mapping **Object mapping has fallen out of flavor** - Originally, people were really concerned with accurately mapping object polymorphism into the relational model. [[Leaky Abstraction]]. This may be due to the fact that [[OOP#Classical OOP]] has ran out of flavor. Polymorphism, multiple inheritance, etc... they are all complex concepts with no *obvious* ways to reason what happens when these concepts intersect. A lot of complexity with Hibernate comes from the [inheritance mapping](https://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch10.html). ### Types of Object Mappings * table per class hierarchy - table per subclass - table per concrete class However, now nobody wants to just "take an existing class and persist them"; everyone recognizes that this is a pipe dream (and even if it's possible, not very viable long term). This is a similar problem that [[Serialization]] has. ORM and inheritance polymorphism feel really wonky. One needs to be aware of all polymorphic subclasses to correctly model it (in all three of the above models). However, in OOP style of programming, one can define subclasses anywhere. (Ultimately the runtime will be aware of them, so there will be some form of "class registration" even if it's just updating the vtables, but this). If I take a [[Materialism]] stance; then I can argue that polymorphism is "just vtable lookups"; however, in the OOP-centric world, we've taken an approach to interpret and reshape the entire world as the inviolatable principle. However, the reality is that serialized bytes on disk / databases are more "foundational" concepts then mere dynamic dispatch that are living inside a running program. Thus it's better to yield to the data in the database (and make the application more malleable for change) then the vice versa. ## Model first vs SQL first > [!note] > [[drizzle]] calls this **Database-first** versus **Codebase-first** [link](https://orm.drizzle.team/docs/migrations) This is one major defining characterization of many ORMs; The question of, "where does the data modeling happen *first*?" - Code via a special DSL. This is the approach that [[Active Record]] takes. They tend to have many *convention over configuration* practices; such as auto-incrementing primary keys being chosen by default. This is convenient for rapidly creating a new app, but it is not the best choice for adopting the ORM in an existing application. Hibernate extensively uses Java annotations (Hibernate-specific and JPA) to define the data models. - SQL first. This is the approach taken by jOOQ and other libraries; the framework introspects an existing database schema and creates data models out of them. These frameworks tend to be more *enterprise* as they are meant to be integrated into an existing application. - Code via a special language. [[Prisma]] is a good example of this, where its modeling language is not tied to any language. This offers the most flexibility (as the modeling language can stay declative), but it also requires the most implementation to get started. - Bidirectional syncing. Prisma allows you to create Prisma schema files from an existing database. It's not foolproof but this is a good way to get a project started. ## Schema Migrations ## Exposing RDBMS semantics * Transactions ## Querying This is the area where the database system (relational or otherwise) are very rich and specific on their semantics and capabilities. Queries, indices, key-value, etc... # Events and application level triggers For a heavyweight ORMs ([[Django]], [[SQLAlchemy]], Hibernate...) as the session sees all objects coming through the system, *application-level triggers* can be implemented. Example: - permissions - triggering actions on save / deletes. # Literature * *[ORM (wikipedia)](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping) * [Data Access Objects](https://en.wikipedia.org/wiki/Data_access_object) - https://github.com/getlago/lago/wiki/Is-ORM-still-an-%27anti-pattern%27%3F - [Is Prisma ORM an ORM?](https://www.prisma.io/docs/orm/overview/prisma-in-your-stack/is-prisma-an-orm) - What is wrong with [[SQL]] as a language?