How I unknowingly deviated from JPA
March 26, 2016 Leave a comment
In a post from January 2015 I wrote about possibility to use plain foreign key values instead of @ManyToOne and @OneToOne mappings in order to avoid eager fetch. It built on the JPA 2.1 as it needed ON clause not available before and on EclipseLink which is a reference implementation of the specification.
To be fair, there are ways how to make to-one lazy, sure, but they are not portable and JPA does not assure that. They rely on bytecode magic and properly configured ORM. Otherwise lazy to-one mapping wouldn’t have spawned so many questions around the internet. And that’s why we decided to try it without them.
We applied this style on our project and we liked it. We didn’t have to worry about random fetch cascades – in complex domain models often triggering many dozens of fetches. Sure it can be “fixed” with second-level cache, but that’s another thing – we could stop worrying about cache too. Now we could think about caching things we wanted, not caching everything possibly reachable even if we don’t need it. Second-level cache should not exist for the sole reason of making this flawed eager fetch bearable.
When we needed a Breed for a Dog we could simply do:
Breed breed = em.find(Breed.class, dog.getBreedId());
Yes, it is noisier than dog.getBreed() but explicit solutions come with a price. We can still implement the method on an entity, but it must somehow access entityManager – directly or indirectly – and that adds some infrastructure dependency and makes it more active-record-ish. We did it, no problem.
Now this can be done in JPA with any version and probably with any ORM. The trouble is with queries. They require explicit join condition and for that we need ON. For inner joins WHERE is sufficient, but any outer join obviously needs ON clause. We don’t have dog.breed path to join, we need to join breed ON dog.breedId = breed.id. But this is no problem really.
We really enjoyed this style while still benefiting from many perks of JPA like convenient and customizable type conversion, unit of work pattern, transaction support, etc.
I’ll write a book!
Having enough experiences and not knowing I’m already outside of JPA specification scope I decided to conjure a neat little book called Opinionated JPA. The name says it all, it should have been a book that adds a bit to the discussion about how to use and tweak JPA in case it really backfires at you with these eager fetches and you don’t mind to tune it down a bit. It should have been a book about fighting with JPA less.
Alas, it backfired on me in the most ironic way. I wrote a lot of material around it before I got to the core part. Sure, I felt I should not postpone it too long, but I wanted to build an argument, do the research and so on. What never occurred to me is I should have tested it with some other JPA too. And that’s what is so ironic.
In recent years I learned a lot about JPA, I have JPA specification open every other day to check something, I cross reference bugs in between EclipseLink and Hibernate – but trying to find a final argument in the specification – I really felt good at all this. But I never checked whether query with left join breed ON dog.breedId = breed.id works in anything else than EclipseLink (reference implementation, mind you!).
It does not. Today, I can even add “obviously”. JPA 2.1 specification defines Joins in section 4.4.5 as (selected important grammar rules):
join::= join_spec join_association_path_expression [AS] identification_variable [join_condition] join_association_path_expression ::= join_collection_valued_path_expression | join_single_valued_path_expression | TREAT(join_collection_valued_path_expression AS subtype) | TREAT(join_single_valued_path_expression AS subtype) join_spec::= [ LEFT [OUTER] | INNER ] JOIN join_condition ::= ON conditional_expression
The trouble here is that breed in left join breed does not conform to any alternative of the join_association_path_expression.
Of course my live goes on, I’ve got a family to feed, I’ll ask my colleagues for forgiveness and try to build up my professional credit again. I can even say: “I told myself so!” Because the theme that JPA can surprise again and again is kinda repeating in my story.
Opinionated JPA revisited
What does it mean for my opinionated approach? Well, it works with EclipseLink! I’ll just drop JPA from the equation. I tried to be pure JPA for many years but even during these I never ruled out proprietary ORM features as “evil”. I don’t believe in an easy JPA provider switch anyway. You can use the most basic JPA elements and be able to switch, but I’d rather utilize chosen library better.
If you switch from Hibernate, where to-one seems to work lazily when you ask for it, to EclipseLink, you will need some non-trivial tweaking to get there. If JPA spec mandated lazy support and not define it as mere hint I wouldn’t mess around this topic at all. But I understand that the topic is deeper as Java language features don’t allow it easily. With explicit proxy wrapping the relation it is possible but we’re spoiling the domain. Still, with bytecode manipulation being rather ubiquitous now, I think they could have done it and remove this vague point once for all.
Not to mention very primitive alternative – let the user explicitly choose he does not want to cascade eager fetches at the moment of usage. He’ll get a Breed object when he calls dog.getBreed(), but this object will not be managed and will contain only breed’s ID – exactly what user has asked for. There is no room for confusion here and at least gives us the option to break the deadly fetching cascade.
And the book?
Well the main argument is now limited to EclipseLink and not to JPA. Maybe I should rename it to Opinionated ORM with EclipseLink (and Querydsl). I wouldn’t like to leave it in a plane of essay about JPA and various “horror stories”, although even that may help people to decide for or against it. If you don’t need ORM after all, use something different – like Querydsl over SQL or alternatives like JOOQ.
I’ll probably still describe this strategy, but not as a main point anymore. Main point now is that JPA is very strict ORM and limited in options how to control its behavior when it comes to fetching. These options are delegated to JPA providers and this may lock you to them nearly as much as not being JPA compliant at all.
But even when I accept that I’m stuck to EclipseLink feature… is it a feature? Wouldn’t it be better if reference implementation strictly complained about invalid JPQL just like Hibernate does? Put aside the thought that Hibernate is perfect JPA 2.1 implementation, it does not implement other things and is not strict in different areas.
What if EclipseLink reconsiders and removes this extension? I doubt the next JPA will support this type of paths after JOINs although that would save my butt (which is not so important after all). I honestly believed I’m still on the standard motorway just a little bit on the shoulder perhaps. Now I know I’m away from any mainstream… and the only way back is to re-introduce all the to-one relations into our entities which first kills the performance, then we turn on the cache for all, which hopefully does not kill memory, but definitely does not help. Not to mention we actually need distributed cache across multiple applications over the same database.
In the most honest attempt to get out of the quagmire before I get stuck deep in it I inadvertently found myself neck-deep already. ORM indeed is The Vietnam of Computer Science.