EclipseLink joins without mapped attributes
January 30, 2015 2 Comments
EDIT April 2016: Originally I believed I described pure JPA 2.1 solution, much later I found it is merely an EclipseLink extension to the JPQL that is not supported in standard.
OK, whoever knows it is possible, just stop reading right away (before my JPA rant starts). Note, that this post builds on JPA 2.1 syntax for ON and EclipseLink extension that allows to join root entities (JPA specification allows only association paths).
JPA is hard
Java Persistence API is not easy – let’s face it. Please. Because it is easy only for two kinds of programmers – those that know everything (again, congratulation) and those who think they know everything. Or something similar. While I’m all for minimizing complexity to make things easy, it is known fact that making things simple (that is non-complex) is hard. Everytime I hear “no problem, that’s easy” I know what I’ll get. Complex ball of mud that does not correspond to “easy” at all.
I’ve read parts of Hibernate reference, I’ve read Pro JPA 2/2.1 – and I’d love to read it all finally, but there is always something else to read too. And so while I was aching for solution how to get rid of some my @Many/OneToOne mappings from entities (because these are not guaranteed to be lazy and can trigger too many selects), I thought I couldn’t because I’ll be unable to join in the direction of that association. I was wrong, but first…
Can’t you avoid triggered selects for *ToOne somehow?
Yes, you can. You can eagerly fetch that association with JOIN, some smart ORMs do that for you to some depth. Let’s talk about this example:
A Dog points to a Breed. How can I populate a table on the user interface with selected attributes from the dog and its breed as well?
- The worst case (and still traditionally used!) – select Dogs and don’t care. This will likely result in N+1 select (officially name does not reflect the cause and effect – as it is rather 1+N). This plain sucks. I really don’t know why some mainstream ORM solutions still execute this as N+1 even when it’s eager (*ToOne is by default), but that just underlines… well, the state of ORM, obviously.
- Fetch the relationship! Of course. It is bound to happen anyway, so let’s deal with it in one select. Does it mean we’re done in single select? If you have more *ToOne relations on your Dog – or Breed – then you’re not. Any fetch may trigger additional selects for further to-one relation.
- Fetch and select columns explicitly. This works, but you have to deal with Lists of arrays of Objects, or – in better case – Querydsl’s handy Tuple. Or you may use projection to DTO. When you need to do this with 60 columns you’ll know the pain.
But yes, it is possible. The question is if it’s worth to use JPA if 80% of time you go around these problems. Depending on relationship optionality you need to use LEFT JOIN of course. “Use LEFT JOIN, Luke!” is my answer when another developer asks me questions like “when I want to display fields from breed, some rows just disappear from my results!” or similar with a bit of fun in it “count select returns different count than is the size of final result!” – because you don’t need to join for count unless you perform WHERE part on joined columns.
Summed up, it’s hard. Gone is the premise that you’ll enjoy working with some mapped objects. No, no, deal with Lists of Object or DTOs! It’s hard to say what is better – DTO in theory, but if the list of columns is changing a lot then it’s just another place you’ll have to change. If you still use vanilla JPA, consider Querydsl, seriously. Tuple may be lightweight DTO for you and on the other side you can get the stuff out using expression paths that are compile time safe – without the need to mess with accessors. (Old enough to remember this article? :-))
Dropping @ManyToOne annotation
(The same applies for @OneToOne where relevant.)
To write left join from Dog to Breed you can find virtually exclusively examples like this (Querydsl, but the same goes for JPQL):
QBreed breed = new QBreed(“breed”); // alias List<Tuple> result = new JPAQuery().from(QDog.dog) .leftJoin(QDog.dog.breed, breed) .list(QDog.dog.name, breed.name);
That is, in left join you first state the path how to get to the breed from the dog and then you assign an alias to it (breed). Sometimes you can live without aliases, but in general you need them for any deeper joins. Notice also how the leftJoin implies ON clause. This is logical and expected, that’s why we have the mapping there.
But thing I never realized (and decided to try it the first time today) is that you can just leftJoin to alias and add your ON explicitly – just like you would in SQL! (This is not JPA standard, but EclipseLink QL extension. Querydsl does not have any problem to generate it for you.)
QBreed breed = new QBreed(“breed”); // alias List<Tuple> result = new JPAQuery().from(QDog.dog) .leftJoin(breed).on(breed.id.eq(QDog.dog.breedId)) .list(QDog.dog.name, breed.name);
Obviously you have to have breedId attribute mapped on Dog – but that is probably some plain type like Integer. This will not trigger any additional select. If you really want, you can have dual mapping for the same column like this:
@Column(name = "breed_id", insertable = false, updatable = false) private Integer breedId; @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "breed_id") private Breed breed;
Notice the insertable/updatable=false – this has to be present on one of those mappings. The question is on which one. If you save/update the entity a lot and breed’s ID is good enough then move it to @JoinColumn annotation. But if you don’t need Breed in most cases at all, remove it completely. Also notice how the code is lying to you about LAZY fetch. It is not. In theory it can be with some byte-code modified Breed (Hibernate does it out of the box), but that is not portable anyway. If you absolutely need that attribute in Dog class, you can still make it @Transient, fetch it and list it explicitly (here I show it for singleResult for brevity):
QBreed breed = new QBreed(“breed”); // alias Tuple result = new JPAQuery().from(QDog.dog) .leftJoin(breed).on(breed.id.eq(QDog.dog.breedId)) .singleResult(QDog.dog, breed); Dog dog = result.get(QDog.dog); dog.setBreed(result.get(breed)); // sets transient field
No problem at all. Full control. Couple of lines longer. Probably way more performant on scale.
What if JPA allowed us to ignore relations on demand?
If you use only id value that copies exactly FK in your entities then you’ll lose some of the ORM benefits – but as said you’ll also dump a lot of its burden that is in many cases unconsidered. Imagine you could specify mappings like this:
@Column(name = "breed_id") @ManyToOne(targetEntity = Breed.class) // exists, but invalid when types don’t match private Integer breedId;
This way you’d say what the target entity is, generated metamodels could offer you joins, etc. I’m not going to evolve this idea at all because there are some obvious disadvantages. Your JOIN path would be dog.breedId, not breed, so if you followed it to Breed’s attributes it would read confusingly (assuming that when used as a path it would act as Breed of course). Then maybe this:
@ManyToOne(fetch = FetchType.JOIN_ONLY) // does not exist @JoinColumn(name = "breed_id") private Breed breed;
This would mean that unless you require JOIN explicitly in your select this field will NOT be populated. Actually, this would be best to combine with dual mapping as shown sooner. ID is easy to get without JOIN – and referenced entity would be insertable/updatable false. Otherwise semantics of save would be questionable. Does null mean that entity is removed and FK should be nulled? Or does it mean to ignore it? Should JPA ignore it you need that explicit breedId field. If not ignored even trivial find/change single attribute/save (or let it happen automatically at the transaction commit) would delete your relation unexpectedly. You can add some dirty checking magic of course (“if the attribute changes then do that, otherwise don’t do anything”), but we’d just add to existing mess. So this definitely breaks “least surprise” principle. But with dual mapping that is already possible, I’d love to have this really lazy fetch type.
In case you wonder why LAZY is not guaranteed (without byte-code modification magic) – the answer is that breed attribute should be null only if NULL in database. So if the FK is there, you need to instantiate some Breed and if that supposes to be lazy, it has to be some kind of proxy to a real one that is lazily loaded when you actually touch (get something from) the breed object. This is very easy with collections (*ToMany) and ORM’s can do that (although Eclipselink’s Vector is spectacularly wrong way to do it) – that’s why I’m so possessed with *ToOne problem.
There is also another alternative solution – we would have Breed instantiated as a wrapper for the FK value. That is dog.breed.id would not be null, but other attributes of dog.breed would not be initialized. Programmer would have to know – after all he requested it this way. That would allow the mapping to be still object based, but we would have more low-level control over the fetches.
Is it still ORM?
With dual mapping or with FK wrapper it would still be ORM as we know it. You’d have bigger control, single em.find would not trigger 47 selects (real story, seriously, and all traversing *ToOne relationships) and there would be minimal implications to JPA (although there are more competent to say so or otherwise).
Currently I plan to drop quite a lot of *ToOne mappings because:
- There is no portable way to avoid the fetch when I know I don’t want it.
- When I want it, I can join it myself anyway.
- If I want to pick specific relation I can just find it by ID that is available on the entity (and then store it to @Transient field for example).
Am I still using it as ORM then? I’m dropping a lot of “mappings” of my “relations(hips)” from my “objects”. But I can still utilize a lot of JPA (and Querydsl). There still is persistence context, you can use L2 cache (if you want), and maybe you don’t have uber-cool mapping to your domain… but let’s face it. How often we really map it? How often we just mirror our databases?
Do you map @ManyToMany or do you map the association table with two IDs instead? With explicit mapping you can get list of IDs of related object by ID of owning object. In SQL it is a single table query. Eclipselink joins Master, MasterDetail and Detail to get information that is clearly available on MasterDetail table alone (not mapped explicitly). Is it so difficult to treat IDs properly? It probably is. How does ORM lower the complexity here?
JPA adds tons of accidental complexity every time. Not first, but later. And virtually all teams working with it have no real JPA expert. I study JPA more than most people I know and it still keeps surprising me. Rarely in a good way. Left join using unrelated alias (at least from mapping point of view) and explicit ON is the good one. I don’t even know why I never tried it when I officially longed for it! I ranted: “Why don’t we have free joins on any numbers (PK/FK) just like SQL can do?” And nobody ever contradicted me. Most examples always start with path going from FROM entity, using alias only as their second argument (Querydsl, but JPQL examples are the same). We are slaves of our *ToOne relations and can’t get rid of undesired selects – and there is a simple way how to do it all the time. Partial reason for this is that ON for JOIN requires JPA 2.1 and didn’t work before.
If you don’t care about *ToOne problem, no problem. To be fair, if you have lazy set up properly you may be happy with it. If you do want to get rid of to-one relations the treatment is:
- Replace your *ToOne mapping with simple ID as a value attribute.
- Change anyJoin(x.y, alias) to anyJoin(alias).on(x.yId.eq(alias.id)) – this is Querydsl example but it transforms to JPQL directly. List both entities in projection if you need it (maybe you just WHERE on it).
- If you really want +1 select, do it with em.find yourself. That may actually use L2 cache (if it’s not in the persistence context already).
With this tiny speck of knowledge I’m backing off from my plans to get rid of ORM in our project (you can imagine what pain that would be). Now it gives me so-so enough control to contain the select explosions to required levels.
EDIT 2016: Just to repeat it – this is based on EclipseLink support for JOIN for root entities. Maybe it will get into JPA eventually (I’d love it), but as of 2.1 it is not there. ON condition is there already, but without root entity JOIN it is not enough.