JPA Pagination of entities with @OneToMany/@ManyToMany

Displaying paginated results of entities with their *-to-many sub-entities is a tricky thing and many people actually don’t even know about the problems. Let’s check what JPA does and what it does not for you. Besides standard JPA I’ll use Querydsl for queries – this effectively results in JPQL. Querydsl is type-safe and compared to Criteria API much more readable.

Source of N+1 select problem

Imagine having an entity called User that can have many Roles. Mapping looks like this:

@JoinTable(name = "user_roles",
  joinColumns = @JoinColumn(name = "user_id", referencedColumnName = "id"),
  inverseJoinColumns = @JoinColumn(name = "role_id", referencedColumnName = "id"))
@ManyToMany(fetch = FetchType.EAGER)
private Set<Role> roles;

You want to display a paginated list of users that contains list of their roles – for instance as tiny icons with tooltips containing role names.

Naive approach is to query the Users and then wait for whatever may come:

QUser u = QUser.user; // we will use this alias further
List<User> result = new JPAQuery(em).from(u)
    .where(...)
    .offset((page - 1) * pageSize)
    .limit(pageSize)
    .list(u);

Now something may or may not happen. Simple select from USER is definitely executed and if you don’t touch user.role that will be all. But you want those roles and you want to display them – for instance when you iterate the results. Each such access executes another select from ROLE where user_id = iterated user’s id. This is the infamous N+1 select problem (although the order is rather 1+N :-)). It is extremely bad for selects with large result sets, while for pages N has reasonable limits. But still – no reason to do additional 25 selects instead of… how many actually?

Left join for the help?

Were the mapping @…ToOne, then you could just left join this entity (e.g. user’s Address assuming there is a single address needed for a user). Left join in this case does not change the number of rows in the result – even for optional addresses, whereas inner join would omit users without addresses (also surprisingly common error for programmers who try to avoid SQL for no good reasons).

Left join for *ToOne relations is good solution for both paginated and non-paginated results.

But for *ToMany paginating and left join don’t go well together. The problem is that SQL can only paginate rows of the resulting join, it cannot know the proper offset and limit to select for instance second user only. Here we want page 2 with page size 2:

List<User> result = new JPAQuery(em).from(u)
    .where(...)
    .leftJoin(u.roles).fetch()
    .offset(2).limit(2)
    .list(u);

The result is delivered in a single select – but what a result…

We definitely don’t want User1 with a single RoleC and User2 with RoleA. That is not only not the page we want (we should get User3 on the final page 2), but it’s just plain disinformation (aka lie).

(Don’t) Use Eager!

For most cases EAGER mapping for *ToMany is not a good default. But in case you’re asking what will happen… JPA provider at least knows at the time of select from USER that we want the ROLEs too. EclipseLink performs N additional selects to get users’ roles although it utilizes any cached data possible. Actually I tested it with cache turned off with these lines in persistence.xml:

<property name="eclipselink.query-results-cache" value="false"/>
<property name="eclipselink.cache.shared.default" value="false"/>
<property name="eclipselink.cache.size.default" value="0"/>
<property name="eclipselink.cache.type.default" value="None"/>

Cache can bring some relief, but it actually just obscures the problem and may mislead you that the problem is not there. Especially if you create the data using JPA just before the test. 🙂

In case of LAZY JPA provider has to wait when we touch the role list – and it always loads only the one that is needed. But for our page we need to check all of them, so we will end up with N selects again. Provider has no way to optimize for this, not to mention that you may end up with LazyInitializationException (your fault) unless you use OSIV (open session in view), which I consider bad design after my previous experiences with it.

If provider knows that you want the roles upfront (EAGER) it may optimize, but I would not count on that. EclipseLink does not, and you simply cannot know. So let’s just do it ourselves, shall we? And switch that EAGER back to LAZY, which is default, so you can just as well remove the fetch parameter altogether.

1+1 select, no problem

We need to get the proper page of users and then load their roles. There are two ways how to do this:

  1. if you have reverse mappings (role.users) then you can get List<User> and then select roles containing these roles;
  2. if there is no reverse mappings, you list users first and then again with left join to roles – again only where user is in the first list.

In both cases list of user IDs may be enough. Because our role does not have reverse mapping (I don’t like it that much for @ManyToMany relations) we end up with code like this:

List<Integer> uids = find.queryFrom(u)
    .where(...)
    .orderBy(u.login.asc())
    .offset(2).limit(2)
    .list(u.id);
List<GuiUser> users = find.queryFrom(u)
    .where(u.id.in(uids))
    .leftJoin(u.guiRoles).fetch()
    .orderBy(u.login.asc())
    .distinct()
    .list(u);

What you display is the result of the second query, obviously. There are two important points here though:

  • You have to repeat the order. While in the first select it is essential for paging (paging without order is meaningless, right?) in the second case you could do it in memory, but there is no reason why, as you let DB to order just a single page and you save Java lines of code. Not to mention Java possibly (and quite probably!) using different collator… this really is not worth it.
  • You have to use distinct() before the list() of the second query. This not only adds distinct to the select, but also instructs Querydsl or JPA (really don’t know who does the job, sorry :-)) to give you the list of distinct users, not list of size of that joined select. Such distinct is required in most leftJoins on *ToMany entities (unless you create projection results for them).

1+1 for OneToMany

Now let’s pretend we have different situation – some @OneToMany case, like an Invoice and its Items. Item will have invoice attribute and Invoice will have @OneToMany items mapped by Item.invoice – like this:

@OneToMany(<b>mappedBy = "invoice"</b>)
private Set<Item> items;

And now the select part:

QInvoice i = QInvoice.invoice;
Map<Integer, Invoice> invoices = find.queryFrom(i)
    .where(...)
    .orderBy(i.id.asc())
    .offset(2).limit(2)
    .map(i.id, i);
// this performs items = new HashSet<>() for each invoice
invoices.values().forEach(Invoice::initItems);

QItem ii = QItem.item;
List<Item> items = find.queryFrom(ii)
    .where(ii.invoice.in(invoices.values()))
    .list(ii);
for (Item item : items) {
    // addItem is always nicer than getItems().add(item)
    invoices.get(item.getInvoice().getId()).addItem(item);
}

Always check what selects are generated in the end. In my case ii.invoice.in performs no additional join and puts ids into the IN list – and that’s how it should be.

Also note the usage of map(i.id, i) – this returns LinkedHashMap, so the order is preserved (cool!) and makes getting the invoice for item much easier. Difference in select is just one redundant id which is no problem at all.

Conclusion

We demonstrated the essential problem with pagination of one/many-to-many joins and checked two solutions that are both based on one additional select. These solutions are not necessary if pagination is not required, obviously. There may be other ways to do it, there are even some extensions for specific JPA providers (like @BatchFetch for EclipseLink), but I’d prefer this for couple of reasons:

  • Full control over selects – well, at least as full as you can get with Querydsl over JPQL. You can select into DTOs, you can enumerate required columns – there is much more you can do around the basic idea. (Funny. Just minutes after writing this sentence I travelled home from work, reading neat little book SQL Performance Explained – and coincidentally I was going through the part about Joins and Nested Loops. Markus Winard provided also short analysis of ORMs in various languages, how they perform joins and what you can do about it. The final tip was pretty clear about it: Get to know your ORM and take control of joins.)
  • It scales! There is no N. As long as your pagination is reasonably fast (but this is separate problem and you have to do what you have to do), second select goes after IDs, which should be pretty fast anyway.
  • It is unrelated to any specific features of any specific JPA provider.
  • And after all – you have full control over your contract. Don’t leave presentation layer trigger your selects. Even better – close that persistence context after leaving the service layer.

The only problem you should be aware of is limitation about amount of expressions in the IN clause. Check it out, read documentation (both for your JPA provider and RDBMS), experiment, find out what the maximum page size will be – and (obviously) cover it in your tests (at least when it crashes the first time ;-)).

The most important takeaway from this post should be: 1) EAGER is not a solution, 2) result of the left join with “to-many” cardinality cannot be reasonably paginated in SQL. We’re not talking about specific solutions here – hell, clever DB guys can even provide you procedures delivering you the right page plus total count of all records – all in a single call to DB. But that’s beyond basic SQL/JPA. 🙂

One thought on “JPA Pagination of entities with @OneToMany/@ManyToMany

Leave a comment