How I unknowingly deviated from JPA

In a post from January 2015 I wrote about the possibility of using plain foreign key values instead of @ManyToOne and @OneToOne mappings in order to avoid eager fetch. It built on JPA 2.1, as it needed the ON clause not available before, and on EclipseLink, which is the reference implementation of the specification.

To be fair, there are ways to make to-one mappings lazy, sure, but they are not portable and JPA does not guarantee them. They rely on bytecode magic and a properly configured ORM. Otherwise lazy to-one mapping wouldn’t have spawned so many questions around the internet. And that’s why we decided to try it without them.

Total success

We applied this style to our project and we liked it. We didn’t have to worry about random fetch cascades – in complex domain models these often trigger many dozens of fetches. Sure, it can be “fixed” with a second-level cache, but that’s another thing – we could stop worrying about the cache too. Now we could think about caching the things we wanted, not caching everything possibly reachable even when we don’t need it. A second-level cache should not exist for the sole reason of making this flawed eager fetch bearable.

When we needed a Breed for a Dog we could simply do:

Breed breed = em.find(Breed.class, dog.getBreedId());

Yes, it is noisier than dog.getBreed(), but explicit solutions come with a price. We can still implement the method on the entity, but it must somehow access the entityManager – directly or indirectly – and that adds an infrastructure dependency and makes it more active-record-ish. We did it, no problem.
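
For illustration, a minimal sketch of what such an entity can look like – the Dog/Breed names follow the example above, the rest is just one possible layout:

@Entity
public class Dog {
    @Id
    private Integer id;

    private String name;

    // plain FK value instead of a @ManyToOne Breed – nothing to fetch eagerly
    @Column(name = "breed_id")
    private Integer breedId;

    public Integer getBreedId() {
        return breedId;
    }
}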

Now this can be done in any JPA version and probably with any ORM. The trouble is with queries. They require an explicit join condition, and for that we need ON. For inner joins WHERE is sufficient, but any outer join obviously needs the ON clause. We don’t have a dog.breed path to join, we need to join breed ON dog.breedId = breed.id. But this is no problem really.
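
In JPQL it can look like this – keeping in mind (as this post sadly discovers below) that joining a root entity this way is an EclipseLink extension, not standard JPA:

select d.name, b.name
from Dog d
left join Breed b on d.breedId = b.id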

We really enjoyed this style while still benefiting from many perks of JPA, like convenient and customizable type conversion, the unit of work pattern, transaction support, etc.

I’ll write a book!

Having enough experience – and not knowing I was already outside the scope of the JPA specification – I decided to conjure up a neat little book called Opinionated JPA. The name says it all: it should have been a book adding to the discussion about how to use and tweak JPA when it really backfires on you with these eager fetches and you don’t mind tuning it down a bit. It should have been a book about fighting with JPA less.

Alas, it backfired on me in the most ironic way. I wrote a lot of material around it before I got to the core part. Sure, I felt I should not postpone it too long, but I wanted to build an argument, do the research and so on. What never occurred to me was that I should have tested it with some other JPA provider too. And that’s what is so ironic.

In recent years I have learned a lot about JPA. I have the JPA specification open every other day to check something, I cross-reference bugs between EclipseLink and Hibernate – trying to find the final argument in the specification – and I really felt good at all this. But I never checked whether a query with left join breed ON dog.breedId = breed.id works in anything else than EclipseLink (the reference implementation, mind you!).

Shattered dreams

It does not. Today I can even add “obviously”. The JPA 2.1 specification defines joins in section 4.4.5 as (selected important grammar rules):

join::= join_spec join_association_path_expression [AS] identification_variable [join_condition]
join_association_path_expression ::=
  join_collection_valued_path_expression |
  join_single_valued_path_expression |
  TREAT(join_collection_valued_path_expression AS subtype) |
  TREAT(join_single_valued_path_expression AS subtype)
join_spec::= [ LEFT [OUTER] | INNER ] JOIN
join_condition ::= ON conditional_expression

The trouble here is that breed in left join breed does not conform to any alternative of the join_association_path_expression.

Of course my life goes on, I’ve got a family to feed, I’ll ask my colleagues for forgiveness and try to build up my professional credit again. I can even say: “I told myself so!” Because the theme that JPA can surprise you again and again keeps repeating in my story.

Opinionated JPA revisited

What does it mean for my opinionated approach? Well, it works with EclipseLink! I’ll just drop JPA from the equation. I tried to be pure JPA for many years, but even then I never ruled out proprietary ORM features as “evil”. I don’t believe in an easy JPA provider switch anyway. You can stick to the most basic JPA elements and stay able to switch, but I’d rather utilize the chosen library better.

If you switch from Hibernate, where to-one seems to work lazily when you ask for it, to EclipseLink, you will need some non-trivial tweaking to get there. If the JPA spec mandated lazy support instead of defining it as a mere hint, I wouldn’t mess around with this topic at all. But I understand the topic goes deeper, as Java language features don’t allow it easily. With an explicit proxy wrapping the relation it is possible, but then we’re spoiling the domain. Still, with bytecode manipulation being rather ubiquitous now, I think they could have done it and removed this vague point once and for all.

Not to mention a very primitive alternative – let users explicitly declare that they do not want to cascade eager fetches at the moment of usage. They would get a Breed object when calling dog.getBreed(), but this object would not be managed and would contain only the breed’s ID – exactly what was asked for. There is no room for confusion here, and it at least gives us the option to break the deadly fetching cascade.
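
In code, this hypothetical behavior might look as follows – purely illustrative, no JPA provider actually offers it:

Dog dog = em.find(Dog.class, 1);
Breed breed = dog.getBreed(); // hypothetical: unmanaged stub, no select triggered
breed.getId();   // available – copied from the FK value
breed.getName(); // null – never loaded, exactly what was asked for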

And the book?

Well, the main argument is now limited to EclipseLink, not to JPA. Maybe I should rename it to Opinionated ORM with EclipseLink (and Querydsl). I wouldn’t like to leave it at the level of an essay about JPA and various “horror stories”, although even that may help people decide for or against it. If you don’t need ORM after all, use something different – like Querydsl over SQL, or alternatives like JOOQ.

I’ll probably still describe this strategy, but not as the main point anymore. The main point now is that JPA is a very strict ORM, limited in options to control its fetching behavior. These options are delegated to JPA providers – and that may lock you to a provider nearly as much as not being JPA compliant at all.

Final concerns

But even when I accept that I’m stuck with an EclipseLink feature… is it a feature? Wouldn’t it be better if the reference implementation strictly complained about invalid JPQL, just like Hibernate does? Put aside the thought that Hibernate is a perfect JPA 2.1 implementation – it does not implement other things and is not strict in different areas.

What if EclipseLink reconsiders and removes this extension? I doubt the next JPA will support this type of path after JOIN, although that would save my butt (which is not so important after all). I honestly believed I was still on the standard motorway, just a little bit on the shoulder perhaps. Now I know I’m away from any mainstream… and the only way back is to re-introduce all the to-one relations into our entities, which first kills the performance; then we turn on the cache for everything, which hopefully does not kill memory, but definitely does not help. Not to mention we actually need a distributed cache across multiple applications over the same database.

In the most honest attempt to get out of the quagmire before I got stuck deep in it, I inadvertently found myself neck-deep already. ORM indeed is The Vietnam of Computer Science.


Last three years with software

A long time ago I decided to blog about my technology struggles – mostly with software, but also with consumer devices. I don’t know why it happened on Christmas Eve, though. Two years later I repeated the format. And here we are, three years after that – so the next post can be expected in four years, I guess. Actually, I split this one in two: one post for software, mostly based on professional experience, and another for consumer technology.

Without further ado, let’s dive into this… well… “dive” – it will obviously be pretty shallow. Let’s skim the stuff I worked with, stuff I like and some I don’t.

Java case – Java 8 (verdict: 5/5)

This time I’m adding my personal rating right into the header – a little change from the previous post, where it was at the end.

I love Java 8. Sure, it’s not Scala or anything even more progressive, but in the context of Java philosophy it was a huge leap, and lambdas especially really changed my life. BTW: check out this interesting Erik Meijer talk about category theory and (among other things) how it relates to Java 8 and its method references. Quite fun.

Working with Java 8 for 17 months now, I can’t imagine going back. Not only because of lambdas and streams and related details like Map.computeIfAbsent, but also because of the date and time API, default methods on interfaces – and the list could probably go on.
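
To show just one of those details – computeIfAbsent with a lambda replaces the old get-check-put dance when building map-of-list structures:

Map<String, List<String>> tagsByPost = new HashMap<>();
// pre-Java 8: get the list, null-check, put a new one, then add
tagsByPost.computeIfAbsent("post-1", key -> new ArrayList<>()).add("java");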

JPA 2.1 (no verdict)

ORM is an interesting idea and I can claim around 10 years of experience with it, although the term itself is not always important. But I read books about it in my quest to understand it (many programmers don’t bother). The idea is kinda simple, but it has many tweaks – mainly when it comes to relationships. JPA 2.1 as an upgrade is good, I like where things are going, but I like the concept itself less and less over time.

My biggest gripe is the little control over “to-one” loading, which is difficult to make lazy (more like impossible without some nasty tricks) and can result in chain loading even if you are not interested in the related entity at all. I think there is a reason why things like JOOQ cropped up (although I personally don’t use it). There are some tricks to get rid of these problems, but they come at a cost. Typically – don’t map these to-one relationships, keep them as foreign key values. You can always fetch the related entity with a query.

That leads to the bottom line – be explicit, it pays off. Sure, it doesn’t work universally, but any time I leaned towards the explicit solution I felt a lot of relief from struggles I went through before.

I don’t rank JPA, because I try to rely on fewer and fewer ORM features. JPA is not a bad effort, but it is so Java EE-ish: it does not support modularity, and the providers are not easy to change anyway.

Querydsl (5/5)

And when you work with JPA queries a lot, get some help – I can only recommend Querydsl. I’ve been recommending this library for three years now – it never failed me, it never let me down, and it often amazed me. This is how the Criteria API should have looked.

It has a strong metamodel allowing you to do crazy things with it. We based a kinda universal filtering layer on it, whatever the query is. We even filter queries with joins, even on joined fields. But again – we can do that because our queries and their joins are not ad hoc, they are explicit. 🙂 Because you should know your queries, right?

Sure, Querydsl is not perfect, but it is as powerful as JPQL (or as limited, for that matter) and more expressive than the JPA Criteria API. Bugs are fixed quickly (personal experience), the developers care… what more to ask?
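
To give a flavor of that filtering layer – a sketch of how the metamodel lets you compose predicates dynamically, assuming Querydsl 3 and a Dog entity with a plain breedId attribute (as in my other JPA posts):

BooleanBuilder filter = new BooleanBuilder();
if (nameFilter != null) {
    filter.and(QDog.dog.name.containsIgnoreCase(nameFilter));
}
if (breedIds != null && !breedIds.isEmpty()) {
    filter.and(QDog.dog.breedId.in(breedIds));
}
List<Dog> dogs = new JPAQuery(em)
    .from(QDog.dog)
    .where(filter) // an empty BooleanBuilder simply means no WHERE
    .list(QDog.dog);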

Docker (5/5)

Docker stormed into our lives – for some practically, for others at least through the media. We don’t use it that much, because lately I’m bound to Microsoft Windows and SQL Server. But I experimented with it a couple of times for development support – we ran Jenkins in a container, for instance. And I’m watching it closely, because it rocks and will rock. Not sure what I’m talking about? Just watch the DockerCon 2015 keynote by Solomon Hykes and friends!

Sure – their new Docker Toolbox accidentally screwed up my Git installation, so I’d rather install Linux on VirtualBox and test Docker inside it without polluting my Windows even further. But these are just minor problems in this (r)evolutionary tidal wave. And one just must love the idea of immutable infrastructure – especially when demonstrated by someone like Jérôme Petazzoni (for the merit itself, not that he’s my idol beyond professional scope :-)).

Spring 4 and on (4/5)

I have been aware of Spring since the dawn of microcontainers – and Spring emerged victorious (sort of). A friend of mine once mentioned how much he was impressed by Rod Johnson’s presentation about Spring many years ago. How structured his talk and speech were – the story about how he disliked all those logs pouring out of your EE application server… and that’s how Spring was born (sort of).

However, my real exposure to Spring started in 2011 – but it was very intense. And again, I read more about it than most of my colleagues. And just like with JPA – the more I read, the less I know, or so it seems. Spring is big. Start some typical application and read the logs – and you can see the EE of the 2010s (sort of).

It’s not that I don’t like Spring, but I guess its authors (and how many there are now) simply can’t see anymore what a beast they have created over the years. Sure, there is Spring Boot, which reflects all the current trends – don’t deploy into a container, start the container from within, plus all its automagic features, monitoring, clever defaults and so on. But that’s it. You don’t do more, yet you’d better know about it. Or not? Recently I got to one of Uncle Bob’s newer articles – called Make the Magic go away. And there is undeniably much to it.

Spring developers do their best, but the truth is that many developers just adopt Spring because “it just works”, while they don’t know how – and very often it does not (sort of). You actually should know more about it – or at least some basics, for that matter – to be really effective. Of course, this magic problem is not only about Spring (or JPA), but these are the leaders of the “it simply works” movement.

But however you look at it, it’s still “enterprise” – and that means complexity. Sometimes essential, but mostly accidental. Well, that’s also part of the Java landscape.

Google Talk (RIP)

And this is this post’s biggest letdown. Google stopped supporting their beautifully simple chat client without any reasonable replacement. The Chrome application just doesn’t seem right to me – and it genuinely annoys me with its chat icon that hangs on the desktop, sometimes over my focused application; I can’t relocate it easily… simply put, it does not behave like a normal application. That means it behaves badly.

I switched to Pidgin, but there are issues. Pidgin sometimes misses a message in the middle of a conversation – that was the biggest surprise. I double-checked: when someone reportedly asked me something again, I went to my Gmail account and really saw the message in the Chat archive, but not in my client. And if I get messages while offline, nothing notifies me.

I activated the chat in my Gmail after all (against my wishes, though), merely to be able to see any missing messages. But sadly, the situation with Google Talk/chat (or Hangouts, I don’t care) is dire when you expect a normal desktop client. 😦

My Windows toolset

Well – now away from Java, let’s hop onto my typical developer’s Windows desktop. I have mentioned some of my favourite tools before, some of them a couple of times I guess. So let’s do it quickly – bullet style:

  • Just after some “real browser” (my first download on a fresh Windows) I actually download Rapid Environment Editor. Setting Windows environment variables suddenly feels normal again.
  • Git for Windows – even if I didn’t use git itself, just for its bash – it’s worth it…
  • …but I still complement the bash with GnuWin32 packages for whatever is missing…
  • …and run it in better console emulator, recently it’s ConEmu.
  • Notepad2 binary.
  • And the rest like putty, WinSCP, …
  • Also, on Windows 8 and 10 I can’t imagine living without Classic Shell. Windows 10 is a bit better, but its Start menu is simply unusable for me – the classic Start menu was so much faster with the keyboard!

As a developer I also sport some other languages and tools, mostly JVM-based:

  • Ant, Maven, Gradle… obviously.
  • Groovy, of course, probably the most popular alternative JVM language. Not to mention that groovysh is a good REPL until Java 9 arrives (recently delayed beyond 2016).
  • VirtualBox, recently joined by Vagrant and hopefully also something like Chef/Puppet/Ansible. And this leads us to my plans.

Things I want to try

I was always a friend of automation. I’ve been using Windows for many years now, but my preference for UNIX tools is obvious. Try to download and spin up a virtual machine for Windows and for Linux and you’ll see the difference. Linux just works, and tools like Vagrant know where to download images, etc.

With Windows, people are not even sure how (or whether) they can publish prepared images (talking about development only, of course), because nobody can really understand the licenses. Microsoft started to offer prepared Windows virtual machines – primarily for web development though, no server-class OS (not that I appreciate Windows Server anyway). They even offer Vagrant boxes, but try to download one and run it as is. For me, Vagrant refused to connect to the started VirtualBox machine, any reasonable instructions are missing (nothing specific to Vagrant is in the linked instructions), no Vagrantfile is provided… honestly, quite a lame effort at making my life easier. I still appreciate the virtual machines, though.

But then there are those expiration periods… I just can’t imagine preferring any Microsoft product/platform for development (let alone production, obviously). The whole culture of automation on Windows is just completely different – read anything from “nonexistent for many” through “very difficult” to “made artificially restricted”. No wonder many Linux people can script and too few Windows guys can. Licensing terms are to blame as well. And virtual machine sizes for Windows are also ridiculous – although Microsoft is reportedly trying to do something in this field and offer a reasonably small base image for containerization.

Anyway, back to the topic. Automation is what I want to try to improve. I’m still doing it anyway, but recently the progress has not been as good as I wished it to be. I fell behind with Gradle, I didn’t use Docker as much as I’d like to, etc. Well – but life is not only work, is it? 😉

Conclusion

The good thing is there are many tools available for Windows that make a developer’s (and former Linux user’s) life so much easier. And if you look at Java and its whole ecosystem, it seems to be alive and kicking – so everything seems good on this front as well.

Maybe you ask: “What does 5/5 mean anyway?” Is it perfect? Well, probably not, but at least it means I’m satisfied – happy even! Without happiness it’s not 5, right?

JPA – modularity denied

Here we are for another story about JPA (Java Persistence API) and its problems. But before we start splitting our persistence unit into multiple JARs, let’s see – or rather hear – how much it is denied. I can’t even think about this word normally anymore after hearing that sound. 🙂

Second thing to know before we go on – we will use Spring. If your application is pure Java EE, this post will not work “out of the box” for you. It can still help you summarize the problem and show some options.

Everybody knows layers

I decided to modularize our application because, since reading the Java Application Architecture book, I couldn’t sleep that well with the typical mega-JARs containing all the classes at a particular architecture level. Yeah, we modularize, but only by layers. That is, we typically have one JAR with all entity classes; some call it “domain”, some just “jpa-entities”, whatever… (not that I promote ad-lib names).

In our case we have @Service classes in a different JAR (but yeah, all of them in one), typically using a DAO for a group of entities. The complete stack actually goes from a REST resource class (again in a separate JAR, using Jersey or Spring MVC), which uses a @Service, which in turn talks to DAOs (marked as @Repository, although they are not repositories in the pure sense). Complex logic is pushed to specific @Components somewhere under the service layer. It’s not DDD, but at least the dependencies flow nicely from the top down.

Components based on features

But how about dependencies between parts of the system at the same level? Our system has a lot of entity classes: some are pure business (Clients, their Transactions, financial Instruments), some are rather infrastructural (meta model, localization, audit trail). Why can’t these be separated? Most of the stuff depends on the meta model, localization is quite independent in our case, the audit trail needs the meta model and the permission module (containing Users)… It all clicks when one thinks about it – and it is more or less in line with modularity based on features, not on technology or layers. Sure, we can still use layer separation and have permission-persistence and permission-service as well.

Actually, this is quite a recurring question: should we base packages on features or on layer/technology/pattern? From the sources I’ve read (though I might have read what I wanted to :-)) it seems a consensus was reached – start by feature, which can be part of your business domain. If stuff gets big, you can split it into layers too.

Multiple JARs with JPA entities

So I carefully try to put different entity classes into different JAR modules. Sure, I could just repackage them in the same JAR and check how tangled they are with Sonar, but it is recommended to enforce the separation and make dependencies explicit (not only in the Java Application Architecture book). My experience is not as rich as that of the experts writing books, but it is much richer than that of people not reading any books at all – and, gosh, how many of them there are (both books and people not reading them)! And this experience confirms it quite clearly – things must be enforced.

And here comes the problem when your persistence is based on JPA. Because JPA clearly wasn’t designed to have a single persistence unit spread across multiple persistence.xml files. So what are these problems, actually?

  1. How to distribute persistence.xml across these JARs? Do we even have to?
  2. What classes need to be mentioned where? E.g., we need to mention classes from upstream JARs in persistence.xml if they are used in relations (which breaks the DRY principle).
  3. When we have multiple persistence.xml files, how to merge them in our persistence unit configuration?
  4. What about the configuration in the persistence XML? Which properties are used from which file? (Little spoiler: you just don’t know reliably!) Where to put them so you don’t have to repeat yourself again?
  5. We use EclipseLink – how to use static weaving for all these modules? How about a module with only abstract mapped superclass (some dao-common module)?

That’s quite a lot of problems for an age when modularity is mentioned so often. And for a technology from the “enterprise” stack. And they are mostly phrased as questions – because the answers are not readily available.

Distributing persistence.xml – do I need it?

This one is difficult and may depend on the provider you use and the way you use it. We use EclipseLink and its static weaving. This requires persistence.xml. Sure, we may try to keep it together in some neutral module (or any path, as it can be configured for the weaving plugin), but that kinda goes against the modularity quest. What options do we have?

  • I can create a union persistence.xml in a module that depends on all the needed JARs. This would be OK if I had just one such module – typically some downstream module like a WAR or a runnable JAR. But we have many. If I made a persistence.xml for each, they would contain a lot of repetition. And I’d be referencing a downstream resource, which is ugly!
  • We can have a dummy upstream module, or an out-of-module path, with the union persistence.xml. This would keep things simple, but it would make it more difficult to develop the modules independently, maybe even with different teams.
  • Keep a persistence.xml in each JAR with the related classes. This seems best from the modularity point of view, but it means we need to merge multiple persistence.xml files when the persistence unit starts.
  • Or can we have a different persistence unit for each persistence.xml? This is OK if they truly are different databases (different JDBC URLs); otherwise it doesn’t make sense. In our case we have a rich DB and any module can see the part it is interested in – that is, entities from the JARs it has on the classpath. If you have data in different databases already, you’re probably sporting microservices anyway. 🙂

I went for the third option – especially because EclipseLink’s weaving plugin likes it and I didn’t want to redirect to a non-standard path to persistence.xml – but it also seems to be the right logical way. However, there is nothing like a dependency between persistence.xml files. So if you have b.jar that uses a.jar, and there is an entity class B in b.jar with a @ManyToOne to the A entity from a.jar, you have to mention the A class in the persistence.xml in b.jar. Yes, the class is already listed in a.jar, of course. Here, clearly, the engineers of JPA didn’t even think about the possibility of using multiple JARs in a really modular way.
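
A sketch of what the persistence.xml in b.jar might look like (the package names are illustrative):

<persistence-unit name="you-choose">
    <class>com.example.b.B</class>
    <!-- A lives in a.jar, but must be repeated here because B references it -->
    <class>com.example.a.A</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
</persistence-unit>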

In any case – this works, compiles, weaves your classes during build – and more or less answers questions 1 and 2 from our problem list. And now…

It doesn’t start anyway

When you have a single persistence.xml, it gets found as a unique resource, typically in META-INF/persistence.xml – in any JAR, actually. But when you have more of them, they don’t all get picked up and merged magically – and the application fails during startup. We need to merge all those persistence.xml files during the initialization of the persistence unit. Now we’re tackling questions 3 and 4 at once, for they are linked.

To merge all the configuration XMLs into one unit, you can use this configuration for the PersistenceUnitManager in Spring (clearly, using MergingPersistenceUnitManager is the key):

@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
    MergingPersistenceUnitManager persistenceUnitManager =
        new MergingPersistenceUnitManager();
    persistenceUnitManager.setDefaultDataSource(dataSource);
    persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
    // default persistence.xml location is OK, goes through all classpath*
    return persistenceUnitManager;
}

But before I unveil the whole configuration, we should talk about the configuration that was in the original single persistence.xml – which looked something like this:

<exclude-unlisted-classes>true</exclude-unlisted-classes>
<shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
<properties>
    <property name="eclipselink.weaving" value="static"/>
    <property name="eclipselink.allow-zero-id" value="true"/>
    <!-- Without this there were other corner cases when a field change was
         ignored. This can be worked around by calling the setter, but that sucks. -->
    <property name="eclipselink.weaving.changetracking" value="false"/>
</properties>

The biggest question here is: what is used during the build (e.g. by static weaving) and what can be put into runtime configuration somewhere else? Why somewhere else? Because we don’t want to repeat these properties in all the XMLs.

But before finishing the programmatic configuration we should take a little detour to shared-cache-mode, which showed the problem with merging persistence.xml files in the most bizarre way.

Shared cache mode

Firstly, if you’re serious about JPA, I cannot recommend enough one excellent and comprehensive book that answered tons of my questions – often before I even asked them. I’m talking about Pro JPA 2, of course. Like, seriously, go and read it unless you are super solid in JPA already.

We wanted to enable cached entities selectively (to ensure that @Cacheable annotations have any effect). But I made a big mistake when I created another persistence.xml file – I forgot to mention shared-cache-mode there. My persistence unit picked up both XMLs (using MergingPersistenceUnitManager), but my caching went completely nuts. It cached more than expected and I was totally confused. The trouble is – persistence.xml files don’t really get merged. The lists of classes in them do, but the configurations do not. Somehow my second persistence XML became dominant (one always does!), and because there was no shared-cache-mode specified, it used the default – which is whatever EclipseLink thinks is best. No blame there, just another manifestation that the JPA people didn’t even think about these setup scenarios.

It’s actually the other way around – you can have multiple persistence units in one XML, that’s a piece of cake.

If you really want hard evidence of how things are in your setup, put a breakpoint somewhere you can reach your EntityManagerFactory, and when it stops there, dig deeper to find what your cache mode is. Or anything else – you can check the list of known entity classes, JPA properties… anything really. And it’s much faster than messing around guessing.
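
You can also do a poor man’s version of this in code. EntityManagerFactory.getProperties() is standard JPA and shows at least the effective provider properties – though for the shared cache mode itself the debugger dig described above may still be needed:

// dump the configuration the provider actually ended up with
entityManagerFactory.getProperties()
    .forEach((key, value) -> System.out.println(key + " = " + value));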

jpa-emf-debugged

In the picture above you can see that now I can be sure which shared cache mode I use. You can also see which XML file was used – in our case it was the one from the meta-model module (JAR), so that one would dominate. Luckily, I don’t rely on this anymore, not for runtime configuration at least…

Putting the Spring configuration together

Now we’re ready to wrap up our configuration and move some stuff from persistence.xml into the Spring configuration – in my case Java-based configuration (XML works too, of course).

Most of our properties were related to EclipseLink. I read their weaving manual, but I still didn’t understand what works when and how. I had to debug some of the stuff to be really sure.

It seems that eclipselink.weaving is the crucial property namespace that should stay in your persistence.xml, because it gets used by the plugin performing the static weaving. I debugged the Maven build, and the plugin definitely uses the eclipselink.weaving.changetracking property value (we set it to false, which is not the default). Funny enough, it doesn’t need eclipselink.weaving itself, because running the plugin implies you wish for static weaving. During startup it does get picked up, though, so EclipseLink knows it can treat the classes as statically woven – which means it can be pushed into the programmatic configuration too.

The rest of the properties (and shared cache mode) are clearly used at the startup time. Spring configuration may then look like this:

@Bean public DataSource dataSource(...) { /* as usual */ }

@Bean
public JpaVendorAdapter jpaVendorAdapter() {
    EclipseLinkJpaVendorAdapter jpaVendorAdapter = new EclipseLinkJpaVendorAdapter();
    jpaVendorAdapter.setShowSql(true);
    jpaVendorAdapter.setDatabase(
        Database.valueOf(env.getProperty("jpa.dbPlatform", "SQL_SERVER")));
    return jpaVendorAdapter;
}

@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
    MergingPersistenceUnitManager persistenceUnitManager =
        new MergingPersistenceUnitManager();
    persistenceUnitManager.setDefaultDataSource(dataSource);
    persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
    persistenceUnitManager.setSharedCacheMode(
        SharedCacheMode.ENABLE_SELECTIVE);
    // default persistence.xml location is OK, goes through all classpath*

    return persistenceUnitManager;
}

@Bean
public FactoryBean<EntityManagerFactory> entityManagerFactory(
    PersistenceUnitManager persistenceUnitManager, JpaVendorAdapter jpaVendorAdapter)
{
    LocalContainerEntityManagerFactoryBean emfFactoryBean =
        new LocalContainerEntityManagerFactoryBean();
    emfFactoryBean.setJpaVendorAdapter(jpaVendorAdapter);
    emfFactoryBean.setPersistenceUnitManager(persistenceUnitManager);

    Properties jpaProperties = new Properties();
    jpaProperties.setProperty("eclipselink.weaving", "static");
    jpaProperties.setProperty("eclipselink.allow-zero-id",
        env.getProperty("eclipselink.allow-zero-id", "true"));
    jpaProperties.setProperty("eclipselink.logging.parameters",
        env.getProperty("eclipselink.logging.parameters", "true"));
    emfFactoryBean.setJpaProperties(jpaProperties);
    return emfFactoryBean;
}

Clearly, we can set the database platform, the shared cache mode and all runtime-relevant properties programmatically – and we can do it just once. This is not a problem for a single persistence.xml, but in any case it offers better control. You can now use Spring’s @Autowired private Environment env; and override whatever you want with property files or even -D JVM arguments – and still fall back to default values – just as we do for the database property of the JpaVendorAdapter. Or you can use SpEL. This is flexibility persistence.xml simply cannot provide.

And of course, all the things mentioned in the configuration can now be removed from all your persistence.xml files.

I’d love to get rid of eclipselink.weaving.changetracking in the XML too, but I don’t see any way to provide it as a configuration option of the Maven plugin, which we have neatly unified in our parent POM. That would also eliminate some repetition.

Common DAO classes in separate JAR

This one (question 5 from our list) is no problem after all the previous ones, just a nuisance. EclipseLink refuses to weave your base class regardless of @MappedSuperclass usage. But as mentioned in one SO question/answer, just add a dummy concrete @Entity class and you’re done. You never use it; it is no problem at all. And you can vote for this bug.
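
The workaround can be as dumb as this (a sketch – the class name is arbitrary and the entity is never used):

@Entity
public class DummyWeavingEntity {
    @Id
    private Long id; // never persisted; exists only so the weaver processes this module
}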

This is probably not a problem for load-time weaving (I haven’t tried it), or for Hibernate. I never had to solve any weaving problem with Hibernate, but on the other hand the current project pushed my JPA limits further, so maybe I would learn something about Hibernate too (if it was willing to work for us in the first place).

Any Querydsl problems?

Ah, I forgot to mention my favourite over-JPA solution! Were there any Querydsl-related problems? Well, not really. The only hiccup I got was a NullPointerException when I moved some entity base classes and my subclasses were not compilable. Before javac could print a reasonable error, Querydsl went in and gave up without a good diagnostic on this most unexpected case. 🙂 I filed an issue for this, but after I fixed my import statements for the superclasses, everything was OK again.

Conclusion

Let’s do it in bullets, shall we?

  • JPA clearly wasn’t designed with modularity in mind – especially not when modules form a single persistence unit, which is a perfectly legitimate use case.
  • It is possible to distribute persistence classes into multiple JARs, and then:
    • You can go either with a single union persistence.xml, which can be downstream or upstream – it depends on whether you need it only at runtime or during the build too.
    • I believe it is more proper to pack a partial persistence.xml into each JAR, especially if you need it during the build. Unfortunately, there is no escape from repeating some upstream classes in the module again, just because they are referenced in relations (a typical culprit of “I don’t understand how this is not an entity, when it clearly is!”).
  • If you have multiple persistence.xml files, it is possible to merge them using Spring’s MergingPersistenceUnitManager. I don’t know if you can use it in non-Spring applications, but I saw this idea reimplemented and it wasn’t that hard. (If I had to reimplement it, I’d try to merge the configuration part too!)
  • When you’re merging persistence.xml files, it is recommended to minimize the configuration in them so it doesn’t have to be repeated. E.g., for EclipseLink we leave in only the stuff necessary for build-time static weaving; the rest is set programmatically in our Spring @Configuration class.

There are still some open questions, but I think they lead nowhere. Can I use multiple persistence units with a single data source? This way I could have each persistence.xml as a separate unit. But I doubt relationships would work across them, and the same goes for transactions (without XA, that is). If you think multiple units are a relevant solution, let me know, please.

I hope this helps if you’re struggling with the noble quest of modularity. Don’t be shy to share your experiences in the comments too! No registration required. 😉

EclipseLink joins without mapped attributes

EDIT April 2016: Originally I believed I had described a pure JPA 2.1 solution; much later I found it is merely an EclipseLink extension to JPQL that is not supported by the standard.

OK, whoever knows it is possible, just stop reading right away (before my JPA rant starts). Note that this post builds on the JPA 2.1 syntax for ON and the EclipseLink extension that allows joining root entities (the JPA specification allows only association paths).

JPA is hard

Java Persistence API is not easy – let’s face it. Please. Because it is easy only for two kinds of programmers – those who know everything (again, congratulations) and those who think they know everything. Or something similar. While I’m all for minimizing complexity to make things easy, it is a known fact that making things simple (that is, non-complex) is hard. Every time I hear “no problem, that’s easy” I know what I’ll get: a complex ball of mud that does not correspond to “easy” at all.

I’ve read parts of the Hibernate reference, I’ve read Pro JPA 2/2.1 – and I’d love to finally read it all, but there is always something else to read too. And so while I was aching for a solution to get rid of some of my @ManyToOne/@OneToOne mappings in entities (because these are not guaranteed to be lazy and can trigger too many selects), I thought I couldn’t, because I’d be unable to join in the direction of that association. I was wrong, but first…

Can’t you avoid triggered selects for *ToOne somehow?

Yes, you can. You can eagerly fetch that association with a JOIN; some smart ORMs do that for you to some depth. Let’s talk about this example:

A Dog points to a Breed. How can I populate a table on the user interface with selected attributes from the dog and its breed as well?

  • The worst case (and still traditionally used!) – select Dogs and don’t care. This will likely result in the N+1 select problem (the official name does not reflect cause and effect – it is rather 1+N). This plain sucks. I really don’t know why some mainstream ORM solutions still execute this as N+1 even when the fetch is eager (*ToOne is by default), but that just underlines… well, the state of ORM, obviously.
  • Fetch the relationship! Of course. It is bound to happen anyway, so let’s deal with it in one select. Does it mean we’re done in a single select? If you have more *ToOne relations on your Dog – or Breed – then you’re not. Any fetch may trigger additional selects for further to-one relations.
  • Fetch and select the columns explicitly. This works, but you have to deal with Lists of arrays of Objects, or – in the better case – Querydsl’s handy Tuple. Or you may use a projection to a DTO. When you need to do this with 60 columns, you’ll know the pain.

But yes, it is possible. The question is whether it’s worth using JPA if 80% of the time you work around these problems. Depending on relationship optionality you need to use LEFT JOIN, of course. “Use LEFT JOIN, Luke!” is my answer when another developer asks me questions like “when I want to display fields from breed, some rows just disappear from my results!” or – a similar one with a bit of fun in it – “the count select returns a different count than the size of the final result!” – because you don’t need to join for the count unless the WHERE part works with the joined columns.

Summed up: it’s hard. Gone is the premise that you’ll enjoy working with some mapped objects. No, no – deal with Lists of Object[] or DTOs! It’s hard to say what is better – a DTO in theory, but if the list of columns changes a lot, it’s just another place you’ll have to change. If you still use vanilla JPA, consider Querydsl, seriously. Tuple may be a lightweight DTO for you, and on the other side you can get the stuff out using expression paths that are compile-time safe – without the need to mess with accessors. (Old enough to remember this article? :-))

Dropping @ManyToOne annotation

(The same applies for @OneToOne where relevant.)

To write a left join from Dog to Breed, you will find virtually exclusively examples like this (Querydsl, but the same goes for JPQL):

QBreed breed = new QBreed("breed"); // alias
List<Tuple> result = new JPAQuery(em).from(QDog.dog)
  .leftJoin(QDog.dog.breed, breed)
  .list(QDog.dog.name, breed.name);

That is, in the left join you first state the path to the breed from the dog and then you assign an alias to it (breed). Sometimes you can live without aliases, but in general you need them for any deeper joins. Notice also how the leftJoin implies the ON clause. This is logical and expected – that’s why we have the mapping there.

But the thing I never realized (and decided to try for the first time today) is that you can just leftJoin to an alias and add your ON explicitly – just like you would in SQL! (This is not standard JPA, but an EclipseLink JPQL extension. Querydsl has no problem generating it for you.)

QBreed breed = new QBreed("breed"); // alias
List<Tuple> result = new JPAQuery(em).from(QDog.dog)
  .leftJoin(breed).on(breed.id.eq(QDog.dog.breedId))
  .list(QDog.dog.name, breed.name);

Obviously you have to have a breedId attribute mapped on Dog – but that is probably some plain type like Integer. This will not trigger any additional select. If you really want, you can have a dual mapping for the same column, like this:

@Column(name = "breed_id", insertable = false, updatable = false)
private Integer breedId;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "breed_id")
private Breed breed;

Notice the insertable/updatable = false – this has to be present on one of the two mappings. The question is on which one. If you save/update the entity a lot and the breed’s ID is good enough, then move it to the @JoinColumn mapping. But if you don’t need Breed in most cases at all, remove the to-one mapping completely. Also notice how the code is lying to you about the LAZY fetch. It is not lazy. In theory it can be, with some bytecode-modified Breed (Hibernate does it out of the box), but that is not portable anyway. If you absolutely need that attribute on the Dog class, you can still make it @Transient, fetch it and list it explicitly (here shown with singleResult for brevity):

QBreed breed = new QBreed("breed"); // alias
Tuple result = new JPAQuery(em).from(QDog.dog)
  .leftJoin(breed).on(breed.id.eq(QDog.dog.breedId))
  .singleResult(QDog.dog, breed);
Dog dog = result.get(QDog.dog);
dog.setBreed(result.get(breed)); // sets transient field

No problem at all. Full control. A couple of lines longer. Probably way more performant at scale.

What if JPA allowed us to ignore relations on demand?

If you use only the ID value that exactly mirrors the FK in your entities, then you’ll lose some of the ORM benefits – but as said, you’ll also dump a lot of its burden, which in many cases goes unconsidered. Imagine you could specify mappings like this:

@Column(name = "breed_id")
@ManyToOne(targetEntity = Breed.class) // exists, but invalid when types don’t match
private Integer breedId;

This way you’d say what the target entity is, generated metamodels could offer you joins, etc. I’m not going to evolve this idea at all, because there are some obvious disadvantages. Your JOIN path would be dog.breedId, not breed, so if you followed it to Breed’s attributes it would read confusingly (assuming that, when used as a path, it would act as a Breed, of course). Then maybe this:

@ManyToOne(fetch = FetchType.JOIN_ONLY) // does not exist
@JoinColumn(name = "breed_id")
private Breed breed;

This would mean that unless you require the JOIN explicitly in your select, this field will NOT be populated. Actually, this would be best combined with the dual mapping shown earlier. The ID is easy to get without a JOIN – and the referenced entity would be insertable/updatable = false. Otherwise the semantics of save would be questionable. Does null mean that the relation is removed and the FK should be nulled? Or does it mean to ignore it? If JPA should ignore it, you need that explicit breedId field. If it is not ignored, even a trivial find/change-single-attribute/save (or letting it happen automatically at transaction commit) would delete your relation unexpectedly. You can add some dirty-checking magic of course (“if the attribute changes then do this, otherwise don’t do anything”), but we’d just add to the existing mess. So this definitely breaks the principle of least surprise. But since dual mapping already makes that possible, I’d love to have this truly lazy fetch type.

In case you wonder why LAZY is not guaranteed (without bytecode-modification magic) – the answer is that the breed attribute should be null only if the FK is NULL in the database. So if the FK is there, you need to instantiate some Breed, and if that is supposed to be lazy, it has to be some kind of proxy to the real one that is loaded when you actually touch (get something from) the breed object. This is very easy with collections (*ToMany) and ORMs can do that (although EclipseLink’s Vector is a spectacularly wrong way to do it) – that’s why I’m so obsessed with the *ToOne problem.

There is also another alternative solution – we could have Breed instantiated as a wrapper for the FK value. That is, dog.breed.id would not be null, but the other attributes of dog.breed would not be initialized. The programmer would have to know – after all, he requested it this way. That would allow the mapping to stay object-based, while giving us more low-level control over the fetches.

Is it still ORM?

With the dual mapping or the FK wrapper it would still be ORM as we know it. You’d have bigger control, a single em.find would not trigger 47 selects (real story, seriously – all of them traversing *ToOne relationships) and there would be minimal implications for JPA (although there are people more competent to say so or otherwise).

Currently I plan to drop quite a lot of *ToOne mappings because:

  • There is no portable way to avoid the fetch when I know I don’t want it.
  • When I want it, I can join it myself anyway.
  • If I want to pick a specific relation I can just find it by the ID that is available on the entity (and then store it in a @Transient field, for example).

Am I still using it as an ORM then? I’m dropping a lot of “mappings” of my “relations(hips)” from my “objects”. But I can still utilize a lot of JPA (and Querydsl). There still is the persistence context, you can use the L2 cache (if you want), and maybe you don’t have an uber-cool mapping to your domain… but let’s face it: how often do we really map it? How often do we just mirror our databases?

Do you map @ManyToMany, or do you map the association table with two IDs instead? With explicit mapping you can get the list of IDs of related objects by the ID of the owning object. In SQL it is a single-table query. EclipseLink joins Master, MasterDetail and Detail to get information that is clearly available in the MasterDetail table alone (when not mapped explicitly). Is it so difficult to treat IDs properly? It probably is. How does ORM lower the complexity here?

Execution postponed

JPA adds tons of accidental complexity every time. Not at first, but later. And virtually no team working with it has a real JPA expert. I study JPA more than most people I know and it still keeps surprising me – rarely in a good way. A left join using an unrelated alias (at least from the mapping point of view) and an explicit ON is the good kind. I don’t even know why I never tried it when I officially longed for it! I ranted: “Why don’t we have free joins on any columns (PK/FK), just like SQL does?” And nobody ever contradicted me. Most examples always start with a path going from the FROM entity, using an alias only as the second argument (Querydsl, but JPQL examples are the same). We are slaves of our *ToOne relations and can’t get rid of undesired selects – yet there was a simple way to do it all along. A partial reason is that ON for JOIN requires JPA 2.1 and wasn’t available before.

If you don’t care about the *ToOne problem – no problem. To be fair, if you have lazy loading set up properly, you may be happy with it. If you do want to get rid of to-one relations, the treatment is:

  • Replace your *ToOne mapping with simple ID as a value attribute.
  • Change anyJoin(x.y, alias) to anyJoin(alias).on(x.yId.eq(alias.id)) – this is a Querydsl example, but it transforms to JPQL directly. List both entities in the projection if you need them (maybe you just WHERE on one).
  • If you really want the +1 select, do it with em.find yourself. That may actually use the L2 cache (if the entity is not in the persistence context already).

With this tiny speck of knowledge I’m backing off from my plans to get rid of ORM in our project (you can imagine what pain that would be). Now it gives me just about enough control to contain the select explosions to the required levels.

EDIT 2016: Just to repeat it – this is based on EclipseLink’s support for JOIN on root entities. Maybe it will get into JPA eventually (I’d love that), but as of 2.1 it is not there. The ON condition is there already, but without root-entity JOIN it is not enough.


JPA – is it worth it? Horror stories with EclipseLink and Hibernate

A friend of mine brought Hibernate to our dev team back in 2004 or so. But now he uses something much simpler and avoids ORM/JPA whenever possible. He can, because he is mostly the master of his projects.

I had to get more familiar with JPA on my path. There is always more and more to learn about it. When you discover something like orphanRemoval = true on @OneToMany, it may bring you to the brink of crying. Out of happiness, of course. (Stockholm syndrome, probably.) But then there are other days when you just suffer. Days when you find bugs in JPA implementations – or something close to them. And you can’t really choose from many providers, can you? There are just two mainstream players.

Right now we are using EclipseLink, and there are two bugs (or missing features, or whatever) that kinda provoked our big effort to switch to Hibernate. What were our problems and what was the result of our switch? Read on…

EclipseLink and Java 8

You can avoid this problem, although you have to be extra careful. EclipseLink’s lazy list – called IndirectList – cannot be used with streams. Actually, if it didn’t compile, it would be good. Worse: it compiles and works, but very badly.

The guys creating Java do their best to make streams work effortlessly – there are those default methods in interfaces, etc. But no, no luck here. IndirectList returns an empty stream. Why? Because of this bug. (It works fine for IndirectSet.)

What is the cause of this problem? Well… someone decided to extend Vector, but instead of sticking with it, they also decided that the extended Vector would not be the actual backing collection – they added their own Vector delegate too. This works fine for interfaces like List, but not so for extended classes. Now add the “ingenious” design of Vector – full of protected fields you have to take care of… here IndirectList clearly collided with Vector badly.

Why do you need to take care of the protected fields? Because creating a stream on a Vector uses its special java.util.Vector.VectorSpliterator, which reads these protected values. So if you delegate to another Vector, you either override java.util.Vector#spliterator too (but then it won’t compile with earlier Java), or – preferably – don’t delegate to another Vector and use the extended one as your backing collection. Of course, the guys at Java might have used size() and get(int) instead of accessing Vector’s protected elementCount, elementData, etc. – but that would not be as efficient.
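
Until the bug is fixed, a defensive workaround is to copy the lazy list into a plain collection before streaming – a sketch with illustrative entity names:

List<Toy> toys = new ArrayList<>(dog.getToys()); // materializes the IndirectList
long named = toys.stream()
    .filter(toy -> toy.getName() != null)
    .count(); // safe – we stream a plain ArrayList, not the IndirectList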

BTW: does anybody here like Vector? I personally hate Java’s wannabe compatibility that really hurts its progress, as we need to drag more and more burden along with us.

JPA and counts on joins with distinct

The other trouble was related to a specific select where we used a join with distinct instead of a subquery with exists. We have an entity called Client that can be bound to any number of Domains; a @ManyToMany mapping is used on the Client side. Domain does not know about Client – hence the arrow.

jpa-client-domains

We often need to obtain Clients where any of their domains is in a provided list (actual input parameter contains IDs of those domains). Using Querydsl, it looks like this:

QDomain cd = new QDomain("cd");
JPAQuery query = new JPAQuery(em).from(QClient.client)
    .join(QClient.client.domains, cd)
        .on(cd.id.in(domainIds))
    .distinct();

I’m sure you can easily transform it mentally to Criteria or JPQL – Querydsl produces JPQL, BTW. Distinct here has the special meaning used by the JPA specification. Now you can call query.list(QClient.client) – it produces a nice join and returns only distinct Clients. You try query.count(), and it works as expected as well.

But imagine that Client has a composite primary key. We actually have exactly such a case: an entity bound to domains again, everything looks the same – except that the entity has a composite PK. List works all right, but if you try query.count() you’ll get a completely different, illogical query using exists – and the join to Domains is lost completely. The results are wrong, definitely not the size of the list result.

There are some bugs filed for this – for instance this one. Because there was a discussion about this behavior, we decided to find out how Hibernate treats it.

Spoiler:

The trouble is that it is not a bug after all – at least not according to the JPA 2.1 specification, which reads in section 4.8.5: The use of DISTINCT with COUNT is not supported for arguments of embeddable types or map entry types.

I found this today when I was gathering proof for this post. Our story, however, would unfold in a different direction. Are we using JPA? Yes. Are there other options? Yes!

Spoiler 2:

We used a subselect with exists in the end, but that’s not the point. 😉
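
For completeness, a sketch of the shape we ended up with – assuming Querydsl 3 (JPASubQuery, as used later in this post). Count and list are now consistent, because no join/distinct is involved:

QDomain cd = new QDomain("cd");
JPAQuery query = new JPAQuery(em).from(QClient.client)
    .where(new JPASubQuery().from(cd)
        .where(cd.in(QClient.client.domains) // membership instead of a join
            .and(cd.id.in(domainIds)))
        .exists());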

So we switch to Hibernate, right?

We got kinda carried away by some bug reports resolved as FIXED – like this one, or the ones you can see in its links. But switching a JPA provider isn’t as easy as promised. In the end we found out that count distinct for entities with composite PKs doesn’t work with SQL Server anyway. But we learned a lot of interesting things about how far apart both implementations are when it comes to translating JPQL to SQL. Mind you, I’m not actually sure whether everything we wrote in Querydsl (which generates JPQL) is 100% correct – but if it isn’t, the provider should say something. When it works, I expect it to keep working after changing the JPA implementation.

Hibernate screwing deletes with “joins”

We have a lot of delete clauses that worked just fine in EclipseLink:

new JPADeleteClause(em, QGuiRolePermission.guiRolePermission)
    .where(QGuiRolePermission.guiRolePermission.guiRole.description.eq(TEST_DESCRIPTION))

This essentially means “delete all role permission assignments for the role with the specified description”. Piece of cake, right? We’re using an implicit join there again, but it is the same story all over again – and it worked in EclipseLink just fine. EclipseLink creates proper SQL with exists:

DELETE FROM GUI_RolePermissions WHERE EXISTS(
  SELECT t1.gui_role_id FROM GUI_Roles t0, GUI_RolePermissions t1
    WHERE ((t0.description = ?) AND (t0.id = t1.gui_role_id)) AND t1.gui_role_id = GUI_RolePermissions.gui_role_id
      AND t1.domain_id = GUI_RolePermissions.domain_id AND t1.permission_id = GUI_RolePermissions.permission_id)

It is not perfect – we get back to it in the section Random Query Generator – but it works. Let’s just compare it to Hibernate now. This is the JPQL (actually the same for both providers, as it’s produced by Querydsl):

delete from GuiRolePermission guiRolePermission
  where guiRolePermission.guiRole.description = ?1

This does not seem alarming – but the SQL is completely off:

delete from GUI_RolePermissions cross join GUI_Roles guirole1_ where description=?

This does not work on our currently used SQL Server, whatever dialect we choose. Why not go with exists? We have a helper method that takes an entity (a Querydsl base path) and its where condition and performs the delete. Now instead of:

deleteTable(QGuiRolePermission.guiRolePermission,
    QGuiRolePermission.guiRolePermission.guiRole.description.eq(TEST_DESCRIPTION));

We have to write this:

deleteTable(QGuiRolePermission.guiRolePermission,
    new JPASubQuery().from(QGuiRolePermission.guiRolePermission)
    .where(QGuiRolePermission.guiRolePermission.guiRole.description
        .eq(DataPreparator.TEST_DESCRIPTION))
    .exists());

Not terrible… but why? EDIT 2016-01-07: There is an issue filed for this, so far not fixed.

Another Hibernate twist

Things may get complicated when more relationships are involved. Take this simple JPQL for instance:

delete from Security security
  where security.issuer.priority = ?1

We want to remove all Securities whose issuer (a Client) has a specific priority value.


There is another implicit join there, but one subquery with exists should cover it. The Security class contains a @ManyToMany relationship to the Domain class through the intermediate table Securities_Domains. That’s why we need two deletes here – and this is what EclipseLink generates (issuer is of class Client):

DELETE FROM Securities_Domains WHERE EXISTS(
  SELECT t1.id FROM "Clients" t0, Securities t1
    WHERE ((t0."priority" = ?) AND (t0."id" = t1."issuer_id"))
      AND t1.id = Securities_Domains.security_id);
DELETE FROM Securities WHERE EXISTS(
  SELECT t1.id FROM "Clients" t0, Securities t1
    WHERE ((t0."priority" = ?) AND (t0."id" = t1."issuer_id"))
      AND t1.id = Securities.id);

It works just fine. But Hibernate really flexes its muscles here!

delete from Securities_Domains where (security_id) in (
   select id from Securities where priority=?)

Right away, the first delete is obviously missing the join to the Client entity in that subselect – and it fails spectacularly. We’re actually lucky we don’t have any other column called priority on Securities as well. 🙂 That could have hidden the error for ages.

With or without id?

Love that song… I mean the U2 one, not Hibernate’s. When you see JPQL with an equality test on two entities, you assume it is performed on their ids. So if you actually specify those ids, it should still be the same, right? Or maybe that’s just another JPA myth. Consider this JPQL:

delete from MetaObjectSetup metaObjectSetup
  where not exists (select 1 from Permission permission
    where permission.metaObjectSetup = metaObjectSetup)

This produces a query that does not work properly – the unqualified id in the subselect probably resolves against permission1_ itself:

delete from META_Object_Setups where  not (exists (
  select 1 from Permissions permission1_
    where permission1_.meta_object_setup_id=id))

A version working with Hibernate must perform eq on the id fields explicitly (EclipseLink doesn’t mind that either):

deleteTable(QMetaObjectSetup.metaObjectSetup,
   new JPASubQuery().from(QPermission.permission)
    .where(QPermission.permission.metaObjectSetup.id
        .eq(QMetaObjectSetup.metaObjectSetup.id))
    .notExists());

SQL result:

delete from META_Object_Setups where  not (exists (
  select 1 from Permissions permission1_
    where permission1_.meta_object_setup_id=META_Object_Setups.id))

Experiences like these were the last straw for us, and we reverted all the efforts to switch to Hibernate.

Happy ending?

We switched back to EclipseLink after this. I don’t remember meeting so much resistance from a library like… ever. Maybe our JPQLs were too loose for Hibernate – joins not explicit, aliases missing, etc. But in the end it did not solve our problem.

It is a real shame that count and distinct are not possible in a way that would make the query.list and query.count operations consistent in libraries like Querydsl. It is also a shame that when something is not supported (as officially stated in the specification), the provider does not throw an exception but silently does something fishy instead.

You can do the same in SQL by wrapping the select into another one with count – but JPQL does not support queries in the FROM clause. Pity. However, this is one of those cases where you can’t go wrong with a correlated subquery. You just have to remember that a subquery does not imply any of the equalities implied by JPA joins (it can’t, actually – there are cases where that would be an obstacle) and you have to add them yourself – see the last examples from the previous part (With or without id?).

This is all really crazy. It remotely reminds me of horror stories about JSF and the bugs of its various implementations (always different ones). Sure, things are really complicated – but maybe the trouble is exactly that they are. Maybe it can be simpler. Maybe it’s wrong that I have to define @ManyToOne to be able to express joins. (Edit: Not true for JPA 2.1, as fixed in this post.) Any @XToOne has consequences often unseen even by experienced JPA users. Maybe some jOOQ or Querydsl over plain SQL is better than this stuff. I just don’t know…

Random Query Generator in EclipseLink

Let’s look at something else while we stick with JPA. Here we are back to EclipseLink, and there is one very easy thing I want to do in SQL:

DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Clients t0 WHERE (t0.priority = ?) AND t0.id = Contacts.client_id)

In plain wording – I want to remove all of a Client’s Contacts when that Client has some specific priority. As for the mapping… I hope there is nothing screwed up here:

@Entity
@Table(name = "Clients")
public class Client {
...
    @OneToMany(mappedBy = "client", cascade = CascadeType.ALL)
    private List<ClientContact> contacts;
...
}

@Entity
@Table(name = "Contacts")
public class ClientContact {
...
    @Column(name = "client_id", insertable = false, updatable = false)
    private Integer clientId;

    @ManyToOne
    @JoinColumn(name = "client_id")
    private Client client;
...
}

The back reference to the client is mapped in two ways, but one of the mappings is read-only. Both primary keys are simple Integers. No miracles here. And now the three ways we tried, always showing Querydsl, JPQL and the resulting SQL:

new JPADeleteClause(em, QClientContact.clientContact).where(
    new JPASubQuery().from(QClient.client)
        .where(QClient.client.priority.eq(DataPreparator.TEST_CLIENT_PRIORITY)
            .and(QClient.client.eq(QClientContact.clientContact.client)))
        .exists());

// JPQL - logically exactly what we want
delete from ClientContact clientContact
  where exists (select client from Client client
    where client.priority = ?1 and client = clientContact.client)

// SQL contains a double EXISTS and unnecessary extra tables
DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Contacts t0 WHERE EXISTS (
    SELECT ? FROM Clients t2, Clients t1
      WHERE ((t2.priority = ?) AND (t2.id = t0.client_id)))  AND t0.id = Contacts.id)
bind => [1, 47]

Ok, it works, right? Can we help it by mentioning the id equality explicitly (client.id.eq(…client.id))?

new JPADeleteClause(em, QClientContact.clientContact).where(
    new JPASubQuery().from(QClient.client)
        .where(QClient.client.priority.eq(DataPreparator.TEST_CLIENT_PRIORITY)
            .and(QClient.client.id.eq(QClientContact.clientContact.client.id)))
        .exists());

// JPQL looks promising again, I bet this must generate a proper query - or at least the same one
delete from ClientContact clientContact
  where exists (select client from Client client
    where client.priority = ?1 and client.id = clientContact.client.id)

// Three clients?!
DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Contacts t0 WHERE EXISTS (
    SELECT ? FROM Clients t3, Clients t2, Clients t1
      WHERE (((t2.priority = ?) AND (t2.id = t3.id)) AND (t3.id = t0.client_id)))
        AND t0.id = Contacts.id)
bind => [1, 47]

You’ve gotta be kidding me, right? What for?! I always believed that if id is the entity’s @Id, then the provider can do virtually the same with or without it. I even dreamt about using the foreign key directly – but that’s way beyond the capabilities of current JPA providers, it seems.

Ok, let’s try something lame. Something I’d write only in JPA, not in real SQL, of course. Something with a lot of implicit identity equality hidden between the lines:

new JPADeleteClause(em, QClientContact.clientContact)
  .where(QClientContact.clientContact.client.priority
    .eq(DataPreparator.TEST_CLIENT_PRIORITY));

// not like SQL at all, but pretty logical if you know the mapping
delete from ClientContact clientContact
  where clientContact.client.priority = ?1

/* Surprise! Still one unnecessary Contacts in an inner join that could go through the FK,
 * but definitely the best result so far.
 * Mind though - this refused to work with Hibernate, at least with any SQL Server dialect. */
DELETE FROM Contacts WHERE EXISTS(
  SELECT t1.id FROM Clients t0, Contacts t1
    WHERE ((t0.priority = ?) AND (t0.id = t1.client_id)) AND t1.id = Contacts.id)
bind => [47]

I hope this demonstrates the impotence of this field of human activity even after more than a decade of effort. I’m not saying it all sucks – but after seeing this, I’m not far from it.

With JPA doing too many things you don’t want or expect (unless you’re an expert) and with its many limitations compared to SQL, I’d expect at least semi-good SQL in return. This is not even close.

Ramble On

So that’s another day with JPA. With proper queries (or their best approximations where it’s not critical) there is just one last thing we have to pay attention to… those IndirectLists that can’t be streamed. Just one last thing, I said? Ouch… of course, we still have to check our SQLs and watch for any bugs or gray zones.

Yeah, we’re staying with JPA for a while on this project. But the next one I’ll start with a metamodel directly over the DB. I hope I can’t suffer more that way. 🙂 And maybe I’ll be able to do ad-hoc joins without @XToY annotations that always bring in stuff you don’t want. That’s actually one of my biggest gripes with JPA.

Many people avoid as many relationship mappings as possible while they can still express their joins. Many avoid @ManyToMany and map the association table explicitly, so they can reach all entities A for a B’s id (or a list of ids) with a single join – see the sketch below. Otherwise EclipseLink stubbornly joins all three tables, as its understanding of PKs and FKs is obviously lacking.
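The explicit mapping of an association table may look like this (a sketch with hypothetical entities A and B):

@Entity
@Table(name = "a_b_assoc")
public class AbAssoc {
	@Id private Integer id;

	@ManyToOne
	@JoinColumn(name = "a_id")
	private A a;

	@ManyToOne
	@JoinColumn(name = "b_id")
	private B b;
}

Then select ab.a from AbAssoc ab where ab.b.id = ?1 reaches all A for the given B’s id – providers typically compare ab.b.id on the FK column directly, so only A needs to be joined.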

Lessons learned? Not that many, actually. But we’re definitely not switching providers for at least another couple of months!

JPA Pagination of entities with @OneToMany/@ManyToMany

Displaying paginated results of entities along with their *-to-many sub-entities is a tricky thing, and many people don’t even know about the problems. Let’s check what JPA does and does not do for you. Besides standard JPA I’ll use Querydsl for the queries – this effectively results in JPQL. Querydsl is type-safe and, compared to the Criteria API, much more readable.

Source of N+1 select problem

Imagine having an entity called User that can have many Roles. The mapping looks like this:

@JoinTable(name = "user_roles",
  joinColumns = @JoinColumn(name = "user_id", referencedColumnName = "id"),
  inverseJoinColumns = @JoinColumn(name = "role_id", referencedColumnName = "id"))
@ManyToMany(fetch = FetchType.EAGER)
private Set<Role> roles;

You want to display a paginated list of users that also shows their roles – for instance as tiny icons with tooltips containing the role names.

The naive approach is to query the Users and then wait for whatever may come:

QUser u = QUser.user; // we will use this alias further
List<User> result = new JPAQuery(em).from(u)
    .where(...)
    .offset((page - 1) * pageSize)
    .limit(pageSize)
    .list(u);

Now something may or may not happen. A simple select from USER is definitely executed, and if you don’t touch user.roles, that will be all. But you want those roles and you want to display them – for instance while you iterate over the results. Each such access executes another select from ROLE where user_id = the iterated user’s id. This is the infamous N+1 select problem (although the order is rather 1+N :-)). It is extremely bad for selects with large result sets, while for pages N has reasonable limits. But still – no reason to do 25 additional selects instead of… how many actually?
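To make it concrete – a loop like this (renderRoleIcons being a hypothetical UI method) is what innocently fires the extra selects:

for (User user : result) {
    // the first access to the lazy roles collection triggers one more select per user
    renderRoleIcons(user.getRoles());
}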

Left join for the help?

If the mapping were @…ToOne, you could just left join that entity (e.g. a user’s Address, assuming a single address per user). A left join in this case does not change the number of rows in the result – even for optional addresses – whereas an inner join would omit users without an address (a surprisingly common error among programmers who try to avoid SQL for no good reason).

A left join for *ToOne relations is a good solution for both paginated and non-paginated results.
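For example (a sketch assuming a u.address to-one mapping):

List<User> result = new JPAQuery(em).from(u)
    .leftJoin(u.address).fetch() // does not multiply rows, pagination stays correct
    .where(...)
    .offset((page - 1) * pageSize)
    .limit(pageSize)
    .list(u);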

But for *ToMany, pagination and left join don’t go well together. The problem is that SQL can only paginate the rows of the resulting join; it cannot know the proper offset and limit to select, for instance, the second user only. Here we want page 2 with page size 2:

List<User> result = new JPAQuery(em).from(u)
    .where(...)
    .leftJoin(u.roles).fetch()
    .offset(2).limit(2)
    .list(u);

The result is delivered in a single select – but what a result…
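With illustrative data (my assumption – the original example data is not shown here) the joined rows may look like this, and offset 2 with limit 2 picks rows 3 and 4:

row 1: User1 | RoleA
row 2: User1 | RoleB
row 3: User1 | RoleC   <- page 2 starts here
row 4: User2 | RoleA   <- page 2 ends here
row 5: User2 | RoleB
row 6: User3 | RoleA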

We definitely don’t want User1 with a single RoleC and User2 with RoleA. That is not only not the page we want (we should get User3 on the final page 2), it’s just plain disinformation (a lie, that is).

(Don’t) Use Eager!

For most cases, EAGER mapping for *ToMany is not a good default. But in case you’re asking what would happen… the JPA provider at least knows at the time of the select from USER that we want the ROLEs too. EclipseLink performs N additional selects to get the users’ roles, although it utilizes any cached data possible. Actually, I tested it with the cache turned off using these lines in persistence.xml:

<property name="eclipselink.query-results-cache" value="false"/>
<property name="eclipselink.cache.shared.default" value="false"/>
<property name="eclipselink.cache.size.default" value="0"/>
<property name="eclipselink.cache.type.default" value="None"/>

Cache can bring some relief, but it actually just obscures the problem and may mislead you into thinking the problem is not there. Especially if you create the data using JPA just before the test. 🙂

With LAZY, the JPA provider has to wait until we touch a role list – and it always loads only the one that is needed. But for our page we need to check all of them, so we end up with N selects again. The provider has no way to optimize for this, not to mention that you may end up with a LazyInitializationException (your fault) unless you use OSIV (open session in view), which I consider bad design after my previous experiences with it.

If the provider knows you want the roles upfront (EAGER), it may optimize – but I would not count on it. EclipseLink does not, and you simply cannot know. So let’s just do it ourselves, shall we? And switch that EAGER back to LAZY – which is the default, so you can just as well remove the fetch parameter altogether.

1+1 select, no problem

We need to get the proper page of users first and then load their roles. There are two ways to do this:

  1. if you have a reverse mapping (role.users), you can get the List<User> first and then select the roles containing these users;
  2. if there is no reverse mapping, you list the users first and then query again with a left join to the roles – restricted to the users from the first list.

In both cases a list of user IDs may be enough. Because our Role does not have the reverse mapping (I don’t like it that much for @ManyToMany relations), we end up with code like this:

List<Integer> uids = find.queryFrom(u)
    .where(...)
    .orderBy(u.login.asc())
    .offset(2).limit(2)
    .list(u.id);
List<GuiUser> users = find.queryFrom(u)
    .where(u.id.in(uids))
    .leftJoin(u.guiRoles).fetch()
    .orderBy(u.login.asc())
    .distinct()
    .list(u);

What you display is the result of the second query, obviously. There are two important points here though:

  • You have to repeat the order. While in the first select it is essential for paging (paging without order is meaningless, right?), in the second case you could do it in memory – but there is no reason to, as you let the DB order just a single page and you save lines of Java code. Not to mention that Java may (and quite probably will!) use a different collator… it really is not worth it.
  • You have to use distinct() before the list() of the second query. This not only adds distinct to the select, but also instructs Querydsl or JPA (I really don’t know who does the job, sorry :-)) to give you the list of distinct users, not a list with the size of the joined result. Such a distinct is required in most leftJoins on *ToMany entities (unless you create projection results for them).

1+1 for OneToMany

Now let’s pretend we have a different situation – a @OneToMany case, like an Invoice and its Items. Item will have an invoice attribute and Invoice will have @OneToMany items mapped by Item.invoice – like this:

@OneToMany(mappedBy = "invoice")
private Set<Item> items;

And now the select part:

QInvoice i = QInvoice.invoice;
Map<Integer, Invoice> invoices = find.queryFrom(i)
    .where(...)
    .orderBy(i.id.asc())
    .offset(2).limit(2)
    .map(i.id, i);
// this performs items = new HashSet<>() for each invoice
invoices.values().forEach(Invoice::initItems);

QItem ii = QItem.item;
List<Item> items = find.queryFrom(ii)
    .where(ii.invoice.in(invoices.values()))
    .list(ii);
for (Item item : items) {
    // addItem is always nicer than getItems().add(item)
    invoices.get(item.getInvoice().getId()).addItem(item);
}

Always check what selects are generated in the end. In my case ii.invoice.in performs no additional join and puts ids into the IN list – and that’s how it should be.

Also note the usage of map(i.id, i) – this returns a LinkedHashMap, so the order is preserved (cool!), and it makes getting the invoice for an item much easier. The difference in the select is just one redundant id column, which is no problem at all.

Conclusion

We demonstrated the essential problem with pagination of one/many-to-many joins and checked two solutions, both based on one additional select. These solutions are not necessary if pagination is not required, obviously. There may be other ways to do it – there are even extensions for specific JPA providers (like @BatchFetch for EclipseLink) – but I’d prefer this approach for a couple of reasons:

  • Full control over the selects – well, at least as full as you can get with Querydsl over JPQL. You can select into DTOs, you can enumerate the required columns – there is much more you can do around the basic idea. (Funny. Just minutes after writing this sentence I travelled home from work, reading the neat little book SQL Performance Explained – and coincidentally I was going through the part about Joins and Nested Loops. Markus Winand also provides a short analysis of ORMs in various languages, how they perform joins and what you can do about it. The final tip was pretty clear: get to know your ORM and take control of your joins.)
  • It scales! There is no N. As long as your pagination is reasonably fast (that’s a separate problem and you have to do what you have to do), the second select goes after IDs, which should be pretty fast anyway.
  • It is unrelated to any specific features of any specific JPA provider.
  • And after all – you have full control over your contract. Don’t let the presentation layer trigger your selects. Even better – close that persistence context after leaving the service layer.

The only problem you should be aware of is the limit on the number of expressions in an IN clause. Check it out, read the documentation (both for your JPA provider and your RDBMS), experiment, find out what the maximum page size will be – and (obviously) cover it in your tests (at least after it crashes the first time ;-)).
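If you ever need pages larger than that limit, you can split the IDs into chunks – a rough sketch reusing the u alias and the query helper from above (the limit value is RDBMS-specific, e.g. 1000 expressions on Oracle):

private static final int IN_LIMIT = 1000;

List<User> loadPageWithRoles(List<Integer> uids) {
    List<User> users = new ArrayList<>();
    for (int from = 0; from < uids.size(); from += IN_LIMIT) {
        List<Integer> chunk = uids.subList(from, Math.min(from + IN_LIMIT, uids.size()));
        users.addAll(find.queryFrom(u)
            .where(u.id.in(chunk))
            .leftJoin(u.guiRoles).fetch()
            .orderBy(u.login.asc())
            .distinct()
            .list(u));
    }
    // with more than one chunk the overall order must be re-applied in memory
    return users;
}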

The most important takeaways from this post: 1) EAGER is not a solution, and 2) the result of a left join with “to-many” cardinality cannot be reasonably paginated in SQL. We’re not talking about specific solutions here – hell, clever DB guys can even provide you with procedures delivering the right page plus the total count of all records in a single call to the DB. But that’s beyond basic SQL/JPA. 🙂

Converting Java enums to values and back

You know the stuff – you have some enum that logically wraps, for instance, integer constants. The typical case is a numeric DB column mapped to an enum. We all know that using EnumType.ORDINAL is one of the worse solutions. And EnumType.STRING is not really loved by our DB admins (if you have them) – even though it’s easy to read, for sure. 🙂 Normalization suffers, and renaming an enum constant is a bit of a problem too. So we want to map it to a number (for instance), which would be an Integer in the JPA entity.

This mapping is not necessarily related only to JPA; there are many cases where you want to convert an enum to something and back. For instance, with generated web services I often map a generated enum to my “business” one, and if it suits me (as it is not always clean) I embed this mapping into my enum (not the generated one, of course :-)). So I map an enum to another related enum and back. But here we will work with the JPA example – JPA 2.1 with AttributeConverter<X,Y>, to be precise. You can do something similar with Hibernate converters too, I’m sure.
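Such an enum-to-enum mapping embedded in the business enum may look like this (a sketch – GeneratedType stands for a hypothetical WS-generated enum):

public enum BusinessType {
	NORMAL,
	SPECIAL;

	public static BusinessType from(GeneratedType generated) {
		switch (generated) {
			case NORMAL: return NORMAL;
			case SPECIAL: return SPECIAL;
			default:
				throw new IllegalArgumentException("Unknown value " + generated);
		}
	}
}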

This blog post is accompanied by this GitHub project containing all the variations. Copy and paste and use whatever however you like.

Naive approach

Before going on, know that you can check all the examples in this GitHub repository. The first case can be found in this location. We have our simple enum:

public enum SomeEntityType {
	NORMAL,
	SPECIAL
}

And we have our entity that maps a numeric column to our enum:

@Entity
public class SomeEntity {
	@Id private Integer id;

	@Convert(converter = SomeEntityTypeConverter.class)
	@Column(name = "type")
	private SomeEntityType type;
}

This entity will stay the same throughout all of our solutions, so we will not repeat it. The important line is the one with the @Convert annotation, which finally allows us to do the conversion (JPA 2.1, part of Java EE 7). All we have to do now is implement that converter:

import javax.persistence.AttributeConverter;

/**
 * This is coupled too much to enum and you always have to change both classes in tandem.
 * That's a big STOP (and think) sign in any case.
 */
public class SomeEntityTypeConverter implements AttributeConverter<SomeEntityType, Integer> {
	@Override
	public Integer convertToDatabaseColumn(SomeEntityType someEntityType) {
		switch (someEntityType) {
			case NORMAL: return 0;
			case SPECIAL: return 1;
			default:
				// do we need this? if for nothing else it catches forgotten case when enum is modified
				throw new IllegalArgumentException("Invalid value " + someEntityType);
				// actually the value is valid, just this externalized switch sucks of course
		}
	}

	@Override
	public SomeEntityType convertToEntityAttribute(Integer dbValue) {
		switch (dbValue) {
			case 0: return SomeEntityType.NORMAL;
			case 1: return SomeEntityType.SPECIAL;
		}
		// now what? probably exception would be better just to warn programmer
		// but if it happens in production, it doesn't really matter if it's here or NPE later
		return null;
	}
}

Ok, so I revealed the problems in the comments. There may be just one reason to do it this way – if you need the enum independent of that dbValue mapping – but I really don’t know why you would. On the contrary, you should go for a cohesive solution, and in this case you will be just fine with encapsulation. Put the stuff that changes into one place – the enum.

Encapsulated conversion

You need to implement AttributeConverter because it is the glue between JPA and your class. But there is no reason to leave the actual mapping in this infrastructure class. So let’s enhance the enum to keep the converter simple. The converter I want to see looks like this:

public class SomeEntityTypeConverter implements AttributeConverter<SomeEntityType, Integer> {
	@Override
	public Integer convertToDatabaseColumn(SomeEntityType someEntityType) {
		return someEntityType.getDbValue();
	}

	@Override
	public SomeEntityType convertToEntityAttribute(Integer dbValue) {
		// this can still return null unless it throws IllegalArgumentException
		// which would be in line with enums static valueOf method
		return SomeEntityType.fromDbValue(dbValue);
	}
}

Much better – we don’t have to think about this class at all when we add values to the enum. The complexity is now in the enum, but it’s all tight in a single class:

public enum SomeEntityType {
	NORMAL(0),
	SPECIAL(1);

	// ENUM -> DB VALUE CONVERSION
	// first part is easy and most programmers get here (Effective Java 101 after all)
	// fields are not final implicitly, but we better make them
	private final Integer dbValue;

	private SomeEntityType(Integer dbValue) {
		this.dbValue = dbValue;
	}

	public Integer getDbValue() {
		return dbValue;
	}

	// DB VALUE -> ENUM CONVERSION
	// static reverse resolving:
	private static final Map<Integer, SomeEntityType> dbValues = new HashMap<>();

	static {
		for (SomeEntityType value : values()) {
			dbValues.put(value.dbValue, value);
		}
	}

	public static SomeEntityType fromDbValue(Integer dbValue) {
		// this returns null for invalid value, check for null and throw exception if you need it
		return dbValues.get(dbValue);
	}
}

I also saw some half-solutions without the static reverse resolving, but I hope we all agree it belongs in the enum. If it’s a two-value enum, you may start with a switch in fromDbValue, but that’s just another thing to think about – and one static map will not kill anyone. You can find this solution here on GitHub.

Now this works, so let’s imagine we need this for many enums. Can we find some common ground here? I think we can.

Conversion microframework

Let’s say we want some order in these things. We will require a method named toDbValue, etc. So our enums will implement the interface ConvertedEnum:

/**
 * Declares this enum as converted into database, column value of type Y.
 *
 * In addition to implementing {@link #toDbValue()} converted enum should also
 * provide static method for reverse conversion, for instance {@code X fromDbValue(Y)}.
 * This one should throw {@link IllegalArgumentException} just as {@link Enum#valueOf(Class, String)} does.
 * Check {@link EnumAttributeConverter} for helper methods that can be used during reverse conversion.
 */
public interface ConvertedEnum<Y> {
	Y toDbValue();
}

It’s parametrized, hence flexible. The Javadoc says it all – we can’t enforce the static stuff, because that’s how Java works. While I suggest that the reverse fromDbValue should throw an IllegalArgumentException, I’ll leave it returning null for now – just know that I’m aware of it. 🙂 I’d personally go strictly for IAE, but I’ll show how we can use null in the conversion.

What are the changes in the enum? Minimal, really – just add implements ConvertedEnum<Integer>, and you can add @Override to the toDbValue method. The complete version is here on GitHub, but the gist is below.
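The gist (a sketch of just the changed part):

public enum SomeEntityType implements ConvertedEnum<Integer> {
	NORMAL(0),
	SPECIAL(1);

	private final Integer dbValue;

	SomeEntityType(Integer dbValue) {
		this.dbValue = dbValue;
	}

	@Override
	public Integer toDbValue() {
		return dbValue;
	}

	// static reverse resolving stays exactly as before
}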

Now to utilize all this we need a base class for AttributeConverter – here it goes:

/**
 * Base implementation for converting enums stored in DB.
 * Enums must implement {@link ConvertedEnum}.
 */
public abstract class EnumAttributeConverter<X extends ConvertedEnum<Y>, Y>
	implements AttributeConverter<X, Y>
{
	@Override
	public final Y convertToDatabaseColumn(X x) {
		return x.toDbValue();
	}

// you can end here, or you can add handy stuff if necessary
	public X notNull(X x) {
		return notNull(x, null);
	}

	public X notNull(X x, Y dbValue) {
		if (x == null) {
			throw new IllegalArgumentException("No enum constant" + (dbValue != null ? (" for DB value " + dbValue) : ""));
		}
		return x;
	}

	public X withDefault(X x, X defaultValue) {
		return x != null ? x : defaultValue;
	}
}

As you can see, only convertToDatabaseColumn is mandatory here. Here is our concrete converter:

public class SomeEntityTypeConverter extends EnumAttributeConverter<SomeEntityType, Integer> {
	@Override
	public SomeEntityType convertToEntityAttribute(Integer integer) {
		return notNull(SomeEntityType.fromDbValue(integer));
	}
}

What a beauty, suddenly! I also used notNull to enforce the IllegalArgumentException (unless you do it on the enum level), or I could use withDefault to fall back to some default value – if it makes sense, of course (it’s better when it doesn’t :-)).
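For illustration, the withDefault variant could look like this (falling back to NORMAL is a purely hypothetical choice):

@Override
public SomeEntityType convertToEntityAttribute(Integer integer) {
	// falls back to NORMAL for unknown DB values - use only when a default really makes sense
	return withDefault(SomeEntityType.fromDbValue(integer), SomeEntityType.NORMAL);
}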

Refinement?

And now the last push. What is repeating? And what can we do about it?

  • It would be cool to have just a single converter class – but this is impossible, because there is no way to instruct the converter about its types. Especially the method convertToEntityAttribute is immune to any approach, because there is nothing at runtime that can tell you what the expected enum type is. No reflection helps here, it seems.
  • So we have to have separate AttributeConverter classes – but can we pull convertToEntityAttribute up into our EnumAttributeConverter? Not easily, really, but we’ll try something.
  • How about the static resolving? Can we get rid of that static initialization block? It is static, so it seems problematic – but indeed, we can do something about it.

Let’s try to hack our converters first. We need to get the type information into the instance of the superclass. It can be a protected field like this:

public abstract class EnumAttributeConverter<X extends ConvertedEnum<Y>, Y>
	implements AttributeConverter<X, Y>
{
	protected Class<X> enumClass;

And a subclass would initialize it this way:

public class SomeEntityTypeConverter extends EnumAttributeConverter<SomeEntityType, Integer> {
	{
		enumClass = SomeEntityType.class;
	}

But this is not enforced in any way! Rather use an abstract method that must be implemented. In the abstract converter:

	protected abstract Class<X> enumClass();

And in concrete class:

public class SomeEntityTypeConverter extends EnumAttributeConverter<SomeEntityType, Integer> {
	@Override
	protected Class<SomeEntityType> enumClass() {
		return SomeEntityType.class;
	}
}

But we’re back to 3 lines of code (excluding @Override) and we haven’t even gotten to the ugly unified convertToEntityAttribute yet:

	@Override
	public X convertToEntityAttribute(Y dbValue) {
		try {
			Method method = enumClass().getMethod("fromDbValue", dbValue.getClass());
			return (X) method.invoke(null, dbValue);
		} catch (IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
			throw new IllegalArgumentException("...this really doesn't make sense", e);
		}
	}

Maybe I missed something along the way, but this doesn’t sound like a good solution. It would be if it led to a unified converter class – but it does not. And there may be one more problem with the hunt for a unified solution. While the concrete implementation contains methods with concrete parameter and return types, the unified abstract implementation doesn’t. It uses the right types at runtime, but the methods wouldn’t tell you that if you inspected them via reflection. Imagine JPA checking this. Right now I know that the unified public final Y convertToDatabaseColumn(X x) {…} works with EclipseLink, but maybe we’re asking for problems. Let’s check it for real:

// throws NoSuchMethodException
// Method method = converter.getClass().getMethod("convertToDatabaseColumn", Integer.class);
Method method = converter.getClass().getMethod("convertToDatabaseColumn", Object.class);
System.out.println("method = " + method);

// prints:
method = public java.lang.Object com.github.virgo47.enumconv._4evenmore.EnumAttributeConverter.convertToDatabaseColumn(java.lang.Object)

Great… so if something strictly matches the method types against the column/enum types, we may have a misunderstanding. Sometimes too smart is just that – too smart. Check this overboard solution here on GitHub.

Simplified static resolving

Anyway, let’s look into that enum’s static resolving. This part is actually really useful. Without further ado, this is how the enum part may look:

	// static resolving:
	public static final ConvertedEnumResolver<SomeEntityType, Integer> resolver = new ConvertedEnumResolver<>(SomeEntityType.class);

	public static SomeEntityType fromDbValue(Integer dbValue) {
		return resolver.get(dbValue);
	}

So we got rid of the static initializer. But the code must live somewhere:

/**
 * Helps reverse resolving of {@link ConvertedEnum} from a DB value back to an enum instance.
 * Enums that can be resolved this way must implement the unified interface providing
 * {@link ConvertedEnum#toDbValue()}.
 *
 * @param <T> type of an enum
 * @param <Y> type of DB value
 */
public class ConvertedEnumResolver<T extends ConvertedEnum<Y>, Y> {

	private final String classCanonicalName;
	private final Map<Y, T> dbValues = new HashMap<>();

	public ConvertedEnumResolver(Class<T> enumClass) {
		classCanonicalName = enumClass.getCanonicalName();
		for (T t : enumClass.getEnumConstants()) {
			dbValues.put(t.toDbValue(), t);
		}
	}

	public T get(Y dbValue) {
		T enumValue = dbValues.get(dbValue);
		if (enumValue == null) {
			throw new IllegalArgumentException("No enum constant for dbValue " + dbValue + " in " + classCanonicalName);
		}
		return enumValue;
	}
}

And this I actually really like. Here I went for strict checking, throwing an exception. Without it – or without needing/wanting the type name – it could be even shorter. But you write this class once and it saves lines in each converted enum. Check the complete sources here on GitHub.

So there are three players in this “framework”:

  • ConvertedEnum – interface for your enums that are converted in this way.
  • ConvertedEnumResolver – wraps the reverse mapping and saves you most of the static lines in each converted enum.
  • EnumAttributeConverter – maybe the most questionable part here. It takes care of one direction of the conversion; just be aware of potential problems if something introspects the method types.

Alternative conversions with entities

While not exactly on the topic of enum-to-value mapping (and back), we demonstrated all of this in the context of JPA 2.1. So I’d like to at least mention alternative solutions for older JPA versions.

  • You can always use vendor-specific extensions. Hibernate has its custom types and I’m sure EclipseLink doesn’t fall behind.
  • But there is also the possibility to use mapping annotations on properties. That is, put your @Id on the getter, or specify AccessType.PROPERTY. This allows you to convert anything to anything in the getter/setter. I even used it to back Dates by a long (or Long if nullable). This way various tools (like Sonar) didn’t complain about the silly mutable Date breaking encapsulation when used directly in get/set methods, because it was not stored directly anymore. Hibernate used the get/set for the Date, and I had a @Transient get/set for the millis long available. I actually liked comparing millis more than the before/after methods on Date, but that’s another story. The same can be used for mapping enums – just have a JPA-compatible type mapped for JPA and a @Transient get/set with the enum type; see the sketch below. Most of the stuff about enum encapsulation and static resolving still applies.
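A minimal sketch of that property-access trick with an enum (a hypothetical entity, reusing SomeEntityType from above):

@Entity
public class LegacyEntity {
	private Integer id;
	private SomeEntityType type;

	@Id
	public Integer getId() { return id; }
	public void setId(Integer id) { this.id = id; }

	// JPA sees and stores only the numeric value...
	@Column(name = "type")
	public Integer getTypeDbValue() { return type != null ? type.getDbValue() : null; }
	public void setTypeDbValue(Integer dbValue) { this.type = SomeEntityType.fromDbValue(dbValue); }

	// ...while the application code works with the enum
	@Transient
	public SomeEntityType getType() { return type; }
	public void setType(SomeEntityType type) { this.type = type; }
}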

I hope you liked our tour around enums. If you knew or used all of this before – good for you. I’m surprised how many programmers don’t try to design and refactor these little things, where you don’t even need to think in UML.

And if you have better solutions, please share them in the comments. 🙂