How I unknowingly deviated from JPA

In a post from January 2015 I wrote about the possibility of using plain foreign key values instead of @ManyToOne and @OneToOne mappings in order to avoid eager fetching. It built on JPA 2.1, as it needed the ON clause not available before, and on EclipseLink, which is the reference implementation of the specification.

To be fair, there are ways to make to-one relations lazy, sure, but they are not portable and JPA does not guarantee them. They rely on bytecode magic and a properly configured ORM. Otherwise lazy to-one mapping wouldn’t have spawned so many questions around the internet. And that’s why we decided to try it without them.

Total success

We applied this style to our project and we liked it. We didn’t have to worry about random fetch cascades, which in complex domain models often trigger many dozens of fetches. Sure, it can be “fixed” with a second-level cache, but that’s another thing – we could stop worrying about the cache too. Now we could think about caching the things we wanted, instead of caching everything possibly reachable even when we don’t need it. A second-level cache should not exist for the sole reason of making flawed eager fetching bearable.

When we needed a Breed for a Dog we could simply do:

Breed breed = em.find(Breed.class, dog.getBreedId());

Yes, it is noisier than dog.getBreed(), but explicit solutions come with a price. We can still implement the method on the entity, but it must somehow access the EntityManager – directly or indirectly – and that adds an infrastructure dependency and makes the entity more active-record-ish. We did it, no problem.
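
If we do implement such a getter, a minimal sketch could look like this – EntityManagerProvider is a made-up static locator standing in for whatever infrastructure hook you actually use:

// hypothetical convenience getter on the Dog entity;
// EntityManagerProvider is NOT a real API, just a placeholder for
// any way the entity can reach an EntityManager
public Breed getBreed() {
    return EntityManagerProvider.em().find(Breed.class, breedId);
}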

Now this much can be done in any version of JPA, and probably with any ORM. The trouble comes with queries. They require an explicit join condition, and for that we need ON. For inner joins WHERE is sufficient, but any outer join obviously needs an ON clause. We don’t have a dog.breed path to join, so we need to join breed ON dog.breedId = breed.id. But this is no problem, really.
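
For illustration, such a query could look like this – a sketch assuming Dog maps breedId as a plain basic attribute:

// left join on a root entity with an explicit ON condition -
// there is no dog.breed association path to rely on
List<Object[]> dogsWithBreeds = em.createQuery(
        "SELECT d, b FROM Dog d LEFT JOIN Breed b ON d.breedId = b.id",
        Object[].class)
    .getResultList();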

We really enjoyed this style, while still benefiting from the many perks of JPA, like convenient and customizable type conversion, the unit-of-work pattern, transaction support, etc.

I’ll write a book!

Having enough experience, and not knowing I was already outside the scope of the JPA specification, I decided to conjure up a neat little book called Opinionated JPA. The name says it all. It was to be a book adding a bit to the discussion about how to use and tweak JPA when it really backfires on you with these eager fetches and you don’t mind toning it down a bit. It was to be a book about fighting with JPA less.

Alas, it backfired on me in the most ironic way. I wrote a lot of material around the topic before I got to the core part. Sure, I felt I should not postpone it too long, but I wanted to build an argument, do the research and so on. What never occurred to me was that I should test it with some other JPA provider too. And that is what is so ironic.

In recent years I have learned a lot about JPA. I have the JPA specification open every other day to check something, I cross-reference bugs between EclipseLink and Hibernate while trying to find the final argument in the specification – I really felt good at all this. But I never checked whether a query with left join breed ON dog.breedId = breed.id works in anything other than EclipseLink (the reference implementation, mind you!).

Shattered dreams

It does not. Today I can even add “obviously”. The JPA 2.1 specification defines joins in section 4.4.5 as follows (selected grammar rules):

join ::= join_spec join_association_path_expression [AS] identification_variable [join_condition]
join_association_path_expression ::=
  join_collection_valued_path_expression |
  join_single_valued_path_expression |
  TREAT(join_collection_valued_path_expression AS subtype) |
  TREAT(join_single_valued_path_expression AS subtype)
join_spec ::= [ LEFT [OUTER] | INNER ] JOIN
join_condition ::= ON conditional_expression

The trouble here is that breed in left join breed does not conform to any alternative of the join_association_path_expression.

Of course my life goes on, I’ve got a family to feed, I’ll ask my colleagues for forgiveness and try to build up my professional credit again. I can even say: “I told myself so!” Because the theme that JPA can surprise you again and again keeps repeating in my story.

Opinionated JPA revisited

What does it mean for my opinionated approach? Well, it works with EclipseLink! I’ll just drop JPA from the equation. I tried to stay pure JPA for many years, but even then I never ruled out proprietary ORM features as “evil”. I don’t believe in an easy JPA provider switch anyway. You can stick to the most basic JPA elements and stay switchable, but I’d rather utilize the chosen library better.

If you switch from Hibernate, where to-one seems to work lazily when you ask for it, to EclipseLink, you will need some non-trivial tweaking to get there. If the JPA spec mandated lazy support instead of defining it as a mere hint, I wouldn’t mess around with this topic at all. But I understand the topic goes deeper, as Java language features don’t allow it easily. With an explicit proxy wrapping the relation it is possible, but then we’re spoiling the domain. Still, with bytecode manipulation being rather ubiquitous now, I think they could have done it and removed this vague point once and for all.

Not to mention a very primitive alternative: let users explicitly say they do not want to cascade eager fetches at the moment of usage. They would still get a Breed object when calling dog.getBreed(), but this object would not be managed and would contain only the breed’s ID – exactly what they asked for. There is no room for confusion here, and it at least gives us an option to break the deadly fetch cascade.
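
A sketch of that idea – this is a thought experiment about what a provider could offer, not an existing JPA feature:

// hypothetical: instead of eagerly fetching, the provider would return
// an unmanaged stub with only the ID populated
public Breed getBreed() {
    Breed stub = new Breed();
    stub.setId(breedId);  // nothing else is set, the stub is not managed
    return stub;
}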

And the book?

Well, the main argument is now limited to EclipseLink, not to JPA. Maybe I should rename it to Opinionated ORM with EclipseLink (and Querydsl). I wouldn’t like to leave it at the level of an essay about JPA and various “horror stories”, although even that may help people decide for or against it. If you don’t need ORM after all, use something different – like Querydsl over SQL, or alternatives like jOOQ.

I’ll probably still describe this strategy, but not as the main point anymore. The main point now is that JPA is a very strict ORM, limited in the options it gives you to control its fetching behavior. These options are delegated to the JPA providers, and relying on them may lock you in nearly as much as not being JPA compliant at all.

Final concerns

But even when I accept that I’m stuck with an EclipseLink feature… is it a feature? Wouldn’t it be better if the reference implementation strictly complained about invalid JPQL, just like Hibernate does? Put aside the thought that Hibernate is a perfect JPA 2.1 implementation – it doesn’t implement other things and is not strict in different areas.

What if EclipseLink reconsiders and removes this extension? I doubt the next JPA will support this kind of path after JOIN, although that would save my butt (which is not so important after all). I honestly believed I was still on the standard motorway, perhaps just a little bit onto the shoulder. Now I know I’m away from any mainstream… and the only way back is to re-introduce all the to-one relations into our entities, which first kills performance; then we turn on the cache for everything, which hopefully does not kill memory, but definitely does not help. Not to mention that we actually need a distributed cache across multiple applications over the same database.

In the most honest attempt to get out of the quagmire before getting stuck deep in it, I inadvertently found myself neck-deep already. ORM indeed is the Vietnam of Computer Science.

JPA – is it worth it? Horror stories with EclipseLink and Hibernate

A friend of mine brought Hibernate to our dev team back in 2004 or so. But now he uses something much simpler and avoids ORM/JPA whenever possible. He can, because he is mostly the master of his own projects.

I had to get more familiar with JPA on my own path. There is always more to learn about it. When you discover something like orphanRemoval = true on @OneToMany, it may bring you to the brink of crying. From happiness, of course. (Stockholm syndrome, probably.) But then there are other days when you just suffer. Days when you find bugs in JPA implementations – or something close to bugs. And you can’t really choose from many providers, can you? There are just two mainstream players.

Right now we are using EclipseLink, and there are two bugs (or missing features, or whatever) that kinda provoked our big effort to switch to Hibernate. What were our problems and what was the result of our switch? Read on…

EclipseLink and Java 8

You can avoid this problem, although you have to be extra careful. EclipseLink’s lazy list – called IndirectList – cannot be used with streams. Actually, if it didn’t compile, it would be good. What’s worse, it works, but very badly.

The guys creating Java do their best to make streams work effortlessly – there are those default methods in interfaces, etc. But no, no luck here. IndirectList returns an empty stream. Why? Because of this bug. (It works fine for IndirectSet.)

What is the cause of the problem? Well… someone decided to extend Vector, but instead of sticking with it, they also decided that the extended Vector would not be the actual backing collection – they added a Vector delegate too. This works fine for interfaces like List, but not so well for extended classes. Now add the “ingenious” design of Vector – full of protected fields you have to take care of… and IndirectList clearly collided with Vector badly.

Why do you need to take care of the protected fields? Because creating a stream on a Vector uses its special java.util.Vector.VectorSpliterator, which reads those protected fields. So if you delegate to another Vector, you either override java.util.Vector#spliterator too (but then it won’t compile with earlier Java), or – preferably – you don’t delegate to another Vector at all and use the extended one as the backing collection. Of course, the guys at Java might have used size() and get(int) instead of accessing Vector’s protected elementCount, elementData, etc. – but that would not be as efficient.
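
Until the bug is fixed on the EclipseLink side, a trivial defensive workaround is to copy the lazy list into a well-behaved collection before streaming. A sketch, assuming a Client entity with a lazy list of contacts (similar to the mapping shown later in this post; getEmail() is made up):

// new ArrayList<>(...) iterates the IndirectList, which triggers the
// lazy load and copies the elements - the stream then runs over a
// plain ArrayList instead of EclipseLink's IndirectList
List<ClientContact> contacts = new ArrayList<>(client.getContacts());
long withEmail = contacts.stream()
        .filter(c -> c.getEmail() != null)
        .count();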

BTW: does anybody here actually like Vector? I personally hate Java’s wannabe compatibility that really hurts its progress, as we have to drag more and more burden along with us.

JPA and counts on joins with distinct

The other trouble was related to a specific select where we used a join with distinct instead of a subquery with exists. We have an entity called Client that can be bound to any number of Domains; the @ManyToMany mapping is on the Client side. Domain does not know about Client – hence the arrow.

[diagram jpa-client-domains: Client → Domain, a unidirectional @ManyToMany]

We often need to obtain all Clients where any of their domains is in a provided list (the actual input parameter contains the IDs of those domains). Using Querydsl, it looks like this:

QDomain cd = new QDomain("cd");
JPAQuery query = new JPAQuery(em).from(QClient.client)
    .join(QClient.client.domains, cd)
        .on(cd.id.in(domainIds))
    .distinct();

I’m sure you can easily transform it mentally to Criteria or JPQL – Querydsl produces JPQL, BTW. Distinct here has the special meaning given to it by the JPA specification. Now you can call query.list(QClient.client) – it produces a nice join and returns only distinct Clients. You try query.count(), and it works as expected as well.

But imagine that Client has a composite primary key. We actually have exactly such a case: another entity bound to domains, everything looks the same, just the entity has a composite PK. List works alright, but if you try query.count() you get a completely different, illogical query using exists, and the join to Domains is lost completely. The results are wrong – definitely not the size of the list result.
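
For illustration, the two operations boil down to JPQL roughly like this (a sketch of what Querydsl emits; the second query is what query.count() implies, and with a composite PK the counted argument is effectively an embeddable):

select distinct client from Client client
  join client.domains cd on cd.id in ?1

select count(distinct client) from Client client
  join client.domains cd on cd.id in ?1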

There are some bugs filed for this – for instance this one. Because there was a discussion about this behavior, we decided to find out how Hibernate treats it.

Spoiler:

The trouble is that it is not a bug after all – at least not according to the JPA 2.1 specification, which reads in section 4.8.5: “The use of DISTINCT with COUNT is not supported for arguments of embeddable types or map entry types.”

I found this today when I was gathering proofs for this post. Our story, however, unfolded in a different direction. Are we using JPA? Yes. Are there other options? Yes!

Spoiler 2:

We used a subselect with exists in the end, but that’s not the point. 😉
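
For completeness, a sketch of that rewrite using the same Querydsl API as above (whether this matches our production query exactly is another matter); cd.in(...) translates to a JPQL member-of, and since no distinct is needed, list and count stay consistent:

QDomain cd = new QDomain("cd");
JPAQuery query = new JPAQuery(em).from(QClient.client)
    .where(new JPASubQuery().from(cd)
        .where(cd.in(QClient.client.domains)
            .and(cd.id.in(domainIds)))
        .exists());
// query.list(QClient.client) and query.count() now agree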

So we switch to Hibernate, right?

We got kinda carried away by some bug reports resolved as FIXED – like this one, or the ones you can see in its links. But switching the JPA provider isn’t as easy as promised. In the end we found out that count distinct for entities with composite PKs doesn’t work with SQL Server anyway. But we learned a lot of interesting things about how far apart the two implementations are when it comes to translating JPQL to SQL. Mind you, I’m not actually sure whether everything we wrote in Querydsl (which generates the JPQL) is 100% correct, but even then it should say something: when it works, I expect it to keep working after changing the JPA implementation.

Hibernate screwing up deletes with “joins”

We have a lot of delete clauses that worked just fine in EclipseLink:

new JPADeleteClause(em, QGuiRolePermission.guiRolePermission)
    .where(QGuiRolePermission.guiRolePermission.guiRole.description.eq(TEST_DESCRIPTION))

This essentially means “delete all role permission assignments for the role with the specified description”. Piece of cake, right? We’re using an implicit join again, and it is the same story all over – it worked just fine in EclipseLink. EclipseLink creates proper SQL with exists:

DELETE FROM GUI_RolePermissions WHERE EXISTS(
  SELECT t1.gui_role_id FROM GUI_Roles t0, GUI_RolePermissions t1
    WHERE ((t0.description = ?) AND (t0.id = t1.gui_role_id)) AND t1.gui_role_id = GUI_RolePermissions.gui_role_id
      AND t1.domain_id = GUI_RolePermissions.domain_id AND t1.permission_id = GUI_RolePermissions.permission_id)

It is not perfect – we’ll get back to that in the section Random Query Generator – but it works. Let’s compare it to Hibernate now. This is the JPQL (actually the same for both providers, as it’s produced by Querydsl):

delete from GuiRolePermission guiRolePermission
  where guiRolePermission.guiRole.description = ?1

This does not seem alarming – but the SQL is completely off:

delete from GUI_RolePermissions cross join GUI_Roles guirole1_ where description=?

This does not work on our currently used SQL Server, whatever dialect we choose. Why not go with exists instead? We have a helper method that takes an entity (a Querydsl base path) and its where condition and performs the delete. Now instead of:

deleteTable(QGuiRolePermission.guiRolePermission,
    QGuiRolePermission.guiRolePermission.guiRole.description.eq(TEST_DESCRIPTION));

We have to write this:

deleteTable(QGuiRolePermission.guiRolePermission,
    new JPASubQuery().from(QGuiRolePermission.guiRolePermission)
        .where(QGuiRolePermission.guiRolePermission.guiRole.description
            .eq(DataPreparator.TEST_DESCRIPTION))
        .exists());

Not terrible… but why? EDIT 2016-01-07: There is an issue filed for this, so far not fixed.

Another Hibernate twist

Things may get complicated when more relationships are involved. Take this simple JPQL for instance:

delete from Security security
  where security.issuer.priority = ?1

We want to remove all Securities whose issuer (a Client) has a specific priority value:

[diagram jpa-security-delete: Security → issuer (Client); Security → Domain via the Securities_Domains table]

There is another implicit join there, but one subquery with exists should cover it. The Security class contains a @ManyToMany relationship to the Domain class through the intermediate table Securities_Domains. That’s why we need two deletes here – and this is what EclipseLink generates (issuer is of class Client):

DELETE FROM Securities_Domains WHERE EXISTS(
  SELECT t1.id FROM "Clients" t0, Securities t1
    WHERE ((t0."priority" = ?) AND (t0."id" = t1."issuer_id"))
      AND t1.id = Securities_Domains.security_id);
DELETE FROM Securities WHERE EXISTS(
  SELECT t1.id FROM "Clients" t0, Securities t1
    WHERE ((t0."priority" = ?) AND (t0."id" = t1."issuer_id"))
      AND t1.id = Securities.id);

It works just fine. But here Hibernate really shows its muscles!

delete from Securities_Domains where (security_id) in (
   select id from Securities where priority=?)

Obviously, the first delete is missing the join to the Client entity in that subselect – and fails spectacularly. And we’re actually lucky we don’t have another column called priority on Securities as well. 🙂 That could hide the error for ages.

With or without id?

Love that song… I mean the U2 one, not Hibernate’s. When you see JPQL with an equality test on two entities, you assume it is performed on their ids. So if you spell those ids out explicitly, it should still work the same, right? Maybe that’s just another JPA myth after all. Consider this JPQL:

delete from MetaObjectSetup metaObjectSetup
  where not exists (select 1 from Permission permission
    where permission.metaObjectSetup = metaObjectSetup)

This produces a query that does not work properly – the dangling id is probably resolved against permission1_:

delete from META_Object_Setups where  not (exists (
  select 1 from Permissions permission1_
    where permission1_.meta_object_setup_id=id))

A version working with Hibernate must perform eq on the id fields explicitly (EclipseLink doesn’t mind this form either):

deleteTable(QMetaObjectSetup.metaObjectSetup,
    new JPASubQuery().from(QPermission.permission)
        .where(QPermission.permission.metaObjectSetup.id
            .eq(QMetaObjectSetup.metaObjectSetup.id))
        .notExists());

SQL result:

delete from META_Object_Setups where  not (exists (
  select 1 from Permissions permission1_
    where permission1_.meta_object_setup_id=META_Object_Setups.id))

Experiences like these were the last straw for us, and we reverted all the efforts to switch to Hibernate.

Happy ending?

We switched back to EclipseLink after this. I don’t remember running into so much resistance, like… ever. Maybe our JPQLs were too loose for Hibernate – joins were not explicit, aliases were missing, etc. But in the end it did not solve our problem.

It is a real shame that count and distinct cannot be combined in a way that makes the query.list and query.count operations consistent in libraries like Querydsl. It is also a shame that when something is not supported (as officially stated in the specification), it does not throw an exception but instead does something fishy – silently.

You can do the same in SQL by wrapping the select into another one with count – but JPQL does not support queries in the FROM clause. Pity. However, this is one of those cases where you can’t go wrong with a correlated subquery. You just have to remember that a subquery does not imply any of the equalities implied by JPA joins (it can’t, actually; there are cases where that would be an obstacle), so you have to spell them out yourself – see the last examples from the previous part (With or without id?).

This is all really crazy. Remotely, it reminds me of the horror stories about JSF and the bugs of its various implementations (always different ones). Sure, these things are really complicated – but then maybe that is exactly the trouble. Maybe it can be simpler. Maybe it’s wrong that I have to define @ManyToOne to be able to express joins. (Edit: No longer true for JPA 2.1, as covered in this post.) Any @XToOne has consequences often unseen even by experienced JPA users. Maybe some jOOQ or Querydsl over plain SQL is better than this stuff. I just don’t know…

Random Query Generator in EclipseLink

Let’s look at something else while we stick with JPA. Here we are back to EclipseLink, and there is one very easy thing I want to do in SQL:

DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Clients t0 WHERE (t0.priority = ?) AND t0.id = Contacts.client_id)

In plain words – I want to remove all of a Client’s Contacts when that Client has a specific priority. If you ask about the mapping… I hope nothing is screwed up here:

@Entity
@Table(name = "Clients")
public class Client {
...
    @OneToMany(mappedBy = "client", cascade = CascadeType.ALL)
    private List<ClientContact> contacts;
...
}

@Entity
@Table(name = "Contacts")
public class ClientContact {
...
    @Column(name = "client_id", insertable = false, updatable = false)
    private Integer clientId;

    @ManyToOne
    @JoinColumn(name = "client_id")
    private Client client;
...
}

The back reference to the client is mapped both ways, but one mapping is read-only. Both primary keys are plain Integers. No miracles here. And now the three ways we tried it, always with Querydsl, JPQL and SQL:

new JPADeleteClause(em, QClientContact.clientContact).where(
    new JPASubQuery().from(QClient.client)
        .where(QClient.client.priority.eq(DataPreparator.TEST_CLIENT_PRIORITY)
            .and(QClient.client.eq(QClientContact.clientContact.client)))
        .exists());

// JPQL - logically exactly what we want
delete from ClientContact clientContact
  where exists (select client from Client client
    where client.priority = ?1 and client = clientContact.client)

// SQL contains a double EXISTS and two unnecessary extra tables
DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Contacts t0 WHERE EXISTS (
    SELECT ? FROM Clients t2, Clients t1
      WHERE ((t2.priority = ?) AND (t2.id = t0.client_id)))  AND t0.id = Contacts.id)
bind => [1, 47]

OK, it works, right? Can we help it by mentioning the id equality explicitly (client.id.eq(…client.id))?

new JPADeleteClause(em, QClientContact.clientContact).where(
    new JPASubQuery().from(QClient.client)
        .where(QClient.client.priority.eq(DataPreparator.TEST_CLIENT_PRIORITY)
            .and(QClient.client.id.eq(QClientContact.clientContact.client.id)))
        .exists());

// JPQL looks promising again, I bet this must generate a proper query - or at least the same one
delete from ClientContact clientContact
  where exists (select client from Client client
    where client.priority = ?1 and client.id = clientContact.client.id)

// Three clients?!
DELETE FROM Contacts WHERE EXISTS(
  SELECT t0.id FROM Contacts t0 WHERE EXISTS (
    SELECT ? FROM Clients t3, Clients t2, Clients t1
      WHERE (((t2.priority = ?) AND (t2.id = t3.id)) AND (t3.id = t0.client_id)))
        AND t0.id = Contacts.id)
bind => [1, 47]

You’ve gotta be kidding me, right? What for?! I always believed that if id is the entity’s @Id, then the provider can do virtually the same with or without it being spelled out. I even dreamt about using the foreign key directly – but that’s way beyond the capabilities of current JPA providers, it seems.

OK, let’s try something lame. Something I’d write only in JPA, never in real SQL, of course. Something with a lot of implicit identity equality hidden between the lines:

new JPADeleteClause(em, QClientContact.clientContact)
  .where(QClientContact.clientContact.client.priority
    .eq(DataPreparator.TEST_CLIENT_PRIORITY));

// not like SQL at all, but pretty logical if you know the mapping
delete from ClientContact clientContact
  where clientContact.client.priority = ?1

/* Surprise! There is still one unnecessary Contacts in an inner join that could go through the FK,
 * but it is definitely the best result so far.
 * Mind you though - this refused to work with Hibernate, at least with the SQL Server dialect (any of them). */
DELETE FROM Contacts WHERE EXISTS(
  SELECT t1.id FROM Clients t0, Contacts t1
    WHERE ((t0.priority = ?) AND (t0.id = t1.client_id)) AND t1.id = Contacts.id)
bind => [47]

I hope this demonstrates the impotence of this field of human activity after more than a decade of effort. I’m not saying it all sucks – but after seeing this, I’m not far from it.

With JPA doing too many things you don’t want or expect (unless you’re an expert), and with its many limitations compared to SQL, I’d expect at least semi-good SQL. This is not even close.

Ramble On

So that’s another day with JPA. With proper queries (or the best approximation where it’s not critical) there is just one last thing we have to pay attention to… those IndirectLists that can’t be streamed. Just one last thing, I said? Ouch… of course, we still have to review our SQLs and watch for any bugs or gray zones.

Yeah, we’re staying with JPA on this project for a while. But the next one I’ll start with a metamodel directly over the DB. I hope I can’t suffer more that way. 🙂 And maybe I’ll be able to do ad-hoc joins without @XToY annotations, which always bring in stuff you don’t want. That’s actually one of my biggest gripes with JPA.

Many people avoid as many relation mappings as possible while they can still express their joins. Many people avoid @ManyToMany and map the association table explicitly, so they can reach all entities A for a B’s id (or a list of ids) with a single join. Otherwise EclipseLink stubbornly joins all three tables, as its understanding of PKs and FKs is obviously lacking.
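
A sketch of that explicit mapping – all names are hypothetical, assuming an association table Clients_Domains with two FK columns:

import java.io.Serializable;
import java.util.Objects;
import javax.persistence.*;

@Entity
@Table(name = "Clients_Domains")
@IdClass(ClientDomain.Pk.class)
public class ClientDomain {

    @Id @Column(name = "client_id")
    private Integer clientId;

    @Id @Column(name = "domain_id")
    private Integer domainId;

    public Integer getClientId() { return clientId; }
    public Integer getDomainId() { return domainId; }

    // composite PK class required by @IdClass
    public static class Pk implements Serializable {
        private Integer clientId;
        private Integer domainId;

        @Override public boolean equals(Object o) {
            return o instanceof Pk
                && Objects.equals(clientId, ((Pk) o).clientId)
                && Objects.equals(domainId, ((Pk) o).domainId);
        }
        @Override public int hashCode() {
            return Objects.hash(clientId, domainId);
        }
    }
}

A query like select cd.clientId from ClientDomain cd where cd.domainId in ?1 then touches exactly one table – no join to Clients or Domains is dragged in.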

Lessons learned? Not that many, actually. But we’re definitely not switching providers for at least another couple of months!

Live architecture with Java, Spring, JPA and OSIV

This post is about an architecture where live (attached) JPA objects are used in the presentation layer. You can expect the OSIV (Open Session In View) pattern to be mentioned, though I’ll focus more on the ways we made it work well enough for us – safely and without LIEs (LazyInitializationException). It is just my story and my experiences, no big discovery here. 🙂

I can’t tell if there is any official name for it, but we call it “live architecture” because live JPA entities are available in the presentation layer. While we use it mostly with Spring/Wicket, it is the same with any other presentation framework – and probably applies to Java EE without Spring too (if you use OSIV).

DTO vs Live architecture

In our company there are “DTO guys” and “live architecture guys”. We all know DTOs (Data Transfer Objects) and how to work with them, more or less. Their rise to fame came with the need for coarse-grained calls to remote EJBs, and they became a prominent “pattern” then. Even with local calls, people use them to strictly divide layers. I used them on some projects, then not on others, and then again with GWT/Seam applications (I never liked the idea of JPA entities being preprocessed for me and dragged all the way into the GWT application).

Every time I start talking about a “live architecture” that drags entity objects into the view, there are architects who just say “that is no architecture at all”. And I say “whatever…” I remember projects where we “broke” a clean architecture (e.g. “everything must go through this facade!”) and the result was less and cleaner code, easier to understand, with even better performance. Was it universal? Hell no, it wouldn’t scale in most cases – but in that particular case scaling was not (and after all these years still is not) necessary.

My recent story with the live architecture is based on a project where it was settled that it would be used instead of DTOs. You have to translate DTOs from business objects and back somehow. You can generate the translation, automate it with reflection – or do it manually. Either way it adds something that is not necessary in all cases. Our views were mostly based on JPA entities, and it seemed a shame to translate them to DTOs just for the sake of the transformation itself. I’m not saying DTOs are bad – we do use them for more complicated views, mostly for lists showing joined tables. You can of course build a database view and design an entity over it – and we do that too…

There is no fundamentalism in this – we use entities as much as we can. I strongly believe that in normal-scope projects people often overdo it with “clean architecture” and don’t care about “clean code” nearly as much. And I strongly believe that cleaner code itself matters much more than that cloud castle of an architecture (without underestimating architecture itself!). After all, our projects are quite simple multi-tier applications with a bit of clustering. No grid, no hi-perf, no America. So we use entities, because they are placed under the presentation layer (a good dependency direction) and they only carry data. And when this is not enough, we use DTOs too. Simple.

Business logic objects and dumb entities

You may have different rules for your live architecture (projects using OSIV) – and that is fine. Ours start with: don’t use entities for anything else. No business logic – maybe some simple computed properties, that is alright. You may call this an Anemic Domain Model – I don’t care. Logic lives in separate objects that use one or more entities. It is not exactly DCI, but it is not very far from it. For many other reasons (unrelated to the live architecture) I prefer business logic objects that perform a specific scenario – in the best case mapping 1-to-1 to a use case from the analysis document.

Let’s talk about this picture for a while:

The presentation layer can be anything – component (Wicket) or controller (Web MVC) driven. It calls the service layer (typically a Spring bean or an EJB), and this in turn uses that “cloud” of various business logic objects. Very often I prefer the create/use/throw-away pattern: in the constructor the object gets its context, and then it does something – preferably in one method call, though it may be a sequence too, although that is a more fragile approach. The important thing is that a business object can keep its state during the business logic execution – it is thread-safe if it is created locally for one service call (that’s why I don’t use singletons here). Sometimes state is not necessary, but in more complex cases it is. And I like fields much more than dragging lists of parameters between private methods.
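
A minimal sketch of such a create/use/throw-away business object – all names here (TransferFunds, Account, the balance property) are hypothetical:

import java.math.BigDecimal;
import javax.persistence.EntityManager;

// created per service call, used once, thrown away - keeping state in
// fields is safe because the instance is never shared between threads
public class TransferFunds {
    private final EntityManager em;
    private Account source;  // scenario state lives in fields,
    private Account target;  // not in long parameter lists

    public TransferFunds(EntityManager em) {
        this.em = em;  // the context comes in through the constructor
    }

    // the single entry point executing the whole scenario
    public void execute(Integer sourceId, Integer targetId, BigDecimal amount) {
        source = em.find(Account.class, sourceId);
        target = em.find(Account.class, targetId);
        source.setBalance(source.getBalance().subtract(amount));
        target.setBalance(target.getBalance().add(amount));
    }
}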

This business logic uses DAOs (or the EntityManager directly) to work with the DB – and of course works with entities in the process. Because entities are dumb (a DCI idea, but not only theirs), they are perfect DTOs (which are also dumb). Of course, there are some concerns about entities used as DTOs, and you can find many questions about this issue (not only in the Java world). Entities are POJOs – in theory – but you may drag some proxy object up into the presentation layer. There is a lot of magic in entities; you sometimes don’t know what they are (my class, or some modified class already?) – but under most circumstances you don’t have to care that much, really.

Best practices

Now let’s talk about our best practices. The presentation layer code knows entities, but it doesn’t know the ORM! This is probably the most important thing. Of course, a dependency on JPA is implied somehow. Of course, the client programmer has to know the data model and how to traverse the objects he wants to display. But he absolutely can’t use the EntityManager. Our first “live architecture” project didn’t have a clear separation of these roles, and some LIEs were fixed like: “you know, here in this page, before you call the service… put an evict on this object”. I wasn’t there when that project started, so I just went “what?!?!” And I forbade this on the next project I could affect.

The next rule is about communication rather than technology – the presentation programmer always has to know what he gets from a service call. Otherwise he risks that LIE again. But LIEs in the presentation are easy. They are easy to fix in the model, in the service/business code, or in the presentation code (which covers most of the cases). You always have to share some model between the business logic and the presentation (and the developers!) – and we share the data model itself. If you don’t plan to change your layers, this is perfectly acceptable. I’ve actually never seen a change of technology that would justify a different model introduced at the facade level. So why do it if you ain’t gonna need it? (Of course, you may need it – and as an architect you are there to say so.)

Getting data is easy (talking about live architecture problems only :-)). You may need separate methods for every view – especially if the selects are not generic enough. We have “filter beans” with a single superclass, and we use these beans with a few rather generic, even DAO-like, service methods (getSingleResult, getList, etc.). It works for us: filter beans are the common ground over which the client and server programmers communicate, and they are part of the service layer API. We can have a common FilterBean interface because we use our own custom filter framework behind it. But you can use filter beans without a common ancestor and have many service methods to obtain data. That is probably even cleaner.
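
A sketch of the filter-bean idea – everything here is hypothetical (our real filter framework is more involved), and each top-level type would live in its own source file:

import java.util.List;

// the common ancestor shared by client and server programmers
public interface FilterBean<T> {
    Class<T> resultType();
}

// one concrete filter per view/entity; null fields mean "no condition"
public class ClientFilter implements FilterBean<Client> {
    private Integer priority;

    public Integer getPriority() { return priority; }
    public void setPriority(Integer priority) { this.priority = priority; }

    @Override public Class<Client> resultType() { return Client.class; }
}

// the few generic, DAO-like service methods
public interface FilterService {
    <T> T getSingleResult(FilterBean<T> filter);
    <T> List<T> getList(FilterBean<T> filter);
}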

Transactions, saves, updates

Originally we used DAO-like save methods on the service layer too. We also didn’t have clear strategies for when objects were alive and when not as the presentation layer called the service layer. If one HTTP request contained a read and a write call, the entities were alive if the write used the result of the read. If there was just an update, they were not. “Objects may come in alive or not, let’s not assume they are alive,” was our first strategy, though I never felt good about the “or” in that sentence. Never use contradictions in your assumptions. With big help from our tests, we managed to clean this mess up.

Our tests were TestNG based; they were not unit tests – mostly we tested the service layer, with the test playing the role of the presentation layer. It was funny how often a test passed and the user test (using a browser) failed – but also vice versa! Sometimes the test didn’t prepare the same environment – and we started to realize that the service layer must assume less and be more strict. The biggest problem was that the presentation layer could change an entity A that had been read in the request (hence alive) and then call a service saving an entity B. The service layer had no chance to know that A would get saved in the same transaction too. This led to one very simple idea – we always clear the session before calling transactional service methods. I forgot to say that we demarcate transactions on the service layer, so there can be more transactions in one HTTP request/persistence session.

Stepping back a bit – the client programmer knows that when he calls a service, his objects are alive. He can call multiple reads – and he knows that everything is still alive and he can base the next read on an attribute that is loaded lazily. In our case there is only one write/transaction in one HTTP request – and it’s mostly the last call as well. If I wanted to make our policies even more precise, I could say “always clear the session – for every service call”. That would mean less comfort for the client programmer. Or you can go for “dead” entities instead of live ones (see Other possibilities below).

Now the business programmer knows that any object entering a transactional service is detached, and he can choose what to do with it. Do you just need to save the changes? Merge it (or call a JPQL update, or whatever). Do you need to compare it to its original state? Read the object by its id and do what you need. Do you want to traverse its attributes? Better reload it first to make it attached again. We enforce all this with a custom aspect hooked on the existing Spring @Transactional annotation.
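
The aspect itself can be as simple as this sketch – assuming Spring AOP with AspectJ annotations, and leaving aside the ordering details against the transaction interceptor:

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class DetachBeforeTransactionAspect {

    @PersistenceContext
    private EntityManager em;

    // fires before any method carrying Spring's @Transactional annotation
    @Before("@annotation(org.springframework.transaction.annotation.Transactional)")
    public void clearPersistenceContext() {
        em.clear();  // everything passed into the service is now detached
    }
}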

This assumption would be very useful for read/list methods too. As it is, the developer never knows whether he has to reload or not. But read methods are not so complex, and reloading the parameter entity should never harm either. Also, read/list methods are not transactional, so whatever he does, he can’t mess up the persisted data. So this is our compromise between the client programmer using live objects and the service layer being secured enough. There are far fewer LIEs in our back-end code (which are harder to catch than those in the presentation layer) – actually I haven’t seen one for a long time – and there is no chance of tampering with the data accidentally.

As a side note: many of our problems were also caused by our presentation architecture – we load data, display them, then forget the content to keep the page/session small, and we remember just the IDs of the objects. When an edit action comes, we reload the object from the service by its ID, modify it and then call the transactional write service method. To make this more convenient we have a custom ReloadableModel class for our Wicket pages, so before the model (entity object) is updated, it is always reloaded from the service too (not a big performance hit; it often comes from the second-level cache anyway). This may not be the luckiest solution, but it was one of those we had to stick with for the time being. You may or may not run into these kinds of problems. In any case, making your contracts and policies more strict and clean is always a good thing.
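
A sketch of that model in the spirit of our ReloadableModel, built on Wicket’s LoadableDetachableModel – ClientService and its getById are hypothetical, and in real Wicket code the service reference needs care (injection/serialization):

import org.apache.wicket.model.LoadableDetachableModel;

// keeps only the ID in the page; the entity is reloaded on each request
public class ReloadableModel extends LoadableDetachableModel<Client> {
    private final Integer id;
    private final ClientService service;

    public ReloadableModel(Client client, ClientService service) {
        this.id = client.getId();
        this.service = service;
    }

    @Override
    protected Client load() {
        return service.getById(id);  // often served from the 2nd-level cache
    }
}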

Other possibilities

There are more options than just Live vs DTO. You can also use entities, yet always close the session when the service call ends. This gives you the same model and less easy presentation changes, but it definitely is cleaner from the service layer point of view. You can make stricter contracts; performance is handled all down there, not ruined by lazy loads in the presentation layer, etc. I know this – we use it on other projects too. But I also know that people use OSIV a lot, and that is why I wanted to wrap up our experiences with it. You can come up with other policies too – for instance, one read or write per request and nothing more. Do it all in one proper service call; don’t fire many selects for every single combo-box model, for instance. I actually agree with these approaches. But sometimes we don’t have the luxury of choice. 🙂

In any case, try to do your best to clean up the contracts as much as possible, avoid contradictory ORs in your assumptions and – I didn’t focus on this point much in this post – test your service/business layer. A contract or policy is one thing, but you have to enforce it; otherwise contracts are not contracts, just promises. Because that is your safety net, not only from the architectural standpoint but also from the functional one. But that is a completely different story.

JPA, PostgreSQL and bytea vs. oid type

I started to work on a private project using JBoss Seam, Facelets and RichFaces (the proven stack from Seam’s Booking demo :-)) with PostgreSQL 8.3 as the backing database – the database I have preferred for years, for whatever reason. Hibernate is used as the JPA provider (it is packaged with Seam anyway), together with the most recent JDBC 4 driver for PostgreSQL.

I added a column:

picture bytea;

Somehow – after reading 8.4 Binary Data Types – I thought that bytea was the right guy for the job. Of course I had an entity with the proper annotation and field:

    private byte[] picture;
    @Lob
    @Basic(fetch = FetchType.LAZY)
    public byte[] getPicture() {
        return picture;
    }

    public void setPicture(byte[] picture) {
        this.picture = picture;
    }

What was my surprise when the following exception occurred:

java.sql.SQLException: ERROR: column "picture"
  is of type bytea but expression is of type oid

Now – when I know the solution – this message is pretty clear. It would have been even back then, had the types mentioned in it made any sense to me. While I understood that “he” somehow disliked bytea, I couldn’t understand what the hell “oid” (as in “object identifier”) had to do with it.

I started my investigation with Google in one hand and PostgreSQL in the other. While 8.16 Object Identifier Types was not very helpful – as in “what does an ‘unsigned four-byte integer’ have to do with my byte array?!” – some of the pages I googled convinced me that oid was indeed the type I should use – mostly one from the postgresql.com domain. The truth is I didn’t understand why and how that oid type manages to store my byte array. But I tried it: it stores it, it reads it, no exception is thrown.
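
The fix was then a one-word change in the column definition – the entity mapping stayed exactly as shown above:

picture oid;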

Now I’ve finished the second part of my investigation, and the results are:

  • There is a special Large Object facility in PostgreSQL, based on the oid type, which references the actual large object stored in a separate table behind the scenes.
  • The JDBC documentation also mentions that you can use oid instead of bytea.
  • In that same chapter you can learn that bytea has been supported since version 7.2 of the JDBC driver – which is not that long ago.
  • I don’t know if JPA/Hibernate can use bytea, but – honestly – I don’t care as long as this works fine. And it does.

So if you use PostgreSQL along with JPA and the bytea type is throwing an exception, try oid. I’m not sure how well JDBC supports bytea, I don’t know why JPA wants to use oid, nor whether other JPA providers (TopLink?) can use bytea. It’s not straightforward to arrive at oid as a type for binary data – you can learn this only in the JDBC documentation, not in the mainstream PostgreSQL manual (I don’t count that chapter 31, because you’ll likely never read it while creating your tables). If you google this along with “JPA” or “@Lob”, your results might not lead you to the proper articles quickly enough… so hopefully this post might help a bit.

If you have some facts to add, feel free to comment.

Edit 8-Feb-2013 – warning: My colleague informed me that after removing an oid row from a table, the large object itself stays in the DB if you use Hibernate (I can’t tell anything about other providers). Reportedly, there is a hot ongoing debate between the Hibernate and PostgreSQL guys. Both may be right that it’s not their job to do the cleanup: they don’t have enough information to delete the object, because a large object can be shared and referenced from more tables, and in fact only you, the programmer, can know whether the object can go away or not. Check the PostgreSQL documentation to find out how to remove the object, and also check whether vacuumlo doesn’t solve your problem.