Saturday, July 21, 2007

What's Wrong With Java Part 2

OR Mapping With Hibernate

After the model, let's look at the implementation. The first candidate is the most successful OR mapper in the Java world: Hibernate.

Hibernate brings all the features we need: it can lazy-load ordered and unordered data sets from the DB, map all kinds of weird relations, and it lets us write the model in Java in a very comfortable way. We just write plain Java (POJOs, actually) and Hibernate does some magic behind the scenes that connects the objects to the database. What could be simpler?

Well, an OO language which is more dynamic, for example. Let's start with a simple task: Create a standalone keyword and put that into the DB. This is simple enough:

Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);

Saving Keyword in database

(Please ignore the session object for now.)

That was easy, wasn't it? If you look at the log, you'll see that Hibernate sent an INSERT statement to the DB. Cool. So ... how do we use this new object? The first and most natural idea would be to use the object we just saved:

Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

Saving Knowledge with a keyword in the database

Unfortunately, this doesn't work. It works in your test, but in the final application the Keyword is created in one transaction and the Knowledge in a second one. So Hibernate will (rightfully) complain that you can't use that keyword anymore because someone else might have changed it in the meantime.

Now what? After the transaction in which you created an object has been closed, you have to ask Hibernate for a copy of it before you can use it anywhere else:

Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);
kw = dao.loadById (kw.getId ());

Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

How to save Knowledge with a keyword in the database with transactions

Why do we have to load an object right after saving it? Well ... because of Java. Java has very strict rules about what you can do with (or to) an object instance after it has been created. One of them is that you can't replace its methods. So what, you might think. In our case, things aren't that simple. In our model, the name of a Knowledge instance is a Keyword. When you look at the code, you'll see the standard setter. But when you run it, you'll see that something loads the item from the KEYWORD table. What is going on?

public void setName (Keyword name) {
    this.name = name;
}

setName() method

Behind the scenes, Hibernate replaces this method by using a proxy object, so it can notice when you change the model (by setting a new name). The simplest solution would be for session.save() to replace setName() with a method that calls the original setter and notifies Hibernate about the modification. In Python, that's three lines of code. Unfortunately, this is impossible in Java.

So to get these proxy objects, you must show an object to Hibernate, let it make a copy (by calling save()) and then ask for the new copy, which is in fact a wrapper object that behaves just like your original object but also knows when to send commands to the database. Simple, eh?
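To see why Java forces a second object here, consider the only built-in way to intercept method calls at runtime, java.lang.reflect.Proxy. The sketch below is a minimal illustration of the language constraint, not of Hibernate's internals: Hibernate's own proxies are generated subclasses (built with CGLIB or Javassist in the 3.x releases). Named, PlainNamed and ChangeTracker are hypothetical names, and because JDK proxies can only implement interfaces, the model is reduced to a one-property interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model interface: JDK dynamic proxies can only implement interfaces.
interface Named {
    String getName();
    void setName(String name);
}

// The plain object the application creates with "new".
class PlainNamed implements Named {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Records every setter call, then forwards the call to the wrapped object.
class ChangeTracker implements InvocationHandler {
    private final Object target;
    private final List<String> changes;

    private ChangeTracker(Object target, List<String> changes) {
        this.target = target;
        this.changes = changes;
    }

    // Returns a NEW object; the original instance itself cannot be altered.
    static Named track(Named original, List<String> changes) {
        return (Named) Proxy.newProxyInstance(
                Named.class.getClassLoader(),
                new Class<?>[] { Named.class },
                new ChangeTracker(original, changes));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().startsWith("set")) {
            // A mapper would mark the entity as dirty here.
            changes.add(method.getName() + "(" + args[0] + ")");
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();   // rethrow what the real method threw
        }
    }
}
```

Setting the name through the wrapper is recorded; setting it through the original reference silently bypasses the tracker, which is how a mix of plain and wrapped objects goes wrong.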

It makes me wonder why session.save() doesn't simply return the new object if it is safer to use it from now on, especially when you have a model which is modified over several transactions. In that case, you can easily end up with a mix of native and proxy objects, which will cause no end of headaches.
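What the paragraph wishes for can be sketched as a small helper that saves an object and hands back the reloaded copy. Store, CopyingStore and Persist.saveAndReload are hypothetical names standing in for a Hibernate Session plus a DAO like the dao.loadById() used above; the toy store copies on every load only to mimic the behaviour the post describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical persistence facade standing in for a Hibernate Session plus a DAO.
interface Store<T> {
    long save(T entity);     // persists and returns the generated id
    T loadById(long id);     // returns the instance the caller should use from now on
}

final class Persist {
    private Persist() {}

    // Save, then reload and return the copy; callers must drop their old reference.
    static <T> T saveAndReload(Store<T> store, T entity) {
        long id = store.save(entity);
        return store.loadById(id);
    }
}

// Toy in-memory store that hands out a fresh copy on every load,
// mimicking the behaviour described in the post.
class CopyingStore<T> implements Store<T> {
    private final Map<Long, T> rows = new HashMap<>();
    private final Function<T, T> copier;
    private long nextId = 1;

    CopyingStore(Function<T, T> copier) {
        this.copier = copier;
    }

    @Override
    public long save(T entity) {
        long id = nextId++;
        rows.put(id, copier.apply(entity));
        return id;
    }

    @Override
    public T loadById(long id) {
        T row = rows.get(id);
        if (row == null) throw new IllegalArgumentException("no row with id " + id);
        return copier.apply(row);
    }
}
```

Callers then write `kw = Persist.saveAndReload(store, kw);` and forget the original reference. For comparison, Hibernate's real Session.merge() also returns a persistent instance and leaves its argument unattached to the session.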

Anyway. This approach has a few drawbacks:

  • If someone else creates the object, calls your code and then continues to work with the original object (because people usually don't expect a method call to replace their objects with copies), you're in deep trouble. Usually, you can't change that other code. You lose. Go away.
  • The proxy object is very similar to, but not the same as, the original object. The biggest difference is that it has a different class. This means that in equals(), you can't use this.getClass() == other.getClass(). Instead, you have to use instanceof (the copy is derived from the original class). This breaks the contract of equals(), which says that it must be symmetric.
  • If you have large, complex objects, copying them is expensive.
  • After a while, you will start to write factory methods that create the objects for you. The code is always the same: create a simple object, save it, load it again and then return the copy. Apart from the cut&paste, this means that you must not call new for some of your objects. Again, this breaks habits, which leads to bugs.
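The getClass() problem from the second bullet can be reproduced without Hibernate. In this sketch, KeywordProxy is a hypothetical stand-in for a generated proxy class (a subclass that adds no state), and this Keyword is a simplified assumption, not the model from the previous installment.

```java
import java.util.Objects;

// Hypothetical, simplified entity with two equals() variants for comparison.
class Keyword {
    private final String name;

    Keyword(String name) { this.name = name; }

    String getName() { return name; }

    // Strict variant: fails as soon as one side is a generated subclass.
    boolean equalsByClass(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return Objects.equals(getName(), ((Keyword) other).getName());
    }

    // instanceof variant: accepts subclasses such as proxies.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Keyword)) return false;
        // Uses the getter rather than the field of the other object.
        return Objects.equals(getName(), ((Keyword) other).getName());
    }

    @Override
    public int hashCode() { return Objects.hashCode(getName()); }
}

// Stand-in for a generated proxy class: a subclass that adds no state.
class KeywordProxy extends Keyword {
    KeywordProxy(String name) { super(name); }
}
```

The instanceof version stays symmetric only as long as no subclass overrides equals(); once one does, the symmetry problem described in the bullet appears. Calling getName() instead of reading the other object's field is a common precaution with lazy proxies, whose fields may not be initialized.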

All in all, the whole approach is clumsy. Really, it's not Hibernate's fault, but the code is still ugly and hard to maintain (because it breaks the implicit rules we have become so used to). In Python, you just create the object and use it. The dynamic nature of Python allows the OR mapper to replace or wrap all the methods as it needs to, and you never notice it. The code is clean, easy to understand and compact.

Another problem is the XML config files. Besides all the issues with Java XML parsers, it is always problematic to store the same information in two places. If you ever change your Java model, you had better not forget to update the XML or you will get strange errors. You can't refactor the model classes anymore because there is code outside the scope of your refactoring tool. And let's not forget code completion, which works pretty well for Java. Not so for XML files. If you're lucky, someone has written code completion for your type of XML config. Still, there will be problems: whenever there is a new version, your code completion will lag behind.

It's like regexp: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

Fortunately, Sun solved this problem with JPA (or at least eased the pain). JPA lets you use annotations to store the mapping configuration in the class file itself. Apart from a few small problems (like setting everything up), this works pretty well. Code completion works perfectly because any IDE which has code completion will be able to use the latest and greatest version of your helper JARs without any customization. Just drop the new JAR in your classpath and you're ready to go. Swell.
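The difference from XML mapping is that the metadata becomes part of the class and can be read back through reflection. The real JPA annotations live in javax.persistence (for example @Table and @Column); to keep this sketch runnable with the JDK alone, it declares hypothetical look-alikes with the same names and a reduced set of attributes, plus a hypothetical MappingReader that falls back to the upper-cased Java name when no annotation is present.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Locale;

// Hypothetical look-alikes of javax.persistence.Table and javax.persistence.Column,
// declared here only so the sketch compiles without the JPA jar.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Table {
    String name();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    String name();
}

// The mapping sits in the class itself, so it stays attached when you refactor.
@Table(name = "KEYWORD")
class Keyword {
    @Column(name = "NAME")
    private String name;

    private String type;   // no annotation: the reader falls back to a default
}

// Hypothetical helper that reads the mapping back at runtime.
final class MappingReader {
    private MappingReader() {}

    static String tableOf(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        return table != null ? table.name() : type.getSimpleName().toUpperCase(Locale.ROOT);
    }

    static String columnOf(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            Column column = field.getAnnotation(Column.class);
            return column != null ? column.name() : field.getName().toUpperCase(Locale.ROOT);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName + " in " + type, e);
        }
    }
}
```

Because the annotations are ordinary Java, the compiler checks them and the IDE completes their attributes from whatever JAR is on the classpath, which is the point the paragraph makes.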

But there are more problems:

  • You must create a session object "somewhere" and hand it around. If you're writing a webapp, this had better be thread-safe. Not to mention that you must be able to override it for tests.
  • The session object must track whether you have already started a transaction and nest transactions properly; otherwise you will have to duplicate code because you can't call existing methods that use transactions.
  • Spring and AOP will help a lot, but they also add another layer of complexity: you'll have to learn another API, another set of rules for how to organize your code, etc.
  • JAR file size. My code is 246KB. The JARs it depends on take ... 6'096KB, almost 25 times the size of my code. And I'm not even using Spring.
  • Even with JPA, Hibernate is not simple to use because Java itself is not simple to use.
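The first two bullets, a per-thread session and properly nested transactions, can be sketched with a ThreadLocal depth counter: only the outermost call opens and finishes the transaction, and inner calls simply join it. UnitOfWork and its LOG are hypothetical; a real version would hold a Hibernate Session and Transaction instead of strings. In Spring, the default @Transactional propagation, REQUIRED, similarly joins an existing transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical unit of work; a real one would wrap a Hibernate Session and Transaction.
final class UnitOfWork {
    // One nesting counter per thread, so concurrent web requests never share it.
    private static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);

    // Records what would be sent to the database; synchronized because threads share it.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    private UnitOfWork() {}

    // Runs work in a transaction, joining the caller's transaction if one is already open.
    static void inTransaction(Runnable work) {
        int[] depth = DEPTH.get();
        if (depth[0]++ == 0) {
            LOG.add("BEGIN");   // only the outermost call opens the transaction
        }
        boolean ok = false;
        try {
            work.run();
            ok = true;
        } finally {
            if (--depth[0] == 0) {
                // Inner calls never commit or roll back on their own.
                LOG.add(ok ? "COMMIT" : "ROLLBACK");
                DEPTH.remove();   // avoid leaking state into pooled threads
            }
        }
    }
}
```

A production version would also mark the transaction rollback-only when an inner call fails, even if the caller catches the exception; this sketch only rolls back when the failure reaches the outermost call.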

In the end, the model was 5'400 LoC. I added a small UI to it using SWT/JFace, which added 2'400 LoC.

If you look at the model in the previous installment, then the question is: Why do I need 5'000 LoC to write a program which implements an OR mapper for a model which has only three classes and 26 lines of code?

Granted, test cases and helper code take their toll. I could accept that this code needs four or five times the size of the model itself. Still, we have a gap.

The answer is that there are no defaults, or only bad ones. For our simple case, Hibernate could guess everything. Java could generate all the setters and getters, equals() and hashCode(). It's no black magic to figure out that Relation has a reference to Knowledge, so there needs to be a database table which stores this information. Sadly, defaults in Java are always "safe" rather than "clever". This is the main difference from newer languages: they try to guess most of the stuff, and then you fix the few exceptions that you always have. With Java, all the exceptions are handled, but you have to do the everyday stuff yourself.
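To make the "clever defaults" argument concrete, here is a sketch of the guessing the paragraph describes: derive one table per class and turn a field whose type is another model class into a foreign-key column. The three classes are hypothetical stand-ins named after the post's model, not the author's code, and the naming rules (upper-case names, an ID column, FIELD_ID for references, VARCHAR(255) for everything else) are assumptions made for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical three-class model, loosely named after the post; not the author's classes.
class Keyword { String name; }
class Knowledge { Keyword name; }
class Relation { Knowledge from; Knowledge to; }

final class ConventionMapper {
    private ConventionMapper() {}

    // Guesses a CREATE TABLE statement: an ID column, then one column per instance field.
    static String ddl(Class<?> type, Class<?>... model) {
        String columns = Arrays.stream(type.getDeclaredFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
                .sorted(Comparator.comparing(Field::getName))   // declaration order is not guaranteed
                .map(f -> column(f, model))
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + tableName(type) + " (ID BIGINT PRIMARY KEY"
                + (columns.isEmpty() ? "" : ", " + columns) + ")";
    }

    // A field whose type is another model class becomes a foreign key "<FIELD>_ID".
    private static String column(Field f, Class<?>[] model) {
        String name = f.getName().toUpperCase(Locale.ROOT);
        for (Class<?> m : model) {
            if (f.getType() == m) {
                return name + "_ID BIGINT REFERENCES " + tableName(m) + "(ID)";
            }
        }
        return name + " VARCHAR(255)";   // everything else: assume a string
    }

    private static String tableName(Class<?> type) {
        return type.getSimpleName().toUpperCase(Locale.ROOT);
    }
}
```

Calling `ConventionMapper.ddl(Relation.class, Keyword.class, Knowledge.class, Relation.class)` produces a RELATION table with one foreign-key column per Knowledge reference; anything the guess gets wrong would then be the "few exceptions" you configure by hand.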

The whole experience was frustrating, especially since I'm a seasoned Java developer. It took me almost two weeks to write the code for this small model, mostly because of a bug in Hibernate 3.1 and because I couldn't get my head around the existing documentation. Also, parent-child relations were poorly documented in the first Hibernate book. The second book explains this much better.

Conclusion: Use it if you must. Today, there are better ways.

Next stop: TurboGears, a Python web framework using SQLObject.

5 comments:

Unknown said...

I suggest you rethink this entry. You seem to have deeply misunderstood how Hibernate works. Hibernate uses proxies but not at all how and for the reason you are describing.

5000 LoC to CRUD + execute a few queries for 3 entities is way too much; you can probably divide the LoC by 5 to 10 at least.

Oh and there is no need to read the object again to start using it, that would be awful :)

Aaron Digulla said...

Hibernate uses proxies but not at all how and for the reason you are describing.

What is Hibernate using proxies for if not to replace collections with its own implementations so it can watch model changes and allow lazy loading?

Oh and there is no need to read the object again[...]

I haven't found a way which doesn't throw exceptions later (duplicate key errors, transient object errors, cache sync problems, etc).

Please note that my model consists mostly of relations between objects instead of simple field mappings.

5000 LoC to CRUD + execute a few queries for 3 entities is way too much[...]

I agree, but even with all my experience, that's what I ended up with. Most of that code is boilerplate, cut&paste (because Java doesn't have macros) and test cases to make sure everything works. Maybe I should post the project so people can see for themselves where all the space went.

Unknown said...

Hibernate does not proxy collections; it just has its own collection implementations. Proxies are used for lazy ManyToOne associations, but they're not useful at object creation.

If you have such issues when manipulating objects managed by Hibernate, it's probably because you misused the Session (especially when to start it and when to close it).

I don't write boilerplate code, nor do I use copy/paste (OK, sometimes when I'm lazy ;) ) and most of my code is written in Java. Come on, don't blame macros ;)

Anonymous said...

Why not just re-attach the same object if it was created in a different session?

Aaron Digulla said...

Because it's not just "one object". It's the whole model. When I'm not doing webapps, I want to keep my whole model in memory all the time (or at least all the parts the user has seen/worked on). I certainly don't want to load everything from DB every time the user clicks on a node to navigate the model.

Which means that eventually, I'll end up with a mix of persisted and non-persisted objects unless I make sure that I never ever add an object to my model which doesn't come out of Hibernate.

Hence the need for a factory method which passes every object through Hibernate before anyone can attach it to the model.