When it comes to testing, databases are kind of a sore spot. People like to think that "you can't test when you need a database" or "it's too complicated" or "it's not worth it." I'd like to give you some ideas what you can do when you need to test code that depends on a database. This list is sorted in the order in which I try to tackle the problem:
- Use POJOs to store the data from the database in the real code and for the tests, create some dummy objects with test data and use them.
- Make the database layer a plug-in of your application and replace it with a mockup for testing that doesn't need the database and which returns test objects instead.
- Instead of connecting to the real database, get HSQLDB or Derby and use an embedded or at least local database. I prefer HSQLDB because it's smaller and starts faster (and tests should always be fast) but Derby has more features.
- Create a second instance of the production database system on a different machine, preferably your own computer.
- Create another instance of the real database with test data on the same machine as the real database.
- Use database schemas to create a logical database in the real database, for example if all tables are in the schema APP, create APP_TEST and in your code, add a way to replace the schema name in the SQL statements. If you wrote the DB layer yourself, use a system property which isn't set in production. If you're using Hibernate, walk the mapping objects which are created and replace the table names after loading the production configuration. Field.setAccessible(true) is your friend.
If you can't decide, here are a few hints:
Creating two databases using schemas in the same instance can get you into serious trouble without you noticing. For example, the tests should be able to rebuild the test database from scratch at the press of a button so you can be sure in which state the database really is. If you make a mistake with the schema name during that setup, you'll destroy the real database. You might not notice you did, because the flawed statement is usually hidden under a few hundred others.
Installing a second instance on a different machine might trigger license fees or your DB admin might not like it for some reason. Also, a test database should be very flexible because you'll need to be able to drop and recreate it a dozen times per hour if you need to. Your DB admins might not like to give you the necessary rights to do that. Lastly, this means only one developer can run all the tests at any given point in time because you're all going against the same database. This is bad, really bad. More often than not, you'll have spurious errors because of that.
If you can legally get a copy of the real database on your own machine, that's cool ... until you see the memory, CPU and hard disk requirements plus a DB admin will probably hog your machine for a day or two to install it. Having to run two applications which need 1GB of RAM (your IDE and the DB) with a machine that has only 1GB of RAM isn't going to fun.
For many cases, using HSQLDB or Derby is a good compromise between all forces that pull at you. While that will make your tests slow, they will often run much faster than against the real DB. You can install these as many times you like without any license issues, fees or DB admins bothering you. They don't take much memory or hard disk space and they are under your total control.
Only, they are not the real DB. There might be slight differences, hidden performance issues and other stuff that you won't notice. Don't bother about that, though. If you can test your application, you'll find that you'll be able to fix any problems that come up when you run against the real database in little time. If you can't test your application, thought, well, you're doomed.
I strongly recommend to be able to setup the database from scratch automatically. With Derby, you can create a template database and clone that on the first connection. With HSQLDB, loading data is so fast that you can afford to rebuild it with INSERT statements every time you run the tests.
Still, test as much code as possible without a database connection. For one, any test without a DB will run 100-1000 times faster. Secondly, you're adding a couple more points of failure to your test which are really outside the scope of your test. Remember, every test in your suite should test exactly one thing. But if you can't get rid of the connection, you're testing the feature you want plus the DB layer plus the DB connection plus the DB itself. Plus you'll have the overhead of setup, etc. It will be hard to run a single test from your suite.
At the end of the day, testing needs to be fun. If you feel that the tests are the biggest obstacle in being productive, you wouldn't be the good developer you are if you didn't get rid of them.
One last thing: Do not load as much data as possible! It is a common mistake to think that your tests will be "better" if you have "as much data as possible". Instead load as little data as possible to make the tests work. When you find a bug, add as little data as possible to create a test for this bug. Otherwise, you'll hog your database with useless junk that a) costs time, b) no one can tell apart from the useful stuff and c) it will give you a false feeling of safety that isn't there.
If you don't know which data is useful and which isn't, then you don't know. Loading of huge amounts of junk into your database won't change that. In order to learn, you must start with what you know and work from there. Simply copying the whole production system will only slow you down and it will overwrite the carefully designed test cases you inserted yesterday.