Sunday, August 31, 2008

The Space Between Two Characters

If you're claustrophobic, you're afraid of confined spaces. If you're a software developer, you can be afraid of non-existing space.

When it comes to editing text, we usually don't think about the space between the characters. There simply isn't any. When you write a text editor, things start to look different. Suddenly, you have a caret or cursor which goes between the characters and that space between two characters can suddenly become uncomfortably tight.

Fire up your favorite text editor, Word, Writer, whatever. It has to support character formatting, though. Now enter this:

Hello, world!

If in doubt, the bold text ends before the comma and the italic part starts with the "w" and ends with "!" including both.

Now move to the "e" and type "x". What do you get?

Hxello, world!

Piece of cake.

Now move to the "H" and type "x". What do you get? Is the new x bold or not? Do you get "xHxello" with a bold first x or with a plain one? How about typing "x" after the "o" of our abused "Hello"? Is that new x bold or not? If it is, what is the simplest way to make it non-bold? Do you have to delete the comma, do you have to go through menus or toolbars, or is there a simple, consistent way to add a character inside and outside of a formatted range of text?

Let's go one step further. Add a character after the "!". Is it italic? If not, you're lucky. If it is ... what's the simplest way to get rid of the italic? If you press Return now, will the italic leak to the next line? If not, how can you make it leak? If that italic is the last thing in your text, can you add non-italic text beyond it without fumbling with the formatting options?

There is no space between two characters and when you write a text editor, that non-existing space is biting you. Which is actually the problem: There is no consistent way to move in and out of a formatted range of characters.

The naive attempt would be to say "depending on the side you came from, you're inside or outside." So, if we have this (| is the cursor or caret): "Hello |world" and you type something, the question is: How did the caret end up there? Did it come from "w|o" and move one position to the left? Or from "o| " and move one position to the right?
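To make the naive rule concrete, here is a minimal Java sketch. The class and method names (DirectionalCaret, typesInside) are invented for this illustration, and it assumes a single, non-empty format range given as the half-open interval [start, end).

```java
// Hypothetical sketch: the caret remembers the direction of its last move
// and uses it to decide which side of a range boundary it belongs to.
final class DirectionalCaret {
    enum Move { LEFT, RIGHT }

    private int offset;                  // caret sits before the character at this offset
    private Move lastMove = Move.RIGHT;  // arbitrary default for a caret that never moved

    DirectionalCaret(int offset) {
        this.offset = offset;
    }

    void moveLeft() {
        if (offset > 0) {
            offset--;
            lastMove = Move.LEFT;
        }
    }

    void moveRight(int textLength) {
        if (offset < textLength) {
            offset++;
            lastMove = Move.RIGHT;
        }
    }

    int offset() {
        return offset;
    }

    /** Would a character typed now join the formatted range [start, end)? */
    boolean typesInside(int start, int end) {
        if (offset > start && offset < end) return true;    // strictly inside
        if (offset < start || offset > end) return false;   // strictly outside
        if (offset == start) return lastMove == Move.LEFT;  // stepped back out of the range
        return lastMove == Move.RIGHT;                      // offset == end: walked through the range
    }
}
```

For "Hello world" with "world" as the range [6, 11), a caret that moves left from offset 7 lands on 6 and counts as inside, while a caret that moves right from offset 5 lands on the same 6 and counts as outside. A caret created at offset 0 has no history at all, so the default direction decides for it.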

That works somewhat but it fails at the beginning and the end of the text, plus you're in trouble when deleting text. What should happen after the last character of "Hello" has been deleted? Should that also delete the character range or should there be an empty, invisible bold range left so that when you type something now, the bold appears again? If you keep the empty invisible range, when do you drop it? Do you keep it as long as the user stays "in" it? Or until the document is saved? Loaded again from disk?

It's a mess and there is a reason why neither Word nor OpenOffice get it right: You can't. There is information in the head of the user (what she wants) but no way for her to tell the computer. Duh.

That is, unless you start to give the user a visual cue about what is going on. The problem we have is that there is no simple, obvious way for the user to say "I want ..." because there is no space on the screen reserved for this. We barely manage to squeeze a caret between the characters. There is just not enough room.

Well, there could be. A simple solution might be to add a little hint to the cursor to show which way it is leaning right now. Right. How about "A|B"? Here, you have three options. Add bold, italic and normal.

In HTML, this is simple. I'm editing this text in Firefox using the standard text area. What looks fancy to you looks like this to me: "<b>A</b>|<i>B</i>"

And this is the solution: I need to add a visual cue for the start and end of the format ranges. Maybe a simple U-shape which underlines the text for which the character format applies. Or an image (> and < in this example): ">A<|>B<". And suddenly, it's completely obvious on which side of the range start and end you are and what you want. You can delete the text in the range without losing it or you can delete both and you can move in and out of the range at will.
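Here is a rough Java sketch of what "markers you can stand between" could look like in a model. Everything in it is an assumption made for illustration: the class name MarkedText, the HTML-like marker strings and the rule that every token longer than one character is a marker. It only shows that the caret position between a closing and an opening marker becomes a place of its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: range boundaries are tokens in the text,
// so the caret can sit on either side of them.
final class MarkedText {
    // Single-character strings are text; longer strings such as "<b>" or "</b>" are markers.
    private final List<String> tokens = new ArrayList<>();

    MarkedText(String... initial) {
        for (String t : initial) {
            tokens.add(t);
        }
    }

    /** Formats that apply to a character typed at this caret index, outermost first. */
    List<String> formatsAt(int caret) {
        Deque<String> open = new ArrayDeque<>();
        for (int i = 0; i < caret; i++) {
            String t = tokens.get(i);
            if (t.length() == 1) continue;               // plain character
            if (t.startsWith("</")) {
                open.pollLast();                         // closes the innermost range
            } else {
                open.addLast(t.substring(1, t.length() - 1));
            }
        }
        return new ArrayList<>(open);
    }

    /** Inserts a character at the caret; the markers around it stay where they are. */
    void type(int caret, char c) {
        tokens.add(caret, String.valueOf(c));
    }

    @Override
    public String toString() {
        return String.join("", tokens);
    }
}
```

With the tokens "<b>", "A", "</b>", "<i>", "B", "</i>", caret index 2 is inside the bold range, index 3 is between the two ranges (no format) and index 4 is inside the italic range. Typing at index 3 yields "<b>A</b>x<i>B</i>".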

The drawback is that you need to keep that information somewhere. It adds a pretty huge cost to the limits of a format range. I'll have to try and see how much that is and if I can get away with less by cleverly using the information I already have.

Also, it clearly violates WYSIWYG. On the other hand, we get WYSIWYW which is probably better for the user.

DecentXML 1.2

DecentXML 1.2, my own XML 1.1-compliant parser, is now available.

Wednesday, August 27, 2008

Text Editor Component and JADS

While working on DecentXML (1.2 due this weekend), I've had those other two things that were bugging me. One is that there is no high-quality, open-source framework with algorithms and data structures. I'm not talking about java.util.Collections, I'm talking about red-black trees, interval trees, gap buffers, things like that. Powerful data structures you need to build complex software.

Welcome the "Java Algorithm and Data Structure" project - jads. I haven't opened a project page on SourceForge or Google Code yet, but I'll probably do that this weekend.

Based on that, I'm working on a versatile text editor component for Java software. The final editor will work with user interfaces implemented in Swing, SWT and Qt. It's an extensible framework where you can easily replace parts with your own code to get the special features you need. I currently have a demo running which can display text, which allows scrolling and where you can do some basic editing. Nothing fancy but it's coming along nicely.

If you want to hear more about these projects, post a comment or drop me a mail.

Tuesday, August 19, 2008

Death Star in EVE Online

Apparently, a group of 4000 players of EVE Online have built a kind of a "Death Star" (a "titan ship" in the language of the game) to rule the game galaxy. Assembly took 8 months in total secrecy and the result was destroyed completely within 3 months.

Another Lesson on Performance

Just another story you can tell someone who fears that "XYZ might be too slow":

I'm toying with the idea to write a new text editor. I mean, I've written my own OS, my own XML parser and I once maintained XDME, an editor written originally by Matthew Dillon. XDME has a couple of bugs and major design flaws that I always wanted to fix but never really got to it. Anyway.

There are various data structures which are suitable for a text editor and some of those depend on copying data around (see gap buffers). The question is: How efficient is that? The first instinct of a developer is to avoid copying large amounts of data and to optimize the data structure instead.

After years of training, I've yet to overcome this instinct and start to measure:

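    public class CopyBenchmark
    {
        public static void main (String[] args)
        {
            long start = System.currentTimeMillis ();

            int N = 10000;
            for (int i=0; i<N; i++)
            {
                // Note: allocating (and zeroing) the 4MB array is part of the timing
                int[] buffer = new int[1024*1024];
                // Shift the whole content up by one element
                System.arraycopy (buffer, 0, buffer, 1,
                    buffer.length-1);
            }

            long duration = System.currentTimeMillis () - start;
            System.out.println (duration);     // total milliseconds
            System.out.println (duration / N); // milliseconds per iteration (integer division)
        }
    }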

On my computer at work (which is pretty fast but not cutting edge), this prints: "135223" and "13". That's thirteen milliseconds to copy 4MB of RAM. Okay. It's obviously not worth spending a second thinking about the cost of moving data around in a big block of bytes.
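For context, this is roughly what such a copy-happy data structure looks like. The following is a minimal gap buffer sketch in Java, not code from XDME or any other editor; the class and method names are made up and it only supports inserting single characters. The point is that moving the gap is nothing but a System.arraycopy like the one measured above.

```java
// Minimal gap buffer sketch (illustrative names, insert only).
final class GapBuffer {
    private char[] buf;
    private int gapStart;  // first index of the gap
    private int gapEnd;    // first index after the gap

    GapBuffer(int capacity) {
        buf = new char[Math.max(capacity, 1)];
        gapStart = 0;
        gapEnd = buf.length;
    }

    /** Number of characters stored, not counting the gap. */
    int length() {
        return buf.length - (gapEnd - gapStart);
    }

    /** Inserts c so that it becomes the character at text offset pos. */
    void insert(int pos, char c) {
        if (pos < 0 || pos > length()) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        if (gapStart == gapEnd) {
            grow();
        }
        moveGap(pos);
        buf[gapStart++] = c;
    }

    /** Moves the gap to pos by block-copying the text that is in the way. */
    private void moveGap(int pos) {
        if (pos < gapStart) {
            int n = gapStart - pos;
            System.arraycopy(buf, pos, buf, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (pos > gapStart) {
            int n = pos - gapStart;
            System.arraycopy(buf, gapEnd, buf, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    /** Doubles the array; the new space becomes the gap. */
    private void grow() {
        char[] bigger = new char[buf.length * 2];
        int tail = buf.length - gapEnd;
        System.arraycopy(buf, 0, bigger, 0, gapStart);
        System.arraycopy(buf, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        buf = bigger;
    }

    @Override
    public String toString() {
        return new String(buf, 0, gapStart)
            + new String(buf, gapEnd, buf.length - gapEnd);
    }
}
```

Typing at the caret costs nothing extra because the gap is already there. Only jumping to a distant position and typing there triggers a copy, and that copy is exactly the kind of block move timed in the loop above.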

Lesson: If you're talking about performance and you didn't measure, you have no idea what you're talking about.

Still not convinced? Read this.

Monday, August 04, 2008

Quantity Always Trumps Quality

While I wouldn't completely subscribe to that without a grain of salt, the story is nice.

Four harmful Java idioms, and how NOT to fix them

In his article "Four harmful Java idioms, and how to fix them", John O'Hanley writes about how to make Java more maintainable. He picks four common patterns and gives tips on how to fix them ... or not. Follow me.

Names

The first idea is to prefix names with a letter giving a hint what they mean: Is "start" a method? A field? A parameter? The goal is to make the code more readable to humans.

Unfortunately, this doesn't work. The human brain doesn't read letters, it reads words. So "fStart" (meaning a field with the name "start") is rejected by the brain because it's not a word. This triggers the conscious analysis which John tries to avoid! Which is why modern IDEs use color to tell you what something is: The brain can decode color and words in independent parts - unconsciously.

Packaging Convention

Next, he moves on to how to split code into packages. Currently, we use a "package by layer" scheme, meaning all DB code goes into one package, the model code into another, the UI layer into a third, etc. He proposes to use a "by-feature" packaging with the litmus test "you should be able to delete a feature by deleting a single directory, without leaving behind any cruft".

Uhm. When have you ever written any code where you could remove a feature just by deleting a class? This sounds nice and simple but it fails Einstein's litmus test: "Make it as simple as possible but not more simple". Even if you have plug-in based software like Eclipse, this doesn't work because there are still references outside (otherwise, your plug-in wouldn't be able to do anything).

Also, to keep a feature as isolated from everything else as possible (which is a good thing), you need to copy a lot of code into the feature which would otherwise reside elsewhere, neatly packed up in its own package. That is really just a limitation of Java, where you can't tell the compiler to generate boilerplate code for you. Still, you need to cut code in such a way that it reduces dependencies, not increases them. Therefore, a general rule won't cut it (or maybe it will cut: you).

Immutables

John quotes: "'Classes should be immutable unless there is a very good reason for making them mutable,' says Bloch.". And later: "From a practical perspective, many widely-used frameworks require application programmers to use JavaBeans (or something similar) to model database records. This is deeply unfortunate, because it doesn't allow programmers to take advantage of the many positive qualities of immutable objects."

From a practical perspective, immutable objects are dead weight. Applications are all about changing something. I read data from the database, I modify it, I write it back. I rarely read, display and forget about something. Yes, immutables have advantages because they can be shared between threads but that's their only advantage.

Just think about this: You must modify data from the database. So you read the data into an immutable. How to modify it, now? Obviously, you need a method to change it. If you prefer setters, the "setter" must return a copy. So you need to copy the object for every single change. If you want to get a feeling for that, try to do math with BigDecimal. Okay, after the copy you can write the copy back to the database. Question: How do you notify everyone else who might have a (now stale) copy of the old immutable? There are no listeners; immutables can't have listeners. Duh. Driving this to the extreme, lists wouldn't offer methods to add or remove items; or rather they would return new copies of themselves after every add/remove operation.
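To show what the copy-on-every-change style looks like in practice, here is a small Java sketch. The Balance class is hypothetical and not taken from John's article; only BigDecimal and its add method are real JDK API.

```java
import java.math.BigDecimal;

// Hypothetical immutable value: all fields final, no real setters.
final class Balance {
    private final String owner;
    private final BigDecimal amount;

    Balance(String owner, BigDecimal amount) {
        this.owner = owner;
        this.amount = amount;
    }

    String owner() {
        return owner;
    }

    BigDecimal amount() {
        return amount;
    }

    /** The "setter": leaves this object alone and returns a changed copy. */
    Balance withAmount(BigDecimal newAmount) {
        return new Balance(owner, newAmount);
    }

    /** BigDecimal.add also returns a new object, so this creates two new objects per call. */
    Balance deposit(BigDecimal sum) {
        return withAmount(amount.add(sum));
    }
}
```

After `Balance after = before.deposit(new BigDecimal("2.50"));`, the object in `before` still holds the old amount. Everyone who kept a reference to it now looks at stale data and nothing tells them.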

Sorry, no sale. I can't add money to my cash register. It's immutable.

And a colleague just introduced me to another great concept: Constructors which require values for all fields. The class in question has 95 fields. This idea has the following flaws: a) No matter how big your screen, you can't fit the call onto it. b) After argument #10, you've lost track and you can't see anymore which value goes into which argument. Now imagine you have to remove a field. How do you find the right one in this mega-call?

No, nothing beats the no-arg constructor plus a list of setters, all costs considered.
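A scaled-down Java sketch of the difference, with three fields standing in for the 95. The Customer class and its field names are invented for this example.

```java
// Hypothetical bean standing in for the 95-field class; only three fields shown.
final class Customer {
    private String firstName;
    private String lastName;
    private String city;

    /** No-arg constructor: every value is set by name afterwards. */
    Customer() {
    }

    /** All-fields constructor: three strings in a row, only the order tells them apart. */
    Customer(String firstName, String lastName, String city) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.city = city;
    }

    void setFirstName(String value) { firstName = value; }
    void setLastName(String value) { lastName = value; }
    void setCity(String value) { city = value; }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    String getCity() { return city; }

    static Customer viaConstructor() {
        // Compiles without complaint although last name and city are swapped.
        return new Customer("Anna", "Berlin", "Smith");
    }

    static Customer viaSetters() {
        // Each value sits next to the name of the field it goes into.
        Customer c = new Customer();
        c.setFirstName("Anna");
        c.setLastName("Smith");
        c.setCity("Berlin");
        return c;
    }
}
```

With three arguments the swap is easy to spot. With 95 of them, most of the same type, it isn't.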

Private members

John proposes to move private members to the end of the class. Here, I agree. I'd even put them close to the getter and setter so that a lot of stuff that belongs together is together.

In today's IDEs with their superb code navigation (I can't really believe there was a time before the F3 key), this doesn't matter much, though.

Conclusion: Think about it, but don't bother.