header image
The doctor and the programmer
October 23rd, 2011 under Devel, Life, rengolin, World. [ Comments: none ]

About 15 years ago, when I was working on a dodgy Brazilian firm, I had a conversation with an older programmer that I never forgot. He said something along the lines of:

Medicine is way easier than computer science. Doctors are still using the same books, written decades ago, while we have to buy only the latest ones. Our reality gets rewritten every five years or so, and by the time you leave university, you’re already outdated.

There is a lot of merit in this argument. Even though the common cold’s strain changes every week, its symptoms are exactly the same. Cancer, HIV, malaria, Lupus and other big diseases are treated more or less the same way as they were when first treated, and GPs still give you aspirin/paracetamol/ibuprofen for any kind of pain.

Human anatomy, physiology and psychology doesn’t change at all. Broken legs, runny noses and swollen tonsils are the same on every person and they require the same treatment for everyone.

While doctors kill one patient when they do make a mistake, developers can kill hundreds if they happen to introduce a bug on the Airbus fault-tolerant fail-over system.

But recently, I had to re-think this through, and I have to say I’m not 100% in agreement any more.

When programmers change jobs, they get a few weeks to get used to the new system. They might get months to actually be productive as their peers and will mature within a few years working with the same piece of software. Programmers can run tests, regression tests and usability tests, unit tests, etc, which is something a bit complicated with human beings.

When a doctor gets a new patient, it’s like getting a new programming job. It’s the same language, but the system is completely different. It might take you weeks to start getting the prognosis right, and in a few months you’ll be able to get it right before your patient even tells you the symptoms.

The similarities are remarkable

Consultant programmers get new systems to work on every week. Like ER doctors. They do what they can, with the time they have and the solution is most of the time acceptable. A more stable doctor or programmer might look at the job and cry, but the job was done, the fire was put off and the “patient” is out of the door.

Family doctors, that were there when you were born, know you better than yourself. They know when your symptoms are only psychological and what cause that and when it’s going to go away. They rarely touch the “system”, but normally fix an unrelated bug and you’re good as new.

But not everybody is lucky enough to have such doctors. They are expensive, and there aren’t enough good doctors in the health system of any country to account for every family. Even if the doctor share a hundred families, it’s still very far from enough.

This is the reason that systems fail, and get half-fixed, and why most GPs will send you home with a paracetamol unless you’re dying in front of them.

If doctors and programmers had such a different world, the emergent behaviour wouldn’t be that similar, I believe.


FreeCell puzzles solver API
September 25th, 2011 under Algorithms, Devel, Fun, Games, rengolin. [ Comments: none ]

This is a little pet project I did a while ago. It’s a FreeCell puzzle‘s solver API.

The idea is to provide a basic validation engine and board management (pretty much like my old chess validation), so people can write FreeCell solvers on top of it. It has basic board setup (of multiple sizes), movement legalisation, and a basic Solver class, which you must derive to create your own solvers.

There’s even a BruteFroceSolver that can solve a few small boards, and that gives you an idea on how to create your own solvers. However, the API is not clear enough yet that children could start playing with it, and that’s the real goal of this project: to get kids interested in solving complex optimisation problems in an easy way.

Freecell is a perfect game for it. Most boards can be solved (only a handful of them were proven – by exhaustion – not solvable), some movements can be rolled back and you can freely re-use cards that have already been placed into the foundations (their final destination) back in the game again.

It’s out of the scope of this project to produce a full-featured graphic interface for kids, but making the API easy enough so they understand the concepts without dragging themselves into idiosyncrasies of C++ is important.

Compiler optimisations

The reason why I did this was to make some of the optimisations compiler engineers have to do more appealing to non-compiler engineers or children with a taste for complex problems. But what does this have to do with compilers? The analogy is a bit far-fetching and somewhat reverse, but it’s interesting nevertheless and it was worth the shot.

Programming languages are transformed into graphs inside the compiler, which should represent the intentions of the original programmer. This graphs are often optimised multiple times until you end up with a stream of instructions representing the source code in the machine language.

Ignore for now the optimisations on those graphs, and focus on the final part: selecting machine instructions that can represent that final graph in the machine language (assembly). This selection can pick any assembly instruction at will, but it has to put them into a very specific order to represent the same semantics (not just syntactic) of the original program. Since many instructions have side effects, pipeline interactions, special flags set or cleaned up, it’s not trivial to produce correct code if you don’t check and re-check all conditions every time. This is a known complex optimisation problem and can be responsible for changes in speed or code size in orders of magnitude.

What does it have to do with the Freecell puzzle? Well, in Freecell, you have a number of cards and you have to put them in a specific order, just like assembly instructions. But in this case, the analogy is reverse: the “sequence” is trivial, but the “instructions” are hard to get.

There are other similarities. For example, you have four free cells, and they can only hold one value at a time. They are similar to registers, and manipulating them, gives you a good taste of how hard it is to do register scheduling when building the assembly result. But in this case, it’s much harder to spill (move the data back to memory, or in this case, card back to cascades), since there are strict rules on how to move cards around.

Reusing cards from the foundations is similar to expanding single instructions into a sequence of them in order to circumvent pipeline stalls. In real compilers you could expand a multiply+add (very useful for digital signal processing) into two instructions: multiply and add, if that gives you some advantage on special cases on special chips. In Freecell, you can use a 9 on top of a 10, to move an 8 from another cascade and free up a card that you need to clean up your freecells (registers).

I’m sure you can find many more similarities, even if you have to skew the rules a bit (or completely reverse them), but that’s not the point. The point is to interest people into complex optimisation techniques without the hassle of learning a whole new section of computer science, especially if that section puts fear in most people in the first place.


2011 European User Group Meeting
July 19th, 2011 under Devel, rengolin. [ Comments: none ]

For all LLVM users in Europe that didn’t catch the announcement on the main list, be warned that we’re organising the European LLVM User Group Meeting in September, 16th, in London. All information necessary to register and submit your work is on the link.

It’s free of charge and we’ll also provide a happy-hour after the meeting for an extended networking. If you’re just curious about LLVM or work with it day to day or is thinking of starting your PhD with LLVM, please register and show up. You won’t regret. ;)

More information will be available on the link above and through the main mailing list. Stay tuned!


C++ class sizes #2
March 31st, 2011 under Devel, rengolin. [ Comments: 1 ]

In my previous post about class sizes, we visited the empty classes and how their derived classes change in size. Now, we’re going to look what the ABI have to say about tail padding in inherited classes.

Again, for normal (POD) structures, everything is as it looks like, same as in C. But for more complex (non-POD) classes, things change a bit. The Itanium C++ ABI has a rather complex rule for defining the offset and size of class members, but each target-specific ABI (like ARM’s) can have additional rules that make it very complex to define the size of a class member.

Plain-Old-Data

So, first, a brief explanation of POD. POD is a concept in C++ that states, for the purpose of layout, a C++ structure (that is more than simple C structures) are just like plain old data. This normally means that the structure behaves like data, ie. it can be created on zero-initialized memory, copied with memcpy without any side effect, etc.

But not being a POD doesn’t always means you have a complex virtual class with multiple inheritance. Sometimes a simple change can transform it into a non-POD, like adding a simple empty constructor to the base class.

For example, in both C and C++, the structure below has 8 bytes:

struct Foo {
  int a;
  char b;
};

because the structure alignment is the same as the most aligned member, which is int. Assuming int is 32 bits, the second member has size one but aligns to four, yielding a size 8 for the final structure.

In C++, you can derive from that structure, and the derived class is normally implemented by having the base class as sort of a member inside it.

So, the C++ struct Bar below:

struct Foo {
	int a;
	char b;
};

struct Bar : Foo {
	char c;
};

is similar (for layout purposes) to a C construct:

struct Bar {
	struct {
		int a;
		char b;
	};
	char c;
};

So, both in C and in C++, the size of Bar is 12 bytes. Now, doing a slight change in Foo, we’re going to break that:

struct Foo {
	int a;
	char b;
	Foo() {}
};

struct Bar : Foo {
	char c;
};

You may be surprised to know that the size of Bar is still 8 bytes.

non-POD structures

Adding the constructor is one way of killing the POD-ness of a structure, but anything that adds some C++ flavour to it is enough. For instance, Bar itself is a non-POD, because it was derived from another structure. Adding simple methods (non constructors) is fine, but adding a virtual method makes it a non-POD (as you have to add the virtual table). Anything that leaves some room for implementation detail will be considered a violation of the POD contract and will transform your structure to a non-POD.

Ok, so what does that mean? If it just means your structures will be smaller, well, I’ll add a constructor to all my structures! But, no, the only change in in the tail padding. If you want to compress your structures, use packed structures, the tail padding rules are too complex to deal with them and still maintain your code clean.

So, the original Foo structure had 8 bytes. Four for the first int, Four for the char. But the char only used one of those four, so you have a 3-byte tail. When inheriting from a non-POD structure, the C++ ABI allows you to use the tail padding of the base class, providing your new type fits in there. In our example, the new type was char that fits in the 3-byte tail of the base class, so the final layout uses two of the four bytes and still have a 2-byte tail.

Further inheritance of the same type (adding a single char member) will still pack at the end of the base structure.

Bit-fields

Every time you think you have understood the rules of C++ layout, go back to bit-fields and you’ll always be surprised.

Lets have a look at this example for a moment:

struct Foo {
	char a;
	int b : 8;
	Foo(){}
};

struct Bar : Foo {
	int c : 8;
};

The first member is a char, so one byte. The second is an int, so 4 bytes and the final layout is 5 bytes, aligned to int goes to 8 bytes, right? Well, no. Even without the constructor, the size of that structure is 4 bytes.

Compilers are smart enough (and the ABI allows them to be) to figure that the member b will have only 8 bits, so there is no point in wasting 6 more bytes just to follow the declared type.

You may not be surprised, now, to know that the size of Bar is still 4 bytes, since the member c in Bar gets tail-padded into the base Foo.

But what about this:

struct Bar  {
	struct {
		char a;
		char a1;
		char a2;
		int b : 4;
	};
	int c : 4;
};

The three chars push the final member to the last byte in the block, of which only 4 bits are really used, so you have 4 bits of tail-padding. The second member c of Bar gets tail-padded into those four bits, since they fit, and the final size of that structure is 4 bytes.

I hope to have convinced you to never try to be smart and convert your structures and classes to arrays or pointers. Even disregarding virtual inheritance (which is probably coming in another post), the C++ ABI for structure layout is too complex to rely on and use it in your own code. Worse, because this is not in the standard, concrete implementations are free to change the details and sometimes, still following the generic ABI. You’ll find a few cases in the ARM ABI where they differ slightly from the ABI without actually being incompatible, sometimes not that lucky.

The final lesson is: be smart, use the language features and rely on the compiler to optimise things for you. If you’re not happy with the optimization, fill a bug rather than hack your code.


Why no MMORPG is good enough?
March 8th, 2011 under Devel, Fun, Games, rengolin, Software, Web. [ Comments: none ]

Massively multiplayer online role-playing game (MMORPG) are not new. The first I remember playing is the Legend Of the Red Dragon (LORD), but before that, of course, I’ve played other (real-life) multiplayer RPG games as well, and they were never quite the same thing.

Of course, at that time the graphic cards couldn’t quite compete with our imagination (not to mention connection speeds), but a lot has improved in both fronts, and lots of very good looking games have arrived, but still, there’s something missing. For years I couldn’t put my finger on it, but recently I think I nailed the issue: user driven content.

The interface

Most of the MMORGP are war games. World of Warcraft, LOTR online, Vendetta, Star Trek Online, Regnum and so many others rely on war to be fun. Of course, all of them have the side issues, some trade or silly missions, but the real fun is going to the battlefield.

If you look from the technical side of things, this is not surprising at all. Aside from good graphics, one of the hardest things to do in a game is a good interface. Very few games are perfect like Tetris. I gave Tetris to both my sons when they were about 2 years old and after about a minute they were already playing. There is no need to instructions, tutorials or any training and still today I find it quite fun. This is why it’s probably the most successful game in history.

But it’s all about the interface. It’s easy to create a seamless interface for Tetris. Try to create an interface for a strategy game that doesn’t require some hours of training, or an interface for first-person games that actually allows you to know where you are, or an interface for adventure games that doesn’t make you click in half-dozen places to get anything. Many have tried, all have failed.

At the dawn of internet games, strategy and quake were dominant. That’s because the internet wasn’t so fast and both were quite good in saving bandwidth. Quake had a special fix to avoid sending one packet for every bullet and only one packet when you press the trigger and another when you release it, the rest was up to the client.

But in war games, the interface is pretty much established. World of Warcraft didn’t invent anything, they just mixed Warcraft with Lara Croft (rubbish interface, by the way). Space ship games got the interface from Wing Commander (Vendetta got it from W.C. Privateer), Lord of the Rings and Regnum mixed Second Life with old-style RPG (even with the same deficiencies) and Star Trek Online copied from everyone above.

Now, the interface for a good strategy or adventure game is somewhat complicated. For a first-person 3D RPG, even worse. It doesn’t have to be mind controlled to be easy, nor you have to use 3D glasses or any immersion technology to be fun. Simplifying the game is one way, but then it’s even harder to make it more interesting.

It’s the user, stupid!

I can see only one way to add value to a game that is simple but still fun: user driven content.

You can enrich the world in which you’re immersed into. For instance, Zynga is quickly gathering an impressive amount of users by adding a lot of content. I don’t play those games, but people around me do and I can see why it’s so addictive. There are so many things to do and the frequency of updates is so impressive that it’s almost impossible not to be driven to it.

You might think that the games are too simple, and the graphics are average, but the interface is extremely simple, the challenges are simple, yet still challenging, and the number of paths you can choose for your character are enormous. In this case, the user experience is driven by his own choices. The content is so rich that each and every place is completely different from every other, solely driven by user choices.

Not many game companies (certainly not the indie ones) have time or money to do that. So, why are indie games generally more interesting than commercial ones? They go back to square one, simplify the game and optimise the interface. EA would never release something like Angry Birds or World of Goo, and yet those are the two best games I played in a long time. But, world of Goo is over and Angry Birds require constant attention (seasonal versions) to keep selling or making money (from ads).

They are missing the user content. It might not be their style, nor objective, but that’s a difference between Deep Purple and a one-hit-band.

Back to MMORGP

So, from all MMORPGs, there are many good looking, some with good challenges and a lot of slaughtering fun, but I tire quite quickly from playing them. The last I played was Vendetta. Quite good graphically, it has some reasonably accurate physics simulation (what drove me to it) but not accurate enough to keep me playing. The content tires too quickly to be fun after a few hours and even though I had 8 hours of free play, I spent less than two and dropped it.

This was always a pain, since Final Fantasy and the like, building up the character, hitting slugs for XP, fight-heal-run until you level up. Though Final Fantasy was better, as it normally would throw you on level 10 or so, so you didn’t need too much of levelling up. But why? Who likes beating 253 slugs to get 1000 experience points, going to level 2 and being able to equip a copper sword that doesn’t even cut a snail’s shell?

One of the best MMORGP experiences I had recently was Regnum. This is a free game made in Argentina and has a lot of content, good interface and a healthy community. They do the normal quest levelling up technique and it works quite well until level 15 or so. After that, it’s hitting bears, golems and phantoms for half a year until you can go outside and beat other users.

I got outside prematurely (couldn’t bother to wait) and the experience was also not great. Huge lag on big crowds, people disappearing in mid-air and appearing a few meters away, etc. But the most annoying of all was the content. It was always the same fort that we had to protect, always the same keep we had to attack, always the same talk on how our race is superior to your race, etc.

I haven’t seen Lord of the Rings (which sucks on Linux) or Star Trek Online (which doesn’t even run), but I bet they can get a bit further. They’re there to compete with World of Warcraft, not Regnum, but the fate will be the same: boring.

So, after quite a big rant, how would I do it?

User content

First, a memory refresh: all free first-person shooter I know of are a re-make of Quake. They use the same engine, the same world builders, the same techniques. On Debian repositories you’ll find at least a dozen, all of them running on some version of Quake. Nexuiz, Tremulous, Open Arena, Urban Terror, etc.

Not only the Quake engine is open source, but it was built to be extensible and that, even before the source was opened by ID. I made some levels for Doom, there were good editors at the time (1994?), probably there are full development suites today.

The user has the power to extend, create, evolve and transform your game in ways that you never thought possible. To think that only the few people you hire are good enough to create game content is to be, to say the least, naive.

Now, all those games are segmented. Nexuiz levels don’t connect to Tremulous levels. That’s because the mods (as they’re called) are independent. To be able to play all those different games you need to download a whole lot of data (objects, music, game logic, physics settings, etc) and each game has it radically different. Sarge with a rocket launcher would be invincible in most of other quake variants.

But that is, in my opinion, the missing link between short spurs of fun and long lasting enjoyment. I want to be able t build my world (like Zynga), but in a world with free movement (like Quake) with quests (like MMORPGs) made by the users themselves (like no FP-game I know) in a connected world. It cannot penalise those that connect seldom, or those that connect through text terminals, Android phones or browser users in any way.

As some games have started to understand, people like different things in games. I like building stuff and optimizing structures, some like carnage, others like to level up or wait 45 minutes for a virtual beef pie to be ready. You cannot have all that in one game if you’re the only content generator.

Also, if the engine is not open, people won’t be able to enhance it for their particular worlds. It doesn’t have to be open source, but it has to have a good API and an efficient plugin system. Tools to create mods and content is also essential, but the real deal is to give the users freedom to create their versions of the game and be able to connect them all.

If every game is also a server, people can create their small worlds inside the bigger world, that is in the central server. A business strategy would be, then, to host those worlds for people that really cared about them. Either host for free in exchange of ads and content generation, or paid hosting for the more serious users. You can also easily sell content, but more importantly, you can create a whole marketplace of content driven by the users! Read Neil Stephenson’s Snow Crash and you know what I mean.

I think Apple and Google have proven over and over that a market with apps generated by the users is very effective indeed! Intel is just following the same path with their new App store, so yes, this is a good business strategy. But, it’s still fun for a wider range of people, from game addicts to casual gamers, from heavy modders to passive Facebook users.

There are many ways of doing that, maybe not all of them will be successful, but at least from my point of view, until that day arrives, no game will be fun.


C++ class sizes
January 28th, 2011 under Algorithms, Devel, rengolin. [ Comments: none ]

In a recent struggle to define the class size in C++, I thought would be good to share some of my findings, since there isn’t much about it on the net. The empty class size is all over the place, neatly put in Stroustroup’s FAQ, but the other weird cases were less common to find.

For obvious reasons, the C++ standard is pretty vague about class sizes and memory layout. This is more of an ABI issue than a language standard, but the problem is that even the Itanium C++ ABI is a bit vague on that point.

The C++ standard allows the memory layout to be defined by the ABI and position the class members on where it fits more naturally on the target, but there is one specific clause: an object cannot have zero size. While most layout decisions are target specific, this is strictly a language decision. If objects were allowed to have zero bytes in size, an array of 1000 zero-sized objects would occupy no space at all and it would be impossible to choose one from the other.

Empty classes

First, a background on empty classes. An empty class is somewhat useless in most cases, but there is one case in which it plays an important role: type safety.

If you want to group objects that have nothing in common (to use in an algorithm), or to force a specific class of template parameters, or even differentiate between exceptions, even if they don’t have any data on it (just to catch them differently), you can use an empty class and it does the job pretty well. Thus, using and deriving from empty classes is somewhat a common enterprise in C++.

Empty class size

So, what happens when you declare an empty type?

class Empty {
};

What is the size that objects of this type will have? It cannot be zero, as the standard forbids, so it has to be something. The best choice is one byte, since no architecture will benefit from having less (please, don’t think of bit-fields!) and if alignment is a problem, most (all?) compilers will adjust the byte to a word aligned block in memory.

Obviously, an empty class that derives from another empty class also has to follow the same rule:

class Empty {
};

class NowWhat : public Empty {
};

In this case, both empty and NowWhat have size 1. But if NowWhat has data on it, will the extra byte still be there? On both standard and ABI, there is nothing saying that it shouldn’t, but also nothing saying it should. What’s important here is the reason why you need the extra byte: the address of different objects must be different.

When the derived class already has data, its objects will already be at different locations when laid out in memory, so there is no necessity of the extra byte.

class Empty {
};

class NowWhat : public Empty {
  int a;
};

Now, Empty still has 1 byte, but NowWhat has 4, not 5. Some people consider this an optimization, I just think it’s following the rules by what the standard requires.

The reverse case is much simpler. If the derived class is the empty one (for whatever reason), the size is already non-zero, so the classes below have both the same sizes (4):

class Full {
	int a;
};

class Empty : public Full {
};

Multiple Inheritance

Multiple inheritance brings a bit of colour to this problem. The C++ standard requires that two objects always occupy different memory locations (so they compare differently by their pointers and don’t allow non-empty zero-sized arrays). So what is the size of the Derived class below?

class Empty {
};

class M1 : public Empty {
};

class M2 : public Empty {
};

class Derived : public M1, M2 {
};

Before, we could make the NowWhat class with only one byte because that would be sufficient to discriminate it in an array, but what about comparison. If you cast a Derived object to Empty via M1 you get a different (logical) object than if you do it via M2, so they must compare differently. For that, you need to add another byte, and return the first address when via one class and the second via the other.

Empty members

With single and multiple inheritance nailed down, let’s think of a more complicated case.

class Empty {
};

class Another : public Empty {
};

class NotSoMuch : public Empty {
	Another a;
};

What happening here is that the first item of the derived class is, in fact, of an empty class type, that also derives from the same class as the type.

See, this last bit is important, because if you replace Another a; by another empty class type (say Empty2), the size of the derived class will still be 1. But in this case above, this is not true. The size of NotSoMuch is actually 2.

The field Another a; has size one, as any empty class objects, so what’s the second byte for? The second byte is a padding, at the beginning of the class, to avoid type conflicts.

Type conflicts

When deriving from empty classes, the ABI (2.4-II-3) states that you should always try to put the derived members at zero offset and, in case of type conflict, move down (alignment-wise) until you don’t have any more type conflicts. The question is, what is a type conflict in this case? In the multiple inheritance, the type conflict was clear (the diamond inheritance), but here, not so much.

Since both the class and its first member can be converted to Empty types (as pointers, for instance), you can have two Empty objects that, if compared must return different addresses. In the code below, n is (a pointer to) the derived object. Itself is converted to an Empty pointer, stored in en.

	NotSoMuch *n = new NotSoMuch();
	Empty *en = (Empty*)n;
	Empty *ea = (Empty*)&n->a;

If the member Another a; was in the offset zero of the class, the pointer ea would be in the same address as en, and on a comparison for equality, it’d return true.

The type conflict in this case is that, while both en and ea are pointers to an Empty type, the former gets there via NotSoMuch and the latter via Another. According to the C++ standard, they must be different, thus, return different addresses.

Again, if the empty class member is not the first element, none of that happens. The example below has size 4 (instead of 5), because the two similar (but different) Empty pointers would now be separated by 4 bytes, thus not violating the standard.

class Empty {
};

class Another : public Empty {
};

class NotSoMuch : public Empty {
	int i;
	Another a;
};

Templates

Template code is also susceptible to this problem, of course, and the iostream library is full of such examples. The problem is not so much for the abuse of empty classes (that doesn’t happen that much in STL), it’s just that non-virtual classes that only have member functions and no data fields are considered empty classes. And since templates are a good replacement for most virtual inheritance (and its inherent run-time complexity and slowness), basic libraries have to make extensive use of templates to avoid slowing down every single C++ program.

It’s true that the iostream library is particularly heavy and cumbersome, but it could be much worse.

The additional problem with templates (for the compiler, at least), is figuring out if there is a type conflict. While templates are already expanded by the time the basic C++ compiler kicks in, it’s complicated to carry the information on the source code about inheritance graph to every single template variation and specialization. Not to mention the new standard, where C++ is getting a variadic templates mechanism

I hope this is one more reason for you to avoid C-style pointer casts (especially void*) and only use C++ static_cast and dynamic_cast. Even reinterpret_cast is dangerous, but at least you can grep them on your source files and identify them one by one.


Zero-cost exception handling in C++
October 16th, 2010 under Algorithms, Devel, rengolin. [ Comments: none ]

C++ exception handling is a controversial subject. The intricate concepts are difficult to grasp, even to seasoned programmers but, in complexity, nothing compares to its implementation. I can’t remember any real-world (commercial or otherwise) product that makes heavy use of exception handling. Even STL itself only throws a couple of them and only upon terminal failures.

Java exceptions, on the other hand, are much more common. Containers throw them at will (ex. OutOfBounds) and creating your own exceptions is very easy, developers are encouraged to use them. But still, some C++ projects’ coding standards hint (like LLVM’s) or specifically disallow (like Google’s) the use of exceptions. The Android NDK, for example, doesn’t even have (nor plans to) support exceptions.

Some of the cons commonly listed are the encouragement of bad style (throwing at will, like in Java), compilation problems with old code and all the gotchas that exception handling can bring to your code, making it a nightmare to understand the real flow of your program and to figure out the critical DON’Ts of exception handling (like throwing from inside a destructor). For a deeper discussion on that subject, read Sutter’s book, for the implementation details of exception handling from a compiler point of view, keep reading.

Three distant worlds

Exception handling is not just a language construction, like if, it’s an intricate interaction between three distant worlds. All the complications that exception handling brings to us have to be implemented by a three-layer infrastructure: the user code, the run-time library and the compiler. The communication between them has to be absolutely perfect, one bit different and the control flow goes south, you start executing random code and blow up your program for good.

Not only it’s three very different places, but the code is written by three different kinds of people, at different times. The only formal document that binds them together is the ABI (ex. the Itanium C++ ABI), which is, to say the least, vague. Luckily, the guys who write the run-time libraries are close enough to the guys writing the compiler (and hopefully, but not necessarily, the same writing the ABI).

In the end, if the compiler implementation is correct, the user only has to care about the rules imposed by the C++ standard. But that’s not too comforting, the C++ standard is complex and frequently driven by previous implementations, rather than a fix and clear rule of thumb. Most users don’t precisely understand why you can’t throw from inside a destructor or why throwing a newly created object is calling terminate on their code, they just avoid it at all costs.

Dirty and expensive

The easy exception handling implementation is called SetJump/LongJump. It literally sets jumps between your code and exception handling code (created by the compiler). If an exception was thrown, the flow jumps directly to the exception handling code. This is very easy, but extremely expensive. Usually, if you follow good practices, you’d expect that your program would spend most of the time doing real computation and only occasionally handling exceptions. But if for every action you create a bunch of exception handling data, you’re wasting a lot of resources to check for something that might not even happen during the whole run of your program.

When an exception happens, you need to go back down the call graph and inspect if any of those functions (including the one you just thrown) is catching that exception. This is called unwinding the stack, and in a SetJump/LongJump exception handling this is done by inspecting if any LongJump was called by a series of SetJump checks. The problem is, that those checks are done even if an exception was not thrown at all.

Zero-cost

So, how is it possible to only execute exception handling code when an exception really happen? Simple, let the compiler/library do it for you. ;)

The whole idea of a zero-cost exception handling is to create a set of functions (like __cxa_throw, __cxa_begin_catch) that will deal with the exception handling flow by sending it to a different path, the EH path. Under normal situations, your code flow will be identical with or without exceptions. The flow could never reach the EH blocks and it’s only because the compiler knows those blocks are in the EH path that they don’t get discarded as unreachable code, since there is no explicit flow between user code and EH code.

Only the run-time library knows how to get there, and that’s helped by a table, normally in a read-only data section, specific to exception handling (ex. the .eh_frame ELF section), created by the compiler when reading the user’s exception flow. This is the core interaction between user, compiler and library codes.

When an exception is thrown via __cxa_allocate_exception + __cxa_throw, the first function allocates space and prepare the object to be thrown and the second starts reading the exception handling table. This table contains a list of exceptions that the particular piece of code is catching (often sorted by address, so it’s easy to do a binary search) and the code that will catch the exception.

The code that catches exceptions is called the personality routine. The personality routine library call enables you to use multiple zero-cost EH mechanisms in the same program. Normally, the table entry contains a pointer to the routine and some data it’ll use while unwinding, but some ABIs (like ARM’s EHABI) can add specially crafted tables for their purposes (like having the unwind code as 16-bits instructions embedded in the data section).

All in all, the general flow is the same: the personality routine will find the EH block (also called landing pads, unreachable from normal code) and jump there, where the catching happens. That code will get the exception allocated before the throw and make it available to the user code (inside the catch block) to do what the user requested. If there is no entry in the EH table that catches that exception, the stack will be unwound again and the library will read the caller’s EH table, until it reaches a table that does catch it or terminate if none does.

Clean-ups

So far so good, but what happens with the local variables and the temporaries (stack-based data)? If they have non-trivial destructors, they must be called, which could chain a lot of cleaning up to do. Clean-up code is part in the unreachable blocks (when you’re cleaning up EH stuff) and part in normal code, for automatic destruction at the end of a lexical block. Although they do similar things, their purpose is completely different.

Normal flow clean-up blocks are always called and always clean up all objects. But EH clean-up code has to clean up only the objects that have been allocated so far, in the current lexical block or in any parent until the root block of the function. If you can imagine deeply nested blocks, creating objects at will, with a throw inside in the increment part of a for loop, where the object beeing incremented has a non-trivial destructor, you got the picture.

What happens if you throw an exceptions when other was being caught? Here is where the complexity met reality and the boundaries were settled. If you are unwinding one exception, and have a flow being dealt with by the run-time library, and another exception is thrown, you can’t distinguish which exception you are handling. Since there is space for one exception been unwounded at a time, whenever a second exception is thrown, you have to terminate.

Of course, you could argue to create multiple lanes for exception handling, where in every exception handling is handled separately, so you know that one exception was caused when throwing another. That may be easy to implement in Java (or other interpreted/bytecode languages), because the VM/interpreter has total control of the language, but as stated earlier, exception handling in C++ is just a contract between three parties: compiler, run-time library and the user, and often these three parties are completely different groups in completely different space-time realities.

Conclusion

This is just a very brief, shallow demonstration of how complex exception handling can be. Many other pitfalls appear, especially when dealing with specific user code running on a specific hardware platform with a specific binary configuration for the object file. It’s impossible to test every single combination (including room temperature, or annual rainfall forecast) to make it an implementation final and weird bugs creep up every once in a while.

Enhancing the ABI to allow that is possible, but the complexities of doing so, together with the very scarce use of exception handling in C++, makes it a very unlikely candidate to be changed in the near future. It’d have to be done in so many levels (C++ standard, ABIs, compilers, run-time libraries) that it’s best left alone.


What I don’t miss about Java
July 26th, 2010 under Devel, rengolin, Software. [ Comments: 5 ]

Disclaimer: This is not a rant

I spent my last year working with Java, and it was not at all bad. But while Java has its moments and shines, I always felt a bit out of place when using it. In fact, when I moved back to C++, contrary to when I moved to Java, I felt that I actually wasn’t missing much…

Last year, while writing in Java at work, I felt compelled more often than usual to write C++ programs at home. Even simple programs, that would do better with scripting languages, they all came in C++.

Recently, working full time with C++, I noticed I’m doing very little home development and definitely not doing any Java. So, what did I miss about C++ that I don’t miss about Java?

Expressiveness: While functional languages are much more expressive than C++, there are few languages less expressive than Java. Java encourages child-like programming like forcing to call everything by methods not operators. By not having explicit pointers, operator overload and other dangerous things from C++, you end up repeating yourself quite a lot and it’s very hard to understand the logic afterwards, when all you have is bloatware.

While Java designers tried to avoid pointers and operators, they couldn’t. We still have null references (throwing null pointer exceptions) and the fake operators (like toString(), hash(), compare()) that can easily be overridden to change the expected behaviour pretty much the same way as C++ operators, but in the “method” notation.

In the end, you can do some bad things, but not all. So, they took away dangers by taking away functionality, without a proper redesign of C++.

Abuse of Object Orientation: While in Ruby, everything is an object, in Java, almost everything can be. Every class derive from Object silently, but base types do not. So you have the basic objects (Integer et al) which get automatically converted into basic types in subtle ways it’s hard to predict and has a huge performance impact (see auto-boxing).

Not just performance, but the language design is, again, incomplete.

Most OO programmers (mainly Java ones) complain a lot about Perl OO. They say Perl (or Python for that matter) has no proper OO, since everything is a hash and there is no concept of protection.

While Java objects and members are strongly typed, and you have the concept of protection, it’s way too easy to transform Java OO into Perl OO with reflection.

Of course, with C++ you can cast things to void pointers, mess up in the memory and so on, but getting objects by name, removing the private protection in a safe way is simply wrong. It’s like giving loaded guns to children and telling them where the lock is.

Abuse of Design Patterns: Java developers are encourage to use design patterns, to the point of stupidity. The first thing I learnt from design patterns is that their misuse is actually an anti-pattern.

Properties are important when the requirements change too often, not when they’re static. Factories are used when the objects created may differ or be customized, not for never-changing one-object construction. Still, most libraries (all?) will have Factories, Properties and so on, just for the sake of Design Patterns Compliance ™.

Fact, one of the strengths of Java development is that every one is encouraged to do things the same way. No Larry Wall style, all factory workers, doing their share in the big picture. While this is good for big, quick projects on companies with high turn-over (like consultancy companies), it’s horrible for start-ups or more creative development.

Half-implemented features: Well, templates is an issue. There is no template mechanism in Java. With the so-called Generics (like cheap version of meds), there is no type safety at all, it’s just syntax sugar for lists of Objects.

That generates a lot of misunderstandings and bad code being generated when the syntax is obviously correct, that is, if the types were actually being checked.

Again, incomplete design for the sake of backward compatibility with old codes and VMs.

Performance: Running in a JVM is already a bad start for performance, but a good compiler and a well done JIT environment can take most of it away by intelligently removing unused code, re-optimizing most used code during run-time and using profiling results to change branch-prediction code.

While the JVM does some of it, it also introduces several problems that take away the advantage and put it back on the back of the class. Auto-boxing and generics create a lot of useless casts, that can be a huge performance hit. Very few Java programmers really care about it and the compiler doesn’t do a good job in reducing that impact or even warning the programmer.

I often see Java developer scorn at performance issues. The phrase used most is “a programmer shouldn’t care about memory footprint or performance, only about business logic”. That, together with the fact that almost all universities now are teaching Java in undergraduate courses, kinda frightens me a bit.

Strong dependency on IDEs: Borland made quite a lot of money out of C++ IDEs in the 90′s, but most C++ programmers I know still use VIM or Emacs. On the other hand, every Java programmer I know use Eclipse, IntelliJ or something of the sort.

This is not just ease of use (code completion, syntax colouring, hints, navigation), it’s all about speeding up the development process by taking away boiler-plate code generation and refactoring.

IDEs are capable of writing complete pieces of code, refactor and re-write things (even behind your back). The programmers don’t care about it, the code becomes bloated, unintelligible and forgotten. Not to mention the desire of IDEs and people following IDE-style to use certain patterns for everything, like using Properties where simple structures would suffice. (see above, Abuse of Design-Patterns).

False Guarantees: The big selling point of Java, besides cheap cross-platform development, is it’s apparent safety and ease of use. But it isn’t in so many levels…

The abuses and problems related above are only part of the story. The garbage collector is another…

Some good garbage collection routines can help the initial development of programs, and they do take away the job of the lazy programmers to manage their own memory, but the Java garbage collection became a beast, with incomprehensible command-line options, undefined behaviour and total lack of control over it. You’re rendered hostage to its desires.

Not to mention the complete memory management that won’t cope with dynamic memory allocation. I mean, if you want to make memory management easy for programmers (as they went to all that trouble for a garbage collection), you could have gone a bit further and actually figured out the available memory and used it politely.

Join those with the fact that pointers and operators are still available, and you have a language that is not so much simpler than C++, with a huge price in performance and weirdness.

Undocumented APIs: Java claims to be platform independent, but has quite a few available (but undocumented) APIs to use platforms specific functionality (like signals). Still, Sun (now Oracle) reserves the right to change whenever they wish and there’s little you (or anyone) can do about it.

And that takes us to the final point:

Standards (or lack thereof): Sun did a nice job at many things (mostly hardware and OS), but they screwed up neatly when it came to support software. There is no standard, IBM and even Microsoft created their own JVM (which was better than Sun’s, btw) without any final definition about the standard API. During the Java 1.1 days, it was possible to be platform agnostic but VM specific in the same platform!

Conclusion: Java was meant to be an easy language, but it turns out that it’s deceitful enough to be just as bad as any other. And recent changes are making it worse.

Programmers are loosing the ability to understand how the machine works, how their languages behave and, more importantly, to know the implications of their actions.

Why spend time understanding the fiddlings some people had with Java if you can spend the same time understanding how the machines actually work and therefore be able to use any programming language you want?

Some argue that Java is the new Cobol and will disappear the same way… I tend to agree…


Barrelfish
March 18th, 2010 under Algorithms, Devel, Distributed, OSS, rengolin, Software. [ Comments: none ]

Minix seems to be inspiring more operating systems nowadays. Microsoft Research is investing on a micro-kernel (they call it multi-kernel, as there are slight differences) called Barrelfish.

Despite being Microsoft, it’s BSD licensed. The mailing list looks pretty empty, the last snapshot is half a year ago and I couldn’t find an svn repository, but still more than I would expect from Microsoft anyway.

Multi-kernel

The basic concept is actually very interesting. The idea is to be able to have multi-core hybrid machines to the extreme, and still be able to run a single OS on it. Pretty much the same way some cluster solutions do (OpenMPI, for instance), but on a single machine. The idea is far from revolutionary. It’s a natural evolution of the multi-core trend with the current cluster solutions (available for years) and a fancy OS design (micro-kernel) that everyone learns in CS degrees.

What’s the difference, then? For one thing, the idea is to abstract everything away. CPUs will be just another piece of hardware, like the network or graphic cards. The OS will have the freedom to ask the GPU to do MP floating-point calculations, for instance, if it feels it’s going to benefit the total execution time. It’ll also be able to accept different CPUs in the same machine, Intel and ARM for instance (like the Dell Latitude z600), or have different GPUs, nVidia and ATI, and still use all the hardware.

With Windows, Linux and Mac today, you either use the nVidia driver or the ATI one. You also normally don’t have hybrid-core machines and absolutely can’t recover if one of the cores fail. This is not the same with cluster solutions, and Barrelfish’s idea is to simulate precisely that. In theory, you could do energy control (enabling and disabling cores), crash-recovery when one of the cores fail but not the other, or plug and play graphic or network cards and even different CPUs.

Imagine you have an ARM netbook that is great for browsing, but you want to play a game on it. You get your nVidia and a coreOcta 10Ghz USB4 and plug in. The OS recognizes the new hardware, loads the drivers and let you play your game. Battery life goes down, so once you’re back from the game, you just unplug the cards and continue browsing.

Scalability

So, how is it possible that Barrelfish can be that malleable? The key is communication. Shared memory is great for single-processed threaded code and acceptable for multi-processed OSs with little number of concurrent process accessing the same region in memory. Most modern OSs can handle many concurrent processes, but they rarely access the same data at the same time.

Normally, processes are single threaded or have a very small number of threads (dozens) running. More than that is so difficult to control that people usually fall back to other means, such as client/server or they just go out and buy more hardware. In clusters, there is no way to use shared memory. For one, accessing memory in another computer via network is just plain stupid, but even if you use shared memory in each node and client/server among different nodes, you’re bound to have trouble. This is why MPI solutions are so popular.

In Barrelfish there’s no shared memory at all. Every process communicate with each other via messages and duplicate content (rather than share). There is an obvious associated cost (memory and bus), but the lock-free semantics is worth it. It also gives Barrelfish another freedom: to choose the communication protocol generic enough so that each piece of hardware is completely independent of all others, and plug’n'play become seamless.

Challenges

It all seem fantastic, but there’s a long road ahead. First, message passing scales much better than shared memory, but nowadays there isn’t enough cores in most machines to make it worth it. Message passing also introduces some other problems that are not easily solvable: bus traffic and storage requirements increase considerably, and messages are not that generic in nature.

Some companies are famous for not adhering to standards (Apple comes to mind), and a standard hardware IPC framework would be quite hard to achieve. Also, even if using pure software IPC APIs, different companies will still use slightly modified APIs to suit their specific needs and complexity will rise, exponentially.

Another problem is where the hypervisor will live. Having a distributed control centre is cool and scales amazingly well, but its complexity also scales. In a hybrid-core machine, you have to run different instructions, in different orders, with different optimizations and communication. Choosing one core to deal with the scheduling and administration of the system is much easier, but leaves the single-point-of-failure.

Finally, going the multi-hybrid-independent style is way too complex. Even for a several-year’s project with lots of fund (M$) and clever people working on it. After all, if micro-kernel was really that useful, Tanembaum would have won the discussion with Linus. But, the future holds what the future holds, and reality (as well as hardware and some selfish vendors) can change. Multi-kernel might be possible and even easier to implement in the future.

This seems to be what the Barrelfish’s team is betting on, and I’m with them on that bet. Even if it fails miserably (as did Minix), some concepts could still be used in real-world operating systems (like Minix), whatever that’ll mean in 10 years. Being serious about parallelism is the only way forward, sticking with 40 years old concepts is definitely not.

I’m still aspiring for non-deterministic computing, though, but that’s an even longer shot…


Post-Agile
February 9th, 2010 under Corporate, Devel, Games, Politics, rengolin. [ Comments: none ]

A while ago I wrote an article about Agile and Scrum and wanted to write another one following my recent experience with Agile. However, somehow I couldn’t add anything of that great value to my original post that would be worth a new one.

And now I know I don’t have to. In this fantastic post, Gwaredd takes a deep look into all failures and successes of Agile, with the common misconceptions of believers and decision-makers. In the end, the so called “Post Agile”, is just plain common sense.


« Previous entries