Scientific explanation about the Force
November 17th, 2008 under Fun, Science, rengolin. [ Comments: 2 ]

Before the second trilogy (actually the first three films) came out, the force was something spiritual that some people had more than others. As Obi-wan described it: “an energy field created by all living things, that surrounds and penetrates living beings and binds the galaxy together”, the force was magical and intended to be interpreted as some form of God’s will.

Then the first film came with the midi-chlorian nonsense, trying to please religious as well as sceptical people, but it failed to explain why midi-chlorians would please the gods more than normal cells would. After all, aren’t we all “sons of God”? Apparently, only the midi-chlorians were…

Anyway, getting rid of all that God stuff, I came up with a perfectly rational explanation of how the midi-chlorians work (not that I like them more than the energy field).

It’s actually quite simple. As Qui-Gon Jinn explains to Anakin when he was just a boy (before the pod race), the force helps you predict when things are going to happen, so you can avoid them before they actually happen.

Throughout the films, Jedi are always happily (and effortlessly) avoiding all moving boulders, laser shots, attacks from behind, and things like that. They can easily avoid danger but rarely love. Anakin couldn’t avoid falling in love with Padmé; not even the force, and he was the chosen one, could help him with that matter.

So, as it’s shown over and over in all six films, the force helps you avoid bad things, not good things. This brings me back to the famous axiom that we all know to be true, but which was beautifully postulated by Douglas Adams:

Axiom #1: “Nothing travels faster than light, with the possible exception of bad news, which follows its own rules”.

Now, if you’re following my thinking you’ve guessed it already. Midi-chlorians are nothing more than “bad news detectors”. If you haven’t been in sync with recent physics you might be puzzled, but the fact is that faster-than-light travel actually goes back in time!

Now, if midi-chlorians are bad news detectors, it’s perfectly clear that they will detect them in the past! So, you (in the past) will detect (bad) things that are about to happen, like: “Your head was just smashed by that boulder” or “Darth Vader has just cut your head off with his light sabre.”

Being trained in the Jedi arts means being able to understand this information faster than the events actually happen; otherwise you’re a dead Jedi.

Disturbances in the force

Now, you can often see Jedi masters saying “I sense a disturbance in the force” when some serious shit happens elsewhere in the galaxy, but not a single master (not even Yoda) sensed that Obi-wan was in danger when he went after Jango Fett. He had to send a crappy message through Anakin to the high council in Coruscant to be heard.

This reinforces the idea that the midi-chlorians are detectors, and that the detection quality (or precision) depends on the distance of the event and the strength of the original signal.

What about the rest?

It still doesn’t explain how a Jedi can move boulders in the first place. Nor how the light in the light sabre can be confined to a limited range and interact with light from other sources.

Indeed, it’s all connected. We cannot throw away a very good theory like that just because it doesn’t address every point of reality. Therefore, we force reality to fit our model, as usual.

So, the light sabre is not made of light, but of some condensed bad news, confined by a midi-chlorian-rich crystal. As we all know, bad news comes in bunches. A bunch of bad news is worse than just bad news, so when two sabres hit each other, sparks come out of them.

Also, some bad news won’t cope with other bad news. For instance, bad news for a Jedi is a dead Jedi, which in turn is good news for a Sith. So, when the sabre of a Jedi clashes with the sabre of a Sith, the universe conspires not to allow them to co-exist in the same space, otherwise the whole universe would cease to exist in a picosecond. Pretty much the same way black holes protect our universe from singularities.

In a similar way, Jedi (and Sith) can lift things just by thinking of doing some really bad things to the universe just above the boulder. That thinking will make the universe emit help signals to every Jedi or Sith around. Emissions must carry energy, otherwise they wouldn’t be detected by anyone. Losing energy makes the space around the boulder have a negative pressure (of energy or mass) and therefore move the object towards it.

Because bad news travels into the past, you don’t have to actually do anything to the universe at all, as thinking of doing something in the future will act in the past (your present) and lift the boulder right now. If you keep doing that in straight lines you can actually make it move in any direction you want.

As confirmation of this statement, watch the films again and see the faces of Jedi masters moving boulders (except for Darth Vader, of course, as we can’t see his face). Yoda lifting Luke’s ship, or protecting Obi-wan and Anakin from the pole Count Dooku cowardly threw at them to escape in his cool ship.

They all make pretty bad faces. They’re obviously thinking some serious shit at the universe around things. I can almost hear Yoda thinking about the air above Luke’s ship: “I’m going to transform all your atoms into plasma soup and rip your space-time continuum and mix it with a Tom Jones album”. That would freak me out for sure!


Review on root-kits in UK
November 12th, 2008 under Digital Rights, Politics, rengolin. [ Comments: 1 ]

Please, if you are in the UK, sign this petition to investigate the legality and fairness of DRM techniques, especially root-kits such as SecuROM.


On Knowledge and Power
October 14th, 2008 under Politics, Science, World, rengolin. [ Comments: 6 ]

Knowledge has always gone hand in hand with power. It still does, but recently it seems to have acquired an unusually high value, higher than most concrete things, like land. The last 10 years of the financial market show us a bit of this shift.

In the beginning…

… there was land. The old empires, from Sumer to the Mongols, had a big fixation on land. The more the better. It was quite obvious: more land always means more food, slaves and tax payers. During that period, though, much was achieved in science. Astronomy, mathematics and philosophy were the biggest advances of that age, and they had a lot to do with power, as all good kings had their own good scientists around.

But there was something missing in that connection… The scientists had the power to give advice but the kings had the power to ignore it. Astronomy was entangled with astrology, chemistry with alchemy, biology with religion… The true scientist didn’t have any power whatsoever, as kings always liked best those who would say everything would be all right.

Land is not enough (aka. knowledge goes dead)

After the first period of land-settling, when all kings and landlords had their share, it was obvious that they needed a new coin to get richer. With the collapse of the trade network during the (so-called) dark ages, new forms of power came to the surface. Faith was by far the greatest. The catholic church (and lots of sub-divisions of it) became the most powerful entity the world has known so far. This power came from the opposite of knowledge, unfortunately, and many scientists lost their lives fighting against it.

Other things were also highly valuable, like gold and gems, vassals and slaves, soldiers and castles. And they had lots of them. With the discovery of the new world, there was a new boost for land, but the old emperors were already smart and knew it was only a matter of time before all that land was consumed.

It was actually easier to work around it when the Spanish found an unbelievable amount of gold in Central America. Most colonialists started ripping off their colonies for whatever they could find, and knowledge, science or wisdom was hardly ever associated with power in that period.

People have brains, after all

With the Renaissance bravely fighting off the dark ages (for science, at least), came the age of enlightenment, when science once again had its role in power. Descartes, Newton and many others were not only digressing, but defining the intrinsic mechanisms of the universe. Later on, Darwin would set the final course for life, and Adam Smith and Karl Marx would define the next centuries in politics and economy.

Those thinkers had huge power in their hands: they were shaping how we understand life, the universe and everything, and yet concrete values were still concrete. Money was more valuable than ever, true, but land was still a stable market. Trade routes, consumers and the food market were still at the top of most governments’ priorities.

Image is everything…

It’s when we get to the 20th century that things start to get fuzzy. The consumer market turned into the most important thing in the world. More than land, trade routes and food, consumers would buy anything. If they don’t need it, you can advertise it as fundamental to their existence and they will buy it. Advertisement surpassed even faith in matters of power. Your religion, skin colour or place of birth doesn’t matter, as long as you keep buying. See the fantastic Story of Stuff to learn more about it.

Nevertheless, science didn’t stop being important to power. The atomic bomb and the impressive developments in computing are clear outcomes of that. It was so important that the image of the scientist changed from the weird guy in the dark room to the visionary guy in a wheelchair. The computer nerd image changed from the long-bearded-large-glasses weirdo to the charming multi-billionaire on his private cruiser.

People have been trying to become smarter for some time now in order to reach this grail, a clear demonstration of this power.

But with this power also comes corruption. Like in the old days, pure science is a rare myth. Governments will always prefer to invest in science that has a solid return in money.

The last 10 years and the image of knowledge

Even though Wall Street assured the world that the crisis of 1929 would never happen again, banks lending more than they should created a new crisis this year. Other crises happened in the late 90s and early 2000s. The problem this time is the image of knowledge.

In the late 90s, trust in the power of the internet (a technological and scientific concept) was so great that many old investors got greedy enough to spend millions on crap or non-existent projects. The usual risk is to invest in 10 to get 1 good return, but in this case the return was close to nothing. The image was everything.

A Harvard PhD’s paper was enough to release half a million in investment. Closer to the dot-com bubble’s climax, even smart kids would get their funding anyway, leading to the collapse of the whole internet and technology market.

Speaking of image, what better case than Enron’s? They sold the image of electricity, gas and even broadband! Today we call this vapour-ware, and they managed to make billions on that alone, fooling the government, all the major banks in the world and consumers.

Short selling was again the cause of the new financial crisis. Also blamed for the 1929 crisis, it consists of selling something you will only have in the near future, as in selling before you buy. Two things can go wrong with that:

  1. Chain of short-selling: when you sell to someone who will sell to another person (again and again) before you have even bought it in the first place. It’s not hard to see that this is a recipe for disaster. This was responsible for both the 1929 crisis and the current one.
  2. Not buying at all: as soon as you get the money you don’t actually have to buy the thing anyway. You just pretend to have it and delay the delivery. You can also borrow it from someone else and pass it on, in a circular dependency, never (ever) actually having to buy anything at all. Enron and the internet bubble were cases like this.

What’s the knowledge’s role on that?

Simple: knowledge is difficult to acquire and accumulate. It’s also quite often difficult to test and to assure consistency throughout the whole scientific domain. As with every single program ever written, it’s impossible not to have bugs. There is no such thing as a perfectly safe system.

Nobel prizes were won defining “the rules” of short-selling. When such beautiful differential equations are demonstrated by famous professors and the whole economics community crowns the very idea with a Nobel prize, it’s quite difficult to be sceptical about it.

The pen is mightier than the sword, and knowledge is mightier than land. Houses lost their real value and began spiralling to imaginary prices. Banks, hoping for ever-increasing prices, forgot that it was all a dream, lent more money than they had, and it all ended in this.

Quants, locked in their hedge-fund offices with pens and computers, dictate the future of the market. They change the way we, non-investors, buy houses to live in and raise families.

In turn, computer nerds define the way people buy clothes and books, search for knowledge and talk to their granny and grand-children. All this technology and science is shaping the world of tomorrow. It’s defining how we think, who we are and how our children will be.

Do your part!

If you have this power, do your part. If you’re a quant, do it with responsibility. If you’re a programmer, think about the future. Think about the world tomorrow, not just your pocket today. Freedom is more important than money. Education, health and security are more important than the financial market. Think of the planet, think green!

Above all, please be sensible. There are no win-win situations: someone (or ultimately the Earth) will always lose. And the more you gain, the more they (or it) will lose.


YOU are a criminal anyway…
October 13th, 2008 under Digital Rights, Life, Politics, Web, World, rengolin. [ Comments: 1 ]

DRM sucks, we all know, but I couldn’t have expressed it in a better way than that!

Of course, I’m not an artist (and he’s one of the best), but still, clear as vacuum.


Search the Web and send a girl to school
October 12th, 2008 under Media, Politics, Web, World, rvincoletto. [ Comments: 2 ]


“Most of us wish we could give more, now we can. Everyclick is a really simple way to raise money for free, just by doing something you already do,” said Polly Gowers, CEO, co-founder and winner of the WEBA Ethical Entrepreneur of the Year 2007. “As we see it, every search that is not raising money for charity is a search wasted.”

Everyclick.com works just like any other search engine, but allows users to choose the charity they would like to benefit from their searches. The revenue generated for charities comes from companies that advertise on the site. There is no sign-up fee or hidden charge for the user or the charity; it’s free giving.

 Charities of all sizes are benefiting from this new fundraising service; they range from Cancer Research to small village schools. If 10% of the UK online population used Everyclick.com for their searches, an additional £172,000 would be raised for charity every day.

How to raise more money for Camfed using Everyclick:

About Everyclick Charity Challenge

The Everyclick Charity Challenge enables us to raise more money and have the chance to win a poster campaign on 1500 Clear Channel Outdoor sites that will be viewed an estimated 192 million times.

The challenge runs from 15th October 2008 to 1st March 2009, during which time we will have a range of innovative ways to raise money online.


Cloud fuss and computing life
October 1st, 2008 under Computers, OSS, Web, rengolin. [ Comments: 2 ]

A lot has been said about cloud computing recently, culminating in the heated rant from Richard Stallman. As always, I agree in parts, but sometimes RMS can be a bit too reactionary.

I do completely agree that giving away your personal data to companies like Google, Yahoo!, Amazon and Microsoft is not desirable. It puts too much power in their hands; they own your data, your history. Problematic ownership grants, such as in Second Life, proved to be even worse. So, what’s the catch?

Cloud computing

In essence, cloud computing is doing to the internet what IBM did to big companies in the 60s. They had a big server and hundreds of dumb terminals from which you could access the system, your data and your history. Today’s dumb terminals are a bit smarter, though, but the cost of keeping consistent data and history on your own home desktop, work desktop, laptop, mobile phone, PDA and whatever else you have that accesses or deals with your data is still unbearable.

Not only that, but it’s virtually impossible to build a collection of systems that works with any kind of data on any type of device running any operating system and window manager, etc. Lots of big companies (such as Microsoft and Apple) have been trying hard at it for decades and they keep failing miserably, over and over.

Agreeing on standards (HTTP, XML, RDF) is one way to go, but the intrinsic details of every single application and the Intellectual Property paranoia the world is facing nowadays make it impossible for two different companies to agree on standards. That is, of course, when they don’t start their own standards just for the sake of having one of their own.

On the other hand, on-line software companies like Yahoo! (in its time) and Google (today) grew bigger than those two giants by doing on the internet exactly what they couldn’t do on the desktop. Cloud computing is just a beautiful name for “we keep your data safe and sound, and you pay us with the right to do whatever we want with it”.

Desktop era

Where I don’t agree with RMS, though, is that we should keep our desktops. No matter where you store your data, on Amazon’s S3 or on your desktop, if you don’t protect your data, it’s not yours anymore.

It’s not just that it’s easy to break into any machine or network given the required amount of work (NASA and the NSA being constantly owned is the ultimate proof of that); if everyone stores data on their desktops, it also becomes worthwhile to do so. Today, if you break into my desktop you’ll see a bunch of pictures of my family and my (already public and GPL) programs.

What’s the point? No point, my desktop today is just a cache of the internet, a fast access to data that is already public on the internet. My personal banking information is on my bank’s website (which I don’t trust, by the way) but that’s life. My emails are on my mail servers, my personal history and chats on my blog, my friends list on social websites, etc. If I was to store all that information on my desktop, that’d be a huge security breach.

Not only is it easier to safeguard a bunch of servers than millions of desktops, but the data is spread out. If you break into GMail to get someone’s emails you won’t (hopefully) get their banking details. If you click on a scam and lose your financial data, you won’t lose your family pictures, emails from your dead granny, and so on.

Safe cloud computing

What we need to assure is that companies like Google and Amazon not only promise not to “use your data for their own profit”, but that they will never be able to, even when they change the EULA. How? Simple: use encryption!

We need to make sure that the email service uses GPG (or any encryption/authentication scheme) not only for sending and receiving, but also for storing your emails. Google says it would spoil their fantastic advertising engine and you’d get random ads instead of ads targeted to your email. Thing is, I’m not looking for answers or searching the internet, I’m just talking to my mum! I don’t need to buy “mums on eBay at unbeatable prices”!

On-line storage is easier to work around: just be sure to encrypt the files before you send them back and forth. A simple program could do that, using the API they provide to access your data.
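For instance, here’s a minimal Python sketch of that idea (it assumes the gpg command-line tool is installed, and upload_to_storage is a made-up placeholder, not any real provider’s API): the file is encrypted locally, so the provider only ever stores ciphertext.

```python
import subprocess

def encrypt_file(path, recipient):
    """Encrypt `path` with GPG for `recipient` and return the encrypted file's path."""
    encrypted = path + ".gpg"
    # Only the holder of the recipient's private key can decrypt this file later.
    subprocess.run(
        ["gpg", "--yes", "--output", encrypted, "--encrypt", "--recipient", recipient, path],
        check=True,
    )
    return encrypted

def upload_to_storage(path):
    """Hypothetical placeholder for whatever upload call the storage API provides."""
    print("uploading", path)

# Encrypt first, upload second: the provider never sees the plain data.
upload_to_storage(encrypt_file("family-photos.tar", "me@example.com"))
```

Changing the EULA later buys them nothing, because all they ever hold is ciphertext.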

Social networks are far more complicated, though. But the way I see it is simple: it’s public data, live with it. My blog and web pages have lots of information about me. If you google my name you’ll see lots of other sites with lots of information about me, and I can’t do anything at all about it. What would be good, though, is to be able to own this data, but storing it on my desktop won’t help with that anyway.

It would be cool, though, if one could download their own information in RDF format and import it into a local tool on their desktop or into another social network. Different websites could do it automatically (some do) to exchange information about you, but again we fall into the value of data ownership, and when money is at stake, people (and companies) can get very naughty.

Conclusion

As I see it, cloud computing is inevitable, either because it’s really cool or because those companies will make you believe it’s really cool. It’s not a matter of liking it or not, it’s a matter of accepting it, but enforcing your own rules over your own data.

I will never, ever, buy anything that has DRM, Root-kits, feedback-systems, usability limitation or anything that takes away my own freedom to own what I’ve paid for, or created myself. A song, a picture of my son, my friends list, this post or my ideas, are all owned by me, no matter where I store them.

Storing copies of my data on another server should not grant them ownership, but they do reserve the right to do something with it (like targeted ads). They own a copy of your data. But if you regain control of it, by encrypting everything, you take this last right away from them.

So, if you’re really concerned that Google will profit from the start-up idea described in your email, don’t use GMail for those things, or at least encrypt what’s sensitive. If you’re concerned that Yahoo! will use your personal photos for advertising, don’t store sensitive images there.

But please, don’t go crazy, blaming a new technology, just because it’ll take away the ownership of your own data. The internet already did that, a looong time ago.

If you’re still paranoid… keep your computer safe, unplug it from the mains. Don’t take pictures, don’t blog, store all that information on your brain. Don’t talk to other people, they might use your ideas for their own benefit, or for the greater good. It’s a choice you have to make, and be consistent with it… Good luck!


OOXML update
September 23rd, 2008 under Digital Rights, OSS, Politics, Web, rengolin. [ Comments: 1 ]

A while ago I posted about how crap Microsoft’s “Open” OOXML is (GPL violations and redundancy, among other things).

Now the battle seems to have heated up: IBM threatened to step out of ISO (via slashdot) if they don’t roll back the OOXML approval.

Well, they’re big and still a bit powerful. MS is big, but falling apart. Other companies would probably join them, especially those against the approval.

Microsoft is not only failing technically, with Vista and their web platform, but also financially. They probably spent too much on .NET, Vista and stupid patents. At least the European Patent Office went on strike (I’m really amazed) because they are “granting as many patents as possible to gain financially”. I wonder if the US patent office has ever considered that…

Nevertheless, it’s always good when a big company poses against something bad and restrictive (for the future), although the reasons are seldom for the greater good. Let’s hope for the best.


On Workflows, State Machines and Distributed Services
September 21st, 2008 under Devel, Distributed, rengolin. [ Comments: none ]

I’ve been working with workflow pipelines, directly and indirectly, for quite a while now and one thing that is clearly visible is that, the way most people do it, it doesn’t scale.

Workflows (aka. Pipelines)

If you have a sequence of, say, 10 processes in a row (or a graph of dependencies) and you need to run your data through each and every single one in the correct order to get the expected result, the first thing that comes to mind is a workflow. Workflows focus on data flow rather than on decisions, so there is little you can do with the data in case it doesn’t fit your model. The result of such an inconsistency generally falls into two categories: discard until further notice, or re-run the current procedure until the data is correct.
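Stripped to the bone, a workflow is just an ordered list of steps that the data gets threaded through. A toy Python sketch (the step names are invented, only the shape matters):

```python
def clean(data):
    # first step: strip whatever we consider noise
    return data.strip()

def transform(data):
    # second step: some arbitrary processing
    return data.upper()

def export(data):
    # last step: hand the result over (here we just print it)
    print(data)
    return data

PIPELINE = [clean, transform, export]   # the order encodes the dependencies

def run(data):
    for step in PIPELINE:
        data = step(data)               # each step only sees the previous step's output
    return data

run("  some raw data  ")
```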

Workflows are normally created ad-hoc, from a few steps. But things always grow out of proportion. If you’re lucky, those few steps will get wrapped into scripts as things grow, and you end up with a workflow of workflows instead of a huge list of steps to take.

The benefit of a workflow approach is that you can run it at your own pace and assure that the data is still intact after each step. But the downfall is that it’s too easy to incorrectly blame the last step for some problem with your data. The state of the data could have been deteriorating over time, and the current step was simply the one that picked it up. Also, checking your data after each step is a very boring and nasty job, and no one can guarantee that you’ll pick up every single bug anyway.

It becomes worse when the data volume increases to a level where you can’t just eyeball it anymore. You’ll have to write syntax checkers for the intermediate results, and you’ll end up with thousands of logs and scripts just to keep the workflow running. This is the third stage: a meta-workflow for the workflow of workflows.

But this is not all, the worst problem is still to come… Your data is increasing by the minute, and you’ll have to split it up one day, rather sooner than later. But, as long as you have to manually feed your workflow (and hope the intermediate checks work fine), you’ll have to split the data manually. If your case is simple and you can just run multiple copies of each step in parallel with each chunk of data, you’re in heaven (or rather, hell for the future). But that’s not always the case…

Now, you’ve reached a point where you have to maintain a meta-workflow for your workflow of workflows and manually manage the parallelism, collisions and checks of your data, only to find out at the end that a particular piece of code was ignoring a horrendous bug in your data when it was already public.

If you want to add a new feature or change the order of two processes… well… 100mg of prozac and good luck!

Refine your Workflow

Step 1: Get rid of manual steps. The first rule is that there can be no manual step, ever. If the final result is wrong, you turn on the debugger, read the logs, find the problem, fix it and turn it off again. If you can’t afford to have wrong data go live, then write better checkers or rather reduce the complexity of your data.

One way to reduce the complexity is to split the workflow into smaller independent workflows, each of which generates only a fraction of your final data. If you have a mission-critical environment, you’re better off with 1/10th of it broken than the whole thing. Nevertheless, try to reduce the complexity of your pipeline, data structures and dependencies. When you have no changes to make, re-think the whole workflow; I’m sure you’ll find lots of problems on every iteration.

Step 2: Unitize each step. The important rule here is: each step must process one, and only one, piece of data. How does it help? Scalability.

Consider a multiple-data workflow (fig. 1 below), where you have to send the whole data set through, every time. If one of the processes is much slower than the others, you’ll have to split your data for that particular step and join it again for the others. Splitting your data once at the beginning and running multiple pipelines at the same time is a nightmare, as you’ll have to deal with the scrambled error messages yourself, especially if you still have manual checks around.

Multiple data Workflow
Figure 1: Split/Join is required at each parallel step

On the other hand, if only one unit passes through each step (fig. 2), there is no need to split or join them and you can run as many parallel processes as you want.

Single data Workflow
Figure 2: No need to split/join
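Here’s roughly what that buys you, as a toy Python sketch (the step is invented): because the interface is already “one unit in, one result out”, a standard worker pool gives you the parallelism for free, with no split or join anywhere in your code.

```python
from multiprocessing import Pool

def step_b(unit):
    # the slow step: processes one, and only one, piece of data
    return unit * 2

if __name__ == "__main__":
    units = range(100)          # each item is one independent block of data
    # One worker per core by default; the pool hands one unit to each worker,
    # so there is no explicit split at the start or join at the end.
    with Pool() as pool:
        results = pool.map(step_b, units)
    print(len(results), "units processed")
```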

Step 3: Use simple and stupid automatic checks. If possible, don’t code anything at all.

If data must be identical on two sides, run a checksum (CRC sucks, MD5 is good). If the syntax needs to be correct, run a syntax checker, preferably a schema/ontology-based automatic check. If your file is so complex or specially crafted that you need a special syntax check, re-write it to use standards (XML and RDF are good).
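The checksum case really is just a few lines; something along these lines (Python standard library, made-up file names):

```python
import hashlib

def md5sum(path):
    """MD5 of a file, read in chunks so big files don't eat all the memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# The two sides must be identical: compare digests, not sizes or timestamps.
if md5sum("local/data.xml") != md5sum("mirror/data.xml"):
    raise RuntimeError("data mismatch between local copy and mirror")
```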

Another important point about automatic checking is that you shouldn’t have to sit watching your inbox waiting for an error message. When the subject contains the error message it’s already a pain, but when you have to grep for errors inside the body of the message? Oh god! I’ve lost a few lives already because of that…

Only mail when a problem occurs, and only send the message related to that specific problem. It’s ok to send a weekly or monthly report just in case the automatic checks miss something. Go on and check the data for yourself once in a while and don’t worry; if things really screw up, your users will let you know!
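One low-effort way of getting “mail only the errors” is to hang an SMTP handler on a standard logger and set its threshold to ERROR; a sketch with made-up hosts and addresses:

```python
import logging
from logging.handlers import SMTPHandler

logger = logging.getLogger("pipeline")
logger.setLevel(logging.INFO)

# Only records at ERROR level or above ever reach this handler,
# so the INFO/WARNING noise never lands in your inbox.
mailer = SMTPHandler(
    mailhost="smtp.example.com",            # made-up host
    fromaddr="pipeline@example.com",
    toaddrs=["oncall@example.com"],
    subject="[pipeline] ERROR",
)
mailer.setLevel(logging.ERROR)
logger.addHandler(mailer)

logger.info("step finished")                     # below the threshold: no mail
logger.error("checksum mismatch on chunk 42")    # this one triggers an email
```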

Automation

But what’s the benefit of allowing your pipeline to automatically compute individual values and check for consistency if you still have to push the buttons? What you want now is a way of having more time to watch the pipeline flowing and fix architectural problems (and have longer tea-time breaks), rather than putting out fires all the time. To calculate how many buttons you’ll press, just multiply the number of data blocks you have by the number of steps… It’s a loooong way…

If you still like pressing buttons, that’s ok. Just skip step 2 above and all will be fine. Otherwise, keep reading…

To automate your workflow you have two choices: either you fire one complete workflow for each data block, or you flow the data through a set of different services.

Complete Workflow: State Machines

If your data is small or infrequent, or you just like the idea, you can use a state machine to build a complete workflow for each block of data. The concept is rather simple: you receive the data and fire it through the first state. The machine will carry on, sending the changed data through all the necessary states and, at the end, you’ll have your final data in place, checked and correct.

UML is pretty good at defining state machines. For instance, you can use a state diagram to describe how your workflow is organised, class diagrams to show how each process is constructed and sequence diagrams to describe how processes talk to each other (preferably using a single technology). With UML, you can generate code and vice-versa, making it very practical for live changes and documentation purposes.

The State Design Pattern allows you to have a very simple model (each state is of the same type) with only one point of decision about where to go next (when changing states): the state itself. This gives you the power to change the connections between the states easily and with very (very) little work. It’ll also save you a lot on prozac.
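A bare-bones sketch of the pattern (invented states, in Python): every state implements the same interface and returns the next state, so re-wiring the workflow means touching one return statement, not the whole machine.

```python
class State:
    def run(self, data):
        """Process `data`, return (next_state, new_data); a next_state of None ends the machine."""
        raise NotImplementedError

class Fetch(State):
    def run(self, data):
        data = data + ["fetched"]
        return Check(), data            # the state itself decides where to go next

class Check(State):
    def run(self, data):
        if "fetched" not in data:
            return Fetch(), data        # data not ready: loop back
        return Publish(), data

class Publish(State):
    def run(self, data):
        print("publishing", data)
        return None, data               # no next state: we're done

def run_machine(start, data):
    state = start
    while state is not None:
        state, data = state.run(data)
    return data

run_machine(Fetch(), [])
```

Swapping Check for a stricter checker, or inserting a new state between Fetch and Publish, only touches the states involved.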

If you got this far you’re really interested in workflows or state machines, so I assume you also have a workflow of your own. If you do, and it’s a mess, I also believe that you absolutely don’t want to re-code all your programs just to use UML diagrams, queues and state machines. But you don’t need to.

Most programming languages allow a shell to be created and an arbitrary command to be executed. You can then manage the inter-process administration (creating/copying files, fifos, flags, etc), execute the process and, at the end, check the data and choose the next step (based on the current state of your data).
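In Python, for example, that wrapping can be as thin as the sketch below (the tool and file names are made up): the legacy program runs untouched, and the wrapper only stages the files, checks the exit status and hands the result to whatever comes next.

```python
import shutil
import subprocess

def run_step(command, input_file, output_file):
    # inter-process administration: stage the input where the old tool expects it
    shutil.copy(input_file, "work/input.dat")

    result = subprocess.run(command, shell=True)     # run the unchanged legacy program
    if result.returncode != 0:
        raise RuntimeError("step failed: " + command)

    # collect the result for the next state to pick up
    shutil.move("work/output.dat", output_file)
    return output_file

run_step("legacy_tool work/input.dat work/output.dat",
         "incoming/block42.dat", "done/block42.dat")
```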

This methodology is simple, powerful and straightforward, but it comes at a price. When you get too many data blocks flowing through, you end up with lots of copies of the same process being created and destroyed all the time. You can, however, leave the machine running and only feed it data blocks, but this still doesn’t scale the way we wanted in step 2 above.

Layered Workflow: Distributed Services

Now comes the holy (but very complex) grail of workflows. If your data is huge, constantly flowing, CPU-demanding and with awkward steps in between, you need to program with parallelism in mind. The idea is not complex, but the implementation can be monstrous.

In figure 2 above, you have three processes, A, B and C, running in sequence, and process B has two copies running because it takes twice as long as A and C. It’s that simple: the longer a step takes to finish, the more copies you run in parallel to keep the flow constant. It’s like sewage pipes: rain water can flow in small pipes, but house waste will need much bigger ones. Later on, when you filter out the rubbish, you can use small pipes again.

So, what’s so hard about implementing this scenario? Well, first you have to take into account that those processes will be competing for resources. If they’re on the same machine, CPU will be a problem. If you have a dual core, the four processes above will share CPU, not to mention memory, cache, bus, etc. If you use a cluster, they’ll all compete for network bandwidth and space on shared filesystems.

So, the general guidelines for designing robust distributed automatic workflows are:

  • Use a layered state architecture. Design your machine in layers, separate the layers into machines or groups of machines and put a queue or a load-balancer between each layer (state), as in the sketch after this list. This will allow you to scale much more easily, as you can add more hardware to a specific layer without impacting the others. It also allows you to switch off defective machines or do any maintenance on them with zero down-time.
  • One process per core. Don’t spawn more than one process per CPU, as this will impact performance in more ways than you can probably imagine. It’s just not worth it. Reduce the number of processes or steps, or just buy more machines.
  • Use generic interfaces. Use the same queue / load-balancer for all state changes and, if possible, make their interfaces (protocols) identical so the previous state doesn’t need to know what’s in the next and you can change from one to another at zero cost. Also, make the states implement the same interface in case you don’t need queues or load-balancers for a particular state.
  • Include monitors and health checks in your design. With such a complex architecture it’s quite easy to miss machines or processes failing. Separate reports into INFO, WARNING and ERROR, give them priorities or different colours on a web interface, and mail or SMS only the errors to you.
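To make the layered idea concrete, here’s a toy, single-machine Python sketch with in-process queues standing in for the real brokers or load-balancers (in a real deployment each layer would be its own group of machines): every layer only knows its input and output queue, so you can scale or swap a layer without touching the others.

```python
import queue
import threading

def worker(step, inbox, outbox):
    """Generic layer worker: pull one unit, process it, push the result to the next layer."""
    while True:
        unit = inbox.get()
        if unit is None:                           # shutdown signal
            break
        result = step(unit)
        if outbox is not None:
            outbox.put(result)

# One queue in front of each layer (stand-ins for a real queue server or load-balancer).
q_a, q_b, q_c = queue.Queue(), queue.Queue(), queue.Queue()

layers = [
    (lambda u: u + 1, q_a, q_b),                   # layer A
    (lambda u: u * 2, q_b, q_c),                   # layer B, copy 1 (the slow layer gets two copies)
    (lambda u: u * 2, q_b, q_c),                   # layer B, copy 2
    (lambda u: print("done:", u), q_c, None),      # layer C
]

threads = [threading.Thread(target=worker, args=layer) for layer in layers]
for t in threads:
    t.start()

for unit in range(10):                             # feed units into the first layer only
    q_a.put(unit)

# Orderly shutdown: drain and stop one layer before signalling the next,
# one shutdown signal per worker on each queue.
q_a.put(None)
threads[0].join()
q_b.put(None)
q_b.put(None)
threads[1].join()
threads[2].join()
q_c.put(None)
threads[3].join()
```

Adding a second copy of layer A would just be one more entry in the list; in the distributed version it would be one more machine behind that layer’s queue.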

As you can see, by providing layered load-balancing, you’re getting performance and high availability for free!

Every time data piles up in one layer, just increase the number of processes in it. If a machine breaks, it’ll be taken out of the rotation (automatically if you’re using queues, ping-driven or MAC-driven for load-balancers). Updating the operating system, your software or anything else on the machine is just a matter of taking it out, updating, testing and deploying it back again. Your service will never be off-line.

Of course, to get this benefit for real you have to remove all single points of failure, which is rarely possible. But to a certain degree, you can get high performance, high availability, load-balancing and scalability at a reasonably low cost.

The initial cost is big, though. Designing such a complex network and its sub-systems, providing all the safety checks and organising a big cluster is not an easy (nor cheap) task, and definitely not one for inexperienced software engineers. But once it’s done, it’s done.

More Info

Wikipedia is a wonderful source of information, especially for the computer science field. Search for workflows, inter-process communication, queues, load-balancers, commodity clusters, process-driven applications, message passing interfaces (MPI, PVM) and some functional programming like Erlang and Scala.

It’s also a good idea to look for UML tools that generate code and vice-versa, RDF ontologies and SPARQL queries. XML and XSD are also very good for validating data formats. If you haven’t yet, take a good look at design patterns, especially the State Pattern.

Bioinformatics and internet companies have a particular affection for workflows (hence my experience), so you may find numerous examples in both fields.


Shortlist for Computer Awards Announced
September 15th, 2008 under Author, Technology, rvincoletto. [ Comments: 2 ]

Just a quick note to say Computer Awards has announced their shortlist for this year… and guess what… they think I deserve to be among the eight finalists…

Who knows… The winners will be announced at a glittering prize-giving ceremony to be held on 5 November.

Fingers crossed!


Intel’s Game Demo Contest announce winners
September 15th, 2008 under Devel, OSS, Software, rengolin. [ Comments: none ]

…and our friend Mauro Persano won in two categories: 2nd in Intel graphics and 5th for best game on the go.

The game, Protozoa, is a retro, Petri-dish-style, frenetic shoot-the-hell-out-of-the-bacteria-virii-and-protozoa-that-come-your-way kind of thing. You can play with a PS2-style (two analogue sticks) controller, one stick for movement and the other for shooting, or just use the keyboard. The traditional timed power-ups and megalomaniac explosions raise the sense of nostalgia even more.

You can download the latest Windows version here, but don’t worry, it also runs pretty well under Wine.

Have fun!


« Previous entries