Game Theory and the fate of a generation
May 24th, 2013 under Life, rengolin. [ Comments: none ]

An interesting thought came up via Bruce Schneier’s blog and got me thinking. As I was having trouble educating my pre-teen child, that thought grew on me, and now I can explain many of his behaviours by an inability to spot which game to play in real life.

When I finally had this conversation with him, a whole model of how our society is becoming a failure appeared, clear as day, to both of us!

What games do we play?

First, a crash course on game theory; you can skip this part if you already know it. Basically, a game is played between two players, each deciding what to do based on what they think the other player will do, and points are awarded according to whether you cooperate or not, combined with whether the other player cooperates or not. For example, if both cooperate, both get 5 points. If one cooperates and the other doesn’t, the cheater gets 7 points and the loser gets 0. If they’re both cheaters, both get 1 point.

Well, since you have no idea what the other player will choose, there’s a 50% chance that she will cooperate and a 50% chance that she will not. If you choose to cooperate, you have a 50% chance of getting 5 points and a 50% chance of getting zero. If you don’t, it’s 50% for 7 points and 50% for 1 point. On average, cooperating is worth 2.5 points and cheating is worth 4.
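The arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming exactly the payoff table from the example; the names `PAYOFF` and `expected_points` are made up for illustration. It also shows that the 50% guess doesn't matter: since 7 > 5 and 1 > 0, cheating beats cooperating for any probability that the other player cooperates.

```python
# Points for (my move, other player's move), taken from the example above.
PAYOFF = {
    ("cooperate", "cooperate"): 5,
    ("cooperate", "cheat"): 0,
    ("cheat", "cooperate"): 7,
    ("cheat", "cheat"): 1,
}


def expected_points(my_move: str, p_other_cooperates: float) -> float:
    """Average points for my_move when the other player cooperates with probability p."""
    p = p_other_cooperates
    return p * PAYOFF[(my_move, "cooperate")] + (1 - p) * PAYOFF[(my_move, "cheat")]


if __name__ == "__main__":
    for move in ("cooperate", "cheat"):
        print(move, expected_points(move, 0.5))  # cooperate 2.5, then cheat 4.0
    # Cheating wins for every probability, not just 50/50.
    print(all(expected_points("cheat", p / 10) > expected_points("cooperate", p / 10)
              for p in range(11)))  # True
```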

Clearly, if you play the game only once, cheating is the answer. There is no reason not to cheat. However, if you have to play the same game with the same player more than once, possibly for your whole life, then, well, cheating tires quickly. If you cheat now, the other player will cheat next time, and both of you will keep cheating forever, since each of you knows that if you stop, you’ll get 0 points and the other will get 7. We call this a stable solution: once you get there, there’s no coming back.

However, if both cooperate, both get 5, and as long as you both cooperate, you’ll always get 5. Sure, it’s not as profitable as 7, but it’s close enough. But as soon as one cheats, the other will feel betrayed and will cheat too. We call this an unstable solution. It demands trust in the other player, and once that trust is broken, it’s very hard to regain.
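To see why repetition changes the picture, here is a small simulation of the same payoffs over many rounds. It is a sketch, not a model from this post: it assumes the well-known tit-for-tat strategy (cooperate first, then copy the other player's last move) as the trusting player and an always-cheat opportunist; all function names are hypothetical.

```python
from typing import Callable, List, Tuple

# Same payoffs as the one-off example: (my points, their points); "C" cooperate, "D" cheat.
PAYOFF = {
    ("C", "C"): (5, 5),
    ("C", "D"): (0, 7),
    ("D", "C"): (7, 0),
    ("D", "D"): (1, 1),
}

# A strategy sees the opponent's past moves and returns "C" or "D".
Strategy = Callable[[List[str]], str]


def always_cheat(opponent_history: List[str]) -> str:
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy whatever the other player did last round.
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` repetitions and return each player's total points."""
    history_a: List[str] = []
    history_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(history_b)
        move_b = b(history_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))    # (50, 50)
    print(play(always_cheat, tit_for_tat, 10))   # (16, 9)
    print(play(always_cheat, always_cheat, 10))  # (10, 10)
```

Over 10 rounds, two cooperating players collect 50 points each, while the cheater wins the first round and then both are stuck at 1 point per round, which is the stable mutual cheating described above.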

If that made you think about how life treats you, it’s no coincidence. John Nash used that language to describe reality, and he could see reality more clearly than most of us. When John Nash says that “life is a game”, he truly means it; he came up with the mathematical notation to prove it and studied it at great length.

Video Games

In the beginning, there was Pong. Pong was simple and fun. Then the explosion of video games in the 80s brought a lot of easy and hard games, but in almost all of them you had to work hard to get the prize. Some of them didn’t even have a prize; it was just an endless series of repetitions, faster and faster, and the real competition was among the players, over who got the best score.

The real game, however, was not on the screen but in the player’s brain. Those games conditioned people to believe that there is a prize and there is a task, and that they are related. If you perform the task better than a certain threshold, the prize is bronze/silver/gold. It feels really good to get a prize, and that way of making people feel good (or bad) was described about a century ago by Ivan Pavlov.

But video games are as Pavlovian as street games. They’re as innocent, and as powerful on people’s minds, as any Olympic game. Video games exercise a different part of the body, the brain, and for that reason they were much more popular amongst nerds than sporty types. They had found a niche, at least until the 90s arrived, when a boom of consoles, PCs and 3D graphics made video games mainstream, with every house having at least one kind of video game.

That boom changed little in how games were teaching children about the world. There was still a task, a reward, and some work to do. Even though any task you performed during the game was, at the end of the day, worthless in real life, the lesson you learnt (that you need to perform a task well enough to get a prize, and that the prize is proportional to the hard work you put in) was learnt for life.

Social Gaming

Enter the era of social gaming. Zynga and other Facebook games were made not to entertain, or to give prizes for specific tasks, but to reward the most socially active players. All that, of course, was meant to give Facebook a boost in user numbers (and Zynga a boost in fake value), but it changed not only how games were played but also the lessons we learnt from them.

In a social game, since the objective is to share more than others, you get things for free to share with your friends, who also get free stuff to share with you. This means that whoever has the most “friends” gets the most free stuff and progresses faster and further in the game. What it teaches you is that you don’t need to work hard for something; you just need to convince people to give it to you for free or, even worse, you just need to wait for it, because it’s the player’s right to receive it.

Now, what children are learning from these games is that they don’t need to work hard for anything, because they have the right to be happy, the right to be fed, the right to be given jobs, or to be subsidised by the government.

If that sounds a lot like reality, well, welcome to the brave new world!

Addiction

So, we know how powerful Facebook is, and much of that power came from the games section in the early days, which made people spend more time on Facebook than in real life; now it’s just an addiction they cannot break free from. The reason it’s an addiction is the very same reason heroin is an addiction.

Whenever you use a psychotropic drug, your brain goes into a state that is not real. Whatever you feel, whatever you see, is not real. You can see good things or bad things, and that will change how your addiction continues, but some drugs are more powerful than that. For instance, tobacco changes the concentration at which your brain and peripheral nervous system respond to neurotransmitters, because nicotine is a joker in the land of neurotransmitters. It can trigger more than half of the different types of receptors in your body. Whenever you lower that concentration (by abstaining), your body doesn’t react the way you would want, and you go into withdrawal, which compels you to smoke again.

Most drugs have the same effect, and so do easy, over-rewarding video games. Note that not all video games act like drugs; it’s only the specific class of games where you get more than you deserve for the amount of work you put in. And that’s the same kind of addiction people have to films, series, books and anything else that takes you away from harsh reality into a land of dreams where you are more than you could ever actually be (super heroes) or where you have accomplished more than you actually worked for (fantasy and feel-good stories).

The crucial bit here is that going back to reality is hard, painful and brings a deep feeling of loss, since all the “hard work” you put into the game/film/book is gone and worthless. That feeling puts you in a dilemma: having lost a lot of real time in which you could have been doing something useful, while other people are already harvesting the fruits of their own work (a younger child playing the piano or solving puzzles you cannot), you’ll have to work much harder to reach the same level. Whereas, if you go back to the game, you get instant satisfaction with very little effort. If you have no responsibilities in your life, the choice is easy.

Conflict

This creates a conflict with the parents because not only did they have to work hard to bring up their children in the best environment possible, but they also see their children waste their time on a false reality, while the children fail to understand why their parents’ reality is so different from their own.

I have played video games since I was very young and still play them constantly, but I simply cannot play social games. They feel wrong, false and demeaning to the very hard work I learnt to value as a kid. Moreover, they remind me of the kind of society we live in today, where children can’t fail.

For example, in Brazil, not enough people were reaching university because they would fail so many times that they’d drop out of school and never come back. How do you fix this? Simple: pass a law saying that kids younger than 10 cannot fail. Ever. Well, surprise, they reached 10 without being able to read or write, and that’s the state’s fault, so how do you fix this? Even simpler: pass a law saying that kids under 15 cannot fail. You get the idea.

This overprotection of kids by schools, and society’s attempt to postpone the problems of growing up and taking responsibility until very late, is possibly responsible for the increase in crime among the young and for the wish of some people to lower the age of criminal responsibility to 16. It’s not hard to see that, again, that solution is only going to make things worse, by treating children like adults without giving them a chance to understand adulthood before it’s too late.

Game playing society

Since social gaming became mainstream a few years ago, people have started noticing how to use it for benefit and profit. Real-life games like Foursquare give you prizes for over-consumption, on the assumption that your personal information is worthless to you, but not to them. Games where you feel you’re trading a worthless commodity (your privacy) for big rewards (a cup of coffee), while in reality the companies are making the real profit (your private information), are where our society is heading, and it doesn’t seem to bother many people.

We are already brainwashed into believing that sharing our personal emails with Google is OK, as long as they keep the servers up. We store our credit card numbers on Amazon for the comfort of not having to type them so often, trusting that they will protect our data as if it were their own. We already believe that the cloud is the best place to store our photos, documents and music. While all of that looks free to you, it’s far from it. It’s all a game in which you are being cheated while willingly cooperating, but they keep your profit positive (albeit small), so that you feel valued.

We have already let our guard down; we’re living in that fantasy where we don’t have to work hard for anything, we have convinced ourselves that the profit is ours, and in this fantasy world, we’re great. Easy prey for ever more relaxed predators. Maybe that will be the end of them… I hope.

Playing the wrong game

Now we pause to go back to the main theme: why do people play a one-off game when they should actually be playing a rolling game?

A hundred years ago, justice wasn’t very just. Judge and executioner were often the same person, and people paid a lot more than they should have for crimes they may not even have committed. But, as bad as it was, it taught most people that the odds of cheating weren’t that great. The price was too high, and they’d see it paid far too often.

Years passed, people agreed that totalitarian regimes are not nice, and we came up with democracy, republics and other less radical forms of government. Now people have rights, inalienable and universal. Governments have to protect people, and people can now be what they want, follow their dreams and collect the fruits of their hard work. And the more educated people get, the more they realise they can claim more rights.

In itself, having rights is the right thing (pun intended), but there has to be a balance, and the balance is social interaction. Your rights are the same as everyone else’s, and you can’t just do what you “want”, only what you have the right to do. Well, clever people can turn those concepts around, and they will cheat, and they will profit. Because they are protected by law, they will find ways of abusing the system short of breaking the law. If they get caught, the price is high, but since they have more rights than duties, and since justice is less impressive (but more just) nowadays, the perception of costs and profits is skewed, so people cheat more often than they would if they were thinking straight.

We can’t have the concept of birth rights without the concept of birth duties. You have the right to education, but you also have the duty to follow it through, no matter how hard it seems. It’s the teachers’ duty to do their best to make it more efficient (not easier), but it’s also their right to choose what they think is best for the kids. If rights and duties don’t go hand in hand, you get a lazy generation that thinks other people have to do whatever they want. Today, children think that of their parents; what about tomorrow? Will they expect their children to work for them? Or their brothers? It doesn’t add up. They’re not playing a rolling game, but a one-off one.

When you throw over-rewarding games into the mix, you get kids learning that they can just be lazy and the world will fix things for them while they get cheap happiness on their tablets. They’re cheating the system that protects them until they turn 18, when the system will simply abandon them and hard reality will hit them in the face, unprepared and without warning. Some survive, some don’t. Would you take that chance with your children?


Uno score keeper
March 31st, 2013 under Devel, OSS, rengolin, Software. [ Comments: none ]

With spring refusing to arrive, we had to improvise during the Easter break and played Uno every night. It’s a lot of fun, but it can take quite a while to find a clean piece of paper and a pen that works around the house, so I wondered if there was an app for that. It turns out, there wasn’t!

There were several apps to keep card game scores, but every one was specific to one game, had ads and wanted access to the Internet, so I decided it was worth writing one myself. Plus, it would finally teach me to write Android apps, something I had been putting off for years.

The App

[Screenshots: “Adding new players” and “Card Game Scores”]

The app is not just a Uno score keeper; it’s actually pretty generic. You just keep adding points until someone passes the threshold, at which point the poor soul is declared the winner or the loser, depending on how you set up the game. Since we’re playing every night, even the 30 seconds I spent re-typing our names was adding up, so I made it save the last game in the Android tuple store, so you can retrieve it via the “Last Game” button.
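The scoring logic is simple enough to sketch in a few lines. This is not the app's source; it's a minimal Python illustration with a hypothetical `ScoreKeeper` class: points are added each round until someone reaches the threshold, a flag decides whether that player wins or loses, and a ranking lists names from winner to loser. Saving the last game is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScoreKeeper:
    """Hypothetical generic score sheet: add points per round until someone hits the threshold."""
    players: List[str]
    threshold: int
    highest_wins: bool = True   # False: whoever reaches the threshold loses
    scores: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every player starts a new game at zero.
        self.scores = {name: 0 for name in self.players}

    def add_round(self, points: Dict[str, int]) -> None:
        """Add one round of points; players missing from `points` score nothing."""
        for name, pts in points.items():
            self.scores[name] += pts

    def finished(self) -> bool:
        # "Passing the threshold" is treated here as reaching or exceeding it (an assumption).
        return any(score >= self.threshold for score in self.scores.values())

    def ranking(self) -> List[str]:
        """Player names ordered from winner to loser."""
        return sorted(self.players, key=lambda name: self.scores[name],
                      reverse=self.highest_wins)


if __name__ == "__main__":
    game = ScoreKeeper(["Ana", "Bruno", "Carla"], threshold=500, highest_wins=False)
    game.add_round({"Ana": 120, "Bruno": 40, "Carla": 300})
    game.add_round({"Ana": 90, "Bruno": 15, "Carla": 210})
    print(game.finished(), game.ranking())  # True ['Bruno', 'Ana', 'Carla']
```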

It’s also surprisingly easy to use (I had no idea it would be), and if you go back and forth inside the app, it clears the game and starts a new one with the same players, so you can play as many rounds as you want. I might add a button to restart (or leave the app) when there’s a winner, though.

I’m also thinking about listing the names in order at the end (from winner to loser), and some other small changes, but as it is, it’s good enough to advertise and see what people think.

If you end up using it, please let me know!

Download and Source Code

The app is open source (GPL), so rest assured it has no tricks or money involved. Feel free to download it from here, and get the source code at GitHub.


Distributed Compilation on a Pandaboard Cluster
February 13th, 2013 under Devel, Distributed, OSS, rengolin. [ Comments: 2 ]

This week I was experimenting with distcc and Ninja on a Pandaboard cluster, and it behaves exactly as I expected, which is a good thing, but it might not be what I was looking for, which is not. ;)

Long story short, our LLVM buildbots were running very slowly, taking from 3 to 4.5 hours to compile and test LLVM. If you consider that at peak time (PST hours) there are up to 10 commits in a single hour, the buildbot will end up testing 20-odd patches at the same time. If it breaks in unexpected ways, or if there is more than one patch in a given area, it might be hard to spot the culprit.

We ended up just skipping the make clean step, which got us to around 15 minutes for build and tests, with the odd build taking 1 or 2 hours tops, which is a great deal. But one of the alternatives I was investigating was a distributed build, especially given the availability of cluster nodes with dozens of ARM cores inside: we could use such a cluster to speed up our native testing, and even run benchmarks in a distributed way. If we do it often enough, the sample might be big enough to account for the differences.

The cluster

So, I got three Pandaboard ES boards (dual Cortex-A9, 1GB RAM each), put the stock Ubuntu 12.04 on them, installed the bare minimum (vim, build-essential, python-dev, etc.) and upgraded to the latest packages, and they were all set. Then I needed to find the right tools to get a distributed build going.

It took a bit of searching, but I ended up with the following tool-set:

  • distcc: The distributed build dispatcher, which knows about the other machines in the cluster and how to send them jobs and get the results back
  • CMake: A build-file generator that LLVM can use; it’s much better than autoconf, and it can also generate Ninja files!
  • Ninja: The new intelligent builder, which is not only faster at resolving dependencies, but also makes it very easy to change the rules to use distcc, and has a magical new feature called pools, which allows me to scale job types (compilers, linkers, etc.) independently.

All three tools had to be compiled from source. Distcc’s binary distribution for ARM is too old, CMake’s version on that Ubuntu couldn’t generate Ninja files and Ninja doesn’t have binary distributions, full stop. However, it was very simple to get them interoperating nicely (follow the instructions).

You don’t have to use CMake; there are other tools that generate Ninja files, but since LLVM uses CMake, I didn’t have to do anything. What you don’t want is to write the Ninja files yourself; it’s just not worth it. Unlike Make, Ninja doesn’t search for patterns and possibilities (this is why it’s fast), so you have to be very specific in the Ninja file about what you want to accomplish. This is very easy for a program to do (like CMake), but very hard and error-prone for a human (like me).

Distcc

To use distcc is simple:

  1. Replace the compiler command with distcc <compiler> in your Ninja rules;
  2. Set the environment variable DISTCC_HOSTS to the list of IPs of the machines that will be the slaves (including localhost);
  3. Start the distcc daemon on all slaves (not on the master): distccd --daemon --allow <MasterIP>;
  4. Run ninja with the total number of CPUs on all machines, plus 1 for each machine. E.g.: ninja -j6 for 2 Pandaboards.
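The job-count rule of thumb from step 4, and the host list from step 2, can be computed rather than counted by hand. A minimal sketch: the helper names and the example IP address are made up; only the DISTCC_HOSTS variable and ninja's -j flag come from the steps above.

```python
from typing import Dict


def ninja_jobs(cores_per_host: Dict[str, int]) -> int:
    """Step 4's rule of thumb: every CPU in the cluster, plus one per machine."""
    return sum(cores + 1 for cores in cores_per_host.values())


def distcc_hosts(cores_per_host: Dict[str, int]) -> str:
    """Space-separated host list for DISTCC_HOSTS (step 2), in insertion order."""
    return " ".join(cores_per_host)


if __name__ == "__main__":
    # Two dual-core Pandaboards: the master (localhost) and one slave (made-up IP).
    cluster = {"localhost": 2, "192.168.1.11": 2}
    print(f'DISTCC_HOSTS="{distcc_hosts(cluster)}" ninja -j{ninja_jobs(cluster)}')
    # DISTCC_HOSTS="localhost 192.168.1.11" ninja -j6
```

With a third dual-core board the same rule gives 9 jobs, matching the three-Panda run described further on.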

A local build of just LLVM (no Clang, no check-all) on a single Pandaboard takes about 63 minutes. With distcc and 2 Pandas, it took 62 minutes!

That’s better, but not as much as one would hope for, and the reason is a bit obvious, but no less damaging: the linker! It took 20 minutes to compile all of the code, and 40 minutes to link it into executables. That happened because, while we had 3 compilation jobs on each machine, we had 6 linking jobs on a single Panda!

See, distcc can spread the compilation jobs around, as long as it copies the objects back to the master, but because a linker needs all the objects in memory to do the linking, it can’t do that over the network. What distcc could do, with Ninja’s help, is to know which objects will be linked together and keep copies of them on different machines, so that you can link on separate machines, but that is not a trivial task, and it relies on a level of interoperation between the tools that they were not designed for.

Ninja Pools

And that’s where Ninja proved worthy of its name: Ninja pools! In Ninja, pools are named groups of jobs with a fixed limit (the depth) on how many of them can run at the same time. You can say that compilers scale freely, but linkers can’t run more than a handful at once. You simply need to create a pool called linker_pool (or anything you want), give it a depth of, say, 2, and annotate all linking jobs with that pool. See the manual for more details.
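To show how small the change is, the function below writes a minimal build.ninja by hand. You wouldn't do this for LLVM (CMake generates it), but it shows the two pieces involved: a compile rule wrapped in distcc and a link rule assigned to a pool with depth 2. The `pool`/`depth` declaration and the `pool =` rule variable follow the Ninja manual; the function name, file names and the use of gcc are assumptions for the example.

```python
from typing import Dict, List


def render_build_ninja(programs: Dict[str, List[str]], link_depth: int = 2,
                       cc: str = "gcc") -> str:
    """Return the text of a tiny build.ninja: distcc compiles, pooled links.

    programs maps each executable to its C sources; each source is assumed
    to belong to a single program (Ninja rejects duplicate outputs).
    """
    lines = [
        # At most `link_depth` jobs in this pool run at the same time.
        "pool linker_pool",
        f"  depth = {link_depth}",
        "",
        # Compiles go through distcc and are limited only by ninja -jN.
        "rule cc",
        f"  command = distcc {cc} -c $in -o $out",
        "",
        # Links run locally, throttled by linker_pool.
        "rule link",
        f"  command = {cc} $in -o $out",
        "  pool = linker_pool",
        "",
    ]
    for program, sources in programs.items():
        objects = []
        for src in sources:
            obj = src.rsplit(".", 1)[0] + ".o"
            lines.append(f"build {obj}: cc {src}")
            objects.append(obj)
        lines.append(f"build {program}: link {' '.join(objects)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_build_ninja({"hello": ["main.c", "util.c"]}), end="")
```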

With the pools enabled, a distcc build on 2 Pandaboards took exactly 40 minutes. That’s about 37% less time (63 minutes down to 40) with double the resources; not bad. But how does that scale if we add more Pandas?

How does it scale?

To get a third point (and be able to apply a curve fit), I added another Panda and ran it again, with 9 jobs and the linker pool at 2, and it finished in 30 minutes. That’s less than half the time with three times the resources. As expected, it’s flattening out, but how many more boards can we add and still profit?

I don’t have an infinite number of Pandas (nor do I want to spend all my time on this), so I just cheated: I got a curve-fitting program (xcrvfit, in case you’re wondering), cooked up an exponential that was close enough to the points and used the software’s ability to do a best fit. It came out with t(x) = 86.806*exp(-0.58505*x) + 14.229, where x is the number of boards and t the build time in minutes, which, according to Lybniz, flattens out after 4 boards (about 20 minutes).
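The fitted curve is easy to evaluate directly, which makes the "flattens out" claim concrete: each extra board saves less time than the previous one, and the constant term (about 14.2 minutes) is the floor the model predicts no matter how many boards are added. A sketch using only the formula and the three measurements above; the marginal-savings printout is just an illustration.

```python
import math

# Build times in minutes measured above, keyed by number of Pandaboards.
MEASURED = {1: 63, 2: 40, 3: 30}


def fitted_minutes(boards: float) -> float:
    """The exponential best fit reported above: 86.806*exp(-0.58505*x) + 14.229."""
    return 86.806 * math.exp(-0.58505 * boards) + 14.229


if __name__ == "__main__":
    previous = None
    for n in range(1, 7):
        t = fitted_minutes(n)
        measured = f", measured {MEASURED[n]}" if n in MEASURED else ""
        saved = f", saves {previous - t:.1f} min" if previous is not None else ""
        print(f"{n} boards: {t:.1f} min{measured}{saved}")
        previous = t
```

The fit stays within about 1.2 minutes of each measurement, and from the fifth board on each addition saves under 4 minutes.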

Pump Mode

Distcc has a special mode called pump mode, in which it sends, along with the C file, all the headers necessary to compile it entirely on the remote node. Normally, distcc preprocesses the file on the master node and sends the preprocessed result to the slaves, which compile it to object code. According to the manual, this could improve performance 10-fold! Well, my results were a little less impressive; actually, my 3-Panda cluster finished in about 34 minutes, 4 minutes more than without pump mode, which is puzzling.

I could clearly see that the files were being compiled on the slaves (distccmon-text showed that, whereas before there were a lot of “preprocessing” jobs on the master), but Ninja doesn’t print timestamps on each output line, so I couldn’t tell what slowed it down. I don’t think there was any effect on the linker jobs, since the linker pool was still enabled in this mode.

Conclusion

Simply put, both distcc and Ninja pools have proven to be worthy tools. On slow hardware, such as the Pandas, distributed builds can be an option, as long as you have a good balance between compilation and linking. Ninja could be improved to help distcc link on remote nodes as well, but that’s a wish I would not press on the team.

However, scaling only to 4 boards takes away a lot of the value for me, since I was expecting to use 16 or 32 cores. The main problem, again, is the linker jobs running solely on the master node, and LLVM having lots and lots of libraries and binaries. Ninja’s pools can also work well when compiling LLVM+Clang in debug mode, since the objects are many times bigger, and even on an above-average machine you can start swapping, or even freeze your machine, if you are using other GUI programs (browsers, editors, etc.).

In a nutshell, the technology is great and works as advertised, but for LLVM it might not be the answer yet. It’s still more profitable to get faster hardware, like the Chromebooks, which are 3x faster than the Pandas and cost only marginally more.

It would also be good to know why pump mode regressed in performance, but I have no more time to spend on this, so I leave it as an exercise to the reader. ;)


LLVM Vectorizer
February 12th, 2013 under Algorithms, Devel, rengolin. [ Comments: 2 ]

Now that I’m back working full-time with LLVM, it’s time to get some numbers about performance on ARM.

I’ve been digging into the new LLVM loop vectorizer and I have to say, I’m impressed. The code is well structured, extensible and, above all, sensible. There is a lot of room for improvement, and the code is simple enough that you can improve it without destroying the rest or having to re-design everything.

The main idea is that the loop vectorizer is a Loop Pass, which means that if you register this pass (automatically at -O3, or with the -loop-vectorize option), the Pass Manager will run its runOnLoop(Loop*) function on every loop it finds.

The three main components are:

  1. The Loop Vectorization Legality: Basically identifies whether it’s legal (not just possible) to vectorize. This includes checking that we’re dealing with an inner loop and that it’s big enough to be worth it, and making sure there aren’t any conditions that forbid vectorization, such as overlaps between reads and writes or instructions that don’t have a vector counterpart on a specific architecture. If nothing is found to be wrong, we proceed to the second phase:
  2. The Loop Vectorization Cost Model: This step evaluates both versions of the code, scalar and vector. Since each architecture has its own vector model, it’s not possible to create a common model for all platforms, and in most cases it’s the special behaviour that makes vectorization profitable (like 256-bit operations in AVX), so we need a bunch of cost model tables that we consult given an instruction and the types involved. Also, this model doesn’t know how the compiler will lower the scalar or vectorized instructions, so it’s mostly guesswork. If the vector cost (normalized to the vector size) is less than the scalar cost, we proceed to:
  3. The Loop Vectorization: The actual vectorization, i.e. walking through the scalar basic blocks, changing the induction range and increment, creating the prologue and epilogue, promoting all types to vector types and changing all instructions to vector instructions, taking care to leave the interaction with the scalar registers intact. This last part is a dangerous one, since we can end up creating a lot of copies from scalar to vector registers, which is quite expensive and was not accounted for in the cost model (remember, the cost model is based on guesswork).

All that happens on a new loop placeholder, and if all is well at the end, we replace the original basic blocks with the new vectorized ones.
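To make the first two stages concrete, here is a toy version of the legality check and the cost-model decision. It is only a sketch under simplifying assumptions, not LLVM's code: a loop is reduced to a few numbers (the `ToyLoop` fields are hypothetical), a loop-carried dependence of distance d is assumed to forbid any vector width larger than d, and the costs are invented. The decision rule mirrors the description above: divide each vector cost by the vectorization factor (VF) and keep the scalar loop unless some legal factor is cheaper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyLoop:
    """Hypothetical summary of a loop; LLVM's real analyses are far richer."""
    is_innermost: bool
    trip_count: int
    # Loop-carried dependence distances in iterations, e.g. a[i] = a[i-2] + 1 -> 2.
    dependence_distances: List[int] = field(default_factory=list)
    scalar_cost: float = 1.0  # cost of one scalar iteration
    vector_costs: Dict[int, float] = field(default_factory=dict)  # VF -> cost of one vector iteration


def is_legal(loop: ToyLoop, vf: int) -> bool:
    """Stage 1: inner loop, enough iterations, no dependence shorter than the vector."""
    return (loop.is_innermost
            and loop.trip_count >= vf
            and all(d >= vf for d in loop.dependence_distances if d > 0))


def choose_vf(loop: ToyLoop) -> int:
    """Stage 2: cheapest legal vectorization factor, or 1 to keep the scalar loop."""
    best_vf, best_cost = 1, loop.scalar_cost
    for vf, cost in sorted(loop.vector_costs.items()):
        per_iteration = cost / vf  # normalise to one original (scalar) iteration
        if is_legal(loop, vf) and per_iteration < best_cost:
            best_vf, best_cost = vf, per_iteration
    return best_vf


if __name__ == "__main__":
    dense = ToyLoop(True, 1000, [], 4.0, {2: 5.0, 4: 6.0})
    recurrence = ToyLoop(True, 1000, [1], 4.0, {2: 5.0, 4: 6.0})
    shuffle_heavy = ToyLoop(True, 1000, [], 4.0, {2: 9.0, 4: 20.0})
    for name, loop in [("dense", dense), ("recurrence", recurrence),
                       ("shuffle_heavy", shuffle_heavy)]:
        print(name, choose_vf(loop))  # dense 4, recurrence 1, shuffle_heavy 1
```

The shuffle_heavy case is a loop where extra data movement makes every vector width dearer than the scalar loop, so a cost model that saw those costs would leave it alone; the third stage (rewriting the loop) is not modelled here.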

So, the question is, how good is this? Well, depending on the problem we’re dealing with, vectorizers can speed up execution considerably, especially for iterative algorithms with lots of loops, like matrix manipulation, linear algebra, cryptography, compression, etc. In more practical terms, that covers anything to do with encoding and decoding media (watching or recording videos, pictures, audio), Internet telephony (compression and encryption of audio and video), and all kinds of scientific computing.

One important benchmark for that kind of workload is Linpack. Not only does Linpack have many examples of loops waiting to be vectorized, but it’s also the benchmark that defines the Top500 list, which ranks the fastest computers in the world.

Benchmarks

So, both GCC and Clang now have their vectorizers turned on by default at -O3, so comparing them is as simple as compiling the programs and watching them fly. But, since I’m also interested in the performance gain from the LLVM vectorizer alone, I also disabled it and ran Clang with only -O3, no vectorizer.

On x86_64 Intel (Core i7-3632QM), I got these results:

| Compiler | Options | Avg. MFLOPS | Diff |
|---|---|---|---|
| Clang | -O3 (no vectorizer) | 2413 | 0.0% |
| GCC | -O3 (vectorizer) | 2421 | 0.3% |
| Clang | -O3 (vectorizer) | 3346 | 38.6% |

This is quite a statement! The GCC vectorizer has existed for a lot longer than LLVM’s and has been developed by many vectorization gurus, yet LLVM seems to beat GCC easily in that field. But, a word of warning: Linpack is by no means representative of all use cases and user-visible behaviour, and it’s very likely that GCC will beat LLVM in most other cases. Still, a reason to celebrate, I think.

This boost means that, in many cases, not only are the transformations legal and correct (or Linpack would have produced wrong results), but they also manage to generate faster code at no discernible cost. Of course, the theoretical limit is around a 4x boost (if you manage to replace every single scalar instruction with a vector one and the CPU behaves the same regarding branch prediction, cache, etc.), so one could expect a slightly higher number, something on the order of 2x.
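The theoretical limit here is simply how many elements fit in one vector register. A small sketch of that arithmetic: the register widths are the architectural ones (256-bit AVX, 128-bit SSE and NEON), but which width a given build actually used depends on the target flags, which the benchmarks here don't state.

```python
def max_lanes(vector_bits: int, element_bits: int) -> int:
    """Elements per vector register: the ideal per-instruction speed-up over scalar code."""
    return vector_bits // element_bits


if __name__ == "__main__":
    print("AVX (256-bit), double:", max_lanes(256, 64))   # 4
    print("AVX (256-bit), float: ", max_lanes(256, 32))   # 8
    print("SSE (128-bit), double:", max_lanes(128, 64))   # 2
    print("NEON (128-bit), float:", max_lanes(128, 32))   # 4
    # ARMv7 NEON has no double-precision arithmetic, so doubles get no vector lanes there.
```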

It depends on the computational density we’re talking about. Linpack specifically tests the inner loops of matrix manipulation, so I’d expect a much higher ratio of improvement, something around 3x or even closer to 4x. VoIP calls, watching films and listening to MP3s are also good examples of densely packed computation, but since we usually run those applications on a multi-tasking operating system, you’ll rarely see improvements higher than 2x. And general applications rarely spend that much time in inner loops (they’re mostly waiting for user input and then doing a bunch of unrelated operations, which are hardly vectorizable).

Another important aspect of vectorization is that it saves a lot of battery juice. For MP3 decoding, it doesn’t really matter whether you finish in 10 or 5 seconds, as long as the music doesn’t stop to buffer. But taking 5 seconds instead of 10 means that for the other 5 seconds the CPU can reduce its voltage and save battery. This is especially important on mobile devices.

What about ARM code?

Now that we know the vectorizer works well, and the cost model is reasonably accurate, how does it compare on ARM CPUs?

It seems that the grass is not so green on this side, at least not at the moment. I have reports that on ARM it also reached a 40% boost, similar to Intel, but what I saw was a different picture altogether.

On a Samsung Chromebook (Cortex-A15) I got:

| Compiler | Options | Avg. MFLOPS | Diff |
|---|---|---|---|
| Clang | -O3 (no vectorizer) | 796 | 0.0% |
| GCC | -O3 (vectorizer) | 736 | -8.5% |
| Clang | -O3 (vectorizer) | 773 | -2.9% |

The performance regression can be explained by the amount of scalar code intermixed with vector code inside the inner loops, a result of shuffles (movement of data within the vector registers and between scalar and vector registers) not being lowered correctly. This most likely happens because the LLVM back-end relies heavily on pattern matching for instruction selection (a good thing), but the vectorizer might not be producing the shuffles in the pattern each back-end expects.

This could be fixed by tweaking the cost model to penalize shuffles, but it’d be good to check first whether those shuffles are simply failing to match the patterns the back-end expects. We will investigate and report back.

Update

I got results for single-precision floating point, which show a greater improvement on both Intel and ARM.

On x86_64 Intel (Core i7-3632QM), I got these results:

| Compiler | Options | Avg. MFLOPS | Diff |
|---|---|---|---|
| Clang | -O3 (no vectorizer) | 2530 | 0.0% |
| GCC | -O3 (vectorizer) | 3484 | 37.7% |
| Clang | -O3 (vectorizer) | 3996 | 57.9% |

On a Samsung Chromebook (Cortex-A15) I got:

| Compiler | Options | Avg. MFLOPS | Diff |
|---|---|---|---|
| Clang | -O3 (no vectorizer) | 867 | 0.0% |
| GCC | -O3 (vectorizer) | 788 | -9.1% |
| Clang | -O3 (vectorizer) | 1324 | 52.7% |
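For reference, the Diff column is the relative change against the non-vectorized Clang -O3 run. A short sketch recomputing it for the single-precision results above; the function name is made up.

```python
def percent_diff(mflops: float, baseline: float) -> float:
    """Relative change versus the baseline run, in percent."""
    return (mflops / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Single-precision results (baseline: Clang -O3 with the vectorizer disabled).
    runs = {
        "x86_64": (2530, {"GCC": 3484, "Clang, vectorized": 3996}),
        "Cortex-A15": (867, {"GCC": 788, "Clang, vectorized": 1324}),
    }
    for cpu, (baseline, results) in runs.items():
        for label, mflops in results.items():
            print(f"{cpu}, {label}: {percent_diff(mflops, baseline):+.1f}%")
```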

This goes to show that the vectorizer is, indeed, working well for ARM, but for double precision the costs of using the VFP/NEON pipeline outweigh the benefits. Remember that NEON vectors are only 128 bits wide and VFP registers only 64 bits wide, and NEON has no double-precision floating-point operations, so the hardware will only do one double-precision floating-point operation per cycle, and the theoretical maximum depends on the speed of the soft-fp libraries.

So, in the future, what we need to work on is the cost model, to make sure we don’t regress in performance, and on getting better algorithms for lowering vector code (both by making sure we match the patterns the back-end expects, and by just finding better ways of vectorizing the same loops).

Conclusion

Without further benchmarks it’s hard to come to a final conclusion, but it’s looking good, that’s for sure. Since Linpack is part of the standard LLVM test-suite benchmarks, fixing this and running it regularly on ARM will at least avoid any further regressions… Now it’s time to get our hands dirty!

 


Hypocrite Internet Freedom
December 11th, 2012 under Digital Rights, Politics, rengolin, Web, World. [ Comments: none ]

Over the past year, the Internet has shown its power over governments, when we all opposed the SOPA and PIPA legislation in protests across the world, including on this very blog. Later on came ACTA and so on, and we all felt very powerful indeed. Now a new threat looms over the Internet: the ITU is trying to take over the Internet.

To quote Ars Technica:

Some of the world’s most authoritarian regimes introduced a new proposal at the World Conference on International Telecommunications on Friday that could dramatically extend the jurisdiction of the International Telecommunication Union over the Internet.

Or New Scientist:

This week, 2000 people have gathered for the World Conference on International Telecommunications (WCIT) in Dubai in the United Arab Emirates to discuss, in part, whether they should be in charge.

And stressing that:

WHO runs the internet? For the past 30 years, pretty much no one.

In reality, today’s Internet is precisely in the state the US is trying to avoid, except that right now the US is the one in control, and the ITU is trying to hand that control to an international organization where more countries have a say.

Today, the DNS and the main IP blocks are controlled by ICANN; as Ars Technica reminds us, ICANN and IANA are:

the quasi-private organizations that currently oversee the allocation of domain names and IP addresses.

But ICANN was once a US-government-operated body, and it still has strong ties to Washington, is located solely on US soil and operates under US jurisdiction. It has also failed on many counts to democratize its operations, leaving little or no room for international input. Furthermore, all top-level domains that are not bound to a country (like .com, .org, .net) are also under American jurisdiction, even if the sites are hosted and registered in another country.

But controlling the DNS is only half the story. The control that the US has over the Internet is much more powerful. First, for historical and economic reasons, it hosts most of the Internet’s backbone (root DNS servers, core routers, etc.). That means traffic between Europe and Japan will probably pass through it. In theory this shouldn’t matter, and it is actually an optimization produced by self-structuring routing tables, but in fact the US government has openly stated that it monitors all traffic passing within its borders and reserves the right to cut it if it considers it a risk to national security.

Given the publicity the TSA has had since 2001 over what it considers a security threat, including Twitter comments from British citizens, I wouldn’t trust it, or its automated detection systems, to care for my security. Also, given the US government’s intrusion into other governments’ affairs, as in the Dotcom case in January, where national security operations in New Zealand were shared inappropriately with the American government, I have never felt safe crossing American soil, physically or through the Internet.

Besides, Hollywood has shown in Scandinavia and in the UK that it keeps European governments on a tight leash when it comes to (US) copyright law, forcing once-liberal governments to abide by American rules and arrest their own citizens for distributing content over the Internet. It’s also worth remembering that SOPA, PIPA and ACTA, mainly driven by Hollywood, were all drafted behind closed doors.

So, would ITU control be better?

No. Nothing could be further from the truth. Although it is, in theory, more democratic (more countries with decision power), this decision power has been sought for one main purpose: to enforce stricter laws. I generally agree that the ITU would not be a good controlling body, but believing that nobody controls the Internet is, at best, naive, and usually a pretentious lie.

Legal control by many countries over something as free as the Internet would pose the same dangers as leaving it free of legal control, since either way we are left under the indirect control of the strongest player, which so far has been the US. The other countries are only so keen on the ITU because the US won’t give them a voice, and the ITU is a way to create a UN for the Internet.

In that sense, the ITU would be a lot like the UN: worthless, a puppet in the hands of the strong players. Each country would have more control over its borders, and that would change almost nothing in the US, but the general rules would stop being valid, and the US (and other countries) would have to do a lot more work than they do today. One example is the stupid UK rule requiring websites, including international ones, to warn users that they use cookies.

Don’t be fooled: the US government is not really worried about your safety and security, nor your freedom. They’re trying to avoid a lot of work and a big loss of market share in the Middle East and South Asia. With countries (which they like to call authoritarian regimes) imposing stricter rules on traffic, including the fees, taxes and other charges they already levy on material goods, commerce with those countries will become a lot more expensive.

Ever since the Second World War, the US economy has been based mainly on military activity. First, helping Europe got them out of the Great Depression, then they fomented rebellions throughout Latin America to keep the coins clinking, and currently it’s the Middle East. With climate change endangering their last non-war resource (oil), they were betting on the Internet to spread the American Way of Life to the less fortunate, with the off chance of selling a few iPads in the process, but now that profit margin is getting dangerously thin.

Not to mention the military angle: a lot of intelligence is now gathered through the Internet, and operations like the recent attack on Iranian nuclear facilities via the Stuxnet worm would all become a lot harder. The fact that China is now bigger and more powerful than the US in every possible aspect (I dare say even militarily, but we can’t know for sure) is not helping either.

What, then, is the solution? Is it possible to really have nobody running the Internet? And, if at all possible, is it desirable?

Mad Max Internet

I don’t think so.

It’s true that IPv6 should completely remove the need for IP allocation, but DNS is a serious problem. Leaving DNS registration to an organic, self-organized process would lead to widespread distribution of malicious content, and building security measures around it would be much harder than it already is. The same is true of SSL certificates. You’d expect that, in a land with no rules, trusted bodies would charge a fortune and extort clients for a safe SSL certificate (if they actually produced a good one, that is), but this is exactly what happens today, under ICANN rule.

Routing would also be affected, since current algorithms rely on total trust between parties. There was a time when China routed a significant chunk of US traffic (including governmental and military traffic) through its routers, using nothing but standard BGP announcements. In a world where every country has its own core router, digitally attacking another country would be as easy as changing one line in a router’s configuration.

We all love to think that the Internet is already a free world, but more often than ever, people are being arrested for their electronic behaviour. Unfortunately, because there isn’t a set of rules or a governing body, the rules that get people arrested are the rules of the strongest player, which, in our current case, is Hollywood. So, how is it possible to reconcile security, anonymity and stability without resorting to governing bodies?

The simple answer is: it’s not. The Internet is a land with no physical barriers, where contacting someone thousands of miles away is the same as contacting the person beside you, but we don’t live in a world without borders. It’s not possible to reconcile the laws of all countries, with all their different cultures, into one single book. As long as the world keeps its multiculturalism, we have to cope with different rules for different countries, and I’m not in favour of losing our identity just to make the Internet a comfortable place for the US government.

Regulating multi-body

It is my opinion that we do, indeed, need a regulating body. ICANN, ITU, it doesn’t matter, as long as the decisions are good for most.

I don’t expect that any such governing body would come up with a set of rules that is good for everybody, nor that it’ll find the best rules in the first N iterations (for large N), but if the process is fair, we should reach consensus (as N tends to infinity). The problem with both ICANN and the ITU is that neither is fair, and there are other interests at play that weigh much more heavily than the interests of the people.

Since no regulating body, governmental or not, will ever account for the interests of the people (today or ever), people tend to hope that no rule is the best rule, but I hope I have shown that this is not true. I believe that, instead, a governing multi-body is the real solution. It’s hypocritical to believe that Russia will let the US create regulations within its borders, so we can’t assume that from the start if we want this to work in the long run. So this multi-body, composed of independent organizations in Europe, Asia, Oceania, Africa and the Americas, would have strong powers in their own regions, but would have to agree on very general terms.

The general terms would be something like:

  1. There should be no cost associated with the traffic to/from/across any country to any other country
  2. There should be no filtering of any content across countries, but filtering should be possible to/from a specific country or region based on religious or legal grounds
  3. It should be possible for countries to deny certain types of traffic (as opposed to filtering above), so that routing around would be preferred
  4. Misuse of Internet protocols (such as BGP and DNS spoofing) on root routers/DNS servers should be considered an international crime, with the country responsible for the server in charge of punishment; otherwise, sanctions against that country could be enforced by the UN
  5. Legal rights and responsibilities on the Internet should be similar (but not identical) to those in the physical world, and each country has the right and duty to enforce its own rules

Rule 1 is fundamental and would cut short most of the ITU’s recent proposals. Cross-charging Internet traffic is as much utter nonsense as doing it with telecoms around the world, and that is probably the biggest problem with the new proposal.

Rules 2 and 3 would leave each region in control of its own Internet, with little impact on the rest. They’d also encourage the creation of new routes around problematic countries, which always benefits the reliability of the Internet as a whole. It’s hypocritical to assume that the US government has the right to impose Internet rules on countries like Iran or China; it’s up to the people of China and Iran to fight their leaders on their own terms.

It’s extremely hypocritical, and very common in the US, to believe that the American system (the American Way of Life) is the best for every citizen of the world, or that the people of other countries have no way of choosing their own history. It’s also extremely hypocritical to blame authoritarian governments for Internet regulation while providing weapons and support to local authoritarian groups. Let’s not forget the role of the US in Afghanistan and Iraq prior to the Gulf War, as a counterweight to Russia and Iran (respectively), and its pivotal role in every major authoritarian revolution in Latin America.

Most countries, including Russia and those in the Middle East, would probably be fine with rules 2 and 3, with little impact on the rest of the world. That leaves rule 4, to account for the trustworthiness of the whole system. Today, a small gang of pals controls the main routers, and giving less trustworthy pals more control over DNS and BGP routes would indeed be a problem.

However, this rule is effectively in force today, which is why China routed US traffic for only 18 minutes. It was more a show of power than a real attack, but had China kept it up for longer, the US would have treated it as an attack, and with very strong reasons. Loose control is good, but loose responsibility is not. Countries should have the freedom to structure their Internet backbones, but they must do it responsibly, or be punished otherwise.

Finally, there’s rule 5. How do we deal with a citizen of one country who behaves on another country’s website in a way that is legal in his culture, but not in the other? Strong religious and ethical issues will arise from that, but nothing that isn’t already on the Internet. Most of the time, this problem is identical to what already happens in the real world, with people from one country who commit crimes in another country. The hard bit is knowing the differences between the physical and logical worlds, and reconciling the differences in interpretation among the multiple groups that will take part in such a governing multi-body.

Conclusion

The ITU’s proposal is not good, but neither is ICANN’s. The third alternative, a complete lack of control, is only going to make it worse, so we need a solution that is both viable and general enough that most countries agree to it. It also needs to hand control of internal matters to each country’s own government, in a way that doesn’t affect the rest of the Internet.

I argue that one single body, be it the ITU or ICANN, is not a good model, since neither is general enough nor accounts for specific regions’ concerns (ICANN won’t listen to the Middle East and the ITU won’t regard the US). So, the only viable solution I can see is one that unites them all into a governing multi-body, with very little in global agreement, but with general rules powerful enough to guarantee that the Internet will be free forever.

The American constitution is a beautiful piece of writing, but in reality, over the years, its government has destroyed most of its beauty. So, long-term self-checking must also be a core part of this multi-body, with regular reviews and democratic decisions (sorry, authoritarian regimes, it’s the only way).

In a nutshell, while it is possible to write the Internet Constitution and make it work in the long term, humanity is very likely not ready to do that yet, and we’ll probably see the destruction of the Internet in the next 10 years.

Sigh…

 


Open Source and Innovation
September 13th, 2012 under Corporate, OSS, rengolin, Technology. [ Comments: 1 ]

A few weeks ago, a friend (Rob) asked me a pertinent question: “How can someone innovate and protect her innovation with open source?”. Initially, I brushed it off with a simple “well, you know…”, but this turned out to be a really hard question to answer.

The main idea is that, in the end, all software (and possibly hardware) will end up as open source. Not because it’s beautiful and fluffy, but because it seems to be the natural course of things nowadays. We seem to be moving from profiting on products to giving them away and profiting on services. If that’s true, are we going to stop innovating altogether and just focus on services? What about the real scientists who move the world forward, are they also going to be flipping burgers?

Open Source as a business model

The reasons to use open source are clear: the TCO fallacy is gone and we’re all used to it (especially the lawyers!). That’s all good, but the real question is what (or even when) to open source your own stuff. Some companies do it because they want to sell the added value, or plugins and services. Others do it because it’s not their core business, or because they want to form a community that would otherwise use a competitor’s open source solution. Whatever the reason, we seem to be open sourcing software and hardware at an ever-increasing pace, sometimes on its very first day in the wild.

Open source is a very good cost-sharing model. Companies can develop a third-party product, not related to their core areas (where they actually make money), and still claim no responsibility or ownership (which would be costly). For example, the GNU/Linux and FreeBSD operating systems tremendously reduce costs for any application developer, from embedded systems to big distributed platforms. Most platforms today (Apple’s, Android, set-top boxes, sat-navs, HPC clusters, web servers, routers, etc.) have them at their core. If each of these products had to develop its own operating system (or even parts of it), it wouldn’t be commercially viable.

Another example is the MeshPotato box (in Puerto Rico), which uses open software and hardware initially developed by Village Telco (in South Africa). They can cover wide areas, providing internet and VoIP telephony over the rugged terrain of Puerto Rico for under $30 a month. If they had to develop their own hardware and software (including the OS), it’d cost no less than a few hundred pounds. Examples like that are abundant these days, and it’s hard to ignore the benefits of open source. Even Microsoft, once the biggest closed-source zealot, which spread the misinformation that open source was hurting the American Way of Life, is now one of the biggest open source contributors on the planet.

So, what is the question then?

If open source saves money everywhere, and promotes incremental innovation that wouldn’t be otherwise possible, how can the original question not have been answered? The key was in the scope.

Rob was referring, in fact, to real chunky innovations. Those that take years to develop, with many people working hard towards one goal, spending their last penny to possibly profit in the end. The true sense of entrepreneurship. Things that might benefit from other open source technologies, but are so hard to make that they still take years to produce. Things like new chips, new medicines, real artificial intelligence software and hardware, etc. The open source savings on those projects are marginal. Furthermore, if you spend 10 years developing a piece of software (or hardware) and open source it straight away, how are you ever going to get your investment back? Unless you charge $500 a month in services to thousands of customers from day one, you won’t see the money back for decades.

The big misunderstanding, I think, is that this model no longer applies, so the initial question was invalid to begin with. Let me explain.

Science and Technology

300 years ago, if you were curious about something, you could make a name for yourself very easily. You could barely call what people did then science; they even called themselves natural philosophers, because what they did was mostly discovering nature and inquiring about its behaviour. Robert Hooke was a natural philosopher and a polymath: he kept dogs with their internals exposed just to see if they’d survive. He kept looking at things through a microscope, and he named most of the small things we can see today.

Newton, Leibniz, Gauss, Euler and a few others created the whole foundation of modern mathematics. They are known for fundamentally changing how we perceive the universe. It’d be preposterous to assume that there isn’t anyone today as bright as they were, yet we don’t see people changing our perception of the universe that often. The last spree was more than a hundred years ago, with Maxwell, Planck and Einstein, but even then, they were corrections (albeit fundamental ones) to the model.

Today, a scientist is content to scratch the surface of a minor field in astrophysics, and he’ll probably get a Nobel for that. But how many of you can name more than 5 Nobel laureates? Did they really change your perception of the universe? Did they invent things such as real artificial intelligence, or did they discover a better way of doing politics? Sadly, no. Not because they weren’t as smart as Newton or Leibniz, but because the easy things have already been discovered; now we’re in for hard, incremental science and, like it or not, there’s no way around it.

Today, if you wrapped tin foil around a toilet paper tube and played music with it, people would, at best, think you’re cute. Thomas Edison did that and was called a Wizard. Nokia was trying to build a smartphone, but they were trying to make it perfect. Steve Jobs made it almost useless, people loved it, and he’s now considered a genius. If you try to produce a bad phone today, people will laugh at you, not think you’re cute, so things are getting harder for careless innovators, and that’s the crucial point. Careless and accidental innovation is not possible in any field that has been exploited long enough.

Innovation and Business

Innovation is like business: you only profit if there is a market that hasn’t been taken. If you try to invent a new PC, you will fail. But if you produce a computer that fills a niche that has never been exploited (even if it’s a known market, as in Nokia’s smartphone case), you’re in for the money. If you build the next AI software and it even marginally works, you can make a lot of money, whether you open source your software or not. Since people will copy (copyright and patent laws are not the same in every country), your profit will diminish over time, depending on the novelty and the difficulty of copying.

Rob’s point went further: “This isn’t just a matter of what people can or can’t do, it’s what people should or should not do”. Meaning, shouldn’t we aim for a world where people don’t copy other people’s ideas as a matter of principle, instead of accepting the fact that people copy? My answer is a strong and resounding: NO! For the love of all that’s good, NO!

The first reason is simply that that’s not the world we live in, and it will not be as long as humanity remains human. There is no point in creating laws that do not apply to the human race, though it seems that people get away with that very easily these days.

The second point is that it breaks our society. An example: walk into a bank and ask for investment in a project that will take 10 years to complete (at a cost of $10M), with the return coming over the 70 years that follow (at a profit of hundreds of millions of dollars a year). The manager will laugh at you and call security. This is, however, the time it takes (today) for copyright in Hollywood to expire (the infamous Mickey Mouse effect), and the kind of money they deal with.

Imagine that a car manufacturer develops a much safer way of building cars, say magical air bags. This company will be able to charge a premium, not just because of the development costs, but also for its unique position in the market. With time, it’ll save more lives than any other car, and governments will want that to be standard. But no other company can apply it to their cars, or at least not without paying a huge premium to the original developer. In the end, cars will be much more expensive in general, and we end up paying the price.

Imagine if there were patents for the telephone, or the TV, or cars (I mean, the concept of a car), or for “talking to another person over the phone”, or “reminding you to call your parents once in a while”. It may look silly, but this is better than most patent descriptions! Most of the cost to the consumer would go to patent holders who no longer innovate! Did you know that Microsoft makes more money from Android phones than Google does? Their contribution to the platform? Nothing. This came from agreements over dubious and silly patents that most companies accepted rather than being sued for billions of dollars.

Conclusion

In my opinion, we can’t just live in the 16th century with 21st century technology. You can’t expect to be famous or profit by building an in-house piece of junk or by spotting a new planet. Open source has nothing to do with it. The problem is not what you do with your code, but how you approach the market.

I don’t want to profit at the expense of others, and I don’t want to protect my stupid idea that anyone else could have had (or probably already had, but thought was silly) just because I was smart enough to market it. Difficult technology is difficult (duh), and it’s not up to a team of experts to create it and market it to make money. Science and technology will advance from now on in steady baby steps, and the tendency is for the pace to get even slower and the steps smaller.

Another important conclusion for me is that I’d rather live in a world where I cannot profit horrendously from a silly idea just because I’ve patented it than have monopolies like pharma/banking/tobacco/oil/media controlling our governments or, more directly, our lives. I think that the fact that we copy and destroy property is the most liberating fact of humanity. It’s the Robin Hood of modern societies, making sure that, one way or another, the filthy rich won’t keep getting richer. Explosive growth, monopolies, cartels, free trade and protection of property are core values that I’d rather see dead as a parrot.

In a nutshell, open source does not hinder innovation, protection of property does.


Anarchy and Science
July 16th, 2012 under Life, Politics, rengolin, Science, World. [ Comments: none ]

If the world needed more proof that rational thinking is off the menu when it comes to humans, we now have a so-called anarchist group attacking science. Bombs, shootings and sabotage, with one single goal: to stop science destroying our lives once and for all.

If you didn’t get it, you’re not alone. I’m still trying to understand the whole issue, but the more I read, the more I’m sure it’s just humanity reaching record levels of stupidity. Again.

Anarchy

First of all, the actions don’t make sense in the realm of anarchy. For ages, anarchism has been a non-violent banner. The anarchist is not tame, but a pacifist. Anarchists fight for freedom in everything, mainly from violence and oppression. Since every state, no matter who controls it, is oppressive, anarchists fight the very existence of any central form of coercion.

Bakunin once wrote:

“But the people will feel no better if the stick with which they are being beaten is labeled ‘the people’s stick’.” (Statism and Anarchy [1873])

This clearly refers to governments chosen by the people, such as democracies. For an anarchist, a democracy is as bad as a dictatorship, as even in its purest form, it imposes the will of the average citizen onto the majority of the population. (If you thought it was the other way around, you clearly don’t understand democracy!)

In essence, anarchy is all about a long and non-violent migration to the total lack of central government, leaving the people (organised in local communities) to decide what’s best for themselves. If that works or not on a global level, I don’t know. But two key words pop out: non-violent and lack of central power.

Science

In Peter Kropotkin’s own words:

Anarchism is a world-concept based upon a mechanical explanation of all phenomena, embracing the whole of Nature–that is, including in it the life of human societies and their economic, political, and moral problems. Its method of investigation is that of the exact natural sciences, by which every scientific conclusion must be verified. Its aim is to construct a synthetic philosophy comprehending in one generalization all the phenomena of Nature–and therefore also the life of societies (…) [source]

Thus anarchy, as science, is the art of finding the best answer by an iterative and non-violent method, without centralised powers dictating what the answer should be, but finding the answers by experimentation and verification, where everyone should come to the same conclusions.

Science has no central power and doesn’t provide support to any government or controlling body. There isn’t any scientist or organization in the world, nor has there ever been, that can dictate what scientists believe or can prove. The scientific method is the most democratic method of all: everyone can repeat the same experiments and reach the same results, otherwise the hypothesis is plain wrong, and there is nothing anyone can do to force it to be true.

Science has been used by governments to impose lifestyles, borders and general ignorance, yes. Science has been used to develop unfathomably powerful bombs, yes. And it has been used over and over again to control and dominate countries and continents, yes. But that was never science’s doing, but governments’. Every major charge against science is, in fact, a charge against people. Describing how science has made our lives better would be boring and redundant.

The blame?

If some scientists are idiots, it doesn’t mean the whole of science is. If governments abuse their power, and science provides that power, it doesn’t mean science is to blame, but governments. If some bishops should burn in hell, it doesn’t mean religion is to blame, but what people make of it. The climate change fiasco, the criticism of the US national health program and the recent “God Particle” boom among religious people have shown that people are still completely ignorant and prejudiced when evaluating external information.

Pen and paper have been much more harmful to the world than science, and over a much longer period. Pride and honour have wiped out entire civilizations for millennia, well before science was so embedded in our culture. Barons, kings and presidents don’t need science to destroy our lives; it just happens to be available.

So, science and anarchy have two major points in common: non-violence and the lack of centralised government. Why on Earth would an anarchist group gratuitously attack scientists? Because they are not anarchists, they are just idiots. I truly hope this is an isolated incident. If anarchists of the world lose their minds like these ones, the only hope for humanity (in the long term) will be lost, and there will be no return.

Further reading:
Anarchy Archives
Anarchist science policy


Declaration of Internet Freedom
July 3rd, 2012 under Digital Rights, Life, Media, Politics, rengolin, rvincoletto, World. [ Comments: 1 ]

We stand for a free and open Internet.

We support transparent and participatory processes for making Internet policy and the establishment of five basic principles:

  • Expression: Don’t censor the Internet.
  • Access: Promote universal access to fast and affordable networks.
  • Openness: Keep the Internet an open network where everyone is free to connect, communicate, write, read, watch, speak, listen, learn, create and innovate.
  • Innovation: Protect the freedom to innovate and create without permission. Don’t block new technologies, and don’t punish innovators for their users’ actions.
  • Privacy: Protect privacy and defend everyone’s ability to control how their data and devices are used.

Don’t get it? You should learn more about the power of the Internet and what governments around the world have been doing to it.

Good starting places are: Avaaz, Ars Technica, Electronic Frontier Foundation, End Software Patents, Piratpartiet and the excellent Case for Copyright Reform.

Source: http://www.internetdeclaration.org/freedom


K-means clustering
June 20th, 2012 under Algorithms, Devel, rengolin. [ Comments: none ]

Clustering algorithms can be used with many types of data, as long as you have a way to distribute them in a space where there is a concept of distance. Vectors are obvious choices, but not everything can be represented as N-dimensional points. Another way to lay out data, much closer to real-world data, is to allow for a large number of binary axes, like tags. You can then cluster by the number of tags the entries share, with the distance (meaningful only relative to other distances) being the proportion of shared tags against non-shared ones.
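One common way to turn shared tags into a distance is the Jaccard distance: one minus the number of shared tags divided by the number of distinct tags across both entries. This is my assumption of a reasonable choice, not necessarily what Google News uses; a minimal Python sketch with made-up tag sets:

```python
def tag_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance between two tag sets: 0.0 when identical, 1.0 when disjoint."""
    union = a | b
    if not union:  # two untagged entries: treat them as identical
        return 0.0
    shared = a & b
    return 1.0 - len(shared) / len(union)


# Hypothetical news entries, tagged by topic.
news_a = {"economy", "europe", "banks"}
news_b = {"economy", "europe", "elections"}
news_c = {"football", "olympics"}

print(tag_distance(news_a, news_b))  # 0.5: 2 shared tags out of 4 distinct ones
print(tag_distance(news_a, news_c))  # 1.0: nothing in common
```

Since the value is a proportion, it stays comparable between entries with few and many tags, which matches the "only relative to others" caveat.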

An example of tag clustering can be viewed on Google News, an example of clustering on Euclidean spaces can be viewed on the image above (full code here). The clustering code is very small, and the result is very impressive for such a simple code. But the devil is in the details…

Each group of red dots is generated randomly from a given central point (drawing N randomly distributed points inside a circle of radius R centred at C). Each centre is randomly placed, and sometimes the groups collide (as you can see in the image), but that’s part of the challenge. To find the groups and their centres, I throw in random points (with no knowledge of the groups’ centres) and iterate until I find all the groups.
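Drawing the groups could look like the sketch below. It assumes the points should be uniform over the circle’s area, which is why the radius uses the square root of a uniform sample; the function name `random_group` and its parameters are hypothetical, not taken from the original code:

```python
import math
import random


def random_group(centre, radius, n, rng=random):
    """Draw n points uniformly distributed inside a circle of the given radius.

    Taking the square root of a uniform sample for the distance from the
    centre keeps the density uniform over the disc's area; a plain uniform
    distance would bunch the points near the centre.
    """
    cx, cy = centre
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


# Four randomly placed centres, 50 points each; groups may overlap, as in the text.
rng = random.Random(42)
centres = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(4)]
data = [p for c in centres for p in random_group(c, radius=10, n=50, rng=rng)]
```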

The iteration is very simple, and consists of two steps:

  1. Assignment Step: For each point, assign it to the nearest mean. This is why you need the concept of distance, and that’s a tricky part. With Cartesian coordinates, it’s simple.
  2. Update Step: Calculate the actual mean of all points assigned to each mean, and move the mean there. This basically moves the supposed (randomly guessed) mean to its rightful place.
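The two steps map directly onto two small functions. This sketch assumes 2-D points as `(x, y)` tuples and Euclidean distance; the names are hypothetical, and leaving a mean with no assigned points where it is, is just one possible policy (re-seeding it at random is another).

```python
import math


def assign(points, means):
    """Assignment step: index of the nearest mean (Euclidean) for every point."""
    return [min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            for p in points]


def update(points, labels, means):
    """Update step: move each mean to the centroid of the points assigned to it."""
    new_means = []
    for j, m in enumerate(means):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_means.append((sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members)))
        else:
            new_means.append(m)  # no points assigned: leave this mean where it is
    return new_means


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = assign(pts, [(1, 1), (9, 1)])       # [0, 0, 1, 1]
print(update(pts, labels, [(1, 1), (9, 1)]))  # [(0.0, 1.0), (10.0, 1.0)]
```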

On the second iteration, the means, which were randomly selected at first, are now closer to a set of points. Not necessarily points in the same cluster, but the cluster with the most points assigned to a given mean will slowly steal it from the others, since it carries more weight when that mean is updated in step 2.

If all goes right, the means will slowly move towards the centre of each group and you can stop when the means don’t move too much after the update step.
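Putting both steps in a loop with that stopping rule gives a complete, if naive, k-means. “Don’t move too much” is interpreted here as “the largest shift of any mean is below `tol`”; `tol`, `max_iter` and the function name are assumptions of this sketch, not values from the post.

```python
import math


def kmeans(points, means, tol=1e-6, max_iter=100):
    """Alternate assignment and update steps until no mean moves more than tol."""
    means = list(means)
    for _ in range(max_iter):
        # Assignment step: group the points by their nearest mean
        groups = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            groups[nearest].append(p)
        # Update step: move each mean to its group's centroid (empty groups stay put)
        new_means = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else m
            for g, m in zip(groups, means)
        ]
        shift = max(math.dist(a, b) for a, b in zip(means, new_means))
        means = new_means
        if shift < tol:
            break
    return means


pts = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
print(kmeans(pts, [(3, 3), (8, 8)]))  # [(1.0, 1.0), (11.0, 11.0)]
```

The `max_iter` cap is a safety net: plain k-means does converge, but a loose `tol` on noisy data can otherwise make the number of iterations hard to predict.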

This simplified version has plenty of problems, of course. For instance, if a mean sits exactly between two groups, both pull it towards their centres with equal force, so the mean never moves and the algorithm thinks it has found its group when, in fact, it has found two. Or a group may be so large that two or more means end up inside it, splitting it into several groups.
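Both failure modes can be reproduced with a hand-picked example (the coordinates are made up for illustration). Three tight groups sit on the x axis at about −10, +10 and 100, with three means: one exactly between the first two groups, and two inside the third. One assignment/update pass leaves every mean exactly where it was, so the algorithm reports convergence with the wrong grouping.

```python
import math


def step(points, means):
    """One assignment + update pass; returns the new means and each mean's group."""
    groups = [[] for _ in means]
    for p in points:
        groups[min(range(len(means)), key=lambda j: math.dist(p, means[j]))].append(p)
    new_means = [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else m
        for g, m in zip(groups, means)
    ]
    return new_means, groups


points = [(-11, 0), (-9, 0), (9, 0), (11, 0), (99, 0), (101, 0)]
means = [(0, 0), (99, 0), (101, 0)]
new_means, groups = step(points, means)
print(new_means)                   # [(0.0, 0.0), (99.0, 0.0), (101.0, 0.0)]: nothing moved
print([len(g) for g in groups])    # [4, 1, 1]: two groups merged, one group split
```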

To overcome these deficiencies, some advanced forms of k-means change the update step: soft k-means lets every point pull on every mean with a weight that decays with distance, and mixture models go further and also take the shape of each group into account. Other heuristics can be added as further steps to make sure there aren’t two means too close to each other (relative to their groups’ sizes), or that there are no big gaps between points of the same group, but that kind of heuristic tends to be expensive (quadratic in the size of the group), since it examines every point of a group in relation to the other points of the same group.
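For comparison, here is a minimal sketch of one soft k-means update, in the common formulation where each point spreads a total weight of 1 over all means in proportion to exp(−β·d²). The stiffness `beta` is a parameter I’m introducing: large values approach the hard algorithm, small values pull every mean towards the overall centroid. This sketch does not model group shape; that would need per-group covariances, as in Gaussian mixture models.

```python
import math


def soft_kmeans_step(points, means, beta):
    """One soft k-means update: every point pulls every mean, weighted by
    exp(-beta * squared distance) and normalised so each point's weights sum to 1."""
    sums = [[0.0, 0.0, 0.0] for _ in means]  # weighted x, weighted y, total weight
    for x, y in points:
        d2 = [(x - mx) ** 2 + (y - my) ** 2 for mx, my in means]
        base = min(d2)  # subtract the smallest distance so exp() cannot underflow to all zeros
        w = [math.exp(-beta * (d - base)) for d in d2]
        total = sum(w)
        for j, wj in enumerate(w):
            r = wj / total  # "responsibility" of mean j for this point
            sums[j][0] += r * x
            sums[j][1] += r * y
            sums[j][2] += r
    # A mean with (numerically) zero total weight stays where it is
    return [(sx / s, sy / s) if s > 0 else m for (sx, sy, s), m in zip(sums, means)]


hard_like = soft_kmeans_step([(-1, 0), (1, 0)], [(-1, 0), (1, 0)], beta=50)
very_soft = soft_kmeans_step([(-1, 0), (1, 0)], [(-1, 0), (1, 0)], beta=1e-9)
```

With `beta=50` the two means stay on their own points, as in the hard version; with `beta=1e-9` both collapse to roughly (0, 0), the centroid of everything.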

All in all, still an impressive performance for such a simple algorithm. Next in line, I’ll try clustering data distributed among many binary axes and see how k-means behaves.


Tough decision
May 10th, 2012 under rengolin, Stories. [ Comments: none ]

Peter wasn’t the most eclectic person, especially when the subject was musical styles. So it came as a surprise when the alien that had landed in his living room (of all places on Earth) started telling him that they were going to erase from everyone’s mind any memory of the best songs of every band that had ever performed on Earth.

This was an odd domination plan, to be honest; it looked more like some intergalactic prank, but hey, they’re aliens, right? You can never predict what aliens will do to your planet until they finally arrive and do, well, whatever they do when they arrive on new planets. And this was no exception.

According to the little alien, this was the first time anyone from his species had landed on Earth, and it was his duty to initiate Earthlings into the galactic customs. Peter tried to argue that Earth was in this very galaxy and that this was not part of our customs, but the little alien did not reconsider. After all, it’s not like Earth is a central planet or anything.

The more Peter tried to argue, the more convinced he was that the alien was not fooling around. He was actually quite serious, stating that this was the norm for the initiation of any planet into the galactic fellowship, something all other planets had gone through, too. There was no escape. The little guy got into his spaceship (or whatever that was; it didn’t look like it could fly in space, but Peter was no rocket scientist) and disappeared in mid-air, just as quickly and mysteriously as he had shown up.

There was one last thing Peter had to consider before the next morning (GMT): a single human could stop the initiation ceremony by killing himself. It was like an escape clause in the galactic contract. Either one being sacrificed himself (not killed by others) in the name of the fellowship, or all humans would have the best songs of all bands erased from memory. Forever.

Peter put the kettle on and sat on the dirty sofa of his small London flat. Was that a dream? Nope, he was wide awake, as proved by watching Rupert Murdoch on the telly. He was not drunk or intoxicated, so that couldn’t be it, either. The kettle popped. He got up to get a tea bag and saw a business card lying on the kitchen sink, which read: “You have until midnight of today, Peter. To kill yourself in the name of the Fellowship, tear this card in half.” Ok, now that was the confirmation he was waiting for. It was definitely not a dream.

But what is the problem with it? They’re not erasing all songs, just a few. The best ones, yes, but according to which criteria? For him, Bohemian Rhapsody, Lazy and War Pigs were the greatest songs ever, but there were people who liked Abba, and the Beatles, and even those who did like Queen could prefer Under Pressure instead. How is it even possible to choose? Peter put the tea bag in the cup and poured water over it. The steam lifted the bitter smell of green tea, which would have to brew for a few more minutes until perfect.

Ok, so they could take the average of everyone’s favourite songs, or maybe a top 500 list, and remove duplicate songs per band. But that still doesn’t cover all songs of all bands. They must have a way to traverse all songs in history, including those that were never recorded by humans. But how can they judge the quality of those if no one knows they exist? So, they must have a different way to measure quality, an algorithm to judge by rhythm and choice of instruments and scales. Something that can be applied to virtually any audio signal to analyse its quality against a given set of standards, human standards. They must also understand the human auditory system, and human emotions, perfectly, to know precisely what is good and what is just ok.

In that case, it doesn’t matter what he liked; it’s about songs that are practically and theoretically good, no, the best! Wow, that took things to a whole new level. All the songs he liked were just a handful, but all good songs, ever? That’s a different story. Erasing all good songs is much worse than erasing a single band from history, no matter how good that band is. It’s erasing everything that is good and keeping a mediocre culture; it’s reducing the cultural richness of humanity to what shows on television or YouTube. It’s making a sad world even sadder!

That is something he could not allow to happen! In his own mind, he was now beginning to think of himself the same way he thought about the greatest band in the world. It’s better to lose the best band than the best song of every band, and him, well, it was better to lose him, even for himself, than to plunge humanity to even lower standards than today’s!

Peter looked at the tea cup; it was ready. The last green tea he’d ever have. He threw the bag in the sink and took a good sip. Burnt his tongue a bit, but no worries, that tongue wouldn’t care in a few minutes anyway. He got the card and sat on the sofa, with the tea cup in one hand and the card in the other. One more sip. This one was perfect, no burning. He put the cup away, held the card with both hands and started ripping it apart, very slowly. Hearing the sound it was making made his heart stop, or at least beat slower. Much slower.

When suddenly it hit him. No, not death, Lady Gaga.

With the quality of TV these days, Murdoch and Lady Gaga are pretty much all you see without cable, and she was in all her glory (or whatever that is) on the screen. Peter had a revelation. Since the only way to precisely define good music is through a set of experiments outside the human mind, based on the auditory and emotional systems as well as the components music is built from, it was, therefore, impossible to find a good song by Lady Gaga. QED.

Not just Lady Gaga, mind you, but a lot of what has been produced lately, pushed by the media companies, television included. There was so much rubbish in the arts that it’d be impossible to find good music in more than half of what was produced in the last 3 decades! And, not to ignore alternative science, if they consider opinions, there would be a lot of songs that people wouldn’t even know existed.

The card was half-ripped; his tea was still warm. He put the card back where he got it from, sat on the sofa and finished his tea, knowing that, whatever that was, dream or bad trip, it was over. When he finished his tea, Paris Hilton was on the telly, doing something stupid, as usual. Peter felt somehow good watching that, knowing that those girls had saved humanity’s art history!




License
Creative Commons License
We Support

WWF

DefectiveByDesign.org

End Software Patents

Avaaz.org

See Also
Disclaimer

The information in this weblog is provided “AS IS” with no warranties, and confers no rights.

This weblog does not represent the thoughts, intentions, plans or strategies of our employers. It is solely our opinion.

Feel free to challenge and disagree, and do not take any of it personally. It is not intended to harm or offend.

We will easily back down on our strong opinions by presentation of facts and proofs, not beliefs or myths. Be sensible.

Recent Posts