Tale of The Water |
| October 20th, 2013 under Digital Rights, Media, Politics, rengolin, Stories. [ Comments: 1 ]
In a village, far from any big city, there lived a family which had access to clean water from a nearby river. With the rain from many spring and autumn months being abundant, the family never had any trouble to wash clothes, cook and drink, or even have a good long bath. But the village, as any good village in the world, grew along that river, and each family had access to clean and fresh water.
As times pass, the legend of good water spread across the land, and more and more people joined the thriving community of the water village. But with growth, there’s lack of space, and not everyone had direct access to the river, but had to cross the original settlers’ gardens to get to water. Some fights and some profits later, the community, that now extended across several rows of houses on both sides of the river, as far as the eye could see, had a meeting to decide what would be done about the “water problem”.
The eldest, and self-elected leader of the community, had many friends among the first settlers. He wasn’t himself living by the river, since he got there not long ago, but with a few favours (especially helping increasing the profits of the original settlers to share their water with the newcomers), he got himself in a pretty good spot, and had enough contacts on both sides of the river to reign almost unimpeded.
To no surprise, he was the first to speak: “Friends of the Water Village, we gather today to decide what to do with the water.” Half-way through the sentence, every body had stopped talking, so he proceeded: “We all know that the water in this village is of the best quality in all the land”, and a chorus in the background said “yeah!”. “We all know that the first settlers have the rights in accessing and distributing the water, which you all know I am not part of, nor I profit from their enterprise, I only help to see that their profits and rights are guaranteed.” There was silence, for most knew that it was a lie, but they either didn’t want to oppose (at least not publicly), or didn’t care.
“But recent events called for a special gathering. So many of you hear that there are people accessing the river via the bridge, which blocks the crossing and put the bridge, which is not of the best quality, in danger!”. “Not to mention that this is a disrespect with the original settlers, that fought so hard to build our thriving community, and gave us the bless of such good water, and have helped us in reaching the water in such beautiful and useful buckets of their own creation.” “We owe them the right to share with us their water, the right to charge for the tireless efforts to provide our homes with the best water, carefully selected and cared for.” There was a faint ovation from the bench where the original settlers were, with many of them only shrugging, or not even that.
“Some of you reported the efforts of our friend that decided to pass a pipe through his land to make it easier to other villagers to have access to water, and that was already dealt with. We destroyed his pipe, and let that be a warning of anyone trying to pervert the art of the original settlers, as we owe them our delicious water!”. “Now, as with any democracy, I open the floor for comments, on how are we going to solve this problems.”
With this, some of the original settlers mentioned how the town should restrict the access to the bridge, and to charge a fee to cross, so that people that uses the bridge have the intention to cross the bridge, not to collect water. Others mentioned that it still wouldn’t stop collectors, but, as some said, they could restrict the validity of the tickets to a short period of time, in which a new charge would be collected.
About the pipe “problem”, many suggested that it should be made illegal to have pipes in any house, not just on the original settles, because connecting pipes between houses was not technically difficult, and it would be hard to solve the problem in case many houses ended up connecting to each other, as it was already happening in the north area.
When all the citizens were heard, and all the votes were taken, most of the ideas were unanimously approved. When the final hammer stroke down, finishing the meeting, one citizen, who was not one of the original settlers rose up: “This is outrageous! It doesn’t make sense, the water comes from the rain, and there is no innate right of the original settlers to charge anything for it!”. As he was saying this, one of the man standing behind the bench left in silence.
To that, not much was done from the central bench, where the eldest was sitting in the middle. He slowly rose is head, adjusted his glasses and smiled. “Friend, we’d be happy to hear your pledge, but as you all know, you don’t have the right to address the council. Only original settlers, and those appointed by them, can speak at the council. If you want to voice your concerns, I suggest you talk to your representative.” To which the man responded: “But my representative is an original settler, and I can’t vote for anyone that is not one, so they don’t represent me, they never had!”. “I’m sorry friend, but this is how democracy works, we can’t change the world just because of you.”.
The villager’s face was red, his eyes twitched slightly. The despair in his mind was clear, but he didn’t have much time to fall into it, for the silent men returned to the settlers’ bench and whispered something to the eldest’s ear only. The eldest turned his head again to the nonconformist villager. “Dear sir, we hear stories that you have been consistently using the bridge in the past days, is that true?”. “Well, yes, my sister lives on the other side, and I go visit her every day.”. “The reports also say that you take a bucket with you, and that you fill it with water, do you agree?”. “Well, yes, of course, I take the water for my sick sister, she needs it to aid her recovery.”. “And you haven’t paid a single settler for more than a month, how much water do you have stored at your house, dear sir?”.
It didn’t take long for the strong men behind the bench take the poor villager into a closed room, and he was never heard of ever again. Even though the water is a resource from nature, and despite the fact that water is essential to every living creature, the innate right of ownership of basic needs is common place in many parts of the world.
Creativity is a gift we received from evolution, as a way to save ourselves from more powerful foes. Creativity has a large proportion of imitation, since other living beings have different ideas, equally effective, against our common foes, and those that copy and share ideas, survive for longer. And yet, out society believes, for some serious distortion of natural reality, that the right to own something is more important than the right to survive.
If you read this story again, but replacing “water” with “music”, and making the appropriate changes, you’ll see that it makes as much sense as the original tale. And yet, a huge empire is built on the presumption that creativity can be owned by anyone. Who was the first to play certain tune? How many completely separate cultures have the same beat on their millenarian songs? There are infinite ways of combining words, but only a few actually make sense, and a lot less than that ends up beautiful.
Songs, poems, tales, videos, films, theatre are all forms of expressing the same feelings in different ways, but some people have the luxury of owning the rights of a particular way of expression, mainly because the law is written to favour them, than because they have actually created something truly new. No one has.
We all copy ideas. That’s called survival. That’s genetic. That’s what define us.
Why are we so ashamed of our own past? Why do we accept that the rich gets richer on our own account? Why do we agree that paying millions of dollars to an already filthy rich actors, directors and producers makes sense, for them to give us the benefit of watching the “Hangover III”, when it’s an absolute copy of itself for the second time, when the original was a pout-pourri of many other films and stories? Why do we accept a law that makes us criminals by sharing creativity, a basic instinct of the human race?
What has come of the human race to accept this as “normal”?
Open Source and Profit |
| July 8th, 2013 under Corporate, Devel, Digital Rights, OSS, rengolin, World. [ Comments: 2 ]
I have written extensively about free, open source software as a way of life, and now reading back my own articles of the past 7 years, I realize that I was wrong on some of the ideas, or in the state of the open source culture within business and around companies.
I’ll make a bold statement to start, trying to get you interested in reading past the introduction, and I hope to give you enough arguments to prove I’m right. Feel free to disagree on the comments section.
The future of business and profit, in years to come, can only come if surrounded by free thoughts.
By free thoughts I mean free/open source software, open hardware, open standards, free knowledge (both free as in beer and as in speech), etc.
I began my quest to understand the open source business model back in 2006, when I wrote that open source was not just software, but also speech. Having open source (free) software is not enough when the reasons why the software is free are not clear. The reason why this is so is that the synergy, that is greater than the sum of the individual parts, can only be achieved if people have the rights (and incentives) to reach out on every possible level, not just the source, or the hardware. I make that clear later on, in 2009, when I expose the problems of writing closed source software: there is no ecosystem in which to rely, so progress is limited and the end result is always less efficient, since the costs to make it as efficient are too great and would drive the prices of the software too high up to be profitable.
In 2008 I saw both sides of the story, pro and against Richard Stallman, on the views of the legitimacy of propriety control, being it via copyright licenses or proprietary software. I may have come a long way, but I was never against his idea of the perfect society, Richard Stallman’s utopia, or as some friends put it: The Star Trek Universe. The main difference between me and Stallman is that he believes we should fight to the last man to protect ourselves from the evil corporations towards software abuse, while I still believe that it’s impossible for them to sustain this empire for too long. His utopia will come, whether they like it or not.
Finally, in 2011 I wrote about how copying (and even stealing) is the only business model that makes sense (Microsoft, Apple, Oracle etc are all thieves, in that sense) and the number of patent disputes and copyright infringement should serve to prove me right. Last year I think I had finally hit the epiphany, when I discussed all these ideas with a friend and came to the conclusion that I don’t want to live in a world where it’s not possible to copy, share, derive or distribute freely. Without the freedom to share, our hands will be tied to defend against oppression, and it might just be a coincidence, but in the last decade we’ve seen the biggest growth of both disproportionate propriety protection and disproportional governmental oppression that the free world has ever seen.
Can it be different?
Stallman’s argument is that we should fiercely protect ourselves against oppression, and I agree, but after being around business and free software for nearly 20 years, I so far failed to see a business model in which starting everything from scratch, in a secret lab, and releasing the product ready for consumption makes any sense. My view is that society does partake in an evolutionary process that is ubiquitous and compulsory, in which it strives to reduce the cost of the whole process, towards stability (even if local), as much as any other biological, chemical or physical system we know.
So, to prove my argument that an open society is not just desirable, but the only final solution, all I need to do is to show that this is the least energy state of the social system. Open source software, open hardware and all systems where sharing is at the core should be, then, the least costly business models, so to force virtually all companies in the world to follow suit, and create the Stallman’s utopia as a result of the natural stability, not a forced state.
This is crucial, because every forced state is non-natural by definition, and every non-natural state has to be maintained by using resources that could be used otherwise, to enhance the quality of the lives of the individuals of the system (being them human or not, let’s not block our point of view this early). To achieve balance on a social system we have to let things go awry for a while, so that the arguments against such a state are perfectly clear to everyone involved, and there remains no argument that the current state is non-optimal. If there isn’t discomfort, there isn’t the need for change. Without death, there is no life.
Of all the bad ideas us humans had on how to build a social system, capitalism is probably one of the worst, but it’s also one of the most stable, and that’s because it’s the closest to the jungle rule, survival of the fittest and all that. Regulations and governments never came to actually protect the people, but as to protect capitalism from itself, and continue increasing the profit of the profitable. Socialism and anarchy rely too much on forced states, in which individuals have to be devoid of selfishness, a state that doesn’t exist on the current form of human beings. So, while they’re the product of amazing analysis of the social structure, they still need heavy genetic changes in the constituents of the system to work properly, on a stable, least-energy state.
Having less angry people on the streets is more profitable for the government (less costs with security, more international trust in the local currency, more investments, etc), so panis et circenses will always be more profitable than any real change. However, with more educated societies, result from the increase in profits of the middle class, more real changes will have to be made by governments, even if wrapped in complete populist crap. One step at a time, the population will get more educated, and you’ll end up with more substance and less wrapping.
So, in the end, it’s all about profit. If not using open source/hardware means things will cost more, the tendency will be to use it. And the more everyone uses it, the less valuable will be the products that are not using it, because the ecosystem in which applications and devices are immersed in, becomes the biggest selling point of any product. Would you buy a Blackberry Application, or an Android Application? Today, the answer is close to 80% on the latter, and that’s only because they don’t use the former at all.
It’s not just more expensive to build Blackberry applications, because the system is less open, the tools less advanced, but also the profit margins are smaller, and the return on investment will never justify. This is why Nokia died with their own App store, Symbian was not free, and there was a better, free and open ecosystem already in place. The battle had already been lost, even before it started.
But none of that was really due to moral standards, or Stallman’s bickering. It was only about profit. Microsoft dominated the desktop for a few years, long enough to make a stand and still be dominant after 15 years of irrelevance, but that was only because there was nothing better when they started, not by a long distance. However, when they tried to flood the server market, Linux was not only already relevant, but it was better, cheaper and freer. The LAMP stack was already good enough, and the ecosystem was so open, that it was impossible for anyone with a closed development cycle to even begin to compete on the same level.
Linux became so powerful that, when Apple re-defined the concept of smartphones with the iPhone (beating Nokia’s earlier attempts by light-years of quality), the Android system was created, evolved and dominated in less than a decade. The power to share made possible for Google, a non-device, non-mobile company, to completely outperform a hardware manufacturer in a matter of years. If Google had invented a new OS, not based on anything existent, or if they had closed the source, like Apple did with FreeBSD, they wouldn’t be able to compete, and Apple would still be dominant.
Do we need profit?
So, the question is: is this really necessary? Do we really depend on Google (specifically) to free us from the hands of tyrant companies? Not really. If it wasn’t Google, it’d be someone else. Apple, for a long time, was the odd guy in the room, and they have created an immense value for society: they gave us something to look for, they have educated the world on what we should strive for mobile devices. But once that’s done, the shareable ecosystem learns, evolves and dominate. That’s not because Google is less evil than Apple, but because Android is more profitable than iOS.
Profit here is not just the return on investment that you plan on having on a specific number of years, but adding to that, the potential that the evolving ecosystem will allow people to do when you’ve long lost the control over it. Shareable systems, including open hardware and software, allow people far down in the planing, manufacturing and distributing process to still have profit, regardless of what were your original intentions. One such case is Maddog’s Project Cauã.
By using inexpensive RaspberryPis, by fostering local development and production and by enabling the local community to use all that as a way of living, Maddog’s project is using the power of the open source initiative by completely unrelated people, to empower the people of a country that much needs empowering. That new class of people, from this and other projects, is what is educating the population of the world, and what is allowing the people to fight for their rights, and is the reason why so many civil uprisings are happening in Brazil, Turkey, Egypt.
All that creates instability, social unrest, whistle-blowing gone wrong (Assange, Snowden), and this is a good thing. We need more of it.
It’s only when people feel uncomfortable with how the governments treat them that they’ll get up their chairs and demand for a change. It’s only when people are educated that they realise that oppression is happening (since there is a force driving us away from the least-energy state, towards enriching the rich), and it’s only when these states are reached that real changes happen.
The more educated society is, the quicker people will rise to arms against oppression, and the closer we’ll be to Stallman’s utopia. So, whether governments and the billionaire minority likes or not, society will go towards stability, and that stability will migrate to local minima. People will rest, and oppression will grow in an oscillatory manner until unrest happens again, and will throw us into yet another minimum state.
Since we don’t want to stay in a local minima, we want to find the best solution not just a solution, having it close to perfect in the first attempt is not optimal, but whether we get it close in the first time or not, the oscillatory nature of social unrest will not change, and nature will always find a way to get us closer to the global minimum.
Is it possible to stay in this unstable state for too long? I don’t think so. But it’s not going to be a quick transition, nor is it going to be easy, nor we’ll get it on the first attempt.
But more importantly, reaching stability is not a matter of forcing us to move towards a better society, it’s a matter of how dynamic systems behave when there are clear energetic state functions. In physical and chemical systems, this is just energy, in biological systems this is the propagation ability, and in social systems, this is profit. As sad as it sounds…
Amazon loves to annoy |
| June 27th, 2013 under Digital Rights, Gadgtes, rengolin, Software, Unix/Linux, Web. [ Comments: none ]
It’s amazing how Amazon will do all in their power to annoy you. They will sell you DRM-free MP3 songs, and even allow you to download on any device (via their web interface) the full version, for your own personal use, in the car, at home or when mobile. But, not without a cost, no.
For some reason, they want to have total control of the process, so if they’ll allow you to download your music, it has to be their way. In the past, you had to download the song immediately after buying, with a Windows-only binary (why?) and you had only one shot. If the link failed, you just lost a couple of pounds. To be honest, that happened to me, and customer service were glad to re-activate my “license” so I could download it again. Kudos for that.
Question 1: Why did they need an external software to download the songs when they had a full-featured on-line e-commerce solution?
It’s not hard to sell on-line music, other people have been doing it for years and not in that way, for sure. Why was it so hard for Amazon, the biggest e-commerce website on Earth, to do the same? I was not asking for them to revolutionise the music industry (I leave that for Spotify), just do what others were doing at the time. Apparently, they just couldn’t.
Recently, it got a lot better, and that’s why I started buying MP3 songs from Amazon. They now had a full-featured MP3 player on the web! They also have the Android version of it that is a little confusing but unobtrusive. The web version is great, once you buy an album you go directly to it and you can already start listening to songs and all.
Well, I’m a control freak, and I want to have all songs I own on my own server (and its backup), so I went to download my recently purchased songs. Well, it’s not that simple: you can download all your songs, on Windows and Mac… not Linux.
Question 2: Why on Earth can’t they make it work on Linux?
Undeterred, I knew the Android app would let me download, and as an added bonus, all songs downloaded by AmazonMP3 would be automatically added to the Android music playlists, so that both programs could play the same songs. That was great, of course, until I wanted to copy them to my laptop.
When running (the fantastic) ES File Explorer, I listed the folders consuming most of the SDCARD, found the amazonmp3 folder and saw that all my songs were in there. Since Android changed the file-system, and I can’t seem to mount it correctly via MTP (noob), I decided to use the ES File Explorer (again) to select all files and copy to my server via its own interface, that is great for that sort of thing, and well, found out that it’s not that simple. Again.
Question 3: Why can I read and delete the songs, but not copy them?
What magic Linux permission let me listen to a song (read) and delete the file (write) but not copy to another location? I can’t think of a way to natively do that on Linux, it must be a magic from Android, to allow for DRM crap.
At this time I was already getting nervous, so I just fired adb shell and navigated to the directory, and when I listed the files, adb just logged out. I tried again, and it just exited. No error message, no log, no warning, just shut down and get me back to my own prompt.
This was getting silly, but I had the directory, so I just ran adb pull /sdcard/amazonmp3/ and found that only the temp directory came out. What the hell is this sorcery?!
Question 4: What kind of magic stops me from copying files, or even listing files from a shell?
Well, I knew it was something to do with the Amazon MP3 application itself, if couldn’t be something embedded on Android, or the activists would crack on until they ceded, or at least provided means for disabling DRM crap from the core. To prove my theory, I removed the AmazonMP3 application and, as expected, I could copy all my files via adb to my server, where I could then, back them up.
So, if you use Linux and want to download all your songs from Amazon MP3 website, you’ll have to:
- Buy songs/albuns on Amazon’s website
- Download them via AmazonMP3 Android app (click on album, click on download)
- Un-install the AmazonMP3 app
- Get the files via: adb pull /sdcard/amazonmp3/
- Re-install the AmazonMP3 app (if you want, or to download more songs)
As usual, Amazon was a pain in the back with what should be really, really simple for them to do. And, as usual, a casual user finds its way to getting what they want, what they paid for, what they deserve.
If you know someone at Amazon, please let them know:
Uno score keeper |
| March 31st, 2013 under Devel, OSS, rengolin, Software. [ Comments: none ]
With the spring not coming soon, we had to improvise during the Easter break and play Uno every night. It’s a lot of fun, but it can take quite a while to find a piece of clean paper and a pen that works around the house, so I wondered if there was an app for that. It turns out, there wasn’t!
There were several apps to keep card game scores, but every one was specific to the game, and they had ads, and wanted access to the Internet, so I decided it was worth it writing one myself. Plus, that would finally teach me to write Android apps, a thing I was delaying to get started for years.
Card Game Scores
The app is not just a Uno score keeper, it’s actually pretty generic. You just keep adding points until someone passes the threshold, when the poor soul will be declared a winner or a loser, depending on how you set up the game. Since we’re playing every night, even the 30 seconds I spent re-writing our names was adding up, so I made it to save the last game in the Android tuple store, so you can retrieve it via the “Last Game” button.
It’s also surprisingly easy to use (I had no idea), but if you go back and forth inside the app, it cleans the game and start over a new one, with the same players, so you can go on as many rounds as you want. I might add a button to restart (or leave the app) when there’s a winner, though.
I’m also thinking about printing the names in order in the end (from victorious to loser), and some other small changes, but the way it is, is good enough to advertise and see what people think.
If you end up using, please let me know!
Download and Source Code
The app is open source (GPL), so rest assured it has no tricks or money involved. Feel free to download it from here, and get the source code at GitHub.
Distributed Compilation on a Pandaboard Cluster |
| February 13th, 2013 under Devel, Distributed, OSS, rengolin. [ Comments: 2 ]
This week I was experimenting with the distcc and Ninja on a Pandaboard cluster and it behaves exactly as I expected, which is a good thing, but it might not be what I was looking for, which is not.
Long story short, our LLVM buildbots were running very slow, from 3 to 4.5 hours to compile and test LLVM. If you consider that at peak time (PST hours) there are up to 10 commits in a single hour, the buildbot will end up testing 20-odd patches at the same time. If it breaks in unexpected ways, of if there is more than one patch on a given area, it might be hard to spot the guilty.
We ended up just avoiding the make clean step, which put us around 15 minutes build+tests, with the odd chance of getting 1 or 2 hours tops, which is a great deal. But one of the alternatives I was investigating is to do a distributed build. More so because of the availability of cluster nodes with dozens of ARM cores inside, we could make use of such a cluster to speed up our native testing, even benchmarking on a distributed way. If we do it often enough, the sample might be big enough to account for the differences.
So, I got three Pandaboards ES (dual Cortex-A9, 1GB RAM each) and put the stock Ubuntu 12.04 on them and installed the bare minimum (vim, build-essential, python-dev, etc), upgraded to the latest packages and they were all set. Then, I needed to find the right tools to get a distributed build going.
It took a bit of searching, but I ended up with the following tool-set:
- distcc: The distributed build dispatcher, which knows about the other machines in the cluster and how to send them jobs and get the results back
- CMake: A Makefile generator which LLVM can use, and it’s much better than autoconf, but can also generate Ninja files!
- Ninja: The new intelligent builder which not only is faster to resolve dependencies, but also has a very easy way to change the rules to use distcc, and also has a magical new feature called pools, which allow me to scale job types independently (compilers, linkers, etc).
All three tools had to be compiled from source. Distcc’s binary distribution for ARM is too old, CMake’s version on that Ubuntu couldn’t generate Ninja files and Ninja doesn’t have binary distributions, full stop. However, it was very simple to get them interoperating nicely (follow the instructions).
You don’t have to use CMake, there are other tools that generate Ninja files, but since LLVM uses CMake, I didn’t have to do anything. What you don’t want is to generate the Ninja files yourself, it’s just not worth it. Different than Make, Ninja doesn’t try to search for patterns and possibilities (this is why it’s fast), so you have to be very specific on the Ninja file on what you want to accomplish. This is very easy for a program to do (like CMake), but very hard and error prone for a human (like me).
To use distcc is simple:
- Replace the
compiler command by
distcc compiler on your Ninja rules;
- Set the environment variable
DISTCC_HOSTS to the list of IPs that will be the slaves (including localhost);
- Start the distcc daemon on all slaves (not on the master):
distccd --daemon --allow <MasterIP>;
- Run ninja with the number of CPUs of all machines + 1 for each machine. Ex:
ninja -j6 for 2 Pandaboards.
A local build, on a single Pandaboard of just LLVM (no Clang, no check-all) takes about 63 minutes. With distcc and 2 Pandas it took 62 minutes!
That’s better, but not as much as one would hope for, and the reason is a bit obvious, but no less damaging: The Linker! It took 20 minutes to compile all of the code, and 40 minutes to link them into executable. That happened because while we had 3 compilation jobs on each machine, we had 6 linking jobs on a single Panda!
See, distcc can spread the compilation jobs as long as it copies the objects back to the master, but because a linker needs all objects in memory to do the linking, it can’t do that over the network. What distcc could do, with Ninja’s help, is to know which objects will be linked together, and keep copies of them on different machines, so that you can link on separate machines, but that is not a trivial task, and relies on an interoperation level between the tools that they’re not designed to accept.
And that’s where Ninja proved to be worth its name: Ninja pools! In Ninja, pools are named resources that bundle together with a specific level of scalability. You can say that compilers scale free, but linkers can’t run more than a handful. You simply need to create a pool called linker_pool (or anything you want), give it a depth of, say, 2, and annotate all linking jobs with that pool. See the manual for more details.
With the pools enabled, a distcc build on 2 Pandaboards took exactly 40 minutes. That’s 33% of gain with double the resources, not bad. But, how does that scale if we add more Pandas?
How does it scale?
To get a third point (and be able to apply a curve fit), I’ve added another Panda and ran again, with 9 jobs and linker pool at 2, and it finished in 30 minutes. That’s less than half the time with three times more resources. As expected, it’s flattening out, but how much more can we add to be profitable?
I don’t have an infinite number of Pandas (nor I want to spend all my time on it), so I just cheated and got a curve fitting program (xcrvfit, in case you’re wondering) and cooked up an exponential that was close enough to the points and use the software ability to do a best fit. It came out with
86.806*exp(-0.58505*x) + 14.229, which according to Lybniz, flattens out after 4 boards (about 20 minutes).
Distcc has a special mode called pump mode, in which it pushes with the C file, all headers necessary to compile it solely on the node. Normally, distcc will pre-compile on the master node and send the pre-compiled result to the slaves, which convert to object code. According to the manual, this could improve the performance 10-fold! Well, my results were a little less impressive, actually, my 3-Panda cluster finished in just about 34 minutes, 4 minutes more than without push mode, which is puzzling.
I could clearly see that the files were being compiled in the slaves (distccmon-text would tell me that, while there was a lot of “preprocessing” jobs on the master before), but Ninja doesn’t print times on each output line for me to guess what could have slowed it down. I don’t think there was any effect on the linker process, which was still enabled in this mode.
Simply put, both distcc and Ninja pools have shown to be worthy tools. On slow hardware, such as the Pandas, distributed builds can be an option, as long as you have a good balance between compilation and linking. Ninja could be improved to help distcc to link on remote nodes as well, but that’s a wish I would not press on the team.
However, scaling only to 4 boards will reduce a lot of the value for me, since I was expecting to use 16/32 cores. The main problem is again the linker jobs working solely on the master node, and LLVM having lots and lots of libraries and binaries. Ninja’s pools can also work well when compiling LLVM+Clang on debug mode, since the objects are many times bigger, and even on above average machine you can start swapping or even freeze your machine if using other GUI programs (browsers, editors, etc).
In a nutshell, the technology is great and works as advertised, but with LLVM it might not be yet the thing. It’s still more profitable to get faster hardware, like the Chromebooks, that are 3x faster than the Pandas and cost only marginally more.
Would also be good to know why the pump mode has regressed in performance, but I have no more time to spend on this, so I leave as a exercise to the reader.
LLVM Vectorizer |
| February 12th, 2013 under Algorithms, Devel, rengolin. [ Comments: 2 ]
Now that I’m back working full-time with LLVM, it’s time to get some numbers about performance on ARM.
I’ve been digging the new LLVM loop vectorizer and I have to say, I’m impressed. The code is well structured, extensible and above all, sensible. There are lots of room for improvement, and the code is simple enough so you can do it without destroying the rest or having to re-design everything.
The main idea is that the loop vectorizer is a Loop Pass, which means that if you register this pass (automatically on
-O3, or with
-loop-vectorize option), the Pass Manager will run its
runOnLoop(Loop*) function on every loop it finds.
The three main components are:
- The Loop Vectorization Legality: Basically identifies if it’s legal (not just possible) to vectorize. This includes checking if we’re dealing with an inner loop, and if it’s big enough to be worth, and making sure there aren’t any conditions that forbid vectorization, such as overlaps between reads and writes or instructions that don’t have a vector counter-part on a specific architecture. If nothing is found to be wrong, we proceed to the second phase:
- The Loop Vectorization Cost Model: This step will evaluate both versions of the code: scalar and vector. Since each architecture has its own vector model, it’s not possible to create a common model for all platforms, and in most cases, it’s the special behaviour that makes vectorization profitable (like 256-bits operations in AVX), so we need a bunch of cost model tables that we consult given an instruction and the types involved. Also, this model doesn’t know how the compiler will lower the scalar or vectorized instructions, so it’s mostly guess-work. If the vector cost (normalized to the vector size) is less than the scalar cost, we do:
- The Loop Vectorization: Which is the proper vectorization, ie. walking through the scalar basic blocks, changing the induction range and increment, creating the prologue and epilogue, promote all types to vector types and change all instructions to vector instructions, taking care to leave the interaction with the scalar registers intact. This last part is a dangerous one, since we can end up creating a lot of copies from scalar to vector registers, which is quite expensive and was not accounted for in the cost model (remember, the cost model is guess-work based).
All that happens on a new loop place-holder, and if all is well at the end, we replace the original basic blocks by the new vectorized ones.
So, the question is, how good is this? Well, depending on the problems we’re dealing with, vectorizers can considerably speed up execution. Especially iterative algorithms, with lots of loops, like matrix manipulation, linear algebra, cryptography, compression, etc. In more practical terms, anything to do with encoding and decoding media (watching or recording videos, pictures, audio), Internet telephones (compression and encryption of audio and video), and all kinds of scientific computing.
One important benchmark for that kind of workload is Linpack. Not only Linpack has many examples of loops waiting to be vectorized, but it’s also the benchmark that defines the Top500 list, which classifies the fastest computers in the world.
So, both GCC and Clang now have the vectorizers turned on by default with
-O3, so comparing them is as simple as compiling the programs and see them fly. But, since I’m also interested in seeing what is the performance gain with just the LLVM vectorizer, I also disabled it and ran a clang with only
-O3, no vectorizer.
On x86_64 Intel (Core i7-3632QM), I got these results:
This is some statement! The GCC vectorizer exists for a lot longer than LLVM’s and has been developed by many vectorization gurus and LLVM seems to easily beat GCC in that field. But, a word of warning, Linpack is by no means representative of all use cases and user visible behaviour, and it’s very likely that GCC will beat LLVM on most other cases. Still, a reason to celebrate, I think.
This boost mean that, for many cases, not only the legality if the transformations are legal and correct (or Linpack would have gotten wrong results), but they also manage to generate faster code at no discernible cost. Of course, the theoretical limit is around 4x boost (if you manage to duplicate every single scalar instruction by a vector one and the CPU has the same behaviour about branch prediction and cache, etc), so one could expect a slightly higher number, something on the order of 2x better.
It depends on the computation density we’re talking about. Linpack tests specifically the inner loops of matrix manipulation, so I’d expect a much higher ratio of improvement, something around 3x or even closer to 4x. VoIP calls, watching films and listening to MP3 are also good examples of densely packet computation, but since we’re usually running those application on a multi-task operating system, you’ll rarely see improvements higher than 2x. But general applications rarely spend that much time on inner loops (mostly waiting for user input and then doing a bunch of unrelated operations, hardly vectorizeable).
Another important aspect of vectorization is that it saves a lot of battery juice. MP3 decoding doesn’t really matter if you finish in 10 or 5 seconds, as long as the music doesn’t stop to buffer. But taking 5 seconds instead of 10 means that on the other 5 seconds the CPU can reduce its voltage and save battery. This is especially important in mobile devices.
What about ARM code?
Now that we know the vectorizer works well, and the cost model is reasonably accurate, how does it compare on ARM CPUs?
It seems that the grass is not so green on this side, at least not at the moment. I have reports that on ARM it also reached the 40% boost similar to Intel, but what I saw was a different picture altogether.
On a Samsung Chromebook (Cortex-A15) I got:
The performance regression can be explained by the amount of scalar code intermixed with vector code inside the inner loops as a result of shuffles (movement of data within the vector registers and between scalar and vector registers) not being lowered correctly. This most likely happens because the LLVM back-end relies a lot on pattern-matching for instruction selection (a good thing), but the vectorizers might not be producing the shuffles in the right pattern, as expected by each back-end.
This can be fixed by tweaking the cost model to penalize shuffles, but it’d be good to see if those shuffles aren’t just mismatched against the patterns that the back-end is expecting. We will investigate and report back.
Got results for single precision floating point, which show a greater improvement on both Intel and ARM.
On x86_64 Intel (Core i7-3632QM), I got these results:
On a Samsung Chromebook (Cortex-A15) I got:
Which goes on to show that the vectorizer is, indeed, working well for ARM, but the costs of using the VFP/NEON pipeline outweigh the benefits. Remember than NEON vectors are only 128-bits wide and VFP only 64-bit wide, and NEON has no double precision floating point operations, so they’ll only do one double precision floating point operations per cycle, so the theoretical maximum depends on the speed of the soft-fp libraries.
So, in the future, what we need to be working is the cost model, to make sure we don’t regress in performance, and try to get better algorithms when lowering vector code (both by making sure we match the patterns that the back-end is expecting, and by just finding better ways of vectorizing the same loops).
Without further benchmarks it’s hard to come to a final conclusion, but it’s looking good, that’s for sure. Since Linpack is part of the standard LLVM test-suite benchmarks, fixing this and running it regularly on ARM will at least avoid any further regressions… Now it’s time to get our hands dirty!
Hypocrite Internet Freedom |
| December 11th, 2012 under Digital Rights, Politics, rengolin, Web, World. [ Comments: none ]
Last year, the Internet has shown its power over governments, when we all opposed to the SOPA and PIPA legislations in protests across the world, including this very blog. Later on, against ACTA and so on, and we all felt very powerful indeed. Now, a new thread looms over the Internet, the ITU is trying to take over the Internet.
To quote Ars Technica:
Some of the world’s most authoritarian regimes introduced a new proposal at the World Conference on International Telecommunications on Friday that could dramatically extend the jurisdiction of the International Telecommunication Union over the Internet.
Or New Scientist:
This week, 2000 people have gathered for the World Conference on International Telecommunications (WCIT) in Dubai in the United Arab Emirates to discuss, in part, whether they should be in charge.
And stressing that:
WHO runs the internet? For the past 30 years, pretty much no one.
When in reality, the Internet of today is actually in the precise state the US is trying to avoid, only that now they’re in control, and the ITU is trying to change it to an international organization, where more countries have a say.
Today, the DNS and the main IP blocks are controlled by the ICANN, however, Ars Technica helps us reminding that ICANN and IANA are:
the quasi-private organizations that currently oversee the allocation of domain names and IP addresses.
But the ICANN was once a US government operated body, still with strong ties with Washington, localized solely on the US soil, operating on US law jurisdiction. They also failed on many accounts to democratize their operations, resulting in little or no impact for international input. Furthermore, all top level domains that are not bound to a country (like .com, .org, .net) are also within American jurisdiction, even if they’re hosted and registered in another country.
But controlling the DNS is only half the story. The control that the US has on the Internet is much more powerful. First, they hold (for historical and economical reasons), most of the backbone of the Internet (root DNS servers, core routers, etc). That means the traffic between Europe and Japan will probably pass through them. In theory, this shouldn’t matter and it’s actually an optimization of the self-structuring routing tables, but in fact, the US government has openly reported that they do indeed monitor all traffic that goes within their borders and they do reserve the right to cut it, if they think this presents a risk of national security.
Given the amount of publicity the TSA had since 2001 for their recognition of what poses a security threat, including Twitter comments from British citizens, I wouldn’t trust them, or their automated detection system to care for my security. Also, given the intrusion that they have on some governments like the case of Dotcom in January, where national security operations in New Zealand were shared inappropriately with the American government, I never felt safe when crossing American soil, physically or through the Internet.
Besides, Hollywood has shown in Scandinavia and in UK that they hold a strong leash on European governments when related to (US) copyright laws, forcing governments, once liberals, to abide to American rules, arresting their own citizens, when content is being distributed over the Internet. It’s also interesting to remember than SOPA, PIPA and ACTA, mainly driven by Hollywood, were all created within closed doors.
So, would ITU control be better?
No. Nothing could be further from the truth. Although, in theory, it’s more democratic (more countries with decision power), this decision power has been sought for one main purpose: to enforce more strict laws. I generally agree that the ITU would not be a good controlling body, but believing that nobody controls the Internet is, at least, naive, and normally a pretentious lie.
A legal control of many countries over something as free as the Internet would impose the same dangers as having it free of legal control, since it leaves us with indirect control from the strongest player, which so far, has been the US. The other countries are only so strongly minded about the ITU because the US won’t let them have their voices, and the ITU is a way to create an UN for the Internet.
In that sense, the ITU would be a lot like the UN. Worthless. A puppet in the hands or the strong players. Each country would have more control over their borders, and that would impact almost nothing in the US, but the general rules would stop being valid, and the US (and other countries) would have to do a lot more work than they do today. One example is the stupid rule in the UK where the sites, including international ones, have to warn users that they are using cookies.
Don’t be fooled, the US government is not really worried about your safety and security, nor your freedom. They’re trying to avoid a lot of work, and a big loss in market in the Middle East and South Asia. With countries (that they like to say are authoritarian regimes) imposing stricter rules on traffic, including fees, taxes and other things that they have on material goods, the commerce with those governments will be a lot more expensive.
Ever since the second world war, the US economy is based mainly on military activities. First, helping Europe got them out of the big depression, then they forced rebellions throughout Latin America to keep the coins clinking and currently, it’s the Middle East. With the climate change endangering their last non-war resources (oil), they were betting on the Internet to spread the American Way Of Life to the less fortunate, with the off chance of selling a few iPads on the process, but now, that profit margin is getting dangerously thin.
Not to mention the military threat, since a lot of the intelligence is now being gathered through the Internet, and recent attacks on Iranian nuclear power plants via the Stuxnet worm, would all become a lot harder. The fact that China is now bigger and more powerful than they are, in every possible aspect (I dare say even military, but we can’t know for sure), is also not helping.
What is then, the solution? Is it possible to really have nobody running the Internet? And, if at all possible, is it desirable?
Mad Max Internet
I don’t think so.
It’s true that IPv6 should remove completely the need for IP allocation, but DNS is a serious problem. Letting DNS registration to an organic self-organized process would lead to widespread malicious content being distributed and building security measures around it would be much harder than they already are. The same is true with SSL certificates. You’d expect that, on a land with no rules, trusted bodies would charge a fortune and extort clients for a safe SSL certificate, if they actually produce a good one, that is, but this is exactly what happens today, on ICANN rule.
Routing would also be affected, since current algorithms rely on total trust between parties. There was a time when China had all US traffic (including governmental and military) through its routers, solely done via standard BGP rules. On a world where every country has its own core router, digitally attacking another country would be as easy as changing one line on a router.
We all love to think that the Internet is a free world already, but more often than ever, people are being arrested for their electronic behaviour. Unfortunately, because there isn’t a set of rules, or a governing body, the rules that get people arrested are the rules of the strongest player, which in our current case, is Hollywood. So, how is it possible to reconcile security, anonymity and stability without recurring to governing bodies?
The simple answer is, it’s not. The Internet is a land with no physical barriers, where contacting people over 1000s of miles is the same as the one besides you, but we don’t live in a world without borders. It’s not possible to reconcile the laws of all countries, with all the different cultures, into one single book. As long as the world keeps its multiculturalism, we have to cope with different rules for different countries, and I’m not in favour of losing our identity just to make the Internet a place comfortable to the US government.
It is my opinion that we do, indeed, need a regulating body. ICANN, ITU, it doesn’t matter, as long as the decisions are good for most.
I don’t expect that any such governing body would come up with a set of rules that are good for everybody, nor that they’ll find the best rules in the first N iterations (for large N), but if the process is fair, we should reach consensus (when N tends to infinity). The problem with both ICANN and ITU is that neither are fair, and there are other interests at play that are weighted much more than the interests of the people.
Since no regulating body, governmental or not, will ever account for the interests of the people (today or ever), people tend to hope that no-rule is the best rule, but I hope I have shown that this is not true. I believe that instead, a governing multi-body is the real solution. It’s hypocrite to believe that Russia will let the US create regulations within its borders, so we can’t assume that will ever happen from start, if we want it to work in the long run. So this multi-body, composed by independent organizations in Europe, Asia, Oceania, Africa and Americas would have strong powers on their regions, but would have to agree on very general terms.
The general terms would be something like:
- There should be no cost associated with the traffic to/from/across any country to any other country
- There should be no filtering of any content across countries, but filtering should be possible to/from a specific country or region based on religious or legal grounds
- It should be possible for countries to deny certain types of traffic (as opposed to filtering above), so that routing around would be preferred
- Misuse of Internet protocols (such as BGP and DNS spoofing) on root routers/DNS servers should be considered an international crime with the country responsible for the server in charge of the punishments or sanctions against that country could be enforced by the UN
- Legal rights and responsibilities on the Internet should be similar (but not identical) as they are on the physical world, but each country has the right and duty to enforce their own rules
Rule 1 is fundamental and would cut short most of the recent ITU’s proposals. It’s utter nonsense to cross-charge the Internet as it is to do it with telecoms around the world, and that is probably the biggest problem of the new proposal.
Rules 2 and 3 would leave control over regional Internet with little impact on the rest. It’d also foment creation of new routes around problematic countries, which is always beneficial to the Internet reliability as a whole. It’s hypocrite to assume that the US government has the right to impose Internet rules on countries like Iran or China, and it’s up to the people of China and Iran to fight their leaders on their own terms.
It’s extremely hypocrite, and very common, in the US to believe that their system (the American Way of Life) is the best for every citizen of the world, or that the people of other countries have no way of choosing their own history. It’s also extremely hypocrite to blame authoritarian governments on Internet regulations and at the same time provide weapons and support local authoritarian groups. Let’s not forget the role of the US on Afghanistan and Iraq prior to the Gulf War, as opposition to Russia and Iran (respectively), and their pivot role on all major authoritarian revolution in Latin America.
Most countries, including Russia and the ones in Middle East would probably be fine with rules 2 and 3, with little impact on the rest of the world. Which leaves us with rule 4, to account for the trust-worthiness of the whole system. Today, there is a gang of a few pals who control the main routers and giving more control over less trust-worthy pals over DNS and BGP routes would indeed be a problem.
However, in fact, this rule is in vigour today, since China routed US traffic for only 18 minutes. It was more a show of power than a real attack, but had China been doing this for too long, the US would think otherwise and with very strong reasons. The loose control is good, but the loose responsibility is not. Countries should have the freedom to structure their Internet backbones but also do it responsibly, or be punished otherwise.
Finally, there’s rule 5. How to account when a citizen of one country behaves in another country’s website as it’s legal for his culture, but not the other? Strong religious and ethical issues will arise from that, but nothing that there isn’t already on the Internet. Most of the time, this problem is identical to what already happens on the real world, with people from one country that commit crimes on another country. The hard bit is to know what are the differences between physical and logical worlds and how to reconcile the differences in interpretation of the multiple groups that will take part on such governing multi-body.
ITU’s proposal is not good, but ICANN’s is neither. The third alternative, to lack complete control is only going to make it worse, so we need a solution that is both viable and general enough, so that most countries agree to it. It also needs to relinquish control of internal features to their own governments in a way to not affect the rest of the Internet.
I argue that one single body, being it ITU or ICANN, is not a good model, since it’s not general enough nor they account for specific regions’ concerns (ICANN won’t listen to the Middle East and ITU won’t regard the US). So, the only solution I can see possible is one that unites them all into a governing multi-body, with very little in global agreement, but with general rules powerful enough to guarantee that the Internet will be free forever.
The American constitution is a beautiful piece of writing, but in reality, over the years, their government have destroyed most of its beauty. So, long term self-check must also be a core part of this multi-body, with regular review and democratic decisions (sorry authoritarian regimes, it’s the only way).
In a nutshell, while it is possible to write the Internet Constitution and make it work in the long term, humanity is very likely not ready to do that yet, and we’ll probably see the destruction of the Internet in the next 10 years.
Open Source and Innovation |
| September 13th, 2012 under Corporate, OSS, rengolin, Technology. [ Comments: 1 ]
A few weeks ago, a friend (Rob) asked me a pertinent question: “How can someone innovate and protect her innovation with open source?”. Initially, I scorned off with a simple “well, you know…”, but this turned out to be a really hard question to answer.
The main idea is that, in the end, every software (and possibly hardware) will end up as open source. Not because it’s beautiful and fluffy, but because it seems to be the natural course of things nowadays. We seem to be moving from profiting on products, to giving them away and profiting on services. If that’s true, are we going to stop innovating at all, and just focus on services? What about the real scientists that move the world forward, are they also going to be flipping burgers?
Open Source as a business model
The reason to use open source is clear, the TCO fallacy is gone and we’re all used to it (especially the lawyers!), that’s all good, but the question is really what (or even when) to open source your own stuff. Some companies do it because they want to sell the value added, or plugins and services. Others do because it’s not their core business or they want to form a community, which would otherwise use the competitors’ open source solution. Whatever the reason is, more and more we seem to be open sourcing software and hardware at an increasing speed, some times it comes off as open source on its first day in the wild.
Open source is a very good cost sharing model. Companies can develop a third-party product, not related to their core areas (where they actually make money), and still claim no responsibility or ownership (which would be costly). For example, the GNU/Linux and FreeBSD operating systems tremendously reduce the cost of any application developer, from embedded systems to big distributed platforms. Most platforms today (Apple’s, Androids, set-top boxes, sat-navs, HPC clusters, web-servers, routers, etc) have them at their core. If each of these products had to develop their own operating system (or even parts of it), it wouldn’t be commercially viable.
Another example is the MeshPotato (in Puerto Rico) box, which uses open software and hardware initially developed by Village Telco (in South Africa). They can cover wide areas providing internet and VoIP telephony over the rugged terrain of Puerto Rico for under $30 a month. If they had to develop their hardware and software (including the OS), it’d cost no less than a few hundred pounds. Examples like that are abundant these days and it’s hard to ignore the benefits of Open Source. Even Microsoft, once the biggest closed-source zealot, who propagated the misinformation that open source was hurting the American Way of Life is now one of the biggest open source contributors on the planet.
So, what is the question then?
If open source saves money everywhere, and promotes incremental innovation that wouldn’t be otherwise possible, how can the original question not have been answered? The key was in the scope.
Rob was referring, in fact, to real chunky innovations. Those that take years to develop, many people working hard with one goal in mind, spending their last penny to possibly profit in the end. The true sense of entrepreneurship. Things that might profit from other open source technologies, but are so hard to make that even so it takes years to produce. Things like new chips, new medicines, real artificial intelligence software and hardware, etc. The open source savings on those projects are marginal. Furthermore, if you spend 10 years developing a software (or hardware) and open source it straight away, how are you ever going to get your investment money back? Unless you charge $500 a month in services to thousands of customers on day one, you won’t see the money back in decades.
The big misunderstanding, I think, it’s that this model no longer applies, so the initial question was invalid to begin with. I explain.
Science and Tecnology
300 years ago, if you were curious about something you could make a name for yourself very easily. You could barely call what they did science. They even called themselves natural philosophers, because what they did was mostly discovering nature and inquiring about its behaviour. Robert Hooke was a natural philosopher and a polymath, he kept dogs with their internals in the open just to see if it’d survive. He’d keep looking at things through a microscope and he named most of the small things we can see today.
Newton, Liebniz, Gauss, Euler and few others have created the whole foundation of modern mathematics. They are known for fundamentally changing how we perceive the universe. It’d be preposterous to assume that there isn’t a person today as bright as they were, but yet, we don’t see people changing our perception of the universe that often. The last spree was more than a hundred years ago, with Maxwell, Planck and Einstein, but still, they were corrections (albeit fundamental) to the model.
Today, a scientist contents in scratching the surface of a minor field in astrophysics, and he’ll probably get a Nobel for that. But how many of you can name more than 5 Nobel laureates? Did they really change your perception of the universe? Did they invent things such as real artificial intelligence or did they discover a better way of doing politics? Sadly, no. Not because they weren’t as smart as Newton or Leibniz, but because the easy things were already discovered, now we’re in for the hard and incremental science and, like it or not, there’s no way around it.
Today, if you wrapped tin foil around a toilet paper tube and played music with it, people would, at best, think you’re cute. Thomas Edison did that and was called a Wizard. Nokia was trying to build a smartphone, but they were trying to make it perfect. Steve Jobs made is almost useless, people loved it, and he’s now considered a genius. If you try to produce a bad phone today, people will laugh at you, not think you’re cute, so things are getting harder for the careless innovators, and that’s the crucial point. Careless and accidental innovation is not possible on any field that has been exploited long enough.
Innovation and Business
Innovation is like business, you only profit if there is a market that hasn’t been taken. If you try to invent a new PC, you will fail. But if you produce a computer that has a niche that has never been exploited (even if it’s a known market, like in the Nokia’s smartphone case), you’re in for the money. If you want to build the next AI software, and it marginally works, you can make a lot of money, whether you open source your software or not. Since people will copy (copyright and patent laws are not the same in every country), your profit will diminish with time, proportional to the novelty and the difficulty in copying.
Rob’s point went further, “This isn’t just a matter of what people can or can’t do, is what people should or should not do”. Meaning, shouldn’t we aim for a world where people don’t copy other people’s ideas as a principle, instead of accepting the fact that people copy? My answer is a strong and sounding: NO! For the love of all that’s good, NO!
The first reason is simply because that’s not the world we live in and it will not be as long as humanity remains human. There is no point in creating laws that do not apply to the human race, though it seems that people get away with that very easy these days.
The second point is that it breaks our society. An example: try to get into a bank and ask for investment on a project that will take 10 years to complete (at the cost of $10M) and the return will come during the 70 years that follows it (at a profit of $100′sM a year). The manager will laugh at you and call security. This is, however, the time it takes (today) for copyright in Hollywood to expire (the infamous Mickey Mouse effect), and the kind of money they deal with.
Imagine that a car manufacturer develops a much safer way of building cars, say magical air bags. This company will be able to charge a premium, not just because of the development costs, but also for its unique position in the market. With time, it’ll save more lives that any other car and governments will want that to be standard. But no other company can apply that to their cars, or at least not without paying a huge premium to the original developer. In the end, cars will be much more expensive in general, and we end up paying the price.
Imagine if there were patents for the telephone, or the TV or cars (I mean, the concept of a car) or “talking to another person over the phone”, or “reminding to call your parents once in a while”. It may look silly, but this is better than most patent descriptions! Most of the cost to the consumer would be patents to people that no longer innovate! Did you know that Microsoft makes more money with Android phones than Google? Their contributions to the platform? Nothing. This was an agreement over dubious and silly patents that most companies accepted as opposed to being sued for billions of dollars.
In my opinion, we can’t just live in the 16th century with 21st century technology. You can’t expect to be famous or profit by building an in-house piece of junk or by spotting a new planet. Open source has nothing to do with it. The problem is not what you do with your code, but how you approach the market.
I don’t want to profit at the expense of others, I don’t want to protect my stupid idea that anyone else could have had (or probably already had, but thought it was silly), just because I was smart enough to market it. Difficult technology is difficult (duh), and it’s not up to a team of experts to create it and market it to make money. Science and technology will advance from now on on a steady, baby-steps way, and the tendency is for this pace to get even slower and smaller.
Another important conclusion for me is that, I’d rather live in a world where I cannot profit horrendously from a silly idea just because I’ve patented it than have monopolies like pharma/banking/tobacco/oil/media controlling our governments, or more than directly, our lives. I think that the fact that we copy and destroy property is the most liberating fact of humanity. It’s the Robin Hood of modern societies, making sure that, one way or another, the filthy rich won’t continue getting richer. Explosive growth, monopolies, cartels, free trade and protection of property are core values that I’d rather see dead as a parrot.
In a nutshell, open source does not hinder innovation, protection of property does.
Declaration of Internet Freedom |
| July 3rd, 2012 under Digital Rights, Life, Media, Politics, rengolin, rvincoletto, World. [ Comments: 1 ]
We stand for a free and open Internet.
We support transparent and participatory processes for making Internet policy and the establishment of five basic principles:
- Expression: Don’t censor the Internet.
- Access: Promote universal access to fast and affordable networks.
- Openness: Keep the Internet an open network where everyone is free to connect, communicate, write, read, watch, speak, listen, learn, create and innovate.
- Innovation: Protect the freedom to innovate and create without permission. Don’t block new technologies, and don’t punish innovators for their users’ actions.
- Privacy: Protect privacy and defend everyone’s ability to control how their data and devices are used.
Don’t get it? You should be more informed on the power of the internet and what governments around the world have been doing to it.
Good starting places are: Avaaz, Ars Technica, Electronic Frontier Foundation, End Software Patents, Piratpartiet and the excellent Case for Copyright Reform.
K-means clustering |
| June 20th, 2012 under Algorithms, Devel, rengolin. [ Comments: none ]
Clustering algorithms can be used with many types of data, as long as you have means to distribute them in a space, where there is the concept of distance. Vectors are obvious choices, but not everything can be represented into N-dimensional points. Another way to plot data, that is much closer to real data, is to allow for a large number of binary axis, like tags. So, you can cluster by the amount of tags the entries share, with the distance being (only relative to others) the proportion of these against the non-sharing tags.
An example of tag clustering can be viewed on Google News, an example of clustering on Euclidean spaces can be viewed on the image above (full code here). The clustering code is very small, and the result is very impressive for such a simple code. But the devil is in the details…
Each red dots group is generated randomly from a given central point (draws N randomly distributed points inside a circle or radius R centred at C). Each centre is randomly placed, and sometimes their groups collide (as you can see on the image), but that’s part of the challenge. To find the groups, and their centres, I throw random points (with no knowledge of the groups’ centres) and iterate until I find all groups.
The iteration is very simple, and consists of two steps:
- Assignment Step: For each point, assign it to the nearest mean. This is why you need the concept of distance, and that’s a tricky part. With Cartesian coordinates, it’s simple.
- Update Step: Calculate the real mean of all points belonging to each mean point, and update the point to be at it. This is basically moving the supposed (randomly guessed) mean to it’s rightful place.
On the second iteration, the means, that were randomly selected at first, are now closer to a set of points. Not necessarily points in the same cluster, but the cluster that has more points assigned to any given mean will slowly steal it from the others, since it’ll have more weight when updating it on step 2.
If all goes right, the means will slowly move towards the centre of each group and you can stop when the means don’t move too much after the update step.
Many problems will arise in this simplified version, for sure. For instance, if the mean is exactly in between two groups, and both pull it to their centres with equally strong forces, thus never moving the mean, thus the algorithm thinks it has already found its group, when in fact, it found two. Or if the group is so large that it ends up with two or more means which it belongs, splitting it into many groups.
To overcome these deficiencies, some advanced forms of K-means take into account the shape of the group during the update step, sometimes called soft k-means. Other heuristics can be added as further steps to make sure there aren’t two means too close to each other (relative to their groups’ sizes), or if there are big gaps between points of the same group, but that kind of heuristics tend to be exponential in execution, since they examine every point of a group in relation to other points of the same group.
All in all, still an impressive performance for such a simple algorithm. Next in line, I’ll try clustering data distributed among many binary axis and see how k-means behave.
« Previous entries