Like any other software process, development in bioinformatics needs to follow some basic rules to achieve a minimum level of quality. There are many definitions of quality, even when restricted to software development. Quality Assurance is the process of auditing programs and procedures to ensure not only quality results but also consistent development techniques, decent code and good use of standards. Having good results is clearly not enough.
Quality control in an academic environment is fundamental because students come and go, but their programs and scripts endure for decades in production. When there is no quality control, developers are free not only to pick the algorithm they like but also to code in their own style, using their own version control software and obscure hand-made libraries where clear standards exist, and so on. When there is no quality assurance process, those developers rarely write any documentation at all, not even comments in the code, and when they leave there isn't a single person who can fix a simple problem without digging through the whole code base. In academia the turnover of developers is so fast (most students have 6-month contracts) that this scenario descends into chaos within a few years, and it then takes several experienced software engineers to make the whole thing work.
In the past there was a class of hackers called Heroes who not only had to do everything, from hardware to software to selling the product, but also had to solve every problem that no one else could solve. Although this class still exists, most of them have converted to the quality side of the force: the Paladins. For hackers it's not enough to solve a problem; you need to solve it elegantly and fast enough, and today dirty hacks are not considered elegant at all. A Paladin is a hacker who solves the problem in the most elegant and efficient way possible, testing and documenting every step.
One good definition of bad quality in software development is a state in which you need a Hero to run and maintain the code. Following the same line, the definition of complete chaos is when you need Paladins. The bioinformatics community needs as many Paladins as the world can offer: strong leaders to enforce quality, define policies and do a huge amount of hard work.
For all problems described so far there are some broad guidelines to be considered as a start:
There are some very well-known development techniques, like RUP and XP, and each has its strengths and weaknesses, but the basic principles are the same:
o -+---> Plan ---> Sketch/Tests --.
   ^- Test&Deploy <- Development -'
Documentation is as fundamental as testing in software quality and must be done in every step of every cycle. Outdated documentation is not as bad as not having any documentation at all.
First, if you maintain your documentation under version control (wiki-like), you show the user only the latest documents but still keep everything as history. Second, bugs tend to reappear, especially in environments where refactoring is frequent, and old documentation might shed light on how to solve them again under the new structure. Last, keeping history helps to evaluate the developers' performance and to analyse how to improve it in the future.
Documentation is not only paper or text files. Documentation is everything that is written down and stored in a structured way, such as wiki-like pages, bug-tracking systems, emails, meeting minutes, etc. Everything should be accessible from the same place, preferably organized in a tree or tag-cloud structure with a good full-text search.
Once the program is stable and running, serious optimizations can be done. If you designed it wisely in the beginning, predicting the obvious bottlenecks, isolating most problems should be a fairly trivial task. But even the best programmers make mistakes in the early stages, so refactoring is not optional; it's a compulsory step of maintenance.
Also, it's generally accepted that test cases should be written before the real code. This practice is fundamental to understanding what the user expects from the program, and it therefore helps you not only to optimize in the right way but also to build software shaped to what is needed instead of what you think is needed.
The extra value test cases bring is consistency in asserting that your software is correct. If you run all tests for every modification, you'll always know which side effects the change had before users start spotting them by eye or the program blows up and trashes your entire system. There is nothing worse than rolling back changes or fixing weird bugs of unknown origin in the middle of the night, so there is nothing better than a recursive and extensive test suite to bring you peace of mind when deploying new releases of your software.
So, with the new additions to the cycle, a sketch would be similar to this:
o -> Plan (doc) -> Sketch (doc) -+-> Test Cases (doc) -> Development (doc) -> Test&Deploy (doc) ---.
                                 ^------ Bottleneck analysis / Bug-report / Requests (doc) <-------'
Even with a good development cycle there are still some mistakes that could destroy the whole effort of ordering chaos. You don't need to be a Paladin to do it right, just follow the golden rules and you'll seldom need help from Heroes or Paladins:
Some very specific cases (such as disaster recovery) might force you not to comply with those rules, and that's OK as long as you comply as soon as possible given the priority of things. The most difficult part is deciding where to draw the line, but 99% of the time since I started programming (a very long time ago), if I didn't comply with those rules it was my own fault or laziness.
So, the rule of thumb is: Comply as often as possible given your priorities and remember that the less you comply, the worse the chaos will be.
Now that you are walking down the path to quality and following guidelines and rules not only from this text but also from what you've learnt from experience and books, how do you know when your code base is good enough? What is good enough anyway?
This is when software quality measurements come into play. Software engineering techniques to assure and measure quality have been developed over the last decades, and there are several standards, of which CMMI (Capability Maturity Model Integration, formerly CMM) is one of the most cited. There are also project management references to help you get the most out of your team, such as the PMBOK (Project Management Body of Knowledge), and all of them define quality standards.
I'm well aware that such standards and expenditures are far beyond the reach of bioinformatics institutes and that most development there follows a quite different philosophy, incompatible with most of these standards, so we need to use the same measurements while filtering out what's not appropriate for the field.
External reliability means that the results from a program should always be compatible. It doesn't mean they should be identical, since many academic algorithms rely on heavy statistics, but they must fall within a range of required scientific quality, which is how biologists will measure the results and is not directly related to software quality.
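As a rough illustration of "compatible but not identical", here is a minimal sketch of a stochastic estimate checked against a tolerance. The function names, the sample data and the tolerance of 0.2 are all made up for this example; in practice the acceptable range has to come from the biologists, not from the programmer.

```python
import random
import statistics


def bootstrap_mean(values, rounds=2000, seed=None):
    """Estimate the mean by resampling with replacement (hypothetical example)."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(rounds)]
    return statistics.fmean(means)


def within_tolerance(results, reference, tolerance):
    """True when every result lies within +/- tolerance of the reference."""
    return all(abs(result - reference) <= tolerance for result in results)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up measurements
    # Different seeds give slightly different answers on purpose.
    runs = [bootstrap_mean(data, seed=seed) for seed in range(5)]
    # The tolerance is an assumption; it stands in for the scientific requirement.
    print(within_tolerance(runs, statistics.fmean(data), tolerance=0.2))
```

Fixing the seed makes a single run repeatable, which is handy for debugging, while varying it shows whether the results stay inside the accepted range.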
Internal reliability is about how well internal libraries behave when different inputs are provided and how well errors are dealt with and reported. This quality must be defined and controlled by an experienced software engineer.
To assure internal reliability, the most important practices are to use standard libraries as often as possible, check every input restrictively (allow list instead of block list), enable multiple levels of logging and keep recursive test cases up to date.
These techniques remain fundamental even when the software is already stable, in case the input changes or new rules are applied to the data set. Also, focusing on some of them will raise reliability, but only caring about all aspects with the same effort will bring stable reliability to your programs.
Reliability is not something you achieve just by having logs, test cases and standards; you must tie them together so that you can understand problems better, fix them quickly, document the changes, test recursively to assure no other part was affected and deploy safely. It's something that grows together with your program, and focusing too much on reliability in the first phases of development could delay its delivery too much.
Programs should be able to deal with their own mistakes. Basic checks on the input before actually running the internal methods help, but it's impossible to predict every detail.
The first level of stability is to print a meaningful warning message upon errors instead of breaking the process. Some situations can be dealt with locally: instead of dying, programs should fix the input or try another approach, always warning the user of their decisions and corrections, and continue with the process. That's the second level of stability.
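A minimal sketch of restrictive input checking combined with logging levels, using only Python's standard library. The logger name, the set of allowed characters and the function name are assumptions made for the example, not part of any particular pipeline.

```python
import logging

logger = logging.getLogger("seqcheck")  # hypothetical logger name

# Allow list: anything not listed here is rejected (assumed alphabet).
ALLOWED_BASES = frozenset("ACGTN")


def validate_sequence(raw: str) -> str:
    """Return the normalised sequence, or raise ValueError if it is invalid."""
    seq = raw.strip().upper()
    logger.debug("validating %d characters", len(seq))
    if not seq:
        logger.error("empty sequence rejected")
        raise ValueError("empty sequence")
    unexpected = set(seq) - ALLOWED_BASES
    if unexpected:
        logger.error("rejected characters: %s", sorted(unexpected))
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    if seq != raw:
        # Harmless correction, but the user is still told about it.
        logger.warning("sequence was normalised (whitespace or case)")
    return seq


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)  # raise to WARNING in production
    print(validate_sequence(" acgtn\n"))
```

With an allow list, a new kind of bad input is rejected by default; with a block list it would pass until somebody remembered to forbid it.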
Also, when the problem is just too big to be handled internally, the program must clean up after itself and quit gracefully with error codes, messages, warnings and logs explaining everything and letting the system know that it failed. Most programs in complex environments run within shell scripts, cron jobs and other programs, and error codes are fundamental to stop the process as soon as needed rather than continue with wrong inputs afterwards.
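The levels described above could look roughly like this. It is only a sketch: the exit code values, the logger name and the function names are assumptions, and cleaning up temporary files is left out for brevity.

```python
import logging

logger = logging.getLogger("pipeline")  # hypothetical logger name

EXIT_OK = 0
EXIT_BAD_INPUT = 2  # assumed convention: any non-zero code signals failure


def parse_scores(lines):
    """Skip what cannot be parsed, warn about it and carry on."""
    scores = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are harmless
        try:
            scores.append(float(text))
        except ValueError:
            logger.warning("line %d: %r is not a number, skipped", number, text)
    return scores


def run(lines):
    """Return an exit code so the calling script or cron job can react."""
    scores = parse_scores(lines)
    if not scores:
        # Too big to fix locally: explain and report failure to the caller.
        logger.error("no usable scores in input, giving up")
        return EXIT_BAD_INPUT
    print(sum(scores) / len(scores))
    return EXIT_OK


# In a real script the last line would be something like:
#     sys.exit(run(sys.stdin))
```

A shell script calling this program can then test the exit status and stop the pipeline instead of feeding an empty result to the next step.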
Some problems are internal to the software, normally when it is poorly written (memory leaks, buffer overflows, segmentation faults, etc.), and are completely unacceptable as normal behaviour. They normally have nothing to do with the input and output; quite often these errors happen when less experienced programmers use techniques they don't fully understand (such as pointers).
Avoiding redundancy means doing each job only once, and it's the first step to achieving consistency. The second step is to avoid doing different things with the same code. It seems foolish, but in an environment where people turn over so often and where there is no policy or quality control on software development, it's far easier to find inconsistent methods and programs than the opposite.
Even within big projects developed by fewer people, it is difficult to keep track of what has been done and how. Sometimes it's not easy to re-use the same method for the same task because the context or the requirements are different, sometimes people are just too lazy to look for the right method, and sometimes duplication really is more efficient.
It's not an easy task to define when you should duplicate code, and that's why it's so important to have policies and quality assurance and to strongly encourage constant refactoring and documentation.
Efficient programs are not necessarily good programs; writing understandable code is paramount to maintainability. Sometimes you must sacrifice some of a program's performance to increase its understandability. Think of programs as poems: they must be concise, self-explanatory, beautiful, efficient (for reading too) and consistent.
Programmers do not always have enough time to read the code until they understand it completely. When bugs appear or a new feature must be added on the fly for an emergency run, you need to find the right spot and understand it very quickly to avoid reducing consistency. An understandable program stays consistent for longer, even when several programmers maintain it.
Other factors, not specifically at the code level, help maintainability a lot. Having a good structure for source code in a version control system (CVS, SVN, Mercurial and Git are the best known) is also fundamental. Tagging your tree whenever you have a release or a big milestone is equally fundamental for backward maintenance and reference. Splitting into branches whenever you have concurrent development is important, so choosing version control software that strongly supports branching (such as Mercurial and Git) is recommended.
In the end, clearer code is easier to optimize, and you can keep the bottleneck optimizations well hidden from the main implementation. As Tony Hoare stated and Donald Knuth reinforced: "premature optimization is the root of all evil". This is true in almost all cases, but it does not mean that you shouldn't think about performance problems from the beginning, just that you should spend time profiling and fine-tuning after the program is running and you have real information on where the bottlenecks are.
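To get that real information, a profiler is usually enough. The sketch below uses Python's standard cProfile and pstats modules; the gc_content function and the profile_call helper are hypothetical names invented for the example.

```python
import cProfile
import io
import pstats


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, written for clarity first."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(gc_content, "ACGT" * 100_000)
    print(value)
    print(report)
```

Only once the report points at a real hot spot is it worth replacing the readable version with a faster one, ideally behind the same function name so the rest of the code stays untouched.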
Most people think that tests should be written after the program is running. Some write tests only when there is a problem they can't understand, and most just don't write them at all.
Test cases are not only tests; they are the formal representation of a user and should do everything that a user is able to do. It means that, when you're planning the states your software can be in (use case UML diagram), you must write a test for every possible change in every state. Writing tests before the actual code is important to understand how the data will flow and how methods will be called, and it will help you a lot when defining classes, objects, data structures and so on.
Good programs are testable programs. If your program is a monolith you can only test its final result, but if it is well structured you can test each method separately from the others and know exactly where the problem is when one surfaces. Classes and methods interact in a non-linear way in most big projects: changing one method can blow up a completely unrelated method in another class, and the only way to catch this type of error before the user does is to run extensive and recursive tests before every deployment. If you deploy on more than one architecture (x86, Alpha, Sparc, etc.) you must test on each one of them as well.
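As a small sketch of what testing one method in isolation can look like, here is a function with its own test case written against Python's standard unittest module. The reverse_complement function is a made-up example; the point is that a failure names the exact method that broke.

```python
import unittest


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    try:
        return "".join(pairs[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


class ReverseComplementTest(unittest.TestCase):
    def test_known_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_applying_twice_restores_input(self):
        twice = reverse_complement(reverse_complement("GATTACA"))
        self.assertEqual(twice, "GATTACA")

    def test_rejects_unknown_base(self):
        with self.assertRaises(ValueError):
            reverse_complement("ATXG")
```

Saved in a file whose name starts with test, the suite can be run with `python -m unittest` before every deployment, so a change elsewhere that breaks this method is caught before any user sees it.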
There are standard testing frameworks available for most programming languages, and developers should be strongly encouraged to use them instead of in-house suites. This makes it likely that newcomers (new developers) already know the framework instead of having to learn one more thing. Such frameworks also tend to be very similar across languages, so if you use more than one language you don't have to learn lots of different testing paradigms in order to be safe.
The following measurements are not fundamental to bioinformatics but are well worth achieving: