The preliminary results of the ESA investigation into the Schiaparelli Mars lander crash have now been published. At the heart of the failure are faults in the real-time software design. It seems more than likely that the software, working from a small sample of faulty sensor data, calculated that the craft was at a negative altitude (which is not possible). Based on this single measurement, it detached the parachute whilst the craft was still several kilometres above the ground. There was no recovery from such a catastrophic error. See https://spaceflightnow.com/2016/11/25/esa-says-doomed-mars-lander-succumbed-to-bad-altitude-reading/ for the full report.
Whilst hindsight is a wonderful thing, this is a pretty basic error for a real-time coding exercise. Wherever a microcontroller reads inputs from the outside world, it is simply good practice to check that each reading is within acceptable limits. Wherever possible, and it was in this case, there should be secondary measurements to confirm that the primary agrees within an acceptable margin. Equally, calculated results should be sanity-checked before any action is taken. Of course, the question arises as to what to do if the results are in error, but that question should be asked during peer review of the code, well before deployment, and a fallback plan of action decided upon. Often, simply rejecting the suspect measurement and sampling again will correct the problem. This is so much easier and cheaper to do prior to lift-off than afterwards.
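As a minimal sketch of the checks described above, the C fragment below range-checks a primary altitude reading, cross-checks it against a secondary reading, and retries a limited number of times before reporting failure so that the caller can invoke its fallback plan. It is an illustration only and not the code flown on Schiaparelli; the limits, the sensor callbacks and every name in it (check_altitude, read_altitude, ALT_MAX_DIFF_M and so on) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical limits, chosen purely for illustration. */
#define ALT_MIN_M       0.0       /* altitude can never be negative      */
#define ALT_MAX_M       150000.0  /* upper bound for the descent phase   */
#define ALT_MAX_DIFF_M  500.0     /* allowed primary/secondary mismatch  */
#define MAX_ATTEMPTS    3         /* samples to try before giving up     */

typedef enum { ALT_OK, ALT_OUT_OF_RANGE, ALT_DISAGREE } alt_status;

/* Comparisons against NaN are always false, so NaN is rejected too. */
static bool in_range(double alt_m)
{
    return alt_m >= ALT_MIN_M && alt_m <= ALT_MAX_M;
}

/* Sanity-check one pair of readings before anything acts on them. */
alt_status check_altitude(double primary_m, double secondary_m)
{
    if (!in_range(primary_m) || !in_range(secondary_m)) {
        return ALT_OUT_OF_RANGE;
    }
    double diff = primary_m > secondary_m ? primary_m - secondary_m
                                          : secondary_m - primary_m;
    return diff <= ALT_MAX_DIFF_M ? ALT_OK : ALT_DISAGREE;
}

/* Hypothetical sensor callback: returns one altitude sample in metres. */
typedef double (*sensor_fn)(void);

/* Reject suspect samples and sample again. Returns false if no trusted
 * reading was obtained, leaving the fallback decision to the caller. */
bool read_altitude(sensor_fn primary, sensor_fn secondary, double *alt_out)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; ++attempt) {
        double p = primary();
        double s = secondary();
        if (check_altitude(p, s) == ALT_OK) {
            *alt_out = p;
            return true;
        }
    }
    return false;
}
```

What the caller does when read_altitude returns false (hold the last good value, switch to a timer-based sequence, and so on) is exactly the fallback question that should be settled at peer review.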
The lesson learned is that fail-safe mechanisms should be built into real-time control code right from the outset. This would have been a recoverable problem if the code had allowed for the fact that it might get noisy input data.
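One common way to allow for noisy input is to refuse to take an irreversible action on a single sample, and instead require the triggering condition to hold for several consecutive control cycles. The C sketch below shows the idea. The threshold of five cycles and the names confirm_t and confirm_update are assumptions made for this example, not details taken from the ESA findings.

```c
#include <stdbool.h>

/* Hypothetical threshold: consecutive cycles required before acting. */
#define CONFIRMATIONS_NEEDED 5u

typedef struct {
    unsigned consecutive;  /* cycles in a row the condition has held */
} confirm_t;

/* Call once per control cycle with the latest evaluation of the
 * condition. Returns true only once the condition has held for
 * CONFIRMATIONS_NEEDED consecutive cycles; any single contrary
 * sample resets the count. */
bool confirm_update(confirm_t *c, bool condition_met)
{
    if (condition_met) {
        if (c->consecutive < CONFIRMATIONS_NEEDED) {
            c->consecutive++;
        }
    } else {
        c->consecutive = 0;
    }
    return c->consecutive >= CONFIRMATIONS_NEEDED;
}
```

With a guard of this kind, a brief burst of bad data delays the decision by a few cycles at most, rather than triggering something that cannot be undone.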
At Arrow Technical, we are often asked why software takes so long to write when the primary objective may seem quite simple on first inspection. If all goes to plan, all this redundancy will never be seen to work and could be considered a waste of time (and money). In fact, even when it does work, it is often just to nudge the control system back on course, and unless you employ sophisticated data-logging you will not know it is working at all. However, systems that do not employ such redundancy in their real-time firmware design will usually get found out on the day they really need it and, of course, on that day it will be too late.
Rest assured, here at Arrow Technical we do employ multiple layers of system redundancy, however simple the product may be, because it leads to solid, reliable products that our clients are happy with. You won’t see it, but it’s there like a guardian angel, overseeing the reliability of your product and your company’s reputation. That’s why clients come to us again and again, and this is what sets us apart from so many of our competitors. 21 years in the business, designing products from aerospace to consumer electronics applications, speaks for itself.
If this is the level of attention to detail that you want to see in your next product design, then get in touch for a no-obligation discussion at:-