Notes on Software

There are a lot of opinions on the internet about writing software, and very little in the way of science. I’ve been writing software professionally for north of 10 years now, mainly in software products (as opposed to enterprise software or internal systems). These are my opinions based on that experience. I hope they offer an alternative viewpoint to those of people whose occupation is writing books or delivering conference talks.

General

Think about failure and overload, especially with long-running processes or servers. How do we shed load? Is the system stable under overload, or will performance fold back?
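One common way to shed load is to bound the amount of pending work and refuse anything beyond that bound. The following is a minimal sketch under that assumption, not a production design: the class name LoadSheddingQueue, its methods, and the limit of 2 are all hypothetical and chosen only for illustration.

```python
import queue


class LoadSheddingQueue:
    """Bounded work queue that rejects new work when full instead of growing.

    Hypothetical helper for illustration; names and limits are assumptions.
    """

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)
        self.accepted = 0
        self.shed = 0

    def submit(self, job):
        """Return True if the job was queued, False if it was shed."""
        try:
            self._pending.put_nowait(job)
        except queue.Full:
            # Overloaded: refuse immediately rather than queue without bound.
            self.shed += 1
            return False
        self.accepted += 1
        return True

    def take(self):
        """Return the next pending job, or None if nothing is waiting."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


q = LoadSheddingQueue(max_pending=2)
results = [q.submit(n) for n in range(5)]
print(results)  # [True, True, False, False, False]
```

Because the backlog can never exceed the bound, the work that is accepted sees a bounded queueing delay, and the cost of overload is pushed back to callers as an explicit refusal they can handle.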

Programs transform inputs into outputs, but they transform errors too. For example, TCP transforms a datagram link into a reliable stream, but it also transforms packet loss or corruption into increased latency.
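The same transformation shows up in any retry loop. The sketch below is not how TCP works internally; it is a toy stand-in in which send_reliably and scripted_link are hypothetical names, and the "link" is a fake that replays a fixed list of delivery outcomes. It shows a lost delivery reaching the caller only as extra attempts (that is, extra time), and persistent loss reaching the caller as a timeout.

```python
def send_reliably(link, payload, max_attempts=5):
    """Retry until the link delivers; return how many attempts it took.

    `link` is a hypothetical callable returning True on delivery, False on loss.
    """
    for attempt in range(1, max_attempts + 1):
        if link(payload):
            return attempt
    # Persistent loss is transformed into a different error: a timeout.
    raise TimeoutError(f"gave up after {max_attempts} attempts")


def scripted_link(outcomes):
    """Build a fake link that replays a fixed list of delivery outcomes."""
    remaining = iter(outcomes)

    def link(_payload):
        return next(remaining)

    return link


# Two losses followed by a success: the caller never sees a loss,
# only a delivery that took three attempts instead of one.
attempts = send_reliably(scripted_link([False, False, True]), b"hello")
print(attempts)  # 3
```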

If you are talking to a remote system, then you have a distributed system. Make sure all the interfaces are clean.

A bad sign is when you are browsing code and can’t find any meat. If all the methods you find just seem to delegate to somewhere else or have trivial implementations, then it is likely that you have too many layers. By contrast, here is the Linux driver for the eepro/100 series of Ethernet network cards from Intel:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/e100.c

I really like the concept of ‘teaching the code to do something’. Note that nearly every line teaches the system about something, and almost every definition would have an equivalent section in the device’s datasheet which is written to teach a human how the device works.

Keep names consistent. Use the same word to mean the same thing throughout the project. Make that name exactly the name that a domain expert would use. Names will also be used as signposts for future maintainers to place new code: if you call a class ‘Engine’ then almost every new piece of code will have a claim to live there. Expect that class to grow into one of those 10kloc monsters.

Interfaces and Abstractions

Interfaces are hard, so don’t make them subtle. The internet protocols are simple. Put the complexity inside components like ravioli, not around the components like fusilli.

Interfaces should encapsulate questions that it is reasonable for an API user to ask, not abstractions over the world. This is because as the software grows and morphs, those abstractions are likely to become invalid.

Abstractions should be meal-sized. Bite-sized abstractions that build up itty-bitty pieces of extra functionality tend not to work, because getting abstractions right is really hard. Each abstraction should make a good solid step forward. Building MultiThreadedFileUploader on top of SingleThreadedFileUploader is likely not to work, because the abstractions are not distinct enough and implementation details will end up leaking through.

Consider the HURD vs Linux. The cost of tiny layers is that you need many more of them, and building layers is harder than writing code. The solution is to make the layers as thick as possible within the constraints of the team’s ability to stop it all falling to pieces.

A good abstraction or interface should be like a keyhole. It should be possible to hide a lot of complexity behind an interface that is much simpler than the implementation. For example, std::map hides the implementation of what is typically a red-black tree. The x86 instruction set hides the implementation of a superscalar processor.

In ‘Safer C’ Les Hatton did some analysis of the bug rates in a large FORTRAN library that (almost uniquely) is now very stable and has a complete bug history for every single function. The surprising result was that while small functions had fewer bugs, they had more bugs per line of code. This means that splitting a function up into two smaller units will increase the expected bug count! This can be explained by observing that bugs often happen at interface boundaries, and splitting a unit in two adds a new interface, but keeps the LOC count the same. Therefore the gain comes not from reducing function size but from abstraction: reuse reduces the total number of lines of code required.
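The interface-boundary explanation can be made concrete with a toy model. This is not Hatton's model or his data: it simply assumes that expected bugs grow linearly with lines of code plus a fixed cost per interface, and the two rates below are made-up numbers chosen only to show the shape of the argument.

```python
def expected_bugs(lines_of_code, interfaces, bugs_per_line, bugs_per_interface):
    """Toy linear model: bugs come from lines of code and from interfaces."""
    return lines_of_code * bugs_per_line + interfaces * bugs_per_interface


# Hypothetical rates, invented for illustration only.
PER_LINE = 0.01
PER_INTERFACE = 0.5

# One 200-line function with a single interface.
one_big = expected_bugs(200, 1, PER_LINE, PER_INTERFACE)

# The same 200 lines split into two 100-line functions, each with its own interface.
one_small = expected_bugs(100, 1, PER_LINE, PER_INTERFACE)
two_small = 2 * one_small

# Bugs per line for a single function of each size.
density_big = one_big / 200
density_small = one_small / 100

print(one_small < one_big)          # each small function has fewer bugs
print(density_small > density_big)  # but more bugs per line
print(two_small > one_big)          # and the split raises the total
```

Under these assumed rates all three comparisons hold, which matches the pattern described above. In the same model, the only way to lower the total is to lower the line count, which is what reuse does.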