Modules in C99

Coming from languages such as OCaml and Python, one thing I missed in C is the ability to use the concept of modules.

For those unfamiliar with the concept, modules are a way to hierarchically group functions and tools by the context – and target – they work on. In other words, if you had functions working on strings and arrays, you would put them in a String and an Array module, respectively.

Note that the following is possible in C++ with some adaptation, but let’s be serious, C++ already has namespaces for that kind of thing.

See the end for a TL;DR.

The vanilla way

Usually, in C, the pattern goes as follows:

my_module.h:

#ifndef MY_MODULE_H_
# define MY_MODULE_H_

void my_module_function1(void);
void my_module_function2(int, int);

#endif /* !MY_MODULE_H_ */

If you want to call those functions from another unit, you just include the header file, and call the prefixed functions.

Now, what modular languages have, and what I strive to get, would be calling functions like so:

#include "my_module.h"

int main(void) {
    MyModule.function1();
    MyModule.function2(1, 2);
}

Some of you might argue that this new syntax does not bring anything new, and you might be correct. However, I did spot a few benefits, which I expand on further down.

Let there be Modules

To emulate a module, we need, in theory, two things:

a container
function members

Fortunately, C can have both with structures and function pointers, and with the designated initializers introduced in C99, the initialization stays readable.

Let’s try our hand with a string module, provided with the length and concat operations. In the usual way, we would make the following header file:

#ifndef STRING_H_
# define STRING_H_

# include <stddef.h> /* size_t */

size_t string_length(const char *str);
void string_concat(char *dst, size_t size, const char *head, const char *tail);

#endif /* !STRING_H_ */

Now, let us change the above to use a module:

// string.h
#ifndef STRING_H_
# define STRING_H_

# include <stddef.h>

struct m_string {
  size_t (*length)(const char *);
  void (*concat)(char *, size_t, const char *, const char *);
};

extern const struct m_string String;

#endif /* !STRING_H_ */

// string.c
#include <stdlib.h>

#include "string.h"

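/* Counts the characters before the terminating '\0'. */
static size_t length(const char *s) {
	size_t len = 0;
	for (; *s; ++s, ++len)
		;
	return len;
}

/* Writes head followed by tail into dst, truncating to fit in size bytes.
 * The result is always '\0'-terminated when size > 0. */
static void concat(char *dst, size_t size, const char *head, const char *tail) {
	size_t i = 0;
	if (size == 0)
		return;
	/* Stop at size - 1 so there is always room for the terminator. */
	for (; i + 1 < size && *head; ++head, ++dst, ++i)
		*dst = *head;
	for (; i + 1 < size && *tail; ++tail, ++dst, ++i)
		*dst = *tail;
	*dst = '\0';
}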

const struct m_string String = {
	.length = length,
	.concat = concat
};

The change is, for me at least, pretty straightforward. All function prototypes go into the module as function pointers, and the module is a const global variable.
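To show what calling code looks like, here is a minimal sketch. It bundles a condensed stand-in for string.h and string.c into one file so that it compiles on its own; string_module_demo is a hypothetical caller that is not part of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Condensed stand-in for string.h and string.c, so this compiles alone. */
struct m_string {
	size_t (*length)(const char *);
	void (*concat)(char *, size_t, const char *, const char *);
};

static size_t length(const char *s) {
	size_t len = 0;
	while (s[len])
		++len;
	return len;
}

static void concat(char *dst, size_t size, const char *head, const char *tail) {
	size_t i = 0;
	if (size == 0)
		return;
	/* Leave room for the terminating '\0'. */
	for (; i + 1 < size && *head; ++head, ++dst, ++i)
		*dst = *head;
	for (; i + 1 < size && *tail; ++tail, ++dst, ++i)
		*dst = *tail;
	*dst = '\0';
}

const struct m_string String = { .length = length, .concat = concat };

/* Hypothetical caller: this is all the user of the module has to write. */
void string_module_demo(void) {
	char buf[32];
	String.concat(buf, sizeof buf, "Hello, ", "world");
	assert(String.length(buf) == 12); /* "Hello, world" */
}
```

Calling string_module_demo() from a main function is enough to exercise both members. In a real project the caller would only include the header and never see the static functions.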

The sweet part is that you are not even required to use your own functions. You could completely bind strlen to String.length, and voilà, you have a standard library function in your module.
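As a rough sketch of that binding, assuming the module is reduced to its length member for brevity (strlen_binding_demo is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h> /* the standard header, for strlen */

/* Reduced version of the module: only the length member is kept here. */
struct m_string {
	size_t (*length)(const char *);
};

/* strlen already has the type size_t (*)(const char *), so no wrapper
 * function is needed: its address is a valid static initializer. */
const struct m_string String = {
	.length = strlen
};

/* Hypothetical caller. */
void strlen_binding_demo(void) {
	assert(String.length("module") == 6);
}
```

One thing to watch for: since the module's own header is also called string.h, this assumes the angle-bracket include resolves to the standard library header and not to the project file.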

The pros

First, let me say that I use vim extensively as my editor, and a lot (if not most) of my motions are word-based. For those unaware, a word here means [a-zA-Z0-9_]+ (i.e. any combination of alphanumeric characters and underscores). Splitting module and function names with a non-word character therefore makes it easier to manipulate those identifiers in my editor/IDE.
The magic does not end here: since modules are nothing but structures, you can nest modules inside other modules. Behold, I present to you my Math.Complex and Math.Matrix modules! The only drawback I see to submodules is that, as far as I know, you cannot do partial imports, so you wouldn’t be able to import Math.Complex alone without Math.Matrix.
Modules provide a unified interface for multiple implementations, whether they are static or dynamic, and whether or not they live in the same project. This makes swapping between them easy, which is really powerful. This has been used extensively in Quake 2 (many thanks to /u/areop for the link!), for instance.
Although the implementation above uses function pointers, the module is a const global with a known initializer, so an optimising compiler can often treat these calls much like calls to normal functions. On GCC 4.9.1, the -flto switch enables link-time optimisation, which turned the indirect calls produced by this method into direct calls, and even inlined them when possible.
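The submodule idea can be sketched as follows. Every name here (struct cplx, m_complex, m_matrix, m_math, det2, math_module_demo) is made up for illustration and is not taken from an actual Math module.

```c
#include <assert.h>

/* Hypothetical value type used by the Complex submodule. */
struct cplx { double re, im; };

struct m_complex {
	struct cplx (*add)(struct cplx, struct cplx);
};

struct m_matrix {
	/* Determinant of the 2x2 matrix [a b; c d]. */
	double (*det2)(double a, double b, double c, double d);
};

/* The parent module embeds its submodules by value. */
struct m_math {
	struct m_complex Complex;
	struct m_matrix Matrix;
};

static struct cplx complex_add(struct cplx a, struct cplx b) {
	struct cplx r = { a.re + b.re, a.im + b.im };
	return r;
}

static double matrix_det2(double a, double b, double c, double d) {
	return a * d - b * c;
}

/* Nested designated initializers keep the tree structure visible. */
const struct m_math Math = {
	.Complex = { .add = complex_add },
	.Matrix = { .det2 = matrix_det2 }
};

/* Hypothetical caller. */
void math_module_demo(void) {
	struct cplx a = { 1.0, 2.0 }, b = { 3.0, 4.0 };
	struct cplx sum = Math.Complex.add(a, b);
	assert(sum.re == 4.0 && sum.im == 6.0);
	assert(Math.Matrix.det2(1.0, 2.0, 3.0, 4.0) == -2.0);
}
```

Because struct m_math embeds both submodules by value, its header has to see both definitions, which is where the lack of partial imports comes from.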

The cons

The main drawback is that modules require maintenance. A lot of maintenance. Writing function pointer prototypes (and understanding them!) can be a pain for the uninitiated.
Doxygen users will also have to provide the prefixed function prototypes to get their documentation generated.
Last but not least, as this is pretty much non-standard, you will probably need to provide the plain prototypes anyway for those who do not want to deal with modules.
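One way to live with the last two points, sketched under the assumption that the functions are no longer static: keep the documented, prefixed prototypes public and let the module point at them. The layout and the helper both_styles_demo are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* --- what would go in the header --- */

/**
 * Returns the number of characters in str, not counting the final '\0'.
 */
size_t string_length(const char *str);

struct m_string {
	size_t (*length)(const char *); /* same contract as string_length() */
};

extern const struct m_string String;

/* --- what would go in the .c file --- */

size_t string_length(const char *str) {
	size_t len = 0;
	while (str[len])
		++len;
	return len;
}

/* The module simply points at the public, documented function. */
const struct m_string String = {
	.length = string_length
};

/* Hypothetical caller: both call styles reach the same code. */
void both_styles_demo(void) {
	assert(string_length("abc") == 3);
	assert(String.length("abc") == 3);
}
```

The price is exactly the maintenance cost described above: each signature is now written twice, once as a prototype and once as a function pointer member.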

Last word

I have had the chance to play with modules since then, and have used them in several collaborative C projects. My co-workers loved them, and I have yet to discover a major drawback.

They seem to bring a bit of fresh air, and mostly help us avoid the_freaking_long_named_function.

TL;DR

Modules in C are great, and I still use them, but they might confuse other developers at first because of the syntax, and like any other workflow, they have their pros & cons:

Pros:

They split long function names with a non-word character: this makes module names and function names separate entities in the eyes of text editors and IDEs.
They provide a unified interface for possibly multiple implementations without abusing preprocessing conditionals.
Optimisation and inlining still happen with function pointers.
They can be nested into a module hierarchy (tree structure).

Cons:

They require more maintenance: the function pointer declarations must be kept in sync with the prototypes of the functions they point to, which can be a pain for the uninitiated.
The function pointer syntax decreases readability of the header file.
This is not directly compatible with Doxygen documentation generation: you still have to provide some prototypes.
