I was surprised to see, during the development of rst2ansi, that there were no simple python modules to pretty-print complex tables with ascii characters.

Most of them only supported simple tables with single-line cells, and God forbid you want to print cells that span multiple columns or rows.

I went on and made a little algorithm to help me draw these tables.

The algorithm

We want to draw an arbitrary table, for instance:

+-----+-----+-----+-----+
|  A  |  B  |  C  |  D  |
+-----+-----+-----+-----+
|  E  |        F        |
+-----+-----+-----------+
|  G  |     |           |
+-----+  I  |     J     |
|  H  |     |           |
+=====+=====+===========+

The first thing we need to define is what exactly constitutes a row in this table, and a traversal order for each cell:

Row 1: cells 1 (A), 2 (B), 3 (C), 4 (D)
Row 2: cells 5 (E), 6 (F)
Row 3: cells 7 (G), 8 (I), 9 (J)
Row 4: cell 10 (H)


Once that is done, we can straightforwardly draw the table by iterating over the cells in the above order:

  1. Draw the top and left table borders
  2. For each cell, draw its bottom and right borders.

Here is a step-by-step visualisation of the process for the above table:

Step 1

Step 2
A

Step 3
A B

Step 4
A B C

Step 5
A B C D

Step 6
A B C D
E

Step 7
A B C D
E F

Step 8
A B C D
E F
G

Step 9
A B C D
E F
G I

Step 10
A B C D
E F
G I J

Step 11
A B C D
E F
G I J
H

Step 12
A B C D
E F
G I J
H


(Note that this algorithm could be used to draw tables on any medium, not just for ascii pretty-printing.)

It is immediately apparent that for this algorithm to work, it needs precise information on the table:

  • It needs, of course, the number of rows and columns and the contents of the cells;
  • It needs the layout of each cell, i.e. which cells span multiple rows and/or columns;
  • It needs the height of each row and the width of each column.

This means that we have to define a proper data structure to represent this information.

The implementation

Setting up the stage

The first thing we have to define is the way to represent our data. Given a table like this:

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | Cells may span both |
+------------------------+ span rows. | rows and columns.   |
| body row 4             |            |                     |
+========================+============+=====================+

What would be the best way to represent it?

One could think at first of using a multi-dimensional list, but we get hit in the face when trying to deal with cells spanning multiple columns and rows.

It turns out the best format to represent a table with, albeit a little (and by that I mean “lot”) verbose, is to use a tree:

TABLE = {
    'node': 'table',
    'children': [
      {
        'node': 'head',
        'children': [
            {
              'node': 'row',
              'children': [
                {'node': 'cell', 'data': 'Header row, column 1\n(header rows optional)'},
                {'node': 'cell', 'data': 'Header 2'},
                {'node': 'cell', 'data': 'Header 3'},
                {'node': 'cell', 'data': 'Header 4'},
              ]
            }
          ]
      },
      {
        'node': 'body',
        'children': [
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 1, column 1'},
              {'node': 'cell', 'data': 'column 2'},
              {'node': 'cell', 'data': 'column 3'},
              {'node': 'cell', 'data': 'column 4'},
            ]
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 2'},
              {'node': 'cell', 'data': 'Cells may span columns.', 'morecols': 2},
            ],
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 3'},
              {'node': 'cell', 'data': 'Cells may span rows.', 'morerows': 1},
              {'node': 'cell', 'data': 'Cells may span both rows and columns.', 'morerows': 1, 'morecols': 1},
            ],
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 4'},
            ],
          }
        ]
      }
    ]
}

This doesn’t really look easy to make, but it matters not: we will only use this format as an intermediate structure, meant to be generated from another source.

Look familiar? It just so happens that this structure looks a little bit like the one HTML uses:

<table>
  <thead>
    <tr>
      <td>Header row, column 1<br>(header rows optional)</td>
      <td>Header 2</td>
      <td>Header 3</td>
      <td>Header 4</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>body row 1, column 1</td>
      <td>column 2</td>
      <td>column 3</td>
      <td>column 4</td>
    </tr>
    <tr>
      <td>body row 2</td>
      <td colspan="3">Cells may span columns.</td>
    </tr>
    <tr>
      <td>body row 3</td>
      <td rowspan="2">Cells may span rows.</td>
      <td rowspan="2" colspan="2">Cells may span both rows and columns.</td>
    </tr>
    <tr>
      <td>body row 4</td>
    </tr>
  </tbody>
</table>

We could make a function to transform these HTML tables into our intermediate format, for instance.

Sizing the table up

The other data we need for this algorithm to work is the dimensions of each row and column.

The implementation of this function depends on the use case and the strategy you want. Some possible ideas:

  • Find the width of a column:
    • Use a fixed width
    • For each column, use the mean of widths of all cell contents in this specific column
    • Use the length of the column label
  • Find the height of a row:
    • Use the maximum number of lines the contents of each cell in a row can fill.

Implementing the algorithm

The major component of the algorithm is the iteration over the cells. Since we defined a tree-like data structure, a good iteration technique is walking over each node with a visitor.

We start by defining a Visitor class that implements visit. visit is a method that calls 'visit_' + node_name on the node, visits all of its children, then calls 'depart_' + node_name.

We also define a SkipChildren exception, which, if raised by a visit_ function, will tell the visitor not to visit the children of the current node.

class SkipChildren(Exception):
  pass

class Visitor(object):

  def visit(self, node):
    """
    Visit a generic node. Calls 'visit_' + node_name on the node,
    then visits its children, then calls 'depart_' + node_name.
    """
    def noop(node):
      pass

    try:
      visit_fn = getattr(self, 'visit_' + node['node'])
    except AttributeError:
      visit_fn = noop

    try:
      depart_fn = getattr(self, 'depart_' + node['node'])
    except AttributeError:
      depart_fn = noop

    try:
      visit_fn(node)
      for n in node.get('children', []):
        self.visit(n)
    except SkipChildren:
      return
    finally:
      depart_fn(node)
With our visitor class, we can finally define our table printer:

# Note: for the sake of this article's brevity, the code below
# is actually python pseudocode. A working implementation can
# be found in the git repository linked at the end of this article.

class TablePrinter(Visitor):

  def __init__(self):
    self.row = 0
    self.col = 0

  def visit_table(self, node):
    self._draw_initial_table(node['colspec'], node['rowspec'])

  def visit_row(self, node):
    self.col = 0

  def depart_row(self, node):
    self.row += 1

  def visit_cell(self, node):
    # Get position & dimensions

    pos = self.row, self.col
    dim = self._get_cell_dimensions(node, pos)

    spanned_cols, spanned_rows, width, height = dim

    # Draw the cell

    self._draw_bottom_border(dim, pos, node)
    self._draw_right_border(dim, pos, node)

    self._draw_cell_contents(dim, pos, node)

    # Update the position & stop recursing

    self.col += spanned_cols
    raise SkipChildren

  def _get_cell_dimensions(self, node, pos):
    """
    Returns the number of columns and rows this cell spans,
    and its width in character columns and height in lines.

    Args:
        node: the node of the cell.
        pos: the row and column indices of this cell.
    """
    raise NotImplementedError

  def _draw_initial_table(self, widths, heights):
    """
    Draw the leftmost and topmost borders of the table,
    and fill the defined rectangle with spaces.
    Args:
        widths: the width of each column
        heights: the height of each row
    """
    total_width = sum(widths) + (len(widths) + 1)
    total_height = sum(heights) + (len(heights) + 1)

    raise NotImplementedError

  def _draw_bottom_border(self, dim, pos, cell):
    """
    Draw the bottom border of this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

  def _draw_right_border(self, dim, pos, cell):
    """
    Draw the right border of this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

  def _draw_cell_contents(self, dim, pos, cell):
    """
    Draw the contents of this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

Wrapping up

I published on github an ascii table renderer using this algorithm as a proof of concept here. The code isn’t perfect, and needs to be cleaned up a bit, but should be overall understandable.

This algorithm is simple enough for purposes where there are no border styles. For fancier output, one could either change this algorithm to make it overwrite previous borders, or follow the guidelines described in the official W3C document on CSS2 properties of HTML tables.

I am not a fan of C’s stance on variadic function arguments. They are unsafe and their usage is fairly unorthodox.

I, however, will not claim to solve all of the issues with it — I will provide, at best, some techniques I have used that I think will make variadic functions more usable.

As such, be wary that there is no silver bullet, and each proposition has some drawbacks.

Well then, without further ado, let us first quickly review how variadic arguments work:

Standard variadic arguments

#include <stdio.h>
#include <stdarg.h>

void func(int n, ...) {
    va_list args;
    va_start(args, n);

    for (int i = 0; i < n; ++i)
        printf("%s\n", va_arg(args, char *));

    va_end(args);
}

int main(void) {
    func(3, "1", "2", "3");
    return 0;
}

Here, func is a variadic function that prints its parameters. This simple program prints the strings “1”, “2” and “3” on separate lines on the standard output.

The inner workings are straightforward: since variadic parameters can only come last in a parameter list, they are pushed first on the stack when the function is called.

However, there is no way, as-is, to access each variadic parameter without a point of reference, a marker to tell where to look, so we need an additional parameter: a sentinel.

This is exactly what va_start does: it initializes the variadic parameter list in a va_list, with n as a point of reference. Each parameter can then be pulled in sequence with va_arg, and when we are done, we just call va_end. Simple indeed, but incredibly unsafe.

A sword of Damocles

The first thing that you might have noticed is that the sentinel n indicates the number of variadic parameters: indeed, there is no way as-is to know how many of them there are, so we need a hint. Some functions like printf are blessed with parameters that convey both a meaning and the number of expected variadic arguments (i.e. the format string), but in the general case, that is not always possible.

So, would you guess what calling func(4, "1", "2", "3") does? That’s right: Undefined Behavior. At best, your process will read whatever has been pushed before “3”, try to dereference it, and crash; worse, it could continue to run as if nothing happened – no need to tell you how bad this is.

Let’s take another example:

void func2(int n, ...) {
    va_list args;
    va_start(args, n);

    if (n > 0) printf("%d\n", va_arg(args, int));
    if (n > 1) printf("%f\n", va_arg(args, double));
    if (n > 2) printf("%s\n", va_arg(args, char *));

    va_end(args);
}

What would happen if I called func2(3, "1", 2, 3)? Again, Undefined Behavior. va_arg is not typesafe, meaning that you have to care about the order and size of your parameters (this can cause horrible results if, for example, you pass 2 instead of 2.; subtle, but deadly).

Overall, I would say that the worst part is that your compiler will never complain about all that, the only special case being the printf function family, where the parameters are type checked against the format string (and this only happens because said functions have the format function annotation).

Improving the usability with sentinels

Let us ignore for the moment all the type safety concerns, and focus on the first horrible problem variadic parameters have: the length hint.

We are not, 99% of the time, designing functions à la printf, where one of the mandatory parameters gives us the length hint, so usually we end up with functions like:

void func(param0, param1, int n, ...) {
    // function body
}

If we happen to have all our variadic parameters of the same type, we can use a null sentinel at the end of the parameter list:

__attribute__ ((sentinel))
void func(param0, param1, ...) {
    va_list args;
    va_start(args, param1);

    for (char *s; (s = va_arg(args, char *));)
        printf("%s\n", s);

    va_end(args);
}

int main(void) {
    func(param0, param1, "1", "2", "3", NULL);
    return 0;
}

The attribute here is of course GCC/Clang only, but can help to enforce the length safety by generating a compiler warning when no NULL is provided at the end of the parameter list.

This is better, but it only works when the parameters are all of the same pointer type, and we still have to remember to insert NULL; plus, we programmers hate writing redundant things like this. It is time to ask some help from the C preprocessor.

Using the preprocessor to generate the length hints

With sentinels

Wrapping the function to insert a trailing NULL is pretty straightforward:

// ANSI C
#define func(Param0, Param1, ...) func(Param0, Param1, __VA_ARGS__, NULL)

// GNU C
#define func(Param0, Param1, args...) func(Param0, Param1, ## args, NULL)

The only difference between the ANSI and GNU version is that func(a, b) will expand to func(a, b, NULL) in GNU C instead of func(a, b, , NULL) in ANSI C (making one of the variadic parameters mandatory).

With the length directly

Generating a similar macro when the length hint is explicit is a bit trickier. We first have to define a macro to count the variadic parameters passed to it:

#if __STRICT_ANSI__
# define ARG_LENGTH(...) __builtin_choose_expr(sizeof (#__VA_ARGS__) == 1,  \
        0,                                                                  \
        ARG_LENGTH__(__VA_ARGS__))
#else /* !__STRICT_ANSI__ */
# define ARG_LENGTH(...) ARG_LENGTH__(__VA_ARGS__)
#endif /* !__STRICT_ANSI__ */

# define ARG_LENGTH__(...) ARG_LENGTH_(,##__VA_ARGS__,                         \
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45,\
    44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26,\
    25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6,\
    5, 4, 3, 2, 1, 0)
# define ARG_LENGTH_(_, _63, _62, _61, _60, _59, _58, _57, _56, _55, _54, _53, \
    _52, _51, _50, _49, _48, _47, _46, _45, _44, _43, _42, _41, _40, _39, _38, \
    _37, _36, _35, _34, _33, _32, _31, _30, _29, _28, _27, _26, _25, _24, _23, \
    _22, _21, _20, _19, _18, _17, _16, _15, _14, _13, _12, _11, _10, _9, _8,   \
    _7, _6, _5, _4, _3, _2, _1, Count, ...) Count
(Snippet taken from libcsptr)

Here the snippet is GNU C only, but can be adapted to work with other compilers – the only difference being that ARG_LENGTH will have to take at least one parameter.

The #if __STRICT_ANSI__ block is here to have consistent results with different gcc/clang compiler options when ARG_LENGTH() is expanded: without this, the macro expands to 1 instead of 0 when compiled with -ansi, -std=c99, or -std=c89.

The macro itself relies on the fact that the expanded varargs will shift all the trailing numbers to the right in ARG_LENGTH_, making Count expand to the number of times the shift happens.

With this macro defined, it becomes trivial to wrap the variadic function:

#define func(Param0, Param1, args...) \
    func(Param0, Param1, ARG_LENGTH(args), ## args)

Tackling the type safety problem

This is a technique I discovered while experimenting with the interfaces of libcsptr and criterion: it all comes down to the fact that designated initializers and compound literals both use commas as value separators, just like parameter lists – the one major difference being that types in those are checked.

Putting the concept into practice:

#include <stdio.h>

struct func_params {
    int sentinel_;
    int a;
    double b;
    char *c;
};

void func(struct func_params *args) {
    printf("%d, %f, %s\n", args->a, args->b, args->c);
}

#define func(...) \
    func(&(struct func_params) { .sentinel_ = 0, __VA_ARGS__ })

sentinel_ is here to ensure that func called without parameters still produces a valid compound literal.

Edit: professorlava rightly pointed out that for those who use GNU extensions you can, in fact, ignore the sentinel. It will still produce a warning if -Wmissing-field-initializers is provided, so be sure to balance that out (are 4 bytes worth all this?).

What this brings to the table is a type-safe interface for optional parameters (which can be extended to variadic parameters by using arrays instead of structs), and it is 100% standard C99 (to make this work with C89, you would have to replace the compound literal with a local variable and a designated initializer).

Hence, this compiles:

func();
func(1);
func(1, 2.0);
func(1, 2.0, "3");

… and this does not:

func(1, "3"); // error: a pointer is not a double
func(1, 2.0, 3); // error: an integer is not a pointer

The neat side-effect that this technique spawns is that func has a python-like interface for keyword arguments:

// All of these expressions are equivalent
func(1, .b = 2.0, .c = "3");
func(1, .c = "3", .b = 2.0);
func(.a = 1, .c = "3", .b = 2.0);

Of course, one must also consider that this technique gives unspecified fields a default value: 0 (zero). This effect is sometimes desirable, but in other cases it is not.
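If non-zero defaults are wanted, one possible workaround (a sketch, not from the article; the struct and function here are hypothetical) is to let the macro seed defaults before __VA_ARGS__, relying on the C99 rule that a later designated initializer overrides an earlier one for the same field:

```c
#include <assert.h>

/* Hypothetical example struct and function: */
struct greet_params {
    int sentinel_;
    int times;
    const char *name;
};

static inline int greet(struct greet_params *p) {
    return p->times;   /* just surface the field so the default is visible */
}

/* Seed non-zero defaults before __VA_ARGS__; if the caller specifies
 * .times, their (later) initializer wins over the default. */
#define greet(...) \
    greet(&(struct greet_params) { .sentinel_ = 0, .times = 1, __VA_ARGS__ })
```

Note that GCC and Clang will flag the intentional override under -Woverride-init (part of -Wextra), so this trades one warning for another.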


TL;DR

Variadic functions are unsafe and weird to use; the following techniques try to provide a better interface:

Technique                  Usage                       Pros                                         Cons
Sentinel                   f(param, "1", "2", NULL)    Sentinel presence can be enforced;           Trailing NULL (easily forgettable);
                                                       no length parameter                          unique type parameters; not type safe
Sentinel + Macro           f(param, "1", "2")          Benefits of sentinel without trailing NULL   Unique type parameters; not type safe
Length parameter + Macro   f(1, 2.0, "3")              Inferred length parameter;                   Not type safe
                                                       parameters of different types
Struct + Macro             f(1, 2.0, "3")              Type-safe optional parameters;               Default values are always 0
                           f(1, .c = "3", .b = 2.0)    Python-like keyword arguments

Conclusion

These techniques provide some sugar on top of the C language to provide safer interfaces to variadic parameters – I hope you find inspiration in those techniques.

I am a big fan of C, but some part of me always yearns to have just enough higher-level constructs.

The impracticality of memory allocation in C is one of my pet peeves. Being very easily distracted, I tend to forget to free my memory, or close my resources, so you might guess that when I learned about smart pointers in C++, I was immediately hooked.

As a freshman in my IT engineering school, our first semester was mostly about programming in C. We were assigned pretty simple, solo projects, until the big project came in — we had to implement a POSIX bourne shell, as a team of 4 people, and we were warned that if we leaked, we would eat a pretty hardcore malus on our final grade. We also had to use gcc 4.9 as our compiler.

Because I knew that my teammates and I were certainly not perfect, I decided to try and make our life easier. I first thought about wrapping malloc to register all allocated data, and then free everything before exiting, but it wasn’t a satisfying solution.

I went and read a little more about gcc’s attributes, and found __attribute__((cleanup(f))): according to the documentation, this variable attribute would call the function f on the variable before exiting the current scope.
I was delighted.

Implementing an auto-cleaning variable

I went on and figured that I would simply make some kind of would-be type attribute to make the variable free itself after going out of scope:

#define autofree __attribute__((cleanup(free_stack)))

__attribute__ ((always_inline))
inline void free_stack(void *ptr) {
    free(*(void **) ptr);
}

The usage was pretty straightforward:

int main(void) {
    autofree int *i = malloc(sizeof (int));
    *i = 1;
    return *i;
}

But while it worked well for these tests, when I started refactoring the existing code with the new keyword, it simply didn’t prove to be that useful, because most of the dynamically allocated data was complex data, with (also dynamically allocated) members, and since you can’t slap a cleanup attribute on a struct member, I had to do better.

I needed some kind of destructor function mechanism.

I decided to prepend some metadata to the allocated memory – this ought to be the most non-intrusive way to work my way to the new goal:

|------------------------|---------------------- // -----|
| metadata    | padding? | actual data                   |
|------------------------|---------------------- // -----|
^                        ^ returned address (word aligned)
 `- start of allocated
    block

Thou shalt have my metadata

I needed two functions: one to allocate the memory, and one to free it. Hence came into existence smalloc and sfree:

struct meta {
    void (*dtor)(void *);
    void *ptr;
};

static struct meta *get_meta(void *ptr) {
    return ptr - sizeof (struct meta);
}

__attribute__((malloc))
void *smalloc(size_t size, void (*dtor)(void *)) {
    struct meta *meta = malloc(sizeof (struct meta) + size);
    *meta = (struct meta) {
        .dtor = dtor,
        .ptr  = meta + 1
    };
    return meta->ptr;
}

void sfree(void *ptr) {
    if (ptr == NULL)
        return;
    struct meta *meta = get_meta(ptr);
    assert(ptr == meta->ptr); // ptr shall be a pointer returned by smalloc
    meta->dtor(ptr);
    free(meta);
}

Both functions are pretty straight-forward: smalloc allocates memory to host both the requested data size and the metadata we need. It then initializes said metadata and stores the destructor, and returns the pointer to the start of the uninitialized user data.

sfree behaves exactly like free, in that it does nothing if NULL is passed, and otherwise deallocates the memory. The only difference is that it calls the destructor stored during the call to smalloc before the actual deallocation, so that the cleanup step can be performed.

Given these two functions, I could rewrite my autofree macro:

#define smart __attribute__((cleanup(sfree_stack)))

__attribute__ ((always_inline))
inline void sfree_stack(void *ptr) {
    sfree(*(void **) ptr);
}

I figured that this was kind of looking more and more like a smart pointer, so I went ahead and renamed autofree to smart.

One of the immediate consequences of sfree running a destructor was that sfree was the universal deallocator, akin to the delete keyword in C++.

This means that for any type managed by smalloc I could just call sfree on it without worrying about the real destructor function, and I would know that it would have been destroyed – this was a huge improvement.

One drawback here is that we had to specify a valid destructor even for simple types, which is not always desirable, so I allowed NULL to be a valid parameter to mean that there is no destructor.
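Folding that allowance into sfree is a one-line guard. Here is a self-contained sketch (smalloc and get_meta repeated for completeness, plus a counting destructor purely for illustration):

```c
#include <assert.h>
#include <stdlib.h>

struct meta {
    void (*dtor)(void *);
    void *ptr;
};

static struct meta *get_meta(void *ptr) {
    return (struct meta *) ((char *) ptr - sizeof (struct meta));
}

void *smalloc(size_t size, void (*dtor)(void *)) {
    struct meta *meta = malloc(sizeof (struct meta) + size);
    *meta = (struct meta) { .dtor = dtor, .ptr = meta + 1 };
    return meta->ptr;
}

/* Illustration only: a destructor that counts its invocations. */
static int dtor_calls = 0;
static void counting_dtor(void *ptr) { (void) ptr; ++dtor_calls; }

void sfree(void *ptr) {
    if (ptr == NULL)
        return;
    struct meta *meta = get_meta(ptr);
    assert(ptr == meta->ptr); // ptr shall be a pointer returned by smalloc
    if (meta->dtor)           // NULL destructor: plain deallocation
        meta->dtor(ptr);
    free(meta);
}
```
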

I also took the time to add a way to put user-defined data into the metadata block, since it could have some nice applications, like length-aware arrays, or extending existing library structures.

On the path to unique_ptr and shared_ptr

With all of the above done, we pretty much have the needed foundations in place for unique_ptr and shared_ptr. In fact, we currently have unique_ptr.

The only thing left is shared_ptr then – easier said than done. Shared pointers rely on thread-safe reference counting. From there, two choices: locks, or atomics.

Using locks is easy, but it makes smart pointers impossible to use in signal handlers; not much of a problem, but still annoying.

A lock-free implementation would be ideal, but there is a big issue with that: lock-free implementations rely on a compare-and-swap mechanism on the wrapped pointer – here, the pointer is not wrapped. This sucks, but if we make the assumption that no destruction shall happen during an async context, it’s kind of fine (Oh boy, this will definitely make people jump out of their seat). On a side note, I have yet to see a proper solution to that, and I would like to avoid having double pointed types to solve the issue – if anyone has an idea, feel free to send me a message or send a pull request on the github repository.

For now, I included an atomic integer in the metadata, and a sref function: each time it is called on a shared pointer, the internal reference counter increments by 1, and each time sfree is called on said pointer, it is decremented by 1.
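A stripped-down sketch of that counting scheme, using C11 atomics: the metadata here is simplified to just the counter, the sref/decrement-on-sfree behaviour is as described in the article, and the remaining names are illustrative.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct shared_meta {
    atomic_size_t refcount;
    void *ptr;
};

static struct shared_meta *shared_get_meta(void *ptr) {
    return (struct shared_meta *) ((char *) ptr - sizeof (struct shared_meta));
}

void *shared_alloc(size_t size) {
    struct shared_meta *meta = malloc(sizeof (struct shared_meta) + size);
    atomic_init(&meta->refcount, 1);   /* the creator holds one reference */
    meta->ptr = meta + 1;
    return meta->ptr;
}

void *sref(void *ptr) {
    atomic_fetch_add(&shared_get_meta(ptr)->refcount, 1);
    return ptr;
}

void sfree(void *ptr) {
    if (ptr == NULL)
        return;
    struct shared_meta *meta = shared_get_meta(ptr);
    if (atomic_fetch_sub(&meta->refcount, 1) == 1)
        free(meta);                    /* last reference dropped */
}
```
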

We still need a way to tell unique and shared pointers apart, smalloc-wise. To do that, I added another parameter to specify the kind.

Oh, well. That’s a lot of parameters, now. We have the size, the destructor, the user data, the size of said data, and now the kind of resulting pointer. We started with a simple concept, but if I have to specify five parameters for each allocation, I’d rather go back to malloc and free.

Macros to the rescue

One of the first improvements we could make is to turn smalloc variadic — after all, the destructor and the user metadata are completely (and should be) optional. The issue with this is that we need to pass some kind of indicator of the number of variadic parameters. Some use NULL as a sentinel, but it’s not that practical, and some use a parameter to specify that number, but that’s not much better.

The solution here is to go with the additional parameter, and wrap everything with a macro. The macro would count the number of variadic parameters and dispatch them to the real smalloc function.

Add two other macros for unique_ptr and smart_ptr to call smalloc with the proper kind parameter, and we have a practical smart pointer library:

If I’d like a unique pointer with a destructor, I’d use unique_ptr(sizeof (T), dtor);.
For a shared pointer with no destructor but user data, shared_ptr(sizeof (T), &data, sizeof (data));… and this is, at last, absolutely satisfying.

Edit 15/01/2015: On the advice of /u/matthieum on reddit, I changed the size parameter of my unique_ptr and shared_ptr macros to use the actual type – this makes the usage even easier. On top of that, I implemented smart arrays, and managed to take in array type parameters like int[9] and infer the size of the element type and the length of the array.
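The inference part can be sketched with sizeof and a C99 compound literal (this shows the idea, not libcsptr’s actual macros):

```c
/* For an array type T like int[9], a compound literal (T){0} is an
 * array object, so indexing it inside sizeof (never evaluated) yields
 * the element type. Total size divided by element size is the length. */
#define ARRAY_ELEM_SIZE(T) sizeof ((T){0}[0])
#define ARRAY_LENGTH(T)    (sizeof (T) / ARRAY_ELEM_SIZE(T))
```
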

Wrapping everything up…

There we have it, smart pointers for everyone. I had fun pushing the language a bit further to make it do things it normally would not, although the project itself needs to mature a bit more before being ready for some kind of production.

I have wrapped everything in a github repository, for everyone to toy with. Installation instructions & usage samples are included in the readme. Have fun reading through, curious developer!

Coming from languages such as OCaml and Python, one thing I missed in C is the ability to use the concept of modules.

For those unfamiliar with the concept, modules are a way to hierarchically group functions and tools by the context – and target – they work on. In other words, if you had functions working on strings and arrays, you would put them in a String and an Array module, respectively.

Note that the following is possible in C++ with some adaptation, but let’s be serious, C++ already has namespaces for that kind of thing.

See at the end for a TL;DR.

The vanilla way

Usually, in C, the pattern goes as follows:

my_module.h:

#ifndef MY_MODULE_H_
# define MY_MODULE_H_

void my_module_function1(void);
void my_module_function2(int, int);

#endif /* !MY_MODULE_H_ */

If you want to call those functions from another unit, you just include the header file, and call the prefixed functions.

Now, what modular languages have, and what I strive to get, would be calling functions like so:

#include "my_module.h"

int main(void) {
    MyModule.function1();
    MyModule.function2(1, 2);
}

Some of you might argue that this new syntax does not bring anything new – and they might be correct. However, I did spot some positive traits it might entail, which I will expand on further down.

Let there be Modules

To emulate a module, we need, in theory, two things:

  • a container
  • function members

Fortunately, C can have both with structures and function pointers, and with designated initializers in C99, we can have readable assignments.

Let’s try our hand at a string module providing the length and concat operations. In the usual way, we would make the following header file:

#ifndef STRING_H_
# define STRING_H_

# include <stddef.h>

size_t string_length(const char *str);
void string_concat(char *dst, size_t, const char *head, const char *tail);

#endif /* !STRING_H_ */

Now, let us change the above to use a module:

// string.h
#ifndef STRING_H_
# define STRING_H_

# include <stddef.h>

struct m_string {
  size_t (*length)(const char *);
  void (*concat)(char *, size_t, const char *, const char *);
};

extern const struct m_string String;

#endif /* !STRING_H_ */

// string.c
#include <stdlib.h>

#include "string.h"

static size_t length(const char *s) {
    size_t len = 0;
    for (; *s; ++s, ++len);
    return len;
}

static void concat(char *dst, size_t size, const char *head, const char *tail) {
    size_t i = 0;
    for (; i < size && *head; ++head, ++dst, ++i)
        *dst = *head;
    for (; i < size && *tail; ++tail, ++dst, ++i)
        *dst = *tail;
    if (i < size)
        *dst = '\0';
}

const struct m_string String = {
    .length = length,
    .concat = concat
};

The change is, for me at least, pretty straightforward. All function prototypes go into the module as function pointers, and the module is a const global variable.

The sweet part is that you are not even required to use your own functions. You could completely bind strlen to String.length, and voilà, you have a standard library function in your module.
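For instance – a hypothetical sketch, reusing the m_string shape from above with only the length member – you can bind the libc strlen directly:

```c
#include <stddef.h>
#include <string.h>

/* The member's signature matches strlen exactly, so the libc
 * function can be bound straight into the module. */
struct m_string {
    size_t (*length)(const char *);
};

const struct m_string String = { .length = strlen };
```

String.length("hello") then goes through the libc implementation, with no wrapper function in between.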

The pros

  1. First, let me say that I extensively use vim as my editor, and a lot (if not most) of my motions are word-based – and for those unaware, a word here means [a-zA-Z0-9_]+ (i.e. any combination of alphanumeric characters and underscores). This means that splitting module and function names with a non-word character actually makes it easier to manipulate those identifiers in my editor/IDE.

  2. The magic does not end here: you can nest modules into other modules, since these are nothing but structures. Behold, I present to you my Math.Complex and Math.Matrix modules! The only drawback I see to submodules is that, as far as I know, you cannot do partial imports – you wouldn’t be able to import Math.Complex alone without Math.Matrix.

  3. Modules provide a unified interface for multiple implementations – whether they are static, dynamic, or part of the same project – making swaps between them easy, which is really powerful and awesome. This has been extensively used in Quake 2 (many thanks to /u/areop for the link!), for instance.

  4. As the implementation above uses function pointers, any respectable C compiler will be able to perform the same optimisations on the functions inside a module as it does on normal functions. On GCC 4.9.1, the -flto switch enables link-time optimisation, turning all of the indirect calls spawned by this method into direct calls, and even inlining them when possible.
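Point 2 can be sketched as follows – the Math, Complex, modulus_sq and square names are mine, purely for illustration:

```c
/* A submodule is just a nested const struct of function pointers. */
struct m_complex {
    double (*modulus_sq)(double re, double im);
};

struct m_math {
    struct m_complex Complex; /* accessed as Math.Complex */
    double (*square)(double x);
};

static double modulus_sq(double re, double im) { return re * re + im * im; }
static double square(double x) { return x * x; }

const struct m_math Math = {
    .Complex = { .modulus_sq = modulus_sq },
    .square  = square,
};
```

A call then reads Math.Complex.modulus_sq(3.0, 4.0), and the whole hierarchy lives in a single const global.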

The cons

  1. The main drawback is that these require maintenance. A lot of maintenance. Writing function pointer prototypes (and understanding them!) can be a pain for the non-initiated.

  2. Doxygen users will also have to provide the prefixed function prototypes to get their documentation generated.

  3. Last but not least, as this is pretty much non-standard, you will probably need to provide the prototypes regardless for those not wanting to deal with modules.

Last word

I had the occasion to play a bit with modules since then, and used them on many occasions in some collaborative C projects – my co-workers loved them, and I have yet to discover a major drawback.

It seems to bring a bit of fresh air, and mostly help us avoid the_freaking_long_named_function.


TL;DR

Modules in C are great, and I still use them, but they might confuse other developers at first because of the syntax, and like any other workflow, they have their pros & cons:

Pros:

  • They split long function names with a non-word character: module names and function names become separate entities in the eyes of text editors and IDEs.
  • They provide a unified interface for possibly multiple implementations without abusing preprocessor conditionals.
  • Optimisation and inlining still happen with function pointers.
  • They allow a module hierarchy (a tree structure of submodules).

Cons:

  • They require more maintenance: the function pointer declarations must be kept in sync with the pointed-to function prototypes, which can be a pain for the non-initiated.
  • The function pointer syntax decreases the readability of the header file.
  • This is not directly compatible with Doxygen documentation generation; you still have to provide some prototypes.

Further readings

Back when I did not know anything about programming and started to learn C, I was first introduced to pointers (and other dreaded horrors that made me curl into a corner and cry) and dynamic memory in general.

I was baffled, troubled, yet fascinated by the basic explanation of how memory worked, and started to dread the time when I would need to manually create my char arrays for each and every sentence of my program; right before learning about string literals and feeling like an idiot.

It was then, while learning about memory allocation, that I came upon a function I would long call the “magic function”: malloc. Magic, because at that point I didn’t know how it worked, let alone anything about memory other than that it was a “chain of boxes for numbers”.

Time passed, and I recently got recoding malloc as an assignment. This article is meant to share my experience recoding the function, as well as to serve as some kind of how-to, since some people asked me how it was done.

Preparing the journey

Recoding malloc is not the simplest of tasks, as its implementation will mostly depend on the kernel you are running – as such, I am going to assume that we are running a Linux system, although this article should apply to any POSIX-compliant UNIX system. I am not familiar enough with Windows NT’s heap functions to cover malloc’s implementation for Windows.

You will need GDB if you have bugs – no joking: you just don’t have access to any function of the standard library, as most of them rely on their implementation of malloc.

I will consider that you already know the memory layout of a UNIX process, as well as what the program break is. If you don’t, there is an addendum covering that at the end of the article.

Lastly, if you haven’t yet, you should really read the manual pages for malloc(3).

The dumb allocator

We know from sbrk(2) that we can manipulate the program break, asking for or returning dynamic memory to the operating system. Hence, the dumbest allocator does just that:

#include <unistd.h>

void *malloc(size_t size) {
    void *c = sbrk(size);
    return c == (void*) -1 ? NULL : c;
}

Trouble ahead

That’s, however, not enough to run a program – the malloc man page mentions free, calloc, and realloc, which we need to implement before testing the allocator. free brings up our first issue: we have no way in hell to know where the allocated segments are, and since our only means of returning memory to the system is decrementing the program break, free is just not feasible yet and will be empty.

void free(void* ptr) {}

calloc is pretty much straightforward: you call malloc and fill the memory section with zeroes.

realloc might cause some trouble, as we need to copy the old memory section to the newly allocated section, padding with zeroes if needed. Unfortunately, as with free, we have no way to know the size of the old segment. At that point, we are pretty much stuck with this model, and the best we can do is allocate fixed-size blocks and pray that the running program will never want to allocate anything bigger.

The whole code

#include <unistd.h>
#define SIZE 1024

void *malloc(size_t size) {
    void *c = sbrk(SIZE);
    return c == (void*) -1 ? NULL : c;
}

void free(void *ptr) {}

void *calloc(size_t nmemb, size_t size) {
    void *ptr = malloc(size);
    char *b = ptr;
    for (size_t i = 0; i < SIZE; ++i, ++b) {
        *b = 0;
    }
    return ptr;
}

void *realloc(void *ptr, size_t size) {
    if (!size) return NULL;
    void *newptr = malloc(size);
    if (ptr) {
        char *b1 = ptr, *b2 = newptr;
        for (size_t i = 0; i < SIZE; ++i, ++b1, ++b2) {
            *b2 = *b1;
        }
    }
    return newptr;
}

This should work, given that the running program never tries to allocate more than 1024 bytes – and here are the test results:

$ make debug
clang -Wall -Wextra -g -fPIC -c -o malloc.o malloc.c 
        && clang -shared -o libmalloc.so malloc.o
malloc.c:6:21: warning: unused parameter 'size' [-Wunused-parameter]
void *malloc(size_t size) {
                    ^
malloc.c:11:17: warning: unused parameter 'ptr' [-Wunused-parameter]
void free(void *ptr) {}
                ^
2 warnings generated.
$ LD_PRELOAD=./libmalloc.so /bin/ls
libmalloc.so  Makefile  malloc.c  malloc.o
$ LD_PRELOAD=./libmalloc.so /usr/bin/grep -R "" /
zsh: segmentation fault (core dumped)

As you can see, /usr/bin/grep segfaulted, most likely because of the sizing issue.

Fiat Structura

We know from the above that we need to store some metadata – but where? You can’t really expect to call malloc in malloc. There are actually two ways of doing this: since you are in control of the allocation process, it is completely possible to prepend each segment with some metadata, or to build a separate structure to store it all. One of the most straightforward ways to store and order the segments is a linked list, and I will first cover this simple implementation.

Since we are going to prepend all of this metadata to the actual segment, we also need a safety check to verify that the pointer passed to free/realloc is actually valid; we will therefore add a pointer to the beginning of the data to the chunk metadata.

Thus, each memory chunk must at least “know” the following about itself:

  • Its size
  • Its predecessor
  • Its successor
  • If the space is marked free
  • The pointer to the data following the chunk

We finally have a structure of this sort:

struct chunk {
    struct chunk *next, *prev;
    size_t size;
    int free;
    void *data;
};

The only thing left is to set up the data structure. You start off with a sentinel to keep track of the start of the heap and handle chunk removal with ease; then, for each call to malloc, you increment the program break by a word-aligned value of sizeof(struct chunk) + size, where size is the parameter passed to malloc.

This implementation is overall a bit trickier due to the pointer arithmetic galore you need to handle – it’s easy to get things wrong and corrupt memory. These bugs will typically backfire and cause unexpected behavior at any point of the program, and hence are really hard to track. This is why I would advise starting slow and incrementally adding the new parts while making sure everything’s working fine.

Improving the model

Since memory allocation with sbrk is basically pushing the program break further, we might want to avoid calling sbrk when the call is, in fact, not needed: instead of always claiming dynamic memory, we iterate through the chunk list to find the first free chunk with sufficient space, and only if none is found do we allow ourselves to call sbrk.

As we reuse free chunks, the ideal here would be to split the reused chunks when they are oversized, inserting a new free chunk right after. In the same manner, it would be a shame to let free chunks congregate without merging them into one big free chunk, as the fragmentation might cause performance issues for programs that allocate large amounts of memory, since they would spawn a lot of chunks.
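Splitting and merging can be sketched like this, reusing the chunk layout above – split and merge are my names, and the bounds handling is deliberately simplified:

```c
#include <stddef.h>

struct chunk {
    struct chunk *next, *prev;
    size_t size;
    int free;
    void *data;
};

/* If a reused free chunk is oversized, carve its tail into a new
 * free chunk inserted right after the shrunk payload. */
static void split(struct chunk *c, size_t size) {
    if (c->size < size + sizeof(struct chunk) + 1)
        return; /* not enough room left to hold another header */
    struct chunk *rest = (struct chunk *)((char *)c->data + size);
    rest->size = c->size - size - sizeof(struct chunk);
    rest->free = 1;
    rest->data = rest + 1;
    rest->prev = c;
    rest->next = c->next;
    c->next->prev = rest;
    c->next = rest;
    c->size = size;
}

/* Merge a free chunk with its successor when both are free. */
static void merge(struct chunk *c) {
    struct chunk *n = c->next;
    if (!c->free || !n->free)
        return;
    c->size += sizeof(struct chunk) + n->size;
    c->next = n->next;
    n->next->prev = c;
}
```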

The resulting code with all of these improvements can be found on the github project.

Beyond the call of malloc

Huzzah! We managed to build a somewhat acceptable memory allocator!

We could consider the objective met – that we succeeded in our quest and shall return victorious with the head of the slain beast as a trophy. Now, that wouldn’t be fun, would it?

Truth be told, this implementation sucks. There are some big performance issues, and performance is critical in a memory manager, as it can add significant overhead to every program using it.

Changing the structure

The first issue is that we use a linked list and constantly iterate through it. This is excruciatingly slow: in the worst case, you need to iterate through the whole damn list before concluding that you need to allocate new memory. This might not really be an issue for small programs, but once thousands of variables have been allocated, performance will degrade over time.

There are alternatives to the linked list, among them:

Working with pages instead of the program break.

The program break is a lie.

The operating system manages memory using pages rather than a program break – what sbrk does is simply move the program break, and if it crosses the boundary of the current page, ask the kernel for a new page. Since your program break is somewhere on the page, you may access memory beyond that position, as long as you do not cross the page boundary.

Instead of using the program break, we might as well ask the system directly for pages using mmap(2).

Thread safety

As mentioned by smcameron on reddit, this implementation is absolutely not thread-safe and will lead to undefined behavior without any synchronization.
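A coarse fix is to serialise every allocator entry point behind one global lock – here is a sketch on the dumb sbrk allocator, with my_malloc as an illustrative name (a real drop-in malloc would also have to worry about pthread itself allocating memory):

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

void *my_malloc(size_t size) {
    pthread_mutex_lock(&alloc_lock);
    void *c = sbrk(size); /* the break manipulation is the critical section */
    pthread_mutex_unlock(&alloc_lock);
    return c == (void *)-1 ? NULL : c;
}
```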

Conclusion

The choice of implementation remains with the programmer, as there are a lot of trade-offs to consider when choosing a particular strategy over another.

Writing a custom memory manager was fun, and I would recommend that anyone with a bit of C experience do the same, as it gives you a better understanding of the inner mechanisms of your system. I would not advise using these in production unless you know what you are doing; chances are that the standard library has a much more efficient memory manager than your custom one. You might also check out other implementations such as tcmalloc (a really interesting one indeed!)

I shall now return to my cave and do… things. Until then, bye.

Addendum

Debugging with GDB

Debugging with GDB might be tricky if you don’t set it up properly. You just can’t do something like LD_PRELOAD=./libmalloc.so gdb because, well, gdb WILL use malloc. Fortunately, GDB comes with the set exec-wrapper command:

$ gdb -q /bin/ls
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) set exec-wrapper env 'LD_PRELOAD=./libmalloc.so'
(gdb) run -q -a
.  ..  libmalloc.so  Makefile  malloc.c  malloc.o
[Inferior 1 (process 25201) exited normally]
(gdb) quit

With this, you will be able to properly debug the library.

The memory layout of an UNIX process

Typically, a process under a UNIX operating system has 4GB of virtual memory space (3GB for the user, 1GB for the kernel, although some kernels support the hugemem feature that extends both spaces to 4GB). I will not cover in detail how the memory is laid out over these 4 gigs, and will focus mainly on the stack & heap.

The stack and the heap are memory sections belonging to the 3GB of user memory space – both will expand if necessary, although the stack starts from a high address and expands downwards, while the heap starts from a low address and expands upwards. Each section is bounded by a delimiter marking where it ends. This delimiter is called the program break (or brk) for the heap, and the top of the stack pointer (or %esp, from the assembly notation) for the stack.

For the visual people out there, here is a diagram showing a simplified layout.

Further readings