A simple algorithm for drawing complex tables
I was surprised to see, during the development of rst2ansi, that there were no simple python modules to pretty-print complex tables with ascii characters.
Most of them only supported simple tables with uni-line cells, and God forbid you want to print cells that span multiple columns or rows.
I went on and made a little algorithm to help me draw these tables.
The algorithm
We want to draw an arbitrary table, for instance:
+-----+-----+-----+-----+
| A | B | C | D |
+-----+-----+-----+-----+
| E | F |
+-----+-----+-----------+
| G | | |
+-----+ I | J |
| H | | |
+=====+=====+===========+
The first thing we need to define is what exactly constitute a row in this table, and a traversal order for each cell:
Row 1 | |||
Row 2 | |||
Row 3 | |||
Row 4 |
1 | 2 | 3 | 4 |
5 | 6 | ||
7 | 8 | 9 | |
10 |
Once done, we can straight-forwardly draw the table by iterating over the cells using the above order:
- Draw the top and left table borders
- For each cell draw its bottom and right borders.
Here is a step-by-step visualisation of the process for the above table:
A | B | C | |
A | B | C | D |
E | F | ||
A | B | C | D |
E | F | ||
G | I | J | |
A | |||
A | B | C | D |
A | B | C | D |
E | F | ||
G | |||
A | B | C | D |
E | F | ||
G | I | J | |
H |
A | B | ||
A | B | C | D |
E | |||
A | B | C | D |
E | F | ||
G | I | ||
A | B | C | D |
E | F | ||
G | I | J | |
H |
(Note that this algorithm could be used to draw tables on any medium, not just for ascii pretty-printing.)
It is immediately apparent that for this algorithm to work, it needs precise information on the table:
- It needs, of course, the number of rows and columns and the contents of the cells,
- It needs the layout of each cell, i.e. which cells spans multiple rows and/or columns.
- It needs the height of each row and the width of each column.
This means that we have to define a proper data structure to represent these informations.
The implementation
Setting up the stage
The first thing we have to define is the way to represent our data. Given a table like this:
+------------------------+------------+----------+----------+
| Header row, column 1 | Header 2 | Header 3 | Header 4 |
| (header rows optional) | | | |
+========================+============+==========+==========+
| body row 1, column 1 | column 2 | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2 | Cells may span columns. |
+------------------------+------------+---------------------+
| body row 3 | Cells may | Cells may span both |
+------------------------+ span rows. | rows and columns. |
| body row 4 | | |
+========================+============+=====================+
What would be the best way to represent it?
One could think at first to use multi-dimensional list, but we get hit in the face when trying to deal with cells spaning multiple columns and rows.
It turns out the best format to represent a table with, albeit a little (and by that I mean “lot”) verbose, is to use a tree:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
TABLE = {
'node': 'table',
'children': [
{
'node': 'head',
'children': [
{
'node': 'row',
'children': [
{'node': 'cell', 'data': 'Header row, column 1\n(header rows optional)'},
{'node': 'cell', 'data': 'Header 2'},
{'node': 'cell', 'data': 'Header 3'},
{'node': 'cell', 'data': 'Header 4'},
]
}
]
},
{
'node': 'body',
'children': [
{
'node': 'row',
'children': [
{'node': 'cell', 'data': 'body row 1, column 1'},
{'node': 'cell', 'data': 'column 2'},
{'node': 'cell', 'data': 'column 3'},
{'node': 'cell', 'data': 'column 4'},
]
},
{
'node': 'row',
'children': [
{'node': 'cell', 'data': 'body row 2'},
{'node': 'cell', 'data': 'Cells may span columns.', 'morecols': 2},
],
},
{
'node': 'row',
'children': [
{'node': 'cell', 'data': 'body row 3'},
{'node': 'cell', 'data': 'Cells may span rows.', 'morerows': 1},
{'node': 'cell', 'data': 'Cells may span both rows and columns.', 'morerows': 1, 'morecols': 1},
],
},
{
'node': 'row',
'children': [
{'node': 'cell', 'data': 'body row 4'},
],
}
]
}
]
}
This doesn’t really look easy to make, but it matters not: we will only use this format as an intermediate structure, meant to be generated from another source.
Looks familiar? It just so happens that this structure looks a little bit like the one HTML uses:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
<table>
<thead>
<tr>
<td>Header row, column 1<br>(header rows optional)</td>
<td>Header 2</td>
<td>Header 3</td>
<td>Header 4</td>
</tr>
</thead>
<tbody>
<tr>
<td>body row 1, column 1</td>
<td>column 2</td>
<td>column 3/td>
<td>column 4/td>
</tr>
<tr>
<td>body row 2</td>
<td colspan="3">Cells may span columns.</td>
</tr>
<tr>
<td>body row 3</td>
<td rowspan="2">Cells may span rows.</td>
<td rowspan="2" colspan="2">Cells may span both rows and columns.</td>
</tr>
<tr>
<td>body row 4</td>
</tr>
</tbody>
</table>
We could make a function to transform these HTML tables into our intermediate format, for instance.
Sizing the table up
The other data we need for this algorithm to work is the dimensions of each row and columns.
The implementation of this function depends on the use case and the strategy you want. Some possible ideas:
- Find the width of a column:
- Use a fixed width
- For each column, use the mean of widths of all cell contents in this specific column
- Use the length of the column label
- Find the height of a row:
- Use the maximum number of lines the contents of each cell in a row can fill.
Implementing the algorithm
The major component of the algorithm is the iteration over the cells. Since we defined a tree-like data structure, a good iteration technique is walking over each node with a visitor.
We start by defining a Visitor
class that implements visit
.
visit
is a method that calls 'visit_' + node_name
on the node,
visits all of its children, then calls 'depart_' + node_name
.
We also define a SkipChildren
exception, which, if raised by a visit_
function, will tell the visitor not to visit the children of the current node.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class SkipChildren(Exception):
pass
class Visitor(object):
def visit(self, node):
"""
Visit a generic node. Calls 'visit_' + node_name on the node,
then visit its children, then calls 'depart_' + node_name.
"""
def noop(node):
pass
try:
visit_fn = getattr(self, 'visit_' + node['node'])
except AttributeError:
visit_fn = noop
try:
depart_fn = getattr(self, 'depart_' + node['node'])
except AttributeError:
depart_fn = noop
try:
visit_fn(node)
for n in node.get('children', []):
self.visit(n)
except SkipChildren:
return
finally:
depart_fn(node)
With our visitor class, we can finally define our table printer:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# Note: for the sake of this article's brevity, the code below
# is actually python pseudocode. A working implementation can
# be found in the git repository linked at the end of this article.
class TablePrinter(Visitor):
def __init__(self):
self.row = 0
self.col = 0
def visit_table(self, node):
_draw_initial_table(node['colspec'], node['rowspec'])
def visit_row(self, node):
self.col = 0
def depart_row(self, node):
self.row += 1
def visit_cell(self, node):
# Get position & dimensions
pos = self.row, self.col
dim = self._get_cell_dimensions(node, pos)
spanned_cols, spanned_rows, width, height = dim
# Draw the cell
self._draw_bottom_border(dim, pos, node)
self._draw_right_border(dim, pos, node)
self._draw_cell_contents(dim, pos, node)
# Update the position & stop recursing
self.col += spanned_cols
raise SkipChildren
def _get_cell_dimensions(self, node, pos):
"""
Returns the number of columns and rows this cell spans,
and its width in character columns and height in lines.
Args:
node: the node of the cell.
pos: the row and column indices of this cell.
"""
raise NotImplementedError
def _draw_initial_table(self, widths, heights):
"""
Draw the leftmost and upmost borders of the table,
and fills the defined rectangle with spaces.
Args:
widths: the widths of each columns
heights: the heights of each rows
"""
total_width = sum(widths) + (len(widths) + 1)
total_height = sum(heights) + (len(heights) + 1)
raise NotImplementedError
def _draw_bottom_border(self, dim, pos, cell):
"""
Draw a bottom border for this cell
Args:
dim: the dimensions of the cell
pos: the position of the cell in the table
cell: the cell properties
"""
raise NotImplementedError
def _draw_right_border(self, dim, pos, cell):
"""
Draw a bottom border for this cell
Args:
dim: the dimensions of the cell
pos: the position of the cell in the table
cell: the cell properties
"""
raise NotImplementedError
def _draw_right_border(self, dim, pos, cell):
"""
Draw the contents of this cell
Args:
dim: the dimensions of the cell
pos: the position of the cell in the table
cell: the cell properties
"""
raise NotImplementedError
Wrapping up
I published on github an ascii table renderer using this algorithm as a proof of concept here. The code isn’t perfect, and needs to be cleaned up a bit, but should be overall understandable.
This algorithm is simple enough for purposes where there are no border styles. For broader ones, one could either change this algorithm to make it overwrite previous borders, or use the guidelines as described in the official w3c document on CSS2 properties of HTML tables.