A simple algorithm for drawing complex tables

I was surprised to see, during the development of rst2ansi, that there were no simple python modules to pretty-print complex tables with ascii characters.

Most of them only supported simple tables with uni-line cells, and God forbid you want to print cells that span multiple columns or rows.

I went on and made a little algorithm to help me draw these tables.

The algorithm

We want to draw an arbitrary table, for instance:

+-----+-----+-----+-----+
|  A  |  B  |  C  |  D  |
+-----+-----+-----+-----+
|  E  |        F        |
+-----+-----+-----------+
|  G  |     |           |
+-----+  I  |     J     |
|  H  |     |           |
+=====+=====+===========+

The first thing we need to define is what exactly constitute a row in this table, and a traversal order for each cell:

Row 1
Row 2
Row 3
Row 4

1	2	3	4
5	6
7	8	9
10

Once done, we can straight-forwardly draw the table by iterating over the cells using the above order:

Draw the top and left table borders
For each cell draw its bottom and right borders.

Here is a step-by-step visualisation of the process for the above table:

Step 1

Step 4
A	B	C

Step 7
A	B	C	D
E	F

Step 10
A	B	C	D
E	F
G	I	J

Step 2
A

Step 5
A	B	C	D

Step 8
A	B	C	D
E	F
G

Step 11
A	B	C	D
E	F
G	I	J
H

Step 3
A	B

Step 6
A	B	C	D
E

Step 9
A	B	C	D
E	F
G	I

Step 12
A	B	C	D
E	F
G	I	J
H

(Note that this algorithm could be used to draw tables on any medium, not just for ascii pretty-printing.)

It is immediately apparent that for this algorithm to work, it needs precise information on the table:

It needs, of course, the number of rows and columns and the contents of the cells,
It needs the layout of each cell, i.e. which cells spans multiple rows and/or columns.
It needs the height of each row and the width of each column.

This means that we have to define a proper data structure to represent these informations.

The implementation

Setting up the stage

The first thing we have to define is the way to represent our data. Given a table like this:

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | Cells may span both |
+------------------------+ span rows. | rows and columns.   |
| body row 4             |            |                     |
+========================+============+=====================+

What would be the best way to represent it?

One could think at first to use multi-dimensional list, but we get hit in the face when trying to deal with cells spaning multiple columns and rows.

It turns out the best format to represent a table with, albeit a little (and by that I mean “lot”) verbose, is to use a tree:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
TABLE = {
    'node': 'table',
    'children': [
      {
        'node': 'head',
        'children': [
            {
              'node': 'row',
              'children': [
                {'node': 'cell', 'data': 'Header row, column 1\n(header rows optional)'},
                {'node': 'cell', 'data': 'Header 2'},
                {'node': 'cell', 'data': 'Header 3'},
                {'node': 'cell', 'data': 'Header 4'},
              ]
            }
          ]
      },
      {
        'node': 'body',
        'children': [
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 1, column 1'},
              {'node': 'cell', 'data': 'column 2'},
              {'node': 'cell', 'data': 'column 3'},
              {'node': 'cell', 'data': 'column 4'},
            ]
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 2'},
              {'node': 'cell', 'data': 'Cells may span columns.', 'morecols': 2},
            ],
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 3'},
              {'node': 'cell', 'data': 'Cells may span rows.', 'morerows': 1},
              {'node': 'cell', 'data': 'Cells may span both rows and columns.', 'morerows': 1, 'morecols': 1},
            ],
          },
          {
            'node': 'row',
            'children': [
              {'node': 'cell', 'data': 'body row 4'},
            ],
          }
        ]
      }
    ]
  }

This doesn’t really look easy to make, but it matters not: we will only use this format as an intermediate structure, meant to be generated from another source.

Looks familiar? It just so happens that this structure looks a little bit like the one HTML uses:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
<table>
  <thead>
    <tr>
      <td>Header row, column 1<br>(header rows optional)</td>
      <td>Header 2</td>
      <td>Header 3</td>
      <td>Header 4</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>body row 1, column 1</td>
      <td>column 2</td>
      <td>column 3/td>
      <td>column 4/td>
    </tr>
    <tr>
      <td>body row 2</td>
      <td colspan="3">Cells may span columns.</td>
    </tr>
    <tr>
      <td>body row 3</td>
      <td rowspan="2">Cells may span rows.</td>
      <td rowspan="2" colspan="2">Cells may span both rows and columns.</td>
    </tr>
    <tr>
      <td>body row 4</td>
    </tr>
  </tbody>
</table>

We could make a function to transform these HTML tables into our intermediate format, for instance.

Sizing the table up

The other data we need for this algorithm to work is the dimensions of each row and columns.

The implementation of this function depends on the use case and the strategy you want. Some possible ideas:

Find the width of a column:
- Use a fixed width
- For each column, use the mean of widths of all cell contents in this specific column
- Use the length of the column label
Find the height of a row:
- Use the maximum number of lines the contents of each cell in a row can fill.

Implementing the algorithm

The major component of the algorithm is the iteration over the cells. Since we defined a tree-like data structure, a good iteration technique is walking over each node with a visitor.

We start by defining a Visitor class that implements visit. visit is a method that calls 'visit_' + node_name on the node, visits all of its children, then calls 'depart_' + node_name.

We also define a SkipChildren exception, which, if raised by a visit_ function, will tell the visitor not to visit the children of the current node.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class SkipChildren(Exception):
  pass

class Visitor(object):

  def visit(self, node):
    """
    Visit a generic node. Calls 'visit_' + node_name on the node,
    then visit its children, then calls 'depart_' + node_name.
    """
    def noop(node):
      pass

    try:
      visit_fn = getattr(self, 'visit_' + node['node'])
    except AttributeError:
      visit_fn = noop

    try:
      depart_fn = getattr(self, 'depart_' + node['node'])
    except AttributeError:
      depart_fn = noop

    try:
      visit_fn(node)
      for n in node.get('children', []):
        self.visit(n)
    except SkipChildren:
      return
    finally:
      depart_fn(node)

With our visitor class, we can finally define our table printer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# Note: for the sake of this article's brevity, the code below
# is actually python pseudocode. A working implementation can
# be found in the git repository linked at the end of this article.

class TablePrinter(Visitor):

  def __init__(self):
    self.row = 0
    self.col = 0

  def visit_table(self, node):
    _draw_initial_table(node['colspec'], node['rowspec'])

  def visit_row(self, node):
    self.col = 0

  def depart_row(self, node):
    self.row += 1

  def visit_cell(self, node):
    # Get position & dimensions

    pos = self.row, self.col
    dim = self._get_cell_dimensions(node, pos)

    spanned_cols, spanned_rows, width, height = dim

    # Draw the cell

    self._draw_bottom_border(dim, pos, node)
    self._draw_right_border(dim, pos, node)

    self._draw_cell_contents(dim, pos, node)

    # Update the position & stop recursing

    self.col += spanned_cols
    raise SkipChildren

  def _get_cell_dimensions(self, node, pos):
    """
    Returns the number of columns and rows this cell spans,
    and its width in character columns and height in lines.

    Args:
        node: the node of the cell.
        pos: the row and column indices of this cell.
    """
    raise NotImplementedError

  def _draw_initial_table(self, widths, heights):
    """
    Draw the leftmost and upmost borders of the table,
    and fills the defined rectangle with spaces.
    Args:
        widths: the widths of each columns
        heights: the heights of each rows
    """
    total_width = sum(widths) + (len(widths) + 1)
    total_height = sum(heights) + (len(heights) + 1)

    raise NotImplementedError

  def _draw_bottom_border(self, dim, pos, cell):
    """
    Draw a bottom border for this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

  def _draw_right_border(self, dim, pos, cell):
    """
    Draw a bottom border for this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

  def _draw_right_border(self, dim, pos, cell):
    """
    Draw the contents of this cell
    Args:
        dim: the dimensions of the cell
        pos: the position of the cell in the table
        cell: the cell properties
    """
    raise NotImplementedError

Wrapping up

I published on github an ascii table renderer using this algorithm as a proof of concept here. The code isn’t perfect, and needs to be cleaned up a bit, but should be overall understandable.

This algorithm is simple enough for purposes where there are no border styles. For broader ones, one could either change this algorithm to make it overwrite previous borders, or use the guidelines as described in the official w3c document on CSS2 properties of HTML tables.