rst tables with htmldjango / emoji two columns wide
For a project, we're using Django to generate a textual report. For readability, it is in monospace text. And we've done it in reStructuredText (RST) so we can generate an HTML document from it as well.
A table in RST might look like this:
+-----------+-------+
| car brand | users |
+===========+=======+
| Peugeot   |     2 |
+-----------+-------+
| Saab      |     1 |
+-----------+-------+
| Volvo     |     4 |
+-----------+-------+
Transforming this to HTML with rst2html(1) generates a table similar to this:
<table class="docutils" border="1">
<colgroup><col width="61%"><col width="39%"></colgroup>
<thead valign="bottom">
<tr><th class="head">car brand</th><th class="head">users</th></tr>
</thead>
<tbody valign="top">
<tr><td>Peugeot</td><td>2</td></tr>
<tr><td>Saab</td><td>1</td></tr>
<tr><td>Volvo</td><td>4</td></tr>
</tbody>
</table>
We can generate such a simple RST table using the Django template engine. Let the following Python list be the input:
cars = [('Peugeot', 2), ('Saab', 1), ('Volvo', 4)]
And as Django template, we'll use this:
+-----------+-------+
| car brand | users |
+===========+=======+
{% for car in cars %}| {{ car.0|ljust:9 }} | {{ car.1|rjust:5 }} |
+-----------+-------+
{% endfor %}
Sidenote: when generating text instead of HTML, we'd normally start the Django template with {% autoescape off %}. Elided here for clarity.
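For intuition: the ljust and rjust template filters convert their input to a string and then pad it the way Python's str.ljust and str.rjust do. A minimal plain-Python sketch of what the template above should render follows. The render_table name is made up for this illustration and is not part of Django.

```python
def render_table(cars):
    """Build the same RST grid table as the template, using str methods."""
    border = '+-----------+-------+'
    lines = [border, '| car brand | users |', '+===========+=======+']
    for brand, users in cars:
        # str() mirrors the string conversion the filters do before padding.
        lines.append('| {} | {} |'.format(
            str(brand).ljust(9), str(users).rjust(5)))
        lines.append(border)
    return '\n'.join(lines) + '\n'


cars = [('Peugeot', 2), ('Saab', 1), ('Volvo', 4)]
print(render_table(cars), end='')
```

Both paddings count characters, which is fine as long as every character is one column wide.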
Working example
Here's a working Python3 snippet, including a setup() hack so we can skip Django setup that would just clutter this example:
# Quick and dirty Django setup; tested with Django 2.1
import django.conf
django.conf.settings.configure(
    DEBUG=True, TEMPLATES=[{
        'BACKEND': 'django.template.backends.django.DjangoTemplates'}])
django.setup()
# Setting up the content
cars = [('Peugeot', 2), ('Saab', 1), ('Volvo', 4)]
TEMPLATE = '''\
+-----------+-------+
| car brand | users |
+===========+=======+
{% for car in cars %}| {{ car.0|ljust:9 }} | {{ car.1|rjust:5 }} |
+-----------+-------+
{% endfor %}'''
# Rendering the table
from django.template import Context, Template
tpl = Template(TEMPLATE)
context = Context({'cars': cars})
print(tpl.render(context), end='')
But now, let's say we wanted to add some emoji. Like a 🏆 (trophy) next to the highest number. Because, in a big list, having some color can be tremendously useful to direct attention to where it's due.
We'll replace the numbers with strings, optionally including an emoji:
cars = [('Peugeot', '2'), ('Saab', '1'), ('Volvo', '\U0001F3C6 4')]
Rerun, and we get this:
+-----------+-------+
| car brand | users |
+===========+=======+
| Peugeot   |     2 |
+-----------+-------+
| Saab      |     1 |
+-----------+-------+
| Volvo     |   🏆 4 |
+-----------+-------+
Interesting... that one trophy character is taking up room for two.
You might be thinking that is just the display. But rst2html(1) agrees that this is wrong:
$ python3 cars.py | rst2html - cars.html
cars.rst:1: (ERROR/3) Malformed table.
So, what is the cause of this?
Emoji have East Asian width
On the Unicode section of the halfwidth and fullwidth forms page on Wikipedia we can read the following:
Unicode assigns every code point an "East Asian width" property. [...]
- W for naturally wide characters, e.g. Japanese Hiragana [...]
- Na for naturally narrow characters, e.g. ISO Basic Latin [...]
Terminal emulators can use this property to decide whether a character should consume one or two "columns" when figuring out tabs and cursor position.
And in the Unicode 12 standard, Annex #11 it reads:
In modern practice, most alphabetic characters are rendered by variable-width fonts using narrow characters, even if their encoding in common legacy sets uses multiple bytes. In contrast, emoji characters were first developed through the use of extensions of legacy East Asian encodings, such as Shift-JIS, and in such a context they were treated as wide characters. While these extensions have been added to Unicode or mapped to standardized variation sequences, their treatment as wide characters has been retained, and extended for consistency with emoji characters that lack a legacy encoding.
In short:
- characters can be narrow or wide (with some exceptions);
- emoji evolved from East Asian encodings;
- emoji are wide, in contrast to "normal" European characters.
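The property can be inspected from Python with unicodedata.east_asian_width. The snippet below is only an illustration with a handful of hand-picked sample characters. The value reported for a given character depends on the Unicode version bundled with the interpreter (see unicodedata.unidata_version).

```python
from unicodedata import east_asian_width, unidata_version

# Hand-picked samples for illustration; not an exhaustive list.
samples = [
    ('A', 'Latin capital letter A'),
    ('\u3042', 'Hiragana letter A'),
    ('\uFF21', 'fullwidth Latin capital letter A'),
    ('\uFF71', 'halfwidth Katakana letter A'),
    ('\U0001F3C6', 'trophy emoji'),
]

print('Unicode data version:', unidata_version)
for ch, description in samples:
    # east_asian_width() returns the property as a short string.
    print('U+{:04X} {:<34} {}'.format(
        ord(ch), description, east_asian_width(ch)))
```

On a recent Python 3 the plain letter should come out as Na, the Hiragana letter and the trophy as W, and the fullwidth and halfwidth forms as F and H.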
Solving the justification
With that knowledge, we now know why the table is wrongly dimensioned around the emoji. The rjust counts three characters and adds two spaces. But it should count four columns (one wide emoji, a (narrow) space and a (narrow) digit).
Luckily the Python Standard Library has the necessary prerequisites. We add this function:
from unicodedata import east_asian_width

def column_width(s):
    """Return total column width of the string s, taking into account
    that some unicode characters take up two columns."""
    return sum(column_width.widths[east_asian_width(ch)] for ch in s)

# Only fullwidth (F) and wide (W) characters take up two columns.
# Neutral (N) characters behave like narrow ones, so they count as 1.
column_width.widths = {'Na': 1, 'H': 1, 'F': 2, 'W': 2, 'N': 1, 'A': 1}
While len('\U0001F3C6 4') returns 3, column_width('\U0001F3C6 4') returns 4.
All we have to do is create a new filter and apply it:
# By re-using the register from defaultfilters, we're adding it into
# the builtin defaults.
from django.template.defaultfilters import register, stringfilter
@register.filter(is_safe=True)
@stringfilter
def unicode_rjust(value, arg):
    return value.rjust(int(arg) - (column_width(value) - len(value)))
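To see why the requested width is reduced by column_width(value) - len(value), here is the same arithmetic as a plain function outside Django. This is only a sketch: padded_rjust is a name invented for this illustration, and column_width is restated in a simplified form so the snippet runs on its own.

```python
from unicodedata import east_asian_width


def column_width(s):
    # Simplified restatement: wide/fullwidth characters take two columns.
    return sum(2 if east_asian_width(ch) in ('W', 'F') else 1 for ch in s)


def padded_rjust(value, width):
    # Every wide character uses one column more than len() accounts for,
    # so ask str.rjust for that many characters fewer.
    extra_columns = column_width(value) - len(value)
    return value.rjust(width - extra_columns)


for value in ('2', '\U0001F3C6 4'):
    padded = padded_rjust(value, 5)
    # Show the padded string, its character count and its column count.
    print(repr(padded), len(padded), column_width(padded))
```

For the trophy string the correction is 4 - 3 = 1, so str.rjust is asked for 4 characters. That adds a single space, and the result fills 5 columns.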
Use the new unicode_rjust filter:
TEMPLATE = '''\
+-----------+-------+
| car brand | users |
+===========+=======+
{% for car in cars %}| {{ car.0|ljust:9 }} | {{ car.1|unicode_rjust:5 }} |
+-----------+-------+
{% endfor %}'''
Result:
+-----------+-------+
| car brand | users |
+===========+=======+
| Peugeot   |     2 |
+-----------+-------+
| Saab      |     1 |
+-----------+-------+
| Volvo     |  🏆 4 |
+-----------+-------+
It looks good and rst2html is now also happy to convert.
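If you'd rather catch this kind of misalignment before handing the output to rst2html, a rough check is to compare the display width of every line in the rendered table. The sketch below assumes that counting wide and fullwidth characters as two columns is good enough. The misaligned_lines helper is hypothetical, written for this example only.

```python
from unicodedata import east_asian_width


def column_width(s):
    """Display columns used by s: wide and fullwidth characters take two."""
    return sum(2 if east_asian_width(ch) in ('W', 'F') else 1 for ch in s)


def misaligned_lines(table):
    """Return (line number, width) for every line whose display width
    differs from that of the first line."""
    lines = table.splitlines()
    if not lines:
        return []
    expected = column_width(lines[0])
    return [(number, column_width(line))
            for number, line in enumerate(lines, start=1)
            if column_width(line) != expected]


# One-column tables: padded by character count, then by column count.
broken = '+-------+\n|   \U0001F3C6 4 |\n+-------+\n'
fixed = '+-------+\n|  \U0001F3C6 4 |\n+-------+\n'
print(misaligned_lines(broken))
print(misaligned_lines(fixed))
```

An empty result means every line takes up the same number of columns as the top border.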