curious termios error / switching to asyncio serial
My Python code that interfaces with a serial port stopped working when
refactoring the code to use asyncio. It started raising
Invalid argument
exceptions from tcsetattr(3)
. Why would
asynchronous Python misbehave? Was there a bug in serial_asyncio
?
TL;DR: When interfacing with an openpty(3)
pseudoterminal —
which I used to emulate a real serial port — setting parity and
bytesize is not supported. But an error would only show up when
tcsetattr(3)
was called twice, which happened only in the asyncio
case.
I was working on some Python code to communicate with an IEC62056-21 device over a serial bus. I planned on using the UART of a Raspberry PI for this purpose.
The basics involve opening a serial.Serial('/dev/ttyACM0', baudrate)
Python serial object, reading from it and writing to it. But, for
serial communication, there is more than just the baud rate which needs
to be agreed upon. For this IEC62056-21 device, it is:
- baud rate 300 (initially),
- 1 start bit (common default),
- 7 bits per byte (only 7 bits ASCII is spoken),
- 1 even parity bit (making sure all eight bits add up to a zero),
- 1 stop bit (common default).
The Python serial object supports all that:
ser = serial.Serial(
'/dev/ttyACM0', baudrate=300,
parity=serial.PARITY_EVEN,
bytesize=7)
That should work, and it does. But...
Testing serial code using an openpty pair
Developing against the real device is awkward and fragile. So, instead of developing against the hardware device directly, I decided it was worth the effort to emulate the IEC62056-21 device, a fake server if you will.
For this purpose, I needed a bridge between two serial devices. This can be done using socat:
$ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
2022/01/22 12:59:34 socat[700243] N PTY is /dev/pts/28
2022/01/22 12:59:34 socat[700243] N PTY is /dev/pts/29
2022/01/22 12:59:34 socat[700243] N starting data transfer loop with FDs [5,5] and [7,7]
socat spawns a bridge between two openpty(3)
created
pseudoterminals and creates two symlinks for easy access:
$ readlink client.dev server.dev
/dev/pts/29
/dev/pts/28
We can now connect our fake server to one end, and the client we're developing to the other.
(In my case, I'm using a custom serialproxy.py that does what socat does, and more, because it also simulates baud rate slowness and checks speed compatibility, relevant because the IEC62056-21 protocol uses negotiable/changing transfer rates.)
A simple test script connecting to both the server and client end might look as follows:
# For this example, run the following in
# another terminal (in the same directory):
# $ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
# WARNING: Restart this socat to reset bridge state.
import serial
import time
SERIAL_OPTIONS = {
'baudrate': 300,
'parity': serial.PARITY_EVEN,
'bytesize': 7,
'stopbits': 1,
}
def main_serial(write_dev, read_dev):
reader = serial.Serial(
read_dev, **SERIAL_OPTIONS)
#reader.timeout = 0
writer = serial.Serial(
write_dev, **SERIAL_OPTIONS)
messages = [
b'foo\n', b'bar\n', b'baz\n']
for msg in messages:
writer.write(msg)
print('send', msg)
out = reader.read(4)
print('recv', out)
assert msg == out, (msg, out)
time.sleep(1)
main_serial('server.dev', 'client.dev')
This example prints:
send b'foo\n'
recv b'foo\n'
send b'bar\n'
recv b'bar\n'
send b'baz\n'
recv b'baz\n'
That's nice. We can use serial.Serial()
without requiring a hardware
UART during development.
(However, if you run the same script again without restarting socat,
you'll be greeted with an Invalid argument
exception. We'll see why
in a bit. Restarting socat resets the bridge state and makes the
script work again.)
Testing asyncio serial code using an openpty pair
Because it is 2022, and I don't want to use threads in Python (they're a poor combo), and I wanted to be able to run multiple subsystems at once, I decided to convert this script to use Python Asynchronous I/O.
Surely converting this script to asyncio is easy:
# For this example, run the following in
# another terminal (in the same directory):
# $ socat -dd pty,rawer,link=server.dev pty,rawer,link=client.dev
# WARNING: Restart this socat to reset bridge state.
import asyncio
import serial
from serial_asyncio import open_serial_connection
SERIAL_OPTIONS = {
'baudrate': 300,
'parity': serial.PARITY_EVEN,
'bytesize': 7,
'stopbits': 1,
}
def main_aioserial(write_dev, read_dev):
loop = asyncio.get_event_loop()
main_coro = aioserial(write_dev, read_dev)
loop.run_until_complete(main_coro)
loop.close()
async def aioserial(write_dev, read_dev):
async def send(ser, msgs):
for msg in msgs:
ser.write(msg)
print('send (asyncio)', msg)
await asyncio.sleep(1)
async def recv(ser):
for i in range(3):
out = await ser.readuntil(b'\n')
print('recv (asyncio)', out)
reader, _ = await open_serial_connection(
url=read_dev, **SERIAL_OPTIONS)
_, writer = await open_serial_connection(
url=write_dev, **SERIAL_OPTIONS)
messages = [b'foo\n', b'bar\n', b'baz\n']
send_coro = send(writer, messages)
recv_coro = recv(reader)
await asyncio.wait({
asyncio.create_task(coro)
for coro in (send_coro, recv_coro)})
main_aioserial('server.dev', 'client.dev')
This should print the following:
send (asyncio) b'foo\n'
recv (asyncio) b'foo\n'
send (asyncio) b'bar\n'
recv (asyncio) b'bar\n'
send (asyncio) b'baz\n'
recv (asyncio) b'baz\n'
But instead, it prints:
Traceback (most recent call last):
File "test_aioserial.py", line 45, in <module>
main_aioserial('server.dev', 'client.dev')
File "test_aioserial.py", line 20, in main_aioserial
loop.run_until_complete(coro)
File "asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "test_aioserial.py", line 35, in aioserial
reader, _ = await open_serial_connection(
File "serial_asyncio/__init__.py", line 438, in open_serial_connection
transport, _ = yield from create_serial_connection(
File "asyncio/coroutines.py", line 124, in coro
res = func(*args, **kw)
File "serial_asyncio/__init__.py", line 412, in create_serial_connection
transport = SerialTransport(loop, protocol, ser)
File "serial_asyncio/__init__.py", line 63, in __init__
self._serial.timeout = 0
File "serial/serialutil.py", line 368, in timeout
self._reconfigure_port()
File "serial/serialposix.py", line 435, in _reconfigure_port
termios.tcsetattr(
termios.error: (22, 'Invalid argument')
This had me stumped for quite some time. Why would serial_asyncio
misbehave?
Debugging Python tcsetattr
I decided to add some tracing code in
/usr/lib/python3/dist-packages/serial/serialposix.py
:
...
# activate settings
if force_update or [iflag, oflag, cflag, lflag, ispeed, ospeed, cc] != orig_attr:
# vvv-- one line of code added --vvv
import os; os.write(2, b'DEBUG serialposix.py tcsetattr() 0o%o\n' % (cflag,))
termios.tcsetattr(
self.fd,
termios.TCSANOW,
[iflag, oflag, cflag, lflag, ispeed, ospeed, cc])
...
Running the synchronous code snippet now showed:
$ python3 test_serial.py
DEBUG serialposix.py tcsetattr() 0o4657
DEBUG serialposix.py tcsetattr() 0o4657
send b'foo\n'
...
That is, two calls because of the two Serial objects being set up.
The output of the asynchronous version was now:
$ python3 test_aioserial.py
DEBUG serialposix.py tcsetattr() 0o4657
DEBUG serialposix.py tcsetattr() 0o4647
Traceback (most recent call last):
...
termios.error: (22, 'Invalid argument')
Here, the second call to tcsetattr(3)
is done when setting the
timeout to 0 (to make it "non-blocking"). It's still setting up the
first of the two Serial objects.
If we look at the first example, we can enable the
reader.timeout = 0
line and notice how the synchronous version now
also chokes immediately.
So, it's not a serial_asyncio
problem. But it is still a problem.
An strace then? Calling strace will show us which system calls are
being executed. Maybe insight in their failure can shed some light on
this. After all, the tcsetattr(3)
library call will get translated to
one or more system calls:
$ strace -s 2048 python3 test_serial.py 2>&1 |
grep -A15 'write(.*DEBUG serialposix'
We know the call comes right after our write(2)
.
write(2, "DEBUG serialposix.py tcsetattr() 0o4657\n", 40DEBUG serialposix.py tcsetattr() 0o4657
) = 40
There it is. And then immediately some ioctl(2)
calls.
ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(3, SNDCTL_TMR_START or TCSETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TIOCMBIS, [TIOCM_DTR]) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, TCFLSH, TCIFLUSH) = 0
There was one error, but this is during the first call, so it is
ignored. (I can tell you now that those first four ioctl
calls are
all caused by a single Python termios.tcsetattr()
call.)
pipe2([4, 5], O_CLOEXEC) = 0
pipe2([6, 7], O_CLOEXEC) = 0
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
Some other code was called, and then, the second call to tcsetattr(3)
:
write(2, "DEBUG serialposix.py tcsetattr() 0o4647\n", 40DEBUG serialposix.py tcsetattr() 0o4647
) = 40
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, SNDCTL_TMR_START or TCSETS, {B300 -opost -isig -icanon -echo ...}) = 0
ioctl(3, TCGETS, {B300 -opost -isig -icanon -echo ...}) = 0
And after this point in the trace, Python starts collecting values to show a backtrace.
Now that is unexpected. The last system call returned 0 (SUCCESS), yet Python considered it failed and reports "Invalid argument".
I'm certain Python is not making that up, so there must be some other explanation.
Checking the library sources
The CPython library tcsetattr function looks like:
static PyObject *
termios_tcsetattr_impl(PyObject *module, int fd, int when, PyObject *term)
...
if (cfsetispeed(&mode, (speed_t) ispeed) == -1)
return PyErr_SetFromErrno(state->TermiosError);
if (cfsetospeed(&mode, (speed_t) ospeed) == -1)
return PyErr_SetFromErrno(state->TermiosError);
if (tcsetattr(fd, when, &mode) == -1)
return PyErr_SetFromErrno(state->TermiosError);
...
Pretty straight forward.
The CPython function calls the GNU glibc tcsettatr library function, which looks like:
int
__tcsetattr (int fd, int optional_actions, const struct termios *termios_p)
...
return INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
}
One syscall. Only one! But why is Python turning this into an exception then?
The answer turned out to be in local-tcsetaddr.diff — a distro-specific patch deployed on Debian/Linux distributions and derivatives (like Ubuntu):
# All lines beginning with `# DP:' are a description of the patch.
# DP: Description: tcsetattr sanity check on PARENB/CREAD/CSIZE for ptys
# DP: Related bugs: 218131.
...
--- a/sysdeps/unix/sysv/linux/tcsetattr.c
+++ b/sysdeps/unix/sysv/linux/tcsetattr.c
@@ -75,7 +80,55 @@
memcpy (&k_termios.c_cc[0], &termios_p->c_cc[0],
__KERNEL_NCCS * sizeof (cc_t));
- return INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
+ retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
+
+ /* The Linux kernel silently ignores the invalid c_cflag on pty.
+ We have to check it here, and return an error. But if some other
+ setting was successfully changed, POSIX requires us to report
+ success. */
...
+ /* It looks like the Linux kernel silently changed the
+ PARENB/CREAD/CSIZE bits in c_cflag. Report it as an
+ error. */
+ __set_errno (EINVAL);
+ retval = -1;
+ }
+ }
+ return retval;
}
According to that patch, the relevant debian bug report 218131, and other relevant communication, setting invalid/unsupported c_flags on a pseudoterminal (pty) should return EINVAL. Linux apparently just ignores the bad c_flags.
POSIX tcsetattr needs to handle three conditions correctly:
- If all changes are successful, return success (0).
- If some changes are successful and some aren't, return success.
- If no changes are successful, return error (-1, errno=EINVAL).
The problem occurs when setting certain flags (PARENB, CREAD, or one of the CSIZE parameters) on a pty. The kernel silently ignores those settings, so libc is responsible for doing the right thing.
And indeed, if we remove the PARENB (enable even parity) and CS7 (7
bit CSIZE), the Invalid argument
error goes away:
SERIAL_OPTIONS = {
'baudrate': 300,
#DISABLED# 'parity': serial.PARITY_EVEN,
#DISABLED# 'bytesize': 7,
'stopbits': 1,
}
$ python3 test_serial.py
DEBUG serialposix.py tcsetattr() 0o4267
DEBUG serialposix.py tcsetattr() 0o4277
send b'foo\n'
...
And now we can run the script consecutive times without restarting socat.
Setting parity or bytesize on a pseudoterminal is a big no
Apparently, pseudoterminals, as opened by openpty(3)
, do not grok
setting different parity or bytesize. This was silently ignored by the
tcsetattr(3)
library call, as long as something else was altered at
the same time.
Setting the baud rate works fine, it is propagated across to the slave pty. But while testing we'll need to avoid setting unsupported parity or bytesize.
After figuring all that out, going back to the manpage of tcsetattr(3)
revealed this:
RETURN VALUE
Note that tcsetattr() returns success
if any of the requested changes could
be successfully carried out. Therefore,
when making multiple changes it may be
necessary to follow this call with a
further call to tcgetattr() to check
that all changes have been performed
successfully.
So, Python might be setting multiple values at once. And, maybe, just
maybe, Python should then always call tcsetattr(3)
twice, like the
manpage suggests. That would cause consistent behaviour. And that
would've saved me hours of debugging as I would've identified the
custom SERIAL_OPTIONS
as culprits so much sooner.