sujal.dev

Work in progress.

Creating a Window on Wayland from Scratch in Pure Python.

This post is still an unfinished draft.

I wanted to see what kind of wizardry graphics toolkits like Qt or GTK perform when they create a window on a Wayland desktop. It takes surprisingly little code to achieve the bare minimum goal of creating a window and drawing a static image to it. In this article, I will walk through everything I learned about summoning a window on Wayland, using nothing but Python's standard library.

The complete code we'll write over the course of this article is also available in its entirety here.

What does Wayland do?

I was confused about what role Wayland plays in the graphics stack. As an end user, all I ever had to do was use a graphics toolkit and a window would appear. But what I hadn't considered was other windows. Something must be compositing all the things visible on the screen (the panels, multiple windows, the desktop wallpaper, etc.) into a single image that you can display on a monitor. This is the key role a Wayland compositor fulfills on the desktop among other important things (like handling input).

[Figure: Wayland Architecture]

However, Wayland itself is not a compositor; it's the protocol that a conforming compositor and a client use to talk to one another. But how do you talk to a Wayland compositor?

Making a Connection to the Wayland Compositor

A client can use a Unix domain stream socket to communicate with a Wayland compositor. These are local sockets that allow processes on the same host to talk to one another efficiently (they have another trick up their sleeve you'll see later on). Let's start writing our first lines of code. We can use the socket module to create a Unix domain socket:

import socket


def setup_socket():
    sock = socket.socket(family=socket.AF_UNIX, type=socket.SOCK_STREAM, proto=0)

    return sock

This creates the socket, but we haven't yet made a connection to the Wayland compositor. To find the path required to connect to the compositor, we can do the following:

import os
import socket


def setup_socket():
    sock = socket.socket(family=socket.AF_UNIX, type=socket.SOCK_STREAM, proto=0)

    name = os.getenv("WAYLAND_DISPLAY", default="wayland-0")
    if not name.startswith("/"):
        xdg_runtime_dir = os.getenv("XDG_RUNTIME_DIR", default=f"/run/user/{os.getuid()}")
        name = f"{xdg_runtime_dir}/{name}"

    sock.connect(name)

    return sock

First, we read the WAYLAND_DISPLAY environment variable, falling back to "wayland-0" if it is not set. Next, if the value is not an absolute path, we prepend the directory named by the XDG_RUNTIME_DIR environment variable, falling back to "/run/user/<user-id>" if that variable is not set. We now have the path we need, so we call connect() on the socket we created earlier.
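To make the fallback rules easy to check without a running compositor, here is a minimal sketch that pulls the path resolution into a pure function. The name resolve_display_path and its parameters are hypothetical (they are not part of the code above); the function takes the environment and user ID as arguments instead of reading them from the process.

```python
def resolve_display_path(environ, uid):
    """Return the socket path using the same fallback rules as setup_socket().

    `environ` is any mapping of environment variables and `uid` is the
    numeric user ID. Both are passed in so the logic can be exercised
    without touching the real environment (an assumption of this sketch).
    """
    name = environ.get("WAYLAND_DISPLAY", "wayland-0")
    if not name.startswith("/"):
        runtime_dir = environ.get("XDG_RUNTIME_DIR", f"/run/user/{uid}")
        name = f"{runtime_dir}/{name}"
    return name


# Nothing set: both defaults apply.
print(resolve_display_path({}, 1000))
# Relative name: joined onto XDG_RUNTIME_DIR.
print(resolve_display_path({"WAYLAND_DISPLAY": "wayland-1", "XDG_RUNTIME_DIR": "/tmp/rt"}, 1000))
# Absolute name: used as-is, XDG_RUNTIME_DIR is ignored.
print(resolve_display_path({"WAYLAND_DISPLAY": "/tmp/custom.sock", "XDG_RUNTIME_DIR": "/tmp/rt"}, 1000))
```

In the real function you would pass os.environ and os.getuid() to get the same behavior as the code above.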

We are now ready to talk to the compositor. We just need to figure out what to say.

Learning How to Speak Wayland

Wayland is an object-oriented protocol. Every message we send to or receive from the server is associated with an object. Each object has an interface, which is defined in wayland.xml. This file can typically be found inside /usr/share/wayland if you have the necessary package installed (wayland-devel on Fedora).

The interface for each object defines requests that a client can invoke on the object, and events that the server can emit for that object. Events need not be emitted in response to a request. For example, if the user tries to resize your window, the server will spontaneously emit an event to let you know. Each request and event can also define arguments, each with an associated type that we'll come back to later. This is what the corresponding XML looks like:

<protocol name="wayland">
    <interface name="some_object">
        <request name="some_request">
            <arg name="some_arg" type="some_primitive"/>
            ...
        </request>
        ...
        <event name="some_event">
            <arg name="some_arg" type="some_primitive"/>
            ...
        </event>
        ...
    </interface>
</protocol>

Wire Format

Our conversation with the server will be about creating objects, invoking requests and receiving events associated with a particular object, and eventually destroying that object. So we need a way to serialize this object talk into a byte stream we can send over the socket we created earlier. Here's what that looks like:

[Figure: Wire Format]

The above depicts the structure of a message in the Wayland protocol. All fields in a message are aligned to 32-bit words, which are encoded in the host's byte order. The first field in the header is the object ID. It is a 32-bit unsigned integer that we assign to an object upon its creation. Both sides make a note of this mapping so that an object can be identified by its number.

The next field in the header is another 32-bit unsigned integer split into two 16-bit parts. The upper 16 bits hold the size of the entire message (header + payload) in bytes. The lower 16 bits hold the opcode, which identifies the request if the message originates from the client, or the event if it originates from the server. The opcode is implicitly defined by the order in which requests or events appear inside the object's interface in the XML file. The first request defined in an interface corresponds to opcode 0, the next to 1, and so on. Do note that requests and events are indexed separately.
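To illustrate how opcodes fall out of declaration order, here is a small sketch that numbers requests and events using xml.etree.ElementTree. The XML string is a trimmed excerpt modeled on the wl_display interface with all arguments and descriptions removed, and the opcode_tables helper is a hypothetical name; consult your own copy of wayland.xml for the full definitions.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt modeled on wl_display; arguments and descriptions omitted.
SAMPLE_XML = """
<protocol name="wayland">
    <interface name="wl_display" version="1">
        <request name="sync"/>
        <request name="get_registry"/>
        <event name="error"/>
        <event name="delete_id"/>
    </interface>
</protocol>
"""


def opcode_tables(xml_text):
    """Map each interface to its request and event opcodes.

    Opcodes are the zero-based position of each <request> or <event>
    element within its interface, counted separately per kind.
    """
    root = ET.fromstring(xml_text)
    tables = {}
    for interface in root.iter("interface"):
        tables[interface.get("name")] = {
            "requests": {el.get("name"): i for i, el in enumerate(interface.findall("request"))},
            "events": {el.get("name"): i for i, el in enumerate(interface.findall("event"))},
        }
    return tables


tables = opcode_tables(SAMPLE_XML)
print(tables["wl_display"]["requests"])  # {'sync': 0, 'get_registry': 1}
print(tables["wl_display"]["events"])    # {'error': 0, 'delete_id': 1}
```

Note that request opcode 0 and event opcode 0 refer to different things; which table applies depends on the direction the message travels.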

Let's create a class to store the header:

import os
import socket
from dataclasses import dataclass


@dataclass
class Header:
    obj_id: int
    opcode: int
    size: int = 0


...

We should also add a way to serialize this header into bytes we can send on the wire:

import os
import socket
import struct
import sys
from dataclasses import dataclass


@dataclass
class Header:
    obj_id: int
    opcode: int
    size: int = 0

    def serialize(self):
        size_and_opcode = (self.size << 16) | self.opcode
        return struct.pack("=II", self.obj_id, size_and_opcode)


...

We're first combining the size and opcode fields into a single 32-bit integer as described by the wire format. If you were to treat them as two distinct 16-bit words, you'd have to flip the order of size and opcode for little-endian hosts. Then, we use struct.pack to convert our Python integer objects into bytes we can send over the wire. The first argument to struct.pack is the format string, which exactly matches the header format we discussed earlier: "=" declares that we want the resulting bytes to use the host's byte order and the "II" defines two 32-bit unsigned integers.
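As a quick check of the byte-order remark, the sketch below packs the same header two ways: once as a combined 32-bit word, and once as two separate 16-bit halves whose order depends on the host. The function names pack_header and pack_header_halves are hypothetical and exist only for this comparison.

```python
import struct
import sys


def pack_header(obj_id, opcode, size):
    # Combined word: size in the upper 16 bits, opcode in the lower 16 bits.
    size_and_opcode = (size << 16) | opcode
    return struct.pack("=II", obj_id, size_and_opcode)


def pack_header_halves(obj_id, opcode, size):
    # Treating the word as two 16-bit halves forces us to pick their order
    # based on the host's byte order.
    if sys.byteorder == "little":
        return struct.pack("=IHH", obj_id, opcode, size)
    return struct.pack("=IHH", obj_id, size, opcode)


raw = pack_header(obj_id=1, opcode=1, size=12)

# On a little-endian host this prints: 01 00 00 00 01 00 0c 00
print(raw.hex(" "))
print(raw == pack_header_halves(1, 1, 12))  # True on either byte order
```

Packing the combined word lets struct handle the byte order for us, which is why the serialize method above does not need to branch on the host.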

Next we'll add a method to instantiate the Header class from bytes received on the wire:

import os
import socket
import struct
import sys
from dataclasses import dataclass
from io import BytesIO


@dataclass
class Header:
    ...

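    @staticmethod
    def frombytes(data: BytesIO) -> "Header | None":
        # Read exactly one header; a short read means no complete header is available.
        raw = data.read(8)

        if len(raw) != 8:
            return None

        # Second word: size in the upper 16 bits, opcode in the lower 16 bits.
        obj_id, size_and_opcode = struct.unpack("=II", raw)
        opcode = size_and_opcode & 0xFFFF
        size = size_and_opcode >> 16

        return Header(obj_id, opcode, size)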


...

We read 8 bytes (the size of a header) from the received data and interpret each 32-bit word (4 bytes) as an unsigned integer. The first is the object ID, and the second packs both the size and the opcode, which we have to split. We extract the opcode with a bit mask of 0xFFFF, which keeps only the lower 16 bits. The size is then extracted by shifting the entire 32-bit integer to the right by 16 bits.
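Here is a worked example of the mask and shift on concrete numbers, written as a standalone sketch. The parse_header function is a hypothetical stand-in that returns a plain tuple rather than a Header instance, so the block runs on its own.

```python
import struct
from io import BytesIO


def parse_header(stream):
    """Return (obj_id, opcode, size), or None if fewer than 8 bytes are available."""
    raw = stream.read(8)
    if len(raw) != 8:
        return None
    obj_id, size_and_opcode = struct.unpack("=II", raw)
    return obj_id, size_and_opcode & 0xFFFF, size_and_opcode >> 16


# size = 12 and opcode = 1 combine into 0x000C0001.
word = (12 << 16) | 1
print(hex(word))        # 0xc0001
print(word & 0xFFFF)    # 1  (lower 16 bits: opcode)
print(word >> 16)       # 12 (upper 16 bits: size)

# A header followed by a 4-byte payload.
stream = BytesIO(struct.pack("=II", 1, word) + struct.pack("=I", 2))
print(parse_header(stream))  # (1, 1, 12)
print(len(stream.read()))    # 4 payload bytes are left in the stream

# A short read yields None instead of raising.
print(parse_header(BytesIO(b"\x00" * 5)))  # None
```

Because the header is consumed from the stream, whatever remains afterwards is the start of the payload.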

Next, we should store this Header in a new Message class that stores both the header and the payload:

...


@dataclass
class Message:
    header: Header
    payload: bytes

    def serialize(self):
        self.header.size = 8 + len(self.payload)
        return self.header.serialize() + self.payload


...
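As a sanity check of the size bookkeeping, the sketch below serializes a message and reads the size back out of the resulting bytes. The classes are repeated so the block runs on its own, and the message itself (object 3, opcode 0, a single 32-bit unsigned argument of 42) is made up for illustration; it does not correspond to a specific Wayland request.

```python
import struct
from dataclasses import dataclass


@dataclass
class Header:
    obj_id: int
    opcode: int
    size: int = 0

    def serialize(self):
        size_and_opcode = (self.size << 16) | self.opcode
        return struct.pack("=II", self.obj_id, size_and_opcode)


@dataclass
class Message:
    header: Header
    payload: bytes

    def serialize(self):
        # The size always covers the 8-byte header plus the payload.
        self.header.size = 8 + len(self.payload)
        return self.header.serialize() + self.payload


# Hypothetical message with one 32-bit unsigned argument as its payload.
msg = Message(Header(obj_id=3, opcode=0), payload=struct.pack("=I", 42))
raw = msg.serialize()

print(len(raw))          # 12
print(msg.header.size)   # 12, filled in by Message.serialize()

# The size stored on the wire matches the number of bytes produced.
_, size_and_opcode = struct.unpack("=II", raw[:8])
print(size_and_opcode >> 16)  # 12
```

Since Message.serialize() overwrites header.size, callers never have to compute the size themselves.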

The payload consists of the arguments for the particular request or event referenced in the header. Arguments are also aligned to 32 bits, and padding is added wherever required. We will