godot-screenreader
[Source Code] - [Download] - [Documentation]
Hey there, everyone!
I’ve been working on the start of a really cool project over the last few weeks, and I wanted to show it off. I know, I promised my next video would be more theoretically oriented, but this is a really cool little thing. My next video will be on Pokémon and Gender, which I’m positive will not lead to any sort of negative feedback or upset comments. In the meantime, I want to show you what I built!
Huh?
Yeah, this is a tool called a “screenreader” for the Godot game engine. Godot is a popular open source option for making video games, but it is infamous for its poor implementation of Control elements and its lack of baseline accessibility (seriously, the IDE is still not very compatible with screenreaders, although some recent work might be improving the situation). But what is a screenreader anyway? It’s a tool that reads off visual content on the screen, explains what it is, and organizes it in a way that makes sense to the user. This way, users who are low vision, or who have other kinds of disabilities that make it hard to interact with a user interface like the ones in video games, can use that interface more easily. Basing my work off of a tool made by LightsOutGames for Godot 3.x and an updated version by rodolpheh, I wanted to take that interface and build a much more customizable, developer- and user-friendly screenreader that takes advantage of a few tricks I learned with Godot to better wrangle its… questionable user interface. A strong motivation for this project is to improve on the structural framework that LightsOutGames and rodolpheh developed, while also providing an easier-to-use interface for developers, since with the previous implementations, adding it to your game could take quite a bit of work.
Why Develop a Screenreader From Scratch?
The way this script is designed, it doesn’t just call a screenreader which might be installed on the system, but rather builds its own interface that imitates the behavior of a screenreader. Now, those who work in web or application design may be wondering why I had to build a screenreader from scratch to install into a video game engine. That’s a very good question! There are a few reasons for this.
First, interface design in game development does not necessarily use a standard object structure. In fact, many times, items are highly customized with unique behaviors to fit the interface of the game. Unlike developing for, say, a program that processes tickets or navigates a database, a video game is a very open-ended design question. This is part of the reason why game developers can be more hostile than other kinds of developers toward embedded accessibility architecture and accessible design practices that “take care of everything” - part of the expression of the game’s narrative and artistic representation is embedded in the user interface, and there needs to be some freedom and flexibility in its design. So, to adapt to game developers’ needs, it’s a good idea to integrate the screenreader’s interface at least somewhat into the engine itself, so it can be controlled by a developer. Not only this, but integrating the screenreader with the game engine allows it to be customized to process new kinds of Controls easily. Being in the game engine’s own scripting language also makes it much easier to modify, as opposed to something like designing a plugin for every game to work with screenreaders. It’s already in a format the developer understands, so they don’t need to spend a lot of time researching the OS screenreaders directly.
Additionally, Godot’s user interface implementation leaves a lot to be desired, to say the least. It would be very difficult to make a 1-to-1 relationship between well recognized Control elements and Godot’s implementation of those elements, which makes an intermediate interface necessary anyway. For example, the `MenuBar` Control does not keep track of the selected index of whichever menu is currently open, outside of the visibility of its children. In fact, `MenuBar` does not seem to have native keyboard functionality, although the dropdown menus it uses do. Another oddity is how `LineEdit`, which is a single line of text, has no relation to `TextEdit`, which is capable of multi-line editing, or how `LinkButton` and `TextureButton` are different from all other kinds of buttons and only share mutual characteristics through `BaseButton`. Another example: `MenuButton` and `OptionButton` have similar user interface functions - opening a menu to select items - even if their use cases are different, but they have completely different signals. Everything is very inconsistent, so not only do you have to deal with high levels of deviation from the initial behavior, you also have to deal with Godot’s inconsistent behavior as well.
The worst offender, however, is how Godot handles focus - which has been a thorn in my side ever since I started using Godot. If you are not aware, Godot forces you to use a particular way of moving through nodes on the screen, unless you completely disable it. Instead of constructing an internal tree of Control nodes to navigate, it builds this completely fucked up linked-list model which only grants access to the next available nodes. Not only that, but the functionality for this interaction is impossible to override or turn off. You can’t just override this behavior in Controls - that is, unless you turn the focus system completely off, which also disables any theme changes for being pressed, and so on. As an added bonus, Godot gives no way to toggle the theme style used by a Control, which means Controls have to be redrawn manually in order to allow for these theme changes! This is a huge pain in the ass, because in order for the screenreader not to interfere with user-implemented code, I have to contain all custom code within the screenreader itself - I can’t just force developers to use extension scripts because Godot’s UI implementation is written like shit. In some cases, such as the infamous `MenuBar` Control, it is just plain impossible to completely override its behavior despite my best efforts, and I had to design the UI around it - which, annoyingly, doesn’t even match the behavior the end user is expecting. The high contrast theme also has to deal with stupid design choices that make correct theming impossible without violating my own restriction that the screenreader can’t override a Control’s draw functions, leading to certain situations where the text is just plain not visible. This is a bug I’ll have to figure out in a future version. All in all, Godot’s management of user interfaces is just plain embarrassing, but it’s a very popular engine with open source users, so I guess someone has to wrestle that pig to put some lipstick on it. An interface that consolidates this erratic and unpredictable behavior into the logical behavior the end user expects is therefore necessary, to make it at least somewhat sane for something like a screenreader to navigate.
Finally, it’s best to have some implementation that works, and then work toward the more difficult-to-implement tasks. After all, this is just a prototype. To be abundantly clear, full screenreader integration is a goal for the future, but having this topology first helps tremendously, because now there is a way to organize Control elements into a DOM-like structure, and there’s a way to control how it interfaces with input devices like controllers down the line. This gives me, as a developer, the scaffolding to then build the more technical parts of screenreader integration later. Plus, it’s better to have an asset that works in its core function and provides some usability to some players now, and extend it later so that developers can just upgrade their systems and support more players. We can add support for libraries like Tolk later, now that a prototype is functional - and since it is an open source project, contributions from others are incredibly valuable and have the potential to extend and optimize the project.
So, in short, I designed it this way to balance the design challenge of building an easy-to-use interface for Godot game developers against the continued development and improvement of in-game screenreader support and/or compatibility, while providing an important base for an open-source asset that is easy for anyone to use, targeting a popular open source game engine.
It’s very important that this project is free and open source - the impact of FOSS software on accessibility cannot be overstated. As an example, JAWS by Freedom Scientific was and continues to be the most popular commercial Windows screenreader, but it’s pretty expensive, and without a license you are limited to a 40-minute trial. Additionally, it has historically been expensive to develop for because of similar licensing restrictions. NVDA, a free and open source community-developed screenreader, is on the other hand much more financially accessible to end users, because they just have to download and install a copy to be able to freely use a computer. As a result, NVDA completely transformed the topology of blind accessibility since its introduction in 2006, both on computers themselves and as a critical social-political relationship that greatly improved the lives of blind people in developing nations, as well as making accessibility development far more financially accessible for smaller developers. Seriously, if there’s one TEDx Talk you should ever watch, it’s the one by Michael Curran and James Teh discussing the social impact of NVDA. However, as it stands, many accessibility solutions, especially for video games, are still not FOSS software, and those that are can be quite complicated to learn for someone looking to quickly add support to their projects, so introducing open source values into these software ecosystems is critically important for the expansion of accessibility. Open source solutions also lower the barrier to entry for disabled people developing accessibility for themselves, which is an ultimate goal for this project.
Implementation
Now with all of that out of the way, we can discuss the design of the screenreader itself.
This actually isn’t the first attempt at a screenreader in Godot. LightsOutGames, an audiogame developer, developed the first screenreader for Godot 3.x around 2018. It was a really rough attempt to make the Editor more accessible, since the Editor and even compilation were completely inaccessible without a plugin - note to game engine developers: there should always be a way to build and construct projects with just a text editor and a command line. That plugin, like my own screenreader, doesn’t connect directly to the OS screenreader for organizing Controls, but instead reads strings through the screenreader while navigating through Godot. Unlike my screenreader, though, it does not attempt to override the function of the UI, and instead uses the in-game focus system to navigate. It is incomplete and only offers limited baseline compatibility. It was constructed as both a usable TTS kit (godot-tts) and an accessibility plugin (godot-accessibility). Later, GitHub user rodolpheh created a fork that is compatible with Godot 4.x but removes the direct screenreader connections and simply uses Godot’s embedded TTS functions. This is partly because Godot completely revamped how extensions like DLLs are integrated, which is something I still need to research (and another reason why the OS screenreader is not fully compatible with mine yet) - LightsOutGames’ original work cannot just be directly imported.
What’s important to understand here is that a screenreader doesn’t just read text on the screen. It also transforms selecting and navigating UI elements into a new, more controlled format that specialized players can navigate. This way, players know exactly what element is selected and exactly how to navigate it, even without vision - though it could potentially be useful in all sorts of situations. In practice, this means that when interacting with the keyboard, Controls are focused individually - so when I interact with inputs like the select or increment/decrement keys, the input stays within the Control and I know which Control I’m changing. By default, I can use the up and down keys to navigate between Controls. What’s nice is that it only navigates between Controls that are currently visible, so if I use a Control that changes the visibility of other Controls, such as by scrolling through tabs, the set of nodes I can access changes to match what is available to sighted players.
So how does someone implement that in Godot anyways?
First, it’s important to build what Godot doesn’t - a sane navigation tree for Control nodes. Not all interfaces actually work well with navigation trees, so it’s important to allow for some flexibility here. So instead of trying to predict exactly when and where developers will need the screenreader, I give them a one-line function that takes the root Control node as its argument. The idea is that by passing this root Control node, the screenreader only processes nodes that are relevant to typical screen navigation. This function recursively digs through the nodes and constructs a tree-like structure that organizes them into various topologies the screenreader can later navigate.
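Roughly, the recursive walk looks something like this - a minimal sketch of the idea, not the addon’s actual entry point (which has its own name and does more bookkeeping; see the documentation):

```gdscript
# Sketch only: flatten every Control under a root into a list that the
# screenreader can later organize and navigate.
static func build_control_list(root: Control) -> Array[Control]:
    var found: Array[Control] = []
    _collect(root, found)
    return found

static func _collect(node: Node, found: Array[Control]) -> void:
    if node is Control:
        found.append(node as Control)
    for child in node.get_children():
        _collect(child, found)

# At navigation time, anything where is_visible_in_tree() returns false is
# skipped, so the reachable nodes match what a sighted player can see.
```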
To isolate the screenreader functionality from the developer as much as possible - thus reducing how many things the developer depends on to make it work - I had to make individual, separated functions to manage everything related to the screenreader system. They still work within the game engine, but the scripts are designed so that their behavior is self-contained. All the sound effects, themes and other assets used are contained completely within the screenreader addon, and the developer doesn’t need any extra assets to get baseline functionality working with the screenreader. However, they can attach or extend scripts on their nodes to extend functionality and add features like alt text, which are not native to Godot’s UI system.
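Alt text, for instance, can be as light as a tiny script on the node. The property name below is made up purely for illustration - the real hook names are in the documentation:

```gdscript
# Hypothetical illustration only: a small script attached to a TextureRect
# exposes an "alt text" string the screenreader could pick up and speak.
extends TextureRect

@export var alt_text: String = "Stylized owl logo on a blue shield"
```

On the screenreader side, picking that up is just a property check, something like `if "alt_text" in control: tokens.append(control.alt_text)`.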
Then, it is important to disable the focus system completely. You might think there is an easy way to do this, like some kind of game-wide setting. Oh, sweet summer child - insofar as I understand, there is no way to disable this system without setting the `focus_mode` property of every single Control to either `FOCUS_NONE`, or, if you still want mouse interactivity, `FOCUS_CLICK`. This is incredibly annoying, because it means I need to manage the focus of every single node that is processed by the screenreader’s interface. Furthermore, it also means all the signals, theme changes and other features normally available with Controls are disabled, so they must be manually controlled. Signals are supported, but honestly I said “fuck it” for the first implementation of the themes, because it would require me to figure out how to redraw every single fucking Control within the screenreader logic, and I would simply rather not do that right now. Did I mention that Godot doesn’t have the most well designed UI system?
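That chore amounts to another recursive pass, roughly like this sketch (using the engine’s `Control.focus_mode` property and its `FOCUS_NONE`/`FOCUS_CLICK` constants; the addon’s real code does more than this):

```gdscript
# Walk everything under a root node and strip keyboard focus from it, so the
# screenreader can take over navigation. FOCUS_CLICK still lets the mouse
# grab focus on click; FOCUS_NONE removes focus entirely.
static func disable_focus(node: Node, keep_mouse: bool = true) -> void:
    if node is Control:
        node.focus_mode = Control.FOCUS_CLICK if keep_mouse else Control.FOCUS_NONE
    for child in node.get_children():
        disable_focus(child, keep_mouse)
```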
Anyways, after you finally tame the beast that is input control, we can start designing the control interface. For now, I’ve only tested it with a keyboard, but the way it’s designed, I should only need to add a few extra features to support controllers as well. To avoid any potential conflicts with sneaky embedded behavior, I separated the screenreader’s controls from the standard `ui_` Control inputs. This means I made around 10 extra input actions, all starting with `DOM_`, making them easy to distinguish. To add to the barrels of fun I’m having here, because of the way the Godot asset store works, it doesn’t seem to be possible to submit assets that require unique key bindings set in the Project Settings tab - so I also have to register the input actions in code. On the bright side, this means less setup for developers to get the screenreader running, so I guess it’s a win.
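Registering an action at runtime looks roughly like this - the action name and key below are illustrative, not the addon’s exact bindings or defaults:

```gdscript
func _register_screenreader_actions() -> void:
    # Illustrative only: add a DOM_* action in code instead of relying on
    # Project Settings, since asset submissions can't ship custom bindings.
    if not InputMap.has_action("DOM_select"):
        InputMap.add_action("DOM_select")
        var ev := InputEventKey.new()
        ev.keycode = KEY_ENTER
        InputMap.action_add_event("DOM_select", ev)
```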
Finally, I can start writing the inputs. I actually separated screenreader input into two stages - first the Control’s own input runs, and if it didn’t consume anything, the screenreader navigation input runs. This way, whenever input is received, it first tries to navigate the Control itself, and if nothing happens, it navigates the interface - consistent behavior instead of that linked-list crap. It also logically organizes my code in a way that makes it easy to manage the different individual Controls and their inputs. Additionally, it’s important that I write code to tell the screenreader how to read off every Control when it’s selected. Borrowing from LightsOutGames’ design, I insert string “tokens” into a list that is later read off as one combined string. This makes it easy to control what order information is presented to the player in, as well as making things like translations easier to manage. It’s important to order tokens so the most important information is presented first, so blind players aren’t waiting a million years to read their stat screens.
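The token idea boils down to something like the sketch below, with made-up strings for a health bar; the real reader builds its tokens per Control type and routes them through its own speech layer (note that on newer Godot versions, built-in TTS may also need to be enabled in Project Settings):

```gdscript
# Sketch of the token approach: push the most important information first,
# join once, then hand the result to Godot's built-in TTS.
func read_control(tokens: PackedStringArray) -> void:
    var line := " ".join(tokens)
    var voices := DisplayServer.tts_get_voices_for_language("en")
    if not voices.is_empty():
        # interrupt = true so stale speech doesn't pile up while navigating.
        DisplayServer.tts_speak(line, voices[0], 50, 1.0, 1.0, 0, true)

# e.g. read_control(PackedStringArray(["Health", "74 out of 100", "progress bar"]))
```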
And don’t forget to add extra sound effects! They’re not always necessary but they help distinguish the UI even more, especially associating certain sounds with certain actions.
“But” - I hear the game developers cry - “what if I want to develop my OWN interfaces? You can’t possibly predict the accessibility behavior of ALL game Control interfaces!” And it’s true! It’s impossible to account for every possibility. But it is possible to account for the possibility that you can’t account for every possibility… or something. By that, I mean that my screenreader also supports extending functionality to Controls through scripts, and through these scripts it is easy not only to override the function of any Control, but also to create your own Controls and manage their navigation schemes. The interface for building these is just as simple as the interface for writing regular input code in your game. So really, instead of restricting developer creativity, it opens the door for developers to easily experiment with a new way of interfacing their games with players.
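Purely as a hypothetical shape of what such an extension script might look like - the actual hook names, signatures, and action names are defined by the addon’s documentation, not this sketch:

```gdscript
# Hypothetical extension script for a custom Control. Every name here is
# illustrative; consult the addon docs for the real interface.
extends Control

# What the screenreader speaks instead of guessing from the node type.
func screenreader_name() -> String:
    return "Crafting grid, 4 by 4"

# Return true if this script consumed the input, so the screenreader keeps
# focus inside the grid instead of moving to the next Control.
func screenreader_input(event: InputEvent) -> bool:
    if event.is_action_pressed("DOM_item_increment"):
        # ... move the in-grid cursor here ...
        return true
    return false
```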
Another important part of accessibility is the ability to customize the experience. Even though this screenreader is really bare-bones in its current state, it still has some useful options that players can set to customize their experience. For example, the screenreader’s sound effects can be disabled, and a “verbose mode” can be toggled on or off - adjusting how much information is read off while navigating an interface, which is useful when a player is already adept at an interface and only needs the critical information. A high contrast theme can be set, and subtitles and TTS audio description can be enabled or disabled.
Oh yeah, I forgot to mention - it also has a high contrast theme changer and support for adding subtitles to videos. But I’m sleepy and don’t want to write any more. I literally slept all week like goddamn sleeping beauty. Just read the documentation, okay? I wrote like 6,000 words to make sure that things are explained clearly and that developers don’t get overwhelmed by this shit. Part of accessibility integration is also clear communication with developers.
Anyways, as a result of all this, I had to write another system that may not even seem related to the screenreader - a menu manager designed to be used only with the screenreader. It is used to display… well, its own menus. Things like options or extra tools the user can use - but it can be used to display all sorts of Controls I may want to use with the screenreader. This way, developers meddling around with their own menus won’t interfere with the core functionality of these menus and tutorials. That’s right, I had to build my own menu manager! Either way, I used this design to allow for some additional extended functionality with the screenreader, such as the ability to find all the buttons in a user interface.
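That “find all the buttons” feature is, at heart, just another tree query. A rough sketch of the idea, not the addon’s exact code:

```gdscript
# Collect every currently visible button under a root Control so the menu
# manager can present them to the player as a flat, navigable list.
static func find_buttons(root: Control) -> Array:
    return root.find_children("*", "BaseButton", true, false).filter(
        func(b): return b.is_visible_in_tree()
    )
```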
It took around 2-3 weeks to develop the whole thing, and it’s around 5,000 lines of code across the GDScript files. Not too shabby! Unfortunately, by the end my dreaded sleep spells started to take over, making it hard to wrap things up for the initial version, but thankfully I was able to muster enough energy to complete the final lap.
Summary
This is an early release of this software, and it was a lot of fun to sprint out this first edition of a screenreader, but it can hardly be called complete. It currently doesn’t have much in the way of mouse and controller support, and it needs to be integrated further with the screenreader capabilities of the OS, such as through Tolk. But I think it’s definitely a step in the right direction. The dirty work of organizing UI Controls in a sane manner is now complete. My hope is that it can also be embedded as an Editor plugin, so it can greatly improve accessibility for blind players using the Godot game engine itself, and perhaps take more direct control over screenreader development in the future. LightsOutGames’ screenreader plugin does exist, and there has been an upgrade for version 4.x, so this will allow for an upgrade to the currently existing solutions. All in all, I don’t really think that Godot is a long-term solution for audio gaming or accessible gaming, because frankly the way it handles user interfaces in general is trash, but a project like this still has a lot of positive benefit for accessibility as a design philosophy, by providing a free and open source model of how to approach this kind of problem in the future and making those solutions more accessible for both players and developers alike. Feel free to contribute to the repository - I am really tired and need to return to other responsibilities for a while.
Credits:
- Initial Godot Screenreader/accessibility addons made by LightsOutGames
- Native TTS Godot screenreader port for 4.x by rodolpheh
- Accessibility testing provided by Daniel Hawkins
posted on 12:39:05 PM, 12/27/24 filed under: game tech