Have you ever wondered about running multiple tasks at a time on a simple microcontroller? In this article I show you that writing a small and simple task scheduler/switcher is not as difficult as you might think! Multitasking gives the programmer greater freedom, as it allows different code routines to run almost simultaneously. It can be used for simple jobs such as blinking LEDs at different frequencies (as we will show in our first example), reading keyboards, writing to displays and processing data, all in an almost “parallel” way.

Basic Concepts

The focal point of multitasking is a change in code flow triggered by the hardware or the OS. This concept is most likely known to any firmware programmer, as interrupts do exactly the same kind of thing: interrupts are hardware-triggered events that change the code flow. While function calls and branches are programmer-controlled changes to the code flow, interrupts can’t be foreseen (from the code flow point of view). That is especially true for asynchronous interrupts.

But what happens on an interrupt event? Basically, the current context is saved and the code flow changes to a new context: the ISR (interrupt service routine). After the interrupt is processed, the previous context is restored and the code flow resumes from the point where it was interrupted.


Figure 1 – An interrupt interrupts the main code

It is important to clarify the meaning of context in computational terms: it is the state of all CPU registers used for ordinary processing. These are usually the accumulator, status register, stack pointer (SP), program counter (PC), and any other internal register that may be used for ordinary processing. Context saving can be seen as taking a “snapshot” of the CPU, showing exactly what it was doing at a given time. From that “snapshot” it should be possible to completely restore the code flow exactly as it was.

It is important to note that the context must be stored in memory, which is usually done using software or hardware stacks.

Now that we have seen how a primary task is switched for a secondary task (the ISR code), it is time to dive deeper (but not too deep!) into some multitasking concepts.
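As a rough illustration of the “snapshot” idea, the sketch below models a context as a plain C structure and saves/restores it by copying. The `cpu_t` type, its fields and the function names are hypothetical and only mimic a small CPU; a real context switch does this with stacking instructions rather than C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register set of a small CPU (illustration only). */
typedef struct {
    uint16_t ax, bc, de, hl; /* general-purpose register pairs */
    uint8_t  psw;            /* status register */
    uint16_t sp;             /* stack pointer */
    uint32_t pc;             /* program counter */
} cpu_t;

/* Take a snapshot of the CPU state into memory. */
static void context_save(const cpu_t *cpu, cpu_t *snapshot)
{
    assert(cpu != NULL && snapshot != NULL);
    memcpy(snapshot, cpu, sizeof *snapshot);
}

/* Put the CPU back exactly as it was when the snapshot was taken. */
static void context_restore(cpu_t *cpu, const cpu_t *snapshot)
{
    assert(cpu != NULL && snapshot != NULL);
    memcpy(cpu, snapshot, sizeof *cpu);
}
```

Whatever happens to the simulated registers between `context_save` and `context_restore`, the restored state matches the snapshot, which is the property a context switch relies on.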

First, let’s remember the obvious: a computational system with a single CPU and a single execution unit can’t run more than one task at a time. Even a CPU with more than one execution unit (thus able to run multiple instructions at a time, a superscalar CPU) can’t run more than one task at a time (on the other hand, multithreaded CPUs can run multiple threads at the same time, but they are not our focus today).

For a CPU to appear to run more than one task at a time, a multitasking mechanism is needed. This is usually done by time-division multiplexing: CPU time is divided into time slots and these slots are given to different tasks, thus allowing multiple tasks to share CPU time.

Multitasking can be split into two main categories: cooperative and preemptive multitasking. Cooperative multitasking is based on the principle that each task controls the amount of CPU time it is going to use, voluntarily switching to another task when it is done. That means the efficiency of the whole system depends on each task cooperating with the others.

Preemptive multitasking, on the other hand, relies on a special program (the task scheduler) to control the amount of CPU time given to each task. The task scheduler is responsible for selecting which task to run and when to switch to another task. There are several scheduling algorithms out there and round-robin is one of the simplest, with very little overhead. The round-robin scheduler tries to split CPU time equally among all available tasks, and some variations may include task priority and other mechanisms.

Task schedulers usually make use of task-status flags to portray the current task state. Some of them are: STARTING, READY, RUNNING, BLOCKED, I/O and FINISHED.

ULWOS is a round-robin preemptive task scheduler, but right now we will not include any priority mechanism or status flags. That means that once a task is created it will run forever (it cannot reach an end) and will always get a CPU time slot like any other task. This is obviously far from ideal, but is good enough for demonstrating some basic concepts. In the near future we will include some more features and turn our task scheduler into a very small operating system (or an operating system aspirant!).
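To make the cooperative model concrete, here is a minimal sketch in portable C. It is not part of ULWOS; the names `coop_task_fn`, `coop_run`, `blink_led1` and `blink_led2` are hypothetical. Each task does a small piece of work and gives up the CPU simply by returning.

```c
#include <assert.h>
#include <stddef.h>

/* A cooperative task does a small piece of work and returns (yields). */
typedef void (*coop_task_fn)(void);

static unsigned led1_toggles; /* hypothetical work counters */
static unsigned led2_toggles;

static void blink_led1(void) { led1_toggles++; }
static void blink_led2(void) { led2_toggles++; }

/* Run every task once per round, for the given number of rounds. */
static void coop_run(const coop_task_fn *tasks, size_t count, unsigned rounds)
{
    assert(tasks != NULL);
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < count; i++) {
            tasks[i](); /* the next task runs only when this call returns */
        }
    }
}
```

The weakness described above is visible in the inner loop: if one task never returns, no other task ever runs again.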
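A scheduler that uses such flags typically keeps one state value per task. The sketch below is only an illustration of that idea; the enum and the `task_can_be_scheduled` helper are hypothetical, and the rule that only READY tasks are picked is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* One value per state named in the text. */
typedef enum {
    TASK_STARTING,
    TASK_READY,
    TASK_RUNNING,
    TASK_BLOCKED,
    TASK_IO,
    TASK_FINISHED
} task_state_t;

/* Assumption for this sketch: the scheduler only picks tasks that are READY. */
static bool task_can_be_scheduled(task_state_t state)
{
    assert(state <= TASK_FINISHED);
    return state == TASK_READY;
}
```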

The following figure shows, in a simplified way, what we discussed so far.


Figure 2 – A task scheduler

Initially each task’s context stack points to that task’s entry point. When the scheduler starts running task 1, it loads the task context and branches to it. After a while (the time slot, quantum, or heartbeat) the scheduler interrupts task 1 (which was at point “b” at that time), saves its context, loads the context of task 2 and branches to it. After another time slot, the scheduler interrupts task 2 (which was at point “d” at that time), saves its context, loads the context of task 3 and branches to it. It keeps doing that until all tasks have run (task “n” is reached) and then restarts from task 1. Notice that task 1 will resume from point “b”, as that is where it was when it was suspended.
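The sequence above can be simulated on a PC with a few lines of C. This is only a model: the “context” of each task is reduced to a single progress counter, and `SIM_NUM_TASKS`, `SIM_QUANTUM` and `sim_run_slots` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NUM_TASKS 3
#define SIM_QUANTUM   4  /* steps a task runs before being switched out */

/* The saved "context" here is just how far each task has progressed. */
static unsigned sim_resume_point[SIM_NUM_TASKS];

/* Run the given number of time slots; returns the task that would run next. */
static size_t sim_run_slots(size_t first_task, unsigned slots)
{
    size_t current = first_task;
    assert(first_task < SIM_NUM_TASKS);
    for (unsigned s = 0; s < slots; s++) {
        unsigned position = sim_resume_point[current]; /* load context */
        for (unsigned step = 0; step < SIM_QUANTUM; step++) {
            position++;                                /* task does work */
        }
        sim_resume_point[current] = position;          /* save context */
        current = (current + 1) % SIM_NUM_TASKS;       /* next task */
    }
    return current;
}
```

After four slots starting at task 0, task 0 has run twice and continued from where it stopped, while the other two tasks have each run once.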

That allows the CPU to run different tasks almost simultaneously!

Of course, on a commercial operating system there is very strict control over which memory and I/O areas a task can access, thus preventing a task from changing memory belonging to the kernel or to another task. This kind of control is usually done by dedicated hardware such as MMUs (Memory Management Units) and special CPU modes (such as supervisor or privileged execution modes). Moreover, a real operating system must provide some way for inter-task communication (or inter-process communication, IPC).

The RL78 (like many other 8-bit, 16-bit and even some 32-bit microcontrollers) does not provide any of these hardware modules, which means there is no way to prevent a task from changing memory belonging to another task or, even worse, belonging to the system kernel. Regarding inter-task communication, we will get back to it in a future article.

RL78 Task Scheduler Implementation

Now it’s time to see how to implement a small task scheduler on the RL78. The Renesas RL78 line is derived from the 78K line, which is derived from the industry’s microcomputing trail-blazer, the Intel 8080.

RL78’s programming model includes a set of 8-bit general-purpose registers: A, X, B, C, D, E, H and L, which can be grouped into 16-bit registers: AX, BC, DE and HL (AX usually works as an accumulator while BC, DE and HL usually work as pointers). There are up to 4 register banks, resulting in up to 32 8-bit registers or 16 16-bit registers.

Along with the general-purpose registers there is also an 8-bit status register (PSW), a 16-bit stack-pointer (SP), two 8-bit segment registers (ES for data and CS for code) and a 20-bit program counter (PC). All those registers (except PC) are memory-mapped and can be accessed as any other RAM address.

The architecture also allows up to 1MB of memory and reserves the last 64KB for RAM and registers.

Regarding our task scheduler, the high number of general-purpose registers means a matching amount of RAM for context storage and a slight impact on the context-switch time: the more registers there are to save, the slower the register stacking/unstacking operation! Fortunately the highly optimized CISC instruction set is very fast (most instructions are only 1 or 2 bytes long and run in 1 clock cycle thanks to the 3-stage pipeline), so the context-switch time is very small.

The foundation of a task scheduler is the context saving (of the task being suspended) and restoring (of the task being resumed) operations. Context saving uses PUSH assembly instructions to store registers onto the stack (pointed to by SP) and context restoring uses POP instructions which restore register contents read from the stack (pointed to by SP). One important note: the stack implementation on the RL78’s programming model grows from top to bottom. Each stacking operation (PUSH) decreases SP by 2 and each unstacking operation increases SP by 2!
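The behavior of SP during PUSH and POP can be modeled in plain C. The sketch below is a PC-side simulation, not RL78 code; `demo_stack`, `demo_sp`, `demo_push` and `demo_pop` are hypothetical names, and the byte order (low byte at the lower address) is stated here as an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STACK_BYTES 16

static uint8_t  demo_stack[DEMO_STACK_BYTES];
static uint16_t demo_sp = DEMO_STACK_BYTES; /* index; starts just past the top */

/* PUSH: SP decreases by 2, then the 16-bit value is stored. */
static void demo_push(uint16_t value)
{
    assert(demo_sp >= 2);
    demo_sp -= 2;
    demo_stack[demo_sp]     = (uint8_t)(value & 0xFF); /* low byte  */
    demo_stack[demo_sp + 1] = (uint8_t)(value >> 8);   /* high byte */
}

/* POP: the 16-bit value is read back, then SP increases by 2. */
static uint16_t demo_pop(void)
{
    assert(demo_sp + 2 <= DEMO_STACK_BYTES);
    uint16_t value = (uint16_t)(demo_stack[demo_sp] | (demo_stack[demo_sp + 1] << 8));
    demo_sp += 2;
    return value;
}
```

Values come back in the reverse order they were pushed, which is why a restore sequence must pop registers in the opposite order of the save sequence.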

By default, the GCC compiler uses only the first three register banks for C code, while the last bank is reserved for ISR code. That means our context saving/restoring code only has to preserve the first three register banks, and we are free to use bank 3 for internal operations.

The code snippet below shows our round-robin task scheduler. It makes use of the interval timer (IT) interrupt available on all RL78 devices. The timer is configured for 1ms intervals and this is our quantum or heartbeat. It means that every 1ms the IT interrupt is triggered, PC and PSW are stored (by hardware) onto the task stack and code flow branches to the INT_IT function.

Inside the ISR three main operations take place: the context is saved (save_context function), the current task pointer (ulwos_current_task) is advanced to point to the next valid task, and the context of that task is restored. By the end of the ISR the stack in use is the next task’s stack and the first 3 register banks hold the values they previously had for that task. A RETI instruction then pops PC and PSW from the stack and the code flow resumes from the place (within the new task) where it was previously suspended!

Note that the INT_IT function is declared with the naked attribute. It tells the GCC compiler not to generate entry (prologue) and exit (epilogue) code for that function. Although GCC’s manual states that a naked function should not contain mixed C and assembly or extended assembly code, I believe that does not apply to this case, as we are saving and restoring the whole context. Besides, the ISR code makes use of only the bank 3 registers and we are dealing with an ISR function, not a standard one.

The use of the context saving and restoring functions keeps this code very simple. So what about taking a deeper look inside them? Before we move forward, it is important to know the data structures used for context storage. ULWOS makes use of three basic structures: ulwos_task_context is a two-dimensional array that stores the contents of the general-purpose registers (AX, BC, DE and HL from banks 0, 1 and 2; right now we are not saving/restoring ES or CS), ulwos_taskSP is an array that stores the SP for each task, and ulwos_task_stack is a two-dimensional array that holds the actual stack for each task.
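As a rough picture of how much RAM these structures take, the following sketch declares them in portable C. The array names and the two configuration macros come from the article; the element types, the dimension order, the type of ulwos_current_task and the helper `ulwos_ram_usage` are assumptions made for illustration and may differ from the real ulwos.c.

```c
#include <assert.h>
#include <stdint.h>

#define ULWOS_NUM_TASKS        2    /* example value */
#define ULWOS_TASK_STACK_SIZE  128  /* bytes per task stack */

/* AX, BC, DE and HL for banks 0, 1 and 2: 3 banks x 4 register pairs. */
#define CONTEXT_WORDS (3 * 4)

static uint16_t ulwos_task_context[ULWOS_NUM_TASKS][CONTEXT_WORDS];
static uint16_t ulwos_taskSP[ULWOS_NUM_TASKS];
static uint8_t  ulwos_task_stack[ULWOS_NUM_TASKS][ULWOS_TASK_STACK_SIZE];
static uint8_t  ulwos_current_task;

/* RAM consumed by the scheduler's bookkeeping, in bytes. */
static unsigned ulwos_ram_usage(void)
{
    assert(ULWOS_TASK_STACK_SIZE % 2 == 0); /* stack entries are 16 bits wide */
    return (unsigned)(sizeof ulwos_task_context + sizeof ulwos_taskSP +
                      sizeof ulwos_task_stack + sizeof ulwos_current_task);
}
```

With these example values each task costs 24 bytes of register context, 2 bytes for its saved SP and 128 bytes of stack, so the per-task stack size dominates the RAM budget.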

Context saving is basically saving the task’s SP and CPU registers. The task stack does not need to be saved because each task has its own stack.

Unfortunately context and SP storage cannot be done directly in C, because the compiler itself uses CPU registers and SP for data manipulation, so we can’t be sure the register contents will be accurately preserved. Instead of C, we use assembly to guarantee all registers are stored accurately. Moreover, assembly lets us make the code as efficient as possible (limited by the programmer’s ability, of course!).

We have the context saving function below. It is declared as inline because we want the compiler to actually insert the code instead of a function call, thus avoiding stack usage (which would require further changes to the code).

The assembly code is self-explanatory; for a deeper review of how to use inline assembly with GCC, take a look at my article “RL78’s Inline Assembly on GCC”.

Note that save_regs() is not a real function, but a macro created to simplify the code and improve readability. It is defined as follows:

After save_context is called, the contents of register banks 0, 1 and 2 are preserved in the task’s ulwos_task_context area and the task’s SP is preserved in the ulwos_taskSP array. The PSW and PC contents were automatically preserved on the task’s stack by the interrupt itself, so we don’t need to worry about them!

The next step is selecting the next task to run. The round-robin scheduler simply increments the current task pointer (ulwos_current_task) and checks whether it points to a valid task. If the task is not valid, the pointer returns to zero and the task cycle restarts.
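The selection rule fits in a few lines of C. The sketch below is a guess at its shape, not the original ULWOS code: the function `ulwos_next_task` and its `created` parameter (the number of tasks created so far, used here to decide whether an index is “valid”) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ULWOS_NUM_TASKS 4  /* example value */

/* Return the index of the task that runs after 'current'.
   'created' is the number of tasks created so far (hypothetical parameter). */
static uint8_t ulwos_next_task(uint8_t current, uint8_t created)
{
    assert(created > 0 && created <= ULWOS_NUM_TASKS);
    current++;
    if (current >= created) {
        current = 0; /* past the last valid task: restart the cycle */
    }
    return current;
}
```

With three created tasks the sequence of indices is 0, 1, 2, 0, 1, 2 and so on; with a single task the scheduler keeps returning to task 0.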

After selecting the new task to run, the scheduler must restore its context, so that it can be resumed from the exact point it was when the task was suspended. Function restore_context is responsible for recovering the context. Again its assembly code is self-explanatory.

As with save_regs(), restore_regs() is not a function but a macro, one which unstacks registers from memory. It is defined as follows:

Now that our scheduler is able to switch contexts and tasks, we need a way to create a new task for the scheduler. It probably looks trivial, but the truth is it is not, as creating a new task means preparing (initializing) the task stack so that it holds a known context. That context must also include the entry address of the task (and a default value for the PSW) so that when the task’s context is restored, the code flow starts from the correct point. ULWOS uses 0x86 as the default value for PSW: this enables interrupts (IE=1), selects register bank 0 (RBS0=RBS1=0) and selects the lowest interrupt priority level (ISP0=ISP1=1).

Now we have a working task scheduler and a way to create a new task. All we need to do is start the system!

That means setting up the interval timer (IT), enabling its interrupt, setting up the SP so that it points to the top of the first task’s stack and then running a return from interrupt (RETI) instruction!

Then the reader might ask: what? How are we going to return from an interrupt if we are not in an actual ISR? The truth is that by using the RETI instruction we make the CPU unstack the PSW and PC from the stack (that very same entry address we took care of storing when we created the task), thus making the code flow branch to the task’s entry point and also enabling interrupts! Magic? No! Cheating? Maybe! We just used the CPU programming model in an unusual way! Did you notice there is no EI instruction within the code? That is because when RETI runs, the PSW is restored from the value we previously stored (0x86), thus enabling interrupts without needing any other instruction!
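The sketch below shows, on a plain byte array, what such an initial stack frame could look like. It is a PC-side model, not the ULWOS task creation function: `build_initial_frame`, `ULWOS_DEFAULT_PSW` and the `PSW_*` masks are hypothetical names (the bank select bits are called RBS0 and RBS1 here), and the frame layout (PSW at the highest address, then PC bits 19..16, 15..8 and 7..0) is an assumption that should be checked against the RL78 software manual.

```c
#include <assert.h>
#include <stdint.h>

#define PSW_IE    0x80u  /* interrupt enable */
#define PSW_RBS1  0x20u  /* register bank select, high bit */
#define PSW_RBS0  0x08u  /* register bank select, low bit */
#define PSW_ISP1  0x04u  /* in-service priority, high bit */
#define PSW_ISP0  0x02u  /* in-service priority, low bit */

#define ULWOS_DEFAULT_PSW (PSW_IE | PSW_ISP1 | PSW_ISP0)  /* = 0x86 */

/* Write the frame RETI will pop. Returns the initial SP as an index into 'stack'. */
static uint16_t build_initial_frame(uint8_t *stack, uint16_t size, uint32_t entry)
{
    assert(stack != 0 && size >= 4);
    uint16_t sp = size;                             /* empty stack: just past the top */
    stack[--sp] = ULWOS_DEFAULT_PSW;                /* PSW */
    stack[--sp] = (uint8_t)((entry >> 16) & 0x0F);  /* PC bits 19..16 */
    stack[--sp] = (uint8_t)((entry >> 8) & 0xFF);   /* PC bits 15..8 */
    stack[--sp] = (uint8_t)(entry & 0xFF);          /* PC bits 7..0 */
    return sp;
}
```

The returned index plays the role of the value stored in ulwos_taskSP for the new task: the first time the task is scheduled, the return-from-interrupt sequence finds a frame that looks exactly as if the task had been interrupted at its first instruction.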

All these functions are part of the ulwos.c file. The ulwos.h header file has some basic settings for our task scheduler: ULWOS_TASK_STACK_SIZE defines the size (in bytes) for each task stack (the standard value is 128 bytes) and ULWOS_NUM_TASKS defines the maximum number of tasks. It is very important to set up these parameters according to your system and application needs.

While the maximum number of tasks is very easy to define, setting up the stack size for each task is far more complex, as it depends on the task complexity, the number and size of local variables, the number of function calls within the task, etc. It is possible to add a small mechanism to detect stack overflow, but that is not our goal for now.
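One common way to build such a mechanism (not part of ULWOS at this stage) is to fill each stack with a known pattern and later count how much of the pattern survives. The sketch below models this on a plain byte array; `STACK_FILL`, `stack_paint` and `stack_unused_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5u  /* arbitrary pattern unlikely to be stored by accident */

/* Fill a task stack with the pattern before the task first runs. */
static void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Count untouched bytes at the low end; the stack grows down towards index 0. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;
    assert(stack != NULL);
    while (unused < size && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

If the unused count for a task ever gets close to zero, its stack size should be increased. Note that painting must happen before the initial context is written to the stack, otherwise it would erase it.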


Now that we have seen all the code for this ULWOS stage, let’s take a look at two simple usage examples that can test our small task scheduler and operating system aspirant!

The first one is a simple dual LED blinker, each LED being controlled by an independent task. This example can be tested on the E2Studio Simulator or on the YRPBRL78G13 board (or any other board, as long as two LEDs are connected to pins P76 and P77 or the code is changed accordingly).

Notice the tasks (task1 and task2) are actually functions declared with the noreturn attribute. This tells the compiler not to generate return code for these functions, slightly reducing code size!

The whole project for E2Studio and GCC can be downloaded from GitHub.

The second example is a bit more sophisticated and was designed for the RL78/G13 Starter Kit (RSK). It creates two independent tasks: one blinks an LED wired to P63 and the other writes an up-counting value to the board display. The whole project for E2Studio and GCC can be downloaded from GitHub.

Before closing this article, it is important to make some relevant notes:

  1. In a multitasking environment there is always an issue regarding shared resources: imagine task 1 is in the middle of reading a 32-bit variable (named X) when it gets suspended, then task 2 resumes and changes the contents of X. What would result from that? The value task 1 reads from X would probably be corrupted and could lead to a larger failure. This issue is usually solved by using semaphores or by disabling preemption during critical code execution. It is a very common issue in multitasking systems and care should be taken to prevent race conditions that could lead to memory or I/O corruption;
  2. Multiple-priority interrupts should be avoided at all costs, as they can easily lead to a stack overflow;
  3. At the current ULWOS stage a task must never end or return! Remember: ULWOS tasks are functions but they are not called by compiled code, so they don’t have a return address stored on the stack. If a task reaches its end and returns, a stack underflow occurs and system behavior is unpredictable! So you must always write the task within an infinite loop (while(1)) or place an infinite loop at the end of the task.
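The shared-variable problem from note 1 can be reproduced deterministically on a PC. The sketch below is a simulation, not ULWOS code: X is stored as two 16-bit halves (as a 16-bit CPU would read it in two steps), the task switch is modeled by a direct function call between the two reads, and all names are hypothetical. For simplicity, a write attempted while preemption is disabled is simply skipped rather than postponed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint16_t x_low, x_high;          /* shared 32-bit variable X, as two halves */
static bool preemption_enabled = true;

/* Task 2: writes a new 32-bit value, but only if it is allowed to preempt. */
static void task2_write(uint32_t value)
{
    if (!preemption_enabled) {
        return; /* task 2 does not get to run inside the critical section */
    }
    x_low  = (uint16_t)(value & 0xFFFF);
    x_high = (uint16_t)(value >> 16);
}

/* Task 1: reads X in two steps; task 2 tries to run between them. */
static uint32_t task1_read(bool protect, uint32_t value_written_by_task2)
{
    assert(preemption_enabled);            /* critical sections do not nest here */
    preemption_enabled = !protect;         /* enter critical section if requested */
    uint16_t low = x_low;                  /* first half read */
    task2_write(value_written_by_task2);   /* simulated task switch */
    uint16_t high = x_high;                /* second half read */
    preemption_enabled = true;             /* leave critical section */
    return ((uint32_t)high << 16) | low;
}
```

Starting from X = 0x0000FFFF with task 2 writing 0x00010000, the unprotected read returns 0x0001FFFF, a value X never held, while the protected read returns the consistent old value.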

That’s all for now! I hope this article is useful for those who want to play with a small and free task scheduler. I have more features ready for ULWOS such as: an idle task with low power mode, an idle counter, task status and semaphores. See you soon!

P.S.: I would like to thank William Severino for helping me out with some translation issues! Thank you Will!

ULWOS – Multitasking on Renesas RL78