ULWOS – Multitasking on Renesas RL78

Have you ever wondered about running multiple tasks at the same time on a simple microcontroller? In this article I will show you that writing a small, simple task scheduler/switcher is not as difficult as you might think! Multitasking gives the programmer greater freedom, as it lets different code routines run almost simultaneously. It can be used for simple jobs such as blinking LEDs at different frequencies (as we will show in our first example), reading keyboards, writing to displays and processing data, all in an almost “parallel” way.

Basic Concepts

The central idea in multitasking is a change of code flow triggered by the hardware or by the OS. This concept is probably familiar to any firmware programmer, because interrupts do exactly this: they are hardware-triggered events that change the code flow. While function calls and branches are changes to the code flow controlled by the programmer, interrupts cannot be foreseen from the point of view of the code being executed. That is especially true for asynchronous interrupts.

But what happens on an interrupt event? Basically, the current context is saved and the code flow changes to a new context: the ISR (interrupt service routine). After the interrupt is processed, the previous context is restored and the code flow returns to the point where it was interrupted.

Figure 1 – An interrupt interrupts the main code

It is important to clarify what context means in computational terms: it is the state of all CPU registers involved in ordinary processing. These usually include the accumulator, the status register, the stack pointer (SP), the program counter (PC) and any other internal register that ordinary code may use. Saving the context is like taking a “snapshot” of the CPU that shows exactly what it was doing at a given moment. From that snapshot it must be possible to restore the code flow exactly as it was.

Note that the context must be stored in memory, which is usually done using software or hardware stacks.

Now that we have seen how a primary task is switched for a secondary one (the ISR code), it is time to dive a little deeper (but not too much!) into some multitasking concepts.

First, let’s state the obvious: a system with a single CPU and a single execution unit cannot run more than one task at a time. Even a CPU with several execution units (a superscalar CPU, able to execute multiple instructions at once) still runs only one task at a time. Hardware multithreaded CPUs, on the other hand, can run multiple threads simultaneously, but they are not our focus today.

For such a CPU to run more than one task (apparently) at the same time, a multitasking mechanism is needed. This is usually done through time-division multiplexing: CPU time is divided into time slots, and these slots are handed out to different tasks so that they share the CPU.

Multitasking can be split into two main categories: cooperative and preemptive multitasking. Cooperative multitasking is based on the principle that each task controls how much CPU time it uses, voluntarily handing the CPU over to another task when it is done. This means the efficiency of the whole system depends on every task cooperating with the others.
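To make the contrast concrete, here is a minimal host-side sketch of cooperative multitasking in C. It is not part of ULWOS, and every name in it is hypothetical: each task performs one short step of work and returns, and a plain loop calls the tasks in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cooperative "scheduler": each task does one short step
   of work and returns, voluntarily yielding the CPU to the next task. */
typedef void (*coop_task_fn)(void);

static unsigned blink_steps;   /* work done by task A */
static unsigned lcd_steps;     /* work done by task B */

static void task_a(void) { blink_steps++; }  /* e.g. toggle an LED */
static void task_b(void) { lcd_steps++; }    /* e.g. update a display */

static coop_task_fn coop_tasks[] = { task_a, task_b };
#define COOP_NUM_TASKS (sizeof coop_tasks / sizeof coop_tasks[0])

/* Run 'rounds' passes over the task list. A task that never returned
   would block every other task: that is the cooperative trade-off. */
static void coop_run(unsigned rounds) {
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < COOP_NUM_TASKS; i++) {
            assert(coop_tasks[i] != NULL);
            coop_tasks[i]();
        }
}
```

If task_b contained an endless loop, task_a would never run again; a preemptive scheduler such as ULWOS avoids this by taking the CPU back with a timer interrupt.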

Preemptive multitasking, on the other hand, relies on a special piece of code (the task scheduler) to control the CPU time given to each task. The scheduler selects which task to run and decides when to switch to another one. There are several scheduling algorithms, and round-robin is one of the simplest: it gives each task a time slot in turn, so CPU time is split equally among all available tasks. Some variations add task priorities and other mechanisms.

Task schedulers usually keep a status flag for each task to represent its current state. Common states include STARTING, READY, RUNNING, BLOCKED, I/O (waiting for I/O) and FINISHED.

ULWOS is a preemptive round-robin task scheduler, but for now it includes neither a priority mechanism nor status flags. This means that once a task is created it runs forever (it must never reach an end) and always gets a CPU time slot like every other task. This is obviously far from ideal, but it is good enough to demonstrate the basic concepts. In the near future we will add more features and turn our task scheduler into a very small operating system (or an operating system aspirant!).

The following figure shows, in a simplified way, what we discussed so far.

Figure 2 – A task scheduler

Initially, each task’s saved context points to that task’s entry point. When the scheduler starts running task 1, it loads the task’s context and branches to it. After a while (the time slot, also called quantum or heartbeat) the scheduler interrupts task 1 (which was at point “b” at that moment), saves its context, loads the context of task 2 and branches to it. After another time slot, the scheduler interrupts task 2 (which was at point “d”), saves its context, loads the context of task 3 and branches to it. It keeps doing this until every task has run (task “n” is reached) and then starts over from task 1. Notice that task 1 resumes from point “b”, since that is where it was suspended.
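The cycle in Figure 2 can be modelled on a PC. In this hypothetical sketch each task’s context is reduced to a single number, the point where it was suspended, and the next task is chosen the same way INT_IT does it later in this article (increment, then wrap to 0). The names and the “steps” model are our own assumptions.

```c
#include <assert.h>

/* Hypothetical host-side model of Figure 2: each task's saved context is
   reduced to one number, the point (step count) where it was suspended. */
#define SIM_NUM_TASKS 3

static unsigned resume_point[SIM_NUM_TASKS]; /* 0 = task entry point */
static unsigned current_task;                 /* task that owns the CPU */
static unsigned num_tasks = SIM_NUM_TASKS;    /* tasks created so far */

/* One time slot: the current task runs 'steps' steps from where it was
   suspended, then the tick preempts it and the next task is selected
   (increment, wrap to 0 after the last task).
   Returns the index of the task that ran in this slot. */
static unsigned run_time_slot(unsigned steps) {
    assert(current_task < num_tasks);
    unsigned ran = current_task;
    resume_point[current_task] += steps;  /* task progresses, then is suspended */
    current_task++;                       /* go to next task */
    if (current_task >= num_tasks)        /* past the last one: back to the first */
        current_task = 0;
    return ran;
}
```

After seven slots of five steps each, task 0 has run three times (and resumes from step 15), while tasks 1 and 2 have each run twice.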

That allows the CPU to run different tasks almost simultaneously!

Of course, a commercial operating system strictly controls which memory and I/O areas each task can access, preventing a task from changing memory that belongs to the kernel or to another task. This kind of control is usually enforced by dedicated hardware such as MMUs (Memory Management Units) and special CPU modes (such as supervisor or privileged execution modes). Moreover, a real operating system must provide some means of inter-task communication (or inter-process communication, IPC).

The RL78, like many other 8-bit, 16-bit and even some 32-bit microcontrollers, provides none of these hardware features, so there is no way to prevent a task from changing memory that belongs to another task or, even worse, to the system kernel. We will come back to inter-task communication in a future article.

RL78 Task Scheduler Implementation

Now it’s time to see how to implement a small task scheduler on the RL78. The Renesas RL78 line is derived from the 78K line, which is derived from the industry’s microcomputing trail-blazer, the Intel 8080.

RL78’s programming model includes a set of 8-bit general-purpose registers: A, X, B, C, D, E, H and L, which can be paired into the 16-bit registers AX, BC, DE and HL (AX usually works as the accumulator, while BC, DE and HL usually work as pointers). There are up to 4 register banks, giving up to 32 8-bit registers or 16 16-bit registers.
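As a quick illustration of the pairing (the helper names are hypothetical), the first letter of each pair is its high byte and the second its low byte. This matches the code later in this article, where mov X,... followed by clrb A leaves AX equal to the 8-bit task index.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of how RL78 8-bit registers pair into 16-bit ones: the first
   letter is the high byte (A in AX, B in BC, D in DE, H in HL) and the
   second letter is the low byte. Helper names are hypothetical. */
static uint16_t rp_make(uint8_t high, uint8_t low) {
    return (uint16_t)((high << 8) | low);
}
static uint8_t rp_high(uint16_t rp) { return (uint8_t)(rp >> 8); }
static uint8_t rp_low(uint16_t rp)  { return (uint8_t)(rp & 0xFFu); }

/* Register file size: 4 banks x 8 registers of 8 bits each. */
#define RL78_NUM_BANKS      4
#define RL78_REGS_PER_BANK  8
#define RL78_REGFILE_BYTES  (RL78_NUM_BANKS * RL78_REGS_PER_BANK) /* 32 bytes = 16 pairs */
```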

Along with the general-purpose registers there is also an 8-bit status register (PSW), a 16-bit stack pointer (SP), two 8-bit segment registers (ES for data and CS for code) and a 20-bit program counter (PC). All these registers (except PC) are memory-mapped and can be accessed like any other RAM address.

The architecture also addresses up to 1 MB of memory and reserves the last 64 KB for RAM and registers.

For our task scheduler, the large number of general-purpose registers means a correspondingly large amount of RAM for context storage (saving three banks takes 3 × 8 = 24 bytes per task) and a slight impact on context-switch time, since the more registers there are to save, the longer stacking and unstacking take. Fortunately, the highly optimized CISC instruction set is very fast (most instructions are only 1 or 2 bytes long and execute in 1 clock cycle thanks to the 3-stage pipeline), so the context-switch time remains very small.

The foundation of a task scheduler is a pair of operations: saving the context of the task being suspended and restoring the context of the task being resumed. Context saving uses PUSH instructions to store registers on the stack (pointed to by SP), and context restoring uses POP instructions to read them back from the stack. One important note: on the RL78 the stack grows downward, from higher to lower addresses. Each PUSH of a register pair decreases SP by 2 and each POP increases SP by 2!
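The following host-side sketch models that downward-growing stack. The names are hypothetical, and the byte order is our reading of the RL78 instruction set description: PUSH rp stores the high byte at SP-1 and the low byte at SP-2, and POP rp reads them back in the opposite order.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the RL78 stack (hypothetical names). The stack
   grows downward: PUSH stores the high byte at SP-1 and the low byte
   at SP-2, then SP -= 2; POP reads them back and SP += 2. */
#define SIM_STACK_BYTES 16

static uint8_t  sim_mem[SIM_STACK_BYTES];
static uint16_t sim_sp = SIM_STACK_BYTES;    /* empty stack: SP at the top */

static void sim_push(uint16_t rp) {
    assert(sim_sp >= 2);                      /* would overflow below the array */
    sim_mem[sim_sp - 1] = (uint8_t)(rp >> 8);
    sim_mem[sim_sp - 2] = (uint8_t)(rp & 0xFFu);
    sim_sp -= 2;
}

static uint16_t sim_pop(void) {
    assert(sim_sp + 2 <= SIM_STACK_BYTES);    /* would underflow past the top */
    uint16_t rp = (uint16_t)(sim_mem[sim_sp] | (sim_mem[sim_sp + 1] << 8));
    sim_sp += 2;
    return rp;
}
```

Pushing 0x1234 onto the empty 16-byte stack moves SP from 16 to 14, with 0x12 stored at address 15 and 0x34 at address 14; values come back in last-in, first-out order.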

By default, GCC uses only the first three register banks for C code, while the last bank is reserved for ISR code. This means our context save/restore code only has to preserve banks 0, 1 and 2, leaving bank 3 free for the scheduler’s internal operations.

The code snippet below shows our round-robin task scheduler. It uses the interval timer (IT) interrupt, available on all RL78 devices. The timer is configured for 1 ms intervals, which is our quantum or heartbeat: every 1 ms the IT interrupt is triggered, the hardware stores PC and PSW on the current task’s stack, and the code flow branches to the INT_IT function.

Inside the ISR three main operations take place: the current context is saved (save_context function), the current task index (ulwos_current_task) is incremented to point to the next valid task, and the next task’s context is restored. By the end of the ISR, the stack in use is the next task’s stack and the first three register banks hold the values they previously had for that task. A RETI instruction then pops PC and PSW from the stack, and the code flow resumes at the point (within the new task) where it was previously suspended!

void __attribute__ ((naked)) INT_IT(void) {
  save_context();       // save current context
  ulwos_current_task++; // go to next task
  // past the last task? wrap around to the first one
  if (ulwos_current_task>=ulwos_num_tasks) ulwos_current_task=0;
  restore_context();    // restore the new task context
}

Note that INT_IT is declared with the naked attribute, which tells GCC not to generate entry (prologue) and exit (epilogue) code for the function. GCC’s manual states that only basic asm statements can safely be used inside a naked function (extended asm, or basic asm mixed with C code, is not supported). I believe that restriction does not apply in this case, as we are saving and restoring the whole context ourselves. Besides, the ISR code uses only the bank 3 registers, and we are dealing with an ISR, not a standard function.

The code is simple thanks to the context saving and restoring functions, so let’s take a deeper look inside them. Before we move forward, though, we need to know the data structures used for context storage. ULWOS uses three basic structures: ulwos_task_context, a two-dimensional array that stores the general-purpose register contents (AX, BC, DE and HL from banks 0, 1 and 2; for now we save/restore neither ES nor CS); ulwos_taskSP, an array that stores the SP of each task; and ulwos_task_stack, a two-dimensional array that holds the actual stack of each task.
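The declarations themselves are not listed in the article, so the following is only a plausible sketch: the element types, ULWOS_NUM_TASKS = 4 and the helper functions are assumptions. It also expresses in C the byte offsets that save_context and restore_context compute in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Plausible declarations (assumed types and sizes; the real ones are in
   ulwos.c/ulwos.h and may differ). */
#define ULWOS_NUM_TASKS        4     /* assumed maximum number of tasks */
#define ULWOS_TASK_STACK_SIZE  128   /* default stack size from the text */
#define ULWOS_CONTEXT_BYTES    24    /* 3 banks x 4 pairs x 2 bytes */

uint8_t  ulwos_task_context[ULWOS_NUM_TASKS][ULWOS_CONTEXT_BYTES];   /* banks 0-2 */
uint16_t ulwos_taskSP[ULWOS_NUM_TASKS];                            /* saved SP per task */
uint8_t  ulwos_task_stack[ULWOS_NUM_TASKS][ULWOS_TASK_STACK_SIZE]; /* private stacks */
uint8_t  ulwos_current_task;   /* 8-bit: the asm loads it with mov X,... */
uint8_t  ulwos_num_tasks;

/* Byte offsets the assembly computes (hypothetical helper names). The
   stack grows downward, so saving starts at the END of the task's row
   and restoring starts at its BEGINNING, where the last push landed. */
static unsigned context_save_start(unsigned task) {
    assert(task < ULWOS_NUM_TASKS);
    return ULWOS_CONTEXT_BYTES * (task + 1);   /* 24 * (task + 1), via mulu */
}
static unsigned context_restore_start(unsigned task) {
    assert(task < ULWOS_NUM_TASKS);
    return ULWOS_CONTEXT_BYTES * task;         /* 24 * task */
}
static unsigned taskSP_offset(unsigned task) {
    return 2u * task;                          /* task index * 2 (shlw AX,1) */
}
```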

Context saving is basically saving the task’s SP and CPU registers. The task stack does not need to be saved because each task has its own stack.

Unfortunately, the context and SP cannot be saved directly in C, because the compiler itself uses CPU registers and SP to manipulate data, and we cannot guarantee that their contents would be preserved. So instead of C we use assembly, which guarantees that all registers are stored accurately. Moreover, assembly lets us write code that is as efficient as possible (limited by the programmer’s ability, of course!).

The context saving function is shown below. It is declared inline because we want the compiler to insert its code in place instead of generating a function call: a call would push a return address onto the task’s stack, which would require further changes to the code. Keep in mind that inline is only a hint to GCC; adding __attribute__((always_inline)) makes the inlining mandatory.

The assembly code is largely self-explanatory; for a deeper look at how to use inline assembly on GCC, see my article “RL78’s Inline Assembly on GCC”.

void inline save_context(void){
  asm volatile (
    "sel RB3\n\t"        // select bank 3
    "mov X,%[ct]\n\t"    // X = ulwos_current_task
    "clrb A\n\t"         // A = 0 (AX = ulwos_current_task)
    "shlw AX,1\n\t"      // AX = ulwos_current_task *2
    "movw BC,%[tsp]\n\t" // BC = ulwos_taskSP
    // AX = ulwos_taskSP+(current_task*2) => ulwos_taskSP[ulwos_current_task]
    "addw AX,BC\n\t"
    "movw DE,AX\n\t"     // DE = ulwos_taskSP[ulwos_current_task]
    "movw AX,SP\n\t"     // AX = task SP
    "movw [DE],AX\n\t"   // ulwos_task_SP[ulwos_current_task] = task SP
    // now we configure the SP to point to the top of the task's context stack
    "mov A,#24\n\t"      // A = 24 (12 registers * 2 bytes)
    "mov X,%[ct]\n\t"    // X = ulwos_current_task
    // X = ulwos_current_task+1 (so it points to the top of the stack)
    "inc X\n\t"
    "mulu X\n\t"         // AX = 24 * (ulwos_current_task+1)
    "addw AX,%[tc]\n\t"  // AX = top of context stack address
    "movw SP,AX\n\t"     // SP now points to the top of the context stack
    :
    :[ct]"m"(ulwos_current_task),[tc]"i"(ulwos_task_context),[tsp]"i"(ulwos_taskSP)
  );
  // saving register banks 2, 1 and 0
  asm ("sel RB2");
  save_regs(); // save registers from bank 2
  asm ("sel RB1");
  save_regs(); // save registers from bank 1
  asm ("sel RB0");
  save_regs(); // save registers from bank 0
  asm ("sel RB3");
}

Note that save_regs() is not a real function but a macro created to simplify the code and improve readability. It is defined as follows:

#define save_regs() asm ("push AX\n\tpush BC\n\tpush DE\n\tpush HL\n\t")

After save_context runs, the contents of register banks 0, 1 and 2 are preserved in the task’s row of ulwos_task_context, and the task’s SP is preserved in the ulwos_taskSP array. PSW and PC were already saved on the task’s stack by the interrupt itself, so we don’t need to worry about them!

The next step is selecting the next task to run. The round-robin scheduler simply increments the current task index (ulwos_current_task) and checks whether it refers to a valid task; if not, it wraps around to zero and the task cycle starts over.

After selecting the new task, the scheduler must restore its context so that the task resumes from the exact point where it was suspended. This is the job of restore_context. Again, its assembly code is largely self-explanatory.

void inline restore_context(void){
  asm volatile (
    // configure the SP to point to the context stack
    "mov A,#24\n\t"     // A = 24
    "mov X,%[ct]\n\t"   // X = ulwos_current_task
    "mulu X\n\t"        // AX = 24 * (ulwos_current_task)
    // AX = ulwos_task_context[ulwos_current_task] address
    "addw AX,%[tc]\n\t"
    "movw SP,AX\n\t"    // SP now points to the top of the context stack
    :
    :[ct]"m"(ulwos_current_task),[tc]"i"(ulwos_task_context)
  );
  // now we restore register banks 0, 1 and 2
  asm ("sel RB0");
  restore_regs(); // restore bank 0
  asm ("sel RB1");
  restore_regs(); // restore bank 1
  asm ("sel RB2");
  restore_regs(); // restore bank 2
  asm volatile (
    "sel RB3\n\t"
    "mov X,%[ct]\n\t"    // X = ulwos_current_task
    "clrb A\n\t"         // A = 0 (AX = ulwos_current_task)
    "shlw AX,1\n\t"      // AX = ulwos_current_task*2
    "movw BC,%[tsp]\n\t" // BC = ulwos_taskSP
    "addw AX,BC\n\t"     // AX = ulwos_taskSP address + (ulwos_current_task*2) => ulwos_taskSP[ulwos_current_task]
    "movw DE,AX\n\t"     // DE = ulwos_taskSP[ulwos_current_task] address
    "movw AX,[DE]\n\t"   // AX = ulwos_taskSP[ulwos_current_task]
    "movw SP,AX\n\t"     // set SP to the current task
    "reti\n\t"           // return from interrupt and change the context
    :
    :[ct]"m"(ulwos_current_task),[tsp]"i"(ulwos_taskSP)
  );
}

Like save_regs(), restore_regs() is a macro rather than a function; it pops the registers from memory in the reverse order in which they were pushed. It is defined as follows:

#define restore_regs() asm ("pop HL\n\tpop DE\n\tpop BC\n\tpop AX\n\t")
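To see why the push order in save_regs() and the pop order in restore_regs() must mirror each other, here is a host-side model (all names hypothetical) that saves banks 2, 1 and 0 into one 24-byte context row, as save_context does, and restores banks 0, 1 and 2 from it, as restore_context does.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (hypothetical names): banks 0..2, four pairs each. */
enum { AX, BC, DE, HL, PAIRS_PER_BANK };
#define SAVED_BANKS   3
#define CONTEXT_BYTES (SAVED_BANKS * PAIRS_PER_BANK * 2)   /* 24 bytes */

static uint16_t bank[SAVED_BANKS][PAIRS_PER_BANK]; /* register banks 0..2 */
static uint8_t  context[CONTEXT_BYTES];            /* one ulwos_task_context row */
static unsigned sp;                                /* byte index into context[] */

static void push(uint16_t v) {        /* PUSH: high byte at SP-1, low at SP-2 */
    assert(sp >= 2);
    sp -= 2;
    context[sp + 1] = (uint8_t)(v >> 8);
    context[sp]     = (uint8_t)(v & 0xFFu);
}

static uint16_t pop(void) {           /* POP: low byte from SP, high from SP+1 */
    assert(sp + 2 <= CONTEXT_BYTES);
    uint16_t v = (uint16_t)(context[sp] | (context[sp + 1] << 8));
    sp += 2;
    return v;
}

static void save_context_model(void) {
    sp = CONTEXT_BYTES;                       /* SP = end of the row */
    for (int b = 2; b >= 0; b--) {            /* sel RB2, RB1, RB0 */
        push(bank[b][AX]); push(bank[b][BC]); /* save_regs() */
        push(bank[b][DE]); push(bank[b][HL]);
    }
    assert(sp == 0);                          /* exactly 24 bytes used */
}

static void restore_context_model(void) {
    sp = 0;                                   /* SP = start of the row */
    for (int b = 0; b < SAVED_BANKS; b++) {   /* sel RB0, RB1, RB2 */
        bank[b][HL] = pop(); bank[b][DE] = pop(); /* restore_regs() */
        bank[b][BC] = pop(); bank[b][AX] = pop();
    }
    assert(sp == CONTEXT_BYTES);
}
```

The last pair pushed (bank 0 HL) ends up at the lowest address of the row, which is exactly where restore_context points SP before its first pop.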

Now that our scheduler can switch contexts and tasks, we need a way to create tasks for it. This may look trivial, but it is not: creating a task means preparing (initializing) its stack so that it holds a known context. That context must include the task’s entry address (and a default PSW value), so that when the context is restored, the code flow starts at the correct point. ULWOS uses 0x86 as the default PSW value: it enables interrupts (IE=1), selects register bank 0 (RBS0=RBS1=0) and selects the lowest in-service priority level (ISP0=ISP1=1), so interrupts of every priority level can be acknowledged.
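The value 0x86 can be checked bit by bit. The bit positions below follow the RL78 PSW layout as we understand it from the RL78 user’s manual (IE, Z, RBS1, AC, RBS0, ISP1, ISP0, CY from bit 7 down to bit 0); the macro and helper names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* RL78 PSW bit positions (macro names are hypothetical). */
#define PSW_IE    0x80u  /* interrupt enable */
#define PSW_Z     0x40u  /* zero flag */
#define PSW_RBS1  0x20u  /* register bank select, bit 1 */
#define PSW_AC    0x10u  /* auxiliary carry */
#define PSW_RBS0  0x08u  /* register bank select, bit 0 */
#define PSW_ISP1  0x04u  /* in-service priority, bit 1 */
#define PSW_ISP0  0x02u  /* in-service priority, bit 0 */
#define PSW_CY    0x01u  /* carry */

/* Initial PSW for a new task: IE = 1, bank 0 (RBS1 = RBS0 = 0),
   ISP1 = ISP0 = 1 (lowest in-service priority level). */
#define INITIAL_TASK_PSW ((uint8_t)(PSW_IE | PSW_ISP1 | PSW_ISP0))

static unsigned psw_bank(uint8_t psw) {   /* selected register bank, 0..3 */
    return ((psw & PSW_RBS1) ? 2u : 0u) | ((psw & PSW_RBS0) ? 1u : 0u);
}
static unsigned psw_isp(uint8_t psw) {    /* in-service priority field, 0..3 */
    return ((psw & PSW_ISP1) ? 2u : 0u) | ((psw & PSW_ISP0) ? 1u : 0u);
}
```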

ULWOS_TASKHANDLER ulwos_create_task(void * task_address){
  if (ulwos_num_tasks>=ULWOS_NUM_TASKS) return -1;
  ulwos_tempSP = (int) task_address;
  ulwos_current_task = ulwos_num_tasks;
  ulwos_num_tasks++;
  asm volatile (
    "sel RB3\n\t"
    "movw HL,SP\n\t"     // HL stores current SP
    "clrw AX\n\t"        // AX = 0
    "movw DE,%[tsz]\n\t" // DE = ULWOS_TASK_STACK_SIZE
    "clrb B\n\t"
    "mov C,%[ct]\n\t"    // BC = ulwos_current_task
    "inc C\n\t"
    // this is a simple 16-bit multiply
    "LOOP:"
    "addw AX,DE\n\t"     // AX = AX + ULWOS_TASK_STACK_SIZE
    "decw BC\n\t"        // BC = BC-1
    "cmp0 B\n\t"         // compares B to 0
    "bnz $LOOP\n\t"      // branch to LOOP if not zero
    "cmp0 C\n\t"         // compares C to 0
    "bnz $LOOP\n\t"      // branch to LOOP if not zero
    // AX = ULWOS_TASK_STACK_SIZE * (ulwos_current_task+1)
    "movw BC,AX\n\t"
    "movw AX,%[ts]\n\t"
    // AX points just past the end of ulwos_task_stack[ulwos_current_task] (top of its stack)
    "addw AX,BC\n\t"
    "movw SP,AX\n\t"     // set SP to point to the top of the new task stack
    "movw DE,AX\n\t"     // DE points to the top of the task stack
    "movw AX,#0x8600\n\t"
    "push AX\n\t"        // push PSW and higher PC (=0) onto the stack
    "movw AX,%[tsp]\n\t" // AX has the task entry address (16 bits)
    "push AX\n\t"        // push the address onto the stack
    :
    :[tsz]"i"(ULWOS_TASK_STACK_SIZE),[ct]"m"(ulwos_current_task),[ts]"i"(ulwos_task_stack),[tsp]"m"(ulwos_tempSP)
  );
  asm volatile (
    "mov X,%[ct]\n\t"    // X = ulwos_current_task
    "clrb A\n\t"         // A = 0 (AX = ulwos_current_task)
    "shlw AX,1\n\t"      // AX = ulwos_current_task*2
    "movw BC,%[tsp]\n\t" // BC = ulwos_taskSP
    "addw AX,BC\n\t"     // AX = ulwos_taskSP+(ulwos_current_task*2) => ulwos_taskSP[ulwos_current_task]
    "movw DE,AX\n\t"     // DE = ulwos_taskSP[ulwos_current_task]
    "movw AX,SP\n\t"     // AX = task's SP
    "movw [DE],AX\n\t"   // ulwos_task_SP[ulwos_current_task] = task's SP
    "movw AX,HL\n\t"
    "movw SP,AX\n\t"     // restore previous SP
    "sel RB0\n\t"
    :
    :[tsp]"i"(ulwos_taskSP),[ct]"m"(ulwos_current_task)
  );
  return ulwos_current_task;
}
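The frame built by ulwos_create_task() can be checked with a host-side model. This hypothetical sketch assumes the RL78 interrupt/RETI frame layout (PC bits 7-0 at the lowest address, then PC bits 15-8, PC bits 19-16 and PSW) and treats one task’s stack as a 128-byte array. In that model, the top computed by the multiply loop (ULWOS_TASK_STACK_SIZE * (ulwos_current_task + 1) bytes past the start of ulwos_task_stack) is simply the end of the array.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (hypothetical names) of the initial task frame. */
#define STACK_SIZE 128                   /* ULWOS_TASK_STACK_SIZE default */

static uint8_t  task_stack[STACK_SIZE];  /* one task's stack */
static unsigned sp;                      /* byte index into task_stack[] */

static void push16(uint16_t v) {         /* PUSH rp: high at SP-1, low at SP-2 */
    assert(sp >= 2);
    task_stack[sp - 1] = (uint8_t)(v >> 8);
    task_stack[sp - 2] = (uint8_t)(v & 0xFFu);
    sp -= 2;
}

/* Mirrors the asm: push PSW (A) and PC bits 19-16 = 0 (X), then the
   16-bit entry address. Returns the SP value kept in ulwos_taskSP[]. */
static unsigned build_initial_frame(uint16_t entry, uint8_t psw) {
    sp = STACK_SIZE;                     /* top of this task's stack */
    push16((uint16_t)(psw << 8));        /* movw AX,#0x8600 ; push AX */
    push16(entry);                       /* push the task entry address */
    return sp;
}

/* RETI: PC7-0 <- (SP), PC15-8 <- (SP+1), PC19-16 <- (SP+2),
   PSW <- (SP+3), SP += 4. Returns the 20-bit PC. */
static uint32_t reti(uint8_t *psw_out) {
    assert(sp + 4 <= STACK_SIZE);
    uint32_t pc = task_stack[sp]
                | ((uint32_t)task_stack[sp + 1] << 8)
                | ((uint32_t)(task_stack[sp + 2] & 0x0Fu) << 16);
    *psw_out = task_stack[sp + 3];
    sp += 4;
    return pc;
}
```

Building a frame for entry address 0x1234 leaves SP four bytes below the top, and the emulated RETI then yields PC = 0x01234 and PSW = 0x86, which is exactly what ulwos_start relies on.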

Now that we have a working task scheduler and a way to create tasks, all we need to do is start the system!

That means setting up the interval timer (IT), enabling its interrupt, setting SP so that it points to the top of the first task’s stack, and then executing a return-from-interrupt (RETI) instruction!

The reader might ask: what? How can we return from an interrupt if we are not in an actual ISR? The trick is that RETI makes the CPU pop PC and PSW from the stack (the very values we stored when we created the task), so the code flow branches to the task’s entry point and interrupts are enabled at the same time! Magic? No! Cheating? Maybe! We just used the CPU programming model in an unusual way. Did you notice there is no EI instruction in the code? That is because when RETI executes, PSW is loaded with the value we stored earlier (0x86), which enables interrupts without any other instruction!

void inline ulwos_start(void){
  ulwos_current_task = 0;
  OSMC = bWUTMMCK0;    // sets LOCO (15kHz) as IT/RTC clock source
  RTCEN = 1;           // enables RTC and IT modules
  ITMC = bRINTE | 14;  // IT enabled, interval = 1ms
  ITMK = 0;            // IT interrupt enabled
  asm volatile (
    "sel RB3\n\t"        // select bank 3
    "movw AX,%[tsp]\n\t" // AX = ulwos_taskSP[0]
    "movw SP,AX\n\t"     // SP = AX = ulwos_taskSP[0]
    "reti\n\t"           // restores PSW and branches to the first task
    "nop\n\t"            // this is here just for alignment (needed by the simulator)
    :
    :[tsp]"m"(ulwos_taskSP)
  );
}
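Where does the 14 in ITMC = bRINTE | 14 come from? Assuming, as described for the RL78/G13 interval timer, that the timer fires every (ITCMP + 1) periods of its count clock and that ITCMP is a 12-bit field, a 1 ms tick from the 15 kHz LOCO needs 15 counts, so ITCMP = 14. The hypothetical helper below does the same arithmetic for other intervals or clocks.

```c
#include <assert.h>
#include <stdint.h>

#define ITCMP_MAX 0x0FFFu   /* 12-bit compare field (assumption, see above) */

/* Returns the ITCMP value that gives 'interval_us' microseconds with a
   count clock of 'clock_hz', or -1 if the interval cannot be produced.
   Intervals that are not a whole number of counts are truncated. */
static int itcmp_for_interval(uint32_t clock_hz, uint32_t interval_us) {
    assert(clock_hz > 0);
    uint64_t counts = ((uint64_t)clock_hz * interval_us) / 1000000u;
    if (counts == 0 || counts - 1 > ITCMP_MAX)
        return -1;                       /* too short or too long for this clock */
    return (int)(counts - 1);
}
```

With the 15 kHz clock a 10 ms tick would need ITCMP = 149, while anything shorter than about 67 µs (one count) or longer than 4096 counts is out of reach.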

All these functions are in the ulwos.c file. The ulwos.h header file holds some basic settings for our task scheduler: ULWOS_TASK_STACK_SIZE defines the size (in bytes) of each task stack (the default is 128 bytes) and ULWOS_NUM_TASKS defines the maximum number of tasks. It is very important to set these parameters according to your system and application needs.

While the maximum number of tasks is easy to define, choosing the stack size for each task is far more complex, as it depends on the task’s complexity, the number and size of its local variables, the depth of function calls within the task, and so on. It is possible to add a small mechanism to detect stack overflow, but that is not our goal for now.

Examples

Now that we have seen all the code for this stage of ULWOS, let’s look at two simple examples that exercise our small task scheduler and operating system aspirant!

The first one is a simple dual LED blinker, with each LED controlled by an independent task. This example can be tested on the E2Studio Simulator or on the YRPBRL78G13 board (or on any other board, as long as two LEDs are connected to pins P76 and P77, or the code is changed accordingly).

Notice that the tasks (task1 and task2) are ordinary functions declared with the noreturn attribute. This tells the compiler it does not need to generate return code for them, slightly reducing code size!

The whole project for E2Studio and GCC can be downloaded from GitHub.

#include "iodefine.h"
#include "iodefine_ext.h"
#include "myrl78.h"
#include "interrupt_handlers.h"
#include "ulwos.h"

#define LED P7_bit.no7
#define LED2 P7_bit.no6

void __attribute__ ((noreturn)) task1(void){
  volatile long int count;
  while (1){
    LED = 0;
    for (count=0; count<100000;count++);
    LED = 1;
    for (count=0; count<100000;count++);
  }
}

void __attribute__ ((noreturn)) task2(void){
  volatile long int count;
  while (1){
    LED2 = 0;
    for (count=0; count<200000;count++);
    LED2 = 1;
    for (count=0; count<200000;count++);
  }
}

int main(void){
  PM7_bit.no7 = 0;
  PM7_bit.no6 = 0;
  LED = 0;
  ULWOS_TASKHANDLER tk1 = ulwos_create_task(&task1);
  ULWOS_TASKHANDLER tk2 = ulwos_create_task(&task2);
  ulwos_start();
  // the following code is never reached
  while (1);
}

The second example is a bit more sophisticated and was designed for the RL78/G13 Starter Kit (RSK). It creates two independent tasks: one blinks an LED wired to P63 and the other displays an incrementing count on the board’s LCD. The whole project for E2Studio and GCC can be downloaded from GitHub.

#include "iodefine.h"
#include "iodefine_ext.h"
#include "myrl78.h"
#include "interrupt_handlers.h"
#include "ulwos.h"
#include "lcd_8x2.h"

#define LED P6_bit.no3
#define LED_DIR PM6_bit.no3

void __attribute__ ((noreturn)) task1(void){
  volatile unsigned long count;
  LED_DIR = 0;
  while (1){
    LED = 0;
    for (count=0; count<100000;count++);
    LED = 1;
    for (count=0; count<100000;count++);
  }
}

void __attribute__ ((noreturn)) task2(void){
  volatile unsigned char aux;
  volatile unsigned long temp;
  LCD_init(DISPLAY_8x5|_2LINES,DISPLAY_ON|CURSOR_OFF|CURSOR_FIXED);
  while (1){
    LCD_write_char('\f');   // clear display
    LCD_pos_xy(0,0);
    LCD_write_string("Testing");
    for (aux=0;aux<100;aux++){
      LCD_pos_xy(0,1);
      LCD_print_char(aux);
      for (temp=0;temp<500000;temp++);
    }
  }
}

int main(void){
  ULWOS_TASKHANDLER tk1 = ulwos_create_task(&task1);
  ULWOS_TASKHANDLER tk2 = ulwos_create_task(&task2);
  ulwos_start();
  // the following code is never reached
  while (1);
}

Before closing this article, here are some important notes:

  1. In a multitasking environment there is always an issue with shared resources: imagine task 1 is reading a 32-bit variable (named X), gets suspended halfway through, and task 2 then changes the contents of X. What would happen? The value task 1 reads from X would probably be corrupted (part old, part new), which could lead to a larger failure. This issue is usually solved by using semaphores or by disabling preemption while critical code executes. It is a very common problem in multitasking systems, and care should be taken to prevent race conditions that could corrupt memory or I/O;
  2. Interrupts with multiple priority levels should be avoided at all costs, as nesting them can easily lead to a stack overflow;
  3. At the current ULWOS stage a task must never end or return! Remember: ULWOS tasks are functions, but they are never called by other code, so there is no return address on their stack. If a task reaches its end and returns, the stack underflows and system behavior becomes unpredictable! So always write the task as an infinite loop (while(1)) or place an infinite loop at the end of the task.

That’s all for now! I hope this article is useful for those who want to play with a small, free task scheduler. I have more features ready for ULWOS, such as an idle task with low-power mode, an idle counter, task status and semaphores. See you soon!

P.S.: I would like to thank William Severino for helping me with some translation issues! Thank you, Will!
