C++ modelling of a custom RISC-V core
By Lucien Gheerbrant
Note
I have truncated a big part of my work. A lot of what I worked on was proprietary design owned by PROPHESEE. I only kept the part of the design that was open source and already public here, to avoid any problem.
Introduction
During an internship at PROPHESEE, I had the opportunity to work on a fascinating project involving the simulation of a RISC-V based system using the gem5 simulator. This article focuses on the C++ modelling aspects of the project, highlighting the challenges and achievements in creating a simulation model for an event-based camera system.
The primary goal of my internship was to simulate the SCR1 microcontroller for an event-based camera using gem5. This involved creating detailed models of the system’s components, including the SCR1 processor, memory, and peripherals like timers and a MailBox to use with FreeRTOS.
The SCR1 Microcontroller
The SCR1 by Syntacore is an open-source microcontroller. It is built on RV32I, a subset of the 32-bit RISC-V architecture that comprises 47 instructions supporting arithmetic, logical, load, and store operations. The SCR1 is designed for embedded systems and Internet of Things (IoT) applications, and is characterised by its energy efficiency and minimalistic design.
The SCR1, as depicted below, employs a pipeline of two to four stages. It features an Integer Processing Unit (IPU) for instruction decoding and execution, along with a memory interface. The memory interface can be configured in the RTL files to use either the Advanced eXtensible Interface (AXI) or the Advanced High-performance Bus (AHB) protocol. Additionally, the core includes an optional cache controller to optimise data access.
The processor also includes an Integrated Programmable Interrupt Controller (IPIC), which provides a rapid response to events from peripherals external to the core. Internally, the SCR1 has sources of interrupts, such as the timer, which operates according to RISC-V standards and is directly connected to the Interrupt Controller (IT CTRL).
The IPIC differs from the IT CTRL: the IT CTRL manages all interrupt codes, both internal and external, but cannot differentiate between the various external sources. The RISC-V architecture defines only one code for external interrupts, so the IPIC must handle the multiple external interrupt routines and prioritise them accordingly.
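To make this division of labour concrete, here is a toy Python model of the idea, not the SCR1 RTL or the gem5 model. The class name `ToyIPIC`, the 16-line count, and the "lowest line number wins" priority rule are assumptions made for illustration; the only fixed fact is that RISC-V reports every machine external interrupt under exception code 11.

```python
MACHINE_EXTERNAL_INTERRUPT = 11  # mcause code for machine external interrupts


class ToyIPIC:
    """Toy model: many external lines funnel into one RISC-V interrupt code."""

    def __init__(self, num_lines=16):  # line count is an assumption
        self.enabled = [False] * num_lines
        self.pending = [False] * num_lines

    def enable(self, line):
        self.enabled[line] = True

    def raise_line(self, line):
        self.pending[line] = True

    def acknowledge(self, line):
        self.pending[line] = False

    def next_line(self):
        """Return the pending, enabled line to serve next, or None.

        Assumed priority rule: the lowest line number wins.
        """
        for line in range(len(self.pending)):
            if self.enabled[line] and self.pending[line]:
                return line
        return None

    def cause(self):
        """What the core sees: a single code, whatever the source line."""
        if self.next_line() is None:
            return None
        return MACHINE_EXTERNAL_INTERRUPT


ipic = ToyIPIC()
ipic.enable(2)
ipic.enable(5)
ipic.raise_line(5)
ipic.raise_line(2)
ipic.raise_line(7)  # line 7 is not enabled, so it is never selected
print(ipic.cause(), ipic.next_line())
```

The core only ever learns that "an external interrupt" is pending; finding out which line to serve is the interrupt controller's job.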
Lastly, the SCR1 includes a dedicated debug block and optionally uses a Tightly Coupled Memory (TCM) architecture. TCM integrates the Data Memory (DMEM) and Instruction Memory (IMEM) into a single unit close to the processor, reducing latency in accessing the main memory in systems using SRAM, effectively mimicking cache functionality.
gem5
The gem5 simulator
The gem5 simulator is an open-source project, maintained and used by numerous individuals and entities, including major tech companies like Google and ARM, as well as prominent universities worldwide. The project is continuously evolving, with its feature set expanding over the years. The gem5 simulator lets you model, in C++, computer system components such as processors, memory, and buses, as well as the interactions between them. Each component can be represented with varying levels of detail, offering a wide range of architectural choices.
Using gem5
To understand how gem5 is used, let’s simplify the process of creating a model within this simulator.
The gem5 simulator primarily uses C++, with some Python for configuration. It is object-oriented, with the C++ code consisting of a series of classes. These classes represent both abstract objects, such as the system clock frequency or instruction set architecture, and concrete objects like DDR3 RAM or a processor. These objects are known as SimObjects. The Python code acts as configuration files, linking and connecting the various C++ objects and initiating the simulation.
The first step in gem5 is to create the “system,” an object that defines the simulation environment:
=
Next, you can add other objects as attributes to the system. For example, let’s add a clock frequency:
=
=
This clock frequency will be used by other simulated objects. A processor model named `RiscvMinorCpu`, which uses the RISC-V instruction set, is an example:
=
Now, let’s connect a memory bus. The gem5 project overloads attribute assignment (the `=` operator) on its objects to connect ports:
=
=
=
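Put together, the steps above amount to a configuration fragment along the following lines. This is a minimal sketch built only from gem5's stock classes, not the project's actual script: the 1 GHz clock is an assumed value, the stock CPU class is spelled `RiscvMinorCPU` in recent gem5 releases, and the `cpu_side_ports` port name also assumes a recent release. The fragment is meant to be run by a gem5 binary built for RISC-V, not by a plain Python interpreter.

```python
# gem5 configuration fragment (run by a gem5 binary, not by plain Python).
from m5.objects import (
    RiscvMinorCPU,
    SrcClockDomain,
    System,
    SystemXBar,
    VoltageDomain,
)

# The "system" object that holds the whole simulated machine.
system = System()

# Clock domain shared by the objects attached to the system.
system.clk_domain = SrcClockDomain()
system.clk_domain.clock = "1GHz"  # assumed value, for illustration only
system.clk_domain.voltage_domain = VoltageDomain()

# A Minor-based in-order CPU model using the RISC-V instruction set.
system.cpu = RiscvMinorCPU()

# Memory bus. Assigning one port to another connects the two ports.
system.membus = SystemXBar()
system.cpu.icache_port = system.membus.cpu_side_ports
system.cpu.dcache_port = system.membus.cpu_side_ports
```

A complete script would still need memory, a workload, and a `Root` object before calling `m5.instantiate()`.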
This model is illustrated below:
Modelling!
The gem5 simulator is designed for more generic computer systems. The processors it models often support multicore architectures, MMU, etc. It was necessary to modify the code to make it as close as possible to the MCU and remove complex functionalities from the “ready-to-use” models written in the gem5 project.
The core model is derived from the “Minor” CPU model. Minor is a processor model with configurable execution behaviour. For example, you can modify the execution delay of an integer-processing instruction or memory write.
To better model the SCR1, several changes were applied, such as using the RV32I instruction set instead of RV64. Several processing units modelled in gem5’s Minor processor model but not present in the SCR1 were removed. These units handled floating-point operations, vectors, etc. The figure below shows the minimised instruction set of the SCR1 compared to the standard RV32 and RV64:
The simulator allows configuring numerous parameters in the model to achieve performance close to the real system. For example, the SCR1 has a latency of 34 cycles during the execution of an integer division. To reflect this in the model, it is sufficient to configure an attribute of a given Python class in the gem5 library. This class represents the integer division processing unit:
= 34
`SCR1IntDivFU` will be reused in the core class definition, allowing the simulator to know how long a division should take.
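As a hedged illustration of what such a class can look like, the fragment below subclasses gem5's stock `MinorDefaultIntDivFU` and sets its `opLat` parameter. The base class, the choice of `opLat` as the attribute, the list of functional units, and the names `SCR1FUPool` and `SCR1CPU` are assumptions; only `SCR1IntDivFU` and the 34-cycle figure come from the text. Like any gem5 configuration, it must be run by a gem5 binary.

```python
# gem5 configuration fragment (run by a gem5 binary, not by plain Python).
from m5.objects import (
    MinorDefaultIntDivFU,
    MinorDefaultIntFU,
    MinorDefaultIntMulFU,
    MinorDefaultMemFU,
    MinorDefaultMiscFU,
    MinorFUPool,
    RiscvMinorCPU,
)


class SCR1IntDivFU(MinorDefaultIntDivFU):
    # Latency, in cycles, of an integer division on this functional unit.
    opLat = 34


class SCR1FUPool(MinorFUPool):
    # Hypothetical pool: integer, multiply, divide, memory and misc units
    # only, with no floating-point or vector units.
    funcUnits = [
        MinorDefaultIntFU(),
        MinorDefaultIntMulFU(),
        SCR1IntDivFU(),
        MinorDefaultMemFU(),
        MinorDefaultMiscFU(),
    ]


class SCR1CPU(RiscvMinorCPU):
    # Hypothetical core class reusing the pool defined above.
    executeFuncUnits = SCR1FUPool()
```

Keeping the latency in a small subclass means the timing can be tuned without touching the C++ pipeline model.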
IPIC and the CSR
To model these modules, it was necessary to create `SimObjects`, some of which were already implemented in the gem5 library. The technical specifications of the SCR1 describe registers dedicated to the IPIC in the range `[0xBF0:0xBF7]`.
This range does not correspond to a usual memory range, but to a zone within the Control and Status Registers (CSR). The CSRs are special registers in the RISC-V architecture, defined in the Zicsr extension. They are internal registers that control interrupts, exceptions, and other crucial aspects of the processor’s state and operation. They are already modelled in the gem5 simulator, but in the standard way described in the RISC-V manual.
Address | Name |
---|---|
0x7E0 | MCOUNTEN |
0xBF0..0xBF7 | IPIC Registers |
Although the “standard” CSRs were already modelled, the SCR1 processor contains its own registers dedicated to the IPIC. The gem5 models do not account for them because they are a non-standard implementation, specific to the SCR1. Fortunately, thanks to the open-source philosophy of gem5, finding the commit that adds the code modelling the CSRs was quite simple. A class `SCR1ISA` was created to model this modification of the RISC-V instruction set. This class inherits from the `RiscvISA` class provided by gem5 and adds the logic related to the IPIC. The IPIC registers contain the parameters of the external interrupts, such as activation on the rising or falling edge, or the activation of the interrupt itself.
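The inheritance pattern can be sketched in a few lines of plain Python. This is an illustration of the idea only, not gem5's C++ code: `StandardCsrFile` and `Scr1CsrFile` are hypothetical stand-ins for the stock ISA class and its SCR1 subclass, and the registers are modelled as plain 32-bit values with no side effects.

```python
IPIC_CSR_FIRST = 0xBF0
IPIC_CSR_LAST = 0xBF7
MSTATUS = 0x300  # standard RISC-V machine status CSR


class StandardCsrFile:
    """Stand-in for the stock CSR model: a plain address -> value map."""

    def __init__(self):
        self.regs = {}

    def read(self, addr):
        return self.regs.get(addr, 0)

    def write(self, addr, value):
        self.regs[addr] = value & 0xFFFFFFFF


class Scr1CsrFile(StandardCsrFile):
    """Adds the IPIC window on top of the standard CSRs."""

    def __init__(self):
        super().__init__()
        self.ipic = [0] * (IPIC_CSR_LAST - IPIC_CSR_FIRST + 1)

    def read(self, addr):
        if IPIC_CSR_FIRST <= addr <= IPIC_CSR_LAST:
            return self.ipic[addr - IPIC_CSR_FIRST]
        return super().read(addr)

    def write(self, addr, value):
        if IPIC_CSR_FIRST <= addr <= IPIC_CSR_LAST:
            self.ipic[addr - IPIC_CSR_FIRST] = value & 0xFFFFFFFF
        else:
            super().write(addr, value)


csr = Scr1CsrFile()
csr.write(0xBF2, 0x3)    # lands in the IPIC window
csr.write(MSTATUS, 0x8)  # falls through to the standard model
print(csr.ipic, hex(csr.read(MSTATUS)))
```

Only accesses inside the IPIC range are intercepted; everything else keeps the behaviour of the parent class, which is what makes subclassing attractive here.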
Header file of the custom RISC-V ISA
// namespace gem5
/* DEV_SCR1_SCR1_ISA_HH */
C++ definitions of the custom RISC-V ISA
// namespace gem5
These registers made it possible to solve the problem described in the specifications of part 2. The model can now execute the “Hello World” program.
;
Memory and memory bus
The SCR1 does not have a cache. It only includes DMEM and IMEM, which are directly part of the SRAM, and there is no additional layer such as an MMU. gem5, however, requires a cache hierarchy to be declared, even though the model does not actually have one. This might seem confusing, but it is a side effect of gem5’s preference for generic systems, most of which have a cache hierarchy with L1, L2, etc. Instead of a “real” cache, the model uses the `SCR1NoCache` `SimObject`. This is a memory bus that connects the core and memory to retrieve, load, and write directly to the various registers of the SCR1. `SCR1NoCache` is a subclass of `NoCache` in gem5. Some functionalities have been modified to match the SCR1, such as the removal of the MMU. Additionally, if the bus receives a request to read or write a non-modelled address, it reads or writes a null value. This allows testing numerous functionalities without worrying about having an ideal model.
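The fallback for non-modelled addresses can be illustrated with a toy bus in plain Python. This is a sketch of the behaviour only, not gem5 code: the class name `ToyNoCacheBus`, the region size, and the reading of "null value" as "reads return 0, writes are dropped" are assumptions.

```python
class ToyNoCacheBus:
    """Toy bus: mapped regions behave like memory, the rest reads as zero."""

    def __init__(self):
        self._regions = []  # list of (base address, backing bytearray)

    def map_region(self, base, size):
        self._regions.append((base, bytearray(size)))

    def _find(self, addr, length):
        """Return (backing store, offset) for an access, or (None, 0)."""
        for base, data in self._regions:
            if base <= addr and addr + length <= base + len(data):
                return data, addr - base
        return None, 0

    def read32(self, addr):
        data, offset = self._find(addr, 4)
        if data is None:
            return 0  # non-modelled address: read a null value
        return int.from_bytes(data[offset:offset + 4], "little")

    def write32(self, addr, value):
        data, offset = self._find(addr, 4)
        if data is None:
            return  # non-modelled address: the write is silently dropped
        data[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


bus = ToyNoCacheBus()
bus.map_region(0x200000, 0x1000)  # region size is an arbitrary example
bus.write32(0x200010, 0xDEADBEEF)
bus.write32(0x90000000, 0x1234)   # nothing is mapped here
print(hex(bus.read32(0x200010)), bus.read32(0x90000000))
```

The trade-off is that a firmware bug touching a wrong address goes unnoticed, which is acceptable while the model is still incomplete but worth remembering when interpreting results.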
# […]
= # […]
Timers
The table below shows the registers used by a timer as defined in the RISC-V architecture of the SCR1.
Address | Name |
---|---|
TIMER_BASE + 0x00 | TIMER_CTRL |
TIMER_BASE + 0x04 | TIMER_DIV |
TIMER_BASE + 0x08 | MTIME |
TIMER_BASE + 0x0C | MTIMEH |
TIMER_BASE + 0x10 | MTIMECMP |
TIMER_BASE + 0x14 | MTIMECMPH |
However, the timer requires more detailed modelling. The `MTIME`/`MTIMEH` registers hold the timer’s counter value, which is incremented with each clock cycle. The `TIMER_CTRL` register contains a bit that enables or disables this increment. `TIMER_DIV` allows the counter to be incremented at a slower rate than the clock. Finally, the `MTIMECMP` and `MTIMECMPH` registers hold the compare value: when the combined value of `MTIME` and `MTIMEH` reaches or exceeds it, the timer sends a signal to the IT CTRL.
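The register behaviour described here fits in a small Python model. It is a sketch under stated assumptions, not the project's C++ SimObject: the class name `ToyTimer` is hypothetical, the divider is assumed to produce one increment every `TIMER_DIV + 1` clock cycles, and the comparison uses `>=`, following the RISC-V privileged specification.

```python
MASK64 = (1 << 64) - 1

# Register offsets from TIMER_BASE, as listed in the table above.
MTIME, MTIMEH = 0x08, 0x0C


class ToyTimer:
    def __init__(self):
        self.enabled = False    # stands for the enable bit of TIMER_CTRL
        self.div = 0            # TIMER_DIV
        self.mtime = 0          # 64-bit counter, read as MTIMEH:MTIME
        self.mtimecmp = MASK64  # 64-bit compare, MTIMECMPH:MTIMECMP
        self._prescaler = 0

    def tick(self):
        """Advance the model by one clock cycle."""
        if not self.enabled:
            return
        self._prescaler += 1
        if self._prescaler > self.div:  # assumed: one step per div + 1 cycles
            self._prescaler = 0
            self.mtime = (self.mtime + 1) & MASK64

    def read32(self, offset):
        """32-bit register read of the counter halves."""
        if offset == MTIME:
            return self.mtime & 0xFFFFFFFF
        if offset == MTIMEH:
            return self.mtime >> 32
        raise ValueError(f"register at offset {offset:#x} not modelled here")

    def interrupt_pending(self):
        return self.mtime >= self.mtimecmp


timer = ToyTimer()
timer.enabled = True
timer.div = 1       # count every second clock cycle
timer.mtimecmp = 3
for _ in range(6):
    timer.tick()
print(timer.read32(MTIME), timer.interrupt_pending())
```

Splitting the 64-bit counter into two 32-bit reads mirrors how a 32-bit core has to access it, one half at a time.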
The modelling of these timers allowed the execution of another test firmware: a program named “PWM”. This program uses the timer increments to generate a PWM signal by periodically writing to a register in the MailBox.
Connecting everything
We have the blocks; now we need to solder them together:
# Creating the system where the model will run in
=
=
=
=
# CPU
=
# Memory Bus
=
=
=
=
# Connecting the CPU and the memory bus
=
=
# Memory
=
=
# Connecting the memory parts to the memory bus
= .
=
# IPIC
=
# Timers
=
=
# Mailbox
=
=
# System port
=
# Setup the workload to run
=
=
=
=
# Launching the simulation
=
=
Results
To ensure the model functions correctly, it is necessary to run gem5 with the configuration file `scr1-fs.py`. This file initialises and connects all the elements required for the simulation of the SCR1. Among these elements, the binary file that the processor model should execute is essential. For the `hello_world` program, the following function is used:
1 void
To simulate the SCR1 executing `hello_world`, you must enter the following command:
The debug flags let you see the RISC-V instructions as they are executed. The following result is obtained:
system.cpu: 0x200200 @start : csrrci zero, mstatus8
system.cpu: 0x200204 @start+4 : csrrwi zero, mie0
system.cpu: 0x200208 @start+8 : lui a0, 512
system.cpu: 0x20020c @start+12 : jalr zero, 528(a0)
system.cpu: 0x200210 @start_real : auipc gp, 256
system.cpu: 0x200214 @start_real+4 : addi gp, gp, 1520
[...]
system.cpu: 0x200300 @main : addi sp, sp, -32
system.cpu: 0x200304 @main+4 : lui a5, 513
[...]
system.cpu: 0x2003c4 @print_hello_world : lui a0, 2749
system.cpu: 0x2003c8 @print_hello_world+4 : addi a0, a0, -529
system.cpu: 0x2003cc @print_hello_world+8 : jal zero, 4892
system.cpu: 0x2016e8 @set_mbx_status_ptr : addi a1, a0, 0
Here, we see the model successfully executing `hello_world`.
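The immediates in the trace above can be checked by hand. The short Python sketch below re-implements the arithmetic of `lui`, `addi`, and `jalr` as defined by the RISC-V base ISA (the helper names are illustrative) and applies it to two instruction pairs from the trace: the jump from `start` to `start_real`, and the constant built in `print_hello_world`.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20):
    """lui rd, imm: place the 20-bit immediate in the upper bits of rd."""
    return (imm20 << 12) & MASK32


def addi(value, imm12):
    """addi rd, rs1, imm: add a sign-extended 12-bit immediate."""
    return (value + imm12) & MASK32


def jalr_target(base, offset):
    """jalr rd, offset(rs1): target is base + offset with bit 0 cleared."""
    return (base + offset) & MASK32 & ~1


# start+8 / start+12: lui a0, 512 ; jalr zero, 528(a0)
entry = jalr_target(lui(512), 528)

# print_hello_world: lui a0, 2749 ; addi a0, a0, -529
a0 = addi(lui(2749), -529)

print(hex(entry), hex(a0))  # prints: 0x200210 0xabcdef
```

The first value is the address at which the trace shows `start_real`, and the second is the constant that the mailbox log reports being written into `STATUS_PTR`.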
The interrupts are also functional:
system.mailbox: Write 0xabcdef into STATUS_PTR
system.mailbox: Posting interrupt
system.cpu.interrupts: Interrupt 3:0 posted
Finally, to gauge the model’s performance, it is interesting to compare the simulated time with the time elapsed on the host machine during the simulation. In the run below, one simulated second took 28.54 s on the host, a slowdown of roughly 28.5 times:
Simulated Time | Elapsed Time on Host |
---|---|
1 s | 28.54 s |