System on Chip Broadcom BCM2835 (BCM2836, BCM2837) in Raspberry Pi
2017-07-19   Piotr Romaniuk, Ph.D.
Architecture in brief,
VPU and some history,
QPU and 3D pipeline,
Booting & bare metal,
The Raspberry Pi board is built around an interesting system on chip (SoC) manufactured by Broadcom.
Consecutive versions of the board use very similar versions of the chip, marked BCM2835, BCM2836 and BCM2837.
The chips are multimedia processors, i.e. they contain an application processor and a lot of
hardware that supports multimedia processing (e.g. video compression and decompression). These Broadcom chips are part of the VideoCore IV family.
Unfortunately, the manufacturer does not provide detailed documentation of the chips (i.e. BCM283x), so searching for the chip symbol on the Broadcom site ends with the message "No results found".
On buying a Raspberry Pi, the user gets a working Linux on the ARM core (the application processor in BCM2835) and a very mysterious, multistage booting process, in which binary parts are loaded from the SD card and executed before the system kernel is started.
More information about the chip can be found on the Internet. The research resembles
an archaeological dig and putting together the recovered pieces of a broken ancient vase.
Nevertheless, it is an opportunity to learn more about the chip.
Without any doubt, the best site on the subject was created by a programmer who uses the nickname Herman H. Hermitage. He was very determined and did a lot to
make information about the chip available.
His GitHub profile page shows a genuine photo of Konrad Zuse, a real person: a German inventor and computer scientist active during World War II.
Interestingly, the programmer has created "his" profile on LinkedIn as well.
This page was compiled from information available on the Internet.
The information and links provided here may be useful for implementing custom
boot tasks (e.g. when some actions must be executed quickly, before Linux has started).
Another area of application is GPGPU, i.e. using the graphics processor to accelerate computation.
It is also possible to start another microkernel on the VPU for a specific purpose (this
already happens: the VPU runs a ThreadX application that controls the graphics part).
Architecture in brief
The architecture of the chip is illustrated by the following diagram:
Figure 1. Architecture of BCM283x chip
- CPU - Host processor - main, application processor (ARM architecture) that runs Linux
- VPU x2 - dual core vector processor for multimedia processing
- QPU x4x4 - a set of cores that form part of the 3D graphics pipeline
- Peripherals - external interfaces such as GPIO, SPI, UART, PWM etc.
- LPDDR2 - SDRAM memory controller
- DMA - DMA controller
- ISP - camera and lens correction
- Cache L2 - cache memory, level 2
- AXI/ARB System Bus - the bus that interconnects all modules of the chip
VPU and some history
The BCM283x chip is a multimedia processor. It contains the VPU, a dual-core processor that
seems to be undocumented. The VPU is important because the booting process starts on it; at this stage the VPU
is responsible for SDRAM initialisation and for starting the CPU (the ARM processor).
Later, while the system is running, the VPU is used for multimedia processing (including video).
The chip architecture originates from a design proposed by Alphamosaic Ltd, which was
acquired by Broadcom in 2004.
The authors of the VPU architecture patented many of the solutions applied in this core
(as "H. Hermitage" noticed), hence details of the architecture can be found in the patents.
Figure 2. Single core architecture of VPU - this figure is obtained from patent US7043618
That source, together with some experiments, led a group of interested programmers to
develop basic tools (GNU C compiler, assembler, etc.) and a good
description of the VPU:
QPU and graphic 3D pipeline
The best documented part of the chip is its 3D graphics module. The manufacturer has published documentation that covers the graphics pipeline,
a detailed description of the QPU core, and many useful aspects of its use:
Besides its main role, i.e. 3D graphics processing, this part of the chip can be
used for extra parallel computations that run independently of the CPU and can offload it.
More information can be found on the following sites:
- "Hacking the GPU for Fun and Profit" - presents the basics of QPU programming, starting with a minimal hello_world example and ending with a program that calculates SHA-256
(source code). It includes an unfinished assembler.
- FFT calculation - full source code that illustrates advanced techniques of QPU programming:
how to start a QPU program from C, passing data between the CPU and the QPUs,
core synchronisation, multicore programs, calling subroutines and pipeline effects.
The sources are present in Raspbian in /opt/vc/src/hello_pi/hello_fft and are part
of the firmware repository.
The shaders (QPU programs) are provided in both binary and source form.
To build them, VC4ASM should be used. The binary code
from the firmware and the code produced by the 0.2.3 macroassembler differ slightly in a way that does not affect execution:
registers are cleared by a different opcode.
- VC4ASM (by Marcel Muller) - the best macroassembler for QPU (github project). Building it under Raspbian on a Raspberry Pi is trivial; under Windows it requires some tuning of the source and configuration (it can be built under MinGW or Cygwin). Building on a Linux distribution other than Raspbian
requires a suitable C++ compiler, one that supports at least C++11.
- QPULib - an interesting definition of a
QPU language (by analogy to CUDA) together with its compiler. Everything is based
on new data types, C++ templates and compilation at run time. Compilation works by executing the kernel (a QPU kernel, not the Linux kernel) on the CPU with overloaded operators, so that
the abstract syntax tree built by the CPU compiler is replicated in local data structures and then compiled to QPU code.
- PyVideoCore (by Koichi Nakamura) - GPGPU library for Python
All BCM283x chips use some version of the ARM core to run Linux (see versions and differences).
A detailed description of each ARM version can be found on the ARM Infocenter site.
Additional information is provided on raspberrypi.org, including a description of interrupts and synchronisation of multiple ARM cores in BCM2836.
BCM283x contains multiple interfaces:
- serial ports: UART, SPI, I2C (called BSC, Broadcom Serial Controller, in the documentation)
- External MMC
A description of these peripherals can be found in the BCM2835 Peripherals document. It also contains information about
the interrupt controller and the corresponding address spaces of the VPU and the CPU (including physical and virtual ones).
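The relation between the two address spaces can be sketched in C. This is only a minimal illustration, assuming the mapping described in the BCM2835 Peripherals document (peripheral bus addresses 0x7Exxxxxx seen by the VPU appear at ARM physical addresses 0x20xxxxxx) and the commonly reported peripheral base 0x3F000000 for BCM2836 and BCM2837. All macro and function names below are hypothetical and do not come from any Broadcom header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: peripheral window as seen on the VPU bus. */
#define BUS_PERIPH_BASE          0x7E000000u
#define BUS_PERIPH_SIZE          0x01000000u   /* 16 MB window */

/* Assumed ARM physical bases of the same window. */
#define ARM_PERIPH_BASE_BCM2835  0x20000000u
#define ARM_PERIPH_BASE_BCM2836  0x3F000000u   /* also used for BCM2837 */

/* Translate a peripheral bus address into an ARM physical address.
 * Returns 0 on success, -1 if the address is outside the window. */
static inline int bus_to_arm_phys(uint32_t bus, uint32_t arm_periph_base,
                                  uint32_t *out)
{
    assert(out != NULL);
    if (bus < BUS_PERIPH_BASE || bus - BUS_PERIPH_BASE >= BUS_PERIPH_SIZE)
        return -1;
    *out = arm_periph_base + (bus - BUS_PERIPH_BASE);
    return 0;
}

/* SDRAM is visible on the bus under four aliases (0x0, 0x4, 0x8, 0xC in
 * the top nibble) that differ only in caching behaviour; clearing the top
 * two bits gives the underlying SDRAM offset. */
static inline uint32_t sdram_bus_to_phys(uint32_t bus)
{
    return bus & 0x3FFFFFFFu;
}
```

For example, the GPIO block documented at bus address 0x7E200000 would be accessed by ARM code at 0x20200000 on BCM2835 and at 0x3F200000 on BCM2836, which is why bare-metal code usually keeps the peripheral base as a single per-chip constant.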
Booting & Bare metal
Booting the chip starts with the first-stage bootloader located in the SoC ROM.
The code is executed by the VPU, while the CPU is held in reset (locked).
Without going into the details of multi-source booting (see the description on raspberrypi.org),
bootcode.bin is loaded into RAM (the L2 cache) and executed. This program is responsible for SDRAM initialisation, clock configuration and starting the CPU.
Next, start.elf is started; it reads config.txt and cmdline.txt and starts kernel.img (that is, it starts Linux).
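The chain described above can be summarised as a small table in C. This is only an illustrative model of the sequence as described in this article, not code that runs during boot. The type and function names are hypothetical, and the assumption that start.elf runs on the VPU follows from it being the VPU firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum processor { PROC_VPU, PROC_CPU };

struct boot_stage {
    const char *name;        /* stage or file name */
    enum processor runs_on;  /* which processor executes it */
    const char *role;        /* what the stage does, per the text above */
};

/* Stages in the order they are executed. */
static const struct boot_stage boot_chain[] = {
    { "ROM bootloader", PROC_VPU, "loads bootcode.bin into the L2 cache" },
    { "bootcode.bin",   PROC_VPU, "SDRAM init, clock configuration, starting the CPU" },
    { "start.elf",      PROC_VPU, "reads config.txt and cmdline.txt, starts kernel.img" },
    { "kernel.img",     PROC_CPU, "Linux kernel (or a bare-metal program)" },
};

/* Number of stages in the table. */
static inline size_t boot_stage_count(void)
{
    return sizeof boot_chain / sizeof boot_chain[0];
}

/* Look up a stage by name; returns NULL if it is not in the table. */
static inline const struct boot_stage *find_stage(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < boot_stage_count(); i++) {
        if (strcmp(boot_chain[i].name, name) == 0)
            return &boot_chain[i];
    }
    return NULL;
}
```

The table makes the key point visible: every stage up to and including start.elf runs on the VPU, and only kernel.img runs on the ARM CPU, so that file is the natural place to substitute a bare-metal program.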
Both bootcode.bin and start.elf are provided in binary form only.
Nevertheless, there have been attempts to replace them and execute custom code:
 Raspberry Pi repository - sources of Linux, binary firmware files, userspace libraries for QPU interfacing,
 RPi Hub - Raspberry Pi Wiki - a lot of information about Raspberry Pi board
 BCM2835 Datasheet Errata