LEON Floating-Point Unit
Overview
The daiteq FPU (daiFPU) is an IEEE Std 754-2019 compliant floating-point unit, designed primarily for LEON processors as a replacement for the former Meiko FPU. The daiFPU supports the binary64, binary32, and binary16 formats and their combinations, including full hardware support for subnormal numbers. The unit consists of a floating-point datapath and a floating-point controller. The datapath executes all floating-point arithmetic operations and format conversions. The controller manages data exchange between the LEON2 integer pipeline and the daiFPU, and also executes floating-point comparisons.
IEEE Std 754 format | Abbreviation | Precision [bits] | Partitioning (sign, exponent, fraction) |
---|---|---|---|
binary64 | DP | 53 | (1,11,52) |
binary32 | SP | 24 | (1,8,23) |
binary16 | HP | 11 | (1,5,10) |
N/A | PSP | 24 | ((1,8,23),(1,8,23)) |
N/A | PHP | 11 | ((1,5,10),(1,5,10)) |
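The partitioning triples in the table can be read as the bit widths of the sign, exponent, and fraction fields, with the precision being the fraction width plus one implicit leading bit. The following is a minimal sketch of that reading, using only Python's standard `struct` module on a host machine; the names `FORMATS`, `split_fields`, and `precision_bits` are illustrative and are not part of any daiFPU deliverable.

```python
import struct

# (sign, exponent, fraction) widths in bits, copied from the table above.
FORMATS = {
    "DP": (1, 11, 52),  # binary64
    "SP": (1, 8, 23),   # binary32
    "HP": (1, 5, 10),   # binary16
}

# struct codes: (floating-point code, same-width unsigned integer code).
STRUCT_CODES = {"DP": (">d", ">Q"), "SP": (">f", ">I"), "HP": (">e", ">H")}


def split_fields(value, fmt):
    """Encode value in the given format and return (sign, exponent, fraction)."""
    _, exp_w, frac_w = FORMATS[fmt]
    float_code, int_code = STRUCT_CODES[fmt]
    (bits,) = struct.unpack(int_code, struct.pack(float_code, value))
    fraction = bits & ((1 << frac_w) - 1)
    exponent = (bits >> frac_w) & ((1 << exp_w) - 1)
    sign = bits >> (frac_w + exp_w)
    return sign, exponent, fraction


def precision_bits(fmt):
    """Precision = stored fraction bits + 1 implicit leading bit."""
    return FORMATS[fmt][2] + 1
```

A subnormal value shows up as a zero exponent field with a non-zero fraction; for example, `split_fields(2.0 ** -24, "HP")` encodes the smallest positive binary16 subnormal.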
The daiFPU is designed to provide flexibility on the FPGA and ASIC technologies used in satellite navigation and image processing applications. Its key advantage is the ability to increase the functional density of the silicon used on board satellites, relative to the computations actually performed on board. The user parameterizes the FPU at synthesis time so that the application functions correctly while using no more resources than necessary. Classical FPUs, such as those used with LEON processors, are based on fixed data bus widths of 32 or 64 bits, often in situations where reduced precision (e.g. 16 bits) would be sufficient, and they include operations that the application may never use. With the daiFPU the user can select one of seven major configurations (shown in the table below) at synthesis time; these support individual floating-point formats, their combinations, or packed floating-point formats. For each major configuration the user can specify whether floating-point division and square root should be supported.
Implementation | DP | SP | HP | PSP | PHP |
---|---|---|---|---|---|
Two-precision configurations | | | | | |
DAIFPU-DUAL-DPSP | Y | Y | | | |
DAIFPU-DUAL-SPHP | | Y | Y | | |
One-precision configurations | | | | | |
DAIFPU-DP | Y | | | | |
DAIFPU-SP | | Y | | | |
DAIFPU-HP | | | Y | | |
Packed-word configurations | | | | | |
DAIFPU-PSP | | Y | | Y | |
DAIFPU-PHP | | | Y | | Y |
Packed operations are supported in some daiFPU configurations. They are defined for pairs of floating-point values stored either in a single register (two half-precision values in one single-precision floating-point register) or in a pair of consecutive registers (two single-precision values in an even-odd pair of single-precision registers). Besides common SIMD processing on pairs of values, new floating-point instructions have been implemented that support complex floating-point arithmetic on the packed formats. A simplified block diagram of the packed daiFPU is shown in the following figure.

Figure: One-precision, packed-format FPU, a configuration that supports packed and non-packed operations for one precision.
For packed-word operations the result is computed by performing the selected operation independently on the upper sub-words and on the lower sub-words. Exceptions and flags are computed as the logical OR of the exceptions and flags generated for the upper and lower sub-words.
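The lane-wise behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions: it models packed half-precision addition on a host using Python's `struct` module, the flag bit values are illustrative, underflow and signaling NaNs are not modeled, and the function names `hp_add` and `php_add` are hypothetical rather than daiFPU instruction names.

```python
import math
import struct

# Illustrative flag encodings (an assumption, not taken from the daiFPU documentation).
INVALID, OVERFLOW, INEXACT = 0x10, 0x08, 0x01


def hp_add(a_bits, b_bits):
    """Add two binary16 bit patterns; return (result_bits, flags)."""
    a = struct.unpack("<e", struct.pack("<H", a_bits))[0]
    b = struct.unpack("<e", struct.pack("<H", b_bits))[0]
    exact = a + b  # exact in binary64 for finite binary16 operands
    flags = 0
    if math.isnan(exact):
        if not (math.isnan(a) or math.isnan(b)):
            flags |= INVALID  # inf + (-inf)
        return 0x7E00, flags  # quiet NaN; signaling NaNs are not modeled
    try:
        packed = struct.pack("<e", exact)  # round to nearest, ties to even
    except OverflowError:
        # The rounded result does not fit in binary16: deliver a signed infinity.
        flags |= OVERFLOW | INEXACT
        packed = struct.pack("<e", math.copysign(math.inf, exact))
    else:
        if struct.unpack("<e", packed)[0] != exact:
            flags |= INEXACT
    return struct.unpack("<H", packed)[0], flags


def php_add(x_word, y_word):
    """Packed add on 32-bit words holding two binary16 lanes each."""
    hi, hi_flags = hp_add((x_word >> 16) & 0xFFFF, (y_word >> 16) & 0xFFFF)
    lo, lo_flags = hp_add(x_word & 0xFFFF, y_word & 0xFFFF)
    # The flags of the packed operation are the OR of the per-lane flags.
    return (hi << 16) | lo, hi_flags | lo_flags
```

For example, adding the words `0x3C007BFF` (lanes 1.0 and 65504) and `0x40007BFF` (lanes 2.0 and 65504) gives an exact 3.0 in the upper lane and an overflow in the lower lane; the combined flags report the overflow even though the upper lane raised nothing.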
Validation
Validation of the daiFPU has been performed in these steps:
- Validation of individual FPU modules and operations in self-checking stand-alone testbenches. Test vectors were generated using the TestFloat tool that has been developed and distributed by John Hauser.
- Validation of the LEON2 / FPU integration using a simple C program that applies a limited number of TestFloat vectors on the FPU inputs and compares the result with a reference result stored in the TestFloat vectors.
- Validation of the LEON2 / FPU integration using the paranoia program originally developed by Prof. Kahan.
- Validation of correct floating-point results computed in LEON2 with daiFPU by comparing them to results of a desktop execution of an identical C program.
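The vector-based steps above amount to a bit-exact comparison between the unit under test and a reference result. A minimal host-side sketch of such a check follows. It assumes each vector is a text line of hexadecimal fields (two binary64 operands, the expected result, and a flags field) and ignores the flags field; `host_add` is a stand-in for the unit under test, and all names are hypothetical. It is not the C program used in the validation described here.

```python
import struct


def f64_from_bits(bits):
    """Reinterpret a 64-bit integer as a binary64 value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def bits_from_f64(value):
    """Reinterpret a binary64 value as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def host_add(a_bits, b_bits):
    """Stand-in for the unit under test: add using the host's binary64 arithmetic."""
    return bits_from_f64(f64_from_bits(a_bits) + f64_from_bits(b_bits))


def check_vectors(lines, op):
    """Return a list of (line, actual_bits) for every vector whose result differs."""
    failures = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        a, b, expected = (int(field, 16) for field in fields[:3])
        actual = op(a, b)
        if actual != expected:  # bit-exact comparison; flags field is ignored here
            failures.append((line, actual))
    return failures


VECTORS = [
    "3FF0000000000000 4000000000000000 4008000000000000 00",  # 1.0 + 2.0 = 3.0
    "3FF0000000000000 3FF0000000000000 4000000000000000 00",  # 1.0 + 1.0 = 2.0
]
```

Comparing bit patterns rather than numeric values distinguishes +0 from -0 and catches wrong rounding in the last bit; NaN results may need a looser comparison, because payloads can legitimately differ between implementations.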
Availability
The daiFPU IP core is provided as synthesizable VHDL code or as an FPGA netlist. The IP core is available either separately or bundled with the LEON2-FT processor. Deliverables include:
- VHDL-RTL code or gate-level netlist,
- testing environment,
- simulation scripts,
- golden reference test vectors,
- synthesis scripts,
- user documentation.
The IP core is guaranteed against defects for ninety days from the date of purchase. Thirty days of technical support over email and phone is included. Additional support and maintenance options are available.
Hardware Compatibility
The daiFPU is compatible with the following processors:
- LEON2 / LEON2-FT
- LEON3 / LEON3-FT
Software Compatibility
When used with LEON processors, the daiFPU is compatible with existing compilation toolchains in the DAIFPU-DUAL-DPSP configuration, which supports the same floating-point operations as other common FPUs, e.g. Meiko or GRFPU.
For other daiFPU configurations, that is, those that introduce new floating-point data types and/or operations, the SPARC V8 LLVM compiler and binutils with daiteq extensions are required to generate binary files with the new floating-point opcodes.
Xilinx Virtex7 Implementation Results
Implementation parameters for common daiFPU configurations implemented in Xilinx Virtex7 are shown in the following table.
Flavour | Slices | Slice regs | LUTs | LUTRAM | DSP48E1 | Freq [MHz] |
---|---|---|---|---|---|---|
LEON2 w/o FPU | ||||||
. | 2288 | 3359 | 6762 | 24 | 0 | 120 |
LEON2 w/ DAIFPU-DUAL-DPSP | ||||||
divsqrt | 6703 | 6573 | 15507 | 408 | 15 | 102 |
divonly | 5612 | 6109 | 14196 | 385 | 15 | 108 |
none | 5587 | 5773 | 12801 | 302 | 15 | 102 |
LEON2 w/ DAIFPU-DUAL-SPHP | ||||||
divsqrt | 4962 | 5667 | 11596 | 178 | 2 | 117 |
divonly | 3860 | 5472 | 11044 | 170 | 2 | 120 |
none | 4277 | 5064 | 10199 | 155 | 2 | 109 |
LEON2 w/ DAIFPU-DP | ||||||
divsqrt | 6008 | 5783 | 12261 | 348 | 15 | 113 |
divonly | 4849 | 5450 | 11616 | 318 | 15 | 112 |
none | 3840 | 5093 | 10569 | 252 | 15 | 120 |
LEON2 w/ DAIFPU-SP | ||||||
divsqrt | 3763 | 5058 | 10201 | 180 | 2 | 120 |
divonly | 4035 | 4805 | 9555 | 129 | 2 | 112 |
none | 3025 | 4689 | 9430 | 129 | 2 | 111 |
LEON2 w/ DAIFPU-HP | ||||||
divsqrt | 3096 | 4369 | 8813 | 96 | 1 | 118 |
divonly | 4062 | 4335 | 8589 | 85 | 1 | 114 |
none | 3268 | 4141 | 8250 | 80 | 1 | 120 |
LEON2 w/ DAIFPU-PSP | ||||||
divsqrt | 6568 | 6415 | 13106 | 303 | 4 | 109 |
divonly | 5895 | 6244 | 12541 | 249 | 4 | 116 |
none | 4007 | 5574 | 11405 | 231 | 4 | 109 |
LEON2 w/ DAIFPU-PHP | ||||||
divsqrt | 4772 | 5241 | 10337 | 170 | 2 | 115 |
divonly | 3763 | 5223 | 9480 | 162 | 2 | 112 |
none | 3932 | 4954 | 9247 | 151 | 2 | 112 |
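To read the table in terms of what the FPU itself costs, the FPU-less LEON2 row can be subtracted from each configuration. The short sketch below does this for the LUT column of the `divsqrt` rows. The numbers are copied from the table above; the subtraction is only an approximation, since synthesis optimizes the whole design together, and the variable names are illustrative.

```python
# LUT counts copied from the Virtex7 table above (divsqrt rows).
BASELINE_LUTS = 6762  # LEON2 w/o FPU

LUTS_WITH_FPU = {
    "DAIFPU-DUAL-DPSP": 15507,
    "DAIFPU-DUAL-SPHP": 11596,
    "DAIFPU-DP": 12261,
    "DAIFPU-SP": 10201,
    "DAIFPU-HP": 8813,
    "DAIFPU-PSP": 13106,
    "DAIFPU-PHP": 10337,
}

# Approximate LUTs attributable to the FPU in each configuration.
fpu_luts = {name: total - BASELINE_LUTS for name, total in LUTS_WITH_FPU.items()}

# Print the configurations from smallest to largest FPU footprint.
for name, luts in sorted(fpu_luts.items(), key=lambda item: item[1]):
    print(f"{name:18s} {luts:6d} LUTs ({luts / BASELINE_LUTS:.0%} of the LEON2 baseline)")
```

Under this estimate the half-precision unit adds roughly a quarter of the LUTs that the dual double/single-precision unit adds, which is the kind of saving the configurable precision is meant to provide.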
MicroSemi PolarFire Implementation Results
Implementation parameters for common daiFPU configurations implemented in MicroSemi PolarFire are shown in the following table.
Flavour | Fabric 4LUT | Fabric DFF | uSRAM 1K | uSRAM 18K | Math (18x18) | Freq [MHz] |
---|---|---|---|---|---|---|
LEON2 w/o FPU | ||||||
. | 9437 | 2290 | 12 | 41 | 1 | 110 |
LEON2 w/ DAIFPU-DUAL-DPSP | ||||||
divsqrt | 25074 | 7099 | 12 | 41 | 13 | 105 |
divonly | 23118 | 6699 | 12 | 41 | 13 | 107 |
none | 20883 | 5950 | 12 | 41 | 13 | 109 |
LEON2 w/ DAIFPU-DUAL-SPHP | ||||||
divsqrt | 17486 | 4714 | 12 | 41 | 4 | 110 |
divonly | 16636 | 4502 | 12 | 41 | 4 | 110 |
none | 15804 | 4290 | 12 | 41 | 4 | 106 |
LEON2 w/ DAIFPU-DP | ||||||
divsqrt | 19299 | 5448 | 12 | 41 | 13 | 110 |
divonly | 18206 | 5127 | 12 | 41 | 13 | 110 |
none | 16308 | 4641 | 12 | 41 | 13 | 112 |
LEON2 w/ DAIFPU-SP | ||||||
divsqrt | 14701 | 3961 | 12 | 41 | 4 | 109 |
divonly | 14125 | 3788 | 12 | 41 | 4 | 110 |
none | 13202 | 3635 | 12 | 41 | 4 | 110 |
LEON2 w/ DAIFPU-HP | ||||||
divsqrt | 12562 | 3170 | 12 | 41 | 4 | 111 |
divonly | 12355 | 3170 | 12 | 41 | 4 | 108 |
none | 11949 | 2973 | 12 | 41 | 4 | 109 |
LEON2 w/ DAIFPU-PSP | ||||||
divsqrt | 19481 | 5721 | 12 | 41 | 7 | 110 |
divonly | 18383 | 5199 | 12 | 41 | 7 | 108 |
none | 17025 | 5122 | 12 | 41 | 7 | 108 |
LEON2 w/ DAIFPU-PHP | ||||||
divsqrt | 15285 | 4438 | 12 | 41 | 7 | 112 |
divonly | 14686 | 4268 | 12 | 41 | 7 | 110 |
none | 13888 | 4037 | 12 | 41 | 7 | 110 |
NanoXplore NG-Medium Implementation Results
Implementation parameters for common daiFPU configurations implemented in NanoXplore NG-Medium are shown in the following table.
Flavour | 4-LUT | DFF | XLUT | RFB | DSP | RAM | Freq [MHz] |
---|---|---|---|---|---|---|---|
LEON2 w/o FPU | |||||||
. | 8181 | 2637 | 0 | 0 | 1 | 28 | 29.035 |
LEON2 w/ DAIFPU-DUAL-DPSP | |||||||
divsqrt | N/A | . | . | . | . | . | . |
divonly | N/A | . | . | . | . | . | . |
none | 16763 | 5891 | 0 | 0 | 13 | 28 | 20.070 |
LEON2 w/ DAIFPU-DUAL-SPHP | |||||||
divsqrt | 14822 | 4990 | 0 | 0 | 4 | 28 | 21.179 |
divonly | 13942 | 4733 | 0 | 0 | 4 | 28 | 21.399 |
none | 13135 | 4406 | 0 | 0 | 4 | 28 | 26.527 |
LEON2 w/ DAIFPU-DP | |||||||
divsqrt | 16343 | 5833 | 0 | 0 | 13 | 28 | 25.692 |
divonly | 15158 | 5508 | 0 | 0 | 13 | 28 | 26.517 |
none | 13664 | 5011 | 0 | 0 | 13 | 28 | 22.171 |
LEON2 w/ DAIFPU-SP | |||||||
divsqrt | 12596 | 4300 | 0 | 0 | 4 | 28 | 23.755 |
divonly | 11953 | 4130 | 0 | 0 | 4 | 28 | 28.420 |
none | 11391 | 3877 | 0 | 0 | 4 | 28 | 25.372 |
LEON2 w/ DAIFPU-HP | |||||||
divsqrt | 11100 | 3686 | 0 | 0 | 3 | 28 | 25.277 |
divonly | 10760 | 3591 | 0 | 0 | 3 | 28 | 28.260 |
none | 10460 | 3454 | 0 | 0 | 3 | 28 | 26.159 |
LEON2 w/ DAIFPU-PSP | |||||||
divsqrt | 16574 | 5828 | 0 | 0 | 7 | 28 | 24.907 |
divonly | 15447 | 5488 | 0 | 0 | 7 | 28 | 22.340 |
none | 14198 | 4984 | 0 | 0 | 7 | 28 | 25.556 |
LEON2 w/ DAIFPU-PHP | |||||||
divsqrt | 13441 | 4610 | 0 | 0 | 5 | 28 | 23.911 |
divonly | 12757 | 4420 | 0 | 0 | 5 | 28 | 25.509 |
none | 12112 | 4148 | 0 | 0 | 5 | 28 | 26.256 |
Floating-Point Performance
The performance of the daiteq FPU measured with the Whetstone and Linpack benchmarks is shown in the following table.
The benchmarks were compiled to test computations in the three common precisions - double, single and half (or binary64, binary32 and binary16).
Values shown in italics correspond to floating-point computations performed in software using John Hauser’s SoftFloat library (release 3e) in daiFPU configurations that do not support the corresponding precisions in hardware.
Benchmark | Unit | AT697 w/ Meiko | LEON2-FT w/ DAIFPU-DUAL-DPSP | LEON2-FT w/ DAIFPU-DP | LEON2-FT w/ DAIFPU-SP | LEON2-FT w/ DAIFPU-HP |
---|---|---|---|---|---|---|
BCC1 gcc-3.4.4 | ||||||
whetstone-dp | kWIPS/MHz | 283.00 | 288.81 | N/A | N/A | N/A |
whetstone-sp | kWIPS/MHz | 322.89 | 323.73 | N/A | N/A | N/A |
linpack-dp-rolled | kFLOPS/MHz | 48.85 | 55.27 | N/A | N/A | N/A |
linpack-sp-rolled | kFLOPS/MHz | 82.06 | 72.13 | N/A | N/A | N/A |
linpack-dp-unrolled | kFLOPS/MHz | 51.31 | 58.67 | N/A | N/A | N/A |
linpack-sp-unrolled | kFLOPS/MHz | 81.11 | 79.19 | N/A | N/A | N/A |
daiteq llvm10 | ||||||
whetstone-dp | kWIPS/MHz | 261.68 | 298.25 | 288.4 | 12.92 | 12.94 |
whetstone-sp | kWIPS/MHz | 445.71 | 451.13 | 32.06 | **78.65** | 24.52 |
whetstone-hp | kWIPS/MHz | N/A | 307.5 | 149.14 | 311.25 | 259.21 |
linpack-dp-rolled | kFLOPS/MHz | 49.25 | 54.55 | 53.08 | 2.3 | 2.29 |
linpack-sp-rolled | kFLOPS/MHz | 83.49 | 71.3 | 3.42 | 71.29 | 3.34 |
linpack-hp-rolled | kFLOPS/MHz | N/A | 4.61 | 4.53 | 4.53 | 72.56 |
linpack-dp-unrolled | kFLOPS/MHz | 49.51 | 59.4 | 57.21 | 2.35 | 2.34 |
linpack-sp-unrolled | kFLOPS/MHz | 84.05 | 78.2 | 3.44 | 78.17 | 3.43 |
linpack-hp-unrolled | kFLOPS/MHz | N/A | 4.46 | 4.5 | 4.49 | 81.24 |
Notes:
- Values shown in italics correspond to execution of floating-point operations in the SoftFloat library.
- The low performance of the single-precision Whetstone benchmark on DAIFPU-SP (value shown in bold) arises because single-precision trigonometric functions use double-precision operations internally to compute their approximations (e.g. in the OpenLibm library).
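Two simple calculations help when reading the table: the ratio between a hardware-supported and a software-emulated precision, and the conversion of a per-MHz score into an absolute score at a given clock. The sketch below uses values copied from the "daiteq llvm10" rows; the 100 MHz clock is an arbitrary assumption for illustration, not a measured operating point, and the function names are hypothetical.

```python
# Per-MHz scores copied from the "daiteq llvm10" rows of the table above.
LINPACK_DP_ROLLED = {"DAIFPU-DP": 53.08, "DAIFPU-SP": 2.3}  # kFLOPS/MHz
WHETSTONE_DP = {"DAIFPU-DUAL-DPSP": 298.25}                 # kWIPS/MHz


def speedup(hw_score, sw_score):
    """Ratio of a hardware-executed score to a software-emulated score."""
    return hw_score / sw_score


def absolute_score(score_per_mhz, clock_mhz):
    """Scale a per-MHz score to an absolute score at the given clock frequency."""
    return score_per_mhz * clock_mhz


# Double-precision Linpack: hardware (DAIFPU-DP) versus SoftFloat on DAIFPU-SP.
ratio = speedup(LINPACK_DP_ROLLED["DAIFPU-DP"], LINPACK_DP_ROLLED["DAIFPU-SP"])
print(f"linpack-dp-rolled hardware/software ratio: {ratio:.1f}x")

# Whetstone score at an assumed 100 MHz clock (illustrative value only).
kwips = absolute_score(WHETSTONE_DP["DAIFPU-DUAL-DPSP"], 100.0)
print(f"whetstone-dp at 100 MHz: {kwips:.0f} kWIPS")
```

The ratio shows why the choice of configuration should follow the precision the application actually uses: running a precision that the selected configuration emulates in software costs more than an order of magnitude in throughput for this benchmark.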