LEON Floating-Point Unit

Overview

The daiteq FPU (daiFPU) is an IEEE Std 754-2019 compliant floating-point unit, designed primarily for LEON processors as a replacement for the former Meiko FPU. The daiFPU supports the binary64, binary32 and binary16 formats and their combinations, including full hardware support for subnormal numbers. The unit consists of a floating-point datapath and a floating-point controller. The datapath executes all floating-point arithmetic operations and format conversions. The controller manages data exchange between the LEON2 integer pipeline and the daiFPU, and also executes floating-point comparisons.

Supported precisions.
IEEE Std 754   Abbreviation   Precision [bits]   Partitioning (sign, exponent, fraction)
binary64       DP             53                 (1,11,52)
binary32       SP             24                 (1,8,23)
binary16       HP             11                 (1,5,10)
N/A            PSP            24                 ((1,8,23),(1,8,23))
N/A            PHP            11                 ((1,5,10),(1,5,10))
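
The partitioning column can be read as field widths of the stored bit pattern. The following is a minimal Python sketch, not part of the daiFPU deliverables, that splits a raw pattern into its fields and derives the precision as the fraction width plus the implicit leading bit. The names FORMATS, split_fields and precision are chosen for this illustration only.

```python
# Field widths (sign, exponent, fraction) copied from the table above.
FORMATS = {
    "binary64": (1, 11, 52),
    "binary32": (1, 8, 23),
    "binary16": (1, 5, 10),
}

def split_fields(bits, fmt):
    """Split a raw bit pattern into (sign, biased exponent, fraction)."""
    _, exp_w, frac_w = FORMATS[fmt]
    frac = bits & ((1 << frac_w) - 1)
    exp = (bits >> frac_w) & ((1 << exp_w) - 1)
    sign = (bits >> (frac_w + exp_w)) & 1
    return sign, exp, frac

def precision(fmt):
    """Precision in bits: stored fraction bits plus the implicit leading bit."""
    return FORMATS[fmt][2] + 1

print(split_fields(0x3C00, "binary16"))  # 1.0 in binary16 -> (0, 15, 0)
print(precision("binary64"))             # 53
```

The packed formats PSP and PHP hold two such patterns side by side, so the same split applies to each sub-word.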

The daiFPU is designed to provide flexibility on the FPGA and ASIC technologies used in satellite navigation and image processing applications. Its key advantage is a higher functional density of the silicon used on board satellites for the computations that are actually performed. The user parameterizes the FPU at synthesis time so that the application functions correctly without using more resources than necessary. Classical FPUs, such as those used with the LEON processors, are based on fixed data bus widths of 32 or 64 bits, often in situations where a reduced precision (e.g. 16 bits) would be sufficient, and they implement operations that the application may never use. With the daiFPU the user selects one of seven major configurations (shown in the table below) at synthesis time; these support individual floating-point formats, their combinations, or packed floating-point formats. For each major configuration the user can specify whether floating-point division and square root should be supported.

FPU configurations.
Implementation       DP   SP   HP   PSP   PHP
Two-precision configurations
DAIFPU-DUAL-DPSP     Y    Y    -    -     -
DAIFPU-DUAL-SPHP     -    Y    Y    -     -
One-precision configurations
DAIFPU-DP            Y    -    -    -     -
DAIFPU-SP            -    Y    -    -     -
DAIFPU-HP            -    -    Y    -     -
Packed-word configurations
DAIFPU-PSP           -    Y    -    Y     -
DAIFPU-PHP           -    -    Y    -     Y

Packed operations are supported in some daiFPU configurations. They are defined for pairs of floating-point values stored either in a single register (two half-precision values in one single-precision floating-point register) or in a pair of consecutive registers (two single-precision values in an even-odd pair of single-precision registers). Besides common SIMD processing on pairs of values, new floating-point instructions have been implemented that support complex floating-point arithmetic in the packed formats. A simplified block diagram of the packed daiFPU is shown in the following figure.

Figure (fpu_p_v3.png): One-precision, packed-format FPU, a configuration that supports packed and non-packed operations for one precision.

For packed-word operations the result is computed as the selected operation performed independently on the upper sub-words and on the lower sub-words. Exceptions and flags are computed as the logical OR of the exceptions and flags generated for the upper and lower sub-words.
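
The lane-wise behaviour described above can be illustrated with a small software model. The following Python sketch adds two packed half-precision words, with two binary16 values per 32-bit word, and ORs the per-lane flags. It is an illustration under stated assumptions, not the daiFPU implementation: the flag bit values INEXACT and OVERFLOW are hypothetical, only round-to-nearest-even is modelled, and only the inexact and overflow flags are tracked.

```python
import math
import struct

# Hypothetical flag encoding, chosen for this sketch only.
INEXACT = 0x01
OVERFLOW = 0x08

def h2f(h):
    """Decode a 16-bit binary16 pattern into a Python float (exact)."""
    return struct.unpack("<e", struct.pack("<H", h))[0]

def f2h(x):
    """Round a Python float to binary16; return (bit pattern, flags)."""
    if math.isnan(x) or math.isinf(x):
        return struct.unpack("<H", struct.pack("<e", x))[0], 0
    try:
        bits = struct.unpack("<H", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to round a finite value to infinity; emulate overflow.
        return (0xFC00 if x < 0 else 0x7C00), OVERFLOW | INEXACT
    return bits, (INEXACT if h2f(bits) != x else 0)

def packed_add(a, b):
    """Add two packed words lane by lane; flags are the OR of both lanes."""
    hi, f_hi = f2h(h2f(a >> 16) + h2f(b >> 16))
    lo, f_lo = f2h(h2f(a & 0xFFFF) + h2f(b & 0xFFFF))
    return (hi << 16) | lo, f_hi | f_lo

# Upper lane: 1.0 + 1.5 = 2.5 (exact). Lower lane: 2048 + 1 is not
# representable in binary16 and rounds to 2048, which raises inexact.
word, flags = packed_add(0x3C006800, 0x3E003C00)
print(hex(word), flags == INEXACT)  # 0x41006800 True
```

In this example only the lower lane is inexact, yet the flag is reported for the whole packed operation, which is the OR behaviour described above.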

Validation

Validation of the daiFPU has been performed in these steps:

  1. Validation of individual FPU modules and operations in self-checking stand-alone testbenches. Test vectors were generated using the TestFloat tool that has been developed and distributed by John Hauser.
  2. Validation of the LEON2 / FPU integration using a simple C program that applies a limited number of TestFloat vectors to the FPU inputs and compares each result with the reference result stored in the vector.
  3. Validation of the LEON2 / FPU integration using the paranoia program originally developed by Prof. Kahan.
  4. Validation of correct floating-point results computed in LEON2 with daiFPU by comparing them to results of a desktop execution of an identical C program.
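
The vector-based checking in steps 1 and 2 can be sketched as follows. This is a minimal Python illustration, not the daiteq testbench or C program. It assumes two-operand vectors in the hexadecimal text form that testfloat_gen prints (operands, expected result, exception flags), and it uses a Python-float model of binary32 addition as a stand-in for the unit under test. The function names and the two hand-written vectors are hypothetical, and the exception flags are parsed but not checked.

```python
import struct

def parse_vector(line):
    """Parse one two-operand vector: '<a> <b> <expected> <flags>' in hex."""
    a, b, expected, flags = (int(tok, 16) for tok in line.split())
    return a, b, expected, flags

def f32_add_model(a, b):
    """Stand-in for the unit under test: binary32 add via Python floats."""
    x = struct.unpack("<f", struct.pack("<I", a))[0]
    y = struct.unpack("<f", struct.pack("<I", b))[0]
    return struct.unpack("<I", struct.pack("<f", x + y))[0]

def run_vectors(lines, op):
    """Apply every vector to op and collect mismatches."""
    failures = []
    for n, line in enumerate(lines, 1):
        a, b, expected, _flags = parse_vector(line)
        got = op(a, b)
        if got != expected:
            failures.append((n, a, b, expected, got))
    return failures

# Hand-written illustrative vectors, not actual TestFloat output.
vectors = [
    "3F800000 40000000 40400000 00",  # 1.0 + 2.0 = 3.0
    "3F800000 BF800000 00000000 00",  # 1.0 + (-1.0) = +0.0
]
print(run_vectors(vectors, f32_add_model))  # []
```

A real harness would also compare the exception flags and would need a policy for NaN results, which this sketch leaves out.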

Availability

The daiFPU IP core is provided as synthesizable VHDL code or as an FPGA netlist. The IP core is available either separately or bundled with the LEON2-FT processor. Deliverables include:

  • VHDL-RTL code or gate-level netlist,
  • testing environment,
  • simulation scripts,
  • golden reference test vectors,
  • synthesis scripts,
  • user documentation.

The IP core is guaranteed against defects for ninety days from the date of purchase. Thirty days of technical support over email and phone is included. Additional support and maintenance options are available.

Hardware Compatibility

The daiFPU is compatible with the following processors:

  • LEON2 / LEON2-FT
  • LEON3 / LEON3-FT

Software Compatibility

When used with LEON processors, the daiFPU is compatible with existing compilation toolchains in the configuration DAIFPU-DUAL-DPSP that supports the same floating-point operations as other common FPUs, e.g. Meiko or GRFPU.

For other daiFPU configurations, that is, those that introduce new floating-point data types and/or operations, the SPARC V8 LLVM compiler and binutils with daiteq extensions are required to generate binary files with the new floating-point opcodes.

Xilinx Virtex7 Implementation Results

Implementation parameters for common daiFPU configurations implemented in Xilinx Virtex7 are shown in the following table.

Resource requirements for the LEON2 processor core with common daiFPU configurations implemented in Xilinx Virtex7.
Flavour Slices Slice regs LUTs LUTRAM DSP48E1 Freq [MHz]
LEON2 w/o FPU
. 2288 3359 6762 24 0 120
LEON2 w/ DAIFPU-DUAL-DPSP
divsqrt 6703 6573 15507 408 15 102
divonly 5612 6109 14196 385 15 108
none 5587 5773 12801 302 15 102
LEON2 w/ DAIFPU-DUAL-SPHP
divsqrt 4962 5667 11596 178 2 117
divonly 3860 5472 11044 170 2 120
none 4277 5064 10199 155 2 109
LEON2 w/ DAIFPU-DP
divsqrt 6008 5783 12261 348 15 113
divonly 4849 5450 11616 318 15 112
none 3840 5093 10569 252 15 120
LEON2 w/ DAIFPU-SP
divsqrt 3763 5058 10201 180 2 120
divonly 4035 4805 9555 129 2 112
none 3025 4689 9430 129 2 111
LEON2 w/ DAIFPU-HP
divsqrt 3096 4369 8813 96 1 118
divonly 4062 4335 8589 85 1 114
none 3268 4141 8250 80 1 120
LEON2 w/ DAIFPU-PSP
divsqrt 6568 6415 13106 303 4 109
divonly 5895 6244 12541 249 4 116
none 4007 5574 11405 231 4 109
LEON2 w/ DAIFPU-PHP
divsqrt 4772 5241 10337 170 2 115
divonly 3763 5223 9480 162 2 112
none 3932 4954 9247 151 2 112
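
A rough estimate of the resources attributable to the FPU alone can be obtained by subtracting the "LEON2 w/o FPU" row from a configuration row. The Python sketch below does this for the LUT column of three configurations, using figures copied from the table above. It is only an approximation, since synthesis may optimize across the processor/FPU boundary, and the names BASELINE_LUTS, LUTS and fpu_luts are chosen for this illustration.

```python
# LUT counts copied from the Virtex7 table above.
BASELINE_LUTS = 6762  # LEON2 w/o FPU

LUTS = {
    "DAIFPU-DUAL-DPSP": {"divsqrt": 15507, "divonly": 14196, "none": 12801},
    "DAIFPU-SP":        {"divsqrt": 10201, "divonly": 9555,  "none": 9430},
    "DAIFPU-HP":        {"divsqrt": 8813,  "divonly": 8589,  "none": 8250},
}

def fpu_luts(config, flavour):
    """Estimated LUTs added by the FPU: configuration minus baseline."""
    return LUTS[config][flavour] - BASELINE_LUTS

for config, flavours in LUTS.items():
    for flavour in flavours:
        print(f"{config:18s} {flavour:8s} {fpu_luts(config, flavour):6d}")
```

For example, DAIFPU-SP without division and square root adds about 2668 LUTs to the bare LEON2, while DAIFPU-DUAL-DPSP with both adds about 8745.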

MicroSemi PolarFire Implementation Results

Implementation parameters for common daiFPU configurations implemented in MicroSemi PolarFire are shown in the following table.

Resource requirements for the LEON2 processor core with common daiFPU configurations implemented in MicroSemi PolarFire.
Flavour Fabric 4LUT Fabric DFF uSRAM 1K uSRAM 18K Math (18x18) Freq [MHz]
LEON2 w/o FPU
. 9437 2290 12 41 1 110
LEON2 w/ DAIFPU-DUAL-DPSP
divsqrt 25074 7099 12 41 13 105
divonly 23118 6699 12 41 13 107
none 20883 5950 12 41 13 109
LEON2 w/ DAIFPU-DUAL-SPHP
divsqrt 17486 4714 12 41 4 110
divonly 16636 4502 12 41 4 110
none 15804 4290 12 41 4 106
LEON2 w/ DAIFPU-DP
divsqrt 19299 5448 12 41 13 110
divonly 18206 5127 12 41 13 110
none 16308 4641 12 41 13 112
LEON2 w/ DAIFPU-SP
divsqrt 14701 3961 12 41 4 109
divonly 14125 3788 12 41 4 110
none 13202 3635 12 41 4 110
LEON2 w/ DAIFPU-HP
divsqrt 12562 3170 12 41 4 111
divonly 12355 3170 12 41 4 108
none 11949 2973 12 41 4 109
LEON2 w/ DAIFPU-PSP
divsqrt 19481 5721 12 41 7 110
divonly 18383 5199 12 41 7 108
none 17025 5122 12 41 7 108
LEON2 w/ DAIFPU-PHP
divsqrt 15285 4438 12 41 7 112
divonly 14686 4268 12 41 7 110
none 13888 4037 12 41 7 110

NanoXplore NG-Medium Implementation Results

Implementation parameters for common daiFPU configurations implemented in NanoXplore NG-Medium are shown in the following table.

Resource requirements for the LEON2 processor with common daiFPU configurations implemented in NanoXplore NG-Medium.
Flavour 4-LUT DFF XLUT RFB DSP RAM Freq [MHz]
LEON2 w/o FPU
. 8181 2637 0 0 1 28 29.035
LEON2 w/ DAIFPU-DUAL-DPSP
divsqrt N/A . . . . . .
divonly N/A . . . . . .
none 16763 5891 0 0 13 28 20.070
LEON2 w/ DAIFPU-DUAL-SPHP
divsqrt 14822 4990 0 0 4 28 21.179
divonly 13942 4733 0 0 4 28 21.399
none 13135 4406 0 0 4 28 26.527
LEON2 w/ DAIFPU-DP
divsqrt 16343 5833 0 0 13 28 25.692
divonly 15158 5508 0 0 13 28 26.517
none 13664 5011 0 0 13 28 22.171
LEON2 w/ DAIFPU-SP
divsqrt 12596 4300 0 0 4 28 23.755
divonly 11953 4130 0 0 4 28 28.420
none 11391 3877 0 0 4 28 25.372
LEON2 w/ DAIFPU-HP
divsqrt 11100 3686 0 0 3 28 25.277
divonly 10760 3591 0 0 3 28 28.260
none 10460 3454 0 0 3 28 26.159
LEON2 w/ DAIFPU-PSP
divsqrt 16574 5828 0 0 7 28 24.907
divonly 15447 5488 0 0 7 28 22.340
none 14198 4984 0 0 7 28 25.556
LEON2 w/ DAIFPU-PHP
divsqrt 13441 4610 0 0 5 28 23.911
divonly 12757 4420 0 0 5 28 25.509
none 12112 4148 0 0 5 28 26.256

Floating-Point Performance

The performance of the daiteq FPU measured with the Whetstone and Linpack benchmarks is shown in the following table.

The benchmarks were compiled to test computations in the three common precisions - double, single and half (or binary64, binary32 and binary16).

Values shown in italics correspond to floating-point computations performed in software using John Hauser’s SoftFloat library (release 3e) in daiFPU configurations that do not support the corresponding precisions in hardware.

daiFPU performance measured with the Whetstone and Linpack benchmarks.
                                   AT697    LEON2-FT w/ daiFPU configuration
Benchmark             Unit         Meiko    DAIFPU-DUAL-DPSP  DAIFPU-DP  DAIFPU-SP  DAIFPU-HP
BCC1 gcc-3.4.4
whetstone-dp          kWIPS/MHz    283.00   288.81            N/A        N/A        N/A
whetstone-sp          kWIPS/MHz    322.89   323.73            N/A        N/A        N/A
linpack-dp-rolled     kFLOPS/MHz   48.85    55.27             N/A        N/A        N/A
linpack-sp-rolled     kFLOPS/MHz   82.06    72.13             N/A        N/A        N/A
linpack-dp-unrolled   kFLOPS/MHz   51.31    58.67             N/A        N/A        N/A
linpack-sp-unrolled   kFLOPS/MHz   81.11    79.19             N/A        N/A        N/A
daiteq llvm10
whetstone-dp          kWIPS/MHz    261.68   298.25            288.4      12.92      12.94
whetstone-sp          kWIPS/MHz    445.71   451.13            32.06      78.65      24.52
whetstone-hp          kWIPS/MHz    N/A      307.5             149.14     311.25     259.21
linpack-dp-rolled     kFLOPS/MHz   49.25    54.55             53.08      2.3        2.29
linpack-sp-rolled     kFLOPS/MHz   83.49    71.3              3.42       71.29      3.34
linpack-hp-rolled     kFLOPS/MHz   N/A      4.61              4.53       4.53       72.56
linpack-dp-unrolled   kFLOPS/MHz   49.51    59.4              57.21      2.35       2.34
linpack-sp-unrolled   kFLOPS/MHz   84.05    78.2              3.44       78.17      3.43
linpack-hp-unrolled   kFLOPS/MHz   N/A      4.46              4.5        4.49       81.24

Notes:

  1. Values shown in italics correspond to execution of floating-point operations in the SoftFloat library.
  2. The low performance of the single-precision Whetstone benchmark computed with DAIFPU-SP (value shown in bold) arises because single-precision trigonometric functions use double-precision operations internally to compute the approximations (e.g. in the OpenLibm library).
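
The gap between hardware and SoftFloat execution can be quantified from the table. The Python sketch below computes the ratio for the rolled Linpack rows compiled with daiteq llvm10. For each precision it pairs a configuration that supports the precision in hardware with one that does not. This pairing is an assumption made for this sketch and is based on the configuration table, not a figure published by daiteq.

```python
# kFLOPS/MHz, linpack-*-rolled, daiteq llvm10 rows of the table above.
# Each entry: (hardware figure, SoftFloat figure).
ROLLED = {
    "dp": (53.08, 2.30),  # DAIFPU-DP (hardware) vs DAIFPU-SP (software)
    "sp": (71.29, 3.42),  # DAIFPU-SP (hardware) vs DAIFPU-DP (software)
    "hp": (72.56, 4.53),  # DAIFPU-HP (hardware) vs DAIFPU-SP (software)
}

def speedup(precision):
    """Ratio of hardware to SoftFloat throughput for one precision."""
    hardware, software = ROLLED[precision]
    return hardware / software

for p in ROLLED:
    print(f"linpack-{p}-rolled: {speedup(p):.1f}x")
```

With these pairings the hardware figures are roughly 16 to 23 times higher than the SoftFloat figures.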