daiFPU - Configurable Floating-Point Unit
for LEON and NOEL-V

Overview

The daiteq FPU (daiFPU) is a floating-point unit compliant with IEEE Std 754-2019, designed for the LEON and NOEL-V processors. It supports the binary64, binary32 and binary16 formats and their combinations, with full hardware support for subnormal numbers. The unit consists of a floating-point datapath and a floating-point controller. The datapath executes all floating-point arithmetic operations and format conversions. The controller manages data exchange between the integer pipeline and the daiFPU, and also executes floating-point comparisons.

Supported precisions.
IEEE Std 754   Abbreviation   Precision [b]   Partitioning
binary64       DP             53              (1,11,52)
binary32       SP             24              (1,8,23)
binary16       HP             11              (1,5,10)
N/A            PSP            24              ((1,8,23),(1,8,23))
N/A            PHP            11              ((1,5,10),(1,5,10))
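
The precision column follows from the partitioning: a (sign, exponent, fraction) layout stores the fraction bits explicitly and gains one implicit leading bit, so the precision is the fraction width plus one. The following is a minimal Python sketch of that relation, not daiFPU code; the helper names `precision_bits`, `split_fields` and `half_to_bits` are made up for illustration.

```python
import struct

# (sign, exponent, fraction) widths copied from the table of supported precisions.
FORMATS = {
    "binary64": (1, 11, 52),
    "binary32": (1, 8, 23),
    "binary16": (1, 5, 10),
}

def precision_bits(name):
    """Significand precision: stored fraction bits plus one implicit leading bit."""
    _, _, fraction_width = FORMATS[name]
    return fraction_width + 1

def split_fields(bits, name):
    """Split a raw bit pattern into (sign, biased exponent, fraction)."""
    _, exponent_width, fraction_width = FORMATS[name]
    fraction = bits & ((1 << fraction_width) - 1)
    exponent = (bits >> fraction_width) & ((1 << exponent_width) - 1)
    sign = bits >> (fraction_width + exponent_width)
    return sign, exponent, fraction

def half_to_bits(value):
    """Encode a Python float as a binary16 bit pattern (round to nearest even)."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

if __name__ == "__main__":
    for name in FORMATS:
        print(name, precision_bits(name))
    # 1.5 encodes as 0x3E00: sign 0, biased exponent 15, fraction 0x200.
    print(split_fields(half_to_bits(1.5), "binary16"))
```

A subnormal value has a biased exponent of zero and a non-zero fraction; for example, the smallest positive binary16 subnormal, 2^-24, splits into (0, 0, 1).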

The daiFPU is intended to provide flexibility for FPGA and ASIC technology used in satellite navigation, deep learning and audio/video processing applications. Its key advantage is that it increases the functional density of the silicon used on board satellites with respect to the computations actually performed on board. The user parameterizes the FPU at synthesis time so that the application functions correctly while using no more resources than necessary. Classical FPUs, such as those used with the LEON processors, are based on fixed data widths of 32 or 64 bits, even where a reduced precision (e.g. 16 bits) would be sufficient, and they implement operations that the application may never use. With the daiFPU the user selects at synthesis time one of seven major configurations (shown in the table below) that support individual floating-point formats, their combinations, or packed floating-point formats. For each major configuration the user can specify whether floating-point division and square root are supported.

FPU configurations.
Implementation       DP   SP   HP   PSP  PHP
Two-precision configurations
DAIFPU-DUAL-DPSP     Y    Y    -    -    -
DAIFPU-DUAL-SPHP     -    Y    Y    -    -
One-precision configurations
DAIFPU-DP            Y    -    -    -    -
DAIFPU-SP            -    Y    -    -    -
DAIFPU-HP            -    -    Y    -    -
Packed-word configurations
DAIFPU-PSP           -    Y    -    Y    -
DAIFPU-PHP           -    -    Y    -    Y

Packed operations are supported in some daiFPU configurations. They operate on pairs of floating-point values stored either in a single register (two half-precision values in one single-precision floating-point register) or in a pair of consecutive registers (two single-precision values in an even-odd pair of single-precision registers). Besides common SIMD processing on pairs of values, new floating-point instructions have been implemented that support complex floating-point arithmetic in the packed formats.

For packed-word operations the result is computed by performing the selected operation independently on the upper sub-words and on the lower sub-words. Exceptions and flags are computed as the logical OR of the exceptions and flags generated for the upper and lower sub-words.
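
The lane-wise behaviour described above can be illustrated with a small software model. The following Python sketch models a packed half-precision addition on a 32-bit word. It is an illustration under stated assumptions, not the daiFPU implementation: the flag names and bit positions (`INEXACT`, `OVERFLOW`), the helper names and the restriction to finite operands are all choices made for this example, and only two flags are modelled.

```python
import math
import struct

# Hypothetical flag bits for this sketch; the real daiFPU flag encoding may differ.
INEXACT = 0x1
OVERFLOW = 0x2

def half_from_bits(bits):
    """Decode a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def half_to_bits(value):
    """Encode a float as binary16 (round to nearest even)."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def lane_add(a_bits, b_bits):
    """Add two binary16 values; return (result bits, flags) for one lane."""
    a, b = half_from_bits(a_bits), half_from_bits(b_bits)
    if not (math.isfinite(a) and math.isfinite(b)):
        raise ValueError("this sketch models finite operands only")
    exact = a + b  # a sum of two binary16 values is exact in binary64
    try:
        result = half_to_bits(exact)
        overflowed = math.isinf(half_from_bits(result))
    except OverflowError:  # struct refuses values beyond the binary16 range
        overflowed = True
    if overflowed:
        return (0xFC00 if exact < 0 else 0x7C00), OVERFLOW | INEXACT
    flags = INEXACT if half_from_bits(result) != exact else 0
    return result, flags

def packed_add(x_word, y_word):
    """Apply lane_add to the upper and lower sub-words; OR the lane flags."""
    upper, upper_flags = lane_add(x_word >> 16, y_word >> 16)
    lower, lower_flags = lane_add(x_word & 0xFFFF, y_word & 0xFFFF)
    return (upper << 16) | lower, upper_flags | lower_flags

def pack_halves(upper, lower):
    """Pack two floats as binary16 sub-words of one 32-bit word."""
    return (half_to_bits(upper) << 16) | half_to_bits(lower)

if __name__ == "__main__":
    word, flags = packed_add(pack_halves(1.0, 2048.0), pack_halves(2.0, 1.0))
    print(hex(word), bin(flags))
```

In the demo call the upper lane computes 1.0 + 2.0 exactly, while the lower lane computes 2048.0 + 1.0, which is not representable in binary16 and is rounded. The combined flag word therefore reports inexact even though only one lane raised it.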

Validation

Validation of the daiFPU has been performed in these steps:

  1. Validation of individual FPU modules and operations in self-checking stand-alone testbenches. Test vectors were generated using the TestFloat tool developed and distributed by John Hauser.
  2. Validation of the FPU integration with the integer pipeline using a simple C program that applies a limited number of TestFloat vectors to the FPU inputs and compares each result with the reference result stored in the TestFloat vector.
  3. Validation of the LEON2 / FPU integration using the paranoia program originally developed by Prof. Kahan.
  4. Validation of correct floating-point results computed in LEON2, LEON3 and NOEL-V with daiFPU by comparing them to results of a desktop execution of an identical C program.
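
The vector-based steps above amount to applying each operand pair to the unit under test and comparing the output with the stored reference. The Python sketch below shows that comparison loop; it is not the daiteq testbench. It assumes each vector line holds hexadecimal operands, the expected result and the exception flags separated by whitespace, which is how the author of this sketch understands the output of TestFloat's `testfloat_gen`. The function `reference_add` is a software stand-in for the unit under test, the flags field is ignored, and the three vectors are hand-written for illustration (the last one is deliberately wrong).

```python
import struct

def f32_from_bits(bits):
    """Decode a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_to_bits(value):
    """Encode a float as binary32 (round to nearest even)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def reference_add(a_bits, b_bits):
    """Stand-in for the unit under test: binary32 addition on raw bit patterns."""
    total = f32_from_bits(a_bits) + f32_from_bits(b_bits)
    try:
        return f32_to_bits(total)
    except OverflowError:  # struct refuses values beyond the binary32 range
        return 0xFF800000 if total < 0 else 0x7F800000

def check_vectors(lines, operation):
    """Return a list of (line number, a, b, expected, got) for every mismatch."""
    failures = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Fields: operand a, operand b, expected result; trailing flags are ignored here.
        a, b, expected = (int(field, 16) for field in fields[:3])
        got = operation(a, b)
        if got != expected:
            failures.append((number, a, b, expected, got))
    return failures

# Hand-written illustrative vectors, not actual TestFloat output.
VECTORS = [
    "3F800000 40000000 40400000 00",  # 1.0 + 2.0 = 3.0
    "3F800000 33800000 3F800000 01",  # 1.0 + 2^-24 rounds back to 1.0
    "3F800000 3F800000 40000001 00",  # deliberately wrong expected value
]

if __name__ == "__main__":
    for failure in check_vectors(VECTORS, reference_add):
        print("mismatch on line %d: a=%08X b=%08X expected=%08X got=%08X" % failure)
```

A complete checker would also compare the exception flags and cover NaN operands, which this sketch leaves out.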

Availability

The daiFPU IP core is provided as synthesizable VHDL code or as an FPGA netlist. The IP core is available either separately or bundled with the LEON2-FT processor or with the LEON3 and NOEL-V processors.

For the bundled options a separate license has to be obtained from the European Space Agency for the LEON2-FT processor, or from Cobham Gaisler AB for the GRLIB / LEON3 or NOEL-V package.

The deliverables include:

  • VHDL-RTL code or gate-level netlist,
  • testing environment,
  • simulation scripts,
  • golden reference test vectors,
  • synthesis scripts,
  • user documentation.

The IP core is guaranteed against defects for ninety days from the date of purchase. Thirty days of technical support by email and phone are included. Additional support and maintenance options are available.

Hardware Compatibility

The daiFPU is compatible with the following processors:

  • LEON2 / LEON2-FT
  • LEON3
  • NOEL-V

Software Compatibility

When used with LEON and NOEL-V processors, the daiFPU in the DAIFPU-DUAL-DPSP configuration is compatible with existing compilation toolchains, because this configuration supports the same floating-point operations as other common FPUs, e.g. Meiko or GRFPU.

Other daiFPU configurations, that is, those that introduce new floating-point data types and/or operations, require the SPARC V8 LLVM compiler and binutils with daiteq extensions to generate binary files with the new floating-point opcodes.

Implementation Results

Indicative implementation results are given for the daiFPU implemented with the LEON2 processor in a Xilinx Virtex-7 FPGA. Results for the LEON3 and NOEL-V processors and for other FPGA families are similar.

daiFPU, resources used.

Flavour           Div/sqrt  Slices  Slice regs  LUTs  LUTRAM  DSP48E1
daifpu-dual-dpsp  divsqrt   3592    3402        9447  385     15
                  divonly   2832    2920        8120  362     15
                  none      2612    2588        6741  279     15
daifpu-dual-sphp  divsqrt   2011    2228        5197  155     2
                  divonly   1570    2022        4383  147     2
                  none      1509    1621        3735  132     2
daifpu-dp         divsqrt   2587    2581        6181  325     15
                  divonly   2090    2259        5258  295     15
                  none      1447    1921        4215  229     15
daifpu-sp         divsqrt   1244    1540        3261  157     2
                  divonly   1152    1394        2810  106     2
                  none      771     1195        2354  106     2
daifpu-hp         divsqrt   685     955         1824  73      1
                  divonly   687     899         1534  62      1
                  none      547     748         1327  57      1
daifpu-psp        divsqrt   2859    2954        6641  280     4
                  divonly   2561    2801        5701  226     4
                  none      1626    2172        4615  208     4
daifpu-php        divsqrt   1621    1834        3625  147     2
                  divonly   1186    1732        2961  139     2
                  none      1013    1440        2553  128     2
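
The resource figures can be used to estimate how much logic a reduced configuration saves. The Python sketch below computes LUT usage relative to the largest configuration, using LUT counts copied from the results above. The dictionary layout and the function name `relative_luts` are illustrative only, and the ratios are indicative in the same sense as the figures they are derived from.

```python
# LUT counts copied from the resource results (LEON2, Xilinx Virtex-7).
# Keys are (flavour, division/square-root option).
LUTS = {
    ("daifpu-dual-dpsp", "divsqrt"): 9447,
    ("daifpu-dp", "divsqrt"): 6181,
    ("daifpu-sp", "divsqrt"): 3261,
    ("daifpu-hp", "divsqrt"): 1824,
    ("daifpu-hp", "none"): 1327,
}

def relative_luts(config, baseline=("daifpu-dual-dpsp", "divsqrt")):
    """Return the LUT count of `config` as a fraction of the baseline's LUT count."""
    return LUTS[config] / LUTS[baseline]

if __name__ == "__main__":
    for config in LUTS:
        print("%-18s %-8s %5.1f %%" % (config[0], config[1], 100 * relative_luts(config)))
```

The same calculation applies to the other resource columns by swapping in the corresponding counts.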

Floating-Point Performance

daiFPU performance in the Whetstone and Linpack benchmarks compared with alternative LEON and NOEL-V FPUs.

                                 LEON2-FT                         LEON3                                 NOEL (RV64)
Benchmark            Unit        AT697 / Meiko  DAIFPU-DUAL-DPSP  GRFPU-lite  DAIFPU-DUAL-DPSP  GRFPU   nanofpunv  DAIFPU-DUAL-DPSP  GRFPUnv
whetstone-dp         kWIPS/MHz   261.68         298.25            241.49      309.05            429.10  141.44     299.27            539.82
whetstone-sp         kWIPS/MHz   445.71         451.13            391.58      461.46            620.45  187.67     312.65            539.08
linpack-dp-rolled    kFLOPS/MHz  49.25          54.55             37.46       57.68             49.45   25.60      48.80             96.63
linpack-sp-rolled    kFLOPS/MHz  83.49          71.3              55.30       67.44             69.94   31.60      51.10             101.43
linpack-dp-unrolled  kFLOPS/MHz  49.51          59.4              38.24       63.56             51.14   26.90      53.77             117.14
linpack-sp-unrolled  kFLOPS/MHz  84.05          78.2              59.20       76.44             76.87   33.62      56.59             125.32