Floating-Point Numeric Function Generators Based on Piecewise-Split EVMDDs

Shinobu Nagayama¹ Tsutomu Sasao² Jon T. Butler³

¹ Department of Computer and Network Engineering, Hiroshima City University
² Department of Computer Science and Electronics, Kyushu Institute of Technology
³ Department of Electrical and Computer Engineering, Naval Postgraduate School

Abstract

This paper proposes a new architecture for memory-based floating-point numeric function generators (NFGs). The design method uses piecewise-split edge-valued multi-valued decision diagrams (EVMDDs). To design NFGs with less memory size, we partition the domain of the floating-point function into segments, and represent the function using an EVMDD for each segment. By realizing each EVMDD with hardware, we obtain the floating-point NFG. This paper also presents an algorithm that partitions the domain by decomposing the edge-valued binary decision diagram (EVBDD) representing the whole floating-point function. Experimental results show that, for a single-precision floating-point function, our new NFG requires 40% to 65% less memory than any previous one for generic function.

1. Introduction

Numeric functions, such as trigonometric, logarithmic, square root, and combinations of these functions, are widely used in computer graphics, digital signal processing, communication systems, robotics, etc. [11]. In these applications, numeric functions are usually used as a basic operation, along with addition and multiplication. For fast computation of numeric functions, various hardware circuits, called numeric function generators (NFGs), have been proposed. Fixed-point representation of numeric functions is often adopted [1, 9, 14, 16, 18, 19, 21, 22]. However, for numeric functions with a wide domain and range, a fixed-point representation requires a large number of bits. This produces large NFGs. To represent a large real value with fewer bits, floating-point representation is often used. An IEEE standard for real values exists [5]. However, floating-point representation tends to produce complex and slow NFGs. Thus, the design of floating-point NFGs is especially hard, and only design methods for some numeric functions are known [2, 3, 6, 20, 24]. Since these design methods are intended only for specific functions, different functions need different design methods and architectures. Thus, an architecture for floating-point NFGs that can realize a large set of numeric functions is required, along with a systematic design method.

Large-capacity memories are now available. Like SRAM-based FPGAs, implementation of logic circuits using memory has become practical. Thus, we focus on memory-based floating-point NFGs that can realize various numeric functions by changing data in memory. A straightforward memory-based NFG for an arbitrary floating-point function \( f(x) \) is a single lookup table (LUT), in which the address is the binary representation of the value of \( x \) and the content of that address is the corresponding value of \( f(x) \). This NFG is very fast because the value of the function is obtained by only one table lookup. For high-precision computations, however, it requires too much memory. To design monotone floating-point numeric functions with less memory, we have proposed a memory-based NFG using an edge-valued multi-valued decision diagram (EVMDD) [15]. Unfortunately, it still requires a large memory for high-precision computations.

Therefore, this paper proposes a new architecture for memory-based NFGs that requires less memory. To design an NFG with less memory, we partition the domain of a floating-point function into segments, and represent the function using an EVMDD for each segment. By realizing each EVMDD with memory, we obtain a memory-based floating-point NFG. This paper also presents an algorithm that partitions the domain by decomposing the edge-valued binary decision diagram (EVBDD) representing the whole floating-point function.

This paper is organized as follows: Section 2 introduces a floating-point representation of a real-valued numeric function, and decision diagrams used in this paper. Section 3 presents piecewise-split EVMDDs, and an algorithm that partitions the domain of a floating-point function. Section 4 presents a new architecture for memory-based floating-point NFGs. Experimental results are shown in Section 5.
2. Preliminaries

2.1. Number Representation and Precision

This subsection defines a number representation and a precision to convert real functions into integer functions.

**Definition 1** Let $B = \{0, 1\}$. $Z$ be the set of the integers, and $R$ be the set of the real numbers. An $n$-input $m$-output logic function is a mapping: $B^n \rightarrow B^m$, an integer function is a mapping: $Z \rightarrow Z$, and a real function is a mapping: $R \rightarrow R$.

**Definition 2** The $n$-bit precision binary floating-point representation of a number $X$ is a binary $n$-tuple

$$X = (s, e_{a-1}, e_{a-2}, \ldots, e_0, d_{b-1}, d_{b-2}, \ldots, d_0)_2,$$

where $s \in B$ is the sign bit, $E = (e_{a-1}, e_{a-2}, \ldots, e_0)_2$ is the exponent, and $D = (d_{b-1}, d_{b-2}, \ldots, d_0)_2$ is the significand. $a$ and $b$ are the numbers of bits for the exponent and the significand respectively, and $n = a + b + 1$. The value of $X$ is shown in Table 1. When $|X| < 2^{2^{-a-1}}$, $X$ is a subnormal number, in which the exponent $E$ is biased by $E_i = 2^{a-1} - 2$, and the significand $D$ represents only fractional bits of a fixed-point value smaller than 1. When $2^a \leq |X|$, $X$ is infinity. When $X$ cannot be defined as a number, $X$ is represented as not a number (NaN). In other cases, $X$ is a normal number, in which the exponent $E$ is biased by $E_n = 2^{a-1} - 1$, and the significand $D$ represents only fractional bits of a fixed-point value that is larger than or equal to 1 and smaller than 2.

According to IEEE Standard 754-2008 [5], half (16-bit) precision has $a = 5$ and $b = 10$, single (32-bit) precision has $a = 8$ and $b = 23$, double (64-bit) precision has $a = 11$ and $b = 52$, and quad (128-bit) precision has $a = 15$ and $b = 112$.

Note that by using an $n$-bit precision floating-point representation, we can convert a real function into an $n$-input $n$-output logic function. The logic function, in turn, can be converted into an integer function by considering binary vectors as integers. That is, we can convert a real function into an integer function: $P_n \rightarrow P_n$, where $P_n = \{0, 1, \ldots, 2^n - 1\}$. In this paper, numeric functions are converted into integer functions by using a floating-point representation, unless stated otherwise. And, for simplicity, each bit in the floating-point representation of $X$ is denoted by $x_i$, where $x_0$ is the least significant bit.

**Example 1** Table 2 (a) is the function table for $\sqrt{X}$. The 8-bit precision (3-bit exponent and 4-bit significand) floating-point representation of this function is the logic function $f_{0}(X)$ in Table 2 (b). By converting binary vectors into integers, we have the integer function $f(X)$ of $f_{0}(X)$ in Table 2 (c). That is, our 8-bit precision floating-point representation of $f(X) = \sqrt{X}$ corresponds to the integer function of Table 2 (c).

(End of Example)

### 2.2. Edge-Valued MDDs

This subsection defines the decision diagrams used in this paper.

**Definition 3** An edge-valued binary decision diagram (EVBDD) [8, 17, 23] is a variant of the BDD [4, 10] that represents an integer function. The EVBDD is obtained by repeatedly applying the expansion $f = x_0 f_0 + x_1 (f'_1 + \alpha)$ to the integer function, where $f_1 = f'_1 + \alpha$, and $\alpha$ is the constant term of $f_1$. The EVBDD consists of only one terminal node representing 0 and non-terminal nodes with 1-edges having integer weights $\alpha$. In an EVBDD, 0-edges always have zero weights. The incoming edge into the root node can have a non-zero weight. The output (integer) value of an EVBDD is the sum of the weights associated with the paths taken from the root node to the terminal node.

**Definition 4** For a set of $n$ binary variables $\{X\}$, if $\{X\} = \{X_o\} \cup \{X_{a-1}\} \cup \ldots \cup \{X_{i}\}$, $\{X\} \neq \emptyset$, and $\{X_i\} \cap \{X_j\} = 0$ ($i \neq j$), then $(X_o, X_{a-1}, \ldots, X_{i})$ is a partition of $X$. Each
The document contains discussions on EVBDDs and EVMDDs, their use in representing integer functions, and a piecewise-split EVMDD approach for representing floating-point functions. It includes a figure showing the EVBDD and EVMDD for an integer function, and another figure demonstrating the architecture for an NFG based on a monolithic EVMDD. The text explains how to partition the domain of a floating-point function and presents an algorithm for doing so.
Input: EVBDD for floating-point numeric function \( f(X) \) and threshold value \( t \) for the number of nodes.

Output: Piecewise-split EVMDD (set of EVMDDs).

Step:
1. Compute the numbers of nodes in all the sub-EVBDDs.
2. Traverse the EVBDD recursively from the root node.
   2.1. Decompose the EVBDD when the number of nodes in a sub-EVBDD is smaller than or equal to \( t \).
   2.2. Assign a segment number to the sub-EVBDD, which represents the numeric function in the segment.
3. Convert the produced sub-EVBDDs into EVMDDs.
4. Construct a multi-terminal EVBDD by considering each segment number as a terminal node, where the root node of the multi-terminal EVBDD is the same as the original one.
5. Convert the multi-terminal EVBDD into a multi-terminal EVMDD.

Figure 4. Segmentation algorithm using EVBDD.

In a piecewise-split EVMDD, the number of EVMDDs and size of each EVMDD depend on how the domain of the function is segmented. Thus, an effective segmentation algorithm which makes the sizes of all the EVMDDs the same is desired to reduce memory size of NFG. In this paper, we propose a segmentation algorithm that partitions the domain into segments by decomposing the EVBDD representing the whole floating-point function. By using the EVBDD, we can simultaneously produce a segmentation of the domain and the piecewise-split EVMDD.

Fig. 4 shows the proposed segmentation algorithm. This algorithm decomposes a given EVBDD so that sizes of all sub-EVBDDs are smaller than or equal to a given threshold value, as shown in Fig. 5(a). And then, it produces a piecewise-split EVMDD as shown in Fig. 5(b). In Fig. 5(a), sub-EVBDDs whose heights are lower (i.e. in which fewer input variables are used) mean that their segments are narrower. In such segments, the floating-point function values rapidly change, and so, sizes of EVBDDs are large [15]. In Fig. 5(b), the multi-terminal EVMDD is used to select a segment, and the other EVMDDs are used to represent the floating-point function in each segment.

4. NFGs Based on Piecewise-Split EVMDDs

By realizing each EVMDD produced by the proposed segmentation algorithm using the architecture shown in Fig. 2, we obtain the NFG in Fig. 6. A value of the numeric function in each segment is computed using the least significant bits of \( X \) in parallel, and then an appropriate value is selected by the segment selector, which realizes a multi-terminal EVMDD. Since a piecewise-split EVMDD uses the most significant bits of \( X \) in parallel with function values, it is faster than the monolithic EVMDD.

In addition, the proposed NFG has the following advantages:

1. Since the proposed NFG just traverses a set of
Table 3. Memory size needed for NFGs based on a monolithic EVMDD and a piecewise-split EVMDD for single-precision (32-bit) floating-point numeric functions.

<table>
<thead>
<tr>
<th>Functions</th>
<th>Memory size (Mbits)</th>
<th>Number of segments</th>
<th>Ratio (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\sin^{-1}(X)$</td>
<td>34</td>
<td>12</td>
<td>28</td>
</tr>
<tr>
<td>$\ln(X)$</td>
<td>564</td>
<td>337</td>
<td>25</td>
</tr>
<tr>
<td>$1/X$</td>
<td>31</td>
<td>15</td>
<td>31</td>
</tr>
<tr>
<td>$\sqrt{X}$</td>
<td>34</td>
<td>13</td>
<td>23</td>
</tr>
<tr>
<td>$\sqrt{-\ln(X)}$</td>
<td>271</td>
<td>160</td>
<td>32</td>
</tr>
</tbody>
</table>

Ratio = \frac{\text{Piecewise EVMDD}}{\text{Monolithic EVMDD}} \times 100 (%).

EVMDDs, and computes the sum of edge weights (integers) in parallel, it requires only integer adders to compute function values of a floating-point function. That is, it requires neither the rounding circuit nor the normalization circuit (which are complex).

2. Since the NFG is a memory-based architecture, a wide range of numeric functions can be realized by changing only the data in the LUT memories.

3. Since the NFG directly realizes the function table of a floating-point function using a piecewise-split EVMDD, it is more accurate than existing NFGs using polynomial approximation [2, 3, 6, 20, 24].

4. The NFG is suitable for pipeline processing, and thus it can achieve a high throughput.

5. Experimental Results

To show the effectiveness of piecewise-split EVMDDs, we realize single-precision floating-point numeric functions using two types of NFGs, an NFG based on a monolithic EVMDD and an NFG based on a piecewise-split EVMDD. We compare memory size needed for the two types of NFGs. Table 3 shows memory size needed for the NFGs, in mega bits, and the number of segments for piecewise-split EVMDDs.

Since a single LUT that realizes a whole single-precision floating-point function requires $2^{32} \times 32 = 128$ Gbits, memory size needed for both types of NFGs is three or four orders of magnitude less than the single LUT-based NFG. And, by using piecewise-split EVMDDs, we can achieve a further reduction to 35% to 60% of memory size needed for the NFGs based on the monolithic EVMDDs. Table 3 shows that we can reduce memory size significantly with a small number of segments. Thus, the size of the multiplexer used in the NFG is also small. Further, we can generate such compact NFGs automatically.

6. Conclusion and Comments

This paper proposes a new architecture for memory-based floating-point NFGs, and a design method using piecewise-split EVMDDs. We also present an algorithm to produce efficient piecewise-split EVMDDs by decomposing the EVBDD representing a given floating-point function. Experimental results show that, for single-precision floating-point functions, our new NFGs based on piecewise-split EVMDDs require 40% to 65% less memory than ones based on monolithic EVMDDs. By using piecewise-split EVMDDs, we can automatically generate compact NFGs. Since our memory-based NFG is quite general, it can realize not only floating-point functions, but also discrete functions and even two-variable functions.

Future work includes 1) reducing memory size further, and 2) developing an optimization algorithm for decomposing an EVBDD. Our NFG requires still large memory size for FPGA implementation. We will further reduce memory size so that our single-precision floating-point NFG can be implemented with an FPGA. The proposed algorithm decomposes an EVBDD using a threshold value for the number of nodes. But, an algorithm that can find an optimum decomposition of EVBDD in terms of memory size or delay time of NFG is more practical.

Acknowledgments

This research is partly supported by the Grant in Aid for Scientific Research of the Japan Society for the Promotion of Science (JSPS), funds from Ministry of Education, Culture, Sports, Science, and Technology (MEXT) via Knowledge Cluster Project, the MEXT Grant-in-Aid for Young Scientists (B), 200700051, 2009, and Hiroshima City University Grant for Special Academic Research (General Studies), 8108, 2009.

References


