# **Digital System Design**

### by Dr. Lesley Shannon, Email: lshannon@ensc.sfu.ca

Course Website: [http://www.ensc.sfu.ca/~lshannon/courses/ensc350](http://www.ensc.sfu.ca/~lshannon/courses/ensc350)



Simon Fraser University

Slide Set: 11 Date: March 9, 2009

### Slide Set Overview

- Datapath Circuits
  - Large digital systems are more than state machines and combinational logic. Generally these systems can be divided into two parts:
    - Control
    - Datapath
  - We'll use examples to understand how to do this:
    - There is no real "recipe" for designing these things, but with experience you become good at it.



• All but the simplest systems have two parts: a controller and a datapath.



### Exponent

- Suppose we want to build a circuit to calculate X<sup>3</sup>
  - X is an n-bit input, and assume that the result also fits in n-bits for now
  - This is a relatively simple circuit
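A minimal VHDL sketch of this fixed-exponent case is simply two multipliers in series. It assumes a generic width N (8 by default), uses `ieee.numeric_std` rather than the `std_logic_unsigned` package used later in this slide set, and, as stated above, assumes the result fits in N bits, so each product is truncated to its low N bits. The entity name `cube` is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: Y = X^3, truncated to N bits (assumes the result fits).
entity cube is
  generic (N : positive := 8);
  port (X : in  std_logic_vector(N-1 downto 0);
        Y : out std_logic_vector(N-1 downto 0));
end cube;

architecture behavioural of cube is
  signal x2 : unsigned(N-1 downto 0);  -- X*X, low N bits
begin
  -- First multiplier: X*X is 2N bits wide; resize keeps the low N bits.
  x2 <= resize(unsigned(X) * unsigned(X), N);
  -- Second multiplier: (X*X)*X, again truncated to N bits.
  Y  <= std_logic_vector(resize(x2 * unsigned(X), N));
end behavioural;
```

Because A is fixed at 3, the number of multipliers is known when the circuit is designed; that is exactly what breaks down when A becomes an input.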



### Exponent: A bit more complicated

 What if we want to compute X<sup>A</sup> where X and A are both inputs?



- If A were fixed, we could figure out how many multipliers we need (as in the previous example)
- But, during the operation of this circuit, suppose A can change. How do we know how many hardware units to put down?

### Exponent: A bit more complicated

The algorithm to be implemented in this block:

```
P = 1;
CNT = A;
while (CNT > 0) do
   P = P * X;
   CNT = CNT - 1;
end while;
```

Note 1: this isn't VHDL or C; it is just pseudo-code to illustrate the algorithm.

Note 2: We could write this in VHDL, but it would *not* be *synthesizable*. So, we have to design it using smaller processes (each one synthesizable).

A rule that has never mattered before: a synthesizable process can only describe what happens in one clock cycle. This loop takes more than one clock cycle, so it would not be synthesizable.

### Exponent: A bit more complicated

Consider this simple datapath



If we let this run for A+1 clock cycles, we will produce the desired result.

This will work for any value of A

We need a way to initialize P to 1 at the start:



We have to let this run for A cycles. We need some sort of counter to keep track of this.



But we are not there yet. What we really want is:



So to implement this, we need a controller that, when **s** goes high:

- sets sel and selA to 1 for one cycle,
- waits until **z** goes high, and
- when it does, asserts *done* and goes back to the start.

• Here is a simple controller that does that:



• Now combine the state machine and the datapath into one circuit:



```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity top is
   port(A, X   : in std_logic_vector(7 downto 0);
        s, clk : in std_logic;
        P      : out std_logic_vector(7 downto 0);
        done   : out std_logic);
end top;

architecture behavioural of top is
   signal curr_state : std_logic_vector(1 downto 0) := "00";
   signal z, sel, selA : std_logic;
   signal P_int : std_logic_vector(7 downto 0);
   signal cnt : std_logic_vector(7 downto 0);
   -- . . . .
begin

   -- Datapath
   -- NOTE THIS STILL ISN'T SYNTHESIZABLE BECAUSE MULTIPLICATION IS NOT
   -- SYNTHESIZABLE (BUT YOU CAN SIMULATE IT)
   -- . . . . .

   -- Down-counter: load A when selA = '1', otherwise count down by 1
   process(clk)
   begin
      if rising_edge(clk) then
         if (selA = '1') then
            cnt <= A;
         else
            cnt <= cnt - 1;
         end if;
      end if;
   end process;

   -- Status signal: z = '1' when the counter has reached zero
   process(cnt)
   begin
      if (cnt = "00000000") then
         z <= '1';
      else
         z <= '0';
      end if;
   end process;

   -- Controller: state register and next-state logic
   process(clk)
   begin
      if rising_edge(clk) then
         case curr_state is
            when "00" =>               -- idle: wait for s
               if (s = '0') then
                  curr_state <= "00";
               else
                  curr_state <= "01";
               end if;
            when "01" =>               -- running: wait for z
               if (z = '0') then
                  curr_state <= "01";
               else
                  curr_state <= "10";
               end if;
            when others =>             -- "10": done, wait for s to fall
               if (s = '0') then
                  curr_state <= "00";
               else
                  curr_state <= "10";
               end if;
         end case;
      end if;
   end process;

   -- Controller outputs (depend only on curr_state)
   process(curr_state)
   begin
      case curr_state is
         when "00" =>
            sel <= '1'; selA <= '1'; done <= '0';
         when "01" =>
            sel <= '0'; selA <= '0'; done <= '0';
         when others =>
            sel <= '0'; selA <= '0'; done <= '1';
      end case;
   end process;
end behavioural;
```

Consider simulating this description in Altera's waveform viewer to see the circuit's behaviour.

### **Serial Multiplier**

How do you implement a multiplier (a circuit that multiplies two numbers)? Start from long multiplication, shown here in decimal and in binary:

```
Decimal         Binary

   13             1101
 x 11           x 1011
 ----           ------
   13             1101
  13             1101
 ----           0000
  143          1101
              --------
              10001111
```

# Serial Multiplier Algorithm

Inputs A and B, Output P:



As before, we could implement this pseudo-code using VHDL, but it would not be synthesizable. So, we have to break it into smaller processes (a.k.a. design the hardware).

Top level diagram of what we will build:



When **s** goes high, new n-bit values are available on **A** and **B**. The machine then multiplies them, and when it is finished, it asserts **done** and puts the result on **P**.



• State Machine:



ENSC 350: Lecture Set 11

Together, the state machine and datapath implement the serial multiplication. The state machine is a Mealy machine, so you would need two processes to describe it.

The datapath can be described using simple components.
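As an illustration, here is one possible VHDL sketch of a shift-and-add serial multiplier with the same **s**/**done** handshake. It is not the textbook's design: for brevity it folds the controller and datapath into one clocked process (a registered, Moore-style machine rather than the two-process Mealy machine described above), and the entity name `serial_mult`, the state names `S1` to `S3`, and the generic `N` are assumptions. Each cycle in `S2` adds the shifted multiplicand to the running sum when the low bit of the multiplier is 1, mirroring the long multiplication shown earlier.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical shift-and-add serial multiplier (sketch, not the textbook design).
entity serial_mult is
  generic (N : positive := 8);                    -- operand width (assumption)
  port (clk, s : in  std_logic;
        A, B   : in  std_logic_vector(N-1 downto 0);
        P      : out std_logic_vector(2*N-1 downto 0);
        done   : out std_logic);
end serial_mult;

architecture behavioural of serial_mult is
  type state_t is (S1, S2, S3);                   -- idle, multiply, done
  signal state : state_t := S1;
  signal RA  : unsigned(2*N-1 downto 0);          -- multiplicand, shifted left each step
  signal RB  : unsigned(N-1 downto 0);            -- multiplier, shifted right each step
  signal acc : unsigned(2*N-1 downto 0);          -- running sum of partial products
begin
  process(clk)
  begin
    if rising_edge(clk) then
      case state is
        when S1 =>                                -- load operands, clear the sum
          RA  <= resize(unsigned(A), 2*N);
          RB  <= unsigned(B);
          acc <= (others => '0');
          if s = '1' then
            state <= S2;
          end if;
        when S2 =>
          if RB = 0 then                          -- no 1s left in the multiplier
            state <= S3;
          else
            if RB(0) = '1' then                   -- add this partial product
              acc <= acc + RA;
            end if;
            RA <= shift_left(RA, 1);
            RB <= shift_right(RB, 1);
          end if;
        when S3 =>                                -- hold the result until s falls
          if s = '0' then
            state <= S1;
          end if;
      end case;
    end if;
  end process;

  P    <= std_logic_vector(acc);
  done <= '1' when state = S3 else '0';
end behavioural;
```

Because the loop stops as soon as the shifted copy of B is zero, small multipliers finish early; an n-bit B takes at most n+1 cycles in `S2`.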

# **Bit Counting Circuit**



Suppose we want to count the number of '1's in a word. Algorithm to do this:

```
B = 0
while (A ≠ 0) do
   if (a0 = 1) then
      B = B + 1
   end if
   Right shift A
end while
```

Note 1: this isn't VHDL or C; it is just pseudo-code to illustrate the algorithm.

Note 2: We could write this in VHDL, but it would not be synthesizable. So, we have to design it using smaller processes (each one synthesizable).

Top-level diagram of what we will build:



When **s** goes high, a new n-bit value is available on **A**. The machine then counts the bits, and when it is finished, asserts **done** and puts the result on **B**.

#### Datapath

(Figure: an n-bit shift register holding A, loaded from the input data and controlled by loadA and shiftA, which supplies the status signals z and a<sub>0</sub>; and a log<sub>2</sub>n-bit register for B, controlled by loadB and addB, which holds the final result.)

• State Machine:



Together, the state machine and datapath implement the bit-counting operation. The state machine is a Mealy machine, so you would need two processes to describe it. The datapath can be described using simple components.

For practice, you can try writing the HDL from this description.
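If you want to check your attempt, here is one possible VHDL sketch. It makes simplifying assumptions: a single clocked process replaces the two-process Mealy machine described above, the names (`bitcount`, states `S1` to `S3`) are hypothetical, and a separate generic `W` sets the width of B, which must be large enough to hold the value N itself (for example, W = 4 when N = 8). The comments map each step back to the datapath controls loadA, shiftA, loadB, and addB.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical bit-counting circuit: B = number of '1's in A.
entity bitcount is
  generic (N : positive := 8;   -- width of A
           W : positive := 4);  -- width of B; must be able to represent N (assumption)
  port (clk, s : in  std_logic;
        A      : in  std_logic_vector(N-1 downto 0);
        B      : out std_logic_vector(W-1 downto 0);
        done   : out std_logic);
end bitcount;

architecture behavioural of bitcount is
  type state_t is (S1, S2, S3);          -- idle, count, done
  signal state : state_t := S1;
  signal RA : unsigned(N-1 downto 0);    -- shift register holding A
  signal RB : unsigned(W-1 downto 0);    -- counter holding B
  signal z  : std_logic;                 -- status: '1' when A is all zeros
begin
  z <= '1' when RA = 0 else '0';

  process(clk)
  begin
    if rising_edge(clk) then
      case state is
        when S1 =>                       -- loadA, and clear B (loadB)
          RA <= unsigned(A);
          RB <= (others => '0');
          if s = '1' then
            state <= S2;
          end if;
        when S2 =>                       -- examine one bit per cycle
          if z = '1' then
            state <= S3;
          else
            if RA(0) = '1' then          -- a0 = 1: addB
              RB <= RB + 1;
            end if;
            RA <= shift_right(RA, 1);    -- shiftA
          end if;
        when S3 =>                       -- hold the result until s falls
          if s = '0' then
            state <= S1;
          end if;
      end case;
    end if;
  end process;

  B    <= std_logic_vector(RB);
  done <= '1' when state = S3 else '0';
end behavioural;
```

The loop ends as soon as the remaining bits of A are all zero, so the cycle count depends on the position of the most significant '1', not on how many '1's there are.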

### How would you design a divider?



















**Controller State Machine** 

# So both Lab 2 and Lab 3 have datapaths!

Yes, but Lab 2 was only the DES datapath, whereas Lab 3 also lets a Master configure the datapath (provide a different key) and obtain its status ...







# Sorting

Sorting is the type of thing that really makes sense to do in software (since it is so sequential). That being said, there may be times when you want to do it in hardware. Let's look at a fairly complex datapath that performs sorting.

We will consider two approaches:

- Fully parallel (big)
- Serial (slow, but smaller)



The serial version is described in the textbook in great detail.


# **Fully-Parallel Sorting**

• First consider designing a block that sorts two input numbers:



Now build a network of these building blocks:



This will sort four numbers of any bit-width in one cycle.

Problems:

- Gets big if there are more numbers to sort
  - Best known: O(n log n) blocks for n inputs
- Can't use this if n is arbitrary (not known when the chip is designed)
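The two-input block and the four-input network can be sketched structurally in VHDL. This is a hedged example rather than the slide's exact network (the figure is not reproduced here): it assumes unsigned operands and uses a standard five-block arrangement for four inputs. The entity names `sort2` and `sort4` and the generic `W` (the bit width) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-input sorting block: lo gets the smaller input, hi the larger.
entity sort2 is
  generic (W : positive := 8);
  port (a, b   : in  std_logic_vector(W-1 downto 0);
        lo, hi : out std_logic_vector(W-1 downto 0));
end sort2;

architecture rtl of sort2 is
begin
  lo <= a when unsigned(a) < unsigned(b) else b;
  hi <= b when unsigned(a) < unsigned(b) else a;
end rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Four-input network of five sort2 blocks; outputs satisfy y0 <= y1 <= y2 <= y3.
entity sort4 is
  generic (W : positive := 8);
  port (x0, x1, x2, x3 : in  std_logic_vector(W-1 downto 0);
        y0, y1, y2, y3 : out std_logic_vector(W-1 downto 0));
end sort4;

architecture structural of sort4 is
  signal l01, h01, l23, h23, h02, l13 : std_logic_vector(W-1 downto 0);
begin
  -- Port order for each sort2 is (a, b, lo, hi).
  -- Stage 1: sort the pairs (x0, x1) and (x2, x3).
  c1 : entity work.sort2 generic map (W) port map (x0, x1, l01, h01);
  c2 : entity work.sort2 generic map (W) port map (x2, x3, l23, h23);
  -- Stage 2: the smaller of the two minima is the overall minimum,
  -- and the larger of the two maxima is the overall maximum.
  c3 : entity work.sort2 generic map (W) port map (l01, l23, y0, h02);
  c4 : entity work.sort2 generic map (W) port map (h01, h23, l13, y3);
  -- Stage 3: put the two middle values in order.
  c5 : entity work.sort2 generic map (W) port map (h02, l13, y1, y2);
end structural;
```

The whole network is combinational, which is why it sorts in one cycle; the cost is that the number of blocks grows with n and is fixed when the chip is designed.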

Suppose we want to sort k numbers:

```
for (i = 0 to k-2) do
   A = Ri;
   for (j = i+1 to k-1) do
      B = Rj;
      if (B < A) then
         Ri = B;
         Rj = A;
         A = Ri;
      end if;
   end for;
end for;
```
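For simulation, the pseudo-code above can be transcribed almost line for line into a VHDL function, which is handy as a golden model when testing a serial sorter in a testbench. This is a sketch under assumptions: the package and function names (`sort_model`, `serial_sort`) are hypothetical, the words are fixed at 8-bit unsigned values, and the registers are indexed R0 to R(k-1) as in the loop bounds. It describes the algorithm, not the multi-cycle hardware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioural model of the serial sort pseudo-code (simulation use).
package sort_model is
  type word_array is array (natural range <>) of unsigned(7 downto 0);
  function serial_sort(R_in : word_array) return word_array;
end package sort_model;

package body sort_model is
  function serial_sort(R_in : word_array) return word_array is
    variable R    : word_array(0 to R_in'length - 1) := R_in;  -- registers R0..R(k-1)
    variable A, B : unsigned(7 downto 0);
    constant k    : natural := R_in'length;
  begin
    for i in 0 to k - 2 loop
      A := R(i);
      for j in i + 1 to k - 1 loop
        B := R(j);
        if B < A then
          R(i) := B;   -- swap Ri and Rj
          R(j) := A;
          A := R(i);   -- A now holds the new (smaller) Ri
        end if;
      end loop;
    end loop;
    return R;
  end function;
end package body sort_model;
```

After iteration i, R(i) holds the i-th smallest value, which is the invariant the states S2 to S8 of the datapath below implement one step at a time.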








Assume that the values to be sorted are in registers R0 to R3 (circuitry is provided to load them, but assume this has already been done).



#### State S2: Load Register A with Ri and initialize j to value of i






#### State S3: Increment j so it equals i+1



### State S4: Load value of Rj into B



#### Where does Imux come from in S4?



#### State S5: A and B are compared, and if B < A, the machine goes to State S6

![](_page_52_Figure_1.jpeg)

![](_page_53_Figure_0.jpeg)

### State S6: Swap Ri and Rj (part 1)

![](_page_54_Figure_1.jpeg)

How did it know which loadj to assert?

![](_page_55_Figure_1.jpeg)

#### State S7: Swap Ri and Rj (part 2)

![](_page_56_Figure_1.jpeg)

How did it know which loadi to assert?

![](_page_57_Figure_1.jpeg)

#### State S8: Load A from Ri

![](_page_58_Figure_1.jpeg)

State S8: This was being done all the time, but now we will use zi and zj

![](_page_59_Figure_1.jpeg)

![](_page_60_Figure_0.jpeg)

#### An alternative datapath: Tri-state buffer based datapath

![](_page_61_Figure_1.jpeg)

### Area vs. Speed

In this example, we saw two implementations: big and fast, and small and slow.

In general, you can trade off area for speed. Ideally, if you double the number of functional units, you can halve the number of cycles, but in practice you can rarely achieve this.

Which is the right implementation? It depends on how fast you need the circuit to produce results. Larger circuits cost more (more chip area, more power, higher probability of defects), so if you don't need the speed, a small implementation is probably better.

There is no general rule: as an engineer, it is up to you to choose a good implementation based on the specs you are designing to.

# Summary of this long Slide Set

We saw a lot of examples of datapath and control circuits

Do you need to regurgitate all the details of any of these examples on a test?

No, but you might be asked to design a simple system that contains both a datapath and a controller. If you understand these examples, you'll be in a good position to do that design on a test and, more importantly, in the real world once you graduate (or go on co-op).

![](_page_63_Figure_4.jpeg)

# Summary of this slide set

- All of these examples except the divider can be found in this textbook (pages 673-712)
- They use ASM charts as opposed to FSMs
- There are no review questions for this slide set, just these examples to guide your thought process

![](_page_64_Picture_4.jpeg)