Add FP8 and FP8ALT support to THMULTI DivSqrt #135

Open
wants to merge 6 commits into
base: develop
5 changes: 5 additions & 0 deletions Bender.yml
@@ -37,6 +37,11 @@ sources:
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_with_sqrt.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt.v
- vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_top.v
- src/fpnew_lut_div8.sv
- src/fpnew_lut_div8alt.sv
- src/fpnew_lut_sqrt8.sv
- src/fpnew_lut_sqrt8alt.sv
- src/fpnew_divsqrt_8_multi_lut.sv
- src/fpnew_divsqrt_th_32.sv
- src/fpnew_divsqrt_th_64_multi.sv
- src/fpnew_divsqrt_multi.sv
4 changes: 2 additions & 2 deletions README.md
@@ -38,8 +38,8 @@ E.g.: Support for double-precision (64bit) operations and two simultaneous singl

It is also possible to generate only a subset of operations if e.g. divisions are not needed.

<sup>1</sup>Some compliance issues with IEEE 754-2008 are currently known to exist for the PULP DivSqrt unit (Rounding mismatches have been reported in GitHub issues. This can lead to results being off by 1ulp, and the inexact flag not being properly raised in these cases as well)<br>
<sup>2</sup>Two DivSqrt units are supported: the multi-format PULP DivSqrt unit and a 32-bit unit integrated from the T-Head OpenE906. The `PulpDivsqrt` parameter can be set to 1 or 0 to select the former or the latter unit, respectively.<br>
<sup>2</sup>Three DivSqrt units are supported: a multi-format 64-bit unit integrated from the T-Head OpenC910, the multi-format PULP DivSqrt unit, and a 32-bit unit integrated from the T-Head OpenE906. The `DivSqrtSel` parameter can be set to `THMULTI`, `PULP`, or `TH32`. `THMULTI` (the default) supports SIMD operations and selects the unit integrated from the T-Head OpenC910, extended with FP16ALT, FP8, and FP8ALT support (thus supporting FP64, FP32, FP16, FP16ALT, FP8, and FP8ALT). `PULP` supports SIMD operations and selects the multi-format PULP DivSqrt unit (supporting FP64, FP32, FP16, FP16ALT, and FP8). `TH32` selects the 32-bit unit from the OpenE906, which supports only FP32.<br>
<sup>1</sup>Some compliance issues with IEEE 754-2008 are currently known to exist for the PULP DivSqrt unit (Rounding mismatches have been reported in GitHub issues. This can lead to results being off by 1ulp, and the inexact flag not being properly raised in these cases as well).<br>
<sup>3</sup>Implementing IEEE 754-201x `minimumNumber` and `maximumNumber`, respectively
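
As a minimal sketch of the selection described in footnote 2, the constant below picks one of the three units. Only the type `fpnew_pkg::divsqrt_unit_t` and its values `THMULTI`, `PULP`, and `TH32` come from this PR; the package name `my_fpu_cfg_pkg` and the constant name `MyDivSqrtSel` are hypothetical, and `fpnew_pkg` is assumed to be compiled first.

```SystemVerilog
// Hypothetical configuration package (illustration only).
package my_fpu_cfg_pkg;
  // THMULTI: C910-based unit, FP64/FP32/FP16/FP16ALT/FP8/FP8ALT, SIMD capable.
  localparam fpnew_pkg::divsqrt_unit_t MyDivSqrtSel = fpnew_pkg::THMULTI;
  // Alternatives listed in the footnote:
  //   fpnew_pkg::PULP -> multi-format PULP unit (no FP8ALT), SIMD capable
  //   fpnew_pkg::TH32 -> E906-based unit, FP32 only
endpackage
```

The constant would then be passed to whichever parameter of the enclosing design carries `DivSqrtSel`; how that parameter is routed is not shown in this diff.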

### Rounding modes
5 changes: 3 additions & 2 deletions docs/README.md
@@ -134,10 +134,11 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
| `FP16` | IEEE binary16 | 16 bit | 5 | 10 |
| `FP8` | binary8 | 8 bit | 5 | 2 |
| `FP16ALT` | binary16alt | 16 bit | 8 | 7 |
| `FP8ALT` | binary8alt | 8 bit | 4 | 3 |

The following global parameters associated with FP formats are set in `fpnew_pkg`:
```SystemVerilog
localparam int unsigned NUM_FP_FORMATS = 5;
localparam int unsigned NUM_FP_FORMATS = 6;
localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
```
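
To make the difference between the two 8-bit rows of the table concrete, here is a small sketch that computes the exponent bias of each format. It assumes the usual IEEE-style bias of 2^(exp_bits-1) - 1 for both formats; the module and function names are hypothetical and not part of `fpnew_pkg`.

```SystemVerilog
// Illustration only: exponent bias and the encoding of 1.0 for both 8-bit formats.
module fp8_bias_sketch;
  // IEEE-style exponent bias (assumed): 2^(exp_bits-1) - 1
  function automatic int unsigned bias(input int unsigned exp_bits);
    return (1 << (exp_bits - 1)) - 1;
  endfunction

  // 1.0 = sign 0, exponent field equal to the bias, mantissa field all zeros.
  localparam logic [7:0] Fp8One    = {1'b0, 5'd15, 2'b00};  // FP8:    1 + 5 + 2 bits
  localparam logic [7:0] Fp8AltOne = {1'b0, 4'd7,  3'b000}; // FP8ALT: 1 + 4 + 3 bits

  initial begin
    $display("FP8    bias = %0d", bias(5)); // 2^4 - 1 = 15
    $display("FP8ALT bias = %0d", bias(4)); // 2^3 - 1 = 7
  end
endmodule
```

FP8 trades one mantissa bit for a wider exponent range, while FP8ALT keeps three mantissa bits with a narrower range.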

@@ -359,7 +360,7 @@ It is of type `divsqrt_unit_t`, which is defined as:
typedef enum logic[1:0] {
PULP, // "PULP" instantiates the PULP DivSqrt unit supports FP64, FP32, FP16, FP16ALT, FP8 and SIMD operations
TH32, // "TH32" instantiates the E906 DivSqrt unit supports only FP32 (no SIMD support)
THMULTI // "THMULTI" instantiates the C910 DivSqrt unit supports FP64, FP32, FP16, FP16ALT and SIMD operations
THMULTI // "THMULTI" instantiates the C910 DivSqrt unit, which supports FP64, FP32, FP16, FP16ALT, FP8, FP8ALT and SIMD operations
} divsqrt_unit_t;
```

285 changes: 285 additions & 0 deletions src/fpnew_divsqrt_8_multi_lut.sv
@@ -0,0 +1,285 @@
// Copyright 2024 ETH Zurich and University of Bologna.
//
// Copyright and related rights are licensed under the Solderpad Hardware
// License, Version 0.51 (the "License"); you may not use this file except in
// compliance with the License. You may obtain a copy of the License at
// http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law
// or agreed to in writing, software, hardware and materials distributed under
// this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.
//
// SPDX-License-Identifier: SHL-0.51

// Authors: Luca Bertaccini <[email protected]>
// Stefan Mach <[email protected]>

`include "common_cells/registers.svh"

module fpnew_divsqrt_8_multi_lut #(
// FPU configuration
parameter int unsigned NumPipeRegs = 0,
parameter fpnew_pkg::pipe_config_t PipeConfig = fpnew_pkg::AFTER,
parameter type TagType = logic,
parameter type AuxType = logic,
// Do not change
localparam int unsigned NUM_FORMATS = fpnew_pkg::NUM_FP_FORMATS,
localparam int unsigned ExtRegEnaWidth = NumPipeRegs == 0 ? 1 : NumPipeRegs
) (
input logic clk_i,
input logic rst_ni,
// Input signals
input logic [1:0][7:0] operands_i, // 2 operands
input logic [NUM_FORMATS-1:0][1:0] is_boxed_i, // 2 operands
input fpnew_pkg::roundmode_e rnd_mode_i,
input fpnew_pkg::operation_e op_i,
input fpnew_pkg::fp_format_e dst_fmt_i,
input TagType tag_i,
input logic mask_i,
input AuxType aux_i,
input logic vectorial_op_i,
// Input Handshake
input logic in_valid_i,
output logic in_ready_o,
output logic divsqrt_done_o,
input logic simd_synch_done_i,
output logic divsqrt_ready_o,
input logic simd_synch_rdy_i,
input logic flush_i,
// Output signals
output logic [7:0] result_o,
output fpnew_pkg::status_t status_o,
output logic extension_bit_o,
output TagType tag_o,
output logic mask_o,
output AuxType aux_o,
// Output handshake
output logic out_valid_o,
input logic out_ready_i,
// Indication of valid data in flight
output logic busy_o,
// External register enable override
input logic [ExtRegEnaWidth-1:0] reg_ena_i
);

// ----------
// Constants
// ----------
// Pipelines
localparam NUM_INP_REGS = (PipeConfig == fpnew_pkg::BEFORE)
? NumPipeRegs
: (PipeConfig == fpnew_pkg::DISTRIBUTED
? (NumPipeRegs / 2) // Last to get distributed regs
: 0); // no regs here otherwise
localparam NUM_OUT_REGS = (PipeConfig == fpnew_pkg::AFTER || PipeConfig == fpnew_pkg::INSIDE)
? NumPipeRegs
: (PipeConfig == fpnew_pkg::DISTRIBUTED
? ((NumPipeRegs + 1) / 2) // First to get distributed regs
: 0); // no regs here otherwise

// ---------------
// Input pipeline
// ---------------
// Selected pipeline output signals as non-arrays
logic [1:0][7:0] operands_q;
fpnew_pkg::roundmode_e rnd_mode_q;
fpnew_pkg::operation_e op_q;
fpnew_pkg::fp_format_e dst_fmt_q;

// Input pipeline signals, index i holds signal after i register stages
logic [0:NUM_INP_REGS][1:0][7:0] inp_pipe_operands_q;
fpnew_pkg::roundmode_e [0:NUM_INP_REGS] inp_pipe_rnd_mode_q;
fpnew_pkg::operation_e [0:NUM_INP_REGS] inp_pipe_op_q;
fpnew_pkg::fp_format_e [0:NUM_INP_REGS] inp_pipe_dst_fmt_q;
TagType [0:NUM_INP_REGS] inp_pipe_tag_q;
logic [0:NUM_INP_REGS] inp_pipe_mask_q;
AuxType [0:NUM_INP_REGS] inp_pipe_aux_q;
logic [0:NUM_INP_REGS] inp_pipe_vec_op_q;
logic [0:NUM_INP_REGS] inp_pipe_valid_q;
// Ready signal is combinatorial for all stages
logic [0:NUM_INP_REGS] inp_pipe_ready;

// Input stage: First element of pipeline is taken from inputs
assign inp_pipe_operands_q[0] = operands_i;
assign inp_pipe_rnd_mode_q[0] = rnd_mode_i;
assign inp_pipe_op_q[0] = op_i;
assign inp_pipe_dst_fmt_q[0] = dst_fmt_i;
assign inp_pipe_tag_q[0] = tag_i;
assign inp_pipe_mask_q[0] = mask_i;
assign inp_pipe_aux_q[0] = aux_i;
assign inp_pipe_vec_op_q[0] = vectorial_op_i;
assign inp_pipe_valid_q[0] = in_valid_i;
// Input stage: Propagate pipeline ready signal to upstream circuitry
assign in_ready_o = inp_pipe_ready[0];
// Generate the register stages
for (genvar i = 0; i < NUM_INP_REGS; i++) begin : gen_input_pipeline
// Internal register enable for this stage
logic reg_ena;
// Determine the ready signal of the current stage - advance the pipeline:
// 1. if the next stage is ready for our data
// 2. if the next stage only holds a bubble (not valid) -> we can pop it
assign inp_pipe_ready[i] = inp_pipe_ready[i+1] | ~inp_pipe_valid_q[i+1];
// Valid: enabled by ready signal, synchronous clear with the flush signal
`FFLARNC(inp_pipe_valid_q[i+1], inp_pipe_valid_q[i], inp_pipe_ready[i], flush_i, 1'b0, clk_i, rst_ni)
// Enable register if pipeline ready and a valid data item is present
assign reg_ena = (inp_pipe_ready[i] & inp_pipe_valid_q[i]) | reg_ena_i[i];
// Generate the pipeline registers within the stages, use enable-registers
`FFL(inp_pipe_operands_q[i+1], inp_pipe_operands_q[i], reg_ena, '0)
`FFL(inp_pipe_rnd_mode_q[i+1], inp_pipe_rnd_mode_q[i], reg_ena, fpnew_pkg::RNE)
`FFL(inp_pipe_op_q[i+1], inp_pipe_op_q[i], reg_ena, fpnew_pkg::FMADD)
`FFL(inp_pipe_dst_fmt_q[i+1], inp_pipe_dst_fmt_q[i], reg_ena, fpnew_pkg::fp_format_e'(0))
`FFL(inp_pipe_tag_q[i+1], inp_pipe_tag_q[i], reg_ena, TagType'('0))
`FFL(inp_pipe_mask_q[i+1], inp_pipe_mask_q[i], reg_ena, '0)
`FFL(inp_pipe_aux_q[i+1], inp_pipe_aux_q[i], reg_ena, AuxType'('0))
`FFL(inp_pipe_vec_op_q[i+1], inp_pipe_vec_op_q[i], reg_ena, '0)
end
// Output stage: assign selected pipe outputs to signals for later use
assign operands_q = inp_pipe_operands_q[NUM_INP_REGS];
assign rnd_mode_q = inp_pipe_rnd_mode_q[NUM_INP_REGS];
assign op_q = inp_pipe_op_q[NUM_INP_REGS];
assign dst_fmt_q = inp_pipe_dst_fmt_q[NUM_INP_REGS];

logic div_valid, sqrt_valid; // input signalling with unit
logic op_starting; // high in the cycle a new operation starts

// Valids are gated by the FSM ready. Invalid input ops run a sqrt to not lose illegal instr.
assign div_valid = inp_pipe_valid_q[NUM_INP_REGS] & (op_q == fpnew_pkg::DIV) & ~flush_i;
assign sqrt_valid = inp_pipe_valid_q[NUM_INP_REGS] & (op_q != fpnew_pkg::DIV) & ~flush_i;
assign op_starting = div_valid | sqrt_valid;

// -----------------
// DIVSQRT instance
// -----------------
logic [1:0][7:0] operands_div8;
logic [1:0][7:0] operands_div8alt;
logic [7:0] operand_sqrt8;
logic [7:0] operand_sqrt8alt;

always_comb begin : silence_inputs_of_unused_units
operands_div8 = '0;
operands_div8alt = '0;
operand_sqrt8 = '0;
operand_sqrt8alt = '0;
if (op_starting) begin
if (div_valid && dst_fmt_q == fpnew_pkg::FP8) begin
operands_div8 = operands_q;
end else if (div_valid && dst_fmt_q == fpnew_pkg::FP8ALT) begin
operands_div8alt = operands_q;
end else if (sqrt_valid && dst_fmt_q == fpnew_pkg::FP8) begin
operand_sqrt8 = operands_q[0];
end else if (sqrt_valid && dst_fmt_q == fpnew_pkg::FP8ALT) begin
operand_sqrt8alt = operands_q[0];
end
end
end

logic [7:0] result_div8;
logic [7:0] result_div8alt;
logic [7:0] result_sqrt8;
logic [7:0] result_sqrt8alt;
logic [4:0] status_div8, status_div8alt, status_sqrt8, status_sqrt8alt;

fpnew_lut_div8 i_div8_lut (
.input_i ({operands_div8[0], operands_div8[1]}),
.out_o (result_div8),
.status_o (status_div8)
);

fpnew_lut_div8alt i_div8alt_lut (
.input_i ({operands_div8alt[0], operands_div8alt[1]}),
.out_o (result_div8alt),
.status_o (status_div8alt)
);

fpnew_lut_sqrt8 i_sqrt8_lut (
.input_i (operand_sqrt8),
.out_o (result_sqrt8),
.status_o (status_sqrt8)
);

fpnew_lut_sqrt8alt i_sqrt8alt_lut (
.input_i (operand_sqrt8alt),
.out_o (result_sqrt8alt),
.status_o (status_sqrt8alt)
);

// --------------
// Output Select
// --------------
logic [7:0] result_d;
fpnew_pkg::status_t status_d;

always_comb begin : select_output
result_d = '0;
status_d = '0; // default assignment avoids an inferred latch when no branch matches
if (div_valid && dst_fmt_q == fpnew_pkg::FP8) begin
result_d = result_div8;
status_d = status_div8;
end else if (div_valid && dst_fmt_q == fpnew_pkg::FP8ALT) begin
result_d = result_div8alt;
status_d = status_div8alt;
end else if (sqrt_valid && dst_fmt_q == fpnew_pkg::FP8) begin
result_d = result_sqrt8;
status_d = status_sqrt8;
end else if (sqrt_valid && dst_fmt_q == fpnew_pkg::FP8ALT) begin
result_d = result_sqrt8alt;
status_d = status_sqrt8alt;
end
end

// ----------------
// Output Pipeline
// ----------------
// Output pipeline signals, index i holds signal after i register stages
logic [0:NUM_OUT_REGS][7:0] out_pipe_result_q;
fpnew_pkg::status_t [0:NUM_OUT_REGS] out_pipe_status_q;
TagType [0:NUM_OUT_REGS] out_pipe_tag_q;
logic [0:NUM_OUT_REGS] out_pipe_mask_q;
AuxType [0:NUM_OUT_REGS] out_pipe_aux_q;
logic [0:NUM_OUT_REGS] out_pipe_valid_q;
// Ready signal is combinatorial for all stages
logic [0:NUM_OUT_REGS] out_pipe_ready;

// Input stage: First element of pipeline is taken from inputs
assign out_pipe_result_q[0] = result_d;
assign out_pipe_status_q[0] = status_d;
assign out_pipe_tag_q[0] = inp_pipe_tag_q[NUM_INP_REGS];
assign out_pipe_mask_q[0] = inp_pipe_mask_q[NUM_INP_REGS];
assign out_pipe_aux_q[0] = inp_pipe_aux_q[NUM_INP_REGS];
assign out_pipe_valid_q[0] = inp_pipe_valid_q[NUM_INP_REGS];
// Input stage: Propagate pipeline ready signal to inside pipe
assign inp_pipe_ready[NUM_INP_REGS] = out_pipe_ready[0];
// Generate the register stages
for (genvar i = 0; i < NUM_OUT_REGS; i++) begin : gen_output_pipeline
// Internal register enable for this stage
logic reg_ena;
// Determine the ready signal of the current stage - advance the pipeline:
// 1. if the next stage is ready for our data
// 2. if the next stage only holds a bubble (not valid) -> we can pop it
assign out_pipe_ready[i] = out_pipe_ready[i+1] | ~out_pipe_valid_q[i+1];
// Valid: enabled by ready signal, synchronous clear with the flush signal
`FFLARNC(out_pipe_valid_q[i+1], out_pipe_valid_q[i], out_pipe_ready[i], flush_i, 1'b0, clk_i, rst_ni)
// Enable register if pipeline ready and a valid data item is present
assign reg_ena = (out_pipe_ready[i] & out_pipe_valid_q[i]) | reg_ena_i[NUM_INP_REGS + i];
// Generate the pipeline registers within the stages, use enable-registers
`FFL(out_pipe_result_q[i+1], out_pipe_result_q[i], reg_ena, '0)
`FFL(out_pipe_status_q[i+1], out_pipe_status_q[i], reg_ena, '0)
`FFL(out_pipe_tag_q[i+1], out_pipe_tag_q[i], reg_ena, TagType'('0))
`FFL(out_pipe_mask_q[i+1], out_pipe_mask_q[i], reg_ena, '0)
`FFL(out_pipe_aux_q[i+1], out_pipe_aux_q[i], reg_ena, AuxType'('0))
end
// Output stage: Ready travels backwards from output side, driven by downstream circuitry
assign out_pipe_ready[NUM_OUT_REGS] = out_ready_i;
// Output stage: assign module outputs
assign result_o = out_pipe_result_q[NUM_OUT_REGS];
assign status_o = out_pipe_status_q[NUM_OUT_REGS];
assign extension_bit_o = 1'b1; // always NaN-Box result
assign tag_o = out_pipe_tag_q[NUM_OUT_REGS];
assign mask_o = out_pipe_mask_q[NUM_OUT_REGS];
assign aux_o = out_pipe_aux_q[NUM_OUT_REGS];
assign out_valid_o = out_pipe_valid_q[NUM_OUT_REGS];
assign busy_o = (| {inp_pipe_valid_q, op_starting, out_pipe_valid_q});

assign divsqrt_done_o = 1'b1;
assign divsqrt_ready_o = 1'b1;
endmodule
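
The instantiations above connect only `input_i`, `out_o`, and `status_o` on each table, so every result is a pure function of the operand bits. The sketch below shows how the table indices follow from the operand packing. The module name `lut8_index_sketch` and its port names are hypothetical; only the concatenation order is taken from the module above.

```SystemVerilog
// Illustration only: index formation for the FP8/FP8ALT lookup tables.
module lut8_index_sketch (
  input  logic [1:0][7:0] operands_i,   // same packing as operands_i of the unit above
  output logic [15:0]     div_index_o,  // selects one of 2^16 = 65536 division entries
  output logic [7:0]      sqrt_index_o  // selects one of 2^8 = 256 square-root entries
);
  // Operand 0 lands in the upper byte and operand 1 in the lower byte,
  // mirroring {operands_div8[0], operands_div8[1]} in the instantiation above.
  assign div_index_o  = {operands_i[0], operands_i[1]};
  // Square root only consumes operand 0.
  assign sqrt_index_o = operands_i[0];
endmodule
```

Because the lookup is combinational, the unit can tie `divsqrt_done_o` and `divsqrt_ready_o` high, as the module above does.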

87 changes: 64 additions & 23 deletions src/fpnew_divsqrt_th_64_multi.sv
@@ -157,34 +157,48 @@ module fpnew_divsqrt_th_64_multi #(
// -----------------
// Input processing
// -----------------
logic [3:0] divsqrt_fmt;
logic [5:0] divsqrt_fmt;

// Translate fpnew formats into divsqrt formats
if(WIDTH == 64) begin : translate_fmt_64_bits
always_comb begin : translate_fmt
unique case (dst_fmt_q)
fpnew_pkg::FP64: divsqrt_fmt = 4'b1000;
fpnew_pkg::FP32: divsqrt_fmt = 4'b0100;
fpnew_pkg::FP16: divsqrt_fmt = 4'b0010;
fpnew_pkg::FP16ALT: divsqrt_fmt = 4'b0001;
default: divsqrt_fmt = 4'b1000; // 64 bit max width
fpnew_pkg::FP64: divsqrt_fmt = 6'b100000;
fpnew_pkg::FP32: divsqrt_fmt = 6'b010000;
fpnew_pkg::FP16: divsqrt_fmt = 6'b001000;
fpnew_pkg::FP16ALT: divsqrt_fmt = 6'b000100;
fpnew_pkg::FP8: divsqrt_fmt = 6'b000010;
fpnew_pkg::FP8ALT: divsqrt_fmt = 6'b000001;
default: divsqrt_fmt = 6'b100000; // 64 bit max width
endcase
end
end else if(WIDTH == 32) begin : translate_fmt_32_bits
always_comb begin : translate_fmt
unique case (dst_fmt_q)
fpnew_pkg::FP32: divsqrt_fmt = 4'b0100;
fpnew_pkg::FP16: divsqrt_fmt = 4'b0010;
fpnew_pkg::FP16ALT: divsqrt_fmt = 4'b0001;
default: divsqrt_fmt = 4'b0100; // 32 bit max width
fpnew_pkg::FP32: divsqrt_fmt = 6'b010000;
fpnew_pkg::FP16: divsqrt_fmt = 6'b001000;
fpnew_pkg::FP16ALT: divsqrt_fmt = 6'b000100;
fpnew_pkg::FP8: divsqrt_fmt = 6'b000010;
fpnew_pkg::FP8ALT: divsqrt_fmt = 6'b000001;
default: divsqrt_fmt = 6'b010000; // 32 bit max width
endcase
end
end else if(WIDTH == 16) begin : translate_fmt_16_bits
always_comb begin : translate_fmt
unique case (dst_fmt_q)
fpnew_pkg::FP16: divsqrt_fmt = 4'b0010;
fpnew_pkg::FP16ALT: divsqrt_fmt = 4'b0001;
default: divsqrt_fmt = 4'b0010; // 16 bit max width
fpnew_pkg::FP16: divsqrt_fmt = 6'b001000;
fpnew_pkg::FP16ALT: divsqrt_fmt = 6'b000100;
fpnew_pkg::FP8: divsqrt_fmt = 6'b000010;
fpnew_pkg::FP8ALT: divsqrt_fmt = 6'b000001;
default: divsqrt_fmt = 6'b001000; // 16 bit max width
endcase
end
end else if(WIDTH == 8) begin : translate_fmt_8_bits
always_comb begin : translate_fmt
unique case (dst_fmt_q)
fpnew_pkg::FP8: divsqrt_fmt = 6'b000010;
fpnew_pkg::FP8ALT: divsqrt_fmt = 6'b000001;
default: divsqrt_fmt = 6'b000010; // 8 bit max width
endcase
end
end else begin
@@ -316,7 +330,7 @@ module fpnew_divsqrt_th_64_multi #(

// Regs to save current instruction
fpnew_pkg::roundmode_e rm_q;
logic[3:0] divsqrt_fmt_q;
logic[5:0] divsqrt_fmt_q;
fpnew_pkg::operation_e divsqrt_op_q;
logic div_op, sqrt_op;
logic [WIDTH-1:0] srcf0_q, srcf1_q;
@@ -332,48 +346,75 @@ module fpnew_divsqrt_th_64_multi #(
// NaN-box inputs with max WIDTH
if(WIDTH == 64) begin : gen_fmt_64_bits
always_comb begin : NaN_box_inputs
if(divsqrt_fmt_q == 4'b1000) begin // 64-bit
if(divsqrt_fmt_q == 6'b100000) begin // 64-bit
srcf0[63:0] = srcf0_q[63:0];
srcf1[63:0] = srcf1_q[63:0];
end else if(divsqrt_fmt_q == 4'b0100) begin // 32-bit
end else if(divsqrt_fmt_q == 6'b010000) begin // 32-bit
srcf0[63:32] = '1;
srcf1[63:32] = '1;
srcf0[31:0] = srcf0_q[31:0];
srcf1[31:0] = srcf1_q[31:0];
end else if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin //16-bit
end else if((divsqrt_fmt_q == 6'b001000) || (divsqrt_fmt_q == 6'b000100)) begin //16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else if((divsqrt_fmt_q == 6'b000010) || (divsqrt_fmt_q == 6'b000001)) begin //8-bit
srcf0[63:8] = '1;
srcf1[63:8] = '1;
srcf0[7:0] = srcf0_q[7:0];
srcf1[7:0] = srcf1_q[7:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else if (WIDTH == 32) begin : gen_fmt_32_bits
always_comb begin : NaN_box_inputs
if(divsqrt_fmt_q == 4'b0100) begin // 32-bit
if(divsqrt_fmt_q == 6'b010000) begin // 32-bit
srcf0[63:32] = '1;
srcf1[63:32] = '1;
srcf0[31:0] = srcf0_q[31:0];
srcf1[31:0] = srcf1_q[31:0];
end else if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin // 16-bit
end else if((divsqrt_fmt_q == 6'b001000) || (divsqrt_fmt_q == 6'b000100)) begin // 16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else if((divsqrt_fmt_q == 6'b000010) || (divsqrt_fmt_q == 6'b000001)) begin //8-bit
srcf0[63:8] = '1;
srcf1[63:8] = '1;
srcf0[7:0] = srcf0_q[7:0];
srcf1[7:0] = srcf1_q[7:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else if (WIDTH == 16) begin : gen_fmt_16_bits
always_comb begin : NaN_box_inputs
if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin // 16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
if((divsqrt_fmt_q == 6'b001000) || (divsqrt_fmt_q == 6'b000100)) begin // 16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else if((divsqrt_fmt_q == 6'b000010) || (divsqrt_fmt_q == 6'b000001)) begin //8-bit
srcf0[63:8] = '1;
srcf1[63:8] = '1;
srcf0[7:0] = srcf0_q[7:0];
srcf1[7:0] = srcf1_q[7:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else if (WIDTH == 8) begin : gen_fmt_8_bits
always_comb begin : NaN_box_inputs
if((divsqrt_fmt_q == 6'b000010) || (divsqrt_fmt_q == 6'b000001)) begin //8-bit
srcf0[63:8] = '1;
srcf1[63:8] = '1;
srcf0[7:0] = srcf0_q[7:0];
srcf1[7:0] = srcf1_q[7:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
@@ -408,7 +449,7 @@ module fpnew_divsqrt_th_64_multi #(
.dp_vfdsu_fdiv_gateclk_issue ( 1'b1 ), // Local clock enable (same as above)
.dp_vfdsu_idu_fdiv_issue ( op_starting ), // 1. Issue fdiv (FSM in ctrl)
.forever_cpuclk ( clk_i ), // Clock input
.idu_vfpu_rf_pipex_func ( {3'b0, divsqrt_fmt_q, 11'b0 ,sqrt_op, div_op} ), // Defines format (bits 16,15) and operation (bits 1,0)
.idu_vfpu_rf_pipex_func ( {3'b0, divsqrt_fmt_q, 9'b0 ,sqrt_op, div_op} ), // Defines format (bits 16:11, one-hot) and operation (bits 1,0)
.idu_vfpu_rf_pipex_gateclk_sel ( func_sel ), // 2. Select func
.pad_yy_icg_scan_en ( 1'b0 ), // SE signal for the redundant clock gating module
.rtu_yy_xx_flush ( flush_i | last_inp_reg_ena), // Flush
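
The widened format code and the `func` word built from it can be summarised in a small sketch. It assumes a 20-bit `func` word, as implied by the concatenation above (3 + 6 + 9 + 2 bits). The package name, the local enum `fmt_e` (a copy of the `fpnew_pkg::fp_format_e` values), and both function names are hypothetical.

```SystemVerilog
// Illustration only: one-hot format code and func-word packing.
package divsqrt_func_sketch_pkg;
  // Local mirror of the fpnew_pkg::fp_format_e encodings (hypothetical copy).
  typedef enum logic [2:0] {
    FP32 = 'd0, FP64 = 'd1, FP16 = 'd2, FP8 = 'd3, FP16ALT = 'd4, FP8ALT = 'd5
  } fmt_e;

  // One-hot format code, same values as the translate_fmt blocks in this diff.
  function automatic logic [5:0] fmt_onehot(input fmt_e fmt);
    unique case (fmt)
      FP64:    return 6'b100000;
      FP32:    return 6'b010000;
      FP16:    return 6'b001000;
      FP16ALT: return 6'b000100;
      FP8:     return 6'b000010;
      FP8ALT:  return 6'b000001;
      default: return 6'b100000; // fall back to the widest format
    endcase
  endfunction

  // Bits 19:17 are zero, 16:11 hold the format, 10:2 are zero, 1 is sqrt, 0 is div.
  function automatic logic [19:0] pack_func(input fmt_e fmt,
                                            input logic is_sqrt,
                                            input logic is_div);
    return {3'b0, fmt_onehot(fmt), 9'b0, is_sqrt, is_div};
  endfunction
endpackage
```

For example, `pack_func(FP8ALT, 1'b0, 1'b1)` sets bit 11 and bit 0, giving `20'h00801`.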
262,176 changes: 262,176 additions & 0 deletions src/fpnew_lut_div8.sv

Large diffs are not rendered by default.

262,176 changes: 262,176 additions & 0 deletions src/fpnew_lut_div8alt.sv

Large diffs are not rendered by default.

1,056 changes: 1,056 additions & 0 deletions src/fpnew_lut_sqrt8.sv

Large diffs are not rendered by default.

1,056 changes: 1,056 additions & 0 deletions src/fpnew_lut_sqrt8alt.sv

Large diffs are not rendered by default.

120 changes: 77 additions & 43 deletions src/fpnew_opgroup_multifmt_slice.sv
@@ -70,16 +70,12 @@ module fpnew_opgroup_multifmt_slice #(
if ((DivSqrtSel == fpnew_pkg::TH32) && !((FpFmtConfig[0] == 1) && (FpFmtConfig[1:NUM_FORMATS-1] == '0))) begin
$fatal(1, "T-Head-based DivSqrt unit supported only in FP32-only configurations. \
Set DivSqrtSel = THMULTI or DivSqrtSel = PULP to use a multi-format divider");
end else if ((DivSqrtSel == fpnew_pkg::THMULTI) && (FpFmtConfig[3] == 1'b1)) begin
$warning("The DivSqrt unit of C910 (instantiated by DivSqrtSel = THMULTI) does not support \
FP8. Please use the PULP DivSqrt unit when in need of div/sqrt operations on FP8.");
end
end

localparam int unsigned MAX_FP_WIDTH = fpnew_pkg::max_fp_width(FpFmtConfig);
localparam int unsigned MAX_INT_WIDTH = fpnew_pkg::max_int_width(IntFmtConfig);
localparam int unsigned NUM_LANES = fpnew_pkg::max_num_lanes(Width, FpFmtConfig, 1'b1);
localparam int unsigned NUM_DIVSQRT_LANES = fpnew_pkg::num_divsqrt_lanes(Width, FpFmtConfig, 1'b1, DivSqrtSel);
localparam int unsigned NUM_INT_FORMATS = fpnew_pkg::NUM_INT_FORMATS;
// We will send the format information along with the data
localparam int unsigned FMT_BITS =
@@ -183,7 +179,7 @@ FP8. Please use the PULP DivSqrt unit when in need of div/sqrt operations on FP8
logic [LANE_WIDTH-1:0] local_result; // lane-local results

// Generate instances only if needed, lane 0 always generated
if ((lane == 0) || (EnableVectors & (!(OpGroup == fpnew_pkg::DIVSQRT && (lane >= NUM_DIVSQRT_LANES))))) begin : active_lane
if ((lane == 0) || EnableVectors) begin : active_lane
logic in_valid, out_valid, out_ready; // lane-local handshake

logic [NUM_OPERANDS-1:0][LANE_WIDTH-1:0] local_operands; // lane-local oprands
@@ -292,42 +288,80 @@ FP8. Please use the PULP DivSqrt unit when in need of div/sqrt operations on FP8
.reg_ena_i
);
end else if(DivSqrtSel == fpnew_pkg::THMULTI) begin : gen_thmulti_c910_divsqrt
fpnew_divsqrt_th_64_multi #(
.FpFmtConfig ( LANE_FORMATS ),
.NumPipeRegs ( NumPipeRegs ),
.PipeConfig ( PipeConfig ),
.TagType ( TagType ),
.AuxType ( logic [AUX_BITS-1:0] )
) i_fpnew_divsqrt_th_64_c910 (
.clk_i,
.rst_ni,
.operands_i ( local_operands[1:0] ), // 2 operands
.is_boxed_i ( is_boxed_2op ), // 2 operands
.rnd_mode_i,
.op_i,
.dst_fmt_i,
.tag_i,
.mask_i ( simd_mask_i[lane] ),
.aux_i ( aux_data ),
.vectorial_op_i ( vectorial_op ), // synchronize only vectorial operations
.in_valid_i ( in_valid ),
.in_ready_o ( lane_in_ready[lane] ),
.divsqrt_done_o ( divsqrt_done[lane] ),
.simd_synch_done_i( simd_synch_done ),
.divsqrt_ready_o ( divsqrt_ready[lane] ),
.simd_synch_rdy_i ( simd_synch_rdy ),
.flush_i,
.result_o ( op_result ),
.status_o ( op_status ),
.extension_bit_o ( lane_ext_bit[lane] ),
.tag_o ( lane_tags[lane] ),
.mask_o ( lane_masks[lane] ),
.aux_o ( lane_aux[lane] ),
.out_valid_o ( out_valid ),
.out_ready_i ( out_ready ),
.busy_o ( lane_busy[lane] ),
.reg_ena_i
);
if (LANE_WIDTH != 8) begin
fpnew_divsqrt_th_64_multi #(
.FpFmtConfig ( LANE_FORMATS ),
.NumPipeRegs ( NumPipeRegs ),
.PipeConfig ( PipeConfig ),
.TagType ( TagType ),
.AuxType ( logic [AUX_BITS-1:0] )
) i_fpnew_divsqrt_th_64_c910 (
.clk_i,
.rst_ni,
.operands_i ( local_operands[1:0] ), // 2 operands
.is_boxed_i ( is_boxed_2op ), // 2 operands
.rnd_mode_i,
.op_i,
.dst_fmt_i,
.tag_i,
.mask_i ( simd_mask_i[lane] ),
.aux_i ( aux_data ),
.vectorial_op_i ( vectorial_op ), // synchronize only vectorial operations
.in_valid_i ( in_valid ),
.in_ready_o ( lane_in_ready[lane] ),
.divsqrt_done_o ( divsqrt_done[lane] ),
.simd_synch_done_i( simd_synch_done ),
.divsqrt_ready_o ( divsqrt_ready[lane] ),
.simd_synch_rdy_i ( simd_synch_rdy ),
.flush_i,
.result_o ( op_result ),
.status_o ( op_status ),
.extension_bit_o ( lane_ext_bit[lane] ),
.tag_o ( lane_tags[lane] ),
.mask_o ( lane_masks[lane] ),
.aux_o ( lane_aux[lane] ),
.out_valid_o ( out_valid ),
.out_ready_i ( out_ready ),
.busy_o ( lane_busy[lane] ),
.reg_ena_i
);
end else begin
fpnew_divsqrt_8_multi_lut #(
.NumPipeRegs ( NumPipeRegs ),
.PipeConfig ( PipeConfig ),
.TagType ( TagType ),
.AuxType ( logic [AUX_BITS-1:0] )
) i_fpnew_divsqrt_8_lut (
.clk_i,
.rst_ni,
.operands_i ( local_operands[1:0] ), // 2 operands
.is_boxed_i ( is_boxed_2op ), // 2 operands
.rnd_mode_i,
.op_i,
.dst_fmt_i,
.tag_i,
.mask_i ( simd_mask_i[lane] ),
.aux_i ( aux_data ),
.vectorial_op_i ( vectorial_op ), // synchronize only vectorial operations
.in_valid_i ( in_valid ),
.in_ready_o ( lane_in_ready[lane] ),
.divsqrt_done_o ( divsqrt_done[lane] ),
.simd_synch_done_i( simd_synch_done ),
.divsqrt_ready_o ( divsqrt_ready[lane] ),
.simd_synch_rdy_i ( simd_synch_rdy ),
.flush_i,
.result_o ( op_result ),
.status_o ( op_status ),
.extension_bit_o ( lane_ext_bit[lane] ),
.tag_o ( lane_tags[lane] ),
.mask_o ( lane_masks[lane] ),
.aux_o ( lane_aux[lane] ),
.out_valid_o ( out_valid ),
.out_ready_i ( out_ready ),
.busy_o ( lane_busy[lane] ),
.reg_ena_i
);
end
end else begin : gen_pulp_divsqrt
fpnew_divsqrt_multi #(
.FpFmtConfig ( LANE_FORMATS ),
@@ -528,8 +562,8 @@ FP8. Please use the PULP DivSqrt unit when in need of div/sqrt operations on FP8

if ((DivSqrtSel != fpnew_pkg::TH32) && !ExtRegEna) begin
// Synch lanes if there is more than one
assign simd_synch_rdy = EnableVectors ? &divsqrt_ready[NUM_DIVSQRT_LANES-1:0] : divsqrt_ready[0];
assign simd_synch_done = EnableVectors ? &divsqrt_done[NUM_DIVSQRT_LANES-1:0] : divsqrt_done[0];
assign simd_synch_rdy = EnableVectors ? &divsqrt_ready[NUM_LANES-1:0] : divsqrt_ready[0];
assign simd_synch_done = EnableVectors ? &divsqrt_done[NUM_LANES-1:0] : divsqrt_done[0];
end else begin
// Unused (TH32 divider only supported for scalar FP32 divsqrt)
assign simd_synch_rdy = '0;
32 changes: 14 additions & 18 deletions src/fpnew_pkg.sv
@@ -25,6 +25,7 @@ package fpnew_pkg;
// | FP16 | IEEE binary16 | 16 bit | 5 | 10
// | FP8 | binary8 | 8 bit | 5 | 2
// | FP16ALT | binary16alt | 16 bit | 8 | 7
// | FP8ALT | binary8alt | 8 bit | 4 | 3
// *NOTE:* Add new formats only at the end of the enumeration for backwards compatibility!

// Encoding for a format
@@ -33,7 +34,7 @@ package fpnew_pkg;
int unsigned man_bits;
} fp_encoding_t;

localparam int unsigned NUM_FP_FORMATS = 5; // change me to add formats
localparam int unsigned NUM_FP_FORMATS = 6; // change me to add formats
localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);

// FP formats
@@ -42,7 +43,8 @@ package fpnew_pkg;
FP64 = 'd1,
FP16 = 'd2,
FP8 = 'd3,
FP16ALT = 'd4
FP16ALT = 'd4,
FP8ALT = 'd5
// add new formats here
} fp_format_e;

@@ -52,14 +54,15 @@ package fpnew_pkg;
'{11, 52}, // IEEE binary64 (double)
'{5, 10}, // IEEE binary16 (half)
'{5, 2}, // custom binary8
'{8, 7} // custom binary16alt
'{8, 7}, // custom binary16alt
'{4, 3} // custom binary8alt
// add new formats here
};

typedef logic [0:NUM_FP_FORMATS-1] fmt_logic_t; // Logic indexed by FP format (for masks)
typedef logic [0:NUM_FP_FORMATS-1][31:0] fmt_unsigned_t; // Unsigned indexed by FP format

localparam fmt_logic_t CPK_FORMATS = 5'b11000; // FP32 and FP64 can provide CPK only
localparam fmt_logic_t CPK_FORMATS = 6'b110000; // FP32 and FP64 can provide CPK only

// ---------
// INT TYPES
@@ -130,7 +133,7 @@ package fpnew_pkg;
typedef enum logic[1:0] {
PULP, // "PULP" instantiates the PULP DivSqrt unit supports FP64, FP32, FP16, FP16ALT, FP8 and SIMD operations
TH32, // "TH32" instantiates the E906 DivSqrt unit supports only FP32 (no SIMD support)
THMULTI // "THMULTI" instantiates the C910 DivSqrt unit supports FP64, FP32, FP16, FP16ALT and SIMD operations
THMULTI // "THMULTI" instantiates the C910 DivSqrt unit supports FP64, FP32, FP16, FP16ALT, FP8, FP8ALT and SIMD operations
} divsqrt_unit_t;

// -------------------
@@ -221,47 +224,47 @@ package fpnew_pkg;
Width: 64,
EnableVectors: 1'b0,
EnableNanBox: 1'b1,
FpFmtMask: 5'b11000,
FpFmtMask: 6'b110000,
IntFmtMask: 4'b0011
};

localparam fpu_features_t RV32D = '{
Width: 64,
EnableVectors: 1'b1,
EnableNanBox: 1'b1,
FpFmtMask: 5'b11000,
FpFmtMask: 6'b110000,
IntFmtMask: 4'b0010
};

localparam fpu_features_t RV32F = '{
Width: 32,
EnableVectors: 1'b0,
EnableNanBox: 1'b1,
FpFmtMask: 5'b10000,
FpFmtMask: 6'b100000,
IntFmtMask: 4'b0010
};

localparam fpu_features_t RV64D_Xsflt = '{
Width: 64,
EnableVectors: 1'b1,
EnableNanBox: 1'b1,
FpFmtMask: 5'b11111,
FpFmtMask: 6'b111111,
IntFmtMask: 4'b1111
};

localparam fpu_features_t RV32F_Xsflt = '{
Width: 32,
EnableVectors: 1'b1,
EnableNanBox: 1'b1,
FpFmtMask: 5'b10111,
FpFmtMask: 6'b101111,
IntFmtMask: 4'b1110
};

localparam fpu_features_t RV32F_Xf16alt_Xfvec = '{
Width: 32,
EnableVectors: 1'b1,
EnableNanBox: 1'b1,
FpFmtMask: 5'b10001,
FpFmtMask: 6'b100010,
IntFmtMask: 4'b0110
};
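
Since `fmt_logic_t` is declared as `logic [0:NUM_FP_FORMATS-1]`, the leftmost mask bit corresponds to format index 0 and the new FP8ALT bit is appended on the right. The sketch below spells this out for a custom preset. The package name `my_fpu_features_pkg` and the preset name `MyRv32fFp8` are hypothetical; the field names are those used by the presets above, and FP32 is assumed to sit at index 0, in line with the FP32-only check on `FpFmtConfig[0]` elsewhere in this PR.

```SystemVerilog
// Illustration only: a custom feature preset using the 6-bit format mask.
package my_fpu_features_pkg;
  // 32-bit FPU with FP32, FP8 and FP8ALT enabled.
  // Mask bit order, left to right (index 0 to 5):
  //   FP32 FP64 FP16 FP8 FP16ALT FP8ALT
  //    1    0    0    1     0      1
  localparam fpnew_pkg::fpu_features_t MyRv32fFp8 = '{
    Width:         32,
    EnableVectors: 1'b1,
    EnableNanBox:  1'b1,
    FpFmtMask:     6'b100101,
    IntFmtMask:    4'b0010
  };
endpackage
```

Existing 5-bit masks keep their meaning once a trailing `0` is appended, which is how `5'b11000` becomes `6'b110000` in the presets above.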

@@ -406,13 +409,6 @@ package fpnew_pkg;
return vec ? width / min_fp_width(cfg) : 1; // if no vectors, only one lane
endfunction

// Returns the maximum number of lanes in the FPU according to width, format config and vectors
function automatic int unsigned num_divsqrt_lanes(int unsigned width, fmt_logic_t cfg, logic vec, divsqrt_unit_t DivSqrtSel);
automatic fmt_logic_t cfg_tmp;
cfg_tmp = (DivSqrtSel == THMULTI) ? cfg & 5'b11101 : cfg;
return vec ? width / min_fp_width(cfg_tmp) : 1; // if no vectors, only one lane
endfunction

// Returns a mask of active FP formats that are present in lane lane_no of a multiformat slice
function automatic fmt_logic_t get_lane_formats(int unsigned width,
fmt_logic_t cfg,
5 changes: 5 additions & 0 deletions src_files.yml
@@ -33,6 +33,11 @@ fpnew:
vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt_radix16_with_sqrt.v,
vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt.v,
vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_top.v,
src/fpnew_lut_div8.sv,
src/fpnew_lut_div8alt.sv,
src/fpnew_lut_sqrt8.sv,
src/fpnew_lut_sqrt8alt.sv,
src/fpnew_divsqrt_8_multi_lut.sv,
src/fpnew_divsqrt_th_32.sv,
src/fpnew_divsqrt_th_64_multi.sv,
src/fpnew_divsqrt_multi.sv,
Original file line number Diff line number Diff line change
@@ -257,8 +257,9 @@ end
//For Single, initial is 5'b01110('d14), calculate 15 round
assign srt_cnt_ini[4:0] = (ex1_double) ? 5'b01101 :
(ex1_single) ? 5'b00110 :
(ex1_half) ? 5'b00011
: 5'b00010;
(ex1_half) ? 5'b00011 :
(ex1_bfloat) ? 5'b00010
: 5'b00001;

//vfdsu ex2 pipedown signal
assign ex2_pipedown = srt_last_round && div_st_ex2;
@@ -291,8 +292,9 @@ assign srt_secd_round = ex2_srt_secd_round;
assign ex2_srt_secd_round_pre = srt_sm_on && srt_secd_round_pre;
assign srt_secd_round_pre = vfdsu_ex2_double ? srt_cnt[4:0]==5'b01101 :
vfdsu_ex2_single ? srt_cnt[4:0]==5'b00110 :
vfdsu_ex2_half ? srt_cnt[4:0]==5'b00011
: srt_cnt[4:0]==5'b00010;
vfdsu_ex2_half ? srt_cnt[4:0]==5'b00011 :
vfdsu_ex2_bfloat ? srt_cnt[4:0]==5'b00010
: srt_cnt[4:0]==5'b00001;

//==========================================================
// EX3 Stage Control Signal
Original file line number Diff line number Diff line change
@@ -26,6 +26,7 @@ module ct_vfdsu_double(
ex1_single,
ex1_half,
ex1_bfloat,
ex1_fp8,
ex1_sqrt,
ex1_src0,
ex1_src1,
@@ -56,6 +57,7 @@ input ex1_scalar;
input ex1_single;
input ex1_half;
input ex1_bfloat;
input ex1_fp8;
input ex1_sqrt;
input [63:0] ex1_src0;
input [63:0] ex1_src1;
@@ -89,6 +91,7 @@ wire ex1_scalar;
wire ex1_single;
wire ex1_half;
wire ex1_bfloat;
wire ex1_fp8;
wire ex1_sqrt;
wire [63:0] ex1_src0;
wire [63:0] ex1_src1;
@@ -124,13 +127,16 @@ wire [2 :0] vfdsu_ex2_rm;
wire vfdsu_ex2_single;
wire vfdsu_ex2_half;
wire vfdsu_ex2_bfloat;
wire vfdsu_ex2_fp8;
wire vfdsu_ex2_sqrt;
wire vfdsu_ex2_srt_skip;
wire [12:0] vfdsu_ex3_doub_expnt_rst;
wire vfdsu_ex3_double;
wire vfdsu_ex3_dz;
wire [12:0] vfdsu_ex3_half_expnt_rst;
wire [12:0] vfdsu_ex3_bfloat_expnt_rst;
wire [12:0] vfdsu_ex3_fp8_expnt_rst;
wire [12:0] vfdsu_ex3_fp8alt_expnt_rst;
wire vfdsu_ex3_id_srt_skip;
wire vfdsu_ex3_nv;
wire vfdsu_ex3_of;
@@ -152,6 +158,7 @@ wire [8 :0] vfdsu_ex3_sing_expnt_rst;
wire vfdsu_ex3_single;
wire vfdsu_ex3_half;
wire vfdsu_ex3_bfloat;
wire vfdsu_ex3_fp8;
wire vfdsu_ex3_uf;
wire vfdsu_ex4_denorm_to_tiny_frac;
wire vfdsu_ex4_double;
@@ -177,6 +184,7 @@ wire vfdsu_ex4_rslt_denorm;
wire vfdsu_ex4_single;
wire vfdsu_ex4_half;
wire vfdsu_ex4_bfloat;
wire vfdsu_ex4_fp8;
wire vfdsu_ex4_uf;
wire vfpu_yy_xx_dqnan;
wire [2 :0] vfpu_yy_xx_rm;
@@ -196,6 +204,7 @@ ct_vfdsu_prepare x_ct_vfdsu_prepare (
.ex1_single (ex1_single ),
.ex1_half (ex1_half ),
.ex1_bfloat (ex1_bfloat ),
.ex1_fp8 (ex1_fp8 ),
.ex1_sqrt (ex1_sqrt ),
.ex1_src0 (ex1_src0 ),
.ex1_src1 (ex1_src1 ),
@@ -221,6 +230,7 @@ ct_vfdsu_prepare x_ct_vfdsu_prepare (
.vfdsu_ex2_single (vfdsu_ex2_single ),
.vfdsu_ex2_half (vfdsu_ex2_half ),
.vfdsu_ex2_bfloat (vfdsu_ex2_bfloat ),
.vfdsu_ex2_fp8 (vfdsu_ex2_fp8 ),
.vfdsu_ex2_sqrt (vfdsu_ex2_sqrt ),
.vfdsu_ex2_srt_skip (vfdsu_ex2_srt_skip ),
.vfpu_yy_xx_dqnan (vfpu_yy_xx_dqnan ),
@@ -265,13 +275,16 @@ ct_vfdsu_srt x_ct_vfdsu_srt (
.vfdsu_ex2_single (vfdsu_ex2_single ),
.vfdsu_ex2_half (vfdsu_ex2_half ),
.vfdsu_ex2_bfloat (vfdsu_ex2_bfloat ),
.vfdsu_ex2_fp8 (vfdsu_ex2_fp8 ),
.vfdsu_ex2_sqrt (vfdsu_ex2_sqrt ),
.vfdsu_ex2_srt_skip (vfdsu_ex2_srt_skip ),
.vfdsu_ex3_doub_expnt_rst (vfdsu_ex3_doub_expnt_rst ),
.vfdsu_ex3_double (vfdsu_ex3_double ),
.vfdsu_ex3_dz (vfdsu_ex3_dz ),
.vfdsu_ex3_half_expnt_rst (vfdsu_ex3_half_expnt_rst ),
.vfdsu_ex3_bfloat_expnt_rst (vfdsu_ex3_bfloat_expnt_rst ),
.vfdsu_ex3_fp8_expnt_rst (vfdsu_ex3_fp8_expnt_rst ),
.vfdsu_ex3_fp8alt_expnt_rst (vfdsu_ex3_fp8alt_expnt_rst ),
.vfdsu_ex3_id_srt_skip (vfdsu_ex3_id_srt_skip ),
.vfdsu_ex3_nv (vfdsu_ex3_nv ),
.vfdsu_ex3_of (vfdsu_ex3_of ),
@@ -293,6 +306,7 @@ ct_vfdsu_srt x_ct_vfdsu_srt (
.vfdsu_ex3_single (vfdsu_ex3_single ),
.vfdsu_ex3_half (vfdsu_ex3_half ),
.vfdsu_ex3_bfloat (vfdsu_ex3_bfloat ),
.vfdsu_ex3_fp8 (vfdsu_ex3_fp8 ),
.vfdsu_ex3_uf (vfdsu_ex3_uf )
);

@@ -311,6 +325,8 @@ ct_vfdsu_round x_ct_vfdsu_round (
.vfdsu_ex3_dz (vfdsu_ex3_dz ),
.vfdsu_ex3_half_expnt_rst (vfdsu_ex3_half_expnt_rst ),
.vfdsu_ex3_bfloat_expnt_rst (vfdsu_ex3_bfloat_expnt_rst ),
.vfdsu_ex3_fp8_expnt_rst (vfdsu_ex3_fp8_expnt_rst ),
.vfdsu_ex3_fp8alt_expnt_rst (vfdsu_ex3_fp8alt_expnt_rst ),
.vfdsu_ex3_id_srt_skip (vfdsu_ex3_id_srt_skip ),
.vfdsu_ex3_nv (vfdsu_ex3_nv ),
.vfdsu_ex3_of (vfdsu_ex3_of ),
@@ -332,6 +348,7 @@ ct_vfdsu_round x_ct_vfdsu_round (
.vfdsu_ex3_single (vfdsu_ex3_single ),
.vfdsu_ex3_half (vfdsu_ex3_half ),
.vfdsu_ex3_bfloat (vfdsu_ex3_bfloat ),
.vfdsu_ex3_fp8 (vfdsu_ex3_fp8 ),
.vfdsu_ex3_uf (vfdsu_ex3_uf ),
.vfdsu_ex4_denorm_to_tiny_frac (vfdsu_ex4_denorm_to_tiny_frac ),
.vfdsu_ex4_double (vfdsu_ex4_double ),
@@ -357,6 +374,7 @@ ct_vfdsu_round x_ct_vfdsu_round (
.vfdsu_ex4_single (vfdsu_ex4_single ),
.vfdsu_ex4_half (vfdsu_ex4_half ),
.vfdsu_ex4_bfloat (vfdsu_ex4_bfloat ),
.vfdsu_ex4_fp8 (vfdsu_ex4_fp8 ),
.vfdsu_ex4_uf (vfdsu_ex4_uf )
);

@@ -388,6 +406,7 @@ ct_vfdsu_pack x_ct_vfdsu_pack (
.vfdsu_ex4_single (vfdsu_ex4_single ),
.vfdsu_ex4_half (vfdsu_ex4_half ),
.vfdsu_ex4_bfloat (vfdsu_ex4_bfloat ),
.vfdsu_ex4_fp8 (vfdsu_ex4_fp8 ),
.vfdsu_ex4_uf (vfdsu_ex4_uf )
);

Original file line number Diff line number Diff line change
@@ -41,6 +41,7 @@ module ct_vfdsu_pack(
vfdsu_ex4_single,
vfdsu_ex4_half,
vfdsu_ex4_bfloat,
vfdsu_ex4_fp8,
vfdsu_ex4_uf
);

@@ -69,6 +70,7 @@ input vfdsu_ex4_rslt_denorm;
input vfdsu_ex4_single;
input vfdsu_ex4_half;
input vfdsu_ex4_bfloat;
input vfdsu_ex4_fp8;
input vfdsu_ex4_uf;
output [4 :0] ex4_out_expt;
output [63:0] ex4_out_result;
@@ -78,6 +80,8 @@ reg [51:0] ex4_denorm_frac;
reg [51:0] ex4_frac_52;
reg [51:0] ex4_half_denorm_frac;
reg [51:0] ex4_bfloat_denorm_frac;
reg [51:0] ex4_fp8_denorm_frac;
reg [51:0] ex4_fp8alt_denorm_frac;
reg [63:0] ex4_out_result;
reg [51:0] ex4_single_denorm_frac;
reg [12:0] expnt_add_op1;
@@ -105,6 +109,16 @@ wire [63:0] ex4_bfloat_rst0;
wire [63:0] ex4_bfloat_rst_inf;
wire [63:0] ex4_bfloat_rst_norm;
wire [63:0] ex4_bfloat_rst_qnan;
wire [63:0] ex4_fp8_lfn;
wire [63:0] ex4_fp8_rst0;
wire [63:0] ex4_fp8_rst_inf;
wire [63:0] ex4_fp8_rst_norm;
wire [63:0] ex4_fp8_rst_qnan;
wire [63:0] ex4_fp8alt_lfn;
wire [63:0] ex4_fp8alt_rst0;
wire [63:0] ex4_fp8alt_rst_inf;
wire [63:0] ex4_fp8alt_rst_norm;
wire [63:0] ex4_fp8alt_rst_qnan;
wire ex4_of_plus;
wire [4 :0] ex4_out_expt;
wire ex4_result_inf;
@@ -146,6 +160,7 @@ wire vfdsu_ex4_rslt_denorm;
wire vfdsu_ex4_single;
wire vfdsu_ex4_half;
wire vfdsu_ex4_bfloat;
wire vfdsu_ex4_fp8;
wire vfdsu_ex4_uf;


@@ -306,6 +321,31 @@ case(vfdsu_ex4_expnt_rst[12:0])
endcase
end

always @( vfdsu_ex4_expnt_rst[12:0]
or ex4_frac[54:1]
or vfdsu_ex4_denorm_to_tiny_frac)
begin
case(vfdsu_ex4_expnt_rst[12:0])
13'h1: ex4_fp8_denorm_frac[51:0] = { ex4_frac[52:1]}; //-14 1
13'h0: ex4_fp8_denorm_frac[51:0] = { ex4_frac[53:2]}; //-15 0
13'h1fff:ex4_fp8_denorm_frac[51:0] = { ex4_frac[54:3]}; //-16 -1
default :ex4_fp8_denorm_frac[51:0] = vfdsu_ex4_denorm_to_tiny_frac ?{2'b1,50'b0} : 52'b0; //below -16
endcase
end

always @( vfdsu_ex4_expnt_rst[12:0]
or ex4_frac[54:1]
or vfdsu_ex4_denorm_to_tiny_frac)
begin
case(vfdsu_ex4_expnt_rst[12:0])
13'h1: ex4_fp8alt_denorm_frac[51:0] = { ex4_frac[52:1]}; //-6 1
13'h0: ex4_fp8alt_denorm_frac[51:0] = { ex4_frac[53:2]}; //-7 0
13'h1fff:ex4_fp8alt_denorm_frac[51:0] = { ex4_frac[54:3]}; //-8 -1
13'h1ffe:ex4_fp8alt_denorm_frac[51:0] = {1'b0, ex4_frac[54:4]}; //-9 -2
default :ex4_fp8alt_denorm_frac[51:0] = vfdsu_ex4_denorm_to_tiny_frac ?{3'b1,49'b0} : 52'b0; //below -9
endcase
end

//here when denormal number round to add1, it will become normal number
assign ex4_denorm_potnt_norm = (vfdsu_ex4_potnt_norm[1] && ex4_frac[53]) ||
(vfdsu_ex4_potnt_norm[0] && ex4_frac[54]) ;
@@ -317,9 +357,13 @@ assign ex4_denorm_result[63:0] = vfdsu_ex4_double ?
vfdsu_ex4_single ? {32'hffffffff,vfdsu_ex4_result_sign,
8'h0,ex4_single_denorm_frac[51:29]} :
vfdsu_ex4_half ? {48'hffffffffffff,vfdsu_ex4_result_sign,5'h0,
ex4_half_denorm_frac[51:42]}
: {48'hffffffffffff,vfdsu_ex4_result_sign,8'h0,
ex4_bfloat_denorm_frac[51:45]};
ex4_half_denorm_frac[51:42]} :
vfdsu_ex4_bfloat ? {48'hffffffffffff,vfdsu_ex4_result_sign,8'h0,
ex4_bfloat_denorm_frac[51:45]} :
vfdsu_ex4_fp8 ? {56'hffffffffffffff,vfdsu_ex4_result_sign,5'h0,
ex4_fp8_denorm_frac[51:50]}
: {56'hffffffffffffff,vfdsu_ex4_result_sign,4'h0,
ex4_fp8alt_denorm_frac[51:49]};



@@ -339,6 +383,22 @@ assign ex4_bfloat_rst_norm[63:0] = {48'hffffffffffff,vfdsu_ex4_result_sign,
ex4_frac_52[51:45]};
assign ex4_bfloat_rst0[63:0] = {48'hffffffffffff,vfdsu_ex4_result_sign,15'h0};

assign ex4_fp8_lfn[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,5'h1e,{2{1'b1}}};
assign ex4_fp8_rst_qnan[63:0] = {56'hffffffffffffff,vfdsu_ex4_qnan_sign, 5'h1f,1'b1, vfdsu_ex4_qnan_f[0]};
assign ex4_fp8_rst_inf[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,5'h1f,2'b0};
assign ex4_fp8_rst_norm[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,
ex4_expnt_rst[4:0],
ex4_frac_52[51:50]};
assign ex4_fp8_rst0[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,7'h0};

assign ex4_fp8alt_lfn[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,4'he,{3{1'b1}}};
assign ex4_fp8alt_rst_qnan[63:0] = {56'hffffffffffffff,vfdsu_ex4_qnan_sign, 4'hf,1'b1, vfdsu_ex4_qnan_f[1:0]};
assign ex4_fp8alt_rst_inf[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,4'hf,3'b0};
assign ex4_fp8alt_rst_norm[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,
ex4_expnt_rst[3:0],
ex4_frac_52[51:49]};
assign ex4_fp8alt_rst0[63:0] = {56'hffffffffffffff,vfdsu_ex4_result_sign,7'h0};

//ex4 overflow/underflow plus
assign ex4_rst_nor = vfdsu_ex4_result_nor;
assign ex4_of_plus = vfdsu_ex4_potnt_of &&
@@ -386,22 +446,38 @@ assign ex4_sing_rst_norm[63:0] = {32'hffffffff,vfdsu_ex4_result_sign,
ex4_frac_52[51:29]};
assign ex4_rst_lfn[63:0] = (vfdsu_ex4_double) ? ex4_doub_lfn[63:0] :
vfdsu_ex4_single ? ex4_sing_lfn[63:0] :
vfdsu_ex4_half ? ex4_half_lfn[63:0] : ex4_bfloat_lfn[63:0];
vfdsu_ex4_half ? ex4_half_lfn[63:0] :
vfdsu_ex4_bfloat ? ex4_bfloat_lfn[63:0] :
vfdsu_ex4_fp8 ? ex4_fp8_lfn[63:0]
: ex4_fp8alt_lfn[63:0];

assign ex4_rst0[63:0] = (vfdsu_ex4_double) ? ex4_doub_rst0[63:0] :
vfdsu_ex4_single ? ex4_sing_rst0[63:0] :
vfdsu_ex4_half ? ex4_half_rst0[63:0] : ex4_bfloat_rst0[63:0];
vfdsu_ex4_half ? ex4_half_rst0[63:0] :
vfdsu_ex4_bfloat ? ex4_bfloat_rst0[63:0] :
vfdsu_ex4_fp8 ? ex4_fp8_rst0[63:0]
: ex4_fp8alt_rst0[63:0];

assign ex4_rst_qnan[63:0] = (vfdsu_ex4_double) ? ex4_doub_rst_qnan[63:0] :
vfdsu_ex4_single ? ex4_sing_rst_qnan[63:0] :
vfdsu_ex4_half ? ex4_half_rst_qnan[63:0] : ex4_bfloat_rst_qnan[63:0];
vfdsu_ex4_half ? ex4_half_rst_qnan[63:0] :
vfdsu_ex4_bfloat ? ex4_bfloat_rst_qnan[63:0] :
vfdsu_ex4_fp8 ? ex4_fp8_rst_qnan[63:0]
: ex4_fp8alt_rst_qnan[63:0];

assign ex4_rst_norm[63:0] = (vfdsu_ex4_double) ? ex4_doub_rst_norm[63:0] :
vfdsu_ex4_single ? ex4_sing_rst_norm[63:0] :
vfdsu_ex4_half ? ex4_half_rst_norm[63:0] : ex4_bfloat_rst_norm[63:0];
vfdsu_ex4_half ? ex4_half_rst_norm[63:0] :
vfdsu_ex4_bfloat ? ex4_bfloat_rst_norm[63:0] :
vfdsu_ex4_fp8 ? ex4_fp8_rst_norm[63:0]
: ex4_fp8alt_rst_norm[63:0];

assign ex4_rst_inf[63:0] = (vfdsu_ex4_double) ? ex4_doub_rst_inf[63:0] :
vfdsu_ex4_single ? ex4_sing_rst_inf[63:0] :
vfdsu_ex4_half ? ex4_half_rst_inf[63:0] : ex4_bfloat_rst_inf[63:0];
vfdsu_ex4_half ? ex4_half_rst_inf[63:0] :
vfdsu_ex4_bfloat ? ex4_bfloat_rst_inf[63:0] :
vfdsu_ex4_fp8 ? ex4_fp8_rst_inf[63:0]
: ex4_fp8alt_rst_inf[63:0];


assign ex4_cor_uf = (vfdsu_ex4_uf && !ex4_denorm_potnt_norm || ex4_uf_plus)
Original file line number Diff line number Diff line change
@@ -27,6 +27,7 @@ module ct_vfdsu_prepare(
ex1_single,
ex1_half,
ex1_bfloat,
ex1_fp8,
ex1_sqrt,
ex1_src0,
ex1_src1,
@@ -52,6 +53,7 @@ module ct_vfdsu_prepare(
vfdsu_ex2_single,
vfdsu_ex2_half,
vfdsu_ex2_bfloat,
vfdsu_ex2_fp8,
vfdsu_ex2_sqrt,
vfdsu_ex2_srt_skip,
vfpu_yy_xx_dqnan,
@@ -69,6 +71,7 @@ input ex1_scalar;
input ex1_single;
input ex1_half;
input ex1_bfloat;
input ex1_fp8;
input ex1_sqrt;
input [63:0] ex1_src0;
input [63:0] ex1_src1;
@@ -98,6 +101,7 @@ output [2 :0] vfdsu_ex2_rm;
output vfdsu_ex2_single;
output vfdsu_ex2_half;
output vfdsu_ex2_bfloat;
output vfdsu_ex2_fp8;
output vfdsu_ex2_sqrt;
output vfdsu_ex2_srt_skip;

@@ -125,6 +129,7 @@ reg [2 :0] vfdsu_ex2_rm;
reg vfdsu_ex2_single;
reg vfdsu_ex2_half;
reg vfdsu_ex2_bfloat;
reg vfdsu_ex2_fp8;
reg vfdsu_ex2_sqrt;
reg vfdsu_ex2_srt_skip;

@@ -177,6 +182,18 @@ wire ex1_bfloat_expnt0_zero;
wire ex1_bfloat_expnt1_zero;
wire ex1_bfloat_frac0_all0;
wire ex1_bfloat_frac1_all0;
wire ex1_fp8_expnt0_max;
wire ex1_fp8_expnt1_max;
wire ex1_fp8_expnt0_zero;
wire ex1_fp8_expnt1_zero;
wire ex1_fp8_frac0_all0;
wire ex1_fp8_frac1_all0;
wire ex1_fp8alt_expnt0_max;
wire ex1_fp8alt_expnt1_max;
wire ex1_fp8alt_expnt0_zero;
wire ex1_fp8alt_expnt1_zero;
wire ex1_fp8alt_frac0_all0;
wire ex1_fp8alt_frac1_all0;
wire ex1_nv;
wire ex1_op0_cnan;
wire [51:0] ex1_op0_f;
@@ -234,6 +251,7 @@ wire ex1_sing_frac1_all0;
wire ex1_single;
wire ex1_half;
wire ex1_bfloat;
wire ex1_fp8;
wire ex1_sqrt;
wire ex1_sqrt_expnt_odd;
wire ex1_sqrt_expnt_result_odd;
@@ -265,10 +283,12 @@ assign ex1_oper1[63:0] = ex1_src1[63:0];
//Sign bit prepare
assign ex1_op0_sign = ex1_double ? ex1_oper0[63] :
ex1_single ? ex1_oper0[31] :
ex1_half ? ex1_oper0[15] : ex1_oper0[15];
ex1_half ? ex1_oper0[15] :
ex1_bfloat ? ex1_oper0[15] : ex1_oper0[7];
assign ex1_op1_sign = ex1_double ? ex1_oper1[63] :
ex1_single ? ex1_oper1[31] :
ex1_half ? ex1_oper1[15] : ex1_oper1[15];
ex1_half ? ex1_oper1[15] :
ex1_bfloat ? ex1_oper1[15] : ex1_oper1[7];
assign div_sign = ex1_op0_sign ^ ex1_op1_sign;
assign sqrt_sign = ex1_op0_sign;
assign ex1_result_sign = (ex1_div)
@@ -283,12 +303,20 @@ assign ex1_half_expnt0_max = &ex1_oper0[14:10];
assign ex1_half_expnt1_max = &ex1_oper1[14:10];
assign ex1_bfloat_expnt0_max = &ex1_oper0[14:7];
assign ex1_bfloat_expnt1_max = &ex1_oper1[14:7];
assign ex1_fp8_expnt0_max = &ex1_oper0[6:2];
assign ex1_fp8_expnt1_max = &ex1_oper1[6:2];
assign ex1_fp8alt_expnt0_max = &ex1_oper0[6:3];
assign ex1_fp8alt_expnt1_max = &ex1_oper1[6:3];
assign ex1_expnt0_max = ex1_double ? ex1_doub_expnt0_max :
ex1_single ? ex1_sing_expnt0_max :
ex1_half ? ex1_half_expnt0_max : ex1_bfloat_expnt0_max;
ex1_half ? ex1_half_expnt0_max :
ex1_bfloat ? ex1_bfloat_expnt0_max :
ex1_fp8 ? ex1_fp8_expnt0_max : ex1_fp8alt_expnt0_max;
assign ex1_expnt1_max = ex1_double ? ex1_doub_expnt1_max :
ex1_single ? ex1_sing_expnt1_max :
ex1_half ? ex1_half_expnt1_max : ex1_bfloat_expnt1_max;
ex1_half ? ex1_half_expnt1_max :
ex1_bfloat ? ex1_bfloat_expnt1_max :
ex1_fp8 ? ex1_fp8_expnt1_max : ex1_fp8alt_expnt1_max;

//exponent zero
assign ex1_doub_expnt0_zero = ~|ex1_oper0[62:52];
@@ -299,12 +327,20 @@ assign ex1_half_expnt0_zero = ~|ex1_oper0[14:10];
assign ex1_half_expnt1_zero = ~|ex1_oper1[14:10];
assign ex1_bfloat_expnt0_zero = ~|ex1_oper0[14:7];
assign ex1_bfloat_expnt1_zero = ~|ex1_oper1[14:7];
assign ex1_fp8_expnt0_zero = ~|ex1_oper0[6:2];
assign ex1_fp8_expnt1_zero = ~|ex1_oper1[6:2];
assign ex1_fp8alt_expnt0_zero = ~|ex1_oper0[6:3];
assign ex1_fp8alt_expnt1_zero = ~|ex1_oper1[6:3];
assign ex1_expnt0_zero = ex1_double ? ex1_doub_expnt0_zero :
ex1_single ? ex1_sing_expnt0_zero :
ex1_half ? ex1_half_expnt0_zero : ex1_bfloat_expnt0_zero;
ex1_half ? ex1_half_expnt0_zero :
ex1_bfloat ? ex1_bfloat_expnt0_zero :
ex1_fp8 ? ex1_fp8_expnt0_zero : ex1_fp8alt_expnt0_zero;
assign ex1_expnt1_zero = ex1_double ? ex1_doub_expnt1_zero :
ex1_single ? ex1_sing_expnt1_zero :
ex1_half ? ex1_half_expnt1_zero : ex1_bfloat_expnt1_zero;
ex1_half ? ex1_half_expnt1_zero :
ex1_bfloat ? ex1_bfloat_expnt1_zero :
ex1_fp8 ? ex1_fp8_expnt1_zero : ex1_fp8alt_expnt1_zero;

//fraction zero
assign ex1_doub_frac0_all0 = ~|ex1_oper0[51:0];
@@ -315,20 +351,36 @@ assign ex1_half_frac0_all0 = ~|ex1_oper0[9:0];
assign ex1_half_frac1_all0 = ~|ex1_oper1[9:0];
assign ex1_bfloat_frac0_all0 = ~|ex1_oper0[6:0];
assign ex1_bfloat_frac1_all0 = ~|ex1_oper1[6:0];
assign ex1_fp8_frac0_all0 = ~|ex1_oper0[1:0];
assign ex1_fp8_frac1_all0 = ~|ex1_oper1[1:0];
assign ex1_fp8alt_frac0_all0 = ~|ex1_oper0[2:0];
assign ex1_fp8alt_frac1_all0 = ~|ex1_oper1[2:0];
assign ex1_frac0_all0 = ex1_double ? ex1_doub_frac0_all0 :
ex1_single ? ex1_sing_frac0_all0 :
ex1_half ? ex1_half_frac0_all0 : ex1_bfloat_frac0_all0;
ex1_half ? ex1_half_frac0_all0 :
ex1_bfloat ? ex1_bfloat_frac0_all0 :
ex1_fp8 ? ex1_fp8_frac0_all0 : ex1_fp8alt_frac0_all0;
assign ex1_frac1_all0 = ex1_double ? ex1_doub_frac1_all0 :
ex1_single ? ex1_sing_frac1_all0 :
ex1_half ? ex1_half_frac1_all0 : ex1_bfloat_frac1_all0;
ex1_half ? ex1_half_frac1_all0 :
ex1_bfloat ? ex1_bfloat_frac1_all0 :
ex1_fp8 ? ex1_fp8_frac1_all0 : ex1_fp8alt_frac1_all0;
assign ex1_frac0_msb = ex1_double ? ex1_oper0[51] :
ex1_single ? ex1_oper0[22] :
ex1_half ? ex1_oper0[9] : ex1_oper0[6];
ex1_half ? ex1_oper0[9] :
ex1_bfloat ? ex1_oper0[6] :
ex1_fp8 ? ex1_oper0[1] : ex1_oper0[2];
assign ex1_frac1_msb = ex1_double ? ex1_oper1[51] :
ex1_single ? ex1_oper1[22] :
ex1_half ? ex1_oper1[9] : ex1_oper1[6];
assign ex1_oper0_high_all1 = ex1_single ? &ex1_oper0[63:32] : &ex1_oper0[63:16];
assign ex1_oper1_high_all1 = ex1_single ? &ex1_oper1[63:32] : &ex1_oper1[63:16];
ex1_half ? ex1_oper1[9] :
ex1_bfloat ? ex1_oper1[6] :
ex1_fp8 ? ex1_oper1[1] : ex1_oper1[2];
assign ex1_oper0_high_all1 = ex1_single ? &ex1_oper0[63:32] :
ex1_half ? &ex1_oper0[63:16] :
ex1_bfloat ? &ex1_oper0[63:16] : &ex1_oper0[63:8];
assign ex1_oper1_high_all1 = ex1_single ? &ex1_oper1[63:32] :
ex1_half ? &ex1_oper1[63:16] :
ex1_bfloat ? &ex1_oper1[63:16] : &ex1_oper1[63:8];


//infinity number
@@ -418,29 +470,35 @@ ct_vfdsu_ff1 x_frac1_expnt (
// &Connect(.fanc_shift_num(ex1_oper1_id_frac[51:0])); @158
assign ex1_oper0_frac[51:0] = ex1_double ? ex1_oper0[51:0] :
ex1_single ? {ex1_oper0[22:0],29'b0} :
ex1_half ? {ex1_oper0[9:0],42'b0}
: {ex1_oper0[6:0],45'b0};
ex1_half ? {ex1_oper0[9:0],42'b0} :
ex1_bfloat ? {ex1_oper0[6:0],45'b0} :
ex1_fp8 ? {ex1_oper0[1:0],50'b0} : {ex1_oper0[2:0],49'b0};
assign ex1_oper1_frac[51:0] = ex1_double ? ex1_oper1[51:0] :
ex1_single ? {ex1_oper1[22:0],29'b0} :
ex1_half ? {ex1_oper1[9:0],42'b0}
: {ex1_oper1[6:0],45'b0};
ex1_half ? {ex1_oper1[9:0],42'b0} :
ex1_bfloat ? {ex1_oper1[6:0],45'b0} :
ex1_fp8 ? {ex1_oper1[1:0],50'b0} : {ex1_oper1[2:0],49'b0};
//=====================exponent add=========================
//exponent number 0
assign ex1_div_op0_expnt[12:0] = ex1_double ? {2'b0,ex1_oper0[62:52]} :
ex1_single ? {5'b0,ex1_oper0[30:23]} :
ex1_half ? {8'b0,ex1_oper0[14:10]}
: {5'b0,ex1_oper0[14:7]};
ex1_half ? {8'b0,ex1_oper0[14:10]} :
ex1_bfloat ? {5'b0,ex1_oper0[14:7]} :
ex1_fp8 ? {8'b0,ex1_oper0[6:2]} : {9'b0,ex1_oper0[6:3]};
assign ex1_expnt_adder_op0[12:0] = ex1_op0_id_nor ? ex1_oper0_id_expnt[12:0]
: ex1_div_op0_expnt[12:0];
//exponent number 1
assign ex1_div_op1_expnt[12:0] = ex1_double ? {2'b0,ex1_oper1[62:52]} :
ex1_single ? {5'b0,ex1_oper1[30:23]} :
ex1_half ? {8'b0,ex1_oper1[14:10]}
: {5'b0,ex1_oper1[14:7]};
ex1_half ? {8'b0,ex1_oper1[14:10]} :
ex1_bfloat ? {5'b0,ex1_oper1[14:7]} :
ex1_fp8 ? {8'b0,ex1_oper1[6:2]} : {9'b0,ex1_oper1[6:3]};
assign ex1_sqrt_op1_expnt[12:0] = ex1_double ? {3'b0,{10{1'b1}}} : //'d1023
ex1_single ? {6'b0,{7{1'b1}}} ://'d127
ex1_half ? {9'b0,{4{1'b1}}} //'d15
: {6'b0,{7{1'b1}}}; //'d127
ex1_half ? {9'b0,{4{1'b1}}} ://'d15
ex1_bfloat ? {6'b0,{7{1'b1}}} ://'d127
ex1_fp8 ? {9'b0,{4{1'b1}}} //'d15
: {10'b0,{3{1'b1}}};//'d7

// &CombBeg; @180
always @( ex1_oper1_id_expnt[12:0]
@@ -610,12 +668,16 @@ assign ex1_div_srt_op1[52:0] = ex1_div_nor_srt_op1[52:0];
//ex1_div_nor_srt_op0
assign ex1_div_noid_nor_srt_op0[52:0] = ex1_double ? {1'b1,ex1_oper0[51:0]} :
ex1_single ? {1'b1,ex1_oper0[22:0],29'b0} :
ex1_half ? {1'b1,ex1_oper0[9:0],42'b0}
: {1'b1,ex1_oper0[6:0],45'b0};
ex1_half ? {1'b1,ex1_oper0[9:0],42'b0} :
ex1_bfloat ? {1'b1,ex1_oper0[6:0],45'b0} :
ex1_fp8 ? {1'b1,ex1_oper0[1:0],50'b0}
: {1'b1,ex1_oper0[2:0],49'b0};
assign ex1_div_noid_nor_srt_op1[52:0] = ex1_double ? {1'b1,ex1_oper1[51:0]} :
ex1_single ? {1'b1,ex1_oper1[22:0],29'b0} :
ex1_half ? {1'b1,ex1_oper1[9:0],42'b0}
: {1'b1,ex1_oper1[6:0],45'b0};
ex1_half ? {1'b1,ex1_oper1[9:0],42'b0} :
ex1_bfloat ? {1'b1,ex1_oper1[6:0],45'b0} :
ex1_fp8 ? {1'b1,ex1_oper1[1:0],50'b0}
: {1'b1,ex1_oper1[2:0],49'b0};
assign ex1_div_nor_srt_op0[52:0] = ex1_op0_id_nor ? {ex1_oper0_id_frac[51:0],1'b0}
: ex1_div_noid_nor_srt_op0[52:0];
//ex1_div_nor_srt_op1
@@ -743,6 +805,7 @@ begin
vfdsu_ex2_single <= 1'b0;
vfdsu_ex2_half <= 1'b0;
vfdsu_ex2_bfloat <= 1'b0;
vfdsu_ex2_fp8 <= 1'b0;
end
else if(ex1_pipedown)
begin
@@ -767,6 +830,7 @@ begin
vfdsu_ex2_single <= ex1_single;
vfdsu_ex2_half <= ex1_half;
vfdsu_ex2_bfloat <= ex1_bfloat;
vfdsu_ex2_fp8 <= ex1_fp8;
end
else
begin
@@ -791,6 +855,7 @@ begin
vfdsu_ex2_single <= vfdsu_ex2_single;
vfdsu_ex2_half <= vfdsu_ex2_half;
vfdsu_ex2_bfloat <= vfdsu_ex2_bfloat;
vfdsu_ex2_fp8 <= vfdsu_ex2_fp8;
end
end

211 changes: 194 additions & 17 deletions vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_round.v

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -32,6 +32,7 @@ module ct_vfdsu_scalar_dp(
ex1_scalar,
ex1_half,
ex1_bfloat,
ex1_fp8,
ex1_single,
ex1_sqrt,
ex1_src0,
@@ -54,7 +55,8 @@ module ct_vfdsu_scalar_dp(
vfdsu_ex2_double,
vfdsu_ex2_single,
vfdsu_ex2_half,
vfdsu_ex2_bfloat
vfdsu_ex2_bfloat,
vfdsu_ex2_fp8
);

// &Ports; @24
@@ -85,6 +87,7 @@ output ex1_scalar;
output ex1_single;
output ex1_half;
output ex1_bfloat;
output ex1_fp8;
output ex1_sqrt;
output [63:0] ex1_src0;
output [63:0] ex1_src1;
@@ -97,13 +100,15 @@ output vfdsu_ex2_double;
output vfdsu_ex2_single;
output vfdsu_ex2_half;
output vfdsu_ex2_bfloat;
output vfdsu_ex2_fp8;

// &Regs; @25
reg ex1_div;
reg ex1_double;
reg ex1_single;
reg ex1_half;
reg ex1_bfloat;
reg ex1_fp8;
reg ex1_sqrt;
reg vfdsu_ex2_div;
reg vfdsu_ex2_double;
@@ -113,6 +118,7 @@ reg [6 :0] vfdsu_ex2_iid;
reg vfdsu_ex2_single;
reg vfdsu_ex2_half;
reg vfdsu_ex2_bfloat;
reg vfdsu_ex2_fp8;
reg vfdsu_ex2_sqrt;
reg [4 :0] vfdsu_ex3_dst_ereg;
reg [6 :0] vfdsu_ex3_dst_vreg;
@@ -189,6 +195,7 @@ begin
ex1_single <= 1'b0;
ex1_half <= 1'b0;
ex1_bfloat <= 1'b0;
ex1_fp8 <= 1'b0;
end
else if(idu_vfpu_rf_pipex_gateclk_sel)
begin
@@ -198,6 +205,7 @@ begin
ex1_single <= idu_vfpu_rf_pipex_func[15];
ex1_half <= idu_vfpu_rf_pipex_func[14];
ex1_bfloat <= idu_vfpu_rf_pipex_func[13];
ex1_fp8 <= idu_vfpu_rf_pipex_func[12];
end
end
assign ex1_scalar = 1'b1;
@@ -222,6 +230,7 @@ begin
vfdsu_ex2_single <= 1'b0;
vfdsu_ex2_half <= 1'b0;
vfdsu_ex2_bfloat <= 1'b0;
vfdsu_ex2_fp8 <= 1'b0;
vfdsu_ex2_div <= 1'b0;
vfdsu_ex2_sqrt <= 1'b0;
end
@@ -234,6 +243,7 @@ begin
vfdsu_ex2_single <= ex1_single;
vfdsu_ex2_half <= ex1_half;
vfdsu_ex2_bfloat <= ex1_bfloat;
vfdsu_ex2_fp8 <= ex1_fp8;
vfdsu_ex2_div <= ex1_div;
vfdsu_ex2_sqrt <= ex1_sqrt;
end
@@ -246,6 +256,7 @@ begin
vfdsu_ex2_single <= vfdsu_ex2_single;
vfdsu_ex2_half <= ex1_half;
vfdsu_ex2_bfloat <= ex1_bfloat;
vfdsu_ex2_fp8 <= ex1_fp8;
vfdsu_ex2_div <= vfdsu_ex2_div;
vfdsu_ex2_sqrt <= vfdsu_ex2_sqrt;
end
108 changes: 102 additions & 6 deletions vendor/openc910/C910_RTL_FACTORY/gen_rtl/vfdsu/rtl/ct_vfdsu_srt.v
@@ -51,13 +51,16 @@ module ct_vfdsu_srt(
vfdsu_ex2_single,
vfdsu_ex2_half,
vfdsu_ex2_bfloat,
vfdsu_ex2_fp8,
vfdsu_ex2_sqrt,
vfdsu_ex2_srt_skip,
vfdsu_ex3_doub_expnt_rst,
vfdsu_ex3_double,
vfdsu_ex3_dz,
vfdsu_ex3_half_expnt_rst,
vfdsu_ex3_bfloat_expnt_rst,
vfdsu_ex3_fp8_expnt_rst,
vfdsu_ex3_fp8alt_expnt_rst,
vfdsu_ex3_id_srt_skip,
vfdsu_ex3_nv,
vfdsu_ex3_of,
@@ -79,6 +82,7 @@ module ct_vfdsu_srt(
vfdsu_ex3_single,
vfdsu_ex3_half,
vfdsu_ex3_bfloat,
vfdsu_ex3_fp8,
vfdsu_ex3_uf
);

@@ -116,6 +120,7 @@ input [2 :0] vfdsu_ex2_rm;
input vfdsu_ex2_single;
input vfdsu_ex2_half;
input vfdsu_ex2_bfloat;
input vfdsu_ex2_fp8;
input vfdsu_ex2_sqrt;
input vfdsu_ex2_srt_skip;
output srt_ctrl_rem_zero;
@@ -126,6 +131,8 @@ output vfdsu_ex3_double;
output vfdsu_ex3_dz;
output [12:0] vfdsu_ex3_half_expnt_rst;
output [12:0] vfdsu_ex3_bfloat_expnt_rst;
output [12:0] vfdsu_ex3_fp8_expnt_rst;
output [12:0] vfdsu_ex3_fp8alt_expnt_rst;
output vfdsu_ex3_id_srt_skip;
output vfdsu_ex3_nv;
output vfdsu_ex3_of;
@@ -147,18 +154,23 @@ output [8 :0] vfdsu_ex3_sing_expnt_rst;
output vfdsu_ex3_single;
output vfdsu_ex3_half;
output vfdsu_ex3_bfloat;
output vfdsu_ex3_fp8;
output vfdsu_ex3_uf;

// &Regs; @24
reg [52:0] ex2_result_double_denorm_round_add_num;
reg [52:0] ex2_result_half_denorm_round_add_num;
reg [52:0] ex2_result_single_denorm_round_add_num;
reg [52:0] ex2_result_bfloat_denorm_round_add_num;
reg [52:0] ex2_result_fp8_denorm_round_add_num;
reg [52:0] ex2_result_fp8alt_denorm_round_add_num;
reg [12:0] vfdsu_ex3_doub_expnt_rst;
reg vfdsu_ex3_double;
reg vfdsu_ex3_dz;
reg [12:0] vfdsu_ex3_half_expnt_rst;
reg [12:0] vfdsu_ex3_bfloat_expnt_rst;
reg [12:0] vfdsu_ex3_fp8_expnt_rst;
reg [12:0] vfdsu_ex3_fp8alt_expnt_rst;
reg vfdsu_ex3_id_srt_skip;
reg vfdsu_ex3_nv;
reg vfdsu_ex3_of;
@@ -179,6 +191,7 @@ reg [8 :0] vfdsu_ex3_sing_expnt_rst;
reg vfdsu_ex3_single;
reg vfdsu_ex3_half;
reg vfdsu_ex3_bfloat;
reg vfdsu_ex3_fp8;
reg vfdsu_ex3_uf;

// &Wires; @25
@@ -210,6 +223,16 @@ wire ex2_bfloat_expnt_uf;
wire ex2_bfloat_id_nor_srt_skip;
wire ex2_bfloat_potnt_of;
wire ex2_bfloat_potnt_uf;
wire ex2_fp8_expnt_of;
wire ex2_fp8_expnt_uf;
wire ex2_fp8_id_nor_srt_skip;
wire ex2_fp8_potnt_of;
wire ex2_fp8_potnt_uf;
wire ex2_fp8alt_expnt_of;
wire ex2_fp8alt_expnt_uf;
wire ex2_fp8alt_id_nor_srt_skip;
wire ex2_fp8alt_potnt_of;
wire ex2_fp8alt_potnt_uf;
wire ex2_id_nor_srt_skip;
wire ex2_of;
wire ex2_of_plus;
@@ -274,6 +297,7 @@ wire [2 :0] vfdsu_ex2_rm;
wire vfdsu_ex2_single;
wire vfdsu_ex2_half;
wire vfdsu_ex2_bfloat;
wire vfdsu_ex2_fp8;
wire vfdsu_ex2_sqrt;
wire vfdsu_ex2_srt_skip;
wire vfdsu_ex3_rem_zero;
@@ -305,28 +329,48 @@ assign ex2_half_expnt_of = ~vfdsu_ex2_expnt_rst[6] && (vfdsu_ex2_expnt_rst[5]
assign ex2_bfloat_expnt_of = ~vfdsu_ex2_expnt_rst[9] && (vfdsu_ex2_expnt_rst[8]
|| (vfdsu_ex2_expnt_rst[7] &&
|vfdsu_ex2_expnt_rst[6:0]));
assign ex2_fp8_expnt_of = ~vfdsu_ex2_expnt_rst[6] && (vfdsu_ex2_expnt_rst[5]
|| (vfdsu_ex2_expnt_rst[4] &&
|vfdsu_ex2_expnt_rst[3:0]));
assign ex2_fp8alt_expnt_of = ~vfdsu_ex2_expnt_rst[5] && (vfdsu_ex2_expnt_rst[4]
|| (vfdsu_ex2_expnt_rst[3] &&
|vfdsu_ex2_expnt_rst[2:0]));
assign ex2_expnt_of = vfdsu_ex2_double ? ex2_doub_expnt_of :
vfdsu_ex2_single ? ex2_sing_expnt_of :
vfdsu_ex2_half ? ex2_half_expnt_of : ex2_bfloat_expnt_of;
vfdsu_ex2_half ? ex2_half_expnt_of :
vfdsu_ex2_bfloat ? ex2_bfloat_expnt_of :
vfdsu_ex2_fp8 ? ex2_fp8_expnt_of : ex2_fp8alt_expnt_of;
assign ex2_potnt_of_pre = vfdsu_ex2_double ? ex2_doub_potnt_of :
vfdsu_ex2_single ? ex2_sing_potnt_of :
vfdsu_ex2_half ? ex2_half_potnt_of : ex2_bfloat_potnt_of;
vfdsu_ex2_half ? ex2_half_potnt_of :
vfdsu_ex2_bfloat ? ex2_bfloat_potnt_of :
vfdsu_ex2_fp8 ? ex2_fp8_potnt_of : ex2_fp8alt_potnt_of;
assign ex2_potnt_uf_pre = vfdsu_ex2_double ? ex2_doub_potnt_uf :
vfdsu_ex2_single ? ex2_sing_potnt_uf :
vfdsu_ex2_half ? ex2_half_potnt_uf : ex2_bfloat_potnt_uf;
vfdsu_ex2_half ? ex2_half_potnt_uf :
vfdsu_ex2_bfloat ? ex2_bfloat_potnt_uf :
vfdsu_ex2_fp8 ? ex2_fp8_potnt_uf : ex2_fp8alt_potnt_uf;
assign ex2_expnt_uf = vfdsu_ex2_double ? ex2_doub_expnt_uf :
vfdsu_ex2_single ? ex2_sing_expnt_uf :
vfdsu_ex2_half ? ex2_half_expnt_uf : ex2_bfloat_expnt_uf;
vfdsu_ex2_half ? ex2_half_expnt_uf :
vfdsu_ex2_bfloat ? ex2_bfloat_expnt_uf :
vfdsu_ex2_fp8 ? ex2_fp8_expnt_uf : ex2_fp8alt_expnt_uf;
assign ex2_id_nor_srt_skip = vfdsu_ex2_double ? ex2_double_id_nor_srt_skip :
vfdsu_ex2_single ? ex2_single_id_nor_srt_skip :
vfdsu_ex2_half ? ex2_half_id_nor_srt_skip : ex2_bfloat_id_nor_srt_skip;
vfdsu_ex2_half ? ex2_half_id_nor_srt_skip :
vfdsu_ex2_bfloat ? ex2_bfloat_id_nor_srt_skip :
vfdsu_ex2_fp8 ? ex2_fp8_id_nor_srt_skip : ex2_fp8alt_id_nor_srt_skip;
assign ex2_result_denorm_round_add_num[52:0] = vfdsu_ex2_double ?
ex2_result_double_denorm_round_add_num[52:0] :
vfdsu_ex2_single ?
ex2_result_single_denorm_round_add_num[52:0] :
vfdsu_ex2_half ?
ex2_result_half_denorm_round_add_num[52:0] :
ex2_result_bfloat_denorm_round_add_num[52:0];
vfdsu_ex2_bfloat ?
ex2_result_bfloat_denorm_round_add_num[52:0] :
vfdsu_ex2_fp8 ?
ex2_result_fp8_denorm_round_add_num[52:0] :
ex2_result_fp8alt_denorm_round_add_num[52:0];


//potential overflow when E1-E2 = 128/1024
@@ -346,6 +390,14 @@ assign ex2_bfloat_potnt_of = ~vfdsu_ex2_expnt_rst[9] &&
~vfdsu_ex2_expnt_rst[8] &&
vfdsu_ex2_expnt_rst[7] &&
~|vfdsu_ex2_expnt_rst[6:0];
assign ex2_fp8_potnt_of = ~vfdsu_ex2_expnt_rst[6] &&
~vfdsu_ex2_expnt_rst[5] &&
vfdsu_ex2_expnt_rst[4] &&
~|vfdsu_ex2_expnt_rst[3:0];
assign ex2_fp8alt_potnt_of = ~vfdsu_ex2_expnt_rst[5] &&
~vfdsu_ex2_expnt_rst[4] &&
vfdsu_ex2_expnt_rst[3] &&
~|vfdsu_ex2_expnt_rst[2:0];
assign ex2_potnt_of = ex2_potnt_of_pre &&
vfdsu_ex2_op0_norm &&
vfdsu_ex2_op1_norm &&
@@ -356,6 +408,8 @@ assign ex2_doub_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0]
assign ex2_sing_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0] <= 12'hf81);
assign ex2_bfloat_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0] <= 12'hf81);
assign ex2_half_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0] <= 12'hff1);
assign ex2_fp8_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0] <= 12'hff1);
assign ex2_fp8alt_expnt_uf = vfdsu_ex2_expnt_rst[12] && (vfdsu_ex2_expnt_rst[11:0] <= 12'hff9);
assign ex2_half_potnt_uf = &vfdsu_ex2_expnt_rst[6:4] &&
~|vfdsu_ex2_expnt_rst[3:2] &&
vfdsu_ex2_expnt_rst[1] &&
@@ -375,6 +429,14 @@ assign ex2_bfloat_potnt_uf = &vfdsu_ex2_expnt_rst[9:7] &&
~|vfdsu_ex2_expnt_rst[6:2] &&
vfdsu_ex2_expnt_rst[1] &&
!vfdsu_ex2_expnt_rst[0];
assign ex2_fp8_potnt_uf = &vfdsu_ex2_expnt_rst[6:4] &&
~|vfdsu_ex2_expnt_rst[3:2] &&
vfdsu_ex2_expnt_rst[1] &&
!vfdsu_ex2_expnt_rst[0];
assign ex2_fp8alt_potnt_uf = &vfdsu_ex2_expnt_rst[5:3] &&
~|vfdsu_ex2_expnt_rst[2] &&
vfdsu_ex2_expnt_rst[1] &&
!vfdsu_ex2_expnt_rst[0];

assign ex2_potnt_uf = (ex2_potnt_uf_pre &&
vfdsu_ex2_op0_norm &&
@@ -411,6 +473,10 @@ assign ex2_half_id_nor_srt_skip = vfdsu_ex2_expnt_rst[12]
&& (vfdsu_ex2_expnt_rst[11:0]<12'hfe7);
assign ex2_bfloat_id_nor_srt_skip = vfdsu_ex2_expnt_rst[12]
&& (vfdsu_ex2_expnt_rst[11:0]<12'hf6a);
assign ex2_fp8_id_nor_srt_skip = vfdsu_ex2_expnt_rst[12]
&& (vfdsu_ex2_expnt_rst[11:0]<12'hfe7);
assign ex2_fp8alt_id_nor_srt_skip = vfdsu_ex2_expnt_rst[12]
&& (vfdsu_ex2_expnt_rst[11:0]<12'hff6);
assign ex2_rslt_denorm = ex2_uf;

//=======================EX2 skip srt iteration======================
@@ -545,6 +611,27 @@ case(vfdsu_ex2_expnt_rst[12:0])
endcase
end

always @( vfdsu_ex2_expnt_rst[12:0])
begin
case(vfdsu_ex2_expnt_rst[12:0])
13'h1ff2:ex2_result_fp8_denorm_round_add_num[52:0] = 53'h4000000000000; //-14 1
13'h1ff1:ex2_result_fp8_denorm_round_add_num[52:0] = 53'h8000000000000; //-15 0
13'h1ff0:ex2_result_fp8_denorm_round_add_num[52:0] = 53'h10000000000000; //-16 -1
default: ex2_result_fp8_denorm_round_add_num[52:0] = 53'h0; // -23
endcase
end

always @( vfdsu_ex2_expnt_rst[12:0])
begin
case(vfdsu_ex2_expnt_rst[12:0])
13'h1ffa:ex2_result_fp8alt_denorm_round_add_num[52:0] = 53'h2000000000000; //-6 1
13'h1ff9:ex2_result_fp8alt_denorm_round_add_num[52:0] = 53'h4000000000000; //-7 0
13'h1ff8:ex2_result_fp8alt_denorm_round_add_num[52:0] = 53'h8000000000000; //-8 -1
13'h1ff7:ex2_result_fp8alt_denorm_round_add_num[52:0] = 53'h10000000000000; //-9 -2
default: ex2_result_fp8alt_denorm_round_add_num[52:0] = 53'h0; // -10
endcase
end

//===================special result========================
assign ex2_result_zero = vfdsu_ex2_result_zero;
assign ex2_result_qnan = vfdsu_ex2_result_qnan;
@@ -597,6 +684,8 @@ begin
vfdsu_ex3_sing_expnt_rst[8:0] <= 9'b0;
vfdsu_ex3_half_expnt_rst[12:0] <= 13'b0;
vfdsu_ex3_bfloat_expnt_rst[12:0] <= 13'b0;
vfdsu_ex3_fp8_expnt_rst[12:0] <= 13'b0;
vfdsu_ex3_fp8alt_expnt_rst[12:0] <= 13'b0;
vfdsu_ex3_result_sign <= 1'b0;
vfdsu_ex3_qnan_sign <= 1'b0;
vfdsu_ex3_qnan_f[51:0] <= 52'b0;
@@ -609,6 +698,7 @@ begin
vfdsu_ex3_single <= 1'b0;
vfdsu_ex3_half <= 1'b0;
vfdsu_ex3_bfloat <= 1'b0;
vfdsu_ex3_fp8 <= 1'b0;
end
else if(ex2_pipedown)
begin
@@ -628,6 +718,8 @@ begin
vfdsu_ex3_sing_expnt_rst[8:0] <= vfdsu_ex2_expnt_rst[8:0];
vfdsu_ex3_half_expnt_rst[12:0] <= vfdsu_ex2_expnt_rst[12:0];
vfdsu_ex3_bfloat_expnt_rst[12:0] <= vfdsu_ex2_expnt_rst[12:0];
vfdsu_ex3_fp8_expnt_rst[12:0] <= vfdsu_ex2_expnt_rst[12:0];
vfdsu_ex3_fp8alt_expnt_rst[12:0] <= vfdsu_ex2_expnt_rst[12:0];
vfdsu_ex3_result_sign <= vfdsu_ex2_result_sign;
vfdsu_ex3_qnan_sign <= vfdsu_ex2_qnan_sign;
vfdsu_ex3_qnan_f[51:0] <= vfdsu_ex2_qnan_f[51:0];
@@ -640,6 +732,7 @@ begin
vfdsu_ex3_single <= vfdsu_ex2_single;
vfdsu_ex3_half <= vfdsu_ex2_half;
vfdsu_ex3_bfloat <= vfdsu_ex2_bfloat;
vfdsu_ex3_fp8 <= vfdsu_ex2_fp8;
end
else
begin
@@ -659,6 +752,8 @@ begin
vfdsu_ex3_sing_expnt_rst[8:0] <= vfdsu_ex3_sing_expnt_rst[8:0];
vfdsu_ex3_half_expnt_rst[12:0] <= vfdsu_ex3_half_expnt_rst[12:0];
vfdsu_ex3_bfloat_expnt_rst[12:0] <= vfdsu_ex3_bfloat_expnt_rst[12:0];
vfdsu_ex3_fp8_expnt_rst[12:0] <= vfdsu_ex3_fp8_expnt_rst[12:0];
vfdsu_ex3_fp8alt_expnt_rst[12:0] <= vfdsu_ex3_fp8alt_expnt_rst[12:0];
vfdsu_ex3_result_sign <= vfdsu_ex3_result_sign;
vfdsu_ex3_qnan_sign <= vfdsu_ex3_qnan_sign;
vfdsu_ex3_qnan_f[51:0] <= vfdsu_ex3_qnan_f[51:0];
@@ -671,6 +766,7 @@ begin
vfdsu_ex3_single <= vfdsu_ex3_single;
vfdsu_ex3_half <= vfdsu_ex3_half;
vfdsu_ex3_bfloat <= vfdsu_ex3_bfloat;
vfdsu_ex3_fp8 <= vfdsu_ex3_fp8;
end
end
assign vfdsu_ex3_rem_zero = ~|srt_remainder[60:0];
@@ -101,6 +101,7 @@ wire ex1_scalar;
wire ex1_single;
wire ex1_half;
wire ex1_bfloat;
wire ex1_fp8;
wire ex1_sqrt;
wire [63:0] ex1_src0;
wire [63:0] ex1_src1;
@@ -132,6 +133,7 @@ wire vfdsu_ex2_double;
wire vfdsu_ex2_single;
wire vfdsu_ex2_half;
wire vfdsu_ex2_bfloat;
wire vfdsu_ex2_fp8;
wire vfdsu_ifu_debug_ex2_wait;
wire vfdsu_ifu_debug_idle;
wire vfdsu_ifu_debug_pipe_busy;
@@ -276,6 +278,7 @@ ct_vfdsu_double x_ct_vfdsu_double (
.ex1_single (ex1_single ),
.ex1_half (ex1_half ),
.ex1_bfloat (ex1_bfloat ),
.ex1_fp8 (ex1_fp8 ),
.ex1_sqrt (ex1_sqrt ),
.ex1_src0 (ex1_src0 ),
.ex1_src1 (ex1_src1 ),
@@ -314,6 +317,7 @@ ct_vfdsu_scalar_dp x_ct_vfdsu_scalar_dp (
.ex1_single (ex1_single ),
.ex1_half (ex1_half ),
.ex1_bfloat (ex1_bfloat ),
.ex1_fp8 (ex1_fp8 ),
.ex1_sqrt (ex1_sqrt ),
.ex1_src0 (ex1_src0 ),
.ex1_src1 (ex1_src1 ),
@@ -335,7 +339,8 @@ ct_vfdsu_scalar_dp x_ct_vfdsu_scalar_dp (
.vfdsu_ex2_double (vfdsu_ex2_double ),
.vfdsu_ex2_single (vfdsu_ex2_single ),
.vfdsu_ex2_half (vfdsu_ex2_half ),
-.vfdsu_ex2_bfloat (vfdsu_ex2_bfloat )
+.vfdsu_ex2_bfloat (vfdsu_ex2_bfloat ),
+.vfdsu_ex2_fp8 (vfdsu_ex2_fp8 )
);

