diff --git a/README.md b/README.md
index 20a0c1e..fa5bbc3 100644
--- a/README.md
+++ b/README.md
@@ -280,10 +280,15 @@ Please follow the steps to build the design for zcu102 (ZU9 device based board)
 1. Please generate a custom platform with 1x and 2x clocks using the steps described [here](./docs/CUSTOM_PLATFORM_GEN.md). With Chai-v2, we now have the DSPs operating at twice the frequency of the rest of the core.
+1. Open `/design/conv/scripts/mcps.tcl` file and modify the path of `xdc` file as shown below.
+ ```
+ read_xdc /design/conv/scripts/mcp_const.xdc
+ ```
+ >**:pushpin: NOTE:**
+ > - Make sure the path to mcp_const.xdc is correct
 1. Go to `CHaiDNN/design/build` folder.
-
 1. Set SDx tool environment
 - For BASH:
 ```sh
@@ -347,7 +352,8 @@ Follow the steps to compile the software stack.
 ```
 1. Now run the following commands.
-
+ >**:pushpin: NOTE:**
+ > - running ultraclean erases the previous libxlnxdnn.so copied into <...>SD_Card/lib/ from <...>/design/build/sd_card
 ```sh
 make ultraclean
 make
diff --git a/docs/BUILD_USING_SDX_GUI.md b/docs/BUILD_USING_SDX_GUI.md
index 43ac923..9c24cab 100644
--- a/docs/BUILD_USING_SDX_GUI.md
+++ b/docs/BUILD_USING_SDX_GUI.md
@@ -117,7 +117,7 @@ To build `CHaiDNN` using the Xilinx® SDx Development Environment, perform th
 19. In `SDS++ Linker` add the following in the `command`
 ```
- sds++ -xp param:compiler.skipTimingCheckAndFrequencyScaling=1 -xp "vivado_prop:run.impl_1.{STEPS.OPT_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.{STEPS.PLACE_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.STEPS.PHYS_OPT_DESIGN.IS_ENABLED=1" -xp "vivado_prop:run.impl_1.{STEPS.PHYS_OPT_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.{STEPS.ROUTE_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.synth_1.{STEPS.SYNTH_DESIGN.TCL.PRE}={/src/conv/scripts/mcps.tcl}" -xp "vivado_prop:run.impl_1.{STEPS.PLACE_DESIGN.TCL.PRE}={/src/conv/scripts/mcps.tcl}" -Wno-unused-label
+ sds++ -xp param:compiler.skipTimingCheckAndFrequencyScaling=1 -xp "vivado_prop:run.impl_1.{STEPS.OPT_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.{STEPS.PLACE_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.STEPS.PHYS_OPT_DESIGN.IS_ENABLED=1" -xp "vivado_prop:run.impl_1.{STEPS.PHYS_OPT_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.impl_1.{STEPS.ROUTE_DESIGN.ARGS.MORE OPTIONS}={-directive Explore}" -xp "vivado_prop:run.synth_1.{STEPS.SYNTH_DESIGN.TCL.PRE}={/conv/scripts/mcps.tcl}" -xp "vivado_prop:run.impl_1.{STEPS.PLACE_DESIGN.TCL.PRE}={/conv/scripts/mcps.tcl}" -Wno-unused-label
 ```
 20. In `SDS++ Linker`, Select `Libraries` and add the following libs
 ```
@@ -140,7 +140,7 @@ To build `CHaiDNN` using the Xilinx® SDx Development Environment, perform th
 24. Apply changes and close the window.
-25. Open `/src/design/scripts/mcps.tcl` file and modify the path of `xdc` file as shown below.
+25. Open `/src/design/conv/scripts/mcps.tcl` file and modify the path of `xdc` file as shown below.
 ```
 read_xdc /src/design/conv/scripts/mcp_const.xdc
 ```
@@ -148,9 +148,9 @@ To build `CHaiDNN` using the Xilinx® SDx Development Environment, perform th
 26. Select the Hardware functions.
- - Navigate to `src/design/src/pool/src/pooling_layer_dp_2xio_top.cpp` file using SDx explorer, right click on `PoolTop` and select Toggle HW/SW.
+ - Navigate to `src/design/pool/src/pooling_layer_dp_2xio_top.cpp` file using SDx explorer, right click on `PoolTop` and select Toggle HW/SW.
- - Navigate to `src/design/src/deconv/src/xi_deconv_top.cpp` file using SDx explorer, right click on `XiDeconvTop` and select Toggle HW/SW.
+ - Navigate to `src/design/deconv/src/xi_deconv_top.cpp` file using SDx explorer, right click on `XiDeconvTop` and select Toggle HW/SW.
 >**:pushpin: NOTE:** When building `DietChai` don't map any function to HW. `XiConvolutionTop` will be mapped to HW by default.
diff --git a/docs/PERFORMANCE_EVAL.md b/docs/PERFORMANCE_EVAL.md
index 81dbbcc..84ff2ec 100644
--- a/docs/PERFORMANCE_EVAL.md
+++ b/docs/PERFORMANCE_EVAL.md
@@ -46,7 +46,9 @@
 ## Quick Performance Evaluation
-CHai-v2 provides support for a variety of networks for classification, object detection and segmentation. Please refer to [Model Zoo](./MODELZOO.md) for list of networks and [Supported layers](./SUPPORTED_LAYERS.md) for list of layers that are required for these networks. While we believe that we have addressed most of the frequently-used heavy-lifting layers required in networks, we also understand that there might be some networks which might require support for some additional layers. We encourage users to add support for these layers and you could use the [software plugin](./SOFTWARE_LAYER_PLUGIN.md) methodology or the methodology described [here](./HW_SW_PARTITIONING.md) for efficient hardware software partitioning. But from a system design perspective, we understand that it might be useful to understand the performance of CHai on the layers that are already supported for a given network. We describe an API in this section which can be used to achieve this. For example, if a network is built with 50 layers and the support is missing for only two layers on CHai, then you could get a ball-park estimate on the latency of the 48 layers which are supported. Please note that these latency numbers are only an estimate and they could be optimistic or pessimistic based on the structure of the network. The reason for this is that the runtime of the accelerator fuses some layers for latency optimization.
+CHai-v2 provides support for a variety of networks for classification, object detection and segmentation. Please refer to [Model Zoo](./MODELZOO.md) for list of networks and [Supported layers](./SUPPORTED_LAYERS.md) for list of layers that are required for these networks. While we believe that we have addressed most of the frequently-used heavy-lifting layers required in networks, we also understand that there might be some networks which might require support for some additional layers.
+We encourage users to add support for these layers and you could use the [software plugin](./SOFTWARE_LAYER_PLUGIN.md) methodology or the methodology described [here](./HW_SW_PARTITIONING.md) for efficient hardware software partitioning. But from a system design perspective, we understand that it might be useful to understand the performance of CHai on the layers that are already supported for a given network.
+We describe an API in this section which can be used to achieve this. For example, if a network is built with 50 layers and the support is missing for only two layers on CHai, then you could get a ball-park estimate on the latency of the 48 layers which are supported. Please note that these latency numbers are only an estimate and they could be optimistic or pessimistic based on the structure of the network. The reason for this is that the runtime of the accelerator fuses some layers for latency optimization.

API