Written by Holguer Andres

 

Introduction to HLS (High Level Synthesis)

Using Verilog or VHDL as a hardware description language can be difficult for software-centric people, who have to switch mental models between sequential software and concurrent hardware; this is often confusing and, for many, a real headache. In addition, when designing a product or developing a new prototype, working in Verilog or VHDL can be rigorous and tedious because of the time each change-and-compile cycle takes. Developers also often want to implement complex algorithms by porting them manually from C to Verilog/VHDL, which is hard even for an expert and consumes a lot of precious time.

However, there are tools available today that many people have missed, either for lack of information or for lack of guides that introduce them in a clear, comprehensive way. Among them are HDL Coder (a MATLAB product, paid unless you use an academic license), OpenCL for Altera (excellent, but also paid), LegUp (free and really promising, though it has been tested on Linux only), C2H (an old C-to-hardware tool by Altera, paid), DSP Builder (paid), Altera HLS (not released yet, but surely paid), Bluespec (promising but paid), MyHDL (a Python-to-Verilog tool, really interesting and free), and HLS/HLx (the Xilinx Vivado tool for HLS, paid, with an academic license), among others.

This introduction uses HLx to generate Verilog from C code. The generated Verilog is then used on Altera FPGAs to show that, even though the tool was designed for Xilinx devices, its output can also be implemented on Altera FPGAs by following some basic steps.

This manual will be divided into 4 parts.

  1. Vivado HLx Installation, and introduction.
  2. Generating Verilog from C using different optimization methods (heuristics).
  3. Simulating and debugging the C code and the Verilog using Vivado HLx and ModelSim.
  4. Compiling the Generated Verilog File using Quartus.

 Procedure

  1. First, download Vivado HLx. Register on the Xilinx web page, then download the WebPack Edition so that you can use the academic license.
  2. Once you have downloaded HLx, install it as the WebPack Edition.




  3. After installing HLx, open it.
  4. This example uses a ready-made design provided by Xilinx: the fixed-point square root of a number. Click the "Open Example Project" button, which you can find on the main menu after opening HLx.
  5. Select the example "fxp_sqrt".
  6. Save this example into any folder you like, for this example, this is going to be saved on: 
  7. Once HLx creates the project, you will see a folder named "proj_fxp_sqrt". Inside it is a source file called "fxp_sqrt_top.cpp", which contains the C function we want to accelerate, that is, convert to Verilog/VHDL. You can double-click the file to explore the function.
  8. When you double-click the file "fxp_sqrt_top.cpp", you will find a top-level function that calls an inner function named "fxp_sqrt". It takes two arguments, "result" and "in_val", which are the output and the input of "fxp_sqrt", the function that finds the square root of a fixed-point number (as you will see later in this manual). If you right-click the function "fxp_sqrt", you can explore it by choosing "Open Declaration".

    As you might notice, the square root of a fixed-point number is found through a series of calculations (Taylor series). In this case there are 4 bits for the integer part and 16 bits for the fractional part, and this can be changed through the parameters of the sqrt function.
    It is important to highlight that, for this example, the generated Verilog is compatible with Altera FPGAs because the C code does not contain high-level mathematical operations. All the operations are calculated using Taylor series, so HLx generates Verilog code without any IP cores.
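To make the "no high-level math, no IP cores" point concrete, here is a minimal sketch of a fixed-point square root written with shifts, adds, and compares only. It is not the Xilinx example code: it swaps in the classic bit-by-bit integer square root, and the names `isqrt_u64`, `fxp_sqrt_sketch`, and `kFracBits` are hypothetical. It also assumes that both the input and the result use the same format, 4 integer bits and 16 fractional bits.

```cpp
#include <cassert>
#include <cstdint>

// Assumed format: 4 integer bits + 16 fractional bits (20-bit word).
constexpr int kFracBits = 16;

// Bit-by-bit integer square root: returns floor(sqrt(n)) using only
// shifts, additions, subtractions and comparisons (no multiplier).
std::uint64_t isqrt_u64(std::uint64_t n) {
    std::uint64_t result = 0;
    std::uint64_t bit = std::uint64_t{1} << 62;  // highest power of four in 64 bits
    while (bit > n) {
        bit >>= 2;  // skip leading zero digit pairs
    }
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;  // this result bit is 1
        } else {
            result >>= 1;                  // this result bit is 0
        }
        bit >>= 2;
    }
    return result;
}

// Square root of a raw fixed-point word. If value = raw / 2^16, then
// sqrt(value) * 2^16 = sqrt(raw * 2^16), so the input is pre-shifted by 16.
std::uint32_t fxp_sqrt_sketch(std::uint32_t in_raw) {
    assert(in_raw < (1u << 20));  // must fit the assumed 20-bit format
    return static_cast<std::uint32_t>(
        isqrt_u64(static_cast<std::uint64_t>(in_raw) << kFracBits));
}
```

This sketch truncates instead of rounding, so its last bit can differ from what the tool's example produces. The design point is the same, though: every step is a simple integer operation that any synthesis tool can map to plain logic.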
  9. Once you have explored the function and become familiar with it, open the "Project" menu and click "Project Settings...".
  10. This takes you to the project settings, where you can find "Synthesis" and set the top function you want to accelerate. In this case the function to accelerate is the one called "fxp_sqrt_top", so any function called inside it will be accelerated as well.
  11. Now close the project settings window and press the play button to synthesize the project. On the console you will see the C synthesis start, and Vivado begins running different scripts.
  12. When HLx has finished running all the scripts, a message on the console says that a Verilog module and the RTL for the module were generated.
  13. Now it is time to evaluate the performance reached by the accelerated function. To do this, go to the top right corner and press the "Analysis" button.
  14. Click the "Performance" tab to see the time the accelerated SQRT function needs to generate a result.
  15. You will see an overview of all the operations required in each clock cycle. For instance, in cycle 19 there are 9 concurrent operations that HLx schedules together to reduce the time. Instead of calculating each one individually as software would do, the HLx compiler saw that in that particular clock cycle 9 operations can be performed in parallel; this was done using heuristic algorithms, also called optimization algorithms. You can explore the details by right-clicking each operation to go to the corresponding C source code or Verilog code.




  16. To see other summary details of the performance, go to the "Synthesis" tab and check the Performance Estimates, which tell you that the latency of the function you just accelerated is 22 clocks.
  17. Once you are familiar with the basics of HLx, it is time to use the different kinds of optimizations you can apply to the C code to vary the performance of your results. Because this example is a template, different optimization methods have already been applied. To see them, go back to the main code of the function "fxp_sqrt_top" and click "Synthesis" in the top right corner. You will see the optimization methods that are being applied to this function; among them is one named "HLS_PIPELINE rewind". Right-click it and remove it to see what the difference is when this optimization method is removed.
  18. Save.
  19. Press the play button again to synthesize the project, then go back to check the performance report.
  20. Now you will see that the latency has increased by 38 cycles, all of which are in "loop 1". This means that the PIPELINE optimization is not active, so the loops inside the function "fxp_sqrt_top" are treated as simple finite state machines, which affects the performance dramatically.
  21. Now go back to the top function "fxp_sqrt_top" and press "Synthesis" again, then right-click "fxp_sqrt_top" and choose "Insert Directive...".


  22. In the "HLS Directive Editor" window, add the "PIPELINE" directive, as shown in the next image.
  23. Save.
  24. Press the play button again to synthesize, and you will see that the latency returns to the initial 22 cycles.
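The effect seen in steps 17 to 24 can be reasoned about with a first-order model. The sketch below shows one way a pipeline directive can appear in source form, as a `#pragma HLS PIPELINE` line inside a loop, together with two rough cycle-count formulas. The loop, the names `accumulate`, `sequential_cycles`, and `pipelined_cycles`, and the numbers used with them are all hypothetical and are not taken from the Xilinx example. The report generated by HLx remains the authoritative source for real latencies.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;  // hypothetical loop trip count

// Hypothetical loop, not from the fxp_sqrt example. A regular C++
// compiler ignores the HLS pragma; the HLS tool reads it as a request
// to start a new iteration every II clock cycles.
std::int32_t accumulate(const std::int32_t samples[kSamples]) {
    std::int32_t sum = 0;
    for (int i = 0; i < kSamples; ++i) {
#pragma HLS PIPELINE II=1
        sum += samples[i];
    }
    return sum;
}

// Without pipelining, each iteration finishes before the next starts.
int sequential_cycles(int trips, int depth) {
    assert(trips > 0 && depth > 0);
    return trips * depth;
}

// With pipelining, iterations overlap: the first result appears after
// `depth` cycles, then one more result every `ii` cycles.
int pipelined_cycles(int trips, int depth, int ii) {
    assert(trips > 0 && depth > 0 && ii > 0);
    return depth + (trips - 1) * ii;
}
```

For example, under these assumptions a loop with 8 iterations and a 3-cycle body takes about 24 cycles when run sequentially and about 10 cycles when pipelined with an initiation interval of 1. That gap is the kind of difference the performance report shows when the directive is removed and added back.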


  25. As complementary material on optimization, you can read the attached readings (Optimization_Readings.zip) on your own to learn more about heuristics and optimization. I am certain this material will be a really illustrative way to get started with these concepts.

  26. Returning to the project: in the project navigator, under the folder "solution1", you will find a folder named "syn". This folder contains the generated hardware in SystemC, Verilog, and VHDL. You can also go to the folder where you saved the project and find these files there. Open the Verilog file and explore it. You will find that the inputs and outputs of the module follow the standard of using start, done, idle, and ready signals, which makes the module easy to integrate.




  27. If you have ModelSim, you can easily simulate the Verilog file to check whether or not it works. The following image shows a ModelSim simulation that uses the number 2 as the input (in_val_V) and a 50 MHz clock.

    As the simulation shows, counting from right after the start signal, 22 clocks are required to calculate the square root of any input number. The done signal changes to 1 at the same time as the output becomes valid. For this example the result is "0x0016a0a", which corresponds to 1.414215087890625.
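The two numbers read off the simulation can be checked by hand. The sketch below converts the raw result word to a real value, assuming 16 fractional bits, and converts the 22-clock latency to time at 50 MHz. The helper names `fixed_to_double` and `cycles_to_ns` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed number of fractional bits in the result word.
constexpr int kFracBits = 16;

// Interpret a raw fixed-point word as a real number: value = raw / 2^16.
double fixed_to_double(std::uint32_t raw) {
    return static_cast<double>(raw) / static_cast<double>(1u << kFracBits);
}

// Duration of `cycles` clock periods, in nanoseconds, at `clock_mhz` MHz.
double cycles_to_ns(int cycles, double clock_mhz) {
    assert(cycles >= 0 && clock_mhz > 0.0);
    return cycles * 1000.0 / clock_mhz;
}
```

With these helpers, `fixed_to_double(0x16a0a)` gives 92682 / 65536 = 1.414215087890625, and `cycles_to_ns(22, 50.0)` gives 440 ns, since one period of a 50 MHz clock is 20 ns.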


  28. ModelSim is used here to simulate the generated Verilog module, but you can also debug the C code using HLx. If you go to the top right corner and press the "Debug" button, you can debug the C code like any regular program, using the step over, step into, pause, and play buttons.


  29. After you have debugged the C code and simulated the generated Verilog/VHDL, it is time to implement the Verilog code on an Altera FPGA. This can be done by simply copying the module into your Quartus project and instantiating it.


  30. Once the project is compiled, the FPGA can be programmed with the generated SOF file (for this example the Cyclone V GX Starter Kit board was used).

  31. Congrats! Now it is your turn to test. Maybe Mandelbrot? Maybe ray tracing? Or any other algorithm you would like to accelerate. Why not combine this with Nios II or the HPS?
  32. The solution for this manual can be downloaded from the attachments below (Vivado_sqrt_example.zip).
Attachments:
Optimization_Readings.zip (12298 kB)
Vivado_sqrt_example.zip (5722 kB)