-
-
Category: Cores
-
-
Hits: 11304
Authors:
Full Paper for Citation:
Verilog files for the core:
- Quartus Project:
- Individual Verilog Files:
- Zip File:
Video Example of the core working on a Real Vital Signs Monitor:
Introduction
Signal wave graphing is a common requirement in embedded instrumentation systems dedicated to the monitoring and logging of external variables. However, this functionality requires a lot more resources than the ones available in simple signal acquisition systems as expected in power and economic budgets, as they may require the use of external GPUs that integrate a variety of graphing resources unnecessary for logging-and-graphing dedicated applications, such as Digital Oscilloscopes, Industrial Monitoring, Vital Signs Monitors and others.
A Verilog HDL module for the acceleration of signal wave graphing using FPGAs has been developed as part of an ongoing process carried by the ADT(Advanced Digital Technologies) academic working-group at UPB(Universidad Pontificia Bolivariana) Bucaramanga. The ADT group develops resources related to its fields of study and leaves them available and open for other students and the engineering industry to use in embedded systems solutions. This paper describes this module's justification, specifications, development methodology, comparison test results, performance and future work.
An advantage provided by the use of FPGAs is the possibility of performing signal wave graphing using the same integrated circuit used for signal processing and peripheral interfaces, without the high cost related to custom ASICs (Application Specific Integrated Circuits). FPGAs are a fast way of getting to market with highly integrated devices before or without moving to ASICs. For instance, Global Information and Communication Technology service provider Huawei was known for being the single largest buyer of FPGAs before changing their commercial solutions to ASICs [*], and recent news stated that Intel will be manufacturing Altera's high-end FPGAs, giving them an edge in technology that should boost the use of their devices in faster design-to-market solutions.
The algorithms for signal wave graphing can be carried out by a processor with the use of graphing libraries and frame dedicated modules, but these systems result in slower processing speeds for any other task, as video processing is a very demanding task by itself due to the speeds required to perform stable and optimal frame rates. However, the iterative nature of rasterization and other wave related algorithms makes them highly translatable to efficient HWD modules, where the inputs are points X and Y, where X could be time and Y is the input data, while the output is the wave pixels setting to be plotted in the display. This project makes use of custom HWD for the acquisition and processing of incoming data from multiple signal waves, as well as rasterization algorithms and the graphing of pixels directly into the video frame.
Graph Module
A following step is to develop a set of modules that are capable of graphing any input data. For this purpose there is a peripheral device that delivers mutiple-wave continuous data through a UART (Universal Asynchronous Receiver Transmitter) transmission that has to be decoded and arranged. The graphing modules store a complete trace of the wave and create the graph to be displayed through VGA.
Figure 1. Block Diagram of video synchronization and frame drawing module
The first module required for this purpose takes the synchronization signals, that in the case of VGA are Horizontal and Vertical Sync, and counts them to obtain the location (X,Y) of the pixel being set in the frame. It resets in every frame and is configurable for any resolution. This count is then pipelined into the Graph Module, that is also given data from a UART receiver module, which stores every new data in RAM addressed by the X axis count. These two inputs are then used to create a wave graph according to the configuration inputs that allow the user to change any graph parameter: color, size, position, visibility, speed and mode (scroll or redraw). Figure 1 is a block diagram of the three mentioned modules, along with a VGA synchronization block that generates the corresponding H-Sync and V-Sync signals according to the resolution and frame rate to be displayed.
Every new data received in the Graph Module is the amplitude of a signal. They are cyclically written into a RAM which depth is equal to the width of the wave graphing area in the interface, i.e. 640 in a 640x480 screen where the wave graph takes the whole screen. Every row in RAM matches a pixel in the graphing area where pixel count X is the address of the data in RAM shifted by the X location of the wave graph area, and pixel count Y is compared to the amplitude of the data in that address, which is also shifted by the Y location of the wave graph area, when it is read. For Y the number is negated, taking into account that the pixel count starts at 0,0 that is the leftmost upper corner and the wave should increase towards that direction.
The X pixel count is then constantly reading this RAM, and the Graph Module determines whether the current pixel should display or not a wave dot when pixel count Y matches the data read. The outputs from the Graph Module are then the RGB colors configured by the user and a bit signal called “Graph ON” that is high when the pixel being displayed matches the data to be graphed. This results in a cyclical drawing of dots along the axes of the graph, like the ones pictured in image (a) of Figure 2, which does not represent a continuous graph. A rasterization algorithm is then required in order to obtain continuous lines joining the dots, so this process is also carried out by a logical HWD inside the Graph Module.
Figure 2. Data Rasterization. (a) Data as pixels. (b) Data as lines.
When a new amplitude data is read from RAM, the previous data is saved in a register and is used to determine if a line of pixels should be drawn above or under the new data pixel, to connect both dots. It is a simple rasterization algorithm that allows a continuous line to be drawn from a vector of data, these pixels are then going to generate a “Graph ON” signal when the following condition is true:
wire SlopeFX=
((CountY_in+line_width)>=((SDRAM_OUT_1[7:0]/Height)+ Pos_Y))?
((CountY_in<=((SDRAM_OUT[7:0]/Height)+ Pos_Y))&
((CountY_in+line_width)>=((SDRAM_OUT_1[7:0]/Height)+ Pos_Y))):
((CountY_in)<=((SDRAM_OUT_1[7:0]/Height)+ Pos_Y))?
(((CountY_in+line_width)>=((SDRAM_OUT[7:0]/Height)+ Pos_Y))&
(CountY_in<=((SDRAM_OUT_1[7:0]/Height)+ Pos_Y))):
1'b0;
Figure 2 (b) is a demonstration of how lines are drawn using this module, gray pixels are drawn by the rasterization conditions while black ones are read directly from RAM.
The configuration for a “scroll” display mode is done by incrementing the address register for RAM, giving the appearance that data is being shifted through RAM. When RAM is overflowed the address offset register resets and continues the process. The full I/O port list of the graph module is as follows:
module plot_graph(
//Inputs
CountX_in,//X-position of current pixel
CountY_in,//Y-position of current pixel
data_graph,//Input data to store in RAM
data_graph_rdy,//Input data ready
display_clk,//Graph module clock = Video clock
color_graph, //RGB format graph color
scroll_en,//Painting mode: Scroll or Redraw
Pos_X,//X-axis position for graph origin
Pos_Y,//Y-axis position for graph origin
Witdh,//Ammount of data to be graphed
Height,//Maximum wave amplitude
line_width,//Pixel width of the wave
Graph_en,//View graph or make invisible
reset_n,//Active-low reset
pause_graph,//Stop graphing new data
//Outputs
Graph_on,//Graph is showed in current pixel
R,//Red component
G,//Green component
B//Blue component
);
First Test
The first testing stage of these modules involves a comparison with a wave graphing system based solely on software, for which the Nios II soft-processor from Altera was chosen, along with the DE0-Nano Development board with external RAM(Random Access Memory) and Altera’s VIP Cores. As shown in Figure 3, the processor takes the UART input, graphing libraries and control of the video cores to perform the graphing process, data is written to RAM with functions that form the waves according to the video frame format that the frame reader can use. However, when processing multiple signals, this system presents stuttering and lost values caused by the sequential processing, rasterization and graphing of data with insufficient clock speeds required to display an optimal frame rate. This design is then used as a base to create a second test by merging it with the previous HWD design.
Figure 3. Block Diagram of the First Test
Second Test
In order to broaden the applications of the first HWD modules there needs to be an additional processing system that creates a video source to work as a background and for user interaction, on top of which the waves will be graphed. Figure 4 is an example of the video output used for the system tests, in which only the graphs are implemented with SW(1st Test) or HWD(2nd and 3rd Tests), while the picture and the labels are always drawn using SW. This background, labels and pictures are set by a system similar to the one used in the first test: a Nios II processor, VIP Cores and a user input (Touch, Keyboard or Mouse). The system uses a Frame Reader module to acquire sprites and drawings from RAM to use as buttons, text boxes, lines and pictures; these form a GUI (Graphic User Interface) with a dedicated area for wave graphing.
Figure 4. Video Output Example
Frames are then sent through a video converter or CVO (Clocked Video Output) that turns them into VGA or Digital RGB format so they can be run through the previously designed HWD modules. This results in a processing system dedicated to the GUI that is going to be updated with every interaction of the user, at very low frequencies, and a HWD system for the signal graphing that requires higher frequencies related to data acquisition and visualization. Data acquired through a UART module attached to the Nios II processor is decoded and then sent to the Graph Module where it rasterizes the data into waves to be blended with the background. The resulting system is pictured by the block diagram in Figure 5. Control signals for wave parameters like color and location are sent from the Nios II processor to the graphing module. This approach shows a performance upgrade from the previous system but the task for data acquisition and decoding is still demanding processor resources that can be outsourced.
Figure 5. Block Diagram of the Second Test
Third Test
For the final design, the UART data decoding and arranging is taken out of the processor and put into a custom HWD module, distributing the design and releasing more processing tasks out of the Nios II, as shown in Figure 6. Similar to the first HWD, this design shows a wave graph on the display if no other video has been set by the Nios II and VIP modules. It keeps the configuration ports attached to the Nios II allowing the customization of the graphs through interaction of the user, whether it is by using a touch display or through other input devices attached to the Nios II. The Nios II processor uses the VIP modules to lay a background and GUI (Graphical User Interface), resulting in a complete system for the visualization of signal graphs with the possibility for interaction and custom interface designs.
Additional development involved the use of LCDs instead of VGA monitors, which requires a different Video Frame Converter and connection of the video signals to the “Pixel Count” module, without regarding the other components of the design as RGB data is kept the same way. This serves as an example of the flexibility offered by this development, by performing equally in both Xilinx and Altera and with different display technologies with only a few modifications. These changes can be done due to a pure Verilog description of the modules without depending on device primitives or additional software.
Figure 6. Block Diagram of the Third Test
Results
Each of the three testing systems was run with different parameters, varying frequencies and all versions of the Nios II processor (Fast, Standard and Economic) that can achieve 170, 96 and 23 DMIPS (Dhrystone Million Instructions Per Second) correspondingly at 150 MHz. The results acquired from these tests can be divided in three main characteristics: Each Graph Module's implementation, Frames Per Second (FPS) and power dissipation.
Table 1 is a detailed description of how each Graph Module is implemented in the FPGA for every test. Although the resolution values and frequency are variable, the remaining values are exactly the same for every instance that is added to the design. Therefore, a system with 8 Graph modules will occupy 8 times 99 Logic Cells (LC) of combinational logic, 8 times 27 LC of register logic and 8 times 3840 bits of memory. This is the characteristic that allows for a better performance of the system, both in FPS as well as power.
Table 1. Characteristics of each graphing module
FPS are the most important unit of measure in these tests and the results shown in Figure 7 are the justification for this project. This graph shows the number of FPS that every test system can show using different parameters and with an increasing number of waves to be graphed. The first test system is run with the highest DMIPS processor, the second test is done with a Standard Nios II while the third test is run with the lowest performance processor. This however shows that while the first test is unable to keep a FPS rate of 60 Hz with more than one wave being graphed, the second test system is capable of graphing four, despite of having a lower performance Nios II and lower frequency. Furthermore, the third system is the only one capable of maintaining a stable 60 Hz FPS rate at the video output, while having the lowest performance Nios II, at 8 DMIPS (Nios II economic at 50 Hz). The same results can be achieved by the third test system using any other operational frequency for the Nios II processor, this is because the data acquisition and graphing is done independently.
Figure 7. FPS Comparison
The use of HWD modules dedicated to demanding tasks like data processing, decoding and video algorithms results in a direct power consumption reduction. These modules called “Accelerators” have the objective of releasing a task from a processor and they can be made by using tools such as C2H (C to Hardware) translators. However, the most accurate way of “Accelerating” any task is by creating a custom HWD module from scratch, although it is very time consuming for any project. The tasks chosen to be accelerated need to have a behavior that can be parallelized and successfully converted to HWD, video processing is a good example.
An approximation of the power dissipation of every HWD module can be derived from the device and described system's characteristics, such as frequency, LC and operating temperature (25°C room temperature by default), the Quartus II software from Altera gives an approximate Total Thermal Power that is a sum of the Dynamic, Static and I/O power estimated in the design . This value is the one used in this project to compare each of the evaluated systems, by taking the minimum power at which the compared system is capable of graphing one wave at 60 Hz, the results are shown in Figure 8.
Figure 8. Minimum power at 60FPS
The decrease that can be seen between the first and second tests relates only to the Graph Module as it was shown in the previous block diagrams, however, when the third system is tested at 120 MHz the power reduction relates not only to the Graph Module but also to the UART receiver and decoder that is now a HWD and is not being done by SW, thereby the system is capable of displaying a 60 HZ FPS video output while having a lower performance Nios II that is also reducing power consumption. Further reduction of power in the latter examples can be expected due to the relation between frequency and power in digital circuits, the Nios II processor frequency can be reduced, whereas the Graph module’s frequency remains as 40 MHz.
Conclusion
The final design, used as the third testing system, shows a liable behavior under demanding conditions like multiple graphs and complex data frames, making it appropriate for FPGA-based embedded systems or SoC (System on-a Chip) implementations where a high level of reliability is required in the continuity and stability of real-time multiple wave graphing (i.e Digital Oscilloscopes, Spectrum Analyzers, Vital Signs Monitors or Audio players).
The system offers ease of scalability and real-time processing while keeping a lower memory usage due to the use of raw data vectors and the re-processing of it to obtain the wave graph on every frame read, instead of storing the processed matrices or graphical objects. This is also more energy efficient despite of having to put the data through the combinational logic that determines the waves to be graphed every single time.
In regard to power consumption the module showed a noticeable upgrade from a typical application using the Nios II to write the video frames in memory, this is even more relevant when the system requires the graphing of more than one wave. Although for this kind of values the tests were limited, taking into account that between the two systems (first and third), only the final system was capable of displaying eight wave graphs while still keeping a stable FPS rate. However, compared to the second system there was always a decrease in power related to the implementation of UART data reception and decoding with a custom HWD module instead of the Avalon interface used in the previous implementations (e.g 0.64 mW for the UART receiver and decoder).
Future work on this system aims towards the standardization of the system modules by using a more complex HWD for the rasterization algorithm (e.g. Bresenham's Algorithm). As a result the Graph Module should be able to still perform real-time processing of the data points with configurable width and extend its application to dynamic X axis data. This Graph Module's visualization could then be optimized by adding a function for lines and markers, and also upgraded to be used through Avalon or Wishbone bus standards, improving its applicability in commercial FPGA systems.