

# PCI Express Demos for the ECP5™ PCI Express Board

**User Guide** 



### Introduction

This guide describes how to start using the ECP5<sup>™</sup> PCI Express Board, a low-cost platform for demonstrating the PCI Express reference design and evaluating solutions for your own application.

This guide will familiarize you with the process of setting up your PCI Express development environment. This document assumes you do not have any associated tools installed on your system.

The demos discussed in this document include the PCI Express Basic Demo, PCI Express Throughput Demo and PCI Express Scatter-Gather DMA Demos (EBR or DDR3 based).

### **Learning Objectives**

After you complete the steps in this guide, you will be able to do the following:

- Set up the ECP5 PCI Express Board properly and become familiar with its main features.
- Install all applicable development tools and the PCI Express demonstration applications.
- Establish communication with the ECP5 PCI Express Board through the PCI Express link.
- Run the PCI Express Basic demo, which allows you to run the preset LED light sequence, interactively light LED segments, and familiarize yourself with other features of the software.
- Run the PCI Express Throughput demo, which shows the performance of the Lattice PCI Express SERDES hardware and PCI Express Endpoint IP core in terms of maximum data rates for writes to and reads from your system memory.
- Run the DMA demos and observe how the Scatter-Gather DMA IP core, together with the PCI Express Endpoint IP core, transfers data between the DDR3 memory controlled by the DDR3 SDRAM Controller IP core, the Lattice FPGA, and system memory using the ECP5 PCI Express Board.
- Use what each demo teaches you about designing Lattice PCI Express solutions.
- Become familiar with an approach that will enable you to modify and rebuild the PCI Express Basic demo for your own purposes.
- Become familiar with the software development tools and major design flow steps employed in this kit.
- Use other existing documentation in conjunction with this guide.

You can obtain more detailed information on specific board features by referring to EB91, ECP5 PCI Express Board User's Guide.

In addition to reading this guide, you should visit the ECP5 PCI Express Board web page on the Lattice web site and familiarize yourself with the set of other documents related to PCI Express.

This document assumes that you have already installed Lattice Diamond<sup>®</sup> design software and are familiar with basic tasks. If not, please refer to the Diamond Help system.



### **Related Documentation**

In addition to this guide, the documents listed below provide more detailed information that is beyond the scope of this guide.

All of the following documents can be obtained on the Lattice web site:

- EB91, ECP5 PCI Express Board User's Guide: Describes board features, power requirements, device programming, clock management, and board schematics in detail.
- IPUG112, PCI Express x1/x2/x4 Endpoint IP Core User's Guide: Describes the features that the x1, x2 and x4 Endpoint IP Cores support and provides a functional description of the IPs, parameters, signals, port lists, timing diagrams, memory maps, and step-by-step procedures for creating the core in Clarity Designer.
- UG06, PCI Express Scatter-Gather DMA Demo Verilog Source Code User's Guide: Provides details of the Verilog code used for the Lattice PCI Express Scatter-Gather DMA demo application.
- UG07, PCI Express Throughput Demo Verilog Source Code User's Guide: Provides details of the Verilog code used for the Lattice PCI Express Throughput Demo (also known as the Stored FIFO Interface (SFIF) Demo).
- UG15, PCI Express Basic Demo Verilog Source Code User's Guide: Provides details of the Verilog code used for the Lattice PCI Express Basic demo, a block diagram of the design, and descriptions of design modules. Instructions for building the demo design in Diamond or ispLEVER are also included.
- ECP5 PCI Express Board web page: Visit this web page on the Lattice web site for updates on this and other related documents. You can download kit installation files from this page.

### **Hardware Requirements**

To install the kit design and run the demo software, a single computer with a PCI Express x16, x8, x4, or x1 slot is required. You must also have a powered USB port. All of the other hardware and drivers are included in the kit.

Note: Up to 4 GB of memory (system RAM) is supported for 64-bit systems.

### **Software Requirements**

Please be aware of the following software requirements to ensure you obtain the expected results for the procedures described in this guide:

- For Windows, the Lattice PCI Express IP demo is compatible with Microsoft Windows 7/Vista/XP/2000, or Windows Server 2003 32-bit platforms.
- The Verilog HDL demo design projects in this kit are built with Diamond.
- If you are using Lattice Diamond Programmer software for your FPGA configurations, ensure you are using the latest version for proper bitstream downloading results. You can obtain the standalone programmer software on the Programmer and Deployment Tool Software web page.
- To develop PCI Express designs, your computer must meet minimum system requirements as described in the Diamond 3.2 Installation Notice for Windows.



### Installing the ECP5 PCI Express Board PCI Express Demos

This section provides Windows installation instructions for the ECP5 PCI Express demo files and demo applications.

To install the ECP5 PCI Express Board demos in Windows:

- 1. Using a web browser, go to the ECP5 PCI Express Board Setup (Windows) page, and download the Windows setup.exe file: **DK-ECP5-PCIE-setup.exe**.
- 2. Double-click on the **DK-ECP5-PCIE-setup.exe** file.
- 3. If the Install Program as Other User dialog appears, choose to install as the current user if you have Administrator privileges, or select another user with those privileges, and click OK.
- 4. Click Next to start the installation. You must have administrative privileges to install the kit.
- 5. Click Yes to accept the license agreement.
- 6. Click Next to install the kit in the default C:\Lattice\_DevKits location on your hard drive, or use the Browse button to choose a different location.
- 7. If desired, click **Yes** to accept the prompt to create shortcut desktop icons for the kit demo applications.
- 8. Click **Finish** to complete the kit installation. Figure 1 shows the directory structure of the installed kit.

Figure 1. Installed ECP5 PCI Express Board PCI Express Development Kit Directory Structure (Windows)



Figure 1 shows the default installation path for Windows. Please note that whenever the kit directory or <kit\_dir> is referred to in this document, it refers to the <install\_dir>\DK-ECP5-PCIE-XXX\ file path, where the default <install\_dir> path is C:\Lattice\_Devkits and XXX is the kit revision number.

After you install the development kit, you will also see Start Menu shortcuts for running demo applications if you opted to create them during the installation. Please see the following sections for further sequential instructions.



### **Hardware Installation**

The procedures in this section provide step-by-step instructions for installing hardware and drivers to ensure proper board and PC communication and operation.

After board setup, you can install the hardware. You must have administrative privileges on Windows to perform this installation.

Caution: Lattice is not liable for any loss of data or damages that may result from the installation of the hardware and execution of the kit demo software tools. Do not install and operate on mission-critical systems.

To install the ECP5 PCI Express Board for Windows:

- 1. Shut down Windows, turn off the PC and unplug the power cord.
  - IMPORTANT: This step is necessary because PC power supplies have standby voltages that are present even when the PC power light and fan are turned off. Unplugging the PC is the safest way to ensure the board will not be "hot-swapped".
- 2. Locate an available PCI Express slot. The board can be installed in any slot (x1, x4, or x16) that is at least as wide as the edge fingers in use.
- 3. Ensure that the board is not connected to any external power supply before proceeding.
- 4. Using ESD precautions, install the ECP5 PCI Express Board in the PCI Express slot in the x1 position.
- 5. Power-on the PC and observe that it boots normally to the Windows login screen. If anything abnormal occurs, refer to Appendix A. Troubleshooting.
- 6. Log in as a user with administrative privileges. During the login process Windows will detect the new hardware and ask if you want to install it.

#### **Installing Drivers**

This section describes installation of the ECP5 PCI Express Board device driver software on a Windows PC. This procedure pertains specifically to the PCI Express Basic demo application. For the PCI Express Throughput and DMA demos, you will need to load the appropriate drivers from the respective demo folders.

Also note that this procedure describes installation on the Windows 7 and Windows XP platforms only. This may vary slightly on Windows 2000 or Windows Server 2003.

Note: The Found New Hardware popup dialog in Windows appears when the PC is first booted with the board installed. If this screen does not appear, the board was not properly detected by the PC BIOS or by Windows. Refer to Appendix A. Troubleshooting for more information.

To install the ECP5 PCI Express Board drivers on Windows 7:

- 1. Go to Computer > Properties > Device Manager. Select the PCI Express device under the Other Devices category.
- 2. Right-click and choose Update Driver Software.
- 3. Use the Browse button to navigate to where you have installed the demo package, locate the Demonstration\PCIeBasic directory path on the top level of the kit directory, and select the Driver folder.
  - Note: For the PCI Express Throughput and DMA demos used in this kit, you must install the appropriate drivers located in similar directory paths in the PCIeThruput and PCIeDMA folders, respectively.
- 4. Click Next. Allow the software to install by selecting Install on the pop-up window. Windows copies the driver files and displays a screen indicating this. Upon completion, a capital "I" representing initialization will be displayed on the 14-segment LED.



To verify proper driver installation and device recognition on Windows:

- 1. Go back to **Device Manager**. The board (**LSC\_PCIexpress**) should be in the list of hardware devices in your system.
- 2. Right-click on the Lattice evaluation board icon and select **Properties** to show the resources assigned to the device and the driver information.

Memory ranges corresponding to the configured BAR registers will be assigned to the board. If this is all present, then the demo program is able to run and access the hardware on the board.

To install the ECP5 PCI Express Board drivers on Windows XP:

- 1. In the Found New Hardware popup dialog, choose the Install from specific location option and click Next.
- 2. Use the **Browse** button to navigate to where you installed the demo package, locate the **Demonstration\PCIeBasic** directory path on the top level of the kit directory, and select the **Driver** folder.
  - Note: For the PCI Express Throughput and DMA demos used in this kit, you must install the appropriate drivers located in similar directory paths in the PCIeThruput and PCIeDMA folders, respectively.
- 3. Click **Next**. Windows now copies the driver files and will display a screen indicating this. Upon completion, a capital "I" representing initialization will be displayed on the 14-segment LED.

To verify proper driver installation and device recognition on Windows:

- 1. Right-click on the **My Computer** icon on your Windows desktop and choose **Properties** from the popup menu. Confirm the installation and verify that Windows properly detects the ECP5 PCI Express Board hardware.
- 2. In the System Properties dialog, choose the **Hardware** tab and click the **Device Manager** button. The board (LSC\_PCIexpress) should be in the list of hardware devices in your system.
- 3. Right-click on the **Lattice evaluation board icon** and select **Properties** to show the resources assigned to the device and the driver information.

Memory ranges corresponding to the configured BAR registers will be assigned to the board. If this is all present, then the demo program is able to run and access the hardware on the board.

#### **Installing Hardware into a Different Slot**

Windows identifies PCI/PCI Express hardware devices using the bus, slot, vendor ID, and device ID fields. If you install the board into a different slot, the slot number will change. This will cause Windows to display the Found New Hardware popup screen when the system powers up.

The full driver installation procedure described above is unnecessary because the driver has already been installed. If the board is installed in a new slot, simply allow Windows to search for the driver, or choose the Install the Software Automatically (Recommended) option. Windows will then associate the newly created device registry tag (bus, slot, vendor ID, and device ID) with the lscpcie.sys driver, and the demo GUI will work with the board in the new slot.
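
The slot behavior described above can be pictured with a small model. This is only an illustrative sketch, not how Windows is implemented: the `DeviceKey` type, the `needs_new_hardware_prompt` helper, and all numeric values are hypothetical placeholders chosen for the example.

```python
from typing import NamedTuple


class DeviceKey(NamedTuple):
    """The four fields this guide says Windows uses to identify a device."""
    bus: int
    slot: int
    vendor_id: int
    device_id: int


def needs_new_hardware_prompt(known, key):
    """Return True when this exact key has not been seen before."""
    return key not in known


# Placeholder numbers for illustration only; they are not the board's real IDs.
known_devices = set()
original_slot = DeviceKey(bus=1, slot=0, vendor_id=0x1234, device_id=0x5678)
known_devices.add(original_slot)  # driver installed for the first slot

# Moving the board changes only the slot field, which yields a new key.
moved = original_slot._replace(slot=2)
prompt_after_move = needs_new_hardware_prompt(known_devices, moved)

# After the automatic install, the new key is known as well.
known_devices.add(moved)
prompt_after_install = needs_new_hardware_prompt(known_devices, moved)
```

In this model the vendor and device IDs never change, yet the moved board still looks new because the slot field is part of the key.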

Now that your board is set up and hardware is installed on your computer, you can proceed on to the next section that describes software installation, execution, and tasks to complete the demo.



### **Verifying Correct Board Operation**

This section lists checks you should make to ensure proper functioning of the board. Also refer to related documentation on this board described in EB91, ECP5 PCI Express Board User's Guide.

There are eight User-defined LEDs provided on the ECP5 PCI Express Board. Table 1 lists the definitions of the eight LEDs used in the PCI Express Basic demo 14-segment Display Control. LED1 and LED2 are the two status LED lights (DL\_UP and L0) that will go through a light sequence when the device is first powered on. To verify the PCI Express link is functioning properly, examine these indicators at the time of powering up. The PCI Express demonstration software used later in the kit verifies board operation. In addition, you can also check that the status LED lights are functioning at normal conditions in the sections below.

Note: All boards leave the manufacturer fully tested. See EB91, ECP5 PCI Express Board User's Guide for details.

#### **LED Definitions**

The User-defined LEDs on the ECP5 PCI Express Board are located horizontally along the bottom edge, middle portion of the board.

The LEDs are in the following order and have the following functions, as shown in Table 1.

Table 1. PCI Express Basic Demo LED Definitions

| LED Number | FPGA Ball<br>Number | LED Color | Definition                                                                                                     |
|------------|---------------------|-----------|----------------------------------------------------------------------------------------------------------------|
| LED1       | AM28                | Red       | DL_UP, Data Link Up, ready for packets at Transaction Layer. LED1 is lit when the PCI enumeration is done.     |
| LED2       | AL28                | Red       | L0 state active (training sequence completed; PHY Layer up and ready for flow control).                        |
| LED3       | AM29                | Red       | 14-segment Display 'a' segment                                                                                 |
| LED4       | AK28                | Red       | 14-segment Display 'b' segment                                                                                 |
| LED5       | AK32                | Red       | 14-segment Display 'c' segment                                                                                 |
| LED6       | AM30                | Red       | 14-segment Display 'd' segment                                                                                 |
| LED7       | AJ32                | Red       | 14-segment Display 'e' segment                                                                                 |
| LED8       | AL30                | Red       | 14-segment Display 'f' segment                                                                                 |



### **Running the PCI Express Basic Demo**

Once you have installed your ECP5 PCI Express Board in your computer and installed all necessary software, you can run the PCI Express Basic demo, which consists of hardware, IP and software. This part of the document describes what you need to know to get started and successfully complete this demo.

### **Before You Begin**

Before beginning this demo, you must do the following:

• Use Diamond Programmer to download the bitstream for this demo to the board.

You can find the x1 bitstream and the XCF file necessary for Diamond Programmer in the <kit\_dir>\Demonstration\PCIeBasic\Bitstreams directory path.

For general information on ispDOWNLOAD cable and Diamond Programmer software usage, see EB91, ECP5 PCI Express Board User's Guide.

• Install the board drivers for the application.

You can find the driver files necessary for proper demo installation in the <kit\_dir>\Demonstration\PCIeBasic\Driver directory path. See the procedure described in the Installing Drivers section of this document.
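
As a convenience, the path conventions used in this guide can be written down in a few lines. The sketch below is illustrative only: `kit_dir` and `driver_dir` are hypothetical helper names, the revision string is left as the `XXX` placeholder used in this guide, and the `Driver` subfolder for the Throughput and DMA demos is an assumption based on the note that they use "similar directory paths".

```python
from pathlib import PureWindowsPath

# Default install location quoted in this guide.
DEFAULT_INSTALL_DIR = r"C:\Lattice_DevKits"

# Demo folder names mentioned in this guide.
DEMO_FOLDERS = ("PCIeBasic", "PCIeThruput", "PCIeDMA")


def kit_dir(revision, install_dir=DEFAULT_INSTALL_DIR):
    """Return <install_dir>/DK-ECP5-PCIE-<revision> as a Windows path."""
    return PureWindowsPath(install_dir) / f"DK-ECP5-PCIE-{revision}"


def driver_dir(revision, demo="PCIeBasic", install_dir=DEFAULT_INSTALL_DIR):
    """Return the Driver folder for a demo (assumed layout for non-Basic demos)."""
    if demo not in DEMO_FOLDERS:
        raise ValueError(f"unknown demo folder: {demo}")
    return kit_dir(revision, install_dir) / "Demonstration" / demo / "Driver"


# "XXX" stands for the kit revision number, which this guide does not specify.
basic_driver_path = str(driver_dir("XXX"))
```

`PureWindowsPath` is used so the paths are formatted with backslashes even when the script runs on a non-Windows machine.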

#### Resource References

Please be aware of supplementary companion documentation when using this demo.

#### **Hardware Resources**

The PCI Express Basic Demo x1 bitstream is built from the Diamond project located in the Hardware\PCIe\_x1\ECP5\_PCIeBasic\Implementation\ecp5-85F\_PCIeBasic directory. The Verilog source code is located in the project Source\ directory.

The Verilog design architecture is explained in UG15, Lattice PCI Express Basic Demo Verilog Source Code User's Guide. This document describes the purpose and functionality of the Verilog modules used in the PCIe Basic Demo design.

#### **Software Resources**

The PCI Express Basic demo uses the lscpcie2.sys device driver. The source code for this device driver is located in Software\lscpcie2\_Win7\Driver or Software\lscpcie2\_Win2kXP\drvr. The architecture of the lscpcie2 device driver is explained in the lscpcie2 Driver Reference Manual, which can be accessed through the Software\PCIeDocIndex.html link.

The PCI Express Basic demo application source code is located in Software\PCIeBasic\_Win2kXP\BasicGUI\DemoUI. This directory contains the Java project source code to create the user interface. The GUI also uses the PCIeAPI\_Lib\_Win2kXP API library. Both the GUI source code and the API library are common to Windows 7 and Windows XP.

The architecture of the PCIe Basic Demo application is explained in the PCIe Basic Demo Reference Manual and the PCIe API Reference Manual which can be accessed through the Software\PCIeDocIndex.html link.



### **Basic Demo Operations Overview**

The PCI Express Basic demo shows the capabilities of the Lattice FPGA and the PCI Express Endpoint IP core functioning in a PCI Express slot in a Windows PC. The demo is easy to use and requires no test equipment.

This demo software allows you to access memory and registers on the board and provides real-time interaction with the ECP5 PCI Express Board hardware to demonstrate a functional PCI Express communications path between the application and driver software (running on the PC CPU) and the FPGA IP. Device driver and application source code are available so you can modify and extend the behavior of the tests or use them as a starting point for new PCIe designs.

If you experience any problems running this demo, please refer to Appendix A. Troubleshooting.

### **Running the PCI Express Basic Demo Software**

This section describes how to run the PCI Express Basic demo software after installation. You can access the PCI Express Basic demo software from the Windows Start Menu.

To run the PCI Express Basic demo software from your PC:

1. Go to the Demonstration > PCIeBasic directory and run PCIeBasic.bat.

The graphical user interface opens the PCI Express Basic demo software with the Device Info tab activated as shown in Figure 2.

Figure 2. PCI Express Demo Device Info Page



The Device Info page displays information about the device driver and the device's PCI configuration registers. The data displayed is for informational purposes only and cannot be edited. Descriptions of all of the information you can view in this page are available in the Touring the PCI Express Basic Demo Interface section of this document.



### **Touring the PCI Express Basic Demo Interface**

This section describes the pages and features of the PCI Express Basic demo software interface.

1. In the Device Info page, click on the **Device Info** sub tabs and observe the structure of the information that is displayed in each. Table 2 describes the information available for viewing by clicking each of the sub tabs at the bottom of the dialog.

Table 2. Device Info Page Sub Tab Descriptions

| Sub Tab Page      | Information Description                                                                                                                                                                           |
|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Driver Info       | Obtained from the board PCI Config space registers by the Lattice PCI Express driver when the demo is started. Displays Windows resources assigned to the device driver to access the board.      |
| Config Regs       | Displays the standard PCI Config type 0 registers with each field annotated. Displaying this page causes the driver to issue PCI Config Type 0 read requests and re-displays the register values. |
| Capabilities Regs | Displays the PCI Express capabilities structures that are found in the register range 0x40 to 0xff. The applicable bit fields of the registers are parsed and displayed in readable format.       |
| Extended Regs     | Displays PCI Express extended configuration registers which are not used in this demo. These are inaccessible through the PC.                                                                     |
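
To make the Capabilities Regs row more concrete, the following sketch walks the standard PCI capability linked list in a 256-byte configuration space image. It relies only on the standard Type 0 header layout (Status register at 0x06, Capabilities Pointer at 0x34, each capability starting with an ID byte followed by a next-pointer byte). The function name and the synthetic image are hypothetical; this is not the demo GUI's actual code, and the image does not represent the board's real register contents.

```python
STATUS_OFFSET = 0x06       # Status register; bit 4 means a capability list exists
CAP_POINTER_OFFSET = 0x34  # Capabilities Pointer in the Type 0 header
CAP_ID_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express"}


def walk_capabilities(config):
    """Return (offset, capability_id) pairs found in a config space image."""
    if len(config) < 256:
        raise ValueError("expected at least 256 bytes of config space")
    status = int.from_bytes(config[STATUS_OFFSET:STATUS_OFFSET + 2], "little")
    if not status & 0x10:
        return []
    found = []
    seen = set()
    # The two low bits of each pointer are reserved, so mask them off.
    offset = config[CAP_POINTER_OFFSET] & 0xFC
    while offset >= 0x40 and offset not in seen:
        seen.add(offset)  # guards against a looping list
        found.append((offset, config[offset]))
        offset = config[offset + 1] & 0xFC
    return found


# Synthetic image for illustration: two capabilities at made-up offsets.
image = bytearray(256)
image[STATUS_OFFSET] = 0x10
image[CAP_POINTER_OFFSET] = 0x50
image[0x50:0x52] = bytes([0x01, 0x70])  # Power Management, next at 0x70
image[0x70:0x72] = bytes([0x10, 0x00])  # PCI Express, end of list

capabilities = walk_capabilities(image)
names = [CAP_ID_NAMES.get(cap_id, "unknown") for _, cap_id in capabilities]
```

A next pointer of zero ends the list, which is why the walk stops once the offset falls below 0x40.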

2. Click on the 14 Seg tab to see the contents of the 14 Segment Control page. In this page, you will be running a demonstration LED sequence and controlling the display on your board from this console. See Figure 3.

The 14-Segment Control page provides a way to interactively light segments on the display. You can preset character sequences from this page or select single characters and run them to light the display.

The states of the LED segments are converted to a 14-bit word value (each segment is controlled by a bit) and written to the LED control register in the GPIO portion of the IP in the FPGA. This demonstrates a memory write across the PCI Express bus.

3. In the 14 Segment Control page, click the **Run** button. Watch the board as the LED segments light in a set pattern on the display.

The 14-segment display has two test modes. In the first mode, demonstrated here, a preset sequence of segments is lit and characters are written to the display.

This LED sequence takes approximately 30 seconds to complete. Observe the 14-segment display to confirm that it is operating correctly. The correct sequence is:

- a. Light all segments, one at a time, around the perimeter.
- b. Light all inner segments in a clockwise order.
- c. Turn off all inner segments in reverse order.
- d. Turn off all outer segments in reverse order.
- e. Write the characters "LATTICE\*" one at a time to the display.
- f. The "\*" will be displayed when the test ends.



Figure 3. PCI Express Basic Demo 14 Segment Control Page



See Table 3 for details about features on the 14-Segment Control page.

Table 3. 14 Segment Control Page Features

| Feature     | Description                                                                                          |
|-------------|------------------------------------------------------------------------------------------------------|
| LED Display | Allows you to interactively change the LED display using mouse clicks to toggle segments on and off. |
| RUN         | Starts an LED light sequence or command operation.                                                   |
| SET         | Sets a user-defined LED light command operation based on input characters in the text box.           |
| CLEAR       | Turns off all segments in the display.                                                               |

4. Click on any segment in the interactive segment display in the 14 Segment Control page. Notice that any selection will immediately cause the corresponding segment on the LED to light on your board's LED display.

Clicking on a segment will turn it on or off (toggles). The 14-bit value written to the LED register in the FPGA is shown on the bottom left.
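
The mapping from segment states to a 14-bit register value can be sketched as follows. The segment names and the bit order (bit 0 for segment 'a', and so on) are assumptions made for this example; the actual bit assignment is defined by the demo's Verilog source, not by this sketch.

```python
# Assumed segment names and bit order, for illustration only.
SEGMENTS = ("a", "b", "c", "d", "e", "f", "g1", "g2", "h", "i", "j", "k", "l", "m")
SEGMENT_BIT = {name: 1 << index for index, name in enumerate(SEGMENTS)}
WORD_MASK = (1 << len(SEGMENTS)) - 1  # 14 bits set: 0x3FFF


def segments_to_word(lit_segments):
    """Combine the lit segments into one 14-bit word, one bit per segment."""
    word = 0
    for name in lit_segments:
        word |= SEGMENT_BIT[name]
    return word


def toggle_segment(word, name):
    """Flip one segment's bit, as a mouse click on that segment would."""
    return (word ^ SEGMENT_BIT[name]) & WORD_MASK


perimeter = segments_to_word(["a", "b", "c", "d", "e", "f"])
all_on = segments_to_word(SEGMENTS)
after_click = toggle_segment(perimeter, "a")
```

Using XOR for the toggle means the same operation turns a segment on when it is off and off when it is on.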

5. Type any character in the text box and click the Set button. The character will be configured in the display.

This second mode of operation allows a single character to be sent to the display. Any printable ASCII character can be displayed (lower case is displayed as upper case). You cannot write a blank character using Set.
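
The input rules for Set can be summarized in a few lines. This is a sketch of the rules as stated in this guide, not the GUI's actual validation code; `normalize_display_char` is a hypothetical name.

```python
def normalize_display_char(text):
    """Return the character Set would display, following the rules above."""
    if len(text) != 1:
        raise ValueError("Set takes exactly one character")
    if not text.isascii() or not text.isprintable():
        raise ValueError("only printable ASCII characters can be displayed")
    if text == " ":
        raise ValueError("a blank character cannot be written with Set")
    # Lower case is displayed as upper case.
    return text.upper()


shown = [normalize_display_char(ch) for ch in "lattice*"]
```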

6. Click the **Clear** button. This turns off all segments of the LED display. Right-clicking on the background area behind the segments will clear the entire display.

The interactive 14 Segment Control page demonstration you just performed illustrates that a functional PCI Express communications path exists between the application and driver software that is running on the CPU and the FPGA IP.

7. Click on the **Memory** tab to open the Memory page. The Memory tab has various memory access tests that can be run to show that the IP is accessible from host software via the PCI Express bus. See Figure 4.

The page contains text boxes for entering data to be sent to device registers in the FPGA design. These text boxes are color coded to indicate the data format they accept. See Table 4 for details about these codes.



Table 4. Memory Page Text Box Color Codes

| Color Code | Description                                                                                              |
|------------|----------------------------------------------------------------------------------------------------------|
| Green      | Indicates hex value fields. Do not include any prefixes (0x) or suffixes (H). Only hex digits are allowed. |
| Yellow     | Indicates character string fields, e.g., ones containing file names, paths, or letter values.            |
| Blue       | Indicates decimal (base 10) value fields.                                                                |
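
If you prepare values for the green (hex) fields in a script, a check like the one below can catch prefixes and suffixes early. It is a sketch under the rules in Table 4; `parse_hex_field` is a hypothetical helper, and the GUI's own validation may differ.

```python
import re

# Bare hex digits only: no 0x prefix, no H suffix, no whitespace.
_HEX_FIELD = re.compile(r"[0-9A-Fa-f]+")


def parse_hex_field(text):
    """Return the integer value of a bare hex string, or raise ValueError."""
    if not _HEX_FIELD.fullmatch(text):
        raise ValueError(f"not a bare hex value: {text!r}")
    return int(text, 16)


offset = parse_hex_field("1F0")
```

The explicit pattern check matters because Python's `int(text, 16)` on its own accepts a `0x` prefix, which the hex fields do not.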

The Memory page features allow you to test access to the 16 KB of EBR internal to the FPGA. Accesses are done on a byte basis. All 16 KB of memory locations are accessed successively, testing the PCI Express link to the memory interface. See Table 5 for a list of the actions that can be performed in this page.

Figure 4. PCI Express Basic Demo Memory Page





Table 5 provides descriptions of the Memory page features.

Table 5. Memory Page Features

| Feature       | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Pattern Tests | Pressing <b>Run</b> starts a test to check that all locations of the EBR can be read and written and that the contents are correct. First, all 16 KB are cleared to 0 and verified. Then various patterns (AA, 55, 01, FF) are written to all locations and verified. If everything passes, PASS is displayed. If a memory location has an incorrect value the test aborts and displays ERRORS! The memory contents are left with an incrementing pattern 00 01 02 that is displayed when the test successfully finishes. |
| READ          | The contents of the EBR memory are read from the value entered in the offset field. 256 bytes are read and displayed in the list box above.                                                                                                                                                                                                                                                                                                                                                                               |
| CLEAR         | Sets all 16 KB to 0.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| FILL          | Writes the byte value entered in the field to all 16 KB locations. |
| LOAD          | Loads 16 KB of binary data from the file specified (or as much data as is in the file) into EBR memory, starting at location 0. This can be used to load a known pattern into the EBR memory by using a file created by another tool.                                                                                                                                                                                                                                                                                     |
| SAVE          | Writes all 16 KB of EBR memory to the file specified. This can be used to save the contents of EBR memory for off-line processing (i.e., to verify that the pattern loaded in with LOAD is correctly saved in the EBR).                                                                                                                                                                                                                                                                                                   |
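
The Pattern Tests sequence in Table 5 can be modeled in software against an ordinary byte buffer. This is a simplified sketch, not the demo's source code: a `bytearray` stands in for the EBR, the `StuckBit` class is a made-up fault used to show the failure path, and wrapping the final incrementing pattern at 256 is an assumption based on the byte-wide accesses.

```python
EBR_SIZE = 16 * 1024
PATTERNS = (0xAA, 0x55, 0x01, 0xFF)


def fill_and_verify(mem, value):
    """Write value to every byte, then read every byte back and compare."""
    for addr in range(len(mem)):
        mem[addr] = value
    return all(mem[addr] == value for addr in range(len(mem)))


def run_pattern_test(mem):
    """Clear, run each pattern, and leave an incrementing pattern on success."""
    for value in (0x00,) + PATTERNS:
        if not fill_and_verify(mem, value):
            return "ERRORS!"
    for addr in range(len(mem)):
        mem[addr] = addr & 0xFF  # 00 01 02 ... wrapping every 256 bytes (assumed)
    return "PASS"


class StuckBit(bytearray):
    """Made-up faulty memory: bit 0 of address 5 always reads back as 1."""

    def __getitem__(self, index):
        value = super().__getitem__(index)
        return (value | 0x01) if index == 5 else value


good = bytearray(EBR_SIZE)
good_result = run_pattern_test(good)
bad_result = run_pattern_test(StuckBit(EBR_SIZE))
```

The faulty buffer fails during the initial clear-and-verify pass, so the test stops before any other pattern is written.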

8. Click on the **Counter** tab to open the Counter page. The Counter page allows you to control a 32-bit down counter in the FPGA hardware. The page is illustrated in Figure 5. Table 6 provides descriptions of the page's features.

The counter is driven by the 125 MHz clock that feeds the IP. The counter is started by selecting the **Start** radio button. Counting begins from the value entered into the Reload Value field. The current count value is displayed in the Current Count field.
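
As a rough worked example of the timing, a counter clocked at 125 MHz sees one clock edge every 8 ns, so the time to count from the reload value down to zero follows from the clock rate. The Python sketch below only does this arithmetic. The function name is hypothetical, and the sketch assumes the counter decrements by one on every clock cycle, which this guide implies but does not state.

```python
CLOCK_HZ = 125_000_000   # 125 MHz clock that feeds the IP
COUNTER_BITS = 32        # width of the down counter


def countdown_seconds(reload_value):
    """Seconds for the counter to run from reload_value down to zero.

    Assumes one decrement per clock cycle (an assumption of this sketch).
    """
    if not 0 <= reload_value < 2 ** COUNTER_BITS:
        raise ValueError("reload value must fit in 32 bits")
    return reload_value / CLOCK_HZ


if __name__ == "__main__":
    for value in (125_000_000, 0xFFFFFFFF):
        print(f"reload 0x{value:08X}: {countdown_seconds(value):.3f} s")
```

Under that assumption, a reload value of 125,000,000 corresponds to one second, and the largest 32-bit value to roughly 34.4 seconds.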

Figure 5. PCI Express Basic Demo Counter Page





Table 6 provides descriptions of all of the Counter page features.

Table 6. Counter Page Features

| Feature       | Description                                                                                                                                                                                                                                           |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| START/STOP    | Starts and stops the 32-bit down counter in the FPGA hardware.                                                                                                                                                                                        |
| Current Count | Displays the current count value.                                                                                                                                                                                                                     |
| Reload Value  | Sets the number from which counting begins.                                                                                                                                                                                                           |
| DIP Switch    | The DIP switch section shows that user changes to the switches on the board are seen by the application software on the PC. The GUI polls the DIP switch register 10 times per second and displays the value read from the 8-bit DIP switch register. |
| Get button    | Used to immediately update the value. This is active if <b>No Polling</b> was selected from the Settings drop-down menu.                                                                                                                              |

9. Finally, click on the **Rd/Wr tab** to open the Read/Write page. The Read/Write page is used for reading and writing registers and EBR memory values in the application IP. Refer to Figure 6.

Figure 6. PCI Express Basic Demo Read/Write Page



The Read/Write page is primarily used for debugging and diagnosing the application IP registers. Table 7 provides descriptions of all of the Read/Write page features.

Table 7. Read/Write Page Features

| Feature         | Description                                                       |
|-----------------|-------------------------------------------------------------------|
| Memory Space    | Indicates the base address register (BAR) memory space to access. |
| Data Size       | Indicates bit size. Options are 8-bit, 16-bit, and 32-bit.        |
| Memory Contents | Displays memory contents.                                         |
| READ            | Starts a read data access based on offset and length settings.    |
| WRITE           | Starts a write data access based on offset and data settings.     |



Data accesses can be specified as byte, short, or word operations (8, 16, or 32 bits) by selecting Data Size. Access is done to the selected Base Address Register (BAR). The memory contents are displayed in the window. In the address, the upper nibble (31:28) specifies the BAR being accessed. The following example shows reading the EBR memory (BAR 1, starting at offset 0x1000) in the application IP and displaying it in word format.

Data can be written to registers using the WRITE button. Specify the BAR Offset to start writing at and the hex data in the Data field. Separate each value with a space. Data size should match the Data Size selected at the top of the page in Memory Settings.
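
The address and data conventions described above can be sketched in a few lines of Python. This is an illustration only, not code from the demo application: both function names are hypothetical, and treating the lower 28 bits as the offset within the BAR is an assumption that follows from the upper nibble selecting the BAR.

```python
def split_address(address):
    """Split a 32-bit address into (BAR number, offset).

    Bits 31:28 select the BAR. The remaining 28 bits are treated as the
    offset within that BAR (the offset width is an assumption here).
    """
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return address >> 28, address & 0x0FFFFFFF


def parse_write_data(field, data_size_bits):
    """Parse space-separated hex values and check each fits the Data Size."""
    if data_size_bits not in (8, 16, 32):
        raise ValueError("Data Size must be 8, 16, or 32 bits")
    limit = 1 << data_size_bits
    values = [int(token, 16) for token in field.split()]
    for v in values:
        if not 0 <= v < limit:
            raise ValueError(f"{v:#x} does not fit in {data_size_bits} bits")
    return values
```

For instance, `split_address(0x10001000)` separates the address into BAR 1 and offset 0x1000, matching the EBR read described above, and `parse_write_data("DE AD BE EF", 8)` accepts four byte-sized values.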

### Rebuilding the PCI Express Basic Demo Design

You can rebuild the PCI Express Basic demo IP reference design by running the source HDL design files through a design flow in the Diamond software. All source HDL files and necessary project files are included in the kit installation. This document assumes that you have already installed Diamond and are familiar with basic tasks. Refer to Figure 1 to understand where various files referenced in this section are located.

We recommend that you copy the files from the installation location to a new working location. This allows you to quickly move back to the original configuration without re-installing this kit.

#### Implementing the PCI Express Basic Demo Design

The top.ldf Diamond project file is included in this kit. This file contains information regarding options to use when implementing the demo design. The top.lpf logical preference file specifies timing constraints and ECP5 I/O pin assignments with respect to the ECP5 PCI Express Board. The working directory is the implementation directory.

To implement the demo design using the HDL source flow:

- 1. Open Diamond.
- 2. Click File > Open Project.
- 3. In the Open Project dialog, navigate to and select the top.ldf file in the <kit\_dir>\Hardware\PCIe\_x1\ECP5\_PCIeBasic\Implementation\ecp5-85F\_PCIeBasic directory path.
- 4. Click Open. All of the Verilog HDL files are imported into the project.
- 5. Choose **Project > Active Strategy > Translate Design Settings**. Verify that Macro Search Path is set to the directory path **.../Source/ipexpress/ecp5/pciex1** for Windows. Click **OK**.
- 6. In the File List pane of Diamond, right-click the device name at the top of the list, and choose **Properties** from the drop-down menu.
- 7. In the Project Properties dialog box, make sure that the following properties are selected: **ECP5UM** family, **LFE5UM-85F** device, **-8** speed grade, and a **CABGA756** package. Click **OK**.
- 8. In the Process pane of Diamond, double-click **Bitstream File**.



### Modifying the PCI Express Basic Demo Design

This section walks through a very simple alteration to the HDL that changes the behavior of the LEDs in the demo. It involves a small change to the HDL code in one source file.

Note: Since the source is being changed, the resulting netlist may be different and the provided start point for Place & Route may no longer produce a design that meets timing. Running more iterations to achieve timing may be required. See the Place & Route Properties.

To modify the PCI Express Basic demo design:

- 1. Open the **top\_basic.v** file with an ASCII editing tool or the internal ASCII editing tool in Diamond. This file is located in the **<kit\_dir>\Hardware\PCIe\_x1\ECP5\_PCIeBasic\Source\ecp5** path, where **<kit\_dir>** represents the path **<install\_dir>\DK-ECP5-PCIE-XXX** and XXX is the kit revision number.
- 2. On or about line 390 as shown below, delete the tilde character (~) that appears before the (led\_out\_int) parameter.

```
led_out <= ~(led_out_int);
```

3. After making this small change, click **File > Save** and close the editor.

This modification to the code will cause the 14-segment LED to operate in reverse, that is, all of the lights will be on when the demo starts instead of off.

- 4. Open the top.ldf project file in Diamond. This file is located in the Hardware\PCIe\_x1\ECP5\_PCIeBasic\Implementation\ecp5um-85F\_PCIeBasic folder.
- 5. Double-click the **Bitstream File** process in the Processes window to generate a top.bit file in the directory <kit\_dir>\Hardware\Implementation.
- 6. Start Programmer and perform the steps described in the Programming Serial SPI Flash Memory section of EB91, ECP5 PCI Express Board User's Guide to download the new bitstream to the board's SPI flash memory.
- 7. Push the **PROGRAM** button on the board to program your device from SPI Flash memory.
- 8. Reboot the PC so that the BIOS recognizes the new PCI Express endpoint device configuration.
- 9. Verify that the status LEDs are correct and note that all the LED segments are now on.
- 10. Rerun the LED test described in the Touring the PCI Express Basic Demo Interface section of this document. Notice that the state of the 14-segment LED on the board is the inverse of what is displayed in the GUI.
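
The effect of removing the tilde in step 2 can be modeled on the host. The Python sketch below assumes a 14-bit `led_out_int` value and treats the inversion as a plain bitwise NOT over those 14 bits. The width, the mask, and the function name are assumptions made for illustration and are not taken from top\_basic.v.

```python
SEGMENTS = 14                 # assumed width, one bit per LED segment
MASK = (1 << SEGMENTS) - 1    # 0x3FFF


def led_out(led_out_int, inverted):
    """Value driven on led_out for a given internal value.

    inverted=True models the original line:  led_out <= ~(led_out_int);
    inverted=False models the modified line: led_out <= (led_out_int);
    """
    value = led_out_int & MASK
    return (~value) & MASK if inverted else value
```

For any internal value, the two variants drive complementary patterns, which is why the board shows the inverse of what the GUI displays after the change.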

You have now completed the Lattice PCI Express Basic demo and have successfully completed all of the learning objectives of this kit.



# **Running the PCI Express Throughput Demo**

This chapter describes the Lattice PCI Express Throughput demo that you can run within this kit on a Windows system (Windows 7, Windows XP, Windows 2000, Server 2003).

### **Before You Begin**

Before beginning this demo, you must do the following operations.

- Use Diamond Programmer to download the bitstream for this demo to the ECP5 PCI Express Board.

  You can find the x1 bitstream and the XCF file necessary for Diamond Programmer in the <kit\_dir>\Demonstration\PCIeThruput\Bitstreams directory path.

  For general information on the ispDOWNLOAD cable and Diamond Programmer software usage, see the following:

  - Download procedure described in the Programming Serial SPI Flash Memory section of EB91, ECP5 PCI Express Board User's Guide.
  - General introduction to device configuration with the ispDOWNLOAD cable and Diamond Programmer described in the Device Configuration Software and Cable section of this document.
  - Diamond Programmer Help system

- Install the board drivers for the application.

  You can find the driver files necessary for proper demo installation in the <kit\_dir>\Demonstration\PCIeThruput\Driver directory path. See the procedure described in the Installing Drivers section for guidance.

#### Resource References

Please be aware of supplementary companion documentation when using this demo.

#### **Hardware Resources**

The PCIe Throughput Demo x1 bitstream is built from the Diamond project located in **Hardware\PCIe\_x1\ECP5\_PCIeThruput\Implementation\ecp5um-85F\_PCIeThruput**. The Verilog source code is located in the project's Source directory.

The Verilog design architecture is explained in UG07, PCI Express Throughput Demo Verilog Source Code User's Guide. This document describes the purpose and functionality of the Verilog modules used in the PCIe Throughput Demo design.

#### **Software Resources**

The PCI Express Throughput Demo uses the lscpcie2.sys device driver. The source code for this device driver is located in Software\lscpcie2\_Win7\Driver or Software\lscpcie2\_Win2kXP\drvr. The architecture of the lscpcie2 device driver is explained in the lscpcie2 Driver Reference Manual, which can be accessed through the Software\PCIeDocIndex.html link.

The PCIe Throughput Demo application source code is located in Software\PCIeSFIF\_Win2kXP\SFIF\_GUI\SFIF\_UI. This directory contains the Java project source code to create the user interface. The GUI also uses the PCIeAPI\_Lib\_Win2kXP API library. Both are common to Windows 7 and Windows XP.

The architecture of the PCI Express Throughput Demo application is explained in the PCI Express Throughput Demo Reference Manual and the PCI Express API Reference Manual that can be accessed through the Software\PCIeDocIndex.html link.



### **Throughput Demo Operations Overview**

The purpose of this demo is to show the performance of the Lattice PCI Express SERDES hardware and PCI Express Endpoint IP core when operating in a PC PCI Express expansion slot. The data rates for writes to the PC system memory and reads from the PC system memory are measured and displayed in a graphical user interface.

The demo performs Direct Memory Access (DMA) operations by transferring data directly to and from the PC memory. The demo uses an IP block named the SFIF (Stored FIFO InterFace) to generate read and write Transaction Layer Packets (TLPs) that will access the PC system memory. The SFIF exercises the PCI Express Endpoint IP core and link with low overhead so the true performance of the PCI Express core and link can be measured.

The PCI Express interface is used for both control plane and data plane traffic. The control plane loads the SFIF memory and sets up the transfer. The data plane transfers the data from the SFIF to the PC memory. Figure 7 shows the relationship of the hardware and software components of the demo. For more details on SFIF IP, register mapping and related topics, see the documents IPUG75, PCI Express 1.1 x1, x4 IP Core User's Guide and UG07, Lattice PCI Express Throughput Demo Verilog Source Code User's Guide.

Figure 7. PCI Express Throughput Block Diagram



The Throughput demo software allows you to set up different types of data transfers to understand the PCI Express link. You can select the type of transfer to perform (e.g., write, read, and write/read) as well as how many bytes of data to transfer. You also have the option of selecting the size of the TLP in which to perform the transfer.

Note: The PCI Express Throughput demo design requires at least 32 posted credits to use 128-byte write TLPs. This requirement is to optimize the throughput of the PCIe link. You can determine the amount of posted credits for the given slot in the GUI. If the posted credits are less than 32, then 64-byte write TLPs are the largest size supported.
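
The rule in the last sentence of the note can be written as a simple check. The Python sketch below is only a restatement of that sentence: the threshold of 32 is taken from the note, while the function name and the constant name are hypothetical.

```python
MIN_CREDITS_FOR_128B = 32  # threshold from the last sentence of the note


def max_write_tlp_bytes(posted_credits):
    """Largest write TLP size supported for a given posted credit count."""
    if posted_credits < 0:
        raise ValueError("credit count cannot be negative")
    return 128 if posted_credits >= MIN_CREDITS_FOR_128B else 64
```

A slot that reports fewer than 32 posted credits in the GUI would therefore be limited to 64-byte write TLPs in this model.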

If you experience any problems running this demo, please refer to Appendix A. Troubleshooting.



### **Running the Throughput Demo**

This section describes how to run the PCIe Throughput demo after installation. You can access the demo from the Windows Start Menu.

To run the PCIe Throughput demo from your PC:

1. Go to the Demonstration > PCIeThruput directory and run Thruput.bat.

The graphical user interface opens the PCI Express Throughput demo software with the Device Info tab activated as shown in Figure 8.

Figure 8. Throughput Demo Device Info Page



The Device Info page displays information about the board's PCI config space registers, PCI Express capabilities, and root complex buffer sizes. Descriptions of the information you can view in this page are available in the Touring the PCI Express Throughput Demo Interface section of this document.



### **Touring the PCI Express Throughput Demo Interface**

This section describes the pages and features in the PCI Express Throughput Demo interface.

1. In the Device Info page, click on the **Device Info sub tabs** and observe the structure of the information that is displayed in each. Table 8 describes the information available for viewing in the sub tabs at the bottom of the page.

Table 8. Device Info Page Sub Tab Descriptions

| Sub Tab Page      | Information Description                                                                                                                                                                                                                                                                                                                                        |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Driver Info       | Provides information about the device driver, including the version, the resources used, and the transfer information. The demo design uses the lscpcie2 driver and requests two BARs (Base Address Registers) and a single interrupt vector. The Xfer Info box provides the root complex buffer sizes for Posted and Non-Posted TLPs. This information is important when considering the amount of credit waiting the demo design exhibits when running a transfer. A root complex with larger buffers provides better performance when running the demo because it does not have to release credits as quickly to allow the next TLP. |
| Config Regs       | Provides the standard PCI Type 0 configuration space register contents. Fields such as Device ID and Vendor ID are displayed, along with the assigned BARs. |
| Capabilities Regs | Provides the linked list of capability structures and their contents. Key information found in this box is the maximum TLP size supported by the root complex and the negotiated link width. |
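
To make the Config Regs contents more concrete, the Python sketch below extracts a few fields from a raw copy of the standard 64-byte PCI Type 0 header. The offsets (Vendor ID at 0x00, Device ID at 0x02, Header Type at 0x0E, BAR0 to BAR5 starting at 0x10) come from the PCI specification. The function name is hypothetical, and how the demo software itself reads these registers is not shown here.

```python
import struct


def parse_type0_header(cfg):
    """Extract a few fields from the first 64 bytes of PCI config space."""
    if len(cfg) < 64:
        raise ValueError("need at least the 64-byte standard header")
    # Config space is little-endian; IDs are two 16-bit fields at offset 0
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    header_type = cfg[0x0E] & 0x7F   # bit 7 is the multi-function flag
    # Six 32-bit Base Address Registers occupy offsets 0x10 to 0x27
    bars = list(struct.unpack_from("<6I", cfg, 0x10))
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "header_type": header_type,
        "bars": bars,
    }
```

A header type of 0 identifies an endpoint-style (Type 0) header, which is what the Config Regs sub tab displays.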

2. Click on the **Run Test** tab to see the contents of the Run Test page. This page operates the demo design. In this page, you will be running the demonstration to compute the throughput of the PCI Express link and display the transfer rates with bar graphs. You can select read, write and read-write throughput tests. See Figure 9.

Figure 9. Throughput Run Test Info Page



3. On the Run Test page, under Setup options, select the following:

Test Mode: Thruput

TLP Type: MWr

For the rest of the options, take the defaults.



4. On the Run Test page, click the **RUN** button. After running your test, notice the status indicators in the Performance section at right. The two top progress indicator bars for MRd (memory read TLPs), and MWr (memory write TLPs) will contain a percentage of blue which indicates throughput. The two progress indicator bars below that show the wait time for the root complex to accept TLPs over the entire time spent running.

See Table 9 for details about the features on the Run Test page.

Table 9. Run Test Page Feature Descriptions

| Feature   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Setup     | Sets the configuration for the specific test.                                                                                                                                                                                                                                                                                                                                                                                                        |
| Test Mode | There are two modes of operation, cycles and throughput.                                                                                                                                                                                                                                                                                                                                                                                             |
|           | <b>Throughput</b> – This mode allows the test to run continuously, looping through the tx_fifo and updating the performance numbers every second. This test will run until you click the STOP button. |
|           | <b>Cycles</b> – This mode allows you to set up a specific number of times the tx_fifo will be looped. Once complete, the test will stop automatically and the performance numbers will be displayed based on the entire run. The cycle consists of all MRd TLPs or all MWr TLPs. The purpose is to validate that the correct number of TLPs was sent/received using the counters and View Memory page. See TLP Types in this table for descriptions. |
|           | The key difference between the two modes is how the performance data is displayed. A throughput test will provide new performance data every second. A cycle test will provide performance data after the number of cycles completes, or one second, whichever comes first. Note that cycles will not run longer than one second. Cycles tests run once; throughput tests run continuously until stopped. |
| TLP Types | There are four TLP types, which determine the type of "traffic" sent over the PCI Express link. |
|           | <b>MWr</b> – Memory Write TLPs to write data from the endpoint to the PC system memory. |
|           | <b>MRd</b> – Memory Read TLPs to read data from PC system memory to the endpoint. |
|           | <b>MRd+MWr</b> – Both Memory Read and Memory Write TLPs are sent to the root complex. |
|           | <b>R+W+Ctl</b> – Read, Write, and Control data are present on the PCI Express link. The Read and Write TLPs are sent from the SFIF while the PC is also modifying the GPIO 14-segment display LEDs. This TLP type shows both data and control plane TLPs sharing the PCI Express link. |
| TLP Size  | The TLP Size control allows you to select the size of the TLPs to be sent from the SFIF. The maximum size of the TLP will be dependent on the root complex. In MRd mode, the maximum TLP size is limited by Max Read Request size (512 bytes). In MWr mode, the maximum TLP size is limited by Max TLP Size (128 bytes). |
|           | In Read/Write mode the following sizes are available for MRd TLP and MWr TLP combinations. |
|           | 512,128 – 512-byte read requests with 128-byte write TLPs |
|           | 256,128 – 256-byte read requests with 128-byte write TLPs |
|           | 128,128 – 128-byte read requests with 128-byte write TLPs |
|           | 64,64 – 64-byte read requests with 64-byte write TLPs |
|           | 32,32 – 32-byte read requests with 32-byte write TLPs |
|           | 16,16 – 16-byte read requests with 16-byte write TLPs |
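
The size combinations listed in Table 9 can be used for simple planning arithmetic, such as how many TLPs of a given size are needed to move a block of data. The Python sketch below is plain arithmetic under that reading. The names are hypothetical, and it does not describe how the SFIF itself schedules or counts TLPs.

```python
# (read request size, write TLP size) pairs listed for Read/Write mode
RW_SIZE_PAIRS = [
    (512, 128), (256, 128), (128, 128), (64, 64), (32, 32), (16, 16),
]


def pair_is_listed(read_bytes, write_bytes):
    """True if the combination appears in the Read/Write mode list."""
    return (read_bytes, write_bytes) in RW_SIZE_PAIRS


def tlps_needed(total_bytes, tlp_bytes):
    """Number of TLPs of tlp_bytes needed to carry total_bytes, rounded up."""
    if total_bytes < 0 or tlp_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and tlp_bytes > 0")
    return -(-total_bytes // tlp_bytes)  # ceiling division on integers
```

As an illustration, moving 4096 bytes takes 32 write TLPs of 128 bytes, while the same amount of read data can be requested with 8 read requests of 512 bytes.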



Table 9. Run Test Page Feature Descriptions (Continued)

| Feature        | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Num TLPs       | This control allows the user to select the ratio of read requests to write TLPs. |
|                | <b>1Rd,1Wr</b> – This ratio results in one Read Request and one Write TLP back-to-back. The completion data will need to be received before another Read Request can be made. |
|                | <b>1Rd,4Wr</b> – This ratio results in one Read Request and four Write TLPs. The completion data must be received before another Read Request can be made. This results in much greater bandwidth since read requests are four times the size of a write TLP. This ratio balances the PCI Express link, but still waits for read data. |
|                | <b>4Rd,16Wr</b> – This ratio results in four Read Requests and 16 Write TLPs. This ratio allows for four read requests to be outstanding (TAGs). This ratio is only recommended on server class motherboards due to the high bandwidth required. With four reads outstanding the root complex can better utilize the read data. |
|                | <b>16Rd,64Wr</b> – This ratio results in 16 Read Requests and 64 Write TLPs. This ratio allows for 16 read requests to be outstanding (TAGs). This ratio is only recommended on server class motherboards due to the high bandwidth required. With 16 reads outstanding the root complex can better utilize the read data. |
| Cycles         | This control is only available when the Cycle mode has been selected. This controls the number of times the tx_fifo is looped before ending the test. The software starts the SFIF and waits one second while the SFIF transfers data (number of cycles). After one second, the software stops the SFIF and displays the performance. This has the effect of limiting cycles tests to a maximum of one second of operation. The cycles value cannot be larger than 65535 (it is a 16-bit counter). |
|                | In Throughput mode this control is not used. In Throughput mode the SFIF is looping the tx_fifo continuously until the user presses the STOP button.                                                                                                                                                                                                                                                                                                                                               |
| ICG            | (Inter Cycle Gap) This control sets the number of 125 MHz clock cycles between cycles. You can use this control to model TLP traffic patterns that may be appropriate for your system. The ICG value cannot be larger than 65535 (it is a 16-bit counter).                                                                                                                                                                                                                                         |
| Controls       | Stops and starts test. Status shows an image of a running man to indicate test is in progress.                                                                                                                                                                                                                                                                                                                                                                                                     |
|                | Displays the current data rates and other statistics. Data rates are displayed as progress bars, with the rate (Mbps) displayed in the bar. The bars are updated every second when running in Throughput mode or upon completion in Cycle mode. The rates are computed from the hardware counters in the SFIF.                                                                                                                                                                                     |
|                | Write rates are computed from the following SFIF hardware counters: Tx TLP Count and Elapsed Count.                                                                                                                                                                                                                                                                                                                                                                                                |
|                | Write Rate (MB/sec) = (Tx TLP Count * TLP Size) / (Elapsed Count * 8ns)                                                                                                                                                                                                                                                                                                                                                                                                                            |
| Performance    | Read rates are computed from the following SFIF hardware counters: Rx TLP Count and CplD Time-stamp. |
|                | Read Rate (MB/sec) = (Rx TLP Count * RCB_Size) / (Elapsed Count * 8ns) |
|                | RCB_Size is the size in bytes of a CplD. |
|                | In Throughput mode performance is recalculated every second and the counters are reset. In Cycles mode the performance is calculated once at the end of the run displaying the results for the entire transfer.                                                                                                                                                                                                                                                                                    |
| NP_CA and P_CA | Time spent waiting for the root complex to accept TLPs is computed and displayed as a bar graph in percentage of time waiting over time spent running. Normal efficient operation should show a small percentage of time spent waiting for credits and more time spent sending TLPs.                                                                                                                                                                                                               |
|                | Counters record when the SFIF wants to send a MRd but the credit available ports of the PCI Express core indicates the root complex has not yet processed the read requests. The PCI Express core is waiting to accept an UpdateFC-NP freeing up Non-Posted credits to send another MRd TLP.                                                                                                                                                                                                       |
|                | Counters record when the SFIF wants to send a MWr but the credit available ports of the PCI Express core indicates the root complex has not yet processed the sent write TLPs. The PCI Express core is waiting to accept an UpdateFC-P freeing up Posted credits to send another MWr TLP.                                                                                                                                                                                                          |
| Report         | Logs all of the test information. The Report box provides details about the test. In Throughput mode this report is updated every second for up to 10 seconds. After 10 seconds the data is no longer updated in the Report box, to prevent system load and excessive resource usage during long duration tests (overnight). In Cycles mode the Report box is updated when the test is complete. |
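
The rate formulas in the Performance rows can be sketched in a few lines of Python. This is an illustrative calculation only, not the demo's source code: the function names are hypothetical, 1 MB is assumed to mean 10^6 bytes, and the credit-wait counters are assumed to tick at the same 125 MHz (8 ns) clock as Elapsed Count.

```python
CLOCK_PERIOD_NS = 8   # one 125 MHz clock cycle, per the table above
MAX_16BIT = 65535     # upper limit of the Cycles and ICG controls


def rate_mb_per_sec(tlp_count, bytes_per_tlp, elapsed_count):
    """Data rate in MB/s (assuming 1 MB = 10**6 bytes) from TLP and elapsed counters."""
    if elapsed_count <= 0:
        raise ValueError("elapsed_count must be positive")
    elapsed_seconds = elapsed_count * CLOCK_PERIOD_NS / 1e9
    return tlp_count * bytes_per_tlp / elapsed_seconds / 1e6


def wait_percent(wait_count, elapsed_count):
    """Share of run time spent waiting for credits (assumed same clock units)."""
    if elapsed_count <= 0:
        raise ValueError("elapsed_count must be positive")
    return 100.0 * wait_count / elapsed_count


def check_16bit(name, value):
    """Reject Cycles or ICG values that do not fit a 16-bit counter."""
    if not 0 <= value <= MAX_16BIT:
        raise ValueError(f"{name} must be between 0 and {MAX_16BIT}")
    return value


# One second of run time is 125,000,000 clocks of 8 ns each.
print(rate_mb_per_sec(1_000_000, 128, 125_000_000))  # prints 128.0
print(wait_percent(12_500_000, 125_000_000))         # prints 10.0
```

For the write rate, pass the Tx TLP Count and the TLP size; for the read rate, pass the Rx TLP Count and RCB_Size.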



5. Click on the **View Memory tab** to open the View Memory page. Refer to Figure 10. Notice that this page allows you to inspect the memory contents of the PC system memory buffer and the SFIF rx\_fifo to check for data integrity. See the descriptions of these sub tabs in Table 10.

After inspecting this page, you can move on to the next chapter which describes the PCIe DMA demos.

Figure 10. Throughput Demo View Memory Page



See Table 10 for details about the features of the View Memory page.

Table 10. View Memory Page Sub Tab Descriptions

| Sub Tab Page | Description                                                                                                                                                                                                                                                                                                                                                                    |
|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PC Mem Buf   | The PC system memory buffer sub tab allows you to inspect the contents of the PC system memory buffer allocated in the kernel space by the driver and is used for the source of MRd requests and destination for MWr TLPs. This can also be used to verify that the MWr TLPs have worked and that the data was transferred from the ECP5 PCI Express Board into system memory. |
| SFIF Rx FIFO | The SFIF Rx FIFO sub tab displays the parsed and formatted contents of the SFIF rx_fifo. This can be used to verify that a small burst of MRd TLPs have returned the proper data to the board. The TLPs are parsed and time stamped.                                                                                                                                           |



### **Running the PCI Express Scatter-Gather DMA Demos**

This section describes the Lattice PCI Express Direct Memory Access (DMA) demos that you can run within this kit on a Windows system (Windows 7, Windows XP, Windows 2000, Windows Server 2003).

The DMA demos illustrate Lattice PCI Express, DDR3 SDRAM Controller (optional) and Scatter-Gather DMA (SGDMA) IP cores working together to transfer data over the PCI Express bus. The Scatter-Gather DMA operates as a Master DMA, reading and writing data to PC system memory.

One demo illustrates moving large amounts of image data from the ECP5 PCI Express Board to PC system memory and displaying it on the screen. The other demo implements a simple hardware image processor, in which pixel data from a source image on the screen is read by the board, modified by the hardware, written back and redisplayed. A test program is also provided that checks all the driver and IP functionality.

### **Before You Begin**

Before beginning this demo, you must do the following operations.

• Use Diamond Programmer to download the bitstream for this demo to the ECP5 PCI Express Board.

You can find the x1 bitstream and the XCF file necessary for Diamond Programmer in the <kit\_dir>\Demonstration\PCIeDMA\Bitstreams directory path.

For general information on ispDOWNLOAD cable and Diamond Programmer software usage, see the following:

- Download procedure described in the Programming Serial SPI Flash Memory section of EB91, ECP5 PCI Express Board User's Guide.
- General introduction to device configuration with ispDOWNLOAD cable and Diamond Programmer described in the Device Configuration Software and Cable section of this document.
- Diamond Programmer Help system

• Install the board drivers for the application.

You can find the driver files necessary for proper demo installation in the <kit\_dir>\Demonstration\PCIeDMA\Driver directory path. See the procedure described in the Installing Drivers section of this document.

#### Resource References

Please be aware of supplementary companion documentation when using this demo.

#### **Hardware Resources**

The PCI Express SGDMA Demo x1 bitstream is built from the Diamond project located in **Hardware\PCIe\_x1\ECP5\_PCIeSGDMA\Implementation\ecp5um-85F\_PCIeSGDMA**. The Verilog source code is located in the project Source\ directory.

The Verilog design architecture is explained in UG06 Lattice PCI Express x4 Scatter-Gather DMA Demo Verilog Source Code User's Guide. This document describes the purpose and functionality of the Verilog modules used in the PCIe SGDMA Demo design.

#### **Software Resources**

The PCI Express SGDMA Demo uses the lscdma.sys device driver. The source code for this device driver is located in Software\lscdma\_Win7\Driver or Software\lscdma\_Win2kXP\drvr. The architecture of the lscdma device driver is explained in the lscdma Driver Reference Manual that can be accessed through the Software\PCIeDocIndex.html link.

The PCI Express SGDMA Demo application source code is located in two directories corresponding to the two demos: Software\PCIeDMA\_Win2kXP\ColorBars and Software\PCIeDMA\_Win2kXP\ImageMove. These applications use the OpenGL APIs to display the image data. The demos also use the PCIeAPI\_Lib\_Win2kXP API library. The same code is used for both Windows 7 and Windows XP.

The architecture of the PCI Express SGDMA Demo application is explained in the PCI Express DMA Demo Reference Manual and PCI Express API Reference Manual that can be accessed through the **Software\PCIeDocIndex.html** link.

### **DMA Demo Operations Overview**

Direct Memory Access (DMA) is a method of transferring data from one memory mapped device to another. The data is transferred by a dedicated device that performs the bus cycle (memory reads and writes). The CPU is not involved in the actual data movement.

Using a dedicated DMA device frees the CPU to do other operations and also shortens the transfer time. If the CPU had to move the data, it would be done in a software loop which requires fetching, decoding and executing each instruction involved in the loop. This could easily expand to 10 or more instruction cycles per datum moved. A DMA engine could perform the same datum move operation in one to three bus clocks (depending on bus architecture).

In modern PC systems the DMA engine, the device responsible for performing the bus cycles to implement the transfer, is located on the add-in card. This is known as Bus Master DMA and is the preferred method of operation. The PCI bus is being phased out and replaced with the PCI Express bus. To take advantage of the high bandwidth that PCI Express offers, DMA is used to transfer the data between the add-in card and the system memory. The Lattice Scatter-Gather DMA IP works in conjunction with the Lattice PCI Express Endpoint IP core to transport the data.

The Scatter-Gather DMA IP core, together with the PCI Express Endpoint IP core, demonstrates moving data between the Lattice FPGA and PC system memory using an ECP5 PCI Express Board. The board uses the PCI Express link as both control (setup and operation of the core) and data path (DMA to/from PC system memory). The PC provides the test platform (power, run-time environment) and the user interface.

A PC platform is used because currently PCs are the only readily available, economical and standard platform utilizing PCI Express. A Windows device driver provides the interface to the board's register and memory space. Application software uses the driver to set up and configure the DMA engine, execute it, and verify the results. The demo system is illustrated in the block diagram shown in Figure 11.



Figure 11. DMA Demo Block Diagrams



The demo hardware has the following objectives:

- Acts as a reference design for using the PCI Express and Scatter-Gather DMA IP cores
- Performs actual DMA transfers over the PCI Express bus at an optimal rate
- Provides counters and timers to measure performance
- Provides a platform for demonstration and experimentation

The demo application software has the following objectives:

- Demonstrates accessing, configuring and operating the PCI Express, DDR3 SDRAM Controller and Scatter-Gather DMA IP cores
- Verifies proper operation (ensures all DMA data is transferred from source to destination un-corrupted)
- Demonstrates Windows driver and system programming so users can extend the software for their own particular system needs:
  - System memory allocation (Memory Descriptor Lists)
  - Interrupt handling with ISRs and DPCs



#### **Scatter-Gather DMA Overview**

Hardware devices perform Direct Memory Access (DMA) by initiating read/write bus transactions. DMA means transferring data to and from system memory directly, without involving the CPU.

Bus Master DMA means the device (the PCI Express Core on the board) is controlling the bus and doing the data transfers. To perform a transfer, the SGDMA needs an address and a length, and the software driver configures it with both. The addresses that software uses to describe a buffer's location in memory are only meaningful in the domain of the CPU. The CPU (and software) views memory as a virtual 2 GB address space per process. The DMA needs physical memory addresses.

When software allocates a large buffer of memory, the memory manager finds the number of required free pages (4 KB per page) in system memory and makes them appear contiguous to software via virtual memory translation tables in hardware. A 1 MB buffer allocated by the software appears contiguous to the software, but in reality is scattered throughout physical system memory in discontinuous 4 KB chunks. The magic of virtual memory makes it appear contiguous to software.
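
As a rough illustration of the page arithmetic above, the following Python sketch counts how many 4 KB pages a virtually contiguous buffer touches. The function name and the example addresses are hypothetical; only the 4 KB page size and the 1 MB buffer size come from the text.

```python
PAGE_SIZE = 4096  # 4 KB pages, as described above


def pages_spanned(virtual_address, length):
    """Number of pages touched by a buffer of `length` bytes at `virtual_address`."""
    if length <= 0:
        return 0
    first_page = virtual_address // PAGE_SIZE
    last_page = (virtual_address + length - 1) // PAGE_SIZE
    return last_page - first_page + 1


ONE_MB = 1 << 20
print(pages_spanned(0x10000, ONE_MB))  # page-aligned buffer: prints 256
print(pages_spanned(0x10400, ONE_MB))  # unaligned buffer: prints 257
```

A buffer that does not start on a page boundary touches one extra page, which is why a scatter-gather list can hold one more entry than the buffer size alone suggests.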

The SGDMA needs physical addresses to put on the bus and needs contiguous memory. In a simple flat memory architecture, the SGDMA could just take a starting address and a length of 1 MB and transfer all data in one continuous operation. In virtual memory machines, the kernel and memory manager need to be enlisted at the driver level to create a map of the virtual memory to physical pages. In Windows, this mapping is known as a Memory Descriptor List (MDL).

The MDL is a Scatter-Gather List that maps virtual memory to physical page addresses. The device driver uses the MDL entries to program the buffer descriptors. Each buffer descriptor is programmed with the physical memory address and the length (usually one page, 4096 bytes). When the Scatter-Gather DMA channel is activated it reads the linked list of buffer descriptors and moves the data to that address, and then moves to the next buffer descriptor and next address until the end of the list is reached. Figure 12 illustrates this operation.

[Figure: SGDMA buffer descriptors BD[0] to BD[3] and the SG List, mapping the User Buffer to pages in Physical RAM]

Figure 12. Scatter-Gather DMA Buffer Address Mapping

The buffer descriptors (BD[1, 2, ...]), shown on the left in Figure 12, have their destination addresses programmed to the start of the physical pages in memory. These pages in Physical RAM, to the right of the buffer descriptors, may not be contiguous or sequential in memory. The memory manager in the PC hardware uses the MDL or Scatter-Gather List (SG List) to make this set of pages appear contiguous to the application running in user space (Virtual Memory mode).

The Scatter-Gather DMA off-loads the processor and kernel by having the ability to perform this scattering of contiguous data (memory on the ECP5 PCI Express Board) to arbitrary memory pages, or for reading, to gather a set of discontinuous memory pages into a contiguous memory on the board.
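
The scatter and gather operations described above can be modeled with a short Python sketch. It is a software model under stated assumptions, not the driver or IP implementation: the `BufferDescriptor` fields, the function names and the example page addresses are all hypothetical, and system memory is modeled as a dictionary keyed by physical address.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096


@dataclass
class BufferDescriptor:
    src_offset: int   # offset into the contiguous board memory
    dst_address: int  # physical address in system memory
    length: int       # number of bytes to move


def build_descriptors(physical_pages, first_page_offset, total_length):
    """Turn a list of physical page base addresses into buffer descriptors."""
    descriptors = []
    src_offset = 0
    remaining = total_length
    for index, page_base in enumerate(physical_pages):
        if remaining == 0:
            break
        offset = first_page_offset if index == 0 else 0
        length = min(PAGE_SIZE - offset, remaining)
        descriptors.append(BufferDescriptor(src_offset, page_base + offset, length))
        src_offset += length
        remaining -= length
    if remaining:
        raise ValueError("page list is too short for the requested length")
    return descriptors


def scatter(descriptors, board_memory, system_memory):
    """Board to PC: copy contiguous board data into scattered pages."""
    for bd in descriptors:
        system_memory[bd.dst_address] = board_memory[bd.src_offset:bd.src_offset + bd.length]


def gather(descriptors, system_memory):
    """PC to board: collect scattered pages into one contiguous block."""
    return b"".join(system_memory[bd.dst_address] for bd in descriptors)


# Arbitrary, non-contiguous example page addresses.
pages = [0x3001A000, 0x2FF03000, 0x2FE08000, 0x2FDFF000]
bds = build_descriptors(pages, first_page_offset=0x400, total_length=3 * PAGE_SIZE)
board = bytes(range(256)) * 48  # 12288 bytes of test data
pc_memory = {}
scatter(bds, board, pc_memory)
print(len(bds), gather(bds, pc_memory) == board)  # prints 4 True
```

Because the buffer starts 0x400 bytes into its first page, three pages of data need four descriptors, with short first and last entries.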

See the code links to the DMATest.cpp, ColorBars.cpp and ImageMove.cpp files included with this kit. The source code is the best documentation of what is happening behind the scenes in the demos. To access this documentation, go to the kit **Software** directory and open the **PCIeDocIndex.html** document. Under the **Documentation** section click the hyperlink **PCIe DMA Demo Reference Manual**. Click on the **File List** book in the navigation pane at left or the **Files** tab in the main pane on the page.

Note: The PCI Express Scatter-Gather DMA demo design requires at least 16 posted credits. This requirement is to optimize the throughput of the PCI Express link. You can determine the amount of posted credits for the given slot using the PCI Express Throughput demo and GUI. If the posted credits are less than 32, then the PCI Express Scatter-Gather DMA demo will not be able to run in the given slot.

If you experience problems running this demo, please refer to Appendix A. Troubleshooting of this document.

### **Running the DMA Demos**

This section describes how to run the DMA demos and refers you to documentation on the demos that describes what these applications demonstrate.

To run the ColorBars graphical DMA demo:

• Go to the Demonstration > PCIeDMA directory and run SGDMA\_CB.bat.

In this demo, image data is transferred from the board to PC memory and displayed. The ColorBars window displays a series of vertical colored bars in a gradient manner. See Figure 13. For details on this demo, see the PCI Express DMA ColorBars Demo section of this document.

To run the ImageMove graphical DMA demo:

• Go to the Demonstration > PCIeDMA directory and run SGDMA\_IM.bat.

In this demo, image data is transferred from the PC to the board and then back to the software, which then displays a modified image on the screen. See Figure 14. For details on this demo, see the PCI Express DMA ImageMove Demo section of this document.

Figure 13. ColorBars Demo Window





Figure 14. ImageMove Demo Window



The 14-segment LED displays the real-time interrupt processing during execution of the ColorBars and ImageMove demos. The inner eight segments are the lower eight bits of the ISR routine counter. The outer eight segments are the lower eight bits of the DPC routine counter, where the real processing is done. All segments (inner and outer) should be changing at a rapid rate during demo operation (interrupts occur after each DMA transfer), indicating that the hardware is operating and interrupts are being serviced.
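
The "lower eight bits" selection can be sketched as follows. This is only an illustration: the function names are hypothetical, and the assignment of bit 0 to the first segment is an assumption, since the text does not say which bit drives which segment.

```python
def led_byte(counter):
    """Keep only the lower eight bits of an interrupt counter."""
    return counter & 0xFF


def segment_states(counter):
    """On/off state of eight segments, bit 0 first (ordering is an assumption)."""
    return [bool((counter >> bit) & 1) for bit in range(8)]


isr_count = 0x1234  # hypothetical counter value
print(hex(led_byte(isr_count)))  # prints 0x34
print(segment_states(0x05))      # bits 0 and 2 are set
```

Since only eight bits are shown, the pattern repeats every 256 interrupts, which is why the segments appear to change rapidly while transfers are running.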

A demo can error out displaying an error dialog under the following circumstances:

- The board is not recognized by hardware or the operating system.
- The driver is not loaded (the bitstream is not a PCI Express demo).
- The PCI Express link is not x1.
- The driver cannot access registers.
- The application or driver cannot verify IP register IDs.
- Another demo is running.

#### **Running Multiple DMA Demos**

Do not run more than one demo at a time. The ImageMove and ColorBars demos cannot be run at the same time because they are mutually exclusive. Each needs DMA channels in the Scatter-Gather DMA. The driver marks channels as in-use once a demo "opens" the channels. Starting another demo will fail when it attempts to open the same channels.
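
The in-use marking can be pictured with a small Python model. This is a sketch of the idea only, not the lscdma driver's code: the `ChannelTable` class, its methods and the channel numbering are hypothetical.

```python
class ChannelTable:
    """Tracks which demo, if any, has opened each DMA channel."""

    def __init__(self, num_channels):
        self._owners = [None] * num_channels

    def open(self, channel, owner):
        """Mark a channel as in use, or fail if another owner holds it."""
        current = self._owners[channel]
        if current is not None:
            raise RuntimeError(f"channel {channel} is already in use by {current}")
        self._owners[channel] = owner

    def close(self, channel, owner):
        """Release a channel, but only for the owner that opened it."""
        if self._owners[channel] == owner:
            self._owners[channel] = None

    def owner(self, channel):
        return self._owners[channel]


table = ChannelTable(num_channels=2)
table.open(0, "ColorBars")
try:
    table.open(0, "ImageMove")  # second demo asks for the same channel
except RuntimeError as error:
    print(error)  # prints: channel 0 is already in use by ColorBars
```

Once the first demo closes its channels, the second demo can open them.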



### **PCI Express DMA ColorBars Demo**

This program demonstrates the Lattice PCI Express Endpoint IP core and the Scatter-Gather DMA IP core operating on an ECP5 PCI Express Board. It transfers image data from the board to PC memory and software, which then displays it on the screen.

The image source is a block of IP operating as a FIFO. The IP tracks how many reads have been requested, and after eight complete rows have been read, it changes the color data provided with the next eight rows.
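
A software model of this behavior might look like the sketch below. Only the eight-row rule comes from the description above; the palette values, the row width and the 32-bit pixel format are assumptions made for illustration, and the real image source is hardware, not Python.

```python
ROWS_PER_COLOR = 8  # the color changes after eight complete rows


def color_rows(palette, width, total_rows):
    """Return rows of pixels, switching to the next palette color every eight rows."""
    rows = []
    for row_index in range(total_rows):
        color = palette[(row_index // ROWS_PER_COLOR) % len(palette)]
        rows.append([color] * width)
    return rows


# Hypothetical two-color palette and a tiny 4-pixel-wide image.
rows = color_rows([0x00FF0000, 0x0000FF00], width=4, total_rows=24)
print(hex(rows[7][0]), hex(rows[8][0]))  # prints 0xff0000 0xff00
```

Rows 0 to 7 share the first color, rows 8 to 15 the second, and the palette then wraps around.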

Figure 15. ColorBars Program Operation Flow



The image is displayed using OpenGL calls. The display rate is therefore also dependent on the OpenGL library and graphics subsystem hardware. Displaying an image is a quick way to illustrate that data has been moved. It would not be practical to display 1 MB to the screen in a text dump, or save it to a file. An image provides a quick, visual way to observe a large transfer of data, and it can run continuously.

The image data is 1 MB in size. Each DMA Read request is 1 MB in size. After the hardware has transferred the pixel data, the API call returns and the software displays the image. This loop is repeated over and over. The data rate (frame rate) is displayed in the window title bar. The frame rate is roughly the throughput rate in MB/s (each frame = 1 MB). Frame rate is governed by the video refresh rate. Most video systems will not draw frames into video memory faster than the refresh rate, because doing so would waste operations.
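
Because each frame carries 1 MB, the frame rate converts directly to a data rate. The sketch below shows the arithmetic; the function names and the 60 frames per second figure are hypothetical examples, not measured results.

```python
FRAME_SIZE_MB = 1  # each ColorBars frame is 1 MB


def throughput_mb_per_sec(frames_per_sec, frame_size_mb=FRAME_SIZE_MB):
    """Megabytes moved per second at a given frame rate."""
    return frames_per_sec * frame_size_mb


def throughput_mbit_per_sec(frames_per_sec, frame_size_mb=FRAME_SIZE_MB):
    """The same rate expressed in megabits per second."""
    return 8 * throughput_mb_per_sec(frames_per_sec, frame_size_mb)


print(throughput_mb_per_sec(60))    # prints 60
print(throughput_mbit_per_sec(60))  # prints 480
```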

To see the key commands available for the ColorBars demo, refer to Table 11.

Table 11. DMA ColorBars Demo Keyboard Commands

| Key Command     | Description                                                                            |
|-----------------|----------------------------------------------------------------------------------------|
| \<Esc\>         | Terminates program and closes the window.                                              |
| \<F1\>          | Draw blank image buffer only (do not generate data). This is the fastest rate.         |
| \<F2\>          | Generate ColorBars data with the software. This will usually be the slowest data rate. |
| \<F3\>          | Get image data from the board by DMA transfer.                                         |
| \<F4\>          | Draw a frame each second (slowly) so it can be viewed and the changes are visible.     |
| \<Space\>       | Pause/resume image transfer.                                                           |



### **PCI Express DMA ImageMove Demo**

This program demonstrates the Lattice PCI Express Endpoint IP core and the Scatter-Gather DMA IP core operating on an ECP5 PCI Express Board. It transfers image data from the PC to the board and then back to the software, which then displays a modified image on the screen.

Figure 16. ImageMove Program Operation Flow



The image is displayed using OpenGL calls. The display rate is therefore also dependent on the OpenGL library and graphics subsystem hardware. Displaying an image is a quick way to illustrate that data has been moved. An image provides a quick, visual way to observe a large transfer of data. Each image is 256 KB in size.

Below is the sequence of events as this demo image undergoes processing:

1. The image source is generated by rotating the triangle shape using an OpenGL transform matrix. The resulting image is displayed on the screen.
2. The source image is read from the screen into the source buffer.
3. The source buffer is sent to the Image Filter memory on the ECP5 PCI Express Board. The memory is only 64 KB in size, so the image is sent in four chunks.
4. After a 64 KB chunk is transferred to the board, the 64 KB chunk is read back, with the pixels modified by the XOR function in the read path.
5. After four write/read chunks, the destination buffer contains the modified image and it is displayed on the screen.
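
The chunked write and read-back sequence can be modeled in Python as shown below. This is a software stand-in for the hardware path, under stated assumptions: the function names are hypothetical, and the XOR is assumed to apply to each 32-bit little-endian word. The filter value 0xcc33aa55 is one of the values listed in Table 12.

```python
import struct

CHUNK_SIZE = 64 * 1024   # size of the Image Filter memory
IMAGE_SIZE = 256 * 1024  # size of one image


def xor_filter(chunk, pattern):
    """XOR every 32-bit word of the chunk with the filter pattern."""
    count = len(chunk) // 4
    words = struct.unpack(f"<{count}I", chunk)
    return struct.pack(f"<{count}I", *(word ^ pattern for word in words))


def image_move(source, pattern):
    """Send the image in 64 KB chunks and read each chunk back through the filter."""
    destination = bytearray()
    for start in range(0, len(source), CHUNK_SIZE):
        chunk = source[start:start + CHUNK_SIZE]    # write to the board
        destination += xor_filter(chunk, pattern)   # read back, modified
    return bytes(destination)


source_image = bytes(IMAGE_SIZE)  # all-zero stand-in for the screen capture
result = image_move(source_image, 0xCC33AA55)
print(len(result) // CHUNK_SIZE)  # prints 4
print(result[:4].hex())           # prints 55aa33cc
```

Applying the same pattern twice restores the original data, and a pattern of zero leaves the image unchanged.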



Table 12. DMA ImageMove Demo Keyboard Commands

| Key Command     | Description                                              |
|-----------------|----------------------------------------------------------|
| \<Esc\>         | Terminates program and closes the window                 |
| \<F1\>          | XOR filter set to 0xcc33aa55 (changes resulting display) |
| \<F2\>          | XOR filter set to 0xf0f0f0f0 (changes resulting display) |
| \<F3\>          | XOR filter set to 0x0f0f0f0f (changes resulting display) |
| \<F4\>          | No filter is applied                                     |
| \<Space\>       | Pause/resume image display                               |

# **Technical Support Assistance**

e-mail: techsupport@latticesemi.com

Internet: www.latticesemi.com

# **Revision History**

| Date         | Version | Change Summary                            |
|--------------|---------|-------------------------------------------|
| January 2015 | 1.1     | Removed references to Linux installation. |
| August 2014  | 1.0     | Initial release.                          |

© 2015 Lattice Semiconductor Corp. All Lattice trademarks, registered trademarks, patents, and disclaimers are as listed at <a href="https://www.latticesemi.com/legal">www.latticesemi.com/legal</a>. All other brand or product names are trademarks or registered trademarks of their respective holders. The specifications and information herein are subject to change without notice.



# **Appendix A. Troubleshooting**

This appendix outlines some debug procedures to follow when experiencing trouble installing or running a demo on a Windows PC.

### **Troubleshooting Demo Software Installation**

The most likely installation issue that may arise for the kit demo software will be related to permissions. Depending upon the system security policies, you may need to have administrator privileges to install into certain directories, for example, the Program Files directory in Windows.

### **Troubleshooting Driver Installation**

- The board must be connected to the PC and recognized by Windows for the driver to be successfully installed. If
  you do not see the "Found New Hardware" message when logging in after installing the board, check the board
  LEDs. Try a different PCI Express slot.
- Make sure you specify the search location for the driver during installation. Specify that Windows should install from the **Demonstration\<demo\_name>\Drivers** directory.
- You must have Administrator privileges to install device driver files.

### **Troubleshooting Demo Operation**

- The ECP5 PCI Express Board must be installed in the PC and recognized by Windows for the driver to be successfully installed/loaded. The driver must be loaded by Windows to run the demo. Verify that Windows sees the board and has loaded a driver for it.
- If the GUI displays the error message, "ERROR LOADING LIBRARY:Cpp\_Jni running in View Only mode" when executed, then the driver was not found or loaded. There are two causes:
  - The driver was never loaded (or the board is not installed)
  - The board failed to be detected by Windows.

In either case, the board needs to be installed and seen by Windows and the LSC\_PCIe driver needs to be associated with the hardware.

### **Using Device Manager to Debug Installation**

Use Device Manager to get basic information on the hardware you have installed. To access Device Manager, right-click on the **My Computer** desktop icon and select **Properties**. In the System Properties dialog, select the **Hardware** tab and **Device Manager** button.



Figure 17. Device Manager



The Device Manager provides the same basic set of software driver information as the Computer Management window. The Hardware Wizard allows you to install and remove drivers. You must have administrator privileges to run the Hardware Wizard and install or remove drivers. Again, the most useful step is to verify that the lscpcie and lscvpci drivers (if enabled) have been installed.