International Journal of Scientific & Engineering Research, Volume 4, Issue 3, March 2013

ISSN 2229-5518

Validation of External Memory by Using PCITree Software

Savita S. Biradar, Suma Santosh

Abstract -- Over the decades, systems have grown larger, with higher channel counts and faster digitization. Advances in technology create a pressing need to manage and process the large amounts of data being generated. Currently used communication buses create a multicomputer system but often trade off bandwidth or latency. To overcome these limits and to support enhanced features such as I/O virtualization and processor-coprocessor interconnect, the third-generation I/O interconnect, Peripheral Component Interconnect Express (PCIe), is employed. Hence, a PCIe endpoint block is designed and developed on a Xilinx Virtex-5 FPGA platform in order to communicate with a root complex using the PCIe protocol. PCIe is used for communication between the host and the I/O subsystem in multicomputer platforms. The PCIe endpoint supports high-speed data transfer of 180 Mbps per lane. Data in computer memory is accessed by the PCIe endpoint block using the PCITree software. This work uses Verilog to model the different blocks of the integrated endpoint block for PCI Express. The RTL code is simulated, synthesized, and implemented using Xilinx ISE 12.4, targeting a Virtex-5 FPGA.

Index Terms - PCIe, BMD, CORE Generator tool, LogiCORE endpoint, PCITree


1 INTRODUCTION

THE PCIe standard is the next-generation advancement of the older PCI and PCI-X parallel bus standards. PCIe is a high-performance, general-purpose interconnect: a fast serial bus architecture with dedicated, dual unidirectional I/O, intended for computing and communications platforms. PCIe is a packet-based, point-to-point serial interface, and it is backward compatible with PCI and PCI-X configurations, application software, and device drivers [1].

The PCI Special Interest Group (PCI-SIG) has been working on PCIe and released the first, second, and, recently, the third generation of the I/O interconnect. It is now the most efficient and most widely used I/O interconnect worldwide.
PCIe, the third-generation I/O interconnect of today's single-host computing platforms, is being enhanced to support features such as I/O virtualization and host-to-host communication. The effective bandwidth of PCIe is lower than the raw bandwidth because of the overhead of the 8B/10B encoding and decoding used by the protocol.
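As a worked example of this encoding overhead (a sketch for illustration, not code from the paper): at the Gen1 signalling rate of 2.5 GT/s per lane, 8B/10B leaves 80% of the raw bit rate for data, which is where the 250 MB/s per-lane figure quoted in the next paragraph comes from.

```c
/* Worked example: effective PCIe Gen1 bandwidth per lane after
 * 8B/10B line coding. Every 8 data bits are transmitted as a
 * 10-bit symbol, so only 80% of the raw rate carries payload. */
#include <stdio.h>

int main(void)
{
    const double raw_gbps   = 2.5;               /* Gen1 line rate, Gb/s */
    const double coding_eff = 8.0 / 10.0;        /* 8B/10B overhead      */
    double effective_gbps   = raw_gbps * coding_eff;         /* 2.0 Gb/s */
    double effective_mbytes = effective_gbps * 1000.0 / 8.0; /* 250 MB/s */

    printf("Effective per-lane rate: %.1f Gb/s = %.0f MB/s\n",
           effective_gbps, effective_mbytes);
    return 0;
}
```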
In the past, serial communication was widely used; because of its disadvantage of slower data transfer, parallel communication was adopted to achieve faster transmission rates. However, as the speed of parallel data transfer increases, the interference between the parallel lines also increases.
The shortcomings of PCI became apparent as processors, video cards, sound cards, and networks grew faster and more powerful: the PCI bus remained fixed at a 32-bit width, can handle only five devices at a time, and has a bandwidth of 132 MB/s. The 64-bit PCI-X bus provides more bandwidth, 264 MB/s, but the greater width compounds some of PCI's other issues. A new protocol called PCIe eliminates many of these shortcomings, provides more bandwidth, and is compatible with existing operating systems [2]. It provides 250 MB/s per x1 lane in a single direction. The PCI and PCI-X buses are not easily scaled up in frequency or down in voltage because they use synchronously clocked data transfer over a parallel bus. PCIe provides a serial architecture that avoids many of the drawbacks of parallel bus architectures by using clock data recovery (CDR) and differential signalling [3].
Instead of one bus that handles data from multiple sources, a PCIe switch controls several point-to-point serial connections. These connections fan out from the switch and lead directly to the devices where the data needs to go. Devices no longer share bandwidth as on a conventional bus, because each device has its own dedicated connection.
The Virtex-5 FPGA integrated endpoint block implements the functionality defined in the specifications maintained by the PCI-SIG. The Virtex-5 FPGA supports x8, x4, x2, and x1 lane widths [4]; block RAMs are used for buffering; a management interface is used to access the configuration space; and payload sizes from 128 to 4096 bytes are supported. Base address registers (BARs) are configurable for both memory and I/O. The PCIe endpoint block supports up to two virtual channels (VCs) [5].
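Because the BARs are configurable, the host sizes them during enumeration with the standard write-all-ones handshake. A minimal sketch of that generic PCI behaviour follows; cfg_read32/cfg_write32 are assumed helpers for configuration-space access, not functions from this design.

```c
/* Sketch of the standard BAR-sizing handshake performed during
 * enumeration: write all 1s, read back, and the lowest writable
 * bit encodes the BAR size. Generic PCI behaviour, for illustration. */
#include <stdint.h>

extern uint32_t cfg_read32(unsigned offset);           /* assumed helper */
extern void     cfg_write32(unsigned offset, uint32_t value);

#define BAR0_OFFSET 0x10u    /* first BAR in the Type 0 header */

uint32_t bar0_size_bytes(void)
{
    uint32_t saved = cfg_read32(BAR0_OFFSET);

    cfg_write32(BAR0_OFFSET, 0xFFFFFFFFu);   /* probe: write all 1s  */
    uint32_t probe = cfg_read32(BAR0_OFFSET);
    cfg_write32(BAR0_OFFSET, saved);         /* restore original BAR */

    probe &= ~0xFu;            /* clear memory-type/prefetch flag bits */
    return ~probe + 1u;        /* lowest writable bit gives the size   */
}
```

For the 2 KB BAR0 configured in this design, the probe read would return a value whose low writable bit sits at 0x800, so the helper returns 2048.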
The Virtex-5 FPGA integrated endpoint block for PCIe designs is built with the LogiCORE Endpoint Block Plus wrapper generated by the CORE Generator tool. The CORE Generator tool generates a wrapper around the integrated endpoint block and automatically connects the block RAMs, RocketIO transceivers, and the reset and clock modules. This wrapper simplifies the design of the system because it provides an easy-to-use interface [6]. An integrated endpoint block for PCIe designs, a feature of the Virtex-5 FPGA family, is implemented in this paper.

2 TOPOLOGY OF PCIe SYSTEM

Fig. 1: Block diagram of PCIe endpoint

2.1 Root Complex

The Root Complex is the heart of the PCIe bus. It is made up of one or more host bridges, and each host bridge contains one or more root ports. Root ports connect other PCIe devices, such as endpoints or switches, to the Root Complex. The Root Complex links the CPU and the memory of the system, usually through a system bus.

2.2 PCIe Endpoint

A PCIe endpoint device has no downstream devices attached to it; it functions as a leaf node in the PCIe device hierarchy. Graphics processing units and network and storage controllers are examples of PCIe endpoints.
A PCI Express endpoint can be integrated directly into the Root Complex, or it can be connected to a PCI Express root port or a PCI Express switch.

2.3 PCIe Switch

A PCIe switch connects a root complex on its upstream side to endpoints on its downstream side, providing connectivity to PCIe endpoints, bridges, and slots. Each PCIe switch has one upstream port and multiple downstream ports.

3 ARCHITECTURE OVERVIEW

Fig. 2: Virtex-5 FPGA integrated endpoint block diagram

Figure 2 shows the architecture of the Virtex-5 FPGA integrated endpoint block. The upper layer of the architecture is the transaction layer. This layer accepts and issues packets, which are presented in a specific format called transaction layer packets (TLPs). At the transaction layer interface, TLPs sent by the user are scheduled for transmission. On the transmit side, the transaction layer is responsible for turning requests or completions from the core into PCIe transactions: it receives a request from the core and checks the information going into the transaction. On the receive side, the transaction layer accepts incoming PCIe transactions from the data link layer. A TLP is composed of a header, a data payload, and an end-to-end CRC (ECRC). The header, present in every TLP, identifies the type of transaction. A flow control mechanism ensures that a packet is not transmitted unless the receiving device has sufficient buffer space to accept it [7].
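To make the TLP format concrete, the sketch below assembles the first DWORD of a 3DW memory-request header as laid out in the PCIe Base Specification. The helper name is illustrative, and fields such as TC, TD, EP, and Attr are left at zero for brevity.

```c
/* Illustrative assembly of DW0 of a PCIe 3DW memory-request TLP
 * header: Fmt[1:0] at bits 30:29 (bit 30 set means data present),
 * Type[4:0] at bits 28:24 (00000b = memory request), and
 * Length[9:0] at bits 9:0, counted in DWORDs. */
#include <stdint.h>
#include <stdio.h>

static uint32_t tlp_mrq_dw0(int with_data, uint16_t length_dw)
{
    uint32_t fmt  = with_data ? 0x2u : 0x0u; /* 3DW header, +/- payload */
    uint32_t type = 0x00u;                   /* memory request MRd/MWr  */
    return (fmt << 29) | (type << 24) | (length_dw & 0x3FFu);
}

int main(void)
{
    /* A 1-DWORD memory write: Fmt=10b, Type=00000b, Length=1 */
    printf("MWr DW0 = 0x%08X\n", tlp_mrq_dw0(1, 1));  /* 0x40000001 */
    /* A 17-DWORD memory read, as counted in the Results section */
    printf("MRd DW0 = 0x%08X\n", tlp_mrq_dw0(0, 17)); /* 0x00000011 */
    return 0;
}
```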
The data link layer takes TLPs from the transmit side of the transaction layer, adds a 12-bit sequence number in front of each TLP and a link CRC (LCRC) at the end, and forwards the TLP to the physical layer. On the receive side, the data link layer accepts packets from the physical layer and checks the received sequence number and LCRC against the expected sequence number and the locally computed LCRC. If they match, the TLP is passed up to the receive side of the transaction layer and an ACK (acknowledgement) is generated; if there is any mismatch or detected error in the sequence number or LCRC, a NAK (negative acknowledgement) is generated instead. The data link layer never forwards an invalid or bad TLP to the receive side of the transaction layer; the retry buffer at the data link layer resends a valid copy of the TLP [8].
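A minimal sketch of the transmit-side bookkeeping just described, assuming a compute_lcrc32() helper: each TLP receives a 12-bit sequence number and a 32-bit LCRC, and is retired from the retry buffer only when the matching ACK arrives.

```c
/* Sketch of data link layer transmit bookkeeping: 12-bit sequence
 * numbers, an appended LCRC, and a retry slot held until ACK.
 * compute_lcrc32() is an assumed helper, not part of this design. */
#include <stdint.h>
#include <stdbool.h>

extern uint32_t compute_lcrc32(const uint8_t *bytes, unsigned len);

#define SEQ_MASK 0x0FFFu          /* sequence numbers wrap at 12 bits */

struct retry_slot {
    uint16_t seq;                 /* 12-bit sequence number           */
    bool     in_flight;           /* awaiting ACK from link partner   */
};

static uint16_t next_seq;

/* Called for every TLP handed down by the transaction layer. */
uint16_t dll_transmit(struct retry_slot *slot,
                      const uint8_t *tlp, unsigned len,
                      uint32_t *lcrc_out)
{
    slot->seq       = next_seq;
    slot->in_flight = true;
    *lcrc_out       = compute_lcrc32(tlp, len); /* appended after payload */
    next_seq        = (next_seq + 1u) & SEQ_MASK;
    return slot->seq;
}

/* An ACK retires the slot; a NAK would instead trigger replay of
 * every still-in-flight TLP from the retry buffer. */
void dll_on_ack(struct retry_slot *slot, uint16_t acked_seq)
{
    if (slot->in_flight && slot->seq == acked_seq)
        slot->in_flight = false;
}
```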
The lowest layer of PCIe is the physical layer, whose main responsibility is sending and receiving all data across the PCIe link. On the transmit side, the information taken from the data link layer is converted from parallel to serial format, and framing characters are added to mark the start and end of each packet. On the receive side, the incoming serial data from the PCIe link is converted into parallel data, the framing is removed, and the packets are sent back to the data link layer.
The integrated endpoint block supports one physical layer (PL) lane module per lane. On the transmit side of its operation, the PL lane module applies the scramble codes generated by the physical layer to the transmit data, multiplexes this with the ordered-set data received from the physical layer module, and then passes the packet to the transceiver-side interface for transmission. On the receive side, the PL lane module receives TLP bytes from the transceiver interface, decodes ordered sets from this data, and descrambles the DLLP and TLP data from the resulting data stream. The physical layer also performs the 8B/10B encoding and decoding [7].
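The side-stream scrambling mentioned above can be sketched as follows. PCIe Gen1 specifies the LFSR polynomial G(X) = X^16 + X^5 + X^4 + X^3 + 1, seeded with 0xFFFF; the exact bit ordering and the rules that exempt K characters and certain ordered sets from scrambling are simplified away here, so treat this as illustrative only.

```c
/* Sketch of the physical layer side-stream scrambler. The LFSR
 * implements G(X) = X^16 + X^5 + X^4 + X^3 + 1 (Galois form),
 * reset to 0xFFFF per the spec; bit-ordering details and the
 * K-character/ordered-set skip rules are omitted for brevity. */
#include <stdint.h>

static uint16_t lfsr = 0xFFFF;    /* scrambler reset value */

uint8_t scramble_byte(uint8_t data)
{
    uint8_t out = 0;
    for (int bit = 0; bit < 8; bit++) {
        uint16_t fb = (lfsr >> 15) & 1u;       /* X^16 feedback tap */
        out |= (uint8_t)((((data >> bit) & 1u) ^ fb) << bit);
        /* taps at X^5, X^4, X^3 and the constant term: mask 0x0039 */
        lfsr = (uint16_t)((lfsr << 1) ^ (fb ? 0x0039u : 0u));
    }
    return out;    /* descrambling applies the identical XOR stream */
}
```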
The Configuration and Capabilities module provides the storage for the registers within the configuration space, including the legacy PCI v3.0 Type 0 configuration space header, the legacy capabilities, the PCIe capability, power management, message signaled interrupts (MSIs), the PCIe extended capabilities, and the device serial number.
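These capability structures are chained as a linked list in configuration space, which enumeration software walks starting from the capability pointer at offset 0x34 of the Type 0 header. A minimal sketch, assuming a cfg_read8() helper for configuration-space reads:

```c
/* Sketch of walking the capability linked list exposed by the
 * Configuration and Capabilities module: offset 0x34 holds the
 * first capability pointer; each capability starts with an ID
 * byte followed by a next pointer. cfg_read8() is assumed. */
#include <stdint.h>
#include <stdio.h>

extern uint8_t cfg_read8(unsigned offset);

#define CAP_PTR_OFFSET 0x34u
#define CAP_ID_PM      0x01u   /* power management            */
#define CAP_ID_MSI     0x05u   /* message signaled interrupts */
#define CAP_ID_PCIE    0x10u   /* PCI Express capability      */

void dump_capabilities(void)
{
    uint8_t ptr = cfg_read8(CAP_PTR_OFFSET);
    while (ptr != 0) {
        uint8_t id   = cfg_read8(ptr);       /* capability ID */
        uint8_t next = cfg_read8(ptr + 1u);  /* next pointer  */
        printf("capability 0x%02X at offset 0x%02X\n", id, ptr);
        ptr = next;
    }
}
```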
The user application performs DMA over the LocalLink interface. LocalLink is a standard Xilinx IP interface used in the PCIe design for the data path and for the user flow control receive interface. It is a synchronous interface: signals are sampled only on clock edges. The LocalLink user interface is a high-performance, synchronous, Xilinx-standard point-to-point interface designed as the user interface to system interconnect solutions. The transfer of packets is defined by the protocol's signals, and LocalLink also serves as the bridge to other Xilinx IP.
The DMA technique transfers data to and from host CPU system memory efficiently and has many advantages over standard programmed I/O transfers: a bus master DMA (BMD) provides higher throughput and performance and lower overall CPU utilization. Two basic types of DMA application are found in systems using PCIe: a system DMA application and a BMD application [10]. Very few root complexes and operating systems support system DMA; the most common type of DMA found in PCIe-based systems is the BMD application. A BMD is an endpoint device containing the DMA engine that controls moving data to, or requesting data from, system memory. The BMD design connects to the transaction interface of the endpoint block for PCIe [4]. Memory Write TLPs transfer data from the endpoint into main memory via the RAM and memory controller. Programmed I/O is used for single data transfers, whereas the BMD engine is used for 32-bit or 64-bit data transfers. Memory Write and Memory Read TLPs are sent to the endpoint via the bus master DMA [7].
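To illustrate how a host might drive such a BMD engine, the sketch below programs a write DMA through BAR0-mapped control registers. The register names and offsets are modelled loosely on the XAPP1052 reference design but should be treated as illustrative, not as that design's actual register map.

```c
/* Hedged sketch of kicking off a BMD write DMA through BAR0-mapped
 * registers. Offsets and names are illustrative (loosely modelled
 * on the XAPP1052 BMD design); consult the real register map. */
#include <stdint.h>

static volatile uint32_t *bar0;   /* BAR0 region, already mapped */

#define REG_DCSR      (0x00 / 4)  /* device control/status (illustrative) */
#define REG_DDMACR    (0x04 / 4)  /* DMA control           (illustrative) */
#define REG_WDMATLPA  (0x08 / 4)  /* write DMA target address             */
#define REG_WDMATLPS  (0x0C / 4)  /* TLP size in DWORDs                   */
#define REG_WDMATLPC  (0x10 / 4)  /* TLP count                            */

void bmd_start_write_dma(uint32_t host_buf_phys,
                         uint32_t tlp_size_dw, uint32_t tlp_count)
{
    bar0[REG_DCSR]     = 1u;             /* assert, then release, reset */
    bar0[REG_DCSR]     = 0u;
    bar0[REG_WDMATLPA] = host_buf_phys;  /* where MWr TLPs should land  */
    bar0[REG_WDMATLPS] = tlp_size_dw;
    bar0[REG_WDMATLPC] = tlp_count;
    bar0[REG_DDMACR]   = 1u;             /* start the write DMA engine  */
}
```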

4 RESULTS

Fig. 3: Accessing the memory using PCITree

In Figure 3, "RAM, Memory Controller" is selected from the host CPU device list, and the configuration register space is then edited, with BAR0 selected as 64-bit memory. The number of configuration registers is 16. The memory space range for BAR0 is 2 KB, and the memory is edited so that the accessed data appears as shown in Figure 3. Using PCITree, data in the BAR space can be toggled, counted, and verified. The displayed memory contents can be updated with "refresh view", as shown in the figure, and "auto read memory" is selected for the memory window.
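PCITree reaches the endpoint's BAR through its Windows kernel driver; conceptually, the accesses amount to mapping BAR0 and reading or writing DWORDs. A rough Linux-flavoured equivalent is sketched below; the bus/device/function path is hypothetical.

```c
/* Sketch of BAR0 access from user space, the same idea PCITree
 * implements via its Windows driver. Shown here with the Linux
 * sysfs resource file; the PCI address 0000:01:00.0 is hypothetical. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR0_SIZE 2048   /* 2 KB BAR0, as configured in this design */

int main(void)
{
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar0 = mmap(NULL, BAR0_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (bar0 == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    bar0[0] = 0xCAFEBABE;                    /* write the first DWORD */
    printf("BAR0[0] = 0x%08X\n", bar0[0]);   /* read it back          */

    munmap((void *)bar0, BAR0_SIZE);
    close(fd);
    return 0;
}
```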



Fig. 4: Accessing the memory using PCITree

Figure 4 shows how the data is accessed in memory when the memory address is 0x00000000 and the memory count is 1 DWord, as shown in the figure. The memory display range is 1024 bytes. After selecting "count", a count of 17 DWords is displayed in the BAR space menu.

5 CONCLUSION

The Virtex-5 integrated PCIe endpoint block, combined with the GTP transceivers and block RAMs, provides an extremely high level of integration and builds a high-performance, fully compliant PCIe system in a single device.
The PCIe endpoint block achieves high-speed data transfer of 180 Mbps with 8B/10B encoding. Communication between the ML506 board and the host PC is carried out with the PCITree driver, with the board configured with the Endpoint Block Plus for PCIe and its BMD design. Indirect access to computer memory using the PCIe protocol is implemented. The FPGA can provide a fully integrated PCIe solution, and the DMA engine can achieve higher bus throughput and lower CPU utilization.

REFERENCES

[1] Peripheral Component Interconnect Special Interest Group, "PCI Express External Cabling 1.0 Specification", http://www.pcisig.com/specifications/pciexpress/pcie_cabling1.0

[2] Intel white paper, "Advanced Switching for the PCI Express Architecture", www.intel.com, 2002.

[3] Endpoint Block Plus v1.10 for PCI Express, DS551, April 24, 2009.

[4] Peripheral Component Interconnect Special Interest Group, "PCI Express Base Specification 1.1", http://www.pcisig.com/specifications/pciexpress/base

[5] PCI-SIG, PCI Express Base Specification Revision 1.0a, PCI-SIG, 2003.

[6] Virtex-5 Family Overview, DS100 (v5.0), February 6, 2009.

[7] Virtex-5 FPGA Integrated Endpoint Block for PCI Express Designs, UG197 (v1.5), July 22, 2009.

[8] LogiCORE IP Endpoint Block Plus v1.10 for PCI Express, UG341, April 24, 2009.

[9] Liu N., Design and Implementation of High-Speed Data Transfer System Based on PCI Express Interface, Taiyuan: Middle North University, 2008: 1-5 (in Chinese).

[10] Jake Wiltgen and John Ayer, Bus Master DMA Performance Demonstration Reference Design for the Xilinx Endpoint PCI Express Solutions, XAPP1052, November 4, 2010.
