Altera FPGA PCI Express core with Chaining DMA
Field-programmable gate arrays (FPGAs) increasingly come with a PCI Express implementation, either as a soft core (i.e. in programmable logic) or as a hard core (i.e. as silicon on-chip). Altera's PCI Express Megacore can be used to instantiate such a core for most FPGAs, such as Cyclone II and III, Arria I and II, and Stratix II, III and IV. It uses either the on-board transceivers or an external PCIe PHY chip.
The Megacore will instantiate an example end point reference design called "Chaining DMA" which includes a small on-chip memory, as well as a chaining DMA controller that fetches a descriptor table from Root Complex memory and performs the DMA copies in the table.
This Linux device driver controls the Chaining DMA application and acts as a working reference design.
- The driver has appeared in the drivers/staging area of the linux-next GIT tree.
- Implement the character device file operations, mapping read and write to synchronous DMA transfers.
- Implement asynchronous building of descriptor tables while the DMA engine is running.
- Very basic character device interface.
- Altera Cyclone II PCI Express development board called "Sendero", with a Philips/NXP PX1011A x1 PHY.
- Altera's Arria GX PCIe development board, http://www.altera.com/products/devkits/altera/kit-arriagx.html
- Numerous other boards
PCI Express core configuration
- The current driver-in-development targets cores generated with the PCI Express Compiler version 8.1. The goal is to detect and support newer versions.
- The core can be configured as "Legacy" or "Native" PCI Express End Point. Goal is to support both options.
- BAR size must be configured for 32 KiB or more.
- BAR size must be configured for 256 bytes or more. This is where the Root Complex (i.e. CPU) memory address of the DMA descriptor tables is written and where the DMA is initialized.
- BAR address sizes can be 32-bit or 64-bit. Goal is to properly detect and support both.
- The Device ID and Vendor ID must be left unchanged, i.e. 0x???? and 0x????. You may change the Subsystem Vendor ID and Subsystem Device ID to your liking.
- PCI Express Compiler User Guide 8.0, especially chapter 7 applies. http://www.altera.com/literature/ug/ug_pci_express.pdf
- AlteraForum postings, especially this thread: http://www.alteraforum.org/forum/showthread.php?t=2987&page=1
DMA Header (in End Point memory BAR)
|address||field||DMA Read or Write||comment|
|0x00||Global Control & Number of Descriptors||W|
|0x04||Bus Address (upper) of Descriptor Table||W||Points to a table in Root Complex memory|
|0x08||Bus Address (lower) of Descriptor Table||W|
|0x0c||Reserved & Last Descriptor Available (RCLAST)||W||RCLAST = 0 means descriptor #0 is ready for processing by the End Point|
|0x10||Global Control & Number of Descriptors||R|
|0x14||Bus Address (upper) of Descriptor Table||R|
|0x18||Bus Address (lower) of Descriptor Table||R|
|0x1c||Reserved & Last Descriptor Available (RCLAST)||R||RCLAST = 0 means descriptor #0 is ready to be acted-upon by the End Point|
- The fields must be written by DWORD writes, i.e. in Linux use iowrite32().
- The fields are write-only. Reading from these addresses will return a PCIe error (this can hang your system!).
- Writing to the 0x0c or 0x1c location starts the corresponding DMA operation.
- Does the design support DMA read and write operation concurrently?
- The Root Complex may increment RCLAST during the DMA transfer (this is not tested yet).
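The write ordering described above can be sketched in user-space C. This is only an illustration under stated assumptions: `mock_iowrite32()` is a hypothetical stand-in for the kernel's iowrite32() that stores into an array modelling the header BAR, the `HDR_*` names are made up for this example, and the control/descriptor-count word is passed in as an opaque value because its bit layout is not given here. The one point it tries to show is that the RCLAST word is written last, since that write starts the DMA.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets inside the header BAR, taken from the table above.
 * The names are hypothetical; only the offsets come from the table. */
enum {
    HDR_DMA_WRITE_BASE = 0x00, /* header block marked "W" */
    HDR_DMA_READ_BASE  = 0x10, /* header block marked "R" */
    HDR_CTRL_NDESC     = 0x00, /* Global Control & Number of Descriptors */
    HDR_TABLE_UPPER    = 0x04, /* Bus Address (upper) of Descriptor Table */
    HDR_TABLE_LOWER    = 0x08, /* Bus Address (lower) of Descriptor Table */
    HDR_RCLAST         = 0x0c  /* Reserved & RCLAST; writing it starts DMA */
};

/* Hypothetical stand-in for iowrite32(): one DWORD store into an array
 * that models the BAR. A real driver would write to the mapped BAR. */
static void mock_iowrite32(uint32_t value, uint32_t *bar, unsigned offset)
{
    assert(offset % 4 == 0); /* fields must be written as whole DWORDs */
    bar[offset / 4] = value;
}

/* Program one DMA header block. ctrl_ndesc is treated as opaque: its
 * bit layout is an assumption left to the caller. */
static void dma_header_program(uint32_t *bar, unsigned base,
                               uint32_t ctrl_ndesc,
                               uint64_t table_bus_addr, uint32_t rclast)
{
    mock_iowrite32(ctrl_ndesc, bar, base + HDR_CTRL_NDESC);
    mock_iowrite32((uint32_t)(table_bus_addr >> 32), bar,
                   base + HDR_TABLE_UPPER);
    mock_iowrite32((uint32_t)(table_bus_addr & 0xffffffffu), bar,
                   base + HDR_TABLE_LOWER);
    /* Written last on purpose: this write kicks off the transfer. */
    mock_iowrite32(rclast, bar, base + HDR_RCLAST);
}
```

Calling `dma_header_program(bar, HDR_DMA_READ_BASE, ...)` fills offsets 0x10 to 0x1c and leaves the 0x00 to 0x0c block untouched. Note that the sketch never reads the header back, in line with the write-only restriction.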
DMA Table (in Root Complex memory)
Each table starts with four 32-bit words (16 bytes) in which the DMA controller writes its progress, followed by an array of descriptors, each four 32-bit words (16 bytes) in size.
|0x0c||Reserved & Last Descriptor Completed (EPLAST)||R/W|
|0x10||Control & Transfer Length (DWORDS)||R/W||Descriptor #0|
|0x14||End Point address||R/W||Descriptor #0|
|0x18||Bus Address (msb) for Root Complex memory||R/W||Descriptor #0|
|0x1c||Bus Address (lsb) for Root Complex memory||R/W||Descriptor #0|
|0x20||Control & Transfer Length (DWORDS)||R/W||Descriptor #1|
|0x24||End Point address||R/W||Descriptor #1|
|0x28||Bus Address (msb) for Root Complex memory||R/W||Descriptor #1|
|0x2c||Bus Address (lsb) for Root Complex memory||R/W||Descriptor #1|
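The layout in the table can be mirrored by a pair of C structures. The following is a minimal user-space sketch, not the driver's actual definitions: the struct and field names are hypothetical, the first three header words are left as `reserved` because the rows shown start at 0x0c, and placing the DWORD count in the low bits of the first descriptor word is an assumption that should be checked against the user guide. A real driver would also convert each word to little-endian (e.g. with cpu_to_le32()) before the End Point reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One descriptor: four 32-bit words, 16 bytes. Names are hypothetical. */
struct chdma_desc {
    uint32_t ctrl_len_dwords; /* Control & Transfer Length (DWORDS) */
    uint32_t ep_addr;         /* End Point address */
    uint32_t rc_addr_msb;     /* Bus Address (msb) for Root Complex memory */
    uint32_t rc_addr_lsb;     /* Bus Address (lsb) for Root Complex memory */
};

/* Whole table: 16-byte progress header followed by the descriptors. */
struct chdma_table {
    uint32_t reserved[3];     /* 0x00..0x08: not described in the rows shown */
    uint32_t eplast;          /* 0x0c: Reserved & Last Descriptor Completed */
    struct chdma_desc desc[255];
};

/* Compile-time checks that the structs match the offsets in the table. */
_Static_assert(sizeof(struct chdma_desc) == 16, "descriptor is 16 bytes");
_Static_assert(offsetof(struct chdma_table, eplast) == 0x0c, "EPLAST at 0x0c");
_Static_assert(offsetof(struct chdma_table, desc) == 0x10, "desc #0 at 0x10");

/* Fill one descriptor. ASSUMPTION: the length in DWORDs sits in the low
 * bits and control_bits is already shifted above it by the caller. */
static void chdma_desc_fill(struct chdma_desc *d, uint32_t control_bits,
                            uint32_t len_bytes, uint32_t ep_addr,
                            uint64_t rc_bus_addr)
{
    assert(len_bytes % 4 == 0); /* length is expressed in DWORDs */
    d->ctrl_len_dwords = control_bits | (len_bytes / 4);
    d->ep_addr = ep_addr;
    d->rc_addr_msb = (uint32_t)(rc_bus_addr >> 32);
    d->rc_addr_lsb = (uint32_t)(rc_bus_addr & 0xffffffffu);
}
```

With these definitions descriptor #1 starts at byte 0x20 of the table, and a 4096-byte transfer is stored as a length of 1024 DWORDs.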
- The whole table may not exceed 4096 bytes or cross a 4096-byte boundary. pci_alloc_consistent(..., 4096, ...) will do that for us.
- 4096 bytes holds 255 descriptors: (4096 - 16) / 16 = 255. If each descriptor describes a 4096-byte copy, one DMA operation moves 255 * 4096 = 1,044,480 bytes, which is just under 1 MiB.
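The sizing rules above can be checked with a few lines of arithmetic. This is a hedged user-space sketch; the macro and function names are hypothetical and only the numbers (4096-byte limit, 16-byte header, 16-byte descriptors) come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define CHDMA_TABLE_LIMIT  4096u /* table may not exceed or cross this */
#define CHDMA_HEADER_BYTES 16u   /* four 32-bit progress words */
#define CHDMA_DESC_BYTES   16u   /* four 32-bit words per descriptor */

/* Largest number of descriptors that fits in one 4096-byte table. */
static unsigned chdma_max_descriptors(void)
{
    return (CHDMA_TABLE_LIMIT - CHDMA_HEADER_BYTES) / CHDMA_DESC_BYTES;
}

/* Bytes moved by one DMA operation if every descriptor copies
 * bytes_per_desc bytes. */
static uint64_t chdma_max_bytes_per_op(uint32_t bytes_per_desc)
{
    return (uint64_t)chdma_max_descriptors() * bytes_per_desc;
}

/* Return 1 if a table with n_desc descriptors placed at bus_addr stays
 * within 4096 bytes and does not cross a 4096-byte boundary, else 0. */
static int chdma_table_fits(uint64_t bus_addr, unsigned n_desc)
{
    uint64_t size = CHDMA_HEADER_BYTES + (uint64_t)n_desc * CHDMA_DESC_BYTES;

    if (size > CHDMA_TABLE_LIMIT)
        return 0;
    /* First and last byte must lie in the same 4096-byte block. */
    return (bus_addr / CHDMA_TABLE_LIMIT) ==
           ((bus_addr + size - 1) / CHDMA_TABLE_LIMIT);
}
```

A full 255-descriptor table only fits when it starts exactly on a 4096-byte boundary, which is why a page-sized, page-aligned allocation is convenient here.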
Linux Kernel APIs
Kernel configuration (Kconfig) entry
config ALTPCIECHDMA
	tristate "Altera PCI Express Chaining DMA Test Driver"
	---help---
	  The Altera PCIe Chaining DMA test driver will perform tests against
	  FPGA/ASIC devices that have Altera's PCI Express core with the
	  Chaining DMA application generated by the Megacore. Devices range
	  from Cyclone II FPGA with soft PCIe IP core up to a Stratix IV with
	  a silicon PCIe core.

	  This driver controls the DMA engine by performing DMA transfers in
	  loop-back fashion and doing memory compares to verify the loop-back
	  was successful. The driver acts as a test driver to verify your
	  PCIe core. It may be used as a basis for your custom logic.
Kernel Makefile entry
obj-$(CONFIG_ALTPCIECHDMA) += altpciechdma.o