Eclypse DDR Streaming Project User Manual

This document describes an embedded measurement system that can be used to perform simple manual acquisitions and that can be extended by users to prototype their own applications.

Overview

  • Support all variants of the Zmod Scope, Digitizer, and AWG by splitting system functionality into modules that can be easily added, removed, or duplicated by users familiar with Zynq development.
  • Be modular enough that users familiar with Zynq development can make meaningful changes to the project.
    • Users should be able to add custom DSP elements to the design between modules with well-documented interfaces.
    • Users should be able to add custom trigger functionality or control trigger functionality from custom DSP elements at the highest level of block design hierarchy.
  • Provide sufficient base level specifications to capture meaningful real-world signals from real sensors (ideally 100+ ms record lengths).
  • Support the maximum sample rate achievable by the Zmod hardware with documented minimal additional configuration performed by users familiar with Zynq development. Support the widest variety of hardware possible without additional configuration.
  • Expose all hardware configuration interfaces at the software level.
  • Users should be able to export data to host PC software for further analysis; this export does not need to support full-sample-rate streaming speeds.
  • Support taking multiple acquisitions without rebooting or reprogramming.
  • For AWG output, user control should be implemented to turn off the sample application when finished, to help avoid inadvertent shorts.

The system will be provided to users as a single preconfigured project, consisting of a downloadable prebuilt Vivado project archive file and a Vitis workspace. The workspace contains a single platform, based on a hardware specification exported from that Vivado project, along with multiple systems and applications to be described later.

Users will be able to run and debug each software application on their own hardware out of the box, with no changes, under a specified subset of hardware configurations. Minimal changes will be required to run with other hardware configurations. To this end, block designs and software applications must be modular and hierarchical, such that users can tweak the internal workings of high-level modules by changing IP parameters or modifying drivers, add new functionality (“User IP”) to block designs, and quickly copy, duplicate, remove, or replace functionality related to their chosen Zmods. The processes required to switch hardware configurations (i.e., replacing a Zmod with another Zmod, or removing a Zmod) must be documented.

The default hardware configuration will support a Zmod Scope 1410-105 and Zmod AWG 1411-105. The sample rates for both input and output will be fixed to 100 MS/s by default, with well-documented instructions for increasing sample rate.

Block design hierarchies, Vivado IP, and RTL modules will be used to implement hardware modules consistent with these goals.

Software drivers will be provided for each AXI-connected IP and for each hierarchy. Each driver consists of header and source files placed in a dedicated subdirectory of application sources, so that it can easily be copied into another project. Software must be designed so that hardware configurations with multiple pipelines of the same type can stream data simultaneously, which means drivers must not contain long blocking functions.
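One way to avoid long blocking functions is to give each pipeline a service routine that does a small, bounded amount of work per call. The sketch below illustrates the idea only: `pipeline_t`, `pipeline_service`, and `service_all` are hypothetical names, and decrementing a counter stands in for handling a completed DMA block on real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pipeline bookkeeping; a real driver would track
 * scatter-gather block descriptors instead of plain counters. */
typedef struct {
    uint32_t blocks_remaining; /* blocks not yet completed in this transfer */
    uint32_t blocks_serviced;  /* blocks handled so far */
} pipeline_t;

/* Do a bounded amount of work: handle at most one block per call.
 * Returns true once the transfer has finished. */
bool pipeline_service(pipeline_t *p)
{
    if (p->blocks_remaining > 0) {
        p->blocks_remaining--;
        p->blocks_serviced++;
    }
    return p->blocks_remaining == 0;
}

/* Make one pass over every pipeline so that none is starved.
 * Returns the number of pipelines that are still active. */
size_t service_all(pipeline_t *pipelines, size_t count)
{
    size_t active = 0;
    for (size_t i = 0; i < count; i++) {
        if (!pipeline_service(&pipelines[i])) {
            active++;
        }
    }
    return active;
}
```

An application main loop would call `service_all` repeatedly, interleaved with other work such as checking for user input, until it returns zero.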

The Vitis workspace will only include baremetal software applications, and further Petalinux support should only be considered after baremetal software is functional, to limit initial project complexity.

Architecture Overview

The following modules will be provided. Each is described in detail in a later section of this document.

Note: The word module is used loosely to refer to a collection of functionalities and does not imply how it will be packaged (i.e., not necessarily a Verilog module).

  • Circular buffers implemented in software.
  • PS-PL and PL-PS data transfer hierarchies with associated software drivers.
  • A trigger generator and detector implementing several simple triggers including a software manual trigger and basic level triggers with support for user-defined triggers.
  • Several front-end hierarchies, such that each Zmod is supported by exactly one (but one could cover multiple Zmods where viable), with associated software drivers, containing:
    • A low-level IP including ports for the Syzygy connector, configuration interfaces, and an AXI4-Stream interface for data
    • An AXI Adapter connecting configuration interfaces to AXI4-Lite
  • Additional drivers and software utilizing dpmutil (Digilent platform management utility) to enumerate connected Zmods and read data including calibration coefficients from SYZYGY DNA.

Provided Software Applications

Several software applications will be included in the workspace:

  • Acquisition example
    • Performs one single-shot transfer and prints the data to a serial console.
  • Generation example
    • Fills an output buffer with a sine waveform and repeatedly plays it back over the I/O until user input halts it.
  • DNA reader
    • Enumerates connected Zmods and prints calibration coefficients read from SYZYGY DNA.

Acquisition Sequence

Boot sequence

  1. The board finishes booting and the bitstream is loaded.
  2. The Scope or Digitizer low-level IP comes out of reset and performs the initial configuration of the Zmod connected to the port.
  3. Completion of initialization is indicated to the PS by a done status bit becoming asserted.

Acquisition Sequence

Multiple acquisitions must be able to be performed in sequence, such that the bitstream does not need to be reprogrammed every time. As such an acquisition must return the PL hardware to a known state where another acquisition can be performed. An acquisition is performed as follows:

  1. The PS performs additional initial configuration, including setting Zmod registers via SPI and setting PL configuration registers connected to I/O lines.
  2. The PS initializes software buffers.
  3. The PS sets trigger configuration registers and remembers the intended window position.
  4. The PS initiates a PL-PS transfer.
  5. The PS enables the low-level IP AXI stream interface allowing data to flow through the pipeline, through the trigger detect module, into the DMA.
  6. The PS repeatedly calls PL-PS management functions or receives interrupts to manage scatter gather block descriptors and watch for an end-of-transfer condition.
    • If a manual trigger is used, the PS does this while also checking for user input, then setting the PL manual trigger register.
  7. Once the trigger has been received and the trigger module has injected a tlast signal, the PS receives the final transfer. By this point the trigger module has automatically begun discarding incoming samples to keep the pipeline clear; setting its configuration registers at the start of the next transfer will activate it again.
  8. The PS performs any desired post-processing on the data, finding the true trigger position in the block with the trigger, finding the head and tail of the buffer in the head and tail blocks, and sending data to a Host PC for further analysis.
  9. The PS cleans up the scatter-gather BDs and gets ready to perform another transfer.
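The post-processing in step 8 includes putting a circular buffer back into time order. The following is a minimal sketch of that one step, assuming a flat buffer of 32-bit beats and a known index for the final beat written; `ring_linearize` and its parameters are hypothetical names, and locating the true trigger position within a block is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a circular buffer into time order, oldest beat first.
 *   ring       : circular buffer holding `len` beats
 *   last_index : index of the newest beat (the one that carried tlast)
 *   out        : destination with room for `len` beats
 */
void ring_linearize(const uint32_t *ring, size_t len,
                    size_t last_index, uint32_t *out)
{
    if (len == 0) {
        return; /* nothing to copy, and avoids a modulo by zero */
    }
    /* The oldest beat sits immediately after the newest one. */
    size_t head = (last_index + 1) % len;
    for (size_t i = 0; i < len; i++) {
        out[i] = ring[(head + i) % len];
    }
}
```

Copying is the simplest approach; software that cannot afford a second buffer could instead index the original buffer through the same `(head + i) % len` mapping.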

Wavegen Sequence

The wavegen sequence is somewhat simpler, as the PS is in control of the data source.

Boot Sequence

  1. The board finishes booting and the bitstream is loaded.
  2. The AWG low-level IP comes out of reset and performs initial configuration, which is indicated to the PS by a done status bit becoming asserted.

Wavegen Sequence

  1. The PS performs additional initial configuration, including setting Zmod registers via SPI and setting PL configuration registers connected to I/O lines.
  2. The PS initializes a software circular buffer and fills it with a chosen waveform generated in software.
  3. The PS initiates a PS-PL transfer.
  4. The PS services the scatter-gather engine by enqueueing new block descriptors whenever needed and waits for user input indicating that the output should be disabled.
  5. The PS-PL transfer is gracefully halted, resources are cleaned up, and the last data flows out through the Zmod.
  6. FIXME Is there an output enable signal which should be turned off?
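Step 2 of the wavegen sequence fills a buffer with a waveform generated in software. The sketch below fills one period of a sine wave using Bhaskara I's rational approximation, chosen here only to keep the example integer-only; an application could equally use `sinf` from `math.h`. The function names are hypothetical, and the packing of the same 16-bit sample into both halves of each 32-bit beat is an assumption, since the front-end hierarchy defines the real packing.

```c
#include <stddef.h>
#include <stdint.h>

/* Bhaskara I approximation of amplitude * sin(pi * p / half),
 * valid for 0 <= p <= half. Integer arithmetic only. */
static int32_t half_sine(int64_t p, int64_t half, int32_t amplitude)
{
    int64_t t = p * (half - p);
    return (int32_t)((16 * t * amplitude) / (5 * half * half - 4 * t));
}

/* Fill `buf` with one period of a sine wave.
 * Assumed packing (not from the source): the same signed 16-bit sample
 * is written to bits 15:0 and bits 31:16 of every beat. */
void fill_sine(uint32_t *buf, size_t period, int16_t amplitude)
{
    if (period < 2) {
        return; /* too short to hold a waveform */
    }
    int64_t half = (int64_t)(period / 2);
    for (size_t i = 0; i < period; i++) {
        int64_t p = (int64_t)i;
        /* The second half of the period mirrors the first, negated. */
        int32_t s = (p < half) ? half_sine(p, half, amplitude)
                               : -half_sine(p - half, half, amplitude);
        uint16_t raw = (uint16_t)(int16_t)s;
        buf[i] = ((uint32_t)raw << 16) | raw;
    }
}
```

Because the buffer holds exactly one period, it can be replayed end to end without a discontinuity.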

PS-PL & PL-PS Data Transfer

These hierarchies are responsible for moving data between AXI stream interfaces in the FPGA fabric and DDR memory.

They should be capable of sinking or sourcing data at the full sample rate of the fastest Zmod (125 MS/s at time of writing) through its AXI4-Stream interface. They should support arbitrarily long data transfers through the AXI4-Stream interface, so that circular buffers in DDR memory can be used. They should be capable of handling a single, fully saturated 32-bit AXI4-Stream interface clocked at the sample rate.
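These requirements translate into concrete data rates and buffer sizes. The helpers below are a small worked calculation, assuming one 32-bit beat per sample clock as described above; the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes per second carried by a saturated stream that transfers one
 * beat of `beat_bytes` bytes on every sample clock. */
uint64_t stream_bytes_per_second(uint64_t sample_rate_hz, uint32_t beat_bytes)
{
    return sample_rate_hz * beat_bytes;
}

/* Bytes of DDR memory needed to hold `record_ms` milliseconds of a
 * saturated stream. */
uint64_t record_bytes(uint64_t sample_rate_hz, uint32_t beat_bytes,
                      uint32_t record_ms)
{
    return stream_bytes_per_second(sample_rate_hz, beat_bytes) * record_ms / 1000u;
}
```

For example, a 125 MS/s stream of 4-byte beats carries 500,000,000 bytes per second, and a 100 ms record at the default 100 MS/s occupies 40,000,000 bytes.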

In order to meet these requirements, each hierarchy consists of a Xilinx AXI DMA IP core, configured in scatter-gather mode, and a clock converter.

Two techniques are used to increase the bandwidth of the AXI S2MM/MM2S bus, to account for the combination of AXI4 protocol overhead (in the form of address/control and response beats) and a fully saturated data interface.

First, the AXI DMA operates in a faster clock domain than the data interface. An AXI4-Stream Clock Converter IP is used to transfer data between clock domains.

Second, the AXI DMA's memory map data width is 64 bits, to match the width of the Zynq PS HP AXI interfaces, and to effectively double the overall data bandwidth of the DMA as compared to using a 32-bit default.

This 64-bit data width (without allowing unaligned transfers) means that software buffers must be aligned to 8-byte boundaries.
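A statically allocated buffer can be given the required alignment with the C11 `alignas` specifier. This is a minimal sketch: the buffer name, its size, and the helper functions are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGN_BYTES 8u  /* follows from the 64-bit memory-mapped data width */
#define BUFFER_BEATS 1024u  /* hypothetical buffer size, in 32-bit beats */

/* The buffer's start address is forced onto an 8-byte boundary. */
static alignas(DMA_ALIGN_BYTES) uint32_t acquisition_buffer[BUFFER_BEATS];

/* Returns nonzero if `ptr` is suitably aligned for the DMA. */
int is_dma_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % DMA_ALIGN_BYTES) == 0;
}

/* Accessor so other source files can obtain the buffer. */
uint32_t *get_acquisition_buffer(void)
{
    return acquisition_buffer;
}
```

A check such as `is_dma_aligned` is cheap enough to run before every transfer, which catches buffers that were offset by an odd number of 32-bit beats.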

The hierarchy is also responsible for moving data between a sample clock domain used by trigger hardware and the low-level frontend and a clock domain used by the PS slave interfaces.

In order to ensure that bandwidth is available, a single PS HP AXI slave port will be allocated to each data transfer hierarchy’s S2MM/MM2S interface. With multiple hierarchies present in the system, their scatter-gather ports will share a single HP interface. FIXME it should be determined whether the scatter-gather interface may get too congested when using multiple hierarchies simultaneously.


Trigger Detector

The trigger detector takes an AXI4-Stream input, provides an AXI4-Stream output, and accepts a 32-bit trigger input. All of these interfaces are synchronous to one another. Several ports controlled by configuration registers are used by the PS to control the system.

The trigger detector starts out in an idle state, where data is allowed to flow in through the input interface and is then discarded. This allows the front end to start up at any time while maintaining the requirement that data provided to the S2MM/MM2S transfer cores be consecutive. Once the PS issues a start signal, the trigger detector enters a prebuffer state, where a PS-specified amount of data is passed through. Prebuffering data before accepting trigger signals ensures that a complete buffer of data will be acquired. Once prebuffering is complete, the trigger detector enters an await state, where it continues passing data downstream while waiting for any enabled trigger line to be asserted. When one is asserted, it enters a fill state, where it counts out a final number of beats before asserting tlast, completing the transfer, and returning to idle. See the state diagram below:

[State diagram: trigger detector states idle, prebuffer, await, and fill]

The diagrams below, which “unroll” the state machine over time, demonstrate why the prebuffer and postfill counts must be accurate in order to capture an entire acquisition regardless of when the trigger event might occur relative to the start signal.

From the perspective of the AXI4-Stream signals, the trigger detector is responsible for holding the input interface's tready signal high at all times, forwarding tvalid and tdata from the input interface to the output interface while transfers are ongoing, and asserting the tlast signal as specified by the count ports.

The PS uses configuration ports to specify how long the prebuffer and fill states last, in clock cycles. Note that the prebuffer state is shortened internally to the trigger detector to account for some pipeline latency between the control path (state machine and counter) and the data path (input stream interface to output stream interface). This “shortening” is transparent to the software: for a specific buffer length, the software should set the fill count (“TRIGGERTOLASTBEATS”) and prebuffer count (“PREBUFFERBEATS”) so that they sum to the buffer length. This also means that a prebuffer length of less than 2 should not be specified.
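The rule that the two counts sum to the buffer length can be captured in a small helper. This is a sketch under stated assumptions: the struct and function names are hypothetical, and requiring at least one beat after the trigger is a conservative choice made here, not something the hardware description states.

```c
#include <stdint.h>

typedef struct {
    uint32_t prebuffer_beats;       /* value for PREBUFFERBEATS */
    uint32_t trigger_to_last_beats; /* value for TRIGGERTOLASTBEATS */
} trigger_counts_t;

/* Split a buffer of `buffer_beats` into prebuffer and fill counts so
 * that `pre_trigger_beats` beats precede the trigger.
 * Returns 0 on success, -1 if the request is invalid. */
int trigger_counts_from_position(uint32_t buffer_beats,
                                 uint32_t pre_trigger_beats,
                                 trigger_counts_t *out)
{
    /* A prebuffer length below 2 must not be specified. */
    if (pre_trigger_beats < 2u) {
        return -1;
    }
    /* Assumption: keep at least one beat after the trigger. */
    if (pre_trigger_beats >= buffer_beats) {
        return -1;
    }
    out->prebuffer_beats = pre_trigger_beats;
    out->trigger_to_last_beats = buffer_beats - pre_trigger_beats;
    return 0;
}
```

For a 1024-beat buffer with the trigger placed a quarter of the way in, this yields a prebuffer count of 256 and a fill count of 768.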

The trigger_enable port is used to mask incoming trigger signals, such that only triggers on lines where the enable is '1' count. The trigger_detected port is connected to a register that stores the trigger event that caused the state to transition from await to fill, such that it can be used to detect which trigger occurred when multiple are enabled.
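The behavior described in this section can be summarized with a software model that processes one input beat per call. This is an illustrative sketch, not the RTL: all names are hypothetical, pipeline latency is ignored, and the exact beat on which each transition happens is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TD_IDLE, TD_PREBUFFER, TD_AWAIT, TD_FILL } td_state_t;

typedef struct {
    td_state_t state;
    uint32_t prebuffer_beats;       /* length of the prebuffer state */
    uint32_t trigger_to_last_beats; /* length of the fill state */
    uint32_t trigger_enable;        /* mask of trigger lines that count */
    uint32_t trigger_detected;      /* lines that caused await -> fill */
    uint32_t count;                 /* beats counted in the current state */
} trigger_detector_t;

/* Models the PS issuing the start signal. */
void td_start(trigger_detector_t *td)
{
    td->state = TD_PREBUFFER;
    td->count = 0;
    td->trigger_detected = 0;
}

/* Process one input beat. The input is always accepted (tready high).
 * Returns true if the beat is forwarded downstream; *tlast is set on
 * the final beat of the transfer. */
bool td_step(trigger_detector_t *td, uint32_t trigger, bool *tlast)
{
    *tlast = false;
    switch (td->state) {
    case TD_IDLE:
        return false; /* accepted but discarded */
    case TD_PREBUFFER:
        if (++td->count >= td->prebuffer_beats) {
            td->state = TD_AWAIT;
        }
        return true;
    case TD_AWAIT: {
        uint32_t hits = trigger & td->trigger_enable;
        if (hits != 0) {
            td->trigger_detected = hits; /* remember which lines fired */
            td->state = TD_FILL;
            td->count = 0;
        }
        return true;
    }
    case TD_FILL:
        if (++td->count >= td->trigger_to_last_beats) {
            *tlast = true;
            td->state = TD_IDLE;
        }
        return true;
    }
    return false;
}
```

Triggers that arrive during the prebuffer state are ignored by this model, which matches the requirement that a complete buffer is always acquired.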


Trigger Generator

An example trigger generator module is provided, which performs simple DSP on incoming signals to generate the trigger signal accepted by the detector. This generator implements the following triggers:

  • A software-controlled manual trigger.
  • Rising and falling level triggers for both data channels.
    • Only 2's complement data format is supported.

Additional user configuration registers are provided to control user-extended triggering logic.
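As an illustration of the level triggers, the sketch below models them in software for signed (2's complement) samples. It assumes that a rising trigger fires when a sample reaches or exceeds the level after the previous sample was below it, and the reverse for falling; this definition and all names are assumptions, not taken from the generator's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int16_t level;    /* trigger threshold, 2's complement */
    int16_t previous; /* last sample seen */
    bool primed;      /* false until the first sample has been seen */
} level_trigger_t;

/* Fires when the signal crosses the level going upward. */
bool level_trigger_rising(level_trigger_t *t, int16_t sample)
{
    bool fired = t->primed && t->previous < t->level && sample >= t->level;
    t->previous = sample;
    t->primed = true;
    return fired;
}

/* Fires when the signal crosses the level going downward. */
bool level_trigger_falling(level_trigger_t *t, int16_t sample)
{
    bool fired = t->primed && t->previous > t->level && sample <= t->level;
    t->previous = sample;
    t->primed = true;
    return fired;
}
```

Comparing against the previous sample means the trigger fires once per crossing rather than on every sample beyond the level.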

FIXME more info


Low-level IP and Front-end Hierarchy

The frontend hierarchy is responsible for implementing an interface between the Syzygy connector and an AXI4-Stream interface. It should also provide interfaces to all other configuration interfaces for the Zmod. It performs initial configuration of the Zmod at time of boot and then hands over control of the serial interface used for configuration to an indirect access protocol (IAP) port based on the AXI4-Stream protocol.

DSP elements are included which perform corrections based on calibration constants provided to the hierarchy from the PS, which can be read using the DNA reader application and manually entered into application software by users. The hierarchy is responsible for defining the packing of samples in 32-bit AXI4-Stream data beats. It has:

  • a Syzygy interface to physical pins
  • an AXI4-Stream interface for data, which is not enabled until the PS has received and acknowledged an indication that initial configuration has been performed
    • tvalid will not be asserted until software has polled the ready / init done bit, seen it to be high, and asserted the software enable signal
  • IAP port for issuing commands accessible through an AXI4-Lite interface
  • additional configuration ports accessible through an AXI4-Lite interface
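The enable handshake in the list above can be sketched from the software side as a bounded poll followed by a register write. The register indices and bit positions below are hypothetical placeholders, not the real register map, and the poll limit keeps the function from blocking indefinitely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define FE_STATUS_REG            0u         /* word index of status register */
#define FE_CONTROL_REG           1u         /* word index of control register */
#define FE_STATUS_INIT_DONE      (1u << 0)  /* ready / init done bit */
#define FE_CONTROL_STREAM_ENABLE (1u << 0)  /* software enable bit */

/* Poll the init done bit up to `max_polls` times. If it is seen high,
 * assert the software enable and return true; otherwise return false
 * and leave the stream disabled. */
bool frontend_enable_stream(volatile uint32_t *regs, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (regs[FE_STATUS_REG] & FE_STATUS_INIT_DONE) {
            regs[FE_CONTROL_REG] |= FE_CONTROL_STREAM_ENABLE;
            return true;
        }
    }
    return false;
}
```

On hardware, `regs` would point at the base address of the AXI4-Lite interface; in a test it can point at an ordinary array.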

This functionality will be split into at least two IP cores, so that the low-level IP can provide simple handshaking interfaces for use in PL-only projects.

The low-level module is responsible for managing the timing and location constraints applied to the Syzygy interface, interacting with the board files to pull any information common to multiple Zmods into the IP configuration at the time that the script is run.

In order to limit user constraint management requirements when using Zmods with less-than-14-bit resolution, a slice IP is used to cut the relevant bits out of the Zmod Scope data bus.