Things to remember

  • Per Oliver K -- need one RT per chamber and one time offset per mezz card
  • Consider sector logic in same ATCA crate with MDT trigger --> backplane comms?
  • Dan's segment finder could pretty easily output multiple segments per ROI
  • Confirm that 200kHz ROI are all ATLAS and can be divided by 32 for A/C and sectors
  • Summary for ATLAS ATCA from EDMS ATU-GE-ES-001
    • Max power 400W (blade) + 50W (RTM)
    • Dual star backplane OK (full mesh option)
    • Prefer fiber to each blade for TTC
    • GbE base interface required
    • LAPP IPMC (presumably CERN is OK)?



Preparation for Dan's talk. Some observations:

In the TDR on page 190, it is mentioned that the blade FPGA is VU125 and the mezz card has 3x KU115. Wu proposes a KU15P + ZU11EG.


Main blade FPGA (not Proto0) could be KU15P-A1760 which has a total of 76 MGT. Started on mechanical dwg with 3 mezz cards in ohm:Proj/ATLASP2/Trigger



  • Ports
    • 1000Base-T (IPMC)
    • 1000Base-T (RJ-45)
    • 1000Base-T (ATCA 1)
    • 1000Base-T (ATCA 2)
    • RGMII/SGMII (Zynq)
  • Switch candidates:
PartDatasheetUsed OnNotes
Broadcom BCM53134M NDA ICL Serenity works fine but hard to get docs
Microchip KSZ9897 Yes
Microsemi VSC7512 NDA Wisconsin prototype plan for WI mezzanine board

CERN IPMC has on-board PHY (100Base-TX). They suggest a TI dumb switch TS3L110 with control by an IPMC GPIO for backplane/RJ-45 selection.

Zynq can do anything but easiest is GMII, RGMII or SGMII


Thinking again about FPGA choice for prototype. Need bare minimum of 12, pref 24 MGT. Revisit Link counting

Target Tx Rx Notes
MDT CSM 3 9 3 MDT chambers
S.L. 4 4 2 SL boards
AXI C2C 1 1 to Zynq
FMC 1 1 spare
FMC / AM 3 3 need 4 total for AM
Total 14 20 including AM

Rough count of SelectIO

Target Pairs SE Notes
AXI C2C Zynq 20
DDR Memory 6 44 est for DDR3 x16
FMC HPC 76 plus some clocks

Some FPGA options

Part MGT LCells 36k BRAM
KU040 20 530k 600
KU060 32 726k 1080

Draft document on Prototype 0 in Gitlab:


Meeting at CERN, notes: AtlasL0MDTPrototypeNotes]

2017-12-19 Prototype Goals/Requirements

  • FPGA 1: Input processing
    • MDT hit inputs
      • Receive a significant number of xGBT links (ideally up to 40 or so) for testing
      • Decode eLink, TDC protocol
    • DAQ Path
      • Receive L0A from TTC system (details needed)
      • Delay hits for 10uS or 35us L0 delay (see talk) -- needs ~450 or 750 BRAMs for 10uS or 35uS
      • Match hits to L0A and buffer in DDR memory
      • Output to FELIX (simulated?) on many parallel links
    • Trigger Path
      • Receive ROI from SL
      • Process ROI and assign processor
      • Extract hits and send to segment finder
      • Transmit on low-latency links to (virtual) MC
  • FPGA 2: (Virtual) DB Functions
    • Manage multiple segment finder processors (maybe resource limited to 1)
    • Receive low-latency hits
    • Proccess in segment finder
    • Transmit on low-latency links to track finder
  • FPGA 3: Track Finder
    • Receive low-latency segments from (virtual) MC
    • Calculate P(t)
    • Output to SL

2017-12-18 More prototype thoughts

  • Input FPGA could be a ZU19EG
    • Total 72 GTH+GTY, all speeds can run > 10.3125 Gb/s
    • 1M FF, 512k LUTs
    • but only one DRAM interface, so FELIX output with buffering would be a problem.

2017-12-15 Prototype architecture

Required functions:

  • Input FPGA:
    • Large group of xGBT-FPGA receivers (36 or so)
    • DDR memory, DAQ output to FELIX
    • Inputs from SL
    • Hit extraction, calibration
  • "Virtual Mezz" or real mezz
    • Segment finding (Zynq or Legendre)
    • Requires ~ 100k LUTs and 100k regs for Legendre
  • Track fitter
    • Track fitting, coordinate transformation
    • Outputs to SL

2017-12-15 Zynq selection

7000 series: 7030 and above have GTX SERDES, speeds below. Need an FF package and -2 speed grade to get above 10 Gb/s. As of 12/15/17, the -2FFG676 and -2FFG900 are both in stock at DigiKey for ~ $2k.

7035 and up have 16x SERDES. 8 usable in FFG676 package, 16 in FFG900.

7045 and 7100 both have 200k+ LUTs so barely enough for Legendre.

2017-11-12 AM mezzanine info

For FPGA version:

I/OSERDES run at 1000-1250 Mb/s on U(+) parts. This is conveniently 3x the 360MHz pipeline speed, so one set of 32-36 pairs could handle 3X segment finder inputs. We need 3 of these for the 3 stations, I guess, so a total of 96-108 pairs.

FMC-HPC pair counting:

LA00-LA33    (34 pairs)
HA00-HA23    (24 pairs)
HB00-HB21    (22 pairs)

    Total    80	pairs

Plus 20	additional SERDES pairs

AMC PRM info from Y.Okumura 4/26/17:

* Full implementation of LPC (C+D+G+H)
   - LVDS lines FMC LPC (34 pairs : LA00 - LA33)
   - I2C interface 
   - JTAG interface
   - High speed line (DP0 C2M & M2C)
   - clock interface
* Partial implementation of HPC (only two additional high speed lines on A)
   - DP1 (C2M & M2C) and DP2 (C2M & M2C)

2017-04-10 Zynq ARM-cortex9 info

  • Zynq ARM core speeds
  • Integer multiplication
    • Page 202
    • Ignoring instruction pipeline status (stalled)

instruction time (cycles) latency(cycles) notes
MLA 2 4 (32 * 32) + 32 -> 32
(S/U)MLAL 3 5 (32 * 32) + 64 -> 64
  • Floating point multiplication
    • Page 27
    • Ignoring instruction pipeline status (stalled)
FPU instruction time (cycles) latency(cycles) notes
FMUL (single/double) 1/2 5/6 multiply
FMAC (single/double) 1/2 8/9 multiply accumulate
FDIV (single/double) 10/20 15/25 divide
FSQRT (single/double) 13/28 17/32

2017-03-07 ATCA blade layout

  • Considering blade with 144 fiber I/Os. This means 24 miniPODs.
  • Looking at (2) XCVU125 in C2104 package.
  • RTM connector e.g. 1469286-1 with 48 pairs, 30 mm high.
  • Airmax connectors
    • 10034475-101LF can e.g. fit 50 pairs in 20mm using 5 rows (!)

2017-02-16 Legendre resource use

Experimented with various things in Vivado. Bottom line is it takes almost 1 LUT per bit for 4-bit cells for a counter which stops at 0xf. Removing that requirement reduces it by 30%. Not sure why, since that should just affect LUT contents. Increasing counter bits to 5 and using highest bit only as overflow reduces LUT use by 11%.

2017-01-18 prep for summary talks

2017-01-11 Revisit ROI rates

ROI rate for barrel from RPCs is ~ 200kHz for all ATLAS. Per 1/32 sector A/C this is 6.25kHz.

Assume one ROI is 35BX long (875ns). Probability of one ROI @ 6250Hz in one 35BX period is 1/183. So for poisson PMF lambda - 1/183. So P(k) = ((lambdak)*exp(-lambda)) / k!

2016-12-16 chat with Kai Chen about GBT

My questions and his answers:

Q: Are Tx and Rx separate, can be in separate FPGAs?

A: Yes, no problem. Can choose to implement Tx, Rx or both in GBT-FPGA

Q: How to decode e-ports (e-links)?

A: Decoding to the word level (8 bits for 320 Mb/s option) is provided. Framing for larger words must be done by the user

Q: What is largest number of links implemented so far?

A: BNL has implemented 48 links in a KCU115 for ATLAS LAr

Q: What is the latency and resource use for the cores?

A: He will send details

Other useful information provided:

  • KCU115 are $7000 in singles, $5000 in qty (64 links)
  • If we do not need the reduced/fixed latency option, all GBT-FPGA Rx can use same receiver clock using internal buffer, so no clock-domain crossing is required.
  • May need GBT-SCA ASIC on the front-end to configure/control the GBTX ASIC

2016-11-15 with Thiago here

Some observations:

  1. ROI can be


LeGendre? notes:

  1. Establish "Legendre" coordinate space as polar (r, phi) WRT chamber origin (or is this the seed track origin?)
  1. Use seed track angle/intercept to set up r, phi binning for histogram
  1. Iterate over phi bins, evaluate _r = x0 * cos(phi) + y0 * sin(phi) +/-R_ for each tube hit, where R is the drift tube radius and x0/y0 are the center of the tube in local coords.
  1. Increment 2D histogram at each r, phi.
  1. Find the maximum bin


Local discussion w/ John+Kevin. RPC/TGC resolution ~ 3mrad. This is ~1.5mm at 500mm height of BOL, e.g.


Separate HE per station? For typical sector-

  Sta  Cbrs  Tubes   Mezz   GBT  Rate[1]
  BO   6     2544    106    18   100 MHz
  BM   6     1728    72     14    66 MHz
  BI   6     1824    77     15    75 MHz

[1] Rate is n(tubes) * 200kHz * 20% for time match to 200kHz ROIs



 6500 tubes/sector
 54 links @ 4.8Gb/s (120Mhz 8b10b 40-bit words)
 200kHz / tube  144 tubes/link --> 30MHz hits / link
 trigger 200kHz per sector
 50Bx window

Tube# is 13-bit (0-8191). Assume sort by eta. Each ROI would be translated to tube# range (eta range).

As ROIs arrive, keep early/late BX window updated for each of ~4 ROIs.

Need to process ~1GHz total MDT hit rate, compare each against BX range and tube# (eta) range. Could be parallel or sequential.

Assuming all agree on BC0, need only 6-8 bits of BCN.

200kHz ROI rate with 50BX time window --> 20% live time for >0 active ROIs. So BX selection rejects 80% of hits resulting in ~200MHz rate of hits matching only the BX selection.

Maybe 4 pipelines running at 250MHz would do the job, each comparing hits against upper and lower tube#/eta limits.


Hit extraction ideas: Buffer ROIs and compare hits as they come in. Given an average rate of 200kHz of ROIs, one can calculate using Poisson statistics the probability of seeing n ROI inputs in a single BX period is:

k  P(n)       Rate(n)
1  4.98E-03   1.99E+05
2  1.24E-05   4.98E+02
3  2.07E-08   8.29E-01
4  2.59E-11   1.04E-03
5  2.59E-14   1.04E-06

A circular buffer with 4 or 5 ROIs x 50BX would be adequate. A time window would be maintained by updating begin/end registers as the ROIs arrive and exit the circular buffer. MDT hits would be compared as they enter against the begin/end registers, then against spatial coordinates of any live ROIs.

Alternatively one can estimate the probability of seeing n ROIs in a 50BX window as (1/400)n

P(n) at rate
100kHz 200kHz 500kHz
1/bx 400 200 80
1 2.50E-03 5.00E-03 1.25E-02
2 6.25E-06 2.50E-05 1.56E-04
3 1.56E-08 1.25E-07 1.95E-06
4 3.91E-11 6.25E-10 2.44E-08
5 9.77E-14 3.13E-12 3.05E-10
6 2.44E-16 1.56E-14 3.81E-12
7 6.10E-19 7.81E-17 4.77E-14
8 1.53E-21 3.91E-19 5.96E-16

So one need keep only ~4 ROIs active and compare the MDT hits against them. This scheme seems like a winner.



Proposed overlap scheme would require double the number of links per sector, taking inputs from sector n and n+1. The worst-case is sector 10+11 and 15+16 which would both require 104 total links. (this assumes the GBT links are split optically). Alternatively the links could be received by one sector and sent to the next one, but this would increase latency.


Data extracted from parameter book. Counts for one sector, one end, maximum.

Region Max Chambers Max Tubes Max GBT
Barrel 22 6432 54
Endcap 13 4320 34

(Endcap is EE, EM, EO only)


Thoughts: Average tube hit rate ~100kHz per current estimates. One sector has ~6000 tubes (verify!) so total rate is 600MHz of hits in the sector.

This means an average of 600 "live" hits in a us. Kostas claims for 2-way coincidences in RPC we need to handle 200kHz of trigger hits per sector (one per 5us).

In a simple-minded scheme one would need to compare each ROI against the 600 live MDT hits. To me this implies storing the MDT hits in a multi-dimensional array by time and two spatial coordinates which can be trivially compared against the ROIs. Have to figure out how to automatically "age out" the stale MDT hits.


Altera Arria 10 (GX) family is interesting. The 10AX090U (aka GX900) has 96 transceivers (17Gb/s) and goes for around $5k (10AX090U4F45E3SG is $5100 at Arrow)

2016-09-27 FPGA selection for pre-prototype

Part Delivery Price GTH GTY
XCVU065-2FFVC1517 stock $7700 20 20
XCVU080-1FFVA2104C 16 weeks $8050 28 24
XCVU080-1FFVD1517C 16 weeks $8200 32 32

2016-08-30 discussions at CERN

Last modified 2 months ago Last modified on Oct 19, 2018, 8:43:31 AM