bu_cms_history/Post_GRuMM_Tests


2008-04-02 (hazen, BU)

Wu has a new version (0x2c23) which claims to fix data corruption. Trying it now.

Also, updating to 6.1.1.

2008-04-01 (hazen, BU)

  ./setup_tts -p 10 -u 1 8 -u 2 25 -u 3 100 -u 4 250 -b
     ...
  ./setup_tts -p 80 -u 1 8 -u 2 25 -u 3 100 -u 4 250 -b

Above settings give about 6kHz..52kHz. AOK taking data with slink.

With -p 100 (65kHz) we start to get trouble. There are a few mis-match counts on the DCC. The data is OK, with only the occasional BcN mis-match (EvN and OrN are OK).

Try -p 110 (72kHz). Same. Try -p 150 (97kHz). Same.

 ./setup_tts -p 150 -u 1 8 -u 2 25 -u 3 100 -u 4 250 -b -m 127 -l 64

This has a long latency: (128 * 64) = 8k BX = 205 us. Then we get weird data corruption: mis-matched EvN in the data, but nothing shown on the DCC monitoring counters. No problem with -l 32 (100 us).

  ./setup_tts -p 150 -u 1 8 -u 2 25 -u 3 100 -u 4 250 -b -m 127 -l 48

This is about the threshold. Only 2 errors in about 20k events. Save as '''/export/dcc_100kHz_100us.dat''' on cms1.
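
The latency figures in this entry can be reproduced with simple arithmetic. The sketch below assumes, as the (128 * 64) product suggests, that the delay is (m + 1) * l bunch crossings, and uses the nominal 25 ns per BX. That reading of the -m and -l options is inferred from this entry, not from any setup_tts documentation.

```python
BX_NS = 25.0  # nominal bunch-crossing period in ns


def latency_us(m, l):
    """Latency in microseconds, ASSUMING delay = (m + 1) * l BX."""
    return (m + 1) * l * BX_NS / 1000.0


# The three -l values tried above, all with -m 127
for l in (32, 48, 64):
    print(f"-m 127 -l {l}: {(127 + 1) * l} BX = {latency_us(127, l):.1f} us")
```

Under that assumption the -l 48 threshold setting works out to roughly 150 us.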

2008-03-31 (hazen, BU)

Created '''src/Fanout/FanoutTest.cc''' which reads TTS broadcast command history, and checks all BCZ messages for correct OrN and BcN.

Lots of ways to provoke errors. One way is to put a single L1A at bunch 3372 or after.

  ./setup_tts -r 20 -n 0 -d 3372 -t 1 -s 100 -o 3563

Another way is to send lots of triggers in an orbit:

  ./setup_tts -r 20 -n 0 -d 500 -t 20 -s 100 -o 3563

The errors show up as a BCZ command (command=01) seen at BcN 0xbd7 instead of the correct 0xdeb.
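
For illustration only, the kind of check described above could look like the Python sketch below. It is not the code in FanoutTest.cc. The record layout (a list of (command, OrN, BcN) tuples) is hypothetical, and so is the assumption that OrN should advance by exactly one between successive BCZ commands.

```python
BCZ_COMMAND = 0x01    # broadcast command value quoted in this entry
EXPECTED_BCN = 0xdeb  # 3563, the BcN at which BCZ should be seen


def check_bcz(history):
    """Return (index, reason) pairs for suspicious BCZ records.

    `history` is a HYPOTHETICAL list of (command, orn, bcn) tuples.
    """
    errors = []
    last_orn = None
    for i, (command, orn, bcn) in enumerate(history):
        if command != BCZ_COMMAND:
            continue  # only BCZ messages are checked
        if bcn != EXPECTED_BCN:
            errors.append((i, f"BcN 0x{bcn:03x} != 0x{EXPECTED_BCN:03x}"))
        # ASSUMPTION: one BCZ per orbit, so OrN steps by one each time
        if last_orn is not None and orn != last_orn + 1:
            errors.append((i, f"OrN jumped from {last_orn} to {orn}"))
        last_orn = orn
    return errors


print(check_bcz([(1, 10, 0xdeb), (1, 11, 0xbd7), (1, 12, 0xdeb)]))
```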

Set fanout to send BCZ on test connector (set bits 0, 1 to '00' at offset 0x30). Now I can see clearly that the problem is a missing orbit signal. Can also see that the BCZ comes about 52 BX after orbit input to TTCvi.

Change Xilinx firmware blanking interval as follows:

  front_porch = 250          BX after /ORBIT to inhibit L1A
  back_porch = 500           BX before /ORBIT to inhibit L1A

Because of offset in TTCvi, this really means no L1A about 200 BX after BCZ and no L1A for about 550 BX before BCZ.
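
The offsets quoted above are just the porch values shifted by the measured 52 BX delay between /ORBIT and BCZ. A small Python sketch of that arithmetic, plus a check of whether a given BX falls inside the blanking interval. The 3564 BX orbit length is the standard LHC value; the exact boundary handling (strict or inclusive) is an assumption and is not taken from the firmware.

```python
ORBIT_BX = 3564     # bunch crossings per LHC orbit
TTCVI_OFFSET = 52   # BCZ seen about 52 BX after /ORBIT input to the TTCvi
FRONT_PORCH = 250   # BX after /ORBIT to inhibit L1A
BACK_PORCH = 500    # BX before /ORBIT to inhibit L1A


def l1a_inhibited(bx_after_orbit):
    """True if an L1A this many BX after /ORBIT is blanked.

    ASSUMPTION: the interval edges are treated as half-open.
    """
    bx = bx_after_orbit % ORBIT_BX
    return bx < FRONT_PORCH or bx >= ORBIT_BX - BACK_PORCH


print(FRONT_PORCH - TTCVI_OFFSET)  # 198, i.e. "about 200 BX after BCZ"
print(BACK_PORCH + TTCVI_OFFSET)   # 552, i.e. "about 550 BX before BCZ"
```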

2008-03-25 (hazen, BU)

Working on random trigger generator. Rohlf reminded me of the easy way to do it. Simply calculate a uniform random U (scaled 0...1) each BX. Then generate a trigger if U < 1/N, where N is the average trigger spacing in BX (400 for 100kHz), so that each BX fires with probability 1/N. This requires generating a new U every clock.

The generator proposed is this one from ''Numerical Recipes'':

   U = 1664525L*U(0) + 1013904223L;    (modulo 2**32)

This requires one 32x32 multiply with a 32-bit product, and one 32+32 add with a 32-bit sum.
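
A software model of this generator is easy to write down. The Python sketch below is only an illustration, not the firmware. A per-BX trigger probability of 1/N corresponds to accepting when the 32-bit U falls below 2**32 / N; the function names and the seed of 0 are choices made here.

```python
MASK32 = 0xFFFFFFFF


def lcg_next(u):
    """One step of U = 1664525*U + 1013904223 (modulo 2**32)."""
    return (1664525 * u + 1013904223) & MASK32


def random_triggers(n_bx, mean_spacing, seed=0):
    """Yield the BX indices at which a raw (pre-rules) trigger fires."""
    threshold = (1 << 32) // mean_spacing  # U < threshold has probability ~1/N
    u = seed
    for bx in range(n_bx):
        u = lcg_next(u)  # a new U every clock
        if u < threshold:
            yield bx


# 400,000 BX at N = 400 should give on the order of 1000 triggers
print(len(list(random_triggers(400_000, 400))))
```

Comparing the full 32-bit word against a threshold relies mainly on the high-order bits, which are the better-behaved bits of a generator of this type.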

Then we apply the trigger rules:

 Trigger rules implemented for March GR:
  1) no more than 1 L1A in 8 bx (CSC request, TDR rule : no more than 1 L1A  in 3 bx)
  2) no more than 2 L1A in 25 bx (TDR)
  3) no more than 3 L1A in 100 bx (TDR)
  4) no more than 4 L1A in 250 bx (TDR)

Christian, Ivan
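
The four rules can be modelled as a filter on a sorted list of candidate trigger BX values. A minimal Python sketch, reading "no more than k L1A in w bx" as "any w consecutive BX contain at most k accepted L1A". This is an illustration of the rules as listed, not the firmware implementation.

```python
from collections import deque

# (max L1A, window in BX), as listed above
TRIGGER_RULES = [(1, 8), (2, 25), (3, 100), (4, 250)]


def apply_trigger_rules(candidates, rules=TRIGGER_RULES):
    """Return the candidate BX values that survive all rules.

    `candidates` must be in increasing BX order.
    """
    longest = max(window for _, window in rules)
    recent = deque()  # accepted L1As still inside the longest window
    accepted = []
    for bx in candidates:
        # Forget L1As that can no longer affect any rule
        while recent and bx - recent[0] >= longest:
            recent.popleft()
        # Accept only if every window would still hold <= max_l1a triggers
        ok = all(
            sum(1 for t in recent if bx - t < window) < max_l1a
            for max_l1a, window in rules
        )
        if ok:
            recent.append(bx)
            accepted.append(bx)
    return accepted


# A candidate on every BX: only the rule-limited pattern survives
print(apply_trigger_rules(range(250)))  # [0, 8, 25, 100]
```

The same (count, window) pairs appear as the -u arguments in the later setup_tts commands, which presumably select these rules.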

2008-03-24 (hazen, BU)

Running at 1 trigger per orbit, no SLINK. Six htrs (12 spigots) active.

Now enable:

  DCC_Config
  DCC_Counters
  DCC_Debug
  DCC_Error_control_configuration
  DCC_Error
  DCC_Firmware_Revisions
  DCC_Input_Spigot_Error_Counters

Errors are back. Turn OFF:

  DCC_Config
  DCC_Counters
  DCC_Debug

No Errors. Turn on DCC_Debug. Now we have on

  DCC_Debug 
  DCC_Error_control_configuration
  DCC_Error
  DCC_Firmware_Revisions
  DCC_Input_Spigot_Error_Counters

Errors are back. Turn off all except DCC_Debug. Still have errors.

Remove all lines in CSV file with "90 Debug" and rebuild hcalHW. No Errors.

Put back CSV file lines in groups:

 "log3_fmem","command_and_status_register",
 "log3_fmem","CR_DDR_test",
 "log3_fmem","CR_slink_width",
 "log3_fmem","TTCrx_fifo_read_pointer",
 "log3_fmem","TTCrx_fifo_write_pointer",

 "log1_mem","mask",
 "log1_mem","LRBA_Address",
 "log1_mem","LRBB_Address",
 "log1_mem","LRBC_Address",
 "log2_mem","mask",
 "log2_mem","LRBB_Address",
 "log2_mem","LRBA_Address",
 "log2_mem","LRBC_Address",

 "log3_fmem","Slink_fifo_read_pointer",
 "log3_fmem","Slink_fifo_write_pointer",
 "log3_fmem","TTCrx_ID_register",
 "log3_fmem","monitor_buffer_read_pointer",

Seems that the problem is the middle group. Aha! Reading log1_xxxx, log2_xxxx or mipx_xxxx is of course likely to cause PCI bus conflicts. Remove them all, start a long run.
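
Dropping those items by hand is error-prone, so a small filter may help. The Python sketch below assumes only what the excerpt above shows: that the first quoted field of each CSV line names the address space. The function name is made up here, and the real monitoring CSV may have more columns than shown.

```python
import csv

# Address-space prefixes whose reads are suspected of causing PCI conflicts
CONFLICT_PREFIXES = ("log1_", "log2_", "mipx_")


def drop_conflicting_lines(lines):
    """Return the lines whose first CSV field is not a conflicting space."""
    kept = []
    for line in lines:
        # Parse one line at a time so kept lines are returned unmodified
        fields = next(csv.reader([line], skipinitialspace=True), [])
        if fields and fields[0].startswith(CONFLICT_PREFIXES):
            continue
        kept.append(line)
    return kept


sample = [
    '"log3_fmem","CR_DDR_test",',
    '"log1_mem","mask",',
    '"log2_mem","LRBA_Address",',
    '"log3_fmem","TTCrx_ID_register",',
]
print(drop_conflicting_lines(sample))
```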

2008-03-21 (hazen, BU)

Learned more about monitoring. See Monitoring. Changed all DCC-related items in ...dist/etc/hcal/monvis/hcal-MonLogger.xml to update every 30 seconds, and all flashlists in dist/share/hcalxt/flash to read HW every 10 seconds.

To turn off monitoring:

 su - daqowner
 cd dist/share/hcalxt/flash
 ../turn_off_flash.csh
 rm *.bak

Substitute turn_on_flash.csh to reverse. "On" means collect every 10 seconds. "Off" means collect every 10 ''years''.

Now, some confusing tests. First, with monitoring "on" (CAEN lights flash as expected every 10 seconds). Trigger rate about 700Hz.

  ./setup_tts -n 2 -s 50 -t 1 -r 100

After a good long time (tens of minutes) we see this:

 0008 00000000
 (spy prescale)  0018 001c9090
 (sych ctrl)  0080 00000000
 (page) 00000422
 (mon words)
 0088 26102610
 (ttc)   00de00de
 (slink)
 HTR mis  0000 0000 ffff 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000
               RDY on: 0000000a 36d1979d                BSY on: 00000000 00000000
               OFW on: 00000000 00000000                SYN on: 00000000 00000000
               RUN on: 0000000a 36d1bd4d
             L1A Trig: 0003759f          Events Built: 0003759f
         SLink Events: 0003759f            VME Events: 00000080
             Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 0002651e       L1 BcN Mismatch: 0002651f       Bunch count err: 00000001
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  1:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  2:  01 -- -- -- -- -- ff 01 -- ff ff ff -- -- --
 HTR  3:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  4:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  5:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  6:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  7:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  8:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  9:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR 10:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR 11:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --

This is confirmed in the data by dump_FED.exe:

 ERR Event Number FED= 0x011082 (69762)  HTR(2) EvN=0x011081 (69761) last=0x011082

 FED:  27 EvN: 011082  BcN: 039  OrN: 00243106  TTS: 8
  0: id: 01f Size: 196 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 360 ns: 15
  1: id: 01e Size: 196 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 360 ns: 15
  2: id: 001 Size: 064 EvN: 011081 BcN: 039  OrN: 05  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
  3: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8240 ntp: 12  ndd: 96 ns: 4
  4: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
  5: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
  6: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
  7: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8240 ntp: 12  ndd: 96 ns: 4
  8: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8240 ntp: 12  ndd: 96 ns: 4
  9: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
 10: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
 11: id: 001 Size: 064 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
 Processed 0x37c41 (228417) events
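
A dump in this format can also be scanned automatically. The Python sketch below flags HTR lines whose EvN differs from the FED header line. It relies only on the text layout shown above, treats EvN as hex (the ERR line gives 0x011082 = 69762), and is not part of dump_FED.exe.

```python
import re

FED_RE = re.compile(r"FED:\s*\d+\s+EvN:\s*([0-9a-fA-F]+)")
HTR_RE = re.compile(r"^\s*(\d+):\s+id:.*?EvN:\s*([0-9a-fA-F]+)")


def evn_mismatches(dump_text):
    """Return (htr, htr_evn, fed_evn) for each HTR line that disagrees."""
    fed_evn = None
    bad = []
    for line in dump_text.splitlines():
        m = FED_RE.search(line)
        if m:
            fed_evn = int(m.group(1), 16)  # start of a new event
            continue
        m = HTR_RE.match(line)
        if m and fed_evn is not None:
            htr, evn = int(m.group(1)), int(m.group(2), 16)
            if evn != fed_evn:
                bad.append((htr, evn, fed_evn))
    return bad


sample = """\
 FED:  27 EvN: 011082  BcN: 039  OrN: 00243106  TTS: 8
  1: id: 01e Size: 196 EvN: 011082 BcN: 039  OrN: 06  HDR: 8e40 ntp: 12  ndd: 360 ns: 15
  2: id: 001 Size: 064 EvN: 011081 BcN: 039  OrN: 05  HDR: 8e40 ntp: 12  ndd: 96 ns: 4
"""
print(evn_mismatches(sample))
```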

Try again. Record 300k+ events (1.5GB file). No errors.

However, if we read the LRB directory (run hammer_lrbs.dcc while taking data) we consistently get EvN mismatches.

2008-03-20 (hazen, BU)

http://www.umass.edu/wsp/statistics/lessons/poisson/index.html

Wrestle with DAC with Jeremy. Got more or less working configuration BU_new, with monitoring. Using hcal_3_11_7.

2008-03-19 (hazen, BU)

''Post-GRuMM testing''

Run at 97kHz with

  ./setup_tts -d 100 -s 50 -r 1 -n 10 -t 7
  DCCdiagnose.exe < normal.dcc

Hook up 6 HTRs (12 spigots) in slots 15,16,17,18,19,20. All with 5A firmware. Default HTR settings (just htr.exe -1). Use sixhtr.dcc.

NOTE: sixhtr.dcc has sync fixup in DCC enabled!

AOK up to 73kHz (5 triggers per orbit, 50 BX spacing). At 85kHz (6 trig/orbit) the DCC goes SYN.

Turn on backpressure (./setup_tts -b) at 85kHz... trigger rate limited to 81kHz and AOK. All HTR error counters are zero.

 TTS: 1000 RDY
   MB full
 0004 2c1d0001
:  run mode
 0008 00000000
 (spy prescale)  0018 00189090
 (sych ctrl)  0080 00000000
 (page) 0000025a
 (mon words)
 0088 22602260
 (ttc)   00000000
 (slink)
 HTR mis  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
	       RDY on: 00000000 200578bd                BSY on: 00000000 00000000
	       OFW on: 00000000 07645b58                SYN on: 00000000 00000000
	       RUN on: 00000000 276a01a5
	     L1A Trig: 000b7dda          Events Built: 000b7dda
	 SLink Events: 00000000            VME Events: 00000080
	     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 00000000       L1 BcN Mismatch: 00000000       Bunch count err: 00000000
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  1:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  2:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  3:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  4:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  5:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  6:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  7:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  8:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 >>>0bc0: 000b7dda 00000000 00000080 000b7dda 00000000 00000000 00000000 00000000
 0be0: 00000000 00000000

Try a torture test!

  ./setup_tts -d 100 -s 50 -r 1 -n 10 -t 50 -b

Now we consistently see:

 HTR mis  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
	       RDY on: 00000000 05e7e6de                BSY on: 00000000 00000000
	       OFW on: 00000000 21818d97                SYN on: 00000000 00000000
	       RUN on: 00000000 2769a00d
	     L1A Trig: 000b7dfe          Events Built: 000b7dfe
	 SLink Events: 00000000            VME Events: 00000080
	     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 00000000       L1 BcN Mismatch: 0001ea5a       Bunch count err: 000053c0
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  17 -- -- -- -- -- ff -- ff ff ff ff -- -- --
 HTR  1:  17 -- -- -- -- -- ff -- ff ff -- -- -- -- --
 HTR  2:  17 -- -- -- -- -- ff -- ff ff ff ff -- -- --
 HTR  3:  17 -- -- -- -- -- ff -- ff ff ff ff -- -- --
 HTR  4:  17 -- -- -- -- -- ff -- ff ff -- -- -- -- --
 HTR  5:  17 -- -- -- -- -- ff -- ff ff -- -- -- -- --
 HTR  6:  17 -- -- -- -- -- ff -- ff ff -- -- -- -- --
 HTR  7:  17 -- -- -- -- -- ff -- ff ff ff ff -- -- --
 HTR  8:  17 -- -- -- -- -- ff -- ff ff -- -- -- -- --
 >>>0bc0: 000b7dfe 00000000 00000080 000b7dfe 00000000 00000000 00000000 00000000
 0be0: 0001ea5a 000053c0

It's odd that the OW count is small. What about "BE"? Perhaps that is a clue.

Try 3 triggers spaced 10 bx every orbit:

  ./setup_tts -d 100 -s 10 -r 1 -n 10 -t 2 -b

 TTS: 1000 RDY
   MB full
 0004 2c1d0001
:  run mode
 0008 00000000
 (spy prescale)  0018 00189090
 (sych ctrl)  0080 00000000
 (page) 0000025a
 (mon words)
 0088 2e902ea0
 (ttc)   00000000
 (slink)
 HTR mis  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
	       RDY on: 00000000 2768f8fd                BSY on: 00000000 00000000
	       OFW on: 00000000 00000000                SYN on: 00000000 00000000
	       RUN on: 00000000 27692645
	     L1A Trig: 00052917          Events Built: 00052916
	 SLink Events: 00000000            VME Events: 00000080
	     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 0001b85c       L1 BcN Mismatch: 0001b85c       Bunch count err: 00000000
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  -- -- -- ff -- -- ff -- -- ff ff ff -- -- --
 HTR  1:  -- -- -- ff -- -- ff -- -- ff -- -- -- -- --
 HTR  2:  -- -- -- ff -- -- ff -- -- ff ff ff -- -- --
 HTR  3:  -- -- -- ff -- -- ff -- -- ff ff ff -- -- --
 HTR  4:  -- -- -- ff -- -- ff -- -- ff -- -- -- -- --
 HTR  5:  -- -- -- ff -- -- ff -- -- ff -- -- -- -- --
 HTR  6:  -- -- -- ff -- -- ff -- -- ff -- -- -- -- --
 HTR  7:  -- -- -- ff -- -- ff -- -- ff ff ff -- -- --
 HTR  8:  -- -- -- ff -- -- ff -- -- ff -- -- -- -- --
 >>>0bc0: 00052916 00000000 00000080 00052917 00000000 00000000 00000000 0001b85c
 0be0: 0001b85c 00000000

Hmm... the RL reported by HTR are confirmed by the large L1 EvN Mismatch. Increasing the spacing to 20 gets rid of the RL counts:

   ./setup_tts -d 100 -s 20 -r 1 -n 10 -t 2 -b

 TTS: 1000 RDY
   MB full
 0004 2c1d0001
:  run mode
 0008 00000000
 (spy prescale)  0018 00189090
 (sych ctrl)  0080 00000000
 (page) 0000025a
 (mon words)
 0088 2ec02ec0
 (ttc)   00000000
 (slink)
 HTR mis  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
	       RDY on: 00000000 2768a695                BSY on: 00000000 00000000
	       OFW on: 00000000 00000000                SYN on: 00000000 00000000
	       RUN on: 00000000 2768d275
	     L1A Trig: 00052914          Events Built: 00052914
	 SLink Events: 00000000            VME Events: 00000080
	     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 00000000       L1 BcN Mismatch: 00000000       Bunch count err: 00000000
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  1:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  2:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  3:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  4:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  5:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  6:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  7:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  8:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 >>>0bc0: 00052914 00000000 00000080 00052914 00000000 00000000 00000000 00000000
 0be0: 00000000 00000000

Use this setting (130kHz) with backpressure.

  ./setup_tts -d 100 -s 50 -r 1 -n 10 -t 10 -b

Using this script which reads DCC and LRB status repeatedly:

 #
 # DCC script to take data with DCC
 #
     ttc/write 0x82 0xf000        # reset BGO fifos
     ttc/write 0x80 0xff64        # enable external orbit, disable triggers
     ttc/write 0x92 10            # inhibit 0 delay (250ns)
     ttc/write 0x94 10            # inhibit 0 duration (250ns)
     ttc/write BData0 0x00800000  # write one word (BCR, cmd=01) to fifo 0
     ttc/write 0x90 0xd           # enable BGO channel 0
 #    dcc/synch 0x00189090
     dcc/init 0 1 2 3 4 5 6 7 8 9 10 11
 # enable "stop event builder when VME full"
 #    pci/write log3_fmem 4 0x4000
 # enable SYN sTTS output on HTR RL error bit (bit 3)
 #    pci/write log3_fmem 0xc0c 0x02
     dcc/start                    # set DCC to run mode
     pci/read log3_fmem 0x18
     lrb/stat 1                   # read LRB status
     ttc/cmd 2                    # send ECR
     ttc/cmd 0x28                 # send orbit reset
 #    dcc/open test.dat            # open data file
     ttc/trig 1                   # enable L1A

     sleep 0.1      
     dcc/status
     lrb/stat 1
       .... repeat 20 times prev 3 lines

     ttc/trig 4                   # disable L1A
     sleep 0.3                    # wait briefly for things to finish
     dcc/status
 #    dcc/dump 9999               # dump all events to file
 #    dcc/status                  # display final DCC status
     pci/r log3_fmem 0xbc0 10     # dump counters for Wu
     quit

Finally we hit pay dirt! Occasionally we see evidence of corrupted data, like below:

 TTS: 1000 RDY
   MB full
   TTCrx BCn err
 0004 2c1d0001
:  run mode
 0008 00000000
 (spy prescale)  0018 001c9090
 (sych ctrl)  0080 00000000
 (page) 0000025a
 (mon words)
 0088 20202030
 (ttc)   00000000
 (slink)
 HTR mis  ffff 0000 ffff 0000 0000 0000 ffff ffff 5975 0000 0000 0000 0000 0000 0000
 HTR blk  ffff ffff ffff ffff ffff ffff ffff ffff ffff 0000 0000 0000 0000 0000 0000
	       RDY on: 00000000 04e72964                BSY on: 00000000 00000000
	       OFW on: 00000000 02f1a1e9                SYN on: 00000000 00000000
	       RUN on: 00000000 07d8f0fd
	     L1A Trig: 000201fe          Events Built: 000201fd
	 SLink Events: 00000000            VME Events: 00000080
	     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
      L1 EvN Mismatch: 0001b5c6       L1 BcN Mismatch: 0001b5c6       Bunch count err: 00000001
 HTR   :  OW BZ EE RL LE LW OD CK BE xx xx xx TM HM CT
 HTR  0:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  1:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  2:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  3:  -- -- -- -- -- -- ff -- -- ff ff ff -- -- --
 HTR  4:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  5:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  6:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 HTR  7:  01 01 01 01 01 01 ff -- -- ff ff ff -- -- --
 HTR  8:  -- -- -- -- -- -- ff -- -- ff -- -- -- -- --
 >>>0bc0: 000201fd 00000000 00000080 000201fe 00000000 00000000 00000000 0001b5c6
 0be0: 0001b5c6 00000001
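
When the status is polled repeatedly as in the script above, it helps to pull the counters out of each dump rather than read them by eye. A Python sketch that extracts the named 32-bit hex counters from a dcc/status printout and lists the error counters that are nonzero. The counter names come from the dumps in this entry; the function names are invented here.

```python
import re

COUNTERS = [
    "L1A Trig", "Events Built", "SLink Events", "VME Events", "Cal Trig",
    "CT EvN Mismatch", "CT BcN Mismatch",
    "L1 EvN Mismatch", "L1 BcN Mismatch", "Bunch count err",
]
ERROR_COUNTERS = COUNTERS[5:]  # the mismatch and error counts


def parse_counters(status_text):
    """Return {counter name: integer value} for counters found in the text."""
    values = {}
    for name in COUNTERS:
        m = re.search(re.escape(name) + r":\s*([0-9a-fA-F]{8})\b", status_text)
        if m:
            values[name] = int(m.group(1), 16)
    return values


def nonzero_errors(values):
    """Names of error counters with a nonzero value, in display order."""
    return [name for name in ERROR_COUNTERS if values.get(name, 0) != 0]


sample = """\
     L1A Trig: 000201fe          Events Built: 000201fd
 SLink Events: 00000000            VME Events: 00000080
     Cal Trig: 00000000       CT EvN Mismatch: 00000000       CT BcN Mismatch: 00000000
 L1 EvN Mismatch: 0001b5c6       L1 BcN Mismatch: 0001b5c6       Bunch count err: 00000001
"""
print(nonzero_errors(parse_counters(sample)))
```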