bu_cms_history/SN_059

S/N 059

2007-09-18 (hazen, CERN)

Dick Says:

  Check out loading fw into DCC s/n 59, which had problems in USC55
  - things do not go well - fails on first LRB verification then core dumps, see
  http://cmsdoc.cern.ch/cms/HCAL/document/904/log/2007-aug-10/DCCprogrammer_failure.txt

Check it out with DCCprogrammer:

 ./DCCprogrammer.exe sbs:0 8 -i

 HCAL DCCprogrammer v1.2 rev 12 Feb 2007
 Bus:device sbs:0 specified
 Slot number 8 specified
 Looking for HAL addresstable files in directory /home/daqowner/dist/hal/
 (change by setting PROGRAMMER_HAL_PATH environment variable)
 Device to open: /dev/btp96
 DEBUG - DCCConfigInfo::DCCConfigInfo()
 INFO - Set up logic board at relative address 0x00240000, absolute 0x28240000.  DCC firmware revision = 0x2c10
 INFO - This is a DCC v4
 ** Flash access OK **
 ** Firmware Revisions:
         LOG1: 0x000c (12)
         LOG2: 0x000b (11)
         LOG3: 0x0012 (18)
         MIP1: --------
         MIP2: --------
         MIP3: --------
         MIP4: --------
         MIP5: 0x001b (27)
       XILINX: 0x2c10 (11280)
         CPLD:   0x00 (status bits=0x00)
    serial no: 59

Hmm... looks generally OK, except that MIP 1-4 are missing, and other firmware is a bit out of date.
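
For checking many boards, the missing entries in a report like the one above can be picked out automatically. This is a minimal sketch, not part of DCCprogrammer: the function names parse_revisions and missing_devices are hypothetical, and it assumes the "-i" report always uses the "NAME: 0xNNNN (dec)" layout shown here, with dashes for a device that does not answer.

```python
import re

# Sample copied from the "-i" report above.
REPORT = """\
 ** Firmware Revisions:
         LOG1: 0x000c (12)
         LOG2: 0x000b (11)
         LOG3: 0x0012 (18)
         MIP1: --------
         MIP2: --------
         MIP3: --------
         MIP4: --------
         MIP5: 0x001b (27)
       XILINX: 0x2c10 (11280)
         CPLD:   0x00 (status bits=0x00)
    serial no: 59
"""

# Matches only the firmware lines; the banner and "serial no" lines are skipped.
LINE = re.compile(r"^\s*(LOG\d|MIP\d|XILINX|CPLD):\s*(\S+)")


def parse_revisions(text):
    """Map each device name to its revision (int), or None when shown as dashes."""
    revs = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        name, value = m.groups()
        revs[name] = None if set(value) == {"-"} else int(value, 16)
    return revs


def missing_devices(revs):
    """Return the names of devices that reported no firmware revision."""
    return [name for name, rev in revs.items() if rev is None]


revs = parse_revisions(REPORT)
print("missing:", missing_devices(revs))
```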

  ./DCCprogrammer.exe sbs:0 8 -p LOG3 pci3v15.hex -y

  ./DCCrepair.exe sbs:0 8 -j LOG3 dcc_confv2.stapl -y

The updates go OK, but MIP1-4 (the LRBs) are still missing:

  ** Flash access OK **
  ** Firmware Revisions:
        LOG1: 0x000c (12)
        LOG2: 0x000b (11)
        LOG3: 0x0015 (21)
        MIP1: --------
        MIP2: --------
        MIP3: --------
        MIP4: --------
        MIP5: 0x001b (27)
      XILINX: 0x2c10 (11280)
        CPLD:   0x02 (status bits=0x00)
   serial no: 59

Check each LRB to be sure it answers JTAG:

 ./DCCrepair.exe sbs:0 8 -j mip1 idcode.jam -y

  INFO (JAM PLAYER): DevSel = 0xc Nbit = 0 TCK_freq = 0x1
  CRC mismatch: expected BAD6, actual 76E6
  ******************************************************************************
  * Altera Chain Interrogation Version 2.02                                    *
  *   Copyright (c) 1999-2001 Altera Corporation.  All Rights Reserved.        *
  *   Modified 20 Aug 2007 by E. Hazen to recoginize HCAL/DCC Devicessss       *
  ******************************************************************************
  Chain Continuity Checker
  DCC JAM Player running. Please wait
  Chain Continuity during IR is not stuck at zero or one
  ******************************************************************************
  Chain Length -- Load IR of all ones then count DR length
    Number of Devices is 1
  ******************************************************************************
  IR Length Calculator
    Instruction Register Length is 10
  ******************************************************************************
  IDCODE Reader
  ---------- | ---- ------------------- ------------- - |
  TDO -> TDI | Rev  Device              Mfgr          1 |
  ---------- | ---- ------------------- ------------- - |
  Device #1  | 0000 0001 0000 0000 0010 0000 1101 110 1 |
  ---------- | ---- ------------------- ------------- - |
  ******************************************************************************
  Device Identifier -- Search for device name from list of device IDCODE values
  ---------- |      ------------------- -------------   |
  TDO -> TDI |      Device              Mfgr            |
  ---------- |      ------------------- -------------   |
  Device #1  |      EPC2                Altera          |
  ---------- |      ------------------- ------------- - |
  ******************************************************************************
  Exit code = 0... Success
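
The "Device #1" row above can be decoded by hand as a cross-check. The sketch below assumes the standard IEEE 1149.1 IDCODE layout (4-bit version, 16-bit part number, 11-bit manufacturer ID, and a final bit that is always 1); the function name decode_idcode is hypothetical and is not part of the JAM player.

```python
# Bit groups as printed in the "Device #1" row, most significant bit first.
ROW = "0000 0001 0000 0000 0010 0000 1101 110 1"


def decode_idcode(bits):
    """Split a 32-bit JTAG IDCODE, given as a bit string, into its fields."""
    word = int(bits.replace(" ", ""), 2)
    return {
        "idcode": word,
        "version": (word >> 28) & 0xF,       # bits 31..28
        "part": (word >> 12) & 0xFFFF,       # bits 27..12
        "manufacturer": (word >> 1) & 0x7FF, # bits 11..1 (JEDEC ID)
        "marker": word & 1,                  # bit 0, always 1 for an IDCODE
    }


fields = decode_idcode(ROW)
for name, value in fields.items():
    print(f"{name:>12}: 0x{value:X}")
```

The manufacturer field comes out as 0x06E, which should be the JEDEC code for Altera, consistent with the EPC2 identification in the listing.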

Did this for all the LRBs, and they all look OK. Try reprogramming LRB1 for good measure using:

  ./DCCrepair.exe sbs:0 8 -j mip1 lrbv27.jam -y

Doesn't help, MIP1 is still missing. Also checked with a DCCrepair scan, which doesn't see any of the MIPs except MIP5. So the problem could be on the logic board, on the motherboard, or (very unlikely) on all the LRBs. Can't think of a single-point failure.

Try reprogramming PCI1 and PCI2 from JTAG to see if it helps:

 ./DCCrepair.exe sbs:0 8 -j log3 pci1vc.jam -y
 ./DCCrepair.exe sbs:0 8 -j log3 pci2vb.jam -y

Then a PCI scan:

 ./DCCrepair.exe sbs:0 8 -c -b -p

 Bus Dev Alias PCI ID     Device ID  CSR      BAR0     BAR1     BAR2     BAR3     BAR4     BAR5
  0   0  br3 ac21104c  PCI bridge  02100143 00000000 00000000 00020100 02000101 00000000 00000000
  0   1 log3 00030072    DCC LOG3  04000000 00000008 00000000 00000000 00000000 00000000 00000000
  0   2  uv2 00020201          ??  00402f21 00020201 00402f21 00020201 00402f21 00020201 00402f21
  0   3   bc ffffffff          ??
  0   4   lc 00040070  Local Ctrl  04000147 00000000 00001001 00000000 00000000 00000000 00000000
  1   0  br2 ac21104c  PCI bridge  02100143 00000000 00000000 00020201 02000101 00000000 00000000
  1   1 log2 00020072    DCC LOG2  04000000 00000000 00000000 00000000 00000000 00000000 00000000
  1   2 mip3 ffffffff          ??
  1   3 mip4 ffffffff          ??
  1   4 mip5 00400055         LRB  04000000 00000000 00000000 00000000 00000000 00000000 00000000
  2   0   -- ffffffff          ??
  2   1 log1 00010072    DCC LOG1  04000000 00000000 00000000 00000000 00000000 00000000 00000000
  2   2 mip0 ffffffff          ??
  2   3 mip1 00400055         LRB  04000000 00000000 00000000 00000000 00000000 00000000 00000000
  2   4 mip2 ffffffff          ??

Aha! MIP1 is back. This probably had nothing to do with the PCI1/2 programming... I expect the LRB just needed to have its firmware restored and then reloaded from flash.
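
A quick way to read a scan like the one above is to list which aliases answer at all. A PCI ID of ffffffff is what a configuration read returns when no device responds. The sketch below is only an illustration: the function name scan_status is hypothetical, the rows are copied from the scan but trimmed to the first five columns, and it assumes the column order Bus, Dev, Alias, PCI ID.

```python
# Rows copied from the scan above, trimmed to the leading columns.
SCAN = """\
 Bus Dev Alias PCI ID     Device ID
  0   0  br3 ac21104c  PCI bridge
  0   1 log3 00030072    DCC LOG3
  0   2  uv2 00020201          ??
  0   3   bc ffffffff          ??
  0   4   lc 00040070  Local Ctrl
  1   0  br2 ac21104c  PCI bridge
  1   1 log2 00020072    DCC LOG2
  1   2 mip3 ffffffff          ??
  1   3 mip4 ffffffff          ??
  1   4 mip5 00400055         LRB
  2   0   -- ffffffff          ??
  2   1 log1 00010072    DCC LOG1
  2   2 mip0 ffffffff          ??
  2   3 mip1 00400055         LRB
  2   4 mip2 ffffffff          ??
"""


def scan_status(text):
    """Return (present, absent) lists of aliases from a scan listing."""
    present, absent = [], []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with numeric bus and device numbers; skip the header.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        alias, pci_id = fields[2], fields[3]
        (absent if pci_id.lower() == "ffffffff" else present).append(alias)
    return present, absent


present, absent = scan_status(SCAN)
print("responding MIPs:", [a for a in present if a.startswith("mip")])
print("silent MIPs:    ", [a for a in absent if a.startswith("mip")])
```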

Reprogram mip2-mip4 similarly:

  ./DCCrepair.exe sbs:0 8 -j mip2 lrbv27.jam -y
  ./DCCrepair.exe sbs:0 8 -j mip3 lrbv27.jam -y
  ./DCCrepair.exe sbs:0 8 -j mip4 lrbv27.jam -y
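
If this has to be repeated on other boards, the three commands above can be generated rather than typed. This is a small sketch that only builds and prints the command lines; the helper name repair_commands is hypothetical, and the arguments are exactly the ones used in this session (bus sbs:0, slot 8, lrbv27.jam).

```python
def repair_commands(targets, jam="lrbv27.jam", bus="sbs:0", slot=8):
    """Build one DCCrepair JTAG command (as an argument list) per target."""
    return [
        ["./DCCrepair.exe", bus, str(slot), "-j", name, jam, "-y"]
        for name in targets
    ]


commands = repair_commands(["mip2", "mip3", "mip4"])
for cmd in commands:
    # Printed only; each list could instead be passed to
    # subprocess.run(cmd, check=True) to actually run it.
    print(" ".join(cmd))
```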

Now DCCprogrammer -i reports that all is well:

  ** Flash access OK **
  ** Firmware Revisions:
        LOG1: 0x000c (12)
        LOG2: 0x000b (11)
        LOG3: 0x0015 (21)
        MIP1: 0x001b (27)
        MIP2: 0x001b (27)
        MIP3: 0x001b (27)
        MIP4: 0x001b (27)
        MIP5: 0x001b (27)
      XILINX: 0x2c10 (11280)
        CPLD:   0x02 (status bits=0x00)
   serial no: 59

The remaining mystery is how LRBs 1-4 all lost their firmware at once.