NOTE: Old contents of BU CMS wiki. These (green) pages cannot be edited. See www.hcal.info for more links to active TWikis.
Dick Says:
Check out loading firmware into DCC s/n 59, which had problems in USC55. Things do not go well: it fails on the first LRB verification, then core dumps. See http://cmsdoc.cern.ch/cms/HCAL/document/904/log/2007-aug-10/DCCprogrammer_failure.txt
Check it out with DCCprogrammer:
./DCCprogrammer.exe sbs:0 8 -i
HCAL DCCprogrammer v1.2 rev 12 Feb 2007
Bus:device sbs:0 specified
Slot number 8 specified
Looking for HAL addresstable files in directory /home/daqowner/dist/hal/
(change by setting PROGRAMMER_HAL_PATH environment variable)
Device to open: /dev/btp96
DEBUG - DCCConfigInfo::DCCConfigInfo()
INFO - Set up logic board at relative address 0x00240000, absolute 0x28240000.
DCC firmware revision = 0x2c10
INFO - This is a DCC v4
** Flash access OK **
** Firmware Revisions:
LOG1: 0x000c (12)
LOG2: 0x000b (11)
LOG3: 0x0012 (18)
MIP1: --------
MIP2: --------
MIP3: --------
MIP4: --------
MIP5: 0x001b (27)
XILINX: 0x2c10 (11280)
CPLD: 0x00 (status bits=0x00)
serial no: 59
Hmm... looks generally OK, except that MIP 1-4 are missing and the other firmware is a bit out of date. Update the out-of-date firmware first:
./DCCprogrammer.exe sbs:0 8 -p LOG3 pci3v15.hex -y
./DCCrepair.exe sbs:0 8 -j LOG3 dcc_confv2.stapl -y
The updates go ok, but still missing LRB 1-4:
** Flash access OK **
** Firmware Revisions:
LOG1: 0x000c (12)
LOG2: 0x000b (11)
LOG3: 0x0015 (21)
MIP1: --------
MIP2: --------
MIP3: --------
MIP4: --------
MIP5: 0x001b (27)
XILINX: 0x2c10 (11280)
CPLD: 0x02 (status bits=0x00)
serial no: 59
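To spot missing boards without reading the report by eye, the revision listing can be parsed mechanically. The following is a minimal sketch in Python, assuming the report format shown above (a name, a colon, then either a hex value or a run of dashes); the function name `parse_revisions` is hypothetical and is not part of DCCprogrammer.

```python
import re

# Firmware report text copied from the DCCprogrammer -i output above.
REPORT = """\
LOG1: 0x000c (12)
LOG2: 0x000b (11)
LOG3: 0x0015 (21)
MIP1: --------
MIP2: --------
MIP3: --------
MIP4: --------
MIP5: 0x001b (27)
XILINX: 0x2c10 (11280)
CPLD: 0x02 (status bits=0x00)
serial no: 59
"""


def parse_revisions(text):
    """Map each device name to its revision, or None if it shows dashes."""
    revisions = {}
    # A value is either a hex number or a run of dashes (device not seen).
    for name, value in re.findall(r"(\w+):\s+(0x[0-9a-fA-F]+|-+)", text):
        revisions[name] = None if value.startswith("-") else int(value, 16)
    return revisions


revisions = parse_revisions(REPORT)
missing = [name for name, rev in revisions.items() if rev is None]
print("missing:", missing)
```

The "serial no" line is skipped because its value is neither a hex number nor dashes.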
Check each LRB to be sure it answers JTAG:
./DCCrepair.exe sbs:0 8 -j mip1 idcode.jam -y
INFO (JAM PLAYER): DevSel = 0xc Nbit = 0 TCK_freq = 0x1
CRC mismatch: expected BAD6, actual 76E6
******************************************************************************
* Altera Chain Interrogation Version 2.02 *
* Copyright (c) 1999-2001 Altera Corporation. All Rights Reserved. *
* Modified 20 Aug 2007 by E. Hazen to recoginize HCAL/DCC Devicessss *
******************************************************************************
Chain Continuity Checker
DCC JAM Player running. Please wait
Chain Continuity during IR is not stuck at zero or one
******************************************************************************
Chain Length -- Load IR of all ones then count DR length
Number of Devices is 1
******************************************************************************
IR Length Calculator
Instruction Register Length is 10
******************************************************************************
IDCODE Reader
---------- | ---- ------------------- ------------- - |
TDO -> TDI | Rev  Device              Mfgr          1 |
---------- | ---- ------------------- ------------- - |
Device #1  | 0000 0001 0000 0000 0010 0000 1101 110 1 |
---------- | ---- ------------------- ------------- - |
******************************************************************************
Device Identifier -- Search for device name from list of device IDCODE values
---------- | ------------------- ------------- |
TDO -> TDI | Device              Mfgr          |
---------- | ------------------- ------------- |
Device #1  | EPC2                Altera        |
---------- | ------------------- ------------- - |
******************************************************************************
Exit code = 0... Success
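The IDCODE row in the output above can be checked by hand. A JTAG IDCODE is a 32-bit word laid out (per IEEE 1149.1) as a 4-bit revision, a 16-bit part number, an 11-bit manufacturer ID, and a final bit that is always 1. The sketch below decodes the bit string printed for Device #1; the function name `decode_idcode` is hypothetical and only illustrates the field layout.

```python
def decode_idcode(bits):
    """Split a 32-bit JTAG IDCODE (given as a bit string) into its fields."""
    word = int(bits.replace(" ", ""), 2)
    return {
        "word": word,
        "revision": (word >> 28) & 0xF,       # bits 31..28
        "part": (word >> 12) & 0xFFFF,        # bits 27..12
        "manufacturer": (word >> 1) & 0x7FF,  # bits 11..1
        "marker": word & 0x1,                 # bit 0, always 1
    }


# Bit string copied from the "Device #1" row of the IDCODE Reader table.
fields = decode_idcode("0000 0001 0000 0000 0010 0000 1101 110 1")
for name, value in fields.items():
    print(f"{name:12s} 0x{value:x}")
```

The manufacturer field comes out as 0x06e, which agrees with the tool identifying the device as an Altera part.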
Did this for all LRBs... look OK. Try reprogramming LRB1 for good measure using:
./DCCrepair.exe sbs:0 8 -j mip1 lrbv27.jam -y
Doesn't help; MIP1 is still missing. A DCCrepair scan also shows none of the MIPs except MIP5. So the problem could be on the logic board, the motherboard, or (very unlikely) all the LRBs. Can't think of a single-point failure.
Try reprogramming PCI1 and PCI2 from JTAG to see if it helps:
./DCCrepair.exe sbs:0 8 -j log3 pci1vc.jam -y
./DCCrepair.exe sbs:0 8 -j log3 pci2vb.jam -y
Then a PCI scan:
./DCCrepair.exe sbs:0 8 -c -b -p
Bus Dev Alias PCI ID   Device ID  CSR      BAR0     BAR1     BAR2     BAR3     BAR4     BAR5
0   0   br3   ac21104c PCI bridge 02100143 00000000 00000000 00020100 02000101 00000000 00000000
0   1   log3  00030072 DCC LOG3   04000000 00000008 00000000 00000000 00000000 00000000 00000000
0   2   uv2   00020201 ??         00402f21 00020201 00402f21 00020201 00402f21 00020201 00402f21
0   3   bc    ffffffff ??
0   4   lc    00040070 Local Ctrl 04000147 00000000 00001001 00000000 00000000 00000000 00000000
1   0   br2   ac21104c PCI bridge 02100143 00000000 00000000 00020201 02000101 00000000 00000000
1   1   log2  00020072 DCC LOG2   04000000 00000000 00000000 00000000 00000000 00000000 00000000
1   2   mip3  ffffffff ??
1   3   mip4  ffffffff ??
1   4   mip5  00400055 LRB        04000000 00000000 00000000 00000000 00000000 00000000 00000000
2   0   --    ffffffff ??
2   1   log1  00010072 DCC LOG1   04000000 00000000 00000000 00000000 00000000 00000000 00000000
2   2   mip0  ffffffff ??
2   3   mip1  00400055 LRB        04000000 00000000 00000000 00000000 00000000 00000000 00000000
2   4   mip2  ffffffff ??
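In a PCI scan, an ID of ffffffff means nothing answered the configuration read at that bus/device position. A quick way to list the silent positions is sketched below in Python; it assumes the column order of the scan above (bus, device, alias, PCI ID), omits the CSR and BAR columns for brevity, and uses a hypothetical helper name `absent_devices`.

```python
# First five columns of the PCI scan above (CSR/BAR columns omitted).
SCAN = """\
0 0 br3  ac21104c PCI bridge
0 1 log3 00030072 DCC LOG3
0 2 uv2  00020201 ??
0 3 bc   ffffffff ??
0 4 lc   00040070 Local Ctrl
1 0 br2  ac21104c PCI bridge
1 1 log2 00020072 DCC LOG2
1 2 mip3 ffffffff ??
1 3 mip4 ffffffff ??
1 4 mip5 00400055 LRB
2 0 --   ffffffff ??
2 1 log1 00010072 DCC LOG1
2 2 mip0 ffffffff ??
2 3 mip1 00400055 LRB
2 4 mip2 ffffffff ??
"""


def absent_devices(scan):
    """Return (bus, dev, alias) for every row whose PCI ID reads all ones."""
    absent = []
    for line in scan.splitlines():
        if not line.strip():
            continue
        bus, dev, alias, pci_id = line.split()[:4]
        if pci_id.lower() == "ffffffff":
            absent.append((int(bus), int(dev), alias))
    return absent


absent = absent_devices(SCAN)
for bus, dev, alias in absent:
    print(f"bus {bus} dev {dev}: {alias} did not respond")
```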
Aha! MIP1 is back. This probably had nothing to do with the PCI1/2 programming... I expect the LRB just needed to have its firmware restored and reloaded from flash.
Reprogram mip2-mip4 similarly:
./DCCrepair.exe sbs:0 8 -j mip2 lrbv27.jam -y
./DCCrepair.exe sbs:0 8 -j mip3 lrbv27.jam -y
./DCCrepair.exe sbs:0 8 -j mip4 lrbv27.jam -y
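Since the same command is repeated per LRB, the list can be generated rather than typed. The sketch below only builds and prints the command lines (a dry run) and does not execute anything; it assumes the same bus, slot, and flags as the commands above, and the helper name `lrb_reprogram_commands` is hypothetical.

```python
import shlex


def lrb_reprogram_commands(mips, jam_file="lrbv27.jam", bus="sbs:0", slot=8):
    """Build one DCCrepair argument list per LRB, mirroring the commands above."""
    return [
        ["./DCCrepair.exe", bus, str(slot), "-j", mip, jam_file, "-y"]
        for mip in mips
    ]


commands = lrb_reprogram_commands(["mip2", "mip3", "mip4"])
for argv in commands:
    # Dry run: print the command instead of running it.
    print(shlex.join(argv))
```

Each argument list could be handed to `subprocess.run` on a machine where DCCrepair.exe is present; that step is left out here because it touches real hardware.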
Now DCCprogrammer -i reports that all is well:
** Flash access OK **
** Firmware Revisions:
LOG1: 0x000c (12)
LOG2: 0x000b (11)
LOG3: 0x0015 (21)
MIP1: 0x001b (27)
MIP2: 0x001b (27)
MIP3: 0x001b (27)
MIP4: 0x001b (27)
MIP5: 0x001b (27)
XILINX: 0x2c10 (11280)
CPLD: 0x02 (status bits=0x00)
serial no: 59
The remaining mystery is how LRBs 1-4 suffered such a massive firmware loss.