Apollo SM troubleshooting

Eth0 not coming up:

Depending on what the Zynq RJ45 jack is connected to, autonegotiation can behave unpredictably. Check /fw/startup.sh to see whether the ethtool line that forces the link settings is active:

root@zynq_os:~# cat /fw/startup.sh 
echo Startup script!
ethtool -s eth0 speed 100 duplex full autoneg off

If eth0 still does not come up, try toggling the ethtool line: comment it out if it is active, or uncomment it if it is commented out.
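A quick way to check the state of that line without opening the file is sketched below. This is only a sketch: the function name ethtool_line_state is hypothetical, and it assumes the line has the same form as in the listing above (ethtool -s eth0 ...).

```shell
#!/bin/sh
# Report whether the ethtool line in a startup script is active,
# commented out, or missing.
# Assumption: the default path /fw/startup.sh is the one shown on this page;
# ethtool_line_state is a hypothetical helper name.
ethtool_line_state() {
    script="${1:-/fw/startup.sh}"
    if grep -Eq '^[[:space:]]*ethtool[[:space:]]+-s[[:space:]]+eth0' "$script"; then
        # An uncommented "ethtool -s eth0 ..." line exists.
        echo "active"
    elif grep -Eq '^[[:space:]]*#.*ethtool[[:space:]]+-s[[:space:]]+eth0' "$script"; then
        # The line exists but starts with a comment marker.
        echo "commented"
    else
        echo "missing"
    fi
}
```

Calling ethtool_line_state with no argument inspects /fw/startup.sh. Running ethtool eth0 afterwards shows the speed and duplex the link actually ended up with.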



IPMC/Zynq BOOT handshaking. Useful commands for debugging

  • SHM is the shelf manager IP address
  • BLADE_IPMB_ADDR is the blade's IPMB address (0x9a for physical slot 1)

Find blade address

ipmitool -I lanplus -H SHM -P "" fru list

Connect to Zynq via SOL (works by default, IPMC can override destination)

ipmitool -I lan -H SHM -P "" -t BLADE_IPMB_ADDR sol activate

Force reset the blade

ipmitool -I lan -H SHM -P "" -t BLADE_IPMB_ADDR mc reset cold
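The bridged commands above all share the same prefix, so a small helper can cut down on typing mistakes. The sketch below only prints the command line so it can be reviewed before running. The helper name blade_cmd and the IPMI_IF variable are hypothetical, and the default values are the example addresses that appear elsewhere on this page, not settings that apply to every shelf.

```shell
#!/bin/sh
# Build the ipmitool command line for talking to a blade through the
# shelf manager.
# Assumptions: SHM and BLADE_IPMB_ADDR are the placeholders used on this
# page; the defaults below are example values and must be adjusted.
SHM="${SHM:-192.168.0.2}"
BLADE_IPMB_ADDR="${BLADE_IPMB_ADDR:-0x9a}"
IPMI_IF="${IPMI_IF:-lan}"   # this page uses both lan and lanplus

blade_cmd() {
    # Print the command instead of running it; the arguments are the
    # ipmitool subcommand, e.g. "mc reset cold".
    printf 'ipmitool -I %s -H %s -P "" -t %s %s\n' \
        "$IPMI_IF" "$SHM" "$BLADE_IPMB_ADDR" "$*"
}
```

For example, blade_cmd mc reset cold prints the force-reset command for the configured blade, which can then be pasted into the shell.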

The Zynq must write a '1' to SLAVE_I2C.S1.SM.STATUS.DONE. If this doesn't happen, the IPMC will shut down the blade.


CentOS upgrades to partition 2 (root) over the network:

  • Extract the tarball locally as root (tar -zxpf), then rsync the extracted tree to the host
  • To create the tarball: sudo tar --numeric-owner -p -zcf ../2020-01-29-SD_p2.tar.gz ./
  • To copy to the host: sudo rsync -P -r -l -p -o -g -W --numeric-ids ./ root@host:/
  • Same copy, comparing files by checksum (-c): sudo rsync -P -r -l -p -o -g -W -c --numeric-ids ./ root@host:/
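The two rsync lines differ only in the -c flag, which makes rsync compare files by checksum instead of by size and modification time. The sketch below builds either variant. The helper name rootfs_rsync_cmd and its "verify" argument are hypothetical, and treating the -c variant as a second, verifying pass is an assumption about how the two lines are meant to be used.

```shell
#!/bin/sh
# Print the rsync command for pushing an extracted root tree to a host.
# Assumptions: rootfs_rsync_cmd is a hypothetical helper; it is run from
# inside the extracted tree, as the "./" source in the bullets implies.
rootfs_rsync_cmd() {
    host="$1"          # target, e.g. root@host
    mode="${2:-}"      # "verify" adds -c (compare by checksum)
    # -P progress/partial, -r recursive, -l keep symlinks,
    # -p/-o/-g keep permissions/owner/group, -W copy whole files
    flags="-P -r -l -p -o -g -W"
    if [ "$mode" = "verify" ]; then
        flags="$flags -c"
    fi
    # --numeric-ids keeps uid/gid numbers instead of mapping by name
    echo "sudo rsync $flags --numeric-ids ./ $host:/"
}
```

rootfs_rsync_cmd root@host prints the plain copy, and rootfs_rsync_cmd root@host verify prints the checksum variant.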

IPMC issues

devboard power up:

  • SETHA 0x34 - set the hw address so the IPMC can power up

Restart a shut-down blade

First figure out the IPMB address of the blade. One way to do this is via the shelf manager web page. Since you are probably in a shell, use lynx:

lynx SHELF_MAN_IP

Shelf manager IP: 192.168.0.171 (BU), 192.168.0.2 (TIF)

To get the info, select Boards/IPMC/AMCs->Boards->Show

Once the page refreshes, go to the bottom and select the board physical slot you want.

Once you submit, you'll see a listing with the physical address and the IPMB address.

ex:

Physical Slot # 1

9a: Entity: (0xa0, 0x60) Maximum FRU device ID: 0x00

PICMG Version 2.3 Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State Change Cause: Normal State Change (0x0)

9a: FRU # 0

Entity: (0xa0, 0x60) Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State Change Cause: Normal State Change (0x0) Device ID String: "APOLLO-BLADE"

Now we can see that physical slot #1 has IPMB address 0x9a and is an Apollo blade.
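If the listing has been saved as text, the address can also be pulled out with awk. This is a sketch that assumes the text has the same layout as the example above ("Physical Slot # N" followed by a line starting with the hex address and "Entity:"); the function name ipmb_for_slot is hypothetical.

```shell
#!/bin/sh
# Read a shelf manager board listing on stdin and print the IPMB address
# of the given physical slot, e.g. 0x9a.
# Assumption: the input matches the layout of the example on this page;
# ipmb_for_slot is a hypothetical helper name.
ipmb_for_slot() {
    slot="$1"
    awk -v slot="$slot" '
        # Remember whether we are inside the block for the wanted slot.
        /^[ \t]*Physical Slot #/ { in_slot = ($4 == slot); next }
        # The first "<hex>: Entity:" line of that block carries the address.
        in_slot && /^[ \t]*[0-9a-fA-F]+: Entity:/ {
            sub(/:$/, "", $1)
            print "0x" $1
            exit
        }
    '
}
```

With the listing saved in a file, ipmb_for_slot 1 < listing.txt prints the address for physical slot 1, and prints nothing if that slot is not in the listing.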

Now we can apply a warm restart with:

ipmitool -I lanplus -H SHELF_MAN_IP -P "" -t IPMB_ADDR mc reset warm

For example, for the blade above on the TIF shelf:

ipmitool -I lanplus -H 192.168.0.2 -P "" -t 0x9a mc reset warm

@CERN

Last modified on Mar 24, 2020, 2:11:35 PM