Friday, November 18, 2022

Dell MD3620f recovery

This has been sitting in my drafts since February, so I'll just publish it. You can work it out.

tl;dr: reflash the firmware via serial.

SCSI rescan / scanning FC on Linux:
https://linoxide.com/scandetect-luns-redhat-linux-outputs-remember/
https://www.thegeekdiary.com/how-to-identify-the-hba-cardsports-and-wwn-in-rheloel/
http://www.andrewparisio.com/2013/02/dell-md3620f-mini-review.html

Look at the install scripts.. CentOS? No: install Red Hat, go through the nonsense to get a licence, enable repos, install the required tools..
Dell DVD image: install the Java management tools, install a bunch of X.

yum.log..
Feb 21 00:21:10 Installed: atk-2.28.1-2.el7.x86_64
Feb 21 00:21:10 Installed: libjpeg-turbo-1.2.90-8.el7.x86_64
Feb 21 00:21:10 Installed: mesa-libglapi-18.3.4-12.el7_9.x86_64
Feb 21 00:21:10 Installed: hicolor-icon-theme-0.12-7.el7.noarch
Feb 21 00:21:11 Installed: fontpackages-filesystem-1.44-8.el7.noarch
Feb 21 00:21:11 Installed: libwayland-client-1.15.0-1.el7.x86_64
Feb 21 00:21:11 Installed: libusbx-1.0.21-1.el7.x86_64
Feb 21 00:21:11 Installed: 1:libglvnd-1.0.1-0.8.git5baa1e5.el7.x86_64
Feb 21 00:21:11 Installed: libxshmfence-1.2-1.el7.x86_64
Feb 21 00:21:11 Installed: libICE-1.0.9-9.el7.x86_64
Feb 21 00:21:12 Installed: pixman-0.34.0-1.el7.x86_64
Feb 21 00:21:12 Installed: libwayland-server-1.15.0-1.el7.x86_64
Feb 21 00:21:12 Installed: libSM-1.2.2-2.el7.x86_64
Feb 21 00:21:12 Installed: libgusb-0.2.9-1.el7.x86_64
Feb 21 00:21:12 Installed: libwayland-cursor-1.15.0-1.el7.x86_64
Feb 21 00:21:12 Installed: abattis-cantarell-fonts-0.0.25-1.el7.noarch
Feb 21 00:21:13 Installed: dejavu-fonts-common-2.33-6.el7.noarch
Feb 21 00:21:13 Installed: dejavu-sans-fonts-2.33-6.el7.noarch
Feb 21 00:21:14 Installed: fontconfig-2.13.0-4.3.el7.x86_64
Feb 21 00:21:18 Installed: gnome-icon-theme-3.12.0-1.el7.noarch
Feb 21 00:21:18 Installed: jasper-libs-1.900.1-33.el7.x86_64
Feb 21 00:21:19 Installed: libX11-common-1.6.7-4.el7_9.noarch
Feb 21 00:21:19 Updated: subscription-manager-rhsm-certificates-1.24.50-1.el7_9.x86_64
Feb 21 00:21:19 Updated: subscription-manager-rhsm-1.24.50-1.el7_9.x86_64
Feb 21 00:21:20 Updated: subscription-manager-1.24.50-1.el7_9.x86_64
Feb 21 00:21:20 Installed: rarian-0.8.1-11.el7.x86_64
Feb 21 00:21:21 Installed: rarian-compat-0.8.1-11.el7.x86_64
Feb 21 00:21:22 Installed: lcms2-2.6-3.el7.x86_64
Feb 21 00:21:22 Installed: colord-libs-1.3.4-2.el7.x86_64
Feb 21 00:21:22 Installed: jbigkit-libs-2.0-11.el7.x86_64
Feb 21 00:21:22 Installed: libtiff-4.0.3-35.el7.x86_64
Feb 21 00:21:22 Installed: dconf-0.28.0-4.el7.x86_64
Feb 21 00:21:23 Installed: libepoxy-1.5.2-1.el7.x86_64
Feb 21 00:21:23 Installed: libthai-0.1.14-9.el7.x86_64
Feb 21 00:21:23 Installed: libwayland-egl-1.15.0-1.el7.x86_64
Feb 21 00:21:23 Installed: fribidi-1.0.2-1.el7_7.1.x86_64
Feb 21 00:21:23 Installed: libpciaccess-0.14-1.el7.x86_64
Feb 21 00:21:23 Installed: libdrm-2.4.97-2.el7.x86_64
Feb 21 00:21:24 Installed: mesa-libgbm-18.3.4-12.el7_9.x86_64
Feb 21 00:21:24 Installed: avahi-libs-0.6.31-20.el7.x86_64
Feb 21 00:21:24 Installed: 1:cups-libs-1.6.3-51.el7.x86_64
Feb 21 00:21:24 Installed: psmisc-22.20-17.el7.x86_64
Feb 21 00:21:25 Installed: GConf2-3.2.6-8.el7.x86_64
Feb 21 00:21:25 Installed: graphite2-1.3.10-1.el7_3.x86_64
Feb 21 00:21:25 Installed: harfbuzz-1.7.5-2.el7.x86_64
Feb 21 00:21:25 Installed: nettle-2.7.1-9.el7_9.x86_64
Feb 21 00:21:26 Installed: libmodman-2.0.1-8.el7.x86_64
Feb 21 00:21:26 Installed: libproxy-0.4.11-11.el7.x86_64
Feb 21 00:21:46 Installed: adwaita-cursor-theme-3.28.0-1.el7.noarch
Feb 21 00:21:53 Installed: adwaita-icon-theme-3.28.0-1.el7.noarch
Feb 21 00:21:53 Installed: libXau-1.0.8-2.1.el7.x86_64
Feb 21 00:21:53 Installed: libxcb-1.13-1.el7.x86_64
Feb 21 00:21:54 Installed: libX11-1.6.7-4.el7_9.x86_64
Feb 21 00:21:54 Installed: libXext-1.3.3-3.el7.x86_64
Feb 21 00:21:54 Installed: libXrender-0.9.10-1.el7.x86_64
Feb 21 00:21:54 Installed: gdk-pixbuf2-2.36.12-3.el7.x86_64
Feb 21 00:21:54 Installed: libXfixes-5.0.3-1.el7.x86_64
Feb 21 00:21:55 Installed: libXdamage-1.1.4-4.1.el7.x86_64
Feb 21 00:21:55 Installed: libXi-1.7.9-1.el7.x86_64
Feb 21 00:21:55 Installed: libXcursor-1.1.15-1.el7.x86_64
Feb 21 00:21:55 Installed: gtk-update-icon-cache-3.22.30-6.el7.x86_64
Feb 21 00:21:55 Installed: libXrandr-1.5.1-2.el7.x86_64
Feb 21 00:21:55 Installed: libXinerama-1.1.3-2.1.el7.x86_64
Feb 21 00:21:55 Installed: libXcomposite-0.4.4-4.1.el7.x86_64
Feb 21 00:21:56 Installed: libXtst-1.2.3-1.el7.x86_64
Feb 21 00:21:56 Installed: at-spi2-core-2.28.0-1.el7.x86_64
Feb 21 00:21:56 Installed: at-spi2-atk-2.26.2-1.el7.x86_64
Feb 21 00:21:56 Installed: libnotify-0.7.7-1.el7.x86_64
Feb 21 00:21:56 Installed: libXft-2.3.2-2.el7.x86_64
Feb 21 00:21:56 Installed: libXxf86vm-1.1.4-1.el7.x86_64
Feb 21 00:21:57 Installed: 1:libglvnd-glx-1.0.1-0.8.git5baa1e5.el7.x86_64
Feb 21 00:21:57 Installed: mesa-libGL-18.3.4-12.el7_9.x86_64
Feb 21 00:21:57 Installed: 1:libglvnd-egl-1.0.1-0.8.git5baa1e5.el7.x86_64
Feb 21 00:21:57 Installed: mesa-libEGL-18.3.4-12.el7_9.x86_64
Feb 21 00:21:57 Installed: cairo-1.15.12-4.el7.x86_64
Feb 21 00:21:58 Installed: pango-1.42.4-4.el7_7.x86_64
Feb 21 00:21:58 Installed: cairo-gobject-1.15.12-4.el7.x86_64
Feb 21 00:21:58 Installed: librsvg2-2.40.20-1.el7.x86_64
Feb 21 00:22:02 Installed: gtk2-2.24.31-1.el7.x86_64
Feb 21 00:22:02 Installed: pycairo-1.8.10-8.el7.x86_64
Feb 21 00:22:02 Installed: python-gobject-3.22.0-1.el7_4.1.x86_64
Feb 21 00:22:02 Installed: xcb-util-0.4.0-2.el7.x86_64
Feb 21 00:22:02 Installed: startup-notification-0.12-8.el7.x86_64
Feb 21 00:22:02 Installed: usermode-gtk-1.111-6.el7.x86_64
Feb 21 00:22:03 Installed: gsettings-desktop-schemas-3.28.0-3.el7.x86_64
Feb 21 00:22:03 Installed: json-glib-1.4.2-2.el7.x86_64
Feb 21 00:22:21 Installed: xkeyboard-config-2.24-1.el7.noarch
Feb 21 00:22:21 Installed: libxkbcommon-0.7.1-3.el7.x86_64
Feb 21 00:22:22 Installed: trousers-0.3.14-2.el7.x86_64
Feb 21 00:22:22 Installed: gnutls-3.3.29-9.el7_6.x86_64
Feb 21 00:22:22 Installed: glib-networking-2.56.1-1.el7.x86_64
Feb 21 00:22:22 Installed: libsoup-2.62.2-2.el7.x86_64
Feb 21 00:22:23 Installed: rest-0.8.1-2.el7.x86_64
Feb 21 00:22:24 Installed: gtk3-3.22.30-6.el7.x86_64
Feb 21 00:22:24 Installed: rhsm-gtk-1.24.50-1.el7_9.x86_64
Feb 21 00:22:24 Installed: subscription-manager-gui-1.24.50-1.el7_9.x86_64
Feb 21 01:21:26 Installed: libXt-1.1.5-3.el7.x86_64
Feb 21 01:21:26 Installed: libXmu-1.1.2-2.el7.x86_64
Feb 21 01:21:26 Installed: libXpm-3.5.12-1.el7.x86_64
Feb 21 01:21:26 Installed: libXaw-1.0.13-4.el7.x86_64
Feb 21 01:21:26 Installed: xterm-295-3.el7_9.1.x86_64
Feb 21 01:22:04 Installed: 1:xorg-x11-xauth-1.0.9-1.el7.x86_64
Feb 21 16:09:00 Installed: libXv-1.0.11-1.el7.x86_64
Feb 21 16:09:00 Installed: libdmx-1.1.3-3.el7.x86_64
Feb 21 16:09:00 Installed: libXxf86misc-1.0.3-7.1.el7.x86_64
Feb 21 16:09:00 Installed: libXxf86dga-1.1.4-2.1.el7.x86_64
Feb 21 16:09:01 Installed: xorg-x11-utils-7.5-23.el7.x86_64
Feb 21 16:38:16 Installed: unzip-6.0-24.el7_9.x86_64

ssh forwarding.. xauth
/opt/dell/mdstoragesoftware/mdstoragemanager/client/SMclient

2.5" array: disks stuck in bypass mode.
(Auto) create array/host fails: you can't create an array without ~11 disks, and I don't have enough.
Switch to the 3.5" array chassis: works.. but only with one controller.
Still fails! Manually create a group instead of a pool, R0.

rescan-scsi-bus.sh

$ sudo rescan-scsi-bus.sh
Scanning SCSI subsystem for new devices
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning host 1 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning host 2 for all SCSI target IDs, all LUNs
Scanning for device 2 0 3 0 ...
NEW: Host: scsi2 Channel: 00 Id: 03 Lun: 00
Vendor: DELL Model: MD36xxf Rev: 0820
Type: Direct-Access ANSI SCSI revision: 05
Scanning for device 2 0 3 31 ...
OLD: Host: scsi2 Channel: 00 Id: 03 Lun: 31
Vendor: DELL Model: Universal Xport Rev: 0820
Type: Direct-Access ANSI SCSI revision: 05
1 new or changed device(s) found.
[2:0:3:0]
0 remapped or resized device(s) found.

$ sudo fdisk -l /dev/sda
Disk /dev/sda: 4.88 TiB, 5347976675328 bytes, 10445266944 sectors
Disk model: MD36xxf
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Test: working? badblocks test: 32-bit issues. Running..
Add the other controller.. FC connection to the host keeps going up and down:

Feb 22 03:41:10 r710-c kernel: [91363.881899] lpfc 0000:07:00.0: 0:1305 Link Down Event x4 received Data: x4 x20 x80000 x0 x0
Feb 22 03:41:40 r710-c kernel: [91394.172392] rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Feb 22 03:41:40 r710-c kernel: [91394.172553] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:14:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 03:41:48 r710-c kernel: [91402.624296] lpfc 0000:07:00.0: 0:1303 Link Up Event x5 received Data: x5 x1 x10 x2 x0 x0 0
Feb 22 03:41:48 r710-c kernel: [91402.624304] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 03:41:48 r710-c kernel: [91402.633358] scsi 2:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 03:41:48 r710-c kernel: [91402.634504] scsi 2:0:0:31: Attached scsi generic sg0 type 0
Feb 22 03:42:02 r710-c kernel: [91415.904373] lpfc 0000:07:00.0: 0:1305 Link Down Event x6 received Data: x6 x20 x80000 x0 x0
Feb 22 03:42:02 r710-c kernel: [91415.936571] lpfc 0000:07:00.0: 0:1303 Link Up Event x7 received Data: x7 x1 x10 x2 x0 x0 0
Feb 22 03:42:02 r710-c kernel: [91415.936578] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 03:42:56 r710-c kernel: [91469.742779] lpfc 0000:07:00.0: 0:1305 Link Down Event x8 received Data: x8 x20 x80000 x0 x0
Feb 22 03:43:26 r710-c kernel: [91500.665121] rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Feb 22 03:43:26 r710-c kernel: [91500.665304] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:14:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 03:43:34 r710-c kernel: [91508.508652] lpfc 0000:07:00.0: 0:1303 Link Up Event x9 received Data: x9 x1 x10 x2 x0 x0 0
Feb 22 03:43:34 r710-c kernel: [91508.508659] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 03:43:34 r710-c kernel: [91508.517684] scsi 2:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 03:43:34 r710-c kernel: [91508.518820] scsi 2:0:0:31: Attached scsi generic sg0 type 0
Feb 22 03:47:00 r710-c boinc[61107]: message repeated 1783 times: [ No protocol specified]
Feb 22 03:47:01 r710-c CRON[82511]: (root) CMD ( test -x /etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-contest --crond)
Feb 22 03:47:01 r710-c boinc[61107]: No protocol specified
Feb 22 03:47:04 r710-c kernel: [91717.924342] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:47:04 r710-c kernel: [91718.324277] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:47:06 r710-c kernel: [91720.341566] ata2: failed to resume link (SControl 0)
Feb 22 03:47:06 r710-c kernel: [91720.352346] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 03:47:07 r710-c kernel: [91721.449505] ata2: failed to resume link (SControl 0)
Feb 22 03:47:07 r710-c kernel: [91721.460274] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 03:48:54 r710-c kernel: [91827.923638] lpfc 0000:07:00.0: 0:1305 Link Down Event xa received Data: xa x20 x80000 x0 x0
Feb 22 03:49:25 r710-c kernel: [91859.052945] rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Feb 22 03:49:25 r710-c kernel: [91859.053124] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:14:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 03:49:56 r710-c kernel: [91890.538032] lpfc 0000:07:00.0: 0:1303 Link Up Event xb received Data: xb x1 x10 x2 x0 x0 0
Feb 22 03:49:56 r710-c kernel: [91890.538048] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 03:49:56 r710-c kernel: [91890.547337] scsi 2:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 03:49:56 r710-c kernel: [91890.548568] scsi 2:0:0:31: Attached scsi generic sg0 type 0
Feb 22 03:56:52 r710-c kernel: [92306.504281] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:56:53 r710-c kernel: [92306.888292] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:56:55 r710-c kernel: [92308.821535] ata2: failed to resume link (SControl 0)
Feb 22 03:56:55 r710-c kernel: [92308.832306] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 03:56:56 r710-c kernel: [92309.937537] ata2: failed to resume link (SControl 0)
Feb 22 03:56:56 r710-c kernel: [92309.948299] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 03:57:36 r710-c kernel: [92350.333652] lpfc 0000:07:00.0: 0:1305 Link Down Event xc received Data: xc x20 x80000 x0 x0
Feb 22 03:58:07 r710-c kernel: [92381.275162] rport-2:0-0: blocked FC remote port time out: removing target and saving binding
Feb 22 03:58:07 r710-c kernel: [92381.275328] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:14:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 03:58:56 r710-c kernel: [92429.799363] lpfc 0000:07:00.0: 0:1303 Link Up Event xd received Data: xd x1 x10 x2 x0 x0 0
Feb 22 03:58:56 r710-c kernel: [92429.799376] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 03:58:56 r710-c kernel: [92429.809302] scsi 2:0:1:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 03:58:56 r710-c kernel: [92429.810089] scsi 2:0:1:31: Attached scsi generic sg0 type 0
Feb 22 03:59:13 r710-c kernel: [92446.822200] lpfc 0000:07:00.0: 0:1305 Link Down Event xe received Data: xe x20 x80000 x0 x0
Feb 22 03:59:43 r710-c kernel: [92477.527770] rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Feb 22 03:59:43 r710-c kernel: [92477.527945] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:16:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 03:59:46 r710-c kernel: [92479.829976] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:59:46 r710-c kernel: [92480.206404] ata1: SATA link down (SStatus 0 SControl 300)
Feb 22 03:59:48 r710-c kernel: [92482.103593] ata2: failed to resume link (SControl 0)
Feb 22 03:59:48 r710-c kernel: [92482.114373] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 03:59:49 r710-c kernel: [92483.219541] ata2: failed to resume link (SControl 0)
Feb 22 03:59:49 r710-c kernel: [92483.230308] ata2: SATA link down (SStatus 4 SControl 0)
Feb 22 04:08:26 r710-c kernel: [93000.521256] lpfc 0000:07:00.0: 0:1303 Link Up Event xf received Data: xf x1 x10 x2 x0 x0 0
Feb 22 04:08:26 r710-c kernel: [93000.521268] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 04:08:26 r710-c kernel: [93000.530043] scsi 2:0:1:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 04:08:26 r710-c kernel: [93000.530919] scsi 2:0:1:31: Attached scsi generic sg0 type 0
Feb 22 04:08:36 r710-c kernel: [93009.697931] lpfc 0000:07:00.0: 0:1305 Link Down Event x10 received Data: x10 x20 x80000 x0 x0
Feb 22 04:09:07 r710-c kernel: [93040.708429] rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Feb 22 04:09:07 r710-c kernel: [93040.708590] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:16:f0:1f:af:e6:c1:cf NPort x000002 Data: x0 x8 x0
Feb 22 04:09:16 r710-c kernel: [93049.945080] lpfc 0000:07:00.0: 0:1303 Link Up Event x11 received Data: x11 x1 x10 x2 x0 x0 0
Feb 22 04:09:16 r710-c kernel: [93049.945092] lpfc 0000:07:00.0: 0:1309 Link Up Event npiv not supported in loop topology
Feb 22 04:09:16 r710-c kernel: [93049.954400] scsi 2:0:1:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
Feb 22 04:09:16 r710-c kernel: [93049.955256] scsi 2:0:1:31: Attached scsi generic sg0 type 0
Feb 22 04:15:49 r710-c kernel: [93442.622849] lpfc 0000:07:00.0: 0:1305 Link Down Event x12 received Data: x12 x20 x80000 x0 x0
Feb 22 04:16:19 r710-c kernel: [93472.821524] rport-2:0-1: blocked FC remote port time out: removing target and saving binding
Feb 22 04:16:19 r710-c kernel: [93472.821698] lpfc 0000:07:00.0: 0:(0):0203 Devloss timeout on WWPN 20:16:f0:1f:af:e6:c1:cf NPort x0000e1 Data: x0 x8 x0
Feb 22 04:16:54 r710-c boinc[61107]: message repeated 1776 times: [ No protocol specified]

Timeouts after plugging in the other controller.. crashing? rebooting?
https://downloads.dell.com/manuals/common/powervault-md3600f_owner%27s%20manual_en-us.pdf

Got the array; one controller not working.
IP conflicts.. unresponsive.
Lockdown mode: DB is a different version than expected.
Controller removed.
Try using a SAS module: can't mix.
Recovery Guru issues; only detects one IP at a time.
UPGRADE? Refuses.. try older, no. Bridge FW.. no..
Didn't think to take screenshots..
Got it "working" with one controller, via FC.

Something about SD card recovery. Opened the lids, imaged both SD cards, reflashed the broken unit's SD; still fails, DB issues.
https://blog.workinghardinit.work/2013/11/28/upgrading-the-dell-md3600f-controller-firmware-using-the-modular-disk-storage-manager/

AMW? Clear recovery? Can't access that.
CLI?
https://www.dell.com/support/manuals/en-ca/powervault-md3420/34xx_38xx_cli_pub/download-storagearray-firmwarenvsram?guid=guid-a993c52e-f583-4d69-bf87-415d75b6124a&lang=en-us
Find that a semicolon is required. Refuses to send firmware.

Worked out the serial pinout (TX/RX): hacked PS/2 mouse cable, meter, voltage ~3V. Doesn't work; use pin connection, one at a time, to find the TX pin.
Work out the baud rate: 115k..

IconSendInfeasibleException

Find pages like
https://community.spiceworks.com/topic/1479240-powervalut-md-3000i-controller-reset
"When you get the following error on a controller it has failed & there is no recovery of the controller."
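The "work out baud rate" step above can be partly automated: capture a little console output at each candidate rate and keep the rate whose bytes look most like text. This is only a minimal sketch. It assumes the raw captures were already collected some other way (a terminal program, for example), and the function names and sample bytes are made up for illustration, not taken from any Dell tool.

```python
import string

# Bytes we expect in readable console output (printable ASCII plus whitespace).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in `data` that are printable ASCII."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def best_baud(captures: dict[int, bytes]) -> int:
    """Pick the baud rate whose capture looks most like text.

    `captures` maps a candidate baud rate to the raw bytes read at that rate.
    """
    if not captures:
        raise ValueError("no captures given")
    return max(captures, key=lambda rate: printable_ratio(captures[rate]))


if __name__ == "__main__":
    # Hypothetical sample data: noise at the wrong rates, text at the right one.
    samples = {
        9600: bytes([0x00, 0xFE, 0x80, 0xF8, 0x00, 0xE0]),
        38400: bytes([0x9C, 0x1F, 0xC3, 0x66, 0x86, 0x98]),
        115200: b"sodMain complete\r\n",
    }
    print(best_baud(samples))
```

A wrong rate usually shows up as framing noise with lots of high or control bytes, so a simple printable-byte ratio tends to be enough to tell the candidates apart.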
Another link:
https://www.dell.com/community/DELL-EMC-Storage-Forum/reset-setting-md-3000i/td-p/3691284
includes:
http://lists.us.dell.com/pipermail/linux-poweredge/2009-January/038389.html

Connect the serial cable to a terminal emulator of your choice and set to 115200,8,N,1 with flow control = none.
Power on the MD3000i and press "CTRL B" when it starts to boot.
Type 10 and hit enter for "Serial Interface Mode Menu"
Type 1 and hit enter for "Console Only"
Type Q and hit enter to exit
Type R and hit enter to reboot.
Watch the boot sequence and when you see "sodMain complete", hit enter and type "sysWipeZero 1" and hit enter.
The system will reset and go back to default settings.

Didn't help..

https://community.spiceworks.com/topic/1783884-dell-md3000i-array-issue
https://www.ibm.com/mysupport/s/question/0D50z00006LL1LoCAL/ds5020-boot-loop?language=en_US

I was able to resolve the issue by upgrading the firmware on the device. The following post outlines how to update firmware through a serial connection.

https://ibmsf.force.com/mysupport/s/question/0D50z00006LL03PCAT describes my problem:
"I have a DS5020 with 2 controllers. I have tried to leave 1 controller out of the chassis and work with a single controller however both present the same issue. I have done a lemClearLockdown which then reboots the controller but then it goes into the same boot loop until the maximum. I am just trying to do a sysWipe and get the system back to a fresh state. I dont care about any data housed on it."

I also had to made my own Serial to PS/2 cable to connect. Below are the pin outs for the cable used:

PS/2 <> COM
1 <> 3
2 <> 2
3 <> 5
4 <> 4
5 <> 5

Weird PS/2 pinout..

https://community.spiceworks.com/topic/2127714-dell-md3200i-4-port-iscsi-controller-won-t-login-via-cli
Based on the serial capture that you posted your MD3200i doesn’t have a filesystem configuration on it.
Now you also stated that you are trying to reset this with no drives in the system. You must have 4 drives minimum in your MD3200i, in slots 0-3. If you are looking at your MD3200i from the front slots 0-3 are the first 4 slots on the left side of your chassis.

Download firmware "PowerVault_MD32_MD36_Series_Firmware_08_20_24_60": refused to install by any method.

Serial access
https://www.reddit.com/r/homelab/comments/9vgkjh/completely_locked_out_of_md3000_even_in_the/
Break, escape. Login: need the creds.. "MD3000 VXworks login"
https://www.dell.com/community/PowerVault/PowerVault-MD3200-VxWorks-ShellUsr-Login/td-p/4466992/page/2
https://www.orangecomputers.com/node/?command=kb&docid=30
shellUsr and DF4m/2>

2 permanent.
xmodem: select files, transfer..

MD3600f_MD3620f_Firmware_Package_07_84_47_60. 100% 33MB 78.7MB/s 00:00
MD3600f_MD3620f_Firmware_Package_07_84_53_60. 100% 33MB 33.6MB/s 00:00
MD3600f_MD3620f_Firmware_Package_08_20_24_60. 100% 35MB 95.3MB/s 00:00
MD3600f_MD3620f_NVSRAM_N26X0-784890-904.dlp 100% 27KB 18.5MB/s 00:00
MD3600f_MD3620f_NVSRAM_N26X0_820890_908.dlp 100% 30KB 17.0MB/s 00:00

Enabled verbose boot. Looked better.
Reboot: scrollback too short.. (change to unlimited..
still lost) 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=2 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=3 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=4 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=5 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=6 02/26/22-05:49:08 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=7 02/26/22-05:49:09 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 chann el:0 numEntries:22 02/26/22-05:49:09 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.10 02/26/22-05:49:09 (tSasExpChk): NOTE: Download Expander firmware image 02/26/22-05:49:17 (tSasExpChk): NOTE: Configuration of expander complete. This controller's expander will reboot.. . 02/26/22-05:49:17 (tSasExpChk): NOTE: sasnBaseboardExpanderManage: exp firmware and/or config pages was successfull y updated, channel:0 returnCode:40 02/26/22-05:49:23 (IOSched): WARN: SAS Expander Removed: ch:0 expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc: 0 numEntries:13 02/26/22-05:49:28 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 chann el:0 numEntries:22 02/26/22-05:49:29 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.16 02/26/22-05:49:29 (tSasExpChk): NOTE: Enabling this controller's expander phys, channel=0 02/26/22-05:49:29 (tSasDiscCom): NOTE: SAS: Initial Discovery Complete Time: 56 seconds since last power on/reset, 23 seconds since sas instantiated 02/26/22-05:49:29 (tRAID): NOTE: CrushMemoryPoolMgr: platform and memory (CPU MemSize:2048), adjusted allocating si ze to 262144 for CStripes 02/26/22-05:49:29 (sasEnPhys): NOTE: sasEnableDrivePhysTask: 12 phys enabled on channel 0 02/26/22-05:49:34 (tRAID): SOD: Instantiation Phase Complete 
02/26/22-05:49:30 (tRAID): WARN: No attempt made to open Inter-Controller Communication Channels 0-0 02/26/22-05:49:30 (tRAID): NOTE: LockMgr Role is Master 02/26/22-05:49:30 (tRAID): NOTE: WWN baseName 0004f01f-afe6c1e0 (valid==>SigMatch) 02/26/22-05:49:30 (tRAID): NOTE: spmEarlyData: No data available 02/26/22-05:49:35 (tRAID): SOD: Pre-Initialization Phase Complete 02/26/22-05:49:31 (tRAID): WARN: BID: Signature are invalid, initializing Eeprom region 02/26/22-05:49:32 (utlTimer1): NOTE: fcnChannelReport ==> -2 -3 -4 -5 02/26/22-05:49:37 (utlTimer1): NOTE: fcnChannelReport ==> =2 =3 =4 =5 SubClass ID 80: Gas Gauging - IT Cfg Before: OCV Wait AftChg 68[ 2] = 17280(sec/10) After: OCV Wait AftChg 68[ 2] = 2880(sec/10) SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Min Temp 33[ 2] = 100(0.1 C) After: Qinv Min Temp 33[ 2] = 990(0.1 C) SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Max Temp LSB 32[ 1] = 0x90 After: Qinv Max Temp LSB 32[ 1] = 0xe8 SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Max Temp MSB 31[ 1] = 0x01 After: Qinv Max Temp MSB 31[ 1] = 0x03 02/26/22-05:49:44 (tRAID): WARN: ACS: testCommunication: Domi Exception caught: IconSendInfeasibleException Error 02/26/22-05:49:44 (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Status: 0 02/26/22-05:49:44 (tRAID): WARN: ACS: autoCodeSync(): Skipped since alt not communicating. 
02/26/22-05:49:44 (tRAID): NOTE: iocfi: Peering Disabled (Alt Unavailable) 02/26/22-05:49:49 (tRAID): SOD: Code Synchronization Initialization Phase Complete 02/26/22-05:49:45 (NvpsPersistentSyncM): NOTE: NVSRAM Persistent Storage updated successfully 02/26/22-05:49:45 (tRAID): NOTE: SAFE: Process new features 02/26/22-05:49:45 (tRAID): NOTE: SAFE: Process legacy features 02/26/22-05:49:46 (tRAID): NOTE: PSTOR: Pstor bootstrap detected mismatch in pstor size b/w previous and current so d 02/26/22-05:49:46 (tRAID): NOTE: dirty data flag = 0x0, backup status = 0x0 02/26/22-05:49:46 (tRAID): NOTE: Logical device migration for cache downgrade 02/26/22-05:49:46 (bdbmBGTask): ERROR: sendFreezeStateToAlternate: Exception IconSendInfeasibleException Error 02/26/22-05:49:46 (tRAID): NOTE: SPM acquireObjects exception: IconSendInfeasibleException Error 02/26/22-05:49:46 (tRAID): NOTE: sas: Peering Disabled (Alt unavailable) 02/26/22-05:49:46 (tRAID): NOTE: fcn: Peering Disabled (Alt Unavailable) 02/26/22-05:49:46 (tRAID): NOTE: ion: Peering Disabled (Alt Unavailable) 02/26/22-05:49:47 (tRAID): WARN: Unable to intialize mirror device 02/26/22-05:49:47 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff 02/26/22-05:49:47 (tRAID): WARN: isCacheStoreMismatch: 1, Mirroring: 0, CacheStartAddr Mismatch: 1,CacheSize Mismat ch: 1 02/26/22-05:49:47 (tRAID): WARN: CCM: validateCacheMem() CacheStore mismatch 02/26/22-05:49:47 (tRAID): NOTE: CCM: validateCacheMem() cache memory is invalid 02/26/22-05:49:47 (tRAID): NOTE: CCM: validateCacheMem() Initializing my partition 02/26/22-05:49:47 (tRAID): WARN: CCM: initialize() - exchangeRestoreStatusAlt caught IconSendInfeasibleException Er ror 02/26/22-05:49:47 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStore ToPstor() caught IconSendInfeasibleException Error 02/26/22-05:49:47 (tRAID): NOTE: CCM: sodClearMOSIntentsAlt(), failure clearing MOS intents on alt 02/26/22-05:49:47 
(tRAID): WARN: Unable to intialize mirror device 02/26/22-05:49:47 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff 02/26/22-05:49:48 (tRAID): NOTE: doRecovery: myMemory:1, IORCB:0, NEW:0, OLD:0 02/26/22-05:49:48 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStore ToPstor() caught IconSendInfeasibleException Error 02/26/22-05:49:48 (tRAID): NOTE: Writing persistent reservation manager header 02/26/22-05:49:49 (IWTask): NOTE: UWMgr: IW log drive 0x10000 created 02/26/22-05:49:49 (IWTask): NOTE: UWMgr: IW log drive 0x10001 created 02/26/22-05:49:49 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStore ToPstor() caught IconSendInfeasibleException Error 02/26/22-05:49:59 (tRAID): WARN: arvm::AsyncMirrorManager::initialize caught IconSendInfeasibleException Error 02/26/22-05:50:00 (tRAID): NOTE: Exception caught sending size to alt FCM initialize IconSendInfeasibleException Er ror 02/26/22-05:50:00 (tRAID): NOTE: DiagVolManager::initialize: Exception - Alt controller not ready 02/26/22-05:50:01 (tRAID): NOTE: MIB creation complete - size 15 02/26/22-05:50:05 (tRAID): SOD: Initialization Phase Complete ============================================== Title: Disk Array Controller Copyright 2008-2017 NetApp, Inc. All Rights Reserved. 
Name: RC Version: 08.20.24.60 Date: 04/13/2017 Time: 18:11:47 CDT Models: 2660 Manager: devmgr.v1120api13.Manager ============================================== 02/26/22-05:50:05 (tRAID): sodMain Normal sequence finished, elapsed time = 58 seconds 02/26/22-05:50:05 (tRAID): sodMain complete 02/26/22-05:50:02 (ProcessHandlers): WARN: CCM: backupStorageAvailable() caught IconSendInfeasibleException Error 02/26/22-05:50:02 (ProcessHandlers): NOTE: vdm::CrushTaskCoordinationManager::handleEvent(1) - Exception N4domi27Ic onSendInfeasibleExceptionE - IconSendInfeasibleException Error 02/26/22-05:50:02 (ProcessHandlers): NOTE: vdm::CrushTaskCoordinationManager::handleEvent(2) - Exception N4domi27Ic onSendInfeasibleExceptionE - IconSendInfeasibleException Error 02/26/22-05:50:02 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCac heStoreToPstor() caught IconSendInfeasibleException Error 02/26/22-05:50:02 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCac heStoreToPstor() caught IconSendInfeasibleException Error 02/26/22-05:50:02 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCac heStoreToPstor() caught IconSendInfeasibleException Error 02/26/22-05:50:02 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCac heStoreToPstor() caught IconSendInfeasibleException Error 02/26/22-05:50:02 (ProcessHandlers): NOTE: SYMbol available 02/26/22-05:50:02 (dsmUpgrade): NOTE: DSM: Async upgrade completed (0 drives, 0 requested) 02/26/22-05:50:10 (ProcessHandlers): SOD: sodComplete Notification Complete 02/26/22-05:50:07 (acsDriveImage): NOTE: ACS drv image write start. 
Drv:0xa0010000,Ver:0x8202460,Reas:3 02/26/22-05:50:10 (bdbmSync): NOTE: OBB Synchronization: Successful 02/26/22-05:50:31 (utlTimer1): WARN: Extended Link Down Timeout on channel 2 02/26/22-05:50:31 (utlTimer1): WARN: Extended Link Down Timeout on channel 3 02/26/22-05:50:31 (utlTimer1): WARN: Extended Link Down Timeout on channel 4 02/26/22-05:50:31 (utlTimer1): WARN: Extended Link Down Timeout on channel 5 02/26/22-05:50:37 (acsDriveImage): NOTE: ACS drv image write complete. 02/26/22-05:55:04 (utlTimer1): NOTE: sasDEClearRecoveryState: -=<###>=- -=<###>=- Instantiating /ram as rawFs, device = 0x1 Formatting /ram for DOSFS Instantiating /ram as rawFs, device = 0x1 Formatting...Retrieved old volume params with %38 confidence: Volume Parameters: FAT type: FAT32, sectors per cluster 0 0 FAT copies, 0 clusters, 0 sectors per FAT Sectors reserved 0, hidden 0, FAT sectors 0 Root dir entries 0, sysId (null) , serial number 10000 Label:" " ... Disk with 1024 sectors of 512 bytes will be formatted with: Volume Parameters: FAT type: FAT12, sectors per cluster 1 2 FAT copies, 1010 clusters, 3 sectors per FAT Sectors reserved 1, hidden 0, FAT sectors 6 Root dir entries 112, sysId VXDOS12 , serial number 10000 Label:" " ... Instantiating /ram as rawFs, device = 0x1 OK. RTC Error: Real-time clock device is not working Adding 13888 symbols for standalone. 
Length: 0x13c Bytes Version ver03.0A Reset, Power-Up Diagnostics - Loop 1 of 1 3600 Processor DRAM 01 Data lines Passed 02 Address lines Passed 3300 NVSRAM 01 Data lines Passed 4410 Ethernet 82574 1 01 Register read Passed 02 Register address lines Passed 6D40 Bobcat 02 Flash Test Passed 3700 PLB SRAM 01 Data lines Passed 02 Address lines Passed 65D0 Host Channel 1--Tachyon QE8 01 TachLite Register Test Passed 65D1 Host Channel 2--Tachyon QE8 01 TachLite Register Test Passed 65D2 Host Channel 3--Tachyon QE8 01 TachLite Register Test Passed 65D3 Host Channel 4--Tachyon QE8 01 TachLite Register Test Passed 3900 Real-Time Clock 01 RT Clock Tick Passed Diagnostic Manager exited normally. Current date: 06/11/09 time: 15:21:39 Send for Service Interface or baud rate change 02/26/22-04:11:07 (tRAID): NOTE: Set Powerup State 02/26/22-04:11:07 (tRAID): SOD Sequence is Normal, 0 on controller B 02/26/22-04:11:08 (tRAID): NOTE: Turning on tray summary fault LED 02/26/22-04:11:08 (tRAID): NOTE: Installed Protocols: 02/26/22-04:11:08 (tRAID): NOTE: Required Protocols: 02/26/22-04:11:08 (tRAID): NOTE: loading flash file: Fibre 02/26/22-04:11:10 (tRAID): NOTE: DSM: Current revision 7 02/26/22-04:11:10 (tRAID): NOTE: SYMBOL: SYMbolAPI registered. 
02/26/22-04:11:10 (tRAID): NOTE: RCBBitmapManager total RPA size = 1778384896
02/26/22-04:11:11 (tRAID): NOTE: init: ioc: 0, PLVersion: 11-075-20-00
02/26/22-04:11:11 (tRAID): WARN: MLM: Failed creating m_MelNvsramLock
02/26/22-04:11:11 (tRAID): WARN: MLM: Failed creating m_MelEventListLock
02/26/22-04:11:12 (tRAID): NOTE: CrushMemoryPoolMgr: platform and memory (CPU MemSize:2048), adjusted allocating size to 262144 for CStripes
02/26/22-04:11:13 (tRAID): SOD: Instantiation Phase Complete
02/26/22-04:11:12 (tRAID): WARN: sodLockdownCheck: sodSequence update: CtlrLockdown
02/26/22-04:11:12 (tRAID): WARN: No attempt made to open Inter-Controller Communication Channels 0-0
02/26/22-04:11:12 (tRAID): NOTE: LockMgr Role is Master
02/26/22-04:11:12 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 channel:1 numEntries:22
02/26/22-04:11:13 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.10
02/26/22-04:11:20 (tSasDiscCom): NOTE: SAS: Initial Discovery Complete Time: 36 seconds since last power on/reset, 10 seconds since sas instantiated
02/26/22-04:11:20 (tRAID): NOTE: WWN baseName 0004f01f-afe6c1e0 (valid==>SigMatch)
02/26/22-04:11:20 (tRAID): NOTE: spmEarlyData: No data available
02/26/22-04:11:21 (tRAID): SOD: Pre-Initialization Phase Complete
02/26/22-04:11:22 (utlTimer): NOTE: fcnChannelReport ==> -2 -3 -4 -5
02/26/22-04:11:26 (tRAID): NOTE: DB Adoption: DsmOK:1 MultDB:0 OneDbBadSqN:0 Allow:1 Adopted:1
02/26/22-04:11:27 (utlTimer): NOTE: fcnChannelReport ==> =2 =3 =4 =5
02/26/22-04:11:27 (tRAID): WARN: CmgrLockdownException: CORRUPT DBM DATABASE DETECTED - LOCKDOWN Type=6
02/26/22-04:11:27 (tRAID): NOTE: ACS: Icon ping to alternate failed: -2, resp: 0
02/26/22-04:11:27 (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Mode: 0, Status: 0
02/26/22-04:11:27 (tRAID): WARN: ACS: autoCodeSync(): Skipped since alt not communicating.
02/26/22-04:11:28 (tRAID): SOD: Code Synchronization Initialization Phase Complete
02/26/22-04:11:27 (tRAID): WARN: Lockdown case-DB CORRUPT. NVPS did not update from Sstor.
02/26/22-04:11:28 (tRAID): SOD: Initialization Phase Complete
==============================================
Title: Disk Array Controller
Copyright 2008-2013 NetApp, Inc. All Rights Reserved.
Name: RC
Version: 07.84.47.60
Date: 05/28/2013
Time: 13:16:16 CDT
Models: 2660
Manager: devmgr.v1084api04.Manager
==============================================
02/26/22-04:11:27 (tRAID): NOTE: SYMbol available
02/26/22-04:11:27 (tRAID): WARN: *********************************************************
02/26/22-04:11:27 (tRAID): WARN: *********************************************************
02/26/22-04:11:27 (tRAID): WARN: ** **
02/26/22-04:11:27 (tRAID): WARN: ** WARNING!!!!!! **
02/26/22-04:11:27 (tRAID): WARN: ** **
02/26/22-04:11:27 (tRAID): WARN: ** This controller is locked down due to detected **
02/26/22-04:11:27 (tRAID): WARN: ** corruption in the primary database. **
02/26/22-04:11:27 (tRAID): WARN: ** In order to prevent any configuration loss, the **
02/26/22-04:11:27 (tRAID): WARN: ** backed up database must be restored. **
02/26/22-04:11:27 (tRAID): WARN: ** **
02/26/22-04:11:27 (tRAID): WARN: ** **
02/26/22-04:11:27 (tRAID): WARN: *********************************************************
02/26/22-04:11:27 (tRAID): WARN: *********************************************************
02/26/22-04:11:28 (tRAID): SOD: sodComplete Notification Complete
02/26/22-04:11:28 (tRAID): sodMain CtlrLockdown sequence finished, elapsed time = 21 seconds
02/26/22-04:11:28 (tRAID): sodMain complete
02/26/22-04:11:28 (tRAID): WARN: SOD lockdown sequence complete - suspending...
Press within 5 seconds: for Service Interface, for baud rate
VxWorks login:
VxWorks login: shellUsr
Password:
-> Serial Port shell started.
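The lockdown in the boot log above is easy to miss in a long capture, so here is a minimal sketch of scanning a saved console log for it. It assumes the capture is plain text and that lockdown lines follow the `CmgrLockdownException: ... - LOCKDOWN Type=N` shape seen in this one log; the name `find_lockdowns` is made up for illustration and is not part of any Dell or NetApp tool.

```python
import re

# Pattern modelled on the single lockdown line in this log. Other firmware
# revisions may word the message differently (assumption).
LOCKDOWN = re.compile(
    r"(?P<stamp>\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) "
    r"\((?P<task>\w+)\): WARN: CmgrLockdownException: "
    r"(?P<reason>.+?) - LOCKDOWN Type=(?P<type>\d+)"
)


def find_lockdowns(log_text):
    """Return one dict per lockdown line found in a console capture."""
    return [m.groupdict() for m in LOCKDOWN.finditer(log_text)]


if __name__ == "__main__":
    # Three entries copied from the log, joined with spaces the way the
    # capture in this post is laid out.
    sample = (
        "02/26/22-04:11:27 (utlTimer): NOTE: fcnChannelReport ==> =2 =3 =4 =5 "
        "02/26/22-04:11:27 (tRAID): WARN: CmgrLockdownException: "
        "CORRUPT DBM DATABASE DETECTED - LOCKDOWN Type=6 "
        "02/26/22-04:11:27 (tRAID): NOTE: ACS: Icon ping to alternate failed: -2, resp: 0"
    )
    for hit in find_lockdowns(sample):
        print(hit["stamp"], hit["reason"], "type", hit["type"])
```

Because the pattern does not rely on line breaks, it works on the capture whether or not the entries are still one per line.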
M NOTICE: The BOOT OPERATIONS MENU has been invoked too late for proper operation of some activities, including Isolation Diagnostics. You may wish to restart this controller again and press Control-B IMMEDIATELY after seeing the start-up indicator ("-=<###>=-"). BOOT OPERATIONS MENU 1) Perform Isolation Diagnostics 10) Serial Interface Mode Menu 2) Download Permanent File 11) Display Hardware Configuration 3) Reserved 12) Change Hardware Configuration Menu 4) Dump NVSRAM Group 13) Development Options Menu 5) Patch NVSRAM Group 14) Display Memory Error Log 6) Set Real Time Clock 15) Manufacturing Setup Menu 7) Display Board Configuration R) Restart Controller 8) Special Services Menu Q) Quit Menu 9) Display Exception Message Enter Selection: 2 XMODEM Serial Transfer initiated; receiving at 115200 baud. Please start the XMODEM send process now... | Data received DOWNLOAD TO FLASH Type Name Version Download Status ==== ============ ============ =============== Cfg MfgConfig 08.20.00.00 Complete: 100% -- Ok Download stream processing complete (no errors) BOOT OPERATIONS MENU 1) Perform Isolation Diagnostics 10) Serial Interface Mode Menu 2) Download Permanent File 11) Display Hardware Configuration 3) Reserved 12) Change Hardware Configuration Menu 4) Dump NVSRAM Group 13) Development Options Menu 5) Patch NVSRAM Group 14) Display Memory Error Log 6) Set Real Time Clock 15) Manufacturing Setup Menu 7) Display Board Configuration R) Restart Controller 8) Special Services Menu Q) Quit Menu 9) Display Exception Message Enter Selection: 2 XMODEM Serial Transfer initiated; receiving at 115200 baud. Please start the XMODEM send process now... 
Data received
DOWNLOAD TO FLASH
Type Name         Version      Download Status
==== ============ ============ ===============
File PkgInfo      08.20.24.60  Complete: 100% -- Ok
File System       08.20.24.60  Complete: 100% -- Ok
File Diagnostics  08.20.24.60  Complete: 100% -- Ok
Boot (null)       (null)       Installed Ok
File RAID         08.20.24.60  Complete: 100% -- Ok
File Debug        08.20.24.60  Complete: 100% -- Ok
File MAPI         08.20.24.60  Complete: 100% -- Ok
File TAPI         08.20.24.60  Complete: 100% -- Ok
File FBDT         08.20.24.60  Complete: 100% -- Ok
File DIAGBET10G   08.20.24.60  Complete: 100% -- Ok
File DIAGBET1G    08.20.24.60  Complete: 100% -- Ok
File ECTGenOEM    08.20.24.60  Complete: 100% -- Ok
File LSISAS2x36   08.20.24.60  Complete: 100% -- Ok
File ECTNetApp    08.20.24.60  Complete: 100% -- Ok
File SAPI         08.20.24.60  Complete: 100% -- Ok
File LSIFalB0nvD  08.20.24.60  Complete: 100% -- Ok
File ECTDell      08.20.24.60  Complete: 100% -- Ok
File LSIFalB0nvH  08.20.24.60  Complete: 100% -- Ok
File ECTTeraData  08.20.24.60  Complete: 100% -- Ok
File LSIFalNvH1   08.20.24.60  Complete: 100% -- Ok
File LSIFalNvH2   08.20.24.60  Complete: 100% -- Ok
File LSISAS2IOC   08.20.24.60  Complete: 100% -- Ok
File RAID1        08.20.24.60  Complete: 100% -- Ok
File SEBE2T1G     08.20.24.60  Complete: 100% -- Ok
File SEBE2T10G    08.20.24.60  Complete: 100% -- Ok
File SolarFlare   08.20.24.60  Complete: 100% -- Ok
File SolarFlare1  08.20.24.60  Complete: 100% -- Ok
File MTLQLG       08.20.24.60  Complete: 100% -- Ok
File qpge         08.20.24.60  Complete: 100% -- Ok
File Fibre        08.20.24.60  Complete: 100% -- Ok
File MTLSEBET1G   08.20.24.60  Complete: 100% -- Ok
File Rhone13      08.20.24.60  Complete: 100% -- Ok
File Rhone16      08.20.24.60  Complete: 100% -- Ok
File Rhone17      08.20.24.60  Complete: 100% -- Ok
File MTLSEBET10G  08.20.24.60  Complete: 100% -- Ok
File iSCSI        08.20.24.60  Complete: 100% -- Ok
File spyIscsi     08.20.24.60  Complete: 100% -- Ok
File SPY          08.20.24.60  Complete: 100% -- Ok
Download stream processing complete (no errors)
BOOT OPERATIONS MENU 1) Perform Isolation Diagnostics 10) Serial Interface Mode
Menu 2) Download Permanent File 11) Display Hardware Configuration 3) Reserved 12) Change Hardware Configuration Menu 4) Dump NVSRAM Group 13) Development Options Menu 5) Patch NVSRAM Group 14) Display Memory Error Log 6) Set Real Time Clock 15) Manufacturing Setup Menu 7) Display Board Configuration R) Restart Controller 8) Special Services Menu Q) Quit Menu 9) Display Exception Message Enter Selection: 13 DEVELOPMENT OPTIONS MENU ------------- Option Modes --------------- 1) CPU memory caching = enabled 2) Memory fault detection = disabled 3) General Development mode = disabled 4) Boot/Kernel Debug mode = disabled 5) Boot/Kernel Verbose mode = disabled 6) Reserved = disabled 7) Reserved = disabled 8) Reserved = disabled 9) Reserved = disabled 10) Reserved = disabled 11) Reserved = disabled 12) Debug Monitor on reboot = disabled 13) Debug Monitor on exception = disabled 14) Debug Monitor on watchdog NMI = disabled ------------- Actions ------------- E) Edit Application Development Script D) Boot Debug Menu R) Reset Development Option Modes Q) Quit Menu Enter Selection: 5 DEVELOPMENT OPTIONS MENU ------------- Option Modes --------------- 1) CPU memory caching = enabled 2) Memory fault detection = disabled 3) General Development mode = disabled 4) Boot/Kernel Debug mode = disabled 5) Boot/Kernel Verbose mode = enabled 6) Reserved = disabled 7) Reserved = disabled 8) Reserved = disabled 9) Reserved = disabled 10) Reserved = disabled 11) Reserved = disabled 12) Debug Monitor on reboot = disabled 13) Debug Monitor on exception = disabled 14) Debug Monitor on watchdog NMI = disabled ------------- Actions ------------- E) Edit Application Development Script D) Boot Debug Menu R) Reset Development Option Modes Q) Quit Menu Enter Selection: q BOOT OPERATIONS MENU 1) Perform Isolation Diagnostics 10) Serial Interface Mode Menu 2) Download Permanent File 11) Display Hardware Configuration 3) Reserved 12) Change Hardware Configuration Menu 4) Dump NVSRAM Group 13) 
Development Options Menu 5) Patch NVSRAM Group 14) Display Memory Error Log 6) Set Real Time Clock 15) Manufacturing Setup Menu 7) Display Board Configuration R) Restart Controller 8) Special Services Menu Q) Quit Menu 9) Display Exception Message Enter Selection: r -=<###>=- Instantiating /ram as rawFs, device = 0x1 Formatting /ram for DOSFS Instantiating /ram as rawFs, device = 0x1 Formatting...Retrieved old volume params with %38 confidence: Volume Parameters: FAT type: FAT32, sectors per cluster 0 0 FAT copies, 0 clusters, 0 sectors per FAT Sectors reserved 0, hidden 0, FAT sectors 0 Root dir entries 0, sysId (null) , serial number f10000 Label:" " ... Disk with 1024 sectors of 512 bytes will be formatted with: Volume Parameters: FAT type: FAT12, sectors per cluster 1 2 FAT copies, 1010 clusters, 3 sectors per FAT Sectors reserved 1, hidden 0, FAT sectors 6 Root dir entries 112, sysId VXDOS12 , serial number f10000 Label:" " ... RTC Error: Real-time clock device is not working OK. Adding 14607 symbols for standalone. 
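The DOWNLOAD TO FLASH tables printed after each XMODEM transfer earlier in this log are worth checking row by row before restarting the controller. The sketch below pulls the rows out of a saved capture and lists any that did not finish with `Ok`. The function names are made up for illustration, and the shape of a failed row is an assumption: this log only ever shows `Complete: 100% -- Ok` and `Installed Ok`.

```python
import re

# One table row: Type, Name, Version, then the status text.
# Only the "Ok" forms appear in this log; a failing row is assumed to keep
# the same "Complete: NN% -- Word" shape (assumption, not confirmed).
ROW = re.compile(
    r"\b(?P<type>Cfg|File|Boot)\s+"
    r"(?P<name>\S+)\s+"
    r"(?P<version>\S+)\s+"
    r"(?P<status>Complete:\s*\d+%\s*--\s*\w+|Installed\s+\w+)"
)


def parse_download_rows(log_text):
    """Return a list of dicts, one per DOWNLOAD TO FLASH table row."""
    return [m.groupdict() for m in ROW.finditer(log_text)]


def failed_rows(rows):
    """Return the rows whose status does not end in 'Ok'."""
    return [row for row in rows if not row["status"].endswith("Ok")]


if __name__ == "__main__":
    # A few rows copied from the log, flattened the way the capture is.
    sample = (
        "Type Name Version Download Status "
        "==== ============ ============ =============== "
        "File PkgInfo 08.20.24.60 Complete: 100% -- Ok "
        "File System 08.20.24.60 Complete: 100% -- Ok "
        "Boot (null) (null) Installed Ok "
        "File RAID 08.20.24.60 Complete: 100% -- Ok "
        "Download stream processing complete (no errors)"
    )
    rows = parse_download_rows(sample)
    print(len(rows), "rows,", len(failed_rows(rows)), "failed")
```

Matching on whitespace rather than fixed columns means the same pattern works on a capture that has lost its original line breaks.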
NVBLD: Parsing Manufactured Hardware Configuration config file section (len=0) NVBLD: Parsing Model Baseline Configuration config file section (len=1) NVBLD: Parsing Board Identification config file section (len=1) NVBLD: Parsing System Configuration config file section (len=2) NVBLD: Parsing OEM Configuration config file section (len=2) System startup messages: ======================== NOTE 83.20: Mfg Config file converted to system configuration file NOTE 83.20: Mfg Config file converted to system configuration file NOTE 88.30: Flash-based Config File has been modified WARN 88.37: NVSRAM group "Board" rebuilt WARN 88.37: NVSRAM group "NonCfg" rebuilt WARN 88.37: NVSRAM group "DrvFault" rebuilt WARN 88.37: NVSRAM group "InfCfg" rebuilt WARN 88.37: NVSRAM group "H0UsrCfg" rebuilt WARN 88.37: NVSRAM group "H1UsrCfg" rebuilt WARN 88.37: NVSRAM group "H3UsrCfg" rebuilt WARN 88.37: NVSRAM group "H4UsrCfg" rebuilt WARN 88.37: NVSRAM group "H5UsrCfg" rebuilt WARN 88.37: NVSRAM group "H6UsrCfg" rebuilt WARN 88.37: NVSRAM group "H7UsrCfg" rebuilt WARN 88.37: NVSRAM group "H8UsrCfg" rebuilt WARN 88.37: NVSRAM group "H9UsrCfg" rebuilt WARN 88.37: NVSRAM group "H10UrCfg" rebuilt WARN 88.37: NVSRAM group "H11UrCfg" rebuilt WARN 88.37: NVSRAM group "H12UrCfg" rebuilt WARN 88.37: NVSRAM group "H13UrCfg" rebuilt WARN 88.37: NVSRAM group "H14UrCfg" rebuilt WARN 88.37: NVSRAM group "H15UrCfg" rebuilt WARN 88.37: NVSRAM group "DrvExpMg" rebuilt WARN 88.37: NVSRAM group "H16UrCfg" rebuilt WARN 88.37: NVSRAM group "H17UrCfg" rebuilt WARN 88.37: NVSRAM group "H18UrCfg" rebuilt WARN 88.37: NVSRAM group "H19UrCfg" rebuilt WARN 88.37: NVSRAM group "H20UrCfg" rebuilt WARN 88.37: NVSRAM group "H21UrCfg" rebuilt WARN 88.37: NVSRAM group "H22UrCfg" rebuilt WARN 88.37: NVSRAM group "H23UrCfg" rebuilt WARN 88.37: NVSRAM group "H24UrCfg" rebuilt WARN 88.37: NVSRAM group "H25UrCfg" rebuilt WARN 88.37: NVSRAM group "H26UrCfg" rebuilt WARN 88.37: NVSRAM group "H27UrCfg" rebuilt WARN 88.37: NVSRAM 
group "H28UrCfg" rebuilt WARN 88.37: NVSRAM group "H29UrCfg" rebuilt WARN 88.37: NVSRAM group "H30UrCfg" rebuilt WARN 88.37: NVSRAM group "H31UrCfg" rebuilt WARN 88.37: NVSRAM group "UserCfg2" rebuilt WARN 88.37: NVSRAM group "ExNetCfg" rebuilt WARN 88.37: NVSRAM group "RbotInfo" rebuilt WARN 88.37: NVSRAM group "InfCfg2" rebuilt NOTE 88.30: NVSRAM configuration updated from flash-based Config File NOTE 88.30: NVSRAM configuration updated from flash-based Config File 02/26/22-05:20:55 (tNetCfgInit): NOTE: eth0: LinkUp event 02/26/22-05:20:55 (tNetCfgInit): NOTE: Acquiring network parameters for interface gei0 using DHCP 02/26/22-05:20:57 (ipdhcpc): NOTE: netCfgDhcpReplyCallback :: received OFFER on interface gei0, unit 0 02/26/22-05:20:58 (ipdhcpc): NOTE: DHCP server: 192.168.13.1 02/26/22-05:20:58 (ipdhcpc): WARN: **WARNING** The DHCP Server did not assign a permanent IP for gei0. 02/26/22-05:20:58 (ipdhcpc): WARN: Network access to this controller may eventually fail. 02/26/22-05:20:58 (ipdhcpc): NOTE: Client IP routers: 192.168.13.1 02/26/22-05:20:58 (ipdhcpc): NOTE: Client DNS name servers: 192.168.13.1 02/26/22-05:20:58 (ipdhcpc): NOTE: DNS domain name: squigley.net 02/26/22-05:20:58 (ipdhcpc): NOTE: Assigned IP address: 192.168.13.181 02/26/22-05:20:58 (ipdhcpc): NOTE: Assigned subnet mask: 255.255.255.0 02/26/22-05:20:58 (tNetReset): NOTE: Network Ready Current date: 02/26/22 time: 05:21:19 Send for Service Interface or baud rate change 02/26/22-05:21:19 (tRAID): SOD Sequence is Normal, 0 on controller B Virtual User Services (VUS) Configuration ========================================= vusPlaceCCBInRPA = 0 vusPlaceMdatFSBsInRPA = 0 vusGetMaxRVMConsistencyGroups = 1 vusGetMaxDrives = 35 vusGetMaxPieces = 30 vusGetMaxLargeIOStruct = 30 vusGetMaxBufStruct = 40000 vusGetMaxIovStruct = 10000 vusGetMaxJobStruct = 750 vusGetControllerQueueDepth = 2048 vusGetNumIOClearingResources = 18 vusGetPortQueueDepth = 2048 vusGetForcedXferRepairCnt = 4096 
vusGetMaxLargeIOBuffer = 3 vusGetMaxLargeIOSize(512) = 4096 vusGetMaxLargeIOSize(4096) = 512 vusGetMaxDiskIOSize = 1024 vusGetMaxBackgroundIOSize(512) = 4096 vusGetMaxBackgroundIOSize(4096 = 512 vusLimitCacheSize = 0 vusGetMaxDiskBlockSize = 512 vusGetMaxDmovRequest = 500 vusGetMaxDmovSegment = 500 vusBatterySupported = 1 vusEDC3Supported = 0 vusTestIncreasedDriveQDepth = 0 vusGetFwFeatures = 6 vusPlaceFcSestInRPA = 0 vusPlaceFcSGListInRPA = 1 vusGetMaxFcSGBufSize = 524284 vusGetMaxFcSGOddSize = 4 vusGetMaxFcDstSGBufSize = 524284 vusGetMaxFcEDCAlignSGSize = 0 vusGetDriveQDepthMgmtMethod = 1 vusNoParityCheck = 0 vusGetMaxSnapshotIOs = 400 vusGetMaxSnapDirtyDirBlks = 131072 vusGetMaxConcurrentCopies = 8 vusGetMaxWriteSameIOSize = 524288 vusGetMaxCompareAndWriteIOSize = 1 vusGetMaxEnabledCores = 1 02/26/22-05:21:16 (tRAID): NOTE: loading flash file: ECTDell 02/26/22-05:21:16 (tRAID): NOTE: unloading flash file: ECTDell 02/26/22-05:21:17 (tRAID): NOTE: Turning on tray summary fault LED ProcMPErrData theshold default 2, nvram 0 RPAEccErrData DIMM 0 theshold 2, nvram 0 RPAEccErrData DIMM 1 theshold 2, nvram 0 RPAEccErrData DIMM 2 theshold 2, nvram 0 RPAEccErrData DIMM 3 theshold 2, nvram 0 RPAEccErrData DIMM 4 theshold 2, nvram 0 RPAEccErrData DIMM 5 theshold 2, nvram 0 RPAEccErrData DIMM 6 theshold 2, nvram 0 RPAEccErrData DIMM 7 theshold 2, nvram 0 ProcEccErrData theshold default 10, nvram 0 02/26/22-05:21:17 (tRAID): NOTE: Lockdown NVSRAM data does not match current version, rebuilding 02/26/22-05:21:17 (tRAID): NOTE: Installed Protocols: 02/26/22-05:21:17 (tRAID): NOTE: Required Protocols: 02/26/22-05:21:17 (tRAID): NOTE: loading flash file: Fibre 02/26/22-05:21:19 (tRAID): NOTE: DSM: Current revision 7 02/26/22-05:21:19 (tRAID): NOTE: SYMBOL: SYMbolAPI registered. 
02/26/22-05:21:19 (tRAID): NOTE: cmgrCtlr: Board Manager HostBoardData Model Name for slot 0: 0801
02/26/22-05:21:19 (tRAID): NOTE: RCBBitmapManager total RPA size = 1761607680
02/26/22-05:21:20 (tRAID): NOTE: init: ioc: 0, PLVersion: 13-075-70-00
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=0
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=1
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=2
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=3
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=4
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=5
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=6
02/26/22-05:21:20 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=7
02/26/22-05:21:20 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 channel:1 numEntries:22
02/26/22-05:21:21 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.10
02/26/22-05:21:21 (tSasExpChk): NOTE: Download Expander firmware image
02/26/22-05:21:29 (tSasExpChk): NOTE: Configuration of expander complete. This controller's expander will reboot...
02/26/22-05:21:29 (tSasExpChk): NOTE: sasnBaseboardExpanderManage: exp firmware and/or config pages was successfully updated, channel:1 returnCode:40
02/26/22-05:21:35 (IOSched): WARN: SAS Expander Removed: ch:1 expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 numEntries:13
02/26/22-05:21:40 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 channel:1 numEntries:22
02/26/22-05:21:41 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.16
02/26/22-05:21:41 (tSasExpChk): NOTE: Enabling this controller's expander phys, channel=1
02/26/22-05:21:41 (sasEnPhys): NOTE: sasEnableDrivePhysTask: 12 phys enabled on channel 1
02/26/22-05:21:41 (tSasDiscCom): NOTE: SAS: Initial Discovery Complete Time: 56 seconds since last power on/reset, 24 seconds since sas instantiated
02/26/22-05:21:41 (tRAID): NOTE: CrushMemoryPoolMgr: platform and memory (CPU MemSize:2048), adjusted allocating size to 262144 for CStripes
02/26/22-05:21:47 (tRAID): SOD: Instantiation Phase Complete
02/26/22-05:21:42 (tRAID): WARN: No attempt made to open Inter-Controller Communication Channels 0-0
02/26/22-05:21:42 (tRAID): NOTE: LockMgr Role is Master
02/26/22-05:21:43 (tRAID): NOTE: WWN baseName 0004f01f-afe6c1e0 (valid==>SigMatch)
02/26/22-05:21:43 (tRAID): NOTE: spmEarlyData: No data available
02/26/22-05:21:48 (tRAID): SOD: Pre-Initialization Phase Complete
02/26/22-05:21:43 (tRAID): WARN: BID: Signature are invalid, initializing Eeprom region
02/26/22-05:21:45 (utlTimer1): NOTE: fcnChannelReport ==> -2 -3 -4 -5
02/26/22-05:21:50 (utlTimer1): NOTE: fcnChannelReport ==> =2 =3 =4 =5
SubClass ID 80: Gas Gauging - IT Cfg Before: OCV Wait AftChg 68[ 2] = 17280(sec/10) After: OCV Wait AftChg 68[ 2] = 2880(sec/10)
SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Min Temp 33[ 2] = 100(0.1 C) After: Qinv Min Temp 33[ 2] = 990(0.1 C)
SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Max Temp LSB 32[ 1] = 0x90 After: Qinv Max Temp LSB 32[ 1] = 0xe8
SubClass ID 80: Gas Gauging - IT Cfg Before: Qinv Max Temp MSB 31[ 1] = 0x01 After: Qinv Max Temp MSB 31[ 1] = 0x03 02/26/22-05:21:57 (tRAID): WARN: cmgr::isBoardDiff: No controller record 02/26/22-05:21:57 (tRAID): WARN: cmgr::isHostBoardIdDiff: No controller record 02/26/22-05:21:57 (tRAID): WARN: cmgr::isSubModelIdDiff: No controller record 02/26/22-05:21:57 (tRAID): WARN: ACS: testCommunication: Domi Exception caught: IconSendInfeasibleException Error 02/26/22-05:21:57 (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Status: 0 02/26/22-05:21:57 (tRAID): WARN: ACS: autoCodeSync(): Skipped since alt not communicating. 02/26/22-05:21:57 (tRAID): NOTE: iocfi: Peering Disabled (Alt Unavailable) 02/26/22-05:22:02 (tRAID): SOD: Code Synchronization Initialization Phase Complete 02/26/22-05:21:57 (NvpsPersistentSyncM): NOTE: NVSRAM Persistent Storage updated successfully 02/26/22-05:21:58 (tRAID): NOTE: SAFE: Process new features 02/26/22-05:21:58 (tRAID): NOTE: SAFE: Process legacy features 02/26/22-05:21:58 (tRAID): WARN: PSTOR: Drive with devnum 0x90000000 has valid header but board serial num and time stamp don't match 02/26/22-05:21:58 (tRAID): WARN: PSTOR: Drive board serial number 3BP002J timestamp 1645524572 02/26/22-05:21:58 (tRAID): WARN: PSTOR: nvsram serial number 3BP002I timestamp 989891063 02/26/22-05:21:58 (tRAID): WARN: PSTOR: Drive with devnum 0x90000001 has valid header but board serial num and time stamp don't match 02/26/22-05:21:58 (tRAID): WARN: PSTOR: Drive board serial number 3BP002J timestamp 1645524572 02/26/22-05:21:58 (tRAID): WARN: PSTOR: nvsram serial number 3BP002I timestamp 989891063 02/26/22-05:21:58 (tRAID): NOTE: PSTOR: Creating new pstor... 
02/26/22-05:21:58 (tRAID): WARN: PSTOR: Failed to update Alternate controller NVSRAM
02/26/22-05:21:58 (bdbmBGTask): ERROR: sendFreezeStateToAlternate: Exception IconSendInfeasibleException Error
02/26/22-05:21:58 (tRAID): NOTE: SPM acquireObjects exception: IconSendInfeasibleException Error
02/26/22-05:21:59 (tRAID): NOTE: sas: Peering Disabled (Alt unavailable)
02/26/22-05:21:59 (tRAID): NOTE: fcn: Peering Disabled (Alt Unavailable)
02/26/22-05:21:59 (tRAID): NOTE: ion: Peering Disabled (Alt Unavailable)
02/26/22-05:21:59 (tRAID): WARN: Unable to intialize mirror device
02/26/22-05:21:59 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff
02/26/22-05:21:59 (tRAID): NOTE: CCM: readAndValidateCacheStore() partitioning for no mirroring
02/26/22-05:21:59 (tRAID): WARN: CCM: validateCacheMem() Invalidate cache due to Database mismatch
02/26/22-05:21:59 (tRAID): NOTE: CCM: validateCacheMem() cache memory is invalid
02/26/22-05:21:59 (tRAID): NOTE: CCM: validateCacheMem() Initializing my partition
02/26/22-05:22:00 (tRAID): WARN: CCM: initialize() - exchangeRestoreStatusAlt caught IconSendInfeasibleException Error
02/26/22-05:22:00 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:00 (tRAID): NOTE: CCM: sodClearMOSIntentsAlt(), failure clearing MOS intents on alt
02/26/22-05:22:00 (tRAID): WARN: Unable to intialize mirror device
02/26/22-05:22:00 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff
02/26/22-05:22:01 (tRAID): NOTE: doRecovery: myMemory:1, IORCB:0, NEW:0, OLD:0
02/26/22-05:22:01 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:01 (tRAID): WARN: UWManager::initializeNvsramIWLog: database not matched
02/26/22-05:22:01 (tRAID): WARN: UWManager::initializeNvsramIWLog: IWLog invalidated
02/26/22-05:22:01 (IWTask): NOTE: UWMgr: IW log drive 0x10002 created
02/26/22-05:22:01 (IWTask): NOTE: UWMgr: IW log drive 0x10003 created
02/26/22-05:22:02 (tRAID): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:12 (tRAID): WARN: arvm::AsyncMirrorManager::initialize caught IconSendInfeasibleException Error
02/26/22-05:22:12 (tRAID): NOTE: Exception caught sending size to alt FCM initialize IconSendInfeasibleException Error
02/26/22-05:22:13 (tRAID): NOTE: DiagVolManager::initialize: Exception - Alt controller not ready
02/26/22-05:22:13 (tRAID): NOTE: MIB creation complete - size 15
02/26/22-05:22:18 (tRAID): SOD: Initialization Phase Complete
==============================================
Title: Disk Array Controller
Copyright 2008-2017 NetApp, Inc. All Rights Reserved.
Name: RC
Version: 08.20.24.60
Date: 04/13/2017
Time: 18:11:47 CDT
Models: 2660
Manager: devmgr.v1120api13.Manager
==============================================
02/26/22-05:22:18 (tRAID): sodMain Normal sequence finished, elapsed time = 59 seconds
02/26/22-05:22:18 (tRAID): sodMain complete
02/26/22-05:22:14 (ProcessHandlers): WARN: CCM: backupStorageAvailable() caught IconSendInfeasibleException Error
02/26/22-05:22:14 (ProcessHandlers): NOTE: vdm::CrushTaskCoordinationManager::handleEvent(1) - Exception N4domi27IconSendInfeasibleExceptionE - IconSendInfeasibleException Error
02/26/22-05:22:14 (ProcessHandlers): NOTE: vdm::CrushTaskCoordinationManager::handleEvent(2) - Exception N4domi27IconSendInfeasibleExceptionE - IconSendInfeasibleException Error
02/26/22-05:22:14 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:14 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:14 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:14 (ccmEventTask): WARN: CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
02/26/22-05:22:14 (ProcessHandlers): NOTE: SYMbol available
02/26/22-05:22:14 (dsmUpgrade): NOTE: DSM: Async upgrade completed (0 drives, 0 requested)
02/26/22-05:22:23 (ProcessHandlers): SOD: sodComplete Notification Complete
02/26/22-05:22:19 (acsDriveImage): NOTE: ACS drv image write start. Drv:0xa0010000,Ver:0x8202460,Reas:0
02/26/22-05:22:23 (bdbmSync): NOTE: OBB Synchronization: Successful
02/26/22-05:22:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 2
02/26/22-05:22:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 3
02/26/22-05:22:44 (utlTimer1): WARN: Extended Link Down Timeout on channel 4
02/26/22-05:22:44 (utlTimer1): WARN: Extended Link Down Timeout on channel 5
02/26/22-05:22:49 (acsDriveImage): NOTE: ACS drv image write complete.
-=<###>=-
Instantiating /ram as rawFs, device = 0x1
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x1
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number f10000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number f10000
Label:" " ...
RTC Error: Real-time clock device is not working
OK.
Adding 14607 symbols for standalone.
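The two "Disk Array Controller" banners in this log show the controller going from 07.84.47.60 before the reflash to 08.20.24.60 after it. A minimal sketch for pulling those versions out of a saved capture and comparing them follows. The function names are made up for illustration, and the pattern assumes the banner always prints `Name:` immediately before `Version:`, as it does in both banners here.

```python
import re

# Matches the "Name: RC Version: 08.20.24.60" pair in the controller banner.
# Anchoring on "Name:" keeps lines such as
# "Local Expander Firmware Version: 00.00.80.16" out of the result.
BANNER_VERSION = re.compile(r"Name:\s*\S+\s+Version:\s*(\d+(?:\.\d+){3})")


def banner_versions(log_text):
    """Return banner firmware versions in order of first appearance."""
    seen = []
    for version in BANNER_VERSION.findall(log_text):
        if version not in seen:
            seen.append(version)
    return seen


def version_key(version):
    """Turn '08.20.24.60' into (8, 20, 24, 60) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    # Fragments copied from the two banners and one expander line in the log.
    sample = (
        "Title: Disk Array Controller Name: RC Version: 07.84.47.60 Date: 05/28/2013 "
        "02/26/22-05:21:41 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.16 "
        "Title: Disk Array Controller Name: RC Version: 08.20.24.60 Date: 04/13/2017"
    )
    found = banner_versions(sample)
    print(found)
    if len(found) >= 2:
        print("newer after reflash:", version_key(found[-1]) > version_key(found[0]))
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.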
Reset, Power-Up Diagnostics - Loop 1 of 1 3600 Processor DRAM 01 Data lines Passed 02 Address lines Passed 3300 NVSRAM 01 Data lines Passed 4410 Ethernet 82574 1 01 Register read Passed 02 Register address lines Passed 6D40 Bobcat 02 Flash Test Passed 3700 PLB SRAM 01 Data lines Passed 02 Address lines Passed 65D0 Host Channel 1--Tachyon QE8 01 TachLite Register Test Passed 65D1 Host Channel 2--Tachyon QE8 01 TachLite Register Test Passed 65D2 Host Channel 3--Tachyon QE8 01 TachLite Register Test Passed 65D3 Host Channel 4--Tachyon QE8 01 TachLite Register Test Passed 3900 Real-Time Clock 01 RT Clock Tick Passed Diagnostic Manager exited normally. 02/26/22-07:12:09 (tNetCfgInit): NOTE: eth0: LinkUp event 02/26/22-07:12:09 (tNetCfgInit): NOTE: Acquiring network parameters for interface gei0 using DHCP 02/26/22-07:12:17 (ipdhcpc): NOTE: netCfgDhcpReplyCallback :: received OFFER on interface gei0, unit 0 02/26/22-07:12:18 (ipdhcpc): NOTE: DHCP server: 192.168.13.1 02/26/22-07:12:18 (ipdhcpc): WARN: **WARNING** The DHCP Server did not assign a permanent IP for gei0. 02/26/22-07:12:18 (ipdhcpc): WARN: Network access to this controller may eventually fail. 
02/26/22-07:12:18 (ipdhcpc): NOTE: Client IP routers: 192.168.13.1 02/26/22-07:12:18 (ipdhcpc): NOTE: Client DNS name servers: 192.168.13.1 02/26/22-07:12:18 (ipdhcpc): NOTE: DNS domain name: squigley.net 02/26/22-07:12:18 (ipdhcpc): NOTE: Assigned IP address: 192.168.13.182 02/26/22-07:12:18 (ipdhcpc): NOTE: Assigned subnet mask: 255.255.255.0 02/26/22-07:12:18 (tNetReset): NOTE: Network Ready Current date: 02/26/22 time: 07:12:30 Send for Service Interface or baud rate change 02/26/22-07:12:29 (tRAID): NOTE: Set Powerup State 02/26/22-07:12:30 (tRAID): SOD Sequence is Normal, 0 on controller A Virtual User Services (VUS) Configuration ========================================= vusPlaceCCBInRPA = 0 vusPlaceMdatFSBsInRPA = 0 vusGetMaxRVMConsistencyGroups = 1 vusGetMaxDrives = 35 vusGetMaxPieces = 30 vusGetMaxLargeIOStruct = 30 vusGetMaxBufStruct = 40000 vusGetMaxIovStruct = 10000 vusGetMaxJobStruct = 750 vusGetControllerQueueDepth = 2048 vusGetNumIOClearingResources = 18 vusGetPortQueueDepth = 2048 vusGetForcedXferRepairCnt = 4096 vusGetMaxLargeIOBuffer = 3 vusGetMaxLargeIOSize(512) = 4096 vusGetMaxLargeIOSize(4096) = 512 vusGetMaxDiskIOSize = 1024 vusGetMaxBackgroundIOSize(512) = 4096 vusGetMaxBackgroundIOSize(4096 = 512 vusLimitCacheSize = 0 vusGetMaxDiskBlockSize = 512 vusGetMaxDmovRequest = 500 vusGetMaxDmovSegment = 500 vusBatterySupported = 1 vusEDC3Supported = 0 vusTestIncreasedDriveQDepth = 0 vusGetFwFeatures = 6 vusPlaceFcSestInRPA = 0 vusPlaceFcSGListInRPA = 1 vusGetMaxFcSGBufSize = 524284 vusGetMaxFcSGOddSize = 4 vusGetMaxFcDstSGBufSize = 524284 vusGetMaxFcEDCAlignSGSize = 0 vusGetDriveQDepthMgmtMethod = 1 vusNoParityCheck = 0 vusGetMaxSnapshotIOs = 400 vusGetMaxSnapDirtyDirBlks = 131072 vusGetMaxConcurrentCopies = 8 vusGetMaxWriteSameIOSize = 524288 vusGetMaxCompareAndWriteIOSize = 1 vusGetMaxEnabledCores = 1 02/26/22-07:12:31 (tRAID): NOTE: loading flash file: ECTDell 02/26/22-07:12:31 (tRAID): NOTE: unloading flash file: ECTDell 02/26/22-07:12:31 
(tRAID): NOTE: Turning on tray summary fault LED
02/26/22-07:12:31 (tRAID): NOTE: Installed Protocols:
02/26/22-07:12:31 (tRAID): NOTE: Required Protocols:
02/26/22-07:12:31 (tRAID): NOTE: loading flash file: Fibre
02/26/22-07:12:34 (tRAID): NOTE: DSM: Current revision 7
02/26/22-07:12:34 (tRAID): NOTE: SYMBOL: SYMbolAPI registered.
02/26/22-07:12:34 (tRAID): NOTE: cmgrCtlr: Board Manager HostBoardData Model Name for slot 0: 0801
02/26/22-07:12:34 (tRAID): NOTE: RCBBitmapManager total RPA size = 1761607680
02/26/22-07:12:35 (tRAID): NOTE: init: ioc: 0, PLVersion: 13-075-70-00
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=0
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=1
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=2
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=3
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=4
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=5
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=6
02/26/22-07:12:35 (tRAID): ERROR: findChannelFromIocPhy: channel not found, ioc=0, phy=7
02/26/22-07:12:35 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:25 port:2 ioc:0 channel:0 numEntries:22
02/26/22-07:12:35 (IOSched): NOTE: SAS Expander Added: expDevHandle:x12 enclHandle:x3 numPhys:25 port:3 ioc:0 channel:1 numEntries:22
02/26/22-07:12:36 (tSasEvtWkr): NOTE: Alt controller path up on channel:0 devH:x26 expDevH:x11 phy:16 itn:10
02/26/22-07:12:36 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.16
02/26/22-07:12:36 (tSasExpChk): NOTE: Enabling this controller's expander phys, channel=0
02/26/22-07:12:36 (tSasExpChk): NOTE: sasExpanderCheck send icon connection up event channel:0
02/26/22-07:12:37 (tRAID): NOTE: CrushMemoryPoolMgr: platform and memory (CPU MemSize:2048), adjusted allocating size to 262144 for CStripes
02/26/22-07:12:37 (sasEnPhys): NOTE: sasEnableDrivePhysTask: 0 phys enabled on channel 0
02/26/22-07:12:38 (tRAID): SOD: Instantiation Phase Complete
02/26/22-07:12:38 (tRAID): NOTE: Inter-Controller Communication Channels Opened
02/26/22-07:12:38 (tSasEvtWkr): NOTE: Alt controller path up on channel:1 devH:x25 expDevH:x12 phy:12 itn:22
02/26/22-07:12:38 (tSasExpChk): NOTE: Alternate Expander Firmware Version: 00.00.80.16
02/26/22-07:12:38 (tSasExpChk): NOTE: sasExpanderCheck send icon connection up event channel:1
02/26/22-07:12:39 (IOSched): NOTE: New Initiator: ioc: 0, port: 3, devHandle: x25, sasAddress: 5f01faf4e6c1cf08
02/26/22-07:12:39 (IOSched): NOTE: New Initiator: ioc: 0, port: 2, devHandle: x26, sasAddress: 5f01faf4e6c1cf0c
02/26/22-07:12:39 (tSasInitWkr): NOTE: New Initiator: 1 - channel:1, devHandle:x25, SAS Address:5f01faf4e6c1cf08
02/26/22-07:12:39 (tSasInitWkr): NOTE: New Initiator: 2 - channel:0, devHandle:x26, SAS Address:5f01faf4e6c1cf0c
02/26/22-07:12:41 (tRAID): NOTE: LockMgr Role is Slave
02/26/22-07:12:42 (tSasDiscCom): NOTE: SAS: Initial Discovery Complete Time: 43 seconds since last power on/reset, 10 seconds since sas instantiated
02/26/22-07:12:42 (tSasDiscCom): NOTE: sasDiscoveryCompleteFirstTime send icon connection up event channel:0
02/26/22-07:12:42 (tSasDiscCom): NOTE: sasDiscoveryCompleteFirstTime send icon connection up event channel:1
02/26/22-07:12:42 (tHckReset): NOTE: Firmware running line - ON
02/26/22-07:12:42 (tRAID): NOTE: WWN baseName 0004f01f-afe6c1e0 (valid==>SigMatch)
02/26/22-07:12:43 (tRAID): SOD: Pre-Initialization Phase Complete
02/26/22-07:12:44 (utlTimer1): NOTE: fcnChannelReport ==> -2 -3 -4 -5
02/26/22-07:12:49 (utlTimer1): NOTE: fcnChannelReport ==> =2 =3 =4 =5
02/26/22-07:12:50 (ssmTimer): NOTE: Power Supply 0 VPD recovered on retry 1
02/26/22-07:12:52 (iacTask4): NOTE: cmgrCtlr: Board Manager HostBoardData Model Name for
slot 0: 0801 02/26/22-07:12:52 (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Status: 1 02/26/22-07:12:52 (iacTask7): NOTE: cmgrCtlr: Board Manager HostBoardData Model Name for slot 0: 0801 02/26/22-07:12:53 (tRAID): SOD: Code Synchronization Initialization Phase Complete 02/26/22-07:12:53 (NvpsPersistentSyncM): NOTE: NVSRAM Persistent Storage updated successfully 02/26/22-07:12:53 (tRAID): NOTE: SAFE: Process new features 02/26/22-07:12:53 (tRAID): NOTE: SAFE: Process legacy features 02/26/22-07:12:55 (tRAID): NOTE: IOManager::restoreData - dataSize:0x2800000, startAddress:0x4779f7e00 02/26/22-07:12:56 (tRAID): NOTE: ncb::IOManager::restoreData - Successful 02/26/22-07:12:56 (tRAID): NOTE: OBB Restore From Flash Completed 02/26/22-07:12:57 (tRAID): NOTE: sas: Peering enabled 02/26/22-07:12:59 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xf00011 02/26/22-07:12:59 (tRAID): NOTE: CCM: validateCacheMem() cache memory is invalid 02/26/22-07:12:59 (tRAID): NOTE: CCM: validateCacheMem() Initializing my partition 02/26/22-07:13:00 (tRAID): NOTE: doRecovery: myMemory:1, IORCB:0, NEW:0, OLD:0 02/26/22-07:13:00 (tRAID): NOTE: IOManager::restoreData - dataSize:0x80000, startAddress:0x477977e00 02/26/22-07:13:00 (tRAID): NOTE: ncb::IOManager::restoreData - Successful 02/26/22-07:13:01 (tRAID): NOTE: UWMgr findIWLogs: Found IW log drive. Devnum 0x10000 tray=0 slot=1 ssd=0 qos=3 con troller=0 02/26/22-07:13:01 (tRAID): NOTE: UWMgr findIWLogs: Found IW log drive. Devnum 0x10001 tray=0 slot=2 ssd=0 qos=3 con troller=0 02/26/22-07:13:01 (IWTask): NOTE: UWMgr: IW logging started 02/26/22-07:13:04 (tRAID): NOTE: MIB creation complete - size 15 02/26/22-07:13:05 (tRAID): SOD: Initialization Phase Complete ============================================== Title: Disk Array Controller Copyright 2008-2017 NetApp, Inc. All Rights Reserved. 
Name: RC Version: 08.20.24.60 Date: 04/13/2017 Time: 18:11:47 CDT Models: 2660 Manager: devmgr.v1120api13.Manager ============================================== 02/26/22-07:13:05 (tRAID): sodMain Normal sequence finished, elapsed time = 35 seconds 02/26/22-07:13:05 (tRAID): sodMain complete 02/26/22-07:13:05 (iacTask8): NOTE: cmgrCtlr: Board Manager HostBoardData Model Name for slot 0: 0801 02/26/22-07:13:06 (PersistentRestore): NOTE: OBB Restore Completed 02/26/22-07:13:06 (PersistentRestore): NOTE: PRES Restore Completed 02/26/22-07:13:06 (PersistentRestore): WARN: ddcDq & ddcTrace restore abandoned: nothing to recover 02/26/22-07:13:06 (PersistentRestore): WARN: PSTOR: PstorRecordMgr: removeRecord failed 02/26/22-07:13:06 (PersistentRestore): WARN: deleteBackupStatus: caught pstorRecordNotFoundException Line 1893 File ncbIOManager.cc 02/26/22-07:13:06 (PersistentRestore): NOTE: DDC Restore Failed 02/26/22-07:13:06 (PersistentRestore): NOTE: IOManager::restoreData - dataSize:0x600000, startAddress:0xf351560 02/26/22-07:13:06 (ProcessHandlers): NOTE: SYMbol available 02/26/22-07:13:06 (IOSched): NOTE: UWMgr: Alt Mirror Check In Complete 02/26/22-07:13:06 (IWTask): NOTE: UWMgr: IW peering started 02/26/22-07:13:06 (IWTask): NOTE: UWMgr: IW logging stopped 02/26/22-07:13:06 (PersistentRestore): NOTE: ncb::IOManager::restoreData - Successful 02/26/22-07:13:06 (PersistentRestore): NOTE: DQ Restore Completed 02/26/22-07:13:07 (dsmUpgrade): NOTE: DSM: Async upgrade completed (0 drives, 0 requested) 02/26/22-07:13:14 (ProcessHandlers): SOD: sodComplete Notification Complete 02/26/22-07:13:16 (bdbmSync): NOTE: OBB Synchronization: Successful 02/26/22-07:13:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 2 02/26/22-07:13:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 3 02/26/22-07:13:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 4 02/26/22-07:13:43 (utlTimer1): WARN: Extended Link Down Timeout on channel 5 02/26/22-07:18:05 
(utlTimer1): NOTE: sasDEClearRecoveryState: 02/26/22-07:18:12 (symTask3): NOTE: *** Volume 0 is now OPTIMAL *** 02/26/22-07:18:13 (ccmEventTask): NOTE: CCM: reconfigureCache() reconfigSyncCache succeeded 02/26/22-07:18:13 (ccmEventTask): NOTE: CCM: reconfigureCache() quiesceVolumes succeeded 02/26/22-07:18:13 (ccmEventTask): NOTE: CCM: reconfigureCache() Partitioning for mirroring 02/26/22-07:18:13 (ccmEventTask): NOTE: CCM: reconfigureCache() initializing my partition 02/26/22-07:18:13 (ccmEventTask): NOTE: CCM: reconfigureCache() Initializing alt's mirror partition 02/26/22-07:18:14 (ccmEventTask): NOTE: CCM: reconfigureCache() configuring cache 02/26/22-07:18:14 (ccmEventTask): NOTE: doRecovery: myMemory:1, IORCB:0, NEW:0, OLD:0 02/26/22-07:18:14 (ccmEventTask): NOTE: CCM: reconfigureCacheCommon() clearing MOS 02/26/22-07:18:14 (ccmEventTask): NOTE: MediaScanAgent restartIO changing boundary from x9000 to x9000 So now.. squigley@r710-c:/var/log$ rescan-scsi-bus.sh You need to run scsi-rescan-bus.sh as root squigley@r710-c:/var/log$ sudo rescan-scsi-bus.sh Scanning SCSI subsystem for new devices Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs Scanning host 1 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs Scanning host 2 for all SCSI target IDs, all LUNs Scanning for device 2 0 3 0 ... NEW: Host: scsi2 Channel: 00 Id: 03 Lun: 00 Vendor: DELL Model: MD36xxf Rev: 0820 Type: Direct-Access ANSI SCSI revision: 05 Scanning for device 2 0 3 31 ... OLD: Host: scsi2 Channel: 00 Id: 03 Lun: 31 Vendor: DELL Model: Universal Xport Rev: 0820 Type: Direct-Access ANSI SCSI revision: 05 1 new or changed device(s) found. [2:0:3:0] 0 remapped or resized device(s) found. 0 device(s) removed. 
squigley@r710-c:/var/log$ squigley@r710-c:/var/log$ sudo fdisk -l Disk /dev/loop0: 61.93 MiB, 64917504 bytes, 126792 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop1: 55.39 MiB, 58073088 bytes, 113424 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop2: 71.28 MiB, 74735616 bytes, 145968 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop3: 61.91 MiB, 64897024 bytes, 126752 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop5: 43.6 MiB, 45703168 bytes, 89264 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop6: 55.52 MiB, 58204160 bytes, 113680 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/sda: 4.88 TiB, 5347976675328 bytes, 10445266944 sectors Disk model: MD36xxf Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes squigley@r710-c:/var/log$ squigley@r710-c:/var/log$ sudo fdisk -l /dev/sda Disk /dev/sda: 4.88 TiB, 5347976675328 bytes, 10445266944 sectors Disk model: MD36xxf Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes squigley@r710-c:/var/log$ squigley@r710-c:/var/log$ squigley@r710-c:/var/log$ sudo fdisk -l /dev/sda Disk /dev/sda: 4.88 TiB, 5347976675328 bytes, 10445266944 sectors Disk model: MD36xxf Units: 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes squigley@r710-c:/var/log$ sudo fdisk /dev/sda Welcome to fdisk (util-linux 2.34). Changes will remain in memory only, until you decide to write them. Be careful before using the write command. Device does not contain a recognized partition table. The size of this disk is 4.9 TiB (5347976675328 bytes). DOS partition table format cannot be used on drives for volumes larger than 2199023255040 bytes for 512-byte sectors. Use GUID partition table format (GPT). Created a new DOS disklabel with disk identifier 0x2226341f. Command (m for help): Command (m for help): g Created a new GPT disklabel (GUID: 42ADBB13-6FCB-474C-8338-14746A219EDF). Command (m for help): n Partition number (1-128, default 1): First sector (2048-10445266910, default 2048): Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-10445266910, default 10445266910): Created a new partition 1 of type 'Linux filesystem' and of size 4.9 TiB. Command (m for help): w The partition table has been altered. Calling ioctl() to re-read partition table. Syncing disks. 5.4TB before?
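The DOS-vs-GPT complaint from fdisk is just 32-bit sector arithmetic. A quick sanity check in Python, using only the numbers from the fdisk output above (the function names are made up for illustration, they are not anything from fdisk):

```python
SECTOR_SIZE = 512  # bytes, as reported by fdisk for /dev/sda


def mbr_max_bytes(sector_size=SECTOR_SIZE):
    """Size limit fdisk quotes for a DOS label: (2**32 - 1) sectors."""
    return (2**32 - 1) * sector_size


def needs_gpt(disk_bytes, sector_size=SECTOR_SIZE):
    """True when the disk is too big for a DOS (MBR) partition table."""
    return disk_bytes > mbr_max_bytes(sector_size)


disk_sectors = 10445266944              # sector count fdisk printed for /dev/sda
disk_bytes = disk_sectors * SECTOR_SIZE

print(mbr_max_bytes())                  # the limit from the fdisk message
print(disk_bytes)                       # the volume size in bytes
print(needs_gpt(disk_bytes))
```

Both numbers line up with what fdisk printed, which is why it asks for GPT on this volume.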

Thursday, November 19, 2020

MySQL replication

At one of my previous jobs, I was tasked with taking an existing master/slave MySQL DB setup, and upgrading it from v5.x to v8.x.

This also included some hardware upgrades, SSDs etc, and complete reinstalls of the OS.

I did that a year and a half ago, so I don't remember the specifics, but what I do remember is that I did some research into the safest and most convenient way to do it.

This involved running up a couple of temporary machines, running v8, and connecting them as slaves (or backups, since master/slave is apparently outdated terminology..), and then being able to use these to pivot the other servers around.

For some confusing fun, I seem to recall that the hostnames of those (hardware) systems included "master" and "slave", however due to some previous issues, they had been swapped around, so the machine called master was actually the slave, and vice versa, and there was only one-way replication going on.

So, I spun up a virtual machine, and installed v8 on it, and connected it as a slave, after first being able to lock and dump the existing slave in order to obtain a consistent dump. With that working, just for fun, and security, I added a 4th system, another VM.

So at this point, there were the original master and slave hardware systems, and my new 3rd VM system, which I configured to be a slave to both the original master and slave. I then connected the 4th VM system to the 3rd.

This is when I found out about relay logging, vs just master/slave logging.. The 3rd machine had to be configured as a relay, so that the changes it received via replication were passed on in its own logs, or else the 4th system would get left behind. That got configured/enabled/tested, and in the process I seem to recall needing to reload the data into it. That was fine, because I was able to enact read/write locks on any of the DBs other than the sole master/primary server (eg the slave/hardware, or the 3rd/VM) in order to get consistent dumps.
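A toy model of why the middle server in a chain has to write replicated changes into its own binary log. This is only a sketch and not MySQL code: the `Server` class and `run_chain` are invented for illustration, and it assumes the setting involved was `log_slave_updates`, which is an assumption about which option this was.

```python
class Server:
    """Toy model of a MySQL server: the rows it holds plus its binary log."""

    def __init__(self, name, log_slave_updates):
        self.name = name
        self.log_slave_updates = log_slave_updates
        self.rows = []      # data the server actually holds
        self.binlog = []    # events that its own slaves can read
        self._read_pos = 0  # how far into the source's binlog we have read

    def client_write(self, row):
        # Direct client writes are always logged (binary logging assumed on).
        self.rows.append(row)
        self.binlog.append(row)

    def replicate_from(self, source):
        # Apply every event not yet read from the source's binlog.
        for row in source.binlog[self._read_pos:]:
            self.rows.append(row)
            if self.log_slave_updates:
                # Only a relay re-logs what it received via replication.
                self.binlog.append(row)
        self._read_pos = len(source.binlog)


def run_chain(relay_logs_updates):
    """primary -> third -> fourth; returns (third.rows, fourth.rows)."""
    primary = Server("primary", log_slave_updates=False)
    third = Server("third", log_slave_updates=relay_logs_updates)
    fourth = Server("fourth", log_slave_updates=False)
    primary.client_write("row-1")
    primary.client_write("row-2")
    third.replicate_from(primary)
    fourth.replicate_from(third)
    return third.rows, fourth.rows


print(run_chain(relay_logs_updates=False))
print(run_chain(relay_logs_updates=True))
```

In the first run the 3rd system has the rows but the 4th gets nothing, which is the "left behind" behaviour described above.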

There were lots of other quirks and issues discovered in the process of connecting these new v8 temp machines to the primary v5 machine, as things have changed. User management is one (you can't create users just by granting privs to them any more, they have to already exist), and there was some stuff to do with the handling of character sets, latin vs utf8 etc.

You can read up on the process and some of those traps:

https://dev.mysql.com/doc/refman/8.0/en/replication-upgrade.html

https://dev.mysql.com/doc/refman/8.0/en/group-replication-online-upgrade-combining-versions.html

https://dev.mysql.com/doc/refman/8.0/en/group-replication-online-upgrade.html

So, with that all in place and working, I was then able to take the original slave server hardware offline, safe in the knowledge that we had 2 more slaves/backups to the primary running.

The hardware was upgraded. It may have involved swapping CPUs and memory into it from some other system(s), I can't exactly recall. The existing rust/spinner hard drives became secondary/additional space in the systems, with new SSDs being installed (probably as a RAID 1, since that's what I do..)

The OS was reinstalled, along with MySQL v8, and then I was able to lock and dump all the data out of either the 3rd or 4th system, and load it into the newly reinstalled system. This was now called "server01", as opposed to being a master or slave, as the intention was to implement cyclic replication, AKA master/master, and also to eventually (soon) promote this system to be the primary, so that the current master/primary could be taken offline, upgraded, wiped, reinstalled etc..

So the upgraded hardware system ("01") had all the data loaded, and was connected to the existing primary as a slave (probably along with either the 3rd or 4th system, so that it would catch up and stay up to date after the data was dumped and loaded).

I can't remember, but I think the current primary server was then connected to the new/upgraded system ("01") as a slave, so that it would stay in sync, for the brief period before it was phased out.

With that all working, all the client systems were reconfigured to connect via a DNS CNAME record, which was then pointed to server01, and restarted/reconnected.

This took the original master/primary server hardware out of the loop, at which point I was able to upgrade it, install CPUs/RAM/SSDs etc, reinstall the OS, and MySQL etc..

With it back up and running, now called "server02", it was loaded with all the data, by locking and dumping it from the 3rd or 4th systems, so as to not interfere with the production function of the DB.

It was then connected as a slave to whichever system (3rd or 4th) it had the data dumped from, and was able to catch up.

It was then connected to the new primary server ("01") as a slave, and in turn server01 was connected to the new server02 as a slave, for cyclic replication and redundancy.

Servers 3 and 4 were at some point disconnected and discarded, as they had served their purpose.

I left that job a short while after, solely because I couldn't handle the hours; needing to be at my desk for 8:30am every morning, which was seriously affecting my depression, and insomnia, and I was so tired all the time I couldn't think straight, couldn't function properly, was making stupid mistakes all the time. This was just a feedback loop into my depression.

It didn't make any sense either, as the majority of my co-workers that I worked with were in another office, in another timezone, 2 hours behind, so it wasn't like I could do much until they got into the office in the morning, and then we'd lose another 2 hours in the middle of the day because of the offset lunch breaks.

I would have been happy to have offset my entire day by 2 hours, to work in sync with the other office, but that wasn't an option, nor was working from home, which also would have helped me to be able to start work earlier in the day than what my brain prefers.

Speaking of lunch breaks, I quite often used to have to go out to my car on my lunch break and sleep in the back of it for 45 minutes or so, in order to be able to get anything useful done during the day. I thought my head was just going to fall off.

I'd already received a written warning for my tardiness, the first written warning I have ever received in my career, and later, one Friday afternoon, I had been told that I was pretty much going to be fired right then, but that my manager had pleaded for them to keep me on.

I stayed around for another couple of months, but I knew that I was on incredibly thin ice, and that if I was late for any reason again, that I would probably just be fired and told to bugger off as soon as I arrived at the office.

Anyway, that's all mostly irrelevant to this story, and I don't want this to turn into one of those recipe pages where you have to read through 20 pages of someone's incredibly boring life story before you can get to the part where you find out how many cups of water you are supposed to use with a cup of rice (1 3/4), when you are trying to make fried rice.

So, getting back to the point.. I had been gone from this job for almost a year when they got back in contact with me.

Someone was trying to do some DB upgrades, and the gist of it is that they made a mess of it.

They changed the DNS CNAME pointer (I presume), to point to the other DB, but because this isn't going to update immediately/everywhere, they ended up with a split-brain scenario, with some clients still writing in the primary DB, and others moving and writing in the backup DB. That should normally be OK, and it's part of why cyclic replication is a thing, however..

In some circumstances, when there are triggers on tables, and/or the way primary keys are used etc, this can, and will, cause everything to break. Something will try to update in one DB, and sync to the other, where that has already been changed to something else, or the next primary key in an incremental count has been assigned to something else..

And then everything falls apart into a heap. Replication stops in both directions. Your tables/DBs are out of sync, and continue to get more and more out of sync.
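To make the primary key clash concrete, here is a small simulation of two servers that both believe they own the next auto-increment value. It's a sketch, not how MySQL is implemented: `next_ids` and `colliding_ids` are invented names, and the `increment`/`offset` parameters only mirror MySQL's `auto_increment_increment` and `auto_increment_offset` variables to show what staggered series would look like.

```python
def next_ids(last_id, count, increment=1, offset=1):
    """Next `count` values from the series offset, offset+increment, ...
    that are strictly greater than last_id."""
    candidate = offset
    if last_id >= offset:
        steps = (last_id - offset) // increment + 1
        candidate = offset + steps * increment
    ids = []
    for _ in range(count):
        ids.append(candidate)
        candidate += increment
    return ids


def colliding_ids(ids_a, ids_b):
    """Keys that both servers handed out independently."""
    return sorted(set(ids_a) & set(ids_b))


last_synced_id = 100  # hypothetical: the last key both servers agree on

# Both servers take writes during the split brain, each counting up by 1.
primary_ids = next_ids(last_synced_id, 3)
backup_ids = next_ids(last_synced_id, 3)
print(colliding_ids(primary_ids, backup_ids))  # [101, 102, 103]

# Same writes with staggered series: odd keys on one side, even on the other.
primary_ids = next_ids(last_synced_id, 3, increment=2, offset=1)
backup_ids = next_ids(last_synced_id, 3, increment=2, offset=2)
print(colliding_ids(primary_ids, backup_ids))  # []
```

Every colliding key is a row that one side will refuse to apply when it arrives via replication, which is what stops the replication threads.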

So they contacted me, and I remoted into the 2 DB systems to work out what was going on, and how to fix it, and fix it.

Since the systems are running v8, and have been set up using GTIDs, it makes things a little more tricky to resolve, vs the old way, when you could just skip transactions and start the slave running again, skipping any of the conflicting transactions (although you still end up with data inconsistency, caused by these conflicting transactions which don't get resolved, just skipped/ignored).

The new way is similar, but it's not just a matter of telling it to skip 1 or more lines in the binlogs.

The new way, with GTIDs, is that you need to look at the slave status, and the received and executed sets of GTIDs, and then manually run a dummy version of the GTID(s) which it is jamming on, because of the conflict.

So you disable the auto tracking of GTIDs, set it to point to the next GTID, run a dummy commit which doesn't do anything, except "use" that GTID, causing it to auto increment, and then you might be able to start the slave process again, and it will catch up and resume.

If you are lucky.

If you are not, then the conflicting statement will have some effect on the next statement/GTID, and it will conflict and fail.

So then you repeat the process, setting the GTID manually, dummy committing it, and trying to resume. And you do this again. And again, and again.

And eventually, you will hopefully have skipped all the affected transactions, and it will start replicating again, and catch up.
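One round of the skip procedure described above, sketched as a Python helper that just builds the SQL to paste into a mysql session on the stuck slave. The function name and the GTID in the usage line are made up for illustration; the statements follow the usual "inject an empty transaction" pattern for GTID replication.

```python
import re

# A single GTID looks like <server uuid>:<transaction number>.
GTID_RE = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:[1-9][0-9]*$"
)


def skip_gtid_statements(gtid):
    """Return the statements that 'use up' one GTID with an empty commit."""
    if not GTID_RE.match(gtid):
        raise ValueError(f"not a single GTID: {gtid!r}")
    return [
        "STOP SLAVE;",
        f"SET GTID_NEXT = '{gtid}';",    # turn off automatic GTID assignment
        "BEGIN;",
        "COMMIT;",                       # empty transaction consumes the GTID
        "SET GTID_NEXT = 'AUTOMATIC';",  # hand control back to the server
        "START SLAVE;",
    ]


# Hypothetical GTID, standing in for the one the slave status shows it stuck on.
for statement in skip_gtid_statements("3e11fa47-71ca-11e1-9e33-c80aa9429562:23"):
    print(statement)
```

If the slave stops again on the next GTID, the same set of statements gets generated for that one, and so on.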

However, you now have inconsistency between the 2 tables/DBs.

Most people just ignore this and get on with their lives. Stuff is working again, so who cares?

If/when you flip the DBs over, and/or you are reading from both of them at the same time, you will now occasionally get weird results, depending on what you are querying, and if the inconsistent tuples are involved..

So how do you fix that?

You need to disconnect both systems from talking to each other, ie break the replication again, then you need to reset/wipe the settings related to the master/slave replication (as I found out the hard way).
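A sketch of what "reset/wipe the settings" could look like as SQL, built by a small Python helper. The helper name is invented, and which statements were actually run here is an assumption: `RESET SLAVE ALL` forgets the connection settings, and `RESET MASTER` additionally throws away the server's own binary logs and GTID history, so it is left off unless asked for.

```python
def reset_replication_statements(clear_gtid_history=False):
    """Statements to stop replication and wipe its settings on one server."""
    statements = [
        "STOP SLAVE;",
        "RESET SLAVE ALL;",  # drops host/user/password and log coordinates
    ]
    if clear_gtid_history:
        # Destructive: deletes this server's binlogs and empties gtid_executed.
        statements.append("RESET MASTER;")
    return statements


print("\n".join(reset_replication_statements()))
print("\n".join(reset_replication_statements(clear_gtid_history=True)))
```

Anything replicating from a server will break when `RESET MASTER` is run on it, so that one only makes sense on the side that is about to be overwritten anyway.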

Once that's done, you need a way to get a consistent dump of the data in the primary (or whichever system you deem to be the authoritative system), and load this into/over the top of the other one, to make them consistent, and then restart the replication, on both.

This seems pretty straightforward, but the issue occurs when you try to obtain that consistent dump, if you only have one system now, and it's your production system, as in order to obtain a consistent dump, it needs to be locked, so that no changes are occurring, while the data is dumped out of it.

I tried this several different ways. First, the old way, which is to reset the binlogs, flush and lock the tables, obtain the binlogging coordinates, and then start the dump, which should (does) work.

Once the dump is complete, you can unlock the tables and allow changes to start occurring again.

This is fine, except how do you handle this when your dump takes over 10 minutes?

I was reading that once you start the dump, you can apparently unlock the tables, and it will still work..

It should, because anything that happens after the tables are unlocked should happen, and also get logged, so when the dump is loaded, and the log replayed, it should bring it back into sync, even if it tried to do transactions again which had already been done (though I guess that would be bad if they were incrementing statements or something).

That doesn't work. You end up with a few hundred rows missing, somehow.

Other "tricks" I found involved using --single-transaction, which is supposed to read the tables from a consistent snapshot taken at the point the dump starts (rather than holding a lock), so it's basically reading them back in time to be in the state they were when the dump started, and then again, you just replay the log to catch up..

Doesn't work.

There's a new (to me at least) switch, --master-data, which obtains the lock for you when the dump starts, and it writes the coordinates into the dump, and then unlocks when it's done.

This works.. except.. It's basically the original old method, of needing to flush and lock the tables and blocking changes for the entire time the dump is running. No good when you need to do this when people are trying to use the DB.
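A sketch of how the dump variants discussed here map onto mysqldump flags, as a helper that builds the command line. The helper name and the database name in the usage line are made up; the flags themselves (`--single-transaction`, `--master-data`, `--routines`, `--events`, `--triggers`, `--databases`) are real mysqldump options. Per the mysqldump manual, giving `--master-data` together with `--single-transaction` should hold the global read lock only briefly at the start of the dump, but that only helps for InnoDB tables and is untested against the setup described in this post.

```python
import shlex


def mysqldump_argv(databases, master_data=2, single_transaction=True):
    """Build a mysqldump command line for a replication-friendly dump.

    master_data=1 writes an active CHANGE MASTER TO statement into the dump,
    master_data=2 writes it as a comment, master_data=0 leaves it out.
    """
    argv = ["mysqldump"]
    if single_transaction:
        argv.append("--single-transaction")
    if master_data:
        argv.append(f"--master-data={master_data}")
    argv += ["--routines", "--events", "--triggers"]
    argv += ["--databases", *databases]
    return argv


# "appdb" is a placeholder database name.
print(shlex.join(mysqldump_argv(["appdb"])))
print(shlex.join(mysqldump_argv(["appdb"], master_data=1, single_transaction=False)))
```

The second command line is the lock-for-the-whole-dump variant, the first is the combined one.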

Also, the CHANGE MASTER TO statement that it includes in the dump, only includes the log file and position. So, that won't work, because the slave won't know who or where or what its master server is, nor the user/password to connect.
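For comparison, a full CHANGE MASTER TO needs the connection details as well as the coordinates. A minimal sketch of a helper that builds one; the helper names and every value in the usage line (host, user, password, log file, position) are placeholders, not values from these servers.

```python
def sql_quote(value):
    """Quote a string for use as a MySQL string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_statement(host, user, password, log_file, log_pos):
    """Build a CHANGE MASTER TO with both connection info and coordinates."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_quote(host)}, "
        f"MASTER_USER={sql_quote(user)}, "
        f"MASTER_PASSWORD={sql_quote(password)}, "
        f"MASTER_LOG_FILE={sql_quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


print(change_master_statement(
    "server01.example.com", "repl", "s3cret", "binlog.000042", 157
))
```

The statement in the dump only carries the last two fields, which is why it can't stand in for the whole thing on a slave that has had its settings wiped.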

You may have already had these set, from before you stopped it.. but what I then found out the hard way, is that the GTID set(s) are dumped into the dump file, and when you try to load these on your backup server, it will conflict with the existing set(s) it has in the master/slave settings..

So these will conflict, and you can't set/adjust them manually (or via the load, which is manually)..

It will error out with something like "ERROR 3546 (HY000) at line 24: @@GLOBAL.GTID_PURGED cannot be changed: the added gtid set must not overlap with @@GLOBAL.GTID_EXECUTED"
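What that error is complaining about can be checked by hand. A rough sketch of a GTID set overlap test in Python; the function names and the GTID values are made up for illustration, and it only handles the plain `uuid:1-5:8` style of set. (The server has its own `GTID_SUBSET()` and `GTID_SUBTRACT()` functions for doing this properly in SQL.)

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: [(1, 5), (8, 8)], ...}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def gtid_sets_overlap(a, b):
    """True if any transaction number appears in both sets for the same uuid."""
    set_a, set_b = parse_gtid_set(a), parse_gtid_set(b)
    for uuid, ranges_a in set_a.items():
        for lo_a, hi_a in ranges_a:
            for lo_b, hi_b in set_b.get(uuid, []):
                if lo_a <= hi_b and lo_b <= hi_a:
                    return True
    return False


# Hypothetical values: what the dump wants to add vs what the server has.
dump_purged = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-500"
already_executed = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-420"
print(gtid_sets_overlap(dump_purged, already_executed))  # True
print(gtid_sets_overlap(dump_purged, ""))                # False
```

An empty executed set on the server being loaded is the case where the dump's GTID_PURGED line goes in without a fight.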

You need to reset/wipe the master/slave settings, and re-enter them all.

And you can't just set a log file name and position when you don't have any other settings (eg server, user, password) set..

So, the correct method is probably to stop the slave on the backup (and primary), reset/wipe the settings (on both?), re-enter the hosts, users, passwords, and some dummy log settings.

Then you can load the dump, and it will update the log settings correctly, and then you can start the slave, and it will replay the logs and catch up the DB to what's currently running on the primary.

This is basically what I ended up doing, except I had to grep the CHANGE MASTER statement out of the dump in order to find the log settings, and set them manually, after resetting/wiping the master/slave settings so that the load would succeed. Because I hadn't re-entered the host/user/password first, the log file and position from the dump wouldn't set by themselves.
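The grep step is easy enough to sketch in Python. This assumes the dump contains the statement in the form mysqldump normally writes with --master-data (optionally commented out with a leading `-- `); the function name and the sample lines are made up for illustration.

```python
import re

CHANGE_MASTER_RE = re.compile(
    r"CHANGE MASTER TO\s+MASTER_LOG_FILE='([^']+)',\s*MASTER_LOG_POS=(\d+)",
    re.IGNORECASE,
)


def find_binlog_coordinates(dump_lines):
    """Return (log_file, log_pos) from the first CHANGE MASTER TO line, or None."""
    for line in dump_lines:
        match = CHANGE_MASTER_RE.search(line)
        if match:
            return match.group(1), int(match.group(2))
    return None


# Made-up sample of the top of a dump file.
sample_dump = [
    "-- MySQL dump",
    "-- CHANGE MASTER TO MASTER_LOG_FILE='binlog.000042', MASTER_LOG_POS=157;",
    "CREATE TABLE `t` (`id` int NOT NULL);",
]
print(find_binlog_coordinates(sample_dump))  # ('binlog.000042', 157)
```

With a real dump you would pass the open file object instead of a list, so the whole thing doesn't have to be read into memory, since the statement sits near the top.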

Anyway, with that all in place, I was able to start the slave, and it connected, replayed the logs, and was back up to date.

The next issue is that the primary is not a slave to the backup system, so there is only one-way replication.

Thankfully at this point the servers contain the same data (or should!), so there's no need to dump it out and copy it over and load it back in to the other one..

It's just a matter of obtaining the log file name and position on the backup (maybe after locking the tables briefly?) and setting those coordinates on the primary server, and starting it, and then unlocking the backup, and allowing it to continue.

At that point, the master and slave status can be checked on both ends, and the received and executed GTID sets should match, and increment together.

Oh.. that's only if you have valid replication users on both ends..

For some reason, it looks like each DB had its own replication user created locally, which worked initially, but once the DB was dumped from the primary, and loaded over the secondary, it replaced the user table.. Since the replication user allowing the primary to connect to the secondary didn't exist in the primary (where it doesn't need to), it was then missing, and the primary was unable to connect to the secondary..

On the primary, I recreated the replication user for the secondary, which doesn't make much sense initially.. however this user will replicate to the secondary, and then exist in both places (along with the counterpart user for the opposite direction replication, which already existed on both).

Once that was done, and the fiddling of making sure that the FQDN was included in the username, as that's what each DB provides to the other, and the password(s) set to something known, and configured in the master/slave settings on each, they could both successfully connect and replicate to/from each other.
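A sketch of the user fiddling as SQL, built by a Python helper. The helper names, the user name, hostname and password are all placeholders; the point is only that in v8 the user is created (or altered) first and granted to second, with the connecting server's FQDN as the host part.

```python
def sql_quote(value):
    """Quote a string for use as a MySQL string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def replication_user_statements(user, host_fqdn, password):
    """Statements to (re)create a replication account for one peer."""
    account = f"{sql_quote(user)}@{sql_quote(host_fqdn)}"
    secret = sql_quote(password)
    return [
        f"CREATE USER IF NOT EXISTS {account} IDENTIFIED BY {secret};",
        f"ALTER USER {account} IDENTIFIED BY {secret};",  # force a known password
        f"GRANT REPLICATION SLAVE ON *.* TO {account};",
    ]


# Placeholder account for server01 connecting in to replicate.
for statement in replication_user_statements("repl", "server01.example.com", "s3cret"):
    print(statement)
```

Run on the primary, these replicate across to the other side, so the account ends up existing on both.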

Some takeaways:

It's a good idea to have a 3rd and/or 4th system, perhaps only as slaves, connected to your primary/backup systems, in case you need to be able to block the tables for 10 minutes or more, while you get a consistent dump of the data.

Let's say we just have a 3rd system. It would also make good sense, even if it is only a slave to both other systems and nothing else uses/reads from it, as it could be used to ascertain quorum in the case of writes being performed in the wrong DB, causing conflicts.

There are probably many other good reasons to have a 3rd system..

I realise there aren't a lot of useful commands and copy/pastas in here, this is more of a theory article than a practical one in the end..

Here are some useful links though..

https://www.barryodonovan.com/2013/03/23/recovering-mysql-master-master-replication

https://medium.com/@techrandomthoughts/resolving-mysql-replication-failure-on-ubuntu-using-shell-script-22d415ed0199

https://kb.virtubox.net/knowledgebase/backup-your-databases-with-mysqldump/

https://dba.stackexchange.com/questions/71961/mysqldump-single-transaction-yet-update-queries-are-waiting-for-the-backup

https://dev.mysql.com/doc/refman/8.0/en/replication-howto-masterstatus.html

https://dev.mysql.com/doc/refman/8.0/en/replication-snapshot-method.html

https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_master-data

https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql

https://dev.mysql.com/doc/refman/8.0/en/replication-howto-repuser.html

https://dev.mysql.com/doc/refman/8.0/en/set-password.html

https://dba.stackexchange.com/questions/139131/error-1236-could-not-find-first-log-file-name-in-binary-log-index-file