Topic: n3 cm1 failure
In the last two weeks I've started having system failures. The system, consisting of two n3s and a ConMan node, stopped passing audio, and the n3 in question required a reboot. Here are the suspect n3's log entries, beginning at the time of the error (most recent first):
10/28/2007 7:40:54 12408 note mcp/processes shutting down gracefully
10/28/2007 7:34:27 12407 note piond/role_manager role is stopped
10/28/2007 7:34:24 12406 note project user logged off: pwadmin
10/28/2007 7:34:24 12405 note project user logged off: etech
10/28/2007 7:34:24 12404 note piond/role_manager role is running
10/28/2007 7:34:24 12403 fault piond/fault_policy more than one error in less than one minute; stopping engine
10/28/2007 7:34:20 12402 note project user logged on: pwadmin
10/28/2007 7:34:20 12401 note project user logged on: etech
10/28/2007 7:33:58 12400 error piond/cm1 cm1 not detected : /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:58 12399 error piond/cm1 peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:57 12398 note piond/role_manager restarting role : USF SS new/DSP-01/JFb7-bfKY0Jcl7tj8I3pKUZ6uV8/xkD1T16dBJ-l9g11zxr5QbowFDS
10/28/2007 7:33:55 12397 note project user logged off: pwadmin
10/28/2007 7:33:55 12396 note project user logged off: etech
10/28/2007 7:33:55 12395 error piond/cm1 peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55 12394 error piond/cm1 mute assertion failed: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55 12393 error piond/cm1 poke aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55 12392 error piond/cm1 poke/peek driver exception : /dev/pion/cm10: timeout waiting for HF2 to go high
10/28/2007 7:33:55 12391 note piond/mute muted: menu command
10/28/2007 7:33:55 12390 error piond/fault_policy restarting audio engine
10/28/2007 7:33:55 12389 error piond/cm1 peek aborted after 5 tries: /dev/pion/cm10: timeout waiting for HF2 to go high
The graceful shutdown was initiated by the technicians to bring the system back online.
As mentioned above, this has happened at least twice: the same errors on the same n3, about 10 days apart. The other n3's log contains only the usual complaints about losing XDAB at the moment the failure above occurred. Its entries are as follows:
10/28/2007 7:40:55 3889 note mcp/processes shutting down gracefully
10/28/2007 7:40:37 3888 note project user logged off: pwadmin
10/28/2007 7:34:04 3887 note piond/xdab/leader arbitration done; ring is incomplete in redundant failed mode
10/28/2007 7:33:55 3886 error piond/xdab/leader communication failure
10/28/2007 7:33:54 3885 note piond/xdab/leader poll returned false: 'DSP-01'
10/28/2007 7:33:54 3884 note piond/mute muted: xdab loss of clock signal