Startup Shutdown Synchronization Protocol (SSSP v1.3)
SSSP defines signal handling during the startup phase until all AMiRo Modules are fully initialized and during the shutdown phase, so that the system turns off in a controlled and safe manner or restarts, if requested.
The protocol is deliberately simple and is designed so that modules which do not implement SSSP will not compromise system operation.
Hence, only two GPIO signals are required:
- S: synchronize
- PD: power down
Both must be designed such that they realize a logical OR on activation (the signal is active if one or more nodes are active) and, correspondingly, a logical AND on deactivation (the signal is inactive only if all nodes are inactive).
Electrically this can be implemented using active-low open-drain signals with pull-up resistors.
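The OR/AND behavior of such a shared line can be illustrated with a small model. This is only a sketch: the class `OpenDrainLine` and its method names are hypothetical and not part of SSSP, and the electrical details are reduced to "any node pulling low wins".

```python
class OpenDrainLine:
    """Hypothetical model of an active-low open-drain line with a pull-up."""

    def __init__(self):
        self._pulling_low = set()  # nodes currently driving the line low

    def activate(self, node):
        """The node pulls the line low (logically active)."""
        self._pulling_low.add(node)

    def deactivate(self, node):
        """The node releases the line; the pull-up takes over if nobody else pulls."""
        self._pulling_low.discard(node)

    @property
    def level(self):
        """Electrical level: 0 if any node pulls low, otherwise 1 (pull-up)."""
        return 0 if self._pulling_low else 1

    @property
    def active(self):
        """Logical state: the line is active-low, so level 0 means active."""
        return self.level == 0


s = OpenDrainLine()
s.activate("A")
s.activate("B")
s.deactivate("A")
still_active = s.active  # B still pulls the line low (logical OR)
s.deactivate("B")
released = s.active      # all nodes inactive (logical AND)
```

With this model, `still_active` is true while a single node keeps the line low, and `released` is false only once every node has let go.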
Although these two mandatory signals suffice to implement the protocol, some optional features require some further signals and communication interfaces:
- UP/DN: GPIO to the adjacent module (neighbor) up/down
- BCB: a communication bus with broadcast capability
Note that a heterogeneous setup, in which some modules support the optional stages and others do not, is fully compatible.
However, these optional features will only apply successfully if all modules support them.
Hence, the system must not rely on the additional information, but may take advantage of it, if it is available.
In order to make the protocol adaptable to any system, it uses a parameter T.
This defines a time period, which is used by the protocol for synchronization or to detect timeouts.
This parameter must be identical for all nodes within a system, or at least similar: the ratio between the largest and the smallest value of T in the system must be smaller than ten.
An additional parameter F defines the frequency at which S is toggled during the Operation Phase.
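The constraint on T can be checked with a one-line comparison. The helper below is a sketch; the function name and the choice of microseconds as unit are assumptions made for this example only.

```python
def periods_compatible(periods_us):
    """Return True if the largest T is less than ten times the smallest T.

    periods_us: the T value of every node, here in microseconds (assumed unit).
    """
    if not periods_us or min(periods_us) <= 0:
        raise ValueError("need at least one positive period")
    return max(periods_us) < 10 * min(periods_us)


similar = periods_compatible([1000, 2000, 9900])   # ratio 9.9
too_far_apart = periods_compatible([1000, 10000])  # ratio exactly 10
```

A ratio of exactly ten is rejected, because the text requires the ratio to be smaller than ten.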
Startup Phase
All modules must initialize the signals such that S is active and PD is inactive.
Although only S is used for startup, PD must be inactive during the startup phase; otherwise the shutdown phase will be initiated, either immediately by the bootloader or by the operating system once it is running.
Each module executes the following steps:
- basic initialization
  - initialization of required signals, voltages, or other hardware
    This first stage is very module specific and strongly depends on the hardware configuration.
    When a module has finished this stage, it sets S to inactive.
    In order to prevent erroneous behavior due to incorrect signals during the initialization, this stage takes at least one period T.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are initialized) as a first synchronization.
  - synchronous start of stage 2
    As soon as S is inactive, the master node activates it again in order to start the next stage.
    To ensure that each module had enough time to detect the inactive state of S, the master node must delay the activation by at least one period T.
- operating system initialization
  - complete system startup
    Each module activates S again and fully initializes (e.g. starts the operating system).
    As soon as it is ready, it deactivates S again.
    When a module indicates that it is ready, at least the main communication channel (for AMiRo this is CAN) must be fully operational.
    Again, S must be active for at least one period T, so every module can detect the activation.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are ready).
    Only now is it safe to use the main communication channel, since all modules are able to receive messages correctly.
- module stack initialization [optional]
  This stage is optional and can only be applied if all modules can read from and write to the main communication channel (BCB) and two additional signals (UP and DN), which connect neighboring modules, exist for each AMiRo Module.
  Furthermore, the first and last node of this 'module stack' must be known beforehand.
  In case of the AMiRo, for instance, the DiWheelDrive and PowerManagement are defined to be the lowermost modules, and the LightRing always finalizes the stack at the top.
  - initiation of this stage
    The master node initiates this stage by broadcasting a unique command via BCB to all modules, so they can interpret the upcoming communication via the neighbor signals (UP and DN) correctly.
  - starting the sequence
    One of the known nodes at the end of the module stack broadcasts its own stack number (e.g. 1) via BCB.
    Right after that, it signals its neighboring module to continue by setting the neighbor signal active for at least one period T.
    Note that the smallest and largest numbers (0 and 255 for 8-bit addressing) are reserved and must not be used by any module.
    These 'stack IDs' can later be used to represent a hierarchy within the system.
  - counting the modules
    When a module is triggered by the activation of the neighbor signal, it broadcasts its own stack number, which is defined to be greater than the last one.
    Then, in turn, it triggers the next module in the stack to continue via the other neighbor signal.
    This step is repeated until one of the following terminating conditions is fulfilled.
  - termination of this stage
    There are two ways this stage can be terminated: either it is completed correctly, or it is aborted because of a timeout.
    - completion
      The stage is completed correctly if the signal is propagated to the known node on the other end of the module stack.
      - broadcast of final ID
        The final (known) module broadcasts its own stack number (which is computed like the others before) plus the information that the module stack initialization is done.
      - successful termination of the stage
        The master node broadcasts a message, which indicates the successful termination of the stage.
        In this case, all nodes adopt their ID and can use it for later identification.
    - abortion
      The stage is aborted if more than ten periods T have passed since the last ID was broadcast.
      Such a timeout can only occur if a module does not support the propagation process, or because of hardware issues.
      When the master node detects such a delay, it broadcasts a message, which indicates the unsuccessful termination of the stage.
      As a result, all stack numbers must be considered unreliable; thus, identification is not supported.
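The two mandatory startup stages can be simulated in a few lines to see where the synchronization points fall. This is a sketch under simplifying assumptions: time advances in whole ticks of one period T, the function names `run_stage` and `run_startup` are hypothetical, and the tick counts per module are made-up example values.

```python
def run_stage(modules, ticks_needed, start_tick):
    """Run one synchronized stage; one tick stands for one period T.

    Every module keeps S active until its work is done, but never for
    less than one tick. The stage ends when nobody keeps S active.
    """
    holders = set(modules)  # modules currently keeping S active
    remaining = {m: max(1, ticks_needed[m]) for m in modules}
    tick = start_tick
    while holders:          # S is active as long as one module holds it
        tick += 1
        for m in list(holders):
            remaining[m] -= 1
            if remaining[m] == 0:
                holders.discard(m)  # this module deactivates S
    return tick             # S is inactive: all modules are done


def run_startup(init_ticks, os_ticks):
    """Return (tick, event) pairs for stage 1 and stage 2."""
    modules = list(init_ticks)
    events = []
    tick = run_stage(modules, init_ticks, 0)
    events.append((tick, "all modules initialized"))
    tick += 1               # the master waits at least one period T
    events.append((tick, "master activates S, stage 2 starts"))
    tick = run_stage(modules, os_ticks, tick)
    events.append((tick, "all modules ready"))
    return events


# Example durations in ticks (made-up values).
init_ticks = {"DiWheelDrive": 2, "PowerManagement": 1, "LightRing": 3}
os_ticks = {"DiWheelDrive": 4, "PowerManagement": 5, "LightRing": 2}
events = run_startup(init_ticks, os_ticks)
```

In both stages the slowest module determines when S becomes inactive, which is exactly what makes S usable as a barrier.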
At the end of the startup phase (more precisely after stage 2) both signals, S and PD, are inactive.
Note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might cause errors after the startup phase, if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
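How the optional module stack initialization succeeds or fails can be sketched as follows. The function `enumerate_stack`, the tuple layout, and the module order in the example are assumptions for illustration; the real stage runs over BCB and the neighbor signals, and the abort is triggered by the master after the timeout of ten periods T.

```python
RESERVED_IDS = (0, 255)  # smallest and largest 8-bit numbers are reserved


def enumerate_stack(stack):
    """Assign increasing stack IDs from one known end of the stack to the other.

    stack: list of (name, supports_stage) tuples in physical order.
    Returns (ok, ids). If a module does not propagate the neighbor signal,
    the stage is aborted and no ID is considered reliable.
    """
    ids = {}
    next_id = RESERVED_IDS[0] + 1
    for name, supports_stage in stack:
        if not supports_stage:
            # No further broadcast follows, so the master would observe
            # the timeout and announce the unsuccessful termination.
            return False, {}
        if next_id >= RESERVED_IDS[1]:
            raise ValueError("stack too long for 8-bit stack IDs")
        ids[name] = next_id  # the module broadcasts its own stack number
        next_id += 1         # the next number must be greater than the last
    return True, ids


ok, ids = enumerate_stack(
    [("DiWheelDrive", True), ("PowerManagement", True), ("LightRing", True)]
)
aborted_ok, aborted_ids = enumerate_stack(
    [("DiWheelDrive", True), ("UnknownModule", False), ("LightRing", True)]
)
```

The second call shows the heterogeneous case: a single module without support for this stage makes the whole enumeration unusable, but nothing else breaks.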
Operation Phase
All AMiRo Modules are kept in sync during operation by toggling S at frequency F.
Hence, all modules must act as slaves and there may be at most one master node.
Since S gets activated when a shutdown is initiated (see Shutdown Phase), modules must synchronize at deactivation (logically the falling edge) of S.
Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might run out of sync, which in turn may cause errors during operation.
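Synchronizing on the deactivation of S amounts to detecting active-to-inactive transitions. The sketch below assumes that a module samples the logical state of S regularly; the function name and the sample sequence are hypothetical.

```python
def count_sync_events(samples):
    """Count transitions of S from active to inactive.

    samples: iterable of booleans, True meaning that S is active.
    """
    events = 0
    previous = None
    for active in samples:
        if previous is True and not active:
            events += 1  # deactivation of S: synchronization point
        previous = active
    return events


# Two complete toggles, then S is activated and stays active,
# as it would when a shutdown is initiated.
samples = [False, True, False, True, False, True, True]
sync_events = count_sync_events(samples)
```

The final activation does not produce a synchronization event, which is why a shutdown that activates S cannot be mistaken for a sync pulse.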
Shutdown Phase
Since the PD signal must not be used during system operation, it is defined to be inactive.
The state of S is undefined, because it was used for synchronization during operation.
Any module can initiate the shutdown phase by activating PD.
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:
- shutdown of high-level operation
  - initiation of module shutdown
    As soon as the activation of PD is detected, each module activates S.
    The module which initiated the system shutdown by activating PD has to activate S as well, of course.
    Obviously, the module which acted as master node during operation must stop toggling S as soon as PD is activated.
  - shutdown of high-level operation (e.g. the operating system)
    Each module stops all computation in a safe manner, so it can be shut down without data loss or other issues.
    As soon as this is done, it deactivates S.
    In order to ensure that every module had a chance to detect the activation of PD, this step must take at least one period T.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are done).
- system shutdown or restart
  - evaluation of PD signal
    When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
    Hence, the initiating module, which activated PD, must have set it to the appropriate state before it deactivated S.
    The implication of the PD state at this point is defined as follows:
    - active: A system shutdown is requested.
    - inactive: A system restart is requested.
  - disambiguation procedure
    Since there may be not one, but multiple ways to shut down or restart the system, this ambiguity is resolved by the following procedure.
    The requirement for this to work is that the identifiers, which encode the exact shutdown/restart procedure to be executed (see below), must be unambiguous.
    These identifiers, however, depend on the platform and implementation and hence are not defined by SSSP.
    - serial broadcast of identifier
      The module which initiated the shutdown/restart phase broadcasts an arbitrary number of 'pulses' via S.
      Each 'pulse' starts with S deactivated, activates it for at least one period T, and deactivates it again for at least one more period T.
      All modules can count the number of pulses, which encodes the exact shutdown/restart procedure to be used.
      Note that S must be inactive for at least one period T before the first pulse (after PD was evaluated).
    - termination of the serial broadcast
      The broadcast is terminated by a timeout of ten periods T since the last change of S from active to inactive state.
      This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
      Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
  - final shutdown or restart
    Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
    - shutdown
      Each module completely stops itself and enters low-power mode.
      The details (e.g. which signals and sensors are still active) depend on the result of the disambiguation procedure.
    - restart
      If a restart was requested, each module starts with the first step of the startup phase.
      The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure.
      In order to minimize risk of errors, all modules can power off, except for a master node, which resets the whole system and forces a clean startup.
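The first shutdown stage and the evaluation of PD can be simulated in the same spirit. This is only a sketch: `run_shutdown` is a hypothetical name, time advances in ticks of one period T, and the stop durations are example values.

```python
def run_shutdown(stop_ticks, initiator, restart):
    """Simulate stage 1 of the shutdown phase and evaluate PD afterwards.

    stop_ticks: module name -> ticks needed to stop high-level operation.
    initiator:  the module that activated PD.
    restart:    True if the initiator wants a restart instead of a shutdown.
    Returns (tick at which S became inactive, "shutdown" or "restart").
    """
    pd_holders = {initiator}     # the initiator activates PD
    s_holders = set(stop_ticks)  # every module reacts by activating S
    remaining = {m: max(1, t) for m, t in stop_ticks.items()}
    tick = 0
    while s_holders:
        tick += 1
        for m in list(s_holders):
            remaining[m] -= 1
            if remaining[m] == 0:
                if m == initiator and restart:
                    # PD must be in its final state before S is released;
                    # an inactive PD requests a restart.
                    pd_holders.discard(m)
                s_holders.discard(m)
    request = "shutdown" if pd_holders else "restart"
    return tick, request


stop_ticks = {"DiWheelDrive": 2, "PowerManagement": 3, "LightRing": 1}
shutdown_result = run_shutdown(stop_ticks, "DiWheelDrive", restart=False)
restart_result = run_shutdown(stop_ticks, "DiWheelDrive", restart=True)
```

In both runs S becomes inactive after three ticks, because the slowest module needs three periods; only the state of PD at that moment differs.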
Again, a module which does not implement the protocol will cause no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it will unnecessarily draw energy and might not end up in a defined state like the rest of the system.
Most importantly, the latter might result in corruption of system operation if the not-defined state of modules that do not implement SSSP causes unwanted side effects like stalled communication buses or duplicate IDs.
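The serial broadcast used in the disambiguation procedure can be sketched as a pair of encode and decode functions. The sketch assumes that S is sampled exactly once per period T and that every pulse phase lasts the minimum of one period; both function names are hypothetical.

```python
TIMEOUT_PERIODS = 10  # the broadcast ends after ten inactive periods T


def encode_identifier(identifier):
    """Return the state of S per period T (True = active) for an identifier."""
    levels = [False]             # S is inactive before the first pulse
    for _ in range(identifier):
        levels += [True, False]  # one pulse: active, then inactive again
    # Keep S inactive until ten periods have passed since the last
    # deactivation (or since the start if no pulse was sent).
    levels += [False] * (TIMEOUT_PERIODS - 1)
    return levels


def decode_identifier(levels):
    """Count pulses until S has been inactive for TIMEOUT_PERIODS samples."""
    pulses = 0
    idle = 0
    previous = False
    for active in levels:
        if active:
            if not previous:
                pulses += 1      # inactive -> active: a new pulse starts
            idle = 0
        else:
            idle += 1
            if idle >= TIMEOUT_PERIODS:
                return pulses    # timeout: the broadcast is terminated
        previous = active
    raise ValueError("broadcast was not terminated by a timeout")


decoded = decode_identifier(encode_identifier(3))
default_id = decode_identifier(encode_identifier(0))
```

Sending no pulse at all decodes to identifier 0, which matches the reserved default procedure.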