Startup Shutdown Synchronization Protocol (SSSP v1.4)
SSSP defines signal handling during the startup phase until all AMiRo Modules are fully initialized and during the shutdown phase, so that the system turns off in a controlled and safe manner or restarts, if requested.
The protocol is deliberately simple and is designed so that modules which do not implement SSSP will not compromise system operation.
Hence, only two GPIO signals are required:
S
- synchronize
PD
- power down
Both must be designed so that they realize a logical OR on activation (the signal is active if one or more nodes are active) and a logical AND on deactivation (the signal is inactive only if all nodes are inactive).
Electrically this can be implemented using active-low open-drain signals with pull-up resistors.
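The OR/AND behavior described above can be illustrated with a small software model. This is only a sketch: the class name `OpenDrainLine` and its methods are hypothetical and not part of SSSP, and real hardware would drive a GPIO pin instead of tracking a set of names.

```python
class OpenDrainLine:
    """Software model of an active-low open-drain line with a pull-up resistor."""

    def __init__(self):
        # Names of the modules currently pulling the line low (hypothetical bookkeeping).
        self._drivers = set()

    def activate(self, module):
        self._drivers.add(module)

    def deactivate(self, module):
        self._drivers.discard(module)

    @property
    def active(self):
        # Logical OR: the signal is active while at least one module drives it.
        return bool(self._drivers)

    @property
    def level(self):
        # Electrical level: 0 (pulled low) when active, 1 (pulled up) otherwise.
        return 0 if self._drivers else 1


s = OpenDrainLine()
s.activate("A")
s.activate("B")
s.deactivate("A")
print(s.active)  # True: "B" still holds the line
s.deactivate("B")
print(s.active)  # False: all modules have released the line
```

The signal only becomes inactive once the last driving module releases it, which is exactly the property the synchronization steps of the protocol rely on.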
Although these two mandatory signals suffice to implement the protocol, some optional features require some further signals and communication interfaces:
UP/DN
- GPIO to the adjacent module (neighbor) up/down
BCB
- a communication bus with broadcast capability
Note that a heterogeneous setup, in which some modules support the optional stages and others do not, is fully compatible.
However, these optional features will only apply successfully if all modules support them.
Hence, the system must not rely on the additional information, but may take advantage of it, if it is available.
In order to make the protocol adaptable to any system, it uses a parameter T.
This defines a time period, which is used by the protocol for synchronization or to detect timeouts.
This parameter must be identical for all nodes within a system, or at least similar: the factor between the largest and smallest value of T in the system must be smaller than ten.
An additional parameter F defines the frequency at which S is toggled during the Operation Phase.
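The constraint on T can be checked with a few lines of code. The following is a minimal sketch; the function name `periods_compatible` and the choice of milliseconds as unit are assumptions made for illustration only.

```python
def periods_compatible(periods_ms):
    """Return True if the largest T is less than ten times the smallest T.

    periods_ms holds the value of T configured on each module (hypothetical input).
    """
    if not periods_ms or min(periods_ms) <= 0:
        raise ValueError("need at least one positive period")
    return max(periods_ms) / min(periods_ms) < 10


print(periods_compatible([1, 2, 5]))   # True: factor 5
print(periods_compatible([1, 2, 10]))  # False: factor 10 is not smaller than ten
```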
Startup Phase
All modules must initialize the signals so that S is active and PD is inactive.
Although only S is used for startup, PD must be inactive during the startup phase, or the shutdown phase will be initiated either immediately by the bootloader or by the operating system as soon as it is active.
Each module executes the following steps:
- basic initialization
  - initialization of required signals, voltages, or other hardware
    This first stage is very module specific and strongly depends on the hardware configuration.
    When a module has finished this stage, it sets S to inactive.
    In order to prevent erroneous behavior due to incorrect signals during the initialization, this stage takes at least one period T.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are initialized) as a first synchronization.
  - synchronous start of stage 2
    As soon as S is inactive, the master node activates it again in order to start the next stage.
    To ensure that each module had enough time to detect the inactive state of S, the master node must delay the activation by at least one period T.
- operating system initialization
  - complete system startup
    Each module activates S again and fully initializes (e.g. starts the operating system).
    As soon as it is ready, it deactivates S again.
    When a module indicates that it is ready, at least the main communication channel (for AMiRo this is CAN) must be fully operational.
    Again, S must be active for at least one period T, so every module can detect the activation.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are ready).
    Only now is it safe to use the main communication channel, since all modules are able to receive messages correctly.
- module stack initialization [optional]
  This stage is optional and can only be applied if all modules can read from and write to the main communication channel (BCB) and two additional signals (UP and DN), which connect neighboring modules, exist for each AMiRo Module.
  Modules which do not support this stage will not cause severe issues, but the stage will fail nevertheless (no stack numbers / module IDs will be available).
  Furthermore, the first and last node of this 'module stack' must be known beforehand.
  In case of the AMiRo, for instance, the DiWheelDrive and PowerManagement are defined to be the lowermost modules, and the LightRing always finalizes the stack at the top.
  - initiation of this stage
    The master node initiates this stage by broadcasting a unique command via BCB to all modules, so they can correctly interpret the upcoming communication via the neighbor signals (UP and DN) and BCB.
    All supporting modules must wait at least ten periods T for the master's message before skipping this stage (similar to abortion; see below).
    As soon as the initiation command has been received, all modules activate S for later detection of failure.
  - starting the sequence
    One of the known nodes at the end of the module stack broadcasts its own stack number (e.g. 1) via BCB.
    One period T after that, it signals its neighboring module to continue by setting the neighbor signal active for at least one period T and deactivates S right after.
    Note that an identifier value of 0 is reserved and must not be used by any module.
    These 'module IDs' can later be used to represent a hierarchy within the system or to address/identify individual modules.
  - counting the modules
    This stage is subdivided into two actions, which are triggered by different events.
    All modules have to execute this stage.
    - triggered by neighbor signal
      When a module is triggered by the activation of a neighbor signal, it broadcasts its own stack number (via BCB), which is defined to be greater than the last one.
      Then again, it waits one period T before deactivating S and triggers the next module to continue by activating the other neighbor signal for at least one period T.
      Furthermore, a timer is set to ten periods T, which is used to detect timeouts.
      In case this timer runs out before the next BCB message or neighbor event (via UP or DN) is received, the module broadcasts an abort message to abort the stage (see below).
      Another reason for abortion would be if the module is triggered a second time during this stage, indicating an invalid loop in the system architecture.
      This step is repeated until one of the termination conditions is fulfilled (see below).
    - message received via BCB
      If a message that holds a stack number of another module is received via BCB, the timer mentioned above is reset to ten periods T.
      Moreover, the received ID is checked to be greater than the previous one.
      If this rule is violated, an abort message is broadcast via BCB and the stage is aborted.
  - termination of this stage
    There are two ways this stage can be terminated: either it is completed correctly, or it is aborted.
    Whereas any module can abort this stage, only the known module finalizing the stack can complete it successfully.
    - completion
      The stage is completed correctly if the signal is propagated to the known node on the other end of the module stack and S becomes inactive as soon as that node deactivates it (all modules have participated in the procedure).
      All modules need to wait ten more periods T after the deactivation of S to make sure no timeouts occurred and no abort message was emitted.
      In this case, all nodes adopt their ID and can use it for later identification.
      If an abort message was received at any time during this stage, however, the whole procedure is aborted (see below).
    - abortion
      The stage is aborted whenever an abort message was received or a timeout occurred (see above).
      As a result, all stack numbers must be considered unreliable, thus identification is not supported.
      Any modules that still activate S must hence deactivate it, and as soon as S becomes inactive, all modules may continue operation.
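The way S acts as a barrier in the first two stages can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: time advances in whole periods T, the names `SharedSignal` and `run_stage` are hypothetical, and the initialization durations are made-up values.

```python
class SharedSignal:
    """Wired-OR model of S: active while at least one module drives it."""

    def __init__(self):
        self._drivers = set()

    def activate(self, module):
        self._drivers.add(module)

    def deactivate(self, module):
        self._drivers.discard(module)

    @property
    def active(self):
        return bool(self._drivers)


def run_stage(signal, init_periods):
    """Simulate one synchronized stage in steps of one period T.

    init_periods maps a module name to the number of periods its
    initialization takes (assumed values). Returns the number of periods
    after which S becomes inactive, i.e. all modules have finished.
    """
    for module in init_periods:
        signal.activate(module)
    elapsed = 0
    while signal.active:
        elapsed += 1
        for module, needed in init_periods.items():
            # Each module keeps S active for at least one period T.
            if elapsed >= max(needed, 1):
                signal.deactivate(module)
    return elapsed


durations = {"DiWheelDrive": 2, "PowerManagement": 5, "LightRing": 1}
print(run_stage(SharedSignal(), durations))  # 5: the slowest module determines the stage length
```

No module proceeds before the slowest one has released S, which is why the main communication channel may only be used after the second synchronization.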
At the end of the startup phase both signals, S and PD, are inactive.
Note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might cause errors after the startup phase, if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
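The checks each module applies to stack numbers received via BCB during the optional module stack initialization can be sketched as follows. The function name `check_stack_broadcasts`, the event format, and the treatment of a gap of exactly ten periods as still valid are assumptions made for this example; the neighbor signals and the final wait after S is deactivated are not modeled.

```python
TIMEOUT_PERIODS = 10  # timeout of ten periods T, as used by the protocol


def check_stack_broadcasts(events):
    """Validate stack numbers as seen by one listening module.

    events is a list of (time_in_periods, stack_id) tuples in order of
    reception (hypothetical format). Returns (True, ids) if all rules hold,
    or (False, reason) if the stage would have to be aborted.
    """
    last_id = 0       # 0 is reserved, so every valid ID must be greater
    last_time = None
    ids = []
    for time, stack_id in events:
        if last_time is not None and time - last_time > TIMEOUT_PERIODS:
            return False, "timeout"
        if stack_id <= last_id:
            return False, "invalid id"
        ids.append(stack_id)
        last_id, last_time = stack_id, time
    return True, ids


print(check_stack_broadcasts([(0, 1), (2, 2), (4, 3)]))  # (True, [1, 2, 3])
print(check_stack_broadcasts([(0, 1), (2, 1)]))          # (False, 'invalid id')
print(check_stack_broadcasts([(0, 1), (20, 2)]))         # (False, 'timeout')
```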
Operation Phase
All AMiRo Modules are kept in sync during operation by toggling S at frequency F.
Hence, all modules must act as slaves and there may be at most one master node.
Since S gets activated when a shutdown is initiated (see Shutdown Phase), modules must synchronize at deactivation (logical falling edge) of S.
Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might drift out of sync, which in turn may cause errors during operation.
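Synchronizing on the deactivation of S amounts to detecting transitions from active to inactive. A minimal sketch, assuming S is available as a sequence of sampled logical states (the function name `sync_points` and the sampling approach are hypothetical; an implementation on a microcontroller would more likely use an edge-triggered interrupt):

```python
def sync_points(samples):
    """Return the sample indices at which S changes from active to inactive.

    samples is a sequence of booleans, True meaning S is active.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] and not samples[i]
    ]


trace = [False, True, True, False, True, False, False]
print(sync_points(trace))  # [3, 5]
```

Note that with active-low wiring the logical falling edge corresponds to a rising electrical level.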
Shutdown Phase
Since the PD signal must not be used during system operation, it is defined to be inactive.
The state of S is undefined, because it was used for synchronization during operation.
Any module can initiate the shutdown phase by activating PD.
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:
- shutdown of high-level operation
  - initiation of module shutdown
    As soon as the activation of PD is detected, each module activates S.
    The module which initiated system shutdown by activating PD has to activate S as well, of course.
    Obviously, the module which acted as master node during operation must stop toggling S as soon as PD is activated.
  - shutdown of high-level operation (e.g. the operating system)
    Each module stops all computation in a safe manner, so it can be shut down without data loss or other issues.
    As soon as this is done, it deactivates S.
    In order to ensure that every module had a chance to detect the activation of PD, this step must take at least one period T.
  - waiting for synchronization
    Each module waits for S to become inactive (all modules are done).
- system shutdown or restart
  - evaluation of PD signal
    When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
    Hence, the initiating module, which activated PD, must have set it to the corresponding state before it deactivated S.
    The implication of the PD state at this point is defined as follows:
    - active: A system shutdown is requested.
    - inactive: A system restart is requested.
  - disambiguation procedure
    Since there may be not one, but multiple ways to shut down or restart the system, this ambiguity is resolved by the following procedure.
    The requirement for this to work is that the identifiers, which encode the exact shutdown/restart procedure to be executed (see below), must be unambiguous.
    These identifiers, however, depend on the platform and implementation and hence are not defined by SSSP.
    - serial broadcast of identifier
      The module which initiated the shutdown/restart phase broadcasts an arbitrary number of 'pulses' via S.
      Each 'pulse' starts with S inactive, activates it for at least one period T, and deactivates it again for at least one more period T.
      All modules can count the number of pulses, which encodes the exact shutdown/restart procedure to be used.
      Note that S must be inactive for at least one period T before the first pulse (after PD was evaluated).
    - termination of the serial broadcast
      The broadcast is terminated by a timeout of ten periods T since the last change of S from active to inactive state.
      This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
      Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
  - final shutdown or restart
    Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
    - shutdown
      Each module completely stops itself and enters low-power mode.
      The details (e.g. which signals and sensors are still active) depend on the result of the disambiguation procedure.
    - restart
      If a restart was requested, each module starts with the first step of the startup phase.
      The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure.
      In order to minimize the risk of errors, all modules can power off, except for a master node, which resets the whole system and forces a clean startup.
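The evaluation of PD and the pulse-based disambiguation can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: S is modeled as one boolean sample per period T (True meaning active), every pulse uses exactly the minimum durations, and the names `requested_action`, `encode_identifier`, and `decode_identifier` are hypothetical.

```python
TIMEOUT_PERIODS = 10  # the broadcast ends ten periods T after the last deactivation


def requested_action(pd_active):
    """Interpret the state of PD at the moment S becomes inactive."""
    return "shutdown" if pd_active else "restart"


def encode_identifier(identifier):
    """Build the waveform of S for the given identifier, one sample per period T."""
    wave = [False]                      # S inactive for one period before the first pulse
    wave += [True, False] * identifier  # each pulse: active for T, then inactive for T
    wave += [False] * (TIMEOUT_PERIODS - 1)  # remain inactive until the timeout elapses
    return wave


def decode_identifier(samples):
    """Count pulses until S has been inactive for TIMEOUT_PERIODS samples.

    Returns the identifier, or None if the broadcast has not terminated yet.
    """
    count = 0
    idle = 0
    previous = False
    for active in samples:
        if active:
            idle = 0
        else:
            if previous:
                count += 1  # a change from active to inactive completes a pulse
            idle += 1
            if idle >= TIMEOUT_PERIODS:
                return count
        previous = active
    return None


print(requested_action(True))                    # shutdown
print(decode_identifier(encode_identifier(3)))   # 3
print(decode_identifier(encode_identifier(0)))   # 0: default procedure
```

Because the identifier 0 is produced by the timeout alone, modules that receive no pulses at all fall back to their default procedure without any extra signaling.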
Again, a module which does not implement the protocol will cause no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it will unnecessarily draw energy and might not end up in a defined state like the rest of the system.
Most importantly, the latter might corrupt system operation if the undefined state of modules that do not implement SSSP causes unwanted side effects such as stalled communication buses or duplicate IDs.