Startup Shutdown Synchronization Protocol (SSSP v2.0)
SSSP defines signal handling for modular systems in three phases: during the startup phase until all modules are fully initialized, during the operation phase to keep all modules synchronized, and during the shutdown phase so that the system turns off or restarts in a controlled and safe manner.
The protocol has low complexity and is designed so that modules which do not implement SSSP do not compromise system operation.
Hence, only two GPIO signals are required:
- S: synchronize
- PD: power down
Both signals must be designed so that they act as a logical OR on activation (the signal is active as soon as one or more nodes activate it) and as a logical AND on deactivation (the signal is inactive only when all nodes have deactivated it).
This is easily implemented with active-low, open-drain signals and pull-up resistors (wired-AND), defining the low level as the active state.
A wired-OR topology with active-high signals is equally suitable for a specific implementation.
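As an illustration of the OR-on-activation and AND-on-deactivation behavior, the following Python sketch models an active-low, open-drain line with a pull-up resistor. It is a simulation aid under simplifying assumptions (ideal levels, no timing), not driver code; the class `OpenDrainLine` and its method names are hypothetical.

```python
from typing import Set


class OpenDrainLine:
    """Active-low, open-drain signal with a pull-up resistor (wired-AND).

    Hypothetical simulation helper: real modules would drive and read a GPIO.
    """

    def __init__(self) -> None:
        self._pulling_low: Set[str] = set()  # nodes currently driving the line low

    def activate(self, node: str) -> None:
        # Driving the open-drain output low activates the signal.
        self._pulling_low.add(node)

    def release(self, node: str) -> None:
        # Releasing the output lets the pull-up resistor take over.
        self._pulling_low.discard(node)

    def voltage_high(self) -> bool:
        # The pull-up holds the line high only if no node pulls it low (wired-AND).
        return not self._pulling_low

    def is_active(self) -> bool:
        # Active-low: the signal is active while the line is low, i.e. as soon as
        # one or more nodes activate it (logical OR on activation).
        return not self.voltage_high()


line = OpenDrainLine()
line.activate("A")
line.activate("B")
line.release("A")
print(line.is_active())  # True: B still activates the signal
line.release("B")
print(line.is_active())  # False: all nodes have deactivated it (logical AND)
```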
Although these two mandatory signals suffice to implement the protocol, some optional features require further signals and communication interfaces:
- N, P: next and previous (GPIOs)
- B: a more sophisticated, preferably real-time capable communication bus
- C: a dedicated clock signal for system synchronization during the Operation Phase
N and P must form a circular daisy-chain, so that each module can communicate with the next module in the system topology, while the corresponding back channel (an additional wire from connector to connector) is simply passed through.
Hence, the output signal N of each module is connected to the input P of its successor, and each connector requires two additional pins (four in total).
If an architecture implements this daisy-chain but a module does not feature any logic to evaluate P or set N, the module must keep these signals unconnected (breaking the daisy-chain) rather than connect P through to N.
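The wiring rule can be checked with a small model of the ring. In this Python sketch (hypothetical helper names `daisy_chain_links` and `chain_is_closed`), a link from a module's N output to its successor's P input only carries a signal if both modules implement the N/P logic, since a module without that logic leaves both signals unconnected.

```python
from typing import Dict, List, Optional, Set


def daisy_chain_links(modules: List[str], supports_np: Set[str]) -> Dict[str, Optional[str]]:
    """Return, for every module, the successor whose P input its N output drives.

    `modules` lists the modules in topology order; the chain is circular, so the
    last module drives the first one. A module that does not evaluate P or set N
    leaves both unconnected, so every link touching it is broken (None).
    """
    links: Dict[str, Optional[str]] = {}
    for index, module in enumerate(modules):
        successor = modules[(index + 1) % len(modules)]
        connected = module in supports_np and successor in supports_np
        links[module] = successor if connected else None
    return links


def chain_is_closed(links: Dict[str, Optional[str]]) -> bool:
    """True if every link is intact, i.e. a trigger can travel around the whole ring."""
    return all(successor is not None for successor in links.values())


ring = ["master", "A", "B"]
print(daisy_chain_links(ring, {"master", "A", "B"}))
# {'master': 'A', 'A': 'B', 'B': 'master'}
print(chain_is_closed(daisy_chain_links(ring, {"master", "B"})))
# False: module A leaves P and N unconnected
```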
Due to the circularity of the daisy-chain, and for clock synchronization during the Operation Phase, there must be a single module acting as master node, while all others are slaves.
Although B may in general be any bus, recommended interfaces include CAN and FlexRay.
Some stages of SSSP are optional and may or may not be implemented by a module.
Note that a heterogeneous setup, with some modules supporting the optional stages and others not, is fully compatible.
However, these optional features only succeed if all modules support them.
System operation must therefore not rely on the additional information (see Startup Phase), but may take advantage of it if it is available.
In order to make the protocol adaptable to any system, it defines three parameters:
- D: delay time
- T: timeout period
- F: synchronization frequency
D defines a time period which the protocol uses for synchronization barriers, while T defines the maximum delay before a timeout is detected.
These two parameters must be identical for all nodes within a system, and T must be greater than D.
F defines the frequency at which S is toggled during the Operation Phase to synchronize all modules in a system.
Recommended values for these parameters are D = 1 ms, T = 10 ms, and F = 1 Hz.
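A system integrator might capture these parameters and their constraints as follows. This is a sketch under stated assumptions: the dataclass `SsspParameters`, its field names, the use of seconds as unit and the positivity checks are illustrative choices, not part of SSSP; only the recommended defaults and the rules that D and T are identical on all nodes and that T > D come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SsspParameters:
    """Hypothetical container for the SSSP parameters (times in seconds)."""

    delay_d: float = 0.001    # D, recommended 1 ms
    timeout_t: float = 0.010  # T, recommended 10 ms
    sync_freq_f: float = 1.0  # F, recommended 1 Hz

    def __post_init__(self) -> None:
        if self.delay_d <= 0 or self.sync_freq_f <= 0:
            raise ValueError("D and F must be positive (assumed sanity check)")
        if not self.timeout_t > self.delay_d:
            raise ValueError("T must be greater than D")


def consistent_system(nodes: List[SsspParameters]) -> bool:
    """True if D and T are identical for all nodes of a system (F is not compared)."""
    return len({(n.delay_d, n.timeout_t) for n in nodes}) <= 1


defaults = SsspParameters()
print(consistent_system([defaults, SsspParameters(sync_freq_f=2.0)]))  # True
print(consistent_system([defaults, SsspParameters(timeout_t=0.02)]))   # False
```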
In some cases it might be necessary to define D and/or T differently for specific stages.
This can be achieved by defining additional parameters such as T_startup_3_1, which supersedes the default T in startup stage 3.1, while all other stages still use T.
Consequently, the basic parameters D and T may only be left unspecified if there are custom parameters for all stages of the protocol.
Furthermore, defining a timeout parameter as infinite is valid and deactivates the corresponding timeout.
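One way to implement this lookup is sketched below in Python. The key scheme `<name>_<phase>_<stage>` (e.g. `T_startup_3_1`) mirrors the example above, while the dictionary layout and the helpers `resolve` and `timed_out` are hypothetical; an infinite timeout is represented by `math.inf`.

```python
import math
from typing import Dict


def resolve(params: Dict[str, float], name: str, phase: str, stage: str) -> float:
    """Return parameter `name` ("D" or "T") for a stage such as ("startup", "3.1").

    A stage-specific entry like "T_startup_3_1" supersedes the basic value.
    """
    key = f"{name}_{phase}_{stage.replace('.', '_')}"
    if key in params:
        return params[key]
    if name in params:
        return params[name]
    # Basic values may only be omitted if every stage has its own override.
    raise KeyError(f"neither {key} nor the basic parameter {name} is defined")


def timed_out(elapsed: float, timeout: float) -> bool:
    # An infinite timeout never expires, which disables the timeout check.
    return elapsed > timeout


params = {"D": 0.001, "T": 0.010, "T_startup_3_1": 0.100}
print(resolve(params, "T", "startup", "3.1"))  # 0.1 (override)
print(resolve(params, "T", "startup", "3.4"))  # 0.01 (basic T)
print(timed_out(3600.0, math.inf))             # False
```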
Disclaimer: This is a draft version. All information and specifications stated below are subject to change. If you want to use this version of SSSP, please contact a project manager. We will gladly freeze this version and move any further modifications to a new version (e.g. 2.1). You are also invited to propose modifications to this version.
Startup Phase
All modules must initialize the signals so that S is active and PD, N, and P are inactive.
PD must stay inactive during the startup phase; otherwise the shutdown phase is initiated, either immediately by the bootloader or by the operating system as soon as it is running.
Each module executes the following steps:
- Stage 1: basic initialization
  - Stage 1.1: initialization of required signals and voltages
    This first stage primarily affects modules that provide energy.
    These must deactivate S only when the power is up and stable.
    All other modules may set S inactive as soon as they are powered up.
    To prevent erroneous behavior due to incorrect signals during initialization, this stage takes at least one period D (at least one module must delay the deactivation accordingly).
  - Stage 1.2: waiting for synchronization
    Each module waits for S to become inactive (all modules are initialized) as a first synchronization barrier.
  - Stage 1.3: synchronous start of stage 2
    As soon as S is inactive, the master node activates it again to start the next stage.
    To ensure that each module has had enough time to detect the inactive state of S, the master node must delay the activation by at least one period D.
- Stage 2: operating system initialization
  - Stage 2.1: complete system startup
    Each module activates S again and fully initializes (e.g. starts the operating system, initializes local hardware, etc.).
    As soon as it is ready, it deactivates S again.
    When a module indicates that it is ready, at least the main communication channel must be fully operational.
    The main communication channel will usually also serve as B; in cases where these are two distinct interfaces, B must be fully operational at this point as well.
    If there is no communication bus B at all, this requirement does not apply.
    Again, S must be active for at least one period D, so that every module can detect the activation.
  - Stage 2.2: waiting for synchronization
    Each module waits for S to become inactive (all modules are ready).
    Only now is it safe to use the main communication channel (and B), since all modules are able to receive messages correctly.
- Stage 3: assigning module identifiers [optional]
  This stage is optional and only applies if B exists.
  Furthermore, it only succeeds if all modules fully implement N and P and none of them is one of the exceptional cases for these signals described above.
  The module IDs assigned in this stage can later be used to represent a hierarchy within the system or to address and identify individual modules.
  - Stage 3.1: initiation of this stage
    The master node initiates this stage by broadcasting a unique command via B to all modules, so that they interpret the subsequent communication via B, N, and P correctly.
    All supporting modules must wait at least one period T for the master's message before skipping this stage (similar to abortion; see below).
    As soon as the initiation command has been received, all modules activate S for later failure detection and set a timer to one period T to detect timeouts (which would lead to abortion of this module stack initialization).
  - Stage 3.2: starting the sequence
    The master module broadcasts its own module identifier (e.g. 1) via B.
    Right after that, it signals the next module to continue by setting N active for at least one period D, but keeps S active for now.
    Note that the identifier value 0 is reserved and must not be used by any module.
  - Stage 3.3: iterating over all modules
    This step is subdivided into two actions, which are triggered by different events and are repeated until one of the termination conditions is fulfilled (see below).
    All modules have to execute this stage.
    - message received via B
      If a message holding the ID of another module is received via B, the timer mentioned above is reset to T.
      Moreover, the received module identifier is checked to be greater than the previous one.
      If this rule is violated, an abort message is broadcast via B and the stage is aborted.
    - triggered by P
      When a module is triggered by the activation of P (the preceding module activated N), it broadcasts its own module identifier via B, which is defined to be greater than the last one.
      It then deactivates S and triggers the next module to continue by activating N, again for at least one period D.
      If the module is triggered a second time during this stage, which indicates an invalid loop in the system architecture, it must abort this stage.
  - Stage 3.4: termination of this stage
    There are two ways this stage can be terminated: either it is completed correctly, or it is aborted.
    While any module can abort this stage, only the master (the initiator) can complete it successfully.
    - completion
      The stage is completed successfully if the signal propagates all the way through the circular daisy-chain, the master module receives an activation of its P signal, and S becomes inactive as soon as the master deactivates it (all modules have participated in the procedure).
      All modules need to wait one more period T after the deactivation of S to make sure that no timeout occurred and no abort message was emitted.
      In this case, all nodes adopt their IDs and can use them for later identification.
      If an abort message was received at any time during this stage, however, the whole procedure is aborted (see below).
    - abortion
      The stage is aborted whenever an abort message is received or invalid behavior is detected (see above).
      Every module that detected an issue broadcasts a unique abort message via B.
      In this case, all module IDs must be considered unreliable, so identification is not supported.
      Any modules that still activate S must therefore deactivate it, and as soon as S becomes inactive, all modules may continue operation.
  - Stage 3.5: rearranging module identifiers [optional]
    This sub-stage is an optional extension to the already optional module identifier assignment stage.
    As soon as S has become inactive after a successful completion of the assignment procedure, any module can request an ID swap between any two modules by sending a corresponding message via B.
    Both addressed modules have to confirm (or reject) the request and may adopt the new IDs only when the second confirmation has been received.
    The timeout interval T is restarted on every message.
    Furthermore, there must be no parallel swap requests: whenever a request has been sent, no module may send another request until both addressed modules have answered it.
    If any module in the system observes invalid communication (e.g. a confirmation from a module other than the requested ones, or a parallel request) or detects a timeout, it must send an abort message via B, invalidating all module IDs in the system.
    In any case, this sub-stage has to be implemented with care, since it may invalidate all module IDs assigned so far and may even end up in a livelock.
    With the default assignment procedure, however, the hierarchy described by the module IDs is defined by physical properties such as arrangement and wiring (e.g. a depth-first or breadth-first hierarchy for tree-like architectures).
    If a different hierarchy is desired, the swapping mechanism makes this possible.
    To reduce the risk of errors, it is recommended that modules monitor the back channel of the daisy-chain signal and, if applicable, that only superior modules send swap requests to inferior ones.
    Each swap request procedure is defined as follows:
    - sending a request
      A module sends a request message via B, specifying the two modules that shall swap their IDs by naming their current IDs.
      If the requesting module itself is one of those two, it still has to confirm the request as described in the next step.
    - confirmation/rejection
      When a module receives a swap request that contains its own ID, it has to confirm or reject the request.
      Even if the other module (the one to swap IDs with) has already rejected, this module still has to send a corresponding message.
    - swapping IDs
      The IDs are swapped only when the second of the two modules has confirmed the request.
      In other words, both modules reassign their IDs right after the second confirmation has been transmitted via B.
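The swap rules of the optional rearrangement of module identifiers can be enforced by a passive observer that every module runs on the messages it reads from B. The Python sketch below is hypothetical throughout: the class `SwapObserver`, the tuple message format and the module names are illustrative assumptions, and the timeout T is not modelled. It checks the two invalid cases named in the text (a parallel request and an answer from a module that was not addressed) and applies a swap only after both addressed modules have confirmed.

```python
from typing import Dict, Optional, Tuple


class SwapObserver:
    """Passive observer of ID swap messages on B (hypothetical sketch).

    `assignment` maps module names to their current IDs. Messages are tuples:
      ("request", id_a, id_b)  ask the modules holding id_a and id_b to swap
      ("confirm", sender_id)   an addressed module accepts the request
      ("reject", sender_id)    an addressed module declines the request
    The timeout T is not modelled.
    """

    def __init__(self, assignment: Dict[str, int]) -> None:
        self.assignment = dict(assignment)
        self.valid = True                               # False once aborted
        self.pending: Optional[Tuple[int, int]] = None  # request being answered
        self.answers: Dict[int, bool] = {}              # addressed ID -> confirmed?

    def _abort(self) -> None:
        # Invalid communication: the abort message invalidates all module IDs.
        self.valid = False
        self.pending = None

    def on_message(self, kind: str, *args: int) -> None:
        if not self.valid:
            return
        if kind == "request":
            if self.pending is not None:
                self._abort()                 # parallel request
                return
            self.pending, self.answers = (args[0], args[1]), {}
        elif kind in ("confirm", "reject"):
            sender = args[0]
            if self.pending is None or sender not in self.pending:
                self._abort()                 # answer from a module not addressed
                return
            self.answers[sender] = kind == "confirm"
            if len(self.answers) == 2:        # both addressed modules answered
                if all(self.answers.values()):
                    self._swap(*self.pending)  # adopt IDs after 2nd confirmation
                self.pending = None
        else:
            self._abort()                     # unknown message type (assumption)

    def _swap(self, id_a: int, id_b: int) -> None:
        for name, module_id in list(self.assignment.items()):
            if module_id == id_a:
                self.assignment[name] = id_b
            elif module_id == id_b:
                self.assignment[name] = id_a


observer = SwapObserver({"power": 1, "sensor": 2, "motor": 3})
observer.on_message("request", 2, 3)
observer.on_message("confirm", 3)
observer.on_message("confirm", 2)
print(observer.assignment)  # {'power': 1, 'sensor': 3, 'motor': 2}
observer.on_message("request", 1, 2)
observer.on_message("request", 1, 3)  # parallel request
print(observer.valid)  # False
```

A rejection by either module simply closes the request without changing any ID, so the next request may follow.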
At the end of the startup phase, all signals (S, PD, N, and P) are inactive.
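The two synchronization barriers of stages 1 and 2 can be traced with a small tick-based Python simulation of the wired-AND line S; it ends with S inactive, as stated above. This is a sketch under several assumptions: one tick equals one period D, a level change is seen by the other modules one tick later, every module holds S for at least D, the initialization times are made up, and PD, N, P and stage 3 are not modelled. The function name `simulate_startup` is hypothetical.

```python
from typing import Dict, Optional, Tuple

D = 1  # period D in simulation ticks (hypothetical time base)


def simulate_startup(init1: Dict[str, int], init2: Dict[str, int], master: str,
                     max_ticks: int = 1000) -> Tuple[int, int, bool]:
    """Tick-based sketch of startup stages 1 and 2 on the wired-AND line S.

    init1 / init2 are hypothetical initialization times (in ticks) of each module
    for basic initialization and operating system initialization. Returns the
    ticks at which the two synchronization barriers are passed and whether S is
    still active afterwards.
    """
    driving = set(init1)                  # every module starts with S active
    state = {m: "init1" for m in init1}
    # Assumption: every module holds S for at least D in both stages.
    release_at = {m: max(init1[m], D) for m in init1}
    activate_at = 0
    barrier1: Optional[int] = None
    barrier2: Optional[int] = None

    for t in range(max_ticks):
        s_active = bool(driving)          # level sampled by all modules this tick
        for m in state:
            if state[m] in ("init1", "init2") and t >= release_at[m]:
                driving.discard(m)        # module is ready: release S
                state[m] = "wait1" if state[m] == "init1" else "wait2"
            elif state[m] == "wait1" and not s_active:
                barrier1 = t if barrier1 is None else barrier1
                if m == master:           # master waits D before re-activating S
                    state[m], activate_at = "delay", t + D
                else:
                    state[m] = "wait_start2"
            elif ((state[m] == "delay" and t >= activate_at)
                  or (state[m] == "wait_start2" and s_active)):
                driving.add(m)            # stage 2: activate S again and initialize
                state[m] = "init2"
                release_at[m] = t + max(init2[m], D)
            elif state[m] == "wait2" and not s_active:
                barrier2 = t if barrier2 is None else barrier2
                state[m] = "done"
        if all(s == "done" for s in state.values()):
            break
    if barrier1 is None or barrier2 is None:
        raise RuntimeError("startup did not complete within max_ticks")
    return barrier1, barrier2, bool(driving)


print(simulate_startup({"master": 2, "A": 5}, {"master": 3, "A": 4}, master="master"))
# (6, 13, False)
```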
Note that a module which does not implement the protocol does not interfere and causes no errors as long as it does not activate S, N, or P.
However, such a module might cause errors after the startup phase if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
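How stage 3 fails in the presence of a module that does not implement N and P can be seen in the following Python sketch of the identifier assignment. It abstracts away all timing: a trigger that does not reach a participating module stands for the timeout T expiring, and messages via B are reduced to recording the broadcast ID. The function `assign_ids`, the successor mapping and the module names are hypothetical; the master starts with ID 1 because 0 is reserved. The check that received IDs increase is not exercised, because this sketch always increments.

```python
from typing import Dict, Optional, Set


def assign_ids(master: str, successor: Dict[str, Optional[str]],
               supports_np: Set[str], master_id: int = 1) -> Optional[Dict[str, int]]:
    """Sketch of the optional module ID assignment (startup stage 3).

    successor[m] is the module whose P input is driven by the N output of m
    (None if not wired). Returns the adopted IDs on completion, or None if the
    stage is skipped or aborted. Timing is abstracted away: a trigger that does
    not reach a participating module stands for the timeout T expiring.
    """
    if master not in supports_np:
        return None                  # stage is never initiated
    ids = {master: master_id}        # master broadcasts its own ID via B first
    last_id = master_id
    current = master
    while True:
        nxt = successor.get(current)
        if nxt is None or nxt not in supports_np:
            return None              # trigger lost: timeout T leads to abortion
        if nxt == master:
            return ids               # master's P activated: stage completed
        if nxt in ids:
            return None              # triggered a second time: invalid loop
        last_id += 1                 # own ID is greater than the last one
        ids[nxt] = last_id           # broadcast via B, then activate N
        current = nxt


ring = {"master": "A", "A": "B", "B": "master"}
print(assign_ids("master", ring, {"master", "A", "B"}))  # {'master': 1, 'A': 2, 'B': 3}
print(assign_ids("master", ring, {"master", "B"}))       # None: A leaves N and P unconnected
print(assign_ids("master", {"master": "A", "A": "B", "B": "A"}, {"master", "A", "B"}))
# None: A would be triggered twice
```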
Operation Phase
Modules are kept in sync during operation by toggling either S or a dedicated clock signal C (defined by the implementation) at frequency F.
Hence, all modules except the one toggling the signal must act as slaves, and there may be at most one master node (or none at all).
Since S is activated when a shutdown is initiated (see Shutdown Phase), modules must synchronize only on the deactivation (falling edge) of S; with the active-low wiring described above, this is a rising edge of the voltage level.
Since it is recommended to toggle the synchronization signal with a hardware timer, a dedicated clock signal C might reduce complexity, but using S for this purpose is generally possible without compromises.
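The following Python sketch illustrates both sides of this synchronization: a master that toggles S and a slave that only uses deactivations as synchronization points. The samples are logical levels (True means active), so with the active-low wiring described above a deactivation corresponds to a rising voltage edge. The 50 % duty cycle and the helper names (`master_transitions`, `sync_points`, `estimated_frequency`) are assumptions for illustration.

```python
from typing import List, Tuple


def master_transitions(freq_f: float, cycles: int) -> List[Tuple[float, bool]]:
    """(time in s, new logical level of S) pairs produced by the master.

    Assumption: S is toggled as a square wave with period 1/F and a 50 % duty
    cycle, so it is deactivated exactly once per period.
    """
    period = 1.0 / freq_f
    events: List[Tuple[float, bool]] = []
    for k in range(cycles):
        events.append((k * period, True))                # activation
        events.append((k * period + period / 2, False))  # deactivation
    return events


def sync_points(samples: List[Tuple[float, bool]]) -> List[float]:
    """Slave side: times of deactivations (active -> inactive) of S.

    Activations are ignored, because S is also activated when a shutdown is
    initiated.
    """
    points: List[float] = []
    previous = None
    for timestamp, active in samples:
        if previous is True and not active:
            points.append(timestamp)
        previous = active
    return points


def estimated_frequency(points: List[float]) -> float:
    """Average synchronization frequency derived from consecutive sync points."""
    if len(points) < 2:
        raise ValueError("need at least two synchronization points")
    return (len(points) - 1) / (points[-1] - points[0])


points = sync_points(master_transitions(1.0, 3))
print(points)                       # [0.5, 1.5, 2.5]
print(estimated_frequency(points))  # 1.0
```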
Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol does not interfere and causes no errors as long as it does not activate S or C, respectively.
However, such a module might run out of sync, which in turn may cause errors during operation.
Shutdown Phase
Since the PD signal must not be used during system operation, it is defined to be inactive until the shutdown phase is initiated.
The state of S is undefined, because S may be used for synchronization during operation.
Any module can initiate the shutdown phase by activating PD.
The state of S selects between a regular system shutdown (S active) and an emergency stop (S inactive).
Hence, modules which do not support the protocol but activate PD are assumed to be defective and initiate an emergency stop rather than a regular shutdown, since they do not activate S.
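Because PD and S are both active as soon as any module activates them, the selection rule can be modelled with the sets of modules activating each line. In this hypothetical Python sketch (names `ShutdownMode` and `requested_mode` and the module names are illustrative), S is assumed to be sampled after the master of the operation phase has stopped driving it, as required in the shutdown-mode selection step below.

```python
from enum import Enum
from typing import Optional, Set


class ShutdownMode(Enum):
    REGULAR = "regular shutdown"
    EMERGENCY = "emergency stop"


def requested_mode(pd_drivers: Set[str], s_drivers: Set[str]) -> Optional[ShutdownMode]:
    """Mode selected by PD and S; each set holds the modules activating that line.

    S is assumed to be sampled after the operation-phase master released it.
    """
    if not pd_drivers:
        return None                       # PD inactive: no shutdown phase
    # S active selects a regular shutdown, S inactive an emergency stop.
    return ShutdownMode.REGULAR if s_drivers else ShutdownMode.EMERGENCY


print(requested_mode({"hmi"}, {"hmi"}))   # ShutdownMode.REGULAR
print(requested_mode({"legacy"}, set()))  # ShutdownMode.EMERGENCY: PD only, S never set
```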
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:
- Stage 1: selection of shutdown mode
  The module that acted as master node during operation must stop toggling S as soon as PD is activated, and deactivate it.
  Since the value of S is undefined until the master node of the operation phase reacts to the PD signal, all modules must wait one period D before evaluating S to distinguish between a regular shutdown and an emergency stop.
  If an emergency stop was requested, all modules must stop as fast as possible and enter a safe state (e.g. cut the supply power).
  The following stages therefore only apply to the regular, controlled shutdown.
- Stage 2: shutdown of high-level operation
  - Stage 2.1: shutdown of high-level systems (e.g. applications and operating system)
    After another delay of one period D, all modules activate S again.
    Only now does each module stop all computation in a safe manner, so that it can be shut down without data loss or other issues.
    To ensure that every module has a chance to detect the activation of S, this step must take at least one period D.
    Each module deactivates S again as soon as its high-level shutdown is completed.
    The initiating module selects between system shutdown and restart: keeping PD active indicates a shutdown request, whereas deactivating PD before it deactivates S indicates a restart request.
  - Stage 2.2: waiting for synchronization
    Each module waits for S to become inactive (all modules are done).
- Stage 3: system shutdown or restart
  - Stage 3.1: evaluation of PD
    When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
    Hence, the initiating module, which activated PD, must have set PD to the corresponding state before it deactivated S.
  - Stage 3.2: disambiguation procedure
    Since there may be multiple ways to shut down or restart the system, this ambiguity is resolved by the following procedure.
    This requires that the identifiers which encode the exact shutdown/restart procedure to be executed (see below) are unambiguous.
    These identifiers, however, are implementation specific and are not defined by SSSP.
    - serial broadcast of identifier
      The module which initiated the shutdown/restart phase broadcasts an arbitrary number of pulses via S.
      Each pulse starts with S inactive, activates it for at least one period D, and deactivates it again for at least another period D.
      All modules count the number of pulses, which encodes the exact shutdown/restart procedure to be executed.
      Note that S must be inactive for at least one period D before the first pulse (after PD has been evaluated).
    - termination of the serial broadcast
      The broadcast is terminated by a timeout T after the last transition of S from active to inactive.
      This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
      Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
  - Stage 3.3: final shutdown or restart
    Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
    - shutdown
      Each module completely stops itself and enters low-power mode.
      The details (e.g. which signals and sensors remain active) depend on the result of the disambiguation procedure and are implementation specific.
    - restart
      If a restart was requested, each module starts over with the first step of the startup phase.
      The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure and are implementation specific.
      To minimize the risk of errors, all modules can power off, except for a master node, which then resets the whole system and forces a clean startup.
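The serial broadcast of the disambiguation identifier can be sketched in Python as an encoder for the initiating module and a decoder for all modules. Time is discretized into samples; the tick values for D and T, the function names and the sampling model are assumptions. The decoder counts deactivations of S and stops once S has stayed inactive for T since the last deactivation, which also yields identifier 0 when no pulse is sent. The gap between pulses (D) must stay below T, which the requirement T > D guarantees.

```python
from typing import List, Tuple

D_TICKS = 1   # period D in samples (hypothetical time base)
T_TICKS = 10  # timeout T in samples; must be greater than D_TICKS


def encode_pulses(identifier: int, d: int = D_TICKS) -> List[bool]:
    """Levels of S (True = active) driven by the initiating module, one per tick.

    S is inactive for D before the first pulse; each pulse is D active followed
    by D inactive. Afterwards the initiator simply leaves S inactive.
    """
    levels = [False] * d
    for _ in range(identifier):
        levels += [True] * d + [False] * d
    return levels


def decode_pulses(levels: List[bool], t: int = T_TICKS) -> Tuple[int, int]:
    """Count pulses until S has been inactive for T since its last deactivation.

    Sampling starts right after PD was evaluated, with S inactive. Returns the
    identifier and the tick at which the broadcast was terminated.
    """
    count, last_deactivation, previous, tick = 0, 0, False, 0
    while True:
        level = levels[tick] if tick < len(levels) else False  # S idles inactive
        if previous and not level:
            count += 1                    # active -> inactive: one pulse complete
            last_deactivation = tick
        if not level and tick - last_deactivation >= t:
            return count, tick            # timeout T terminates the broadcast
        previous = level
        tick += 1


print(decode_pulses(encode_pulses(3)))  # (3, 16)
print(decode_pulses(encode_pulses(0)))  # (0, 10): no pulse means identifier 0
```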
Again, a module which does not implement the protocol causes no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it draws energy unnecessarily and might not end up in a defined state like the rest of the system.
Most importantly, the latter might corrupt system operation if the undefined state of modules that do not implement SSSP causes unwanted side effects such as stalled communication buses or duplicate module IDs.