Startup Shutdown Synchronization Protocol (SSSP v1.3)

SSSP defines signal handling during the startup phase until all AMiRo Modules are fully initialized and during the shutdown phase, so that the system turns off in a controlled and safe manner or restarts, if requested.
The protocol is deliberately simple and designed so that modules which do not implement SSSP do not compromise system operation.
Hence, only two GPIO signals are required:

  • S - synchronize
  • PD - power down

Both must be designed so that they realize a logical OR on activation (the signal is active if one or more nodes activate it) and a logical AND on deactivation (the signal is inactive only if all nodes are inactive).
Electrically this can be implemented using active-low open-drain signals with pull-up resistors.
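
The OR/AND behavior described above can be illustrated with a small software model. This is only a sketch: the class name SharedSignal and its methods are hypothetical and not part of SSSP, and a real implementation would drive a GPIO instead.

```python
class SharedSignal:
    """Models one shared SSSP line (S or PD) as a wired-OR of all nodes.

    Hypothetical helper for illustration only; electrically this
    corresponds to an active-low open-drain line with a pull-up resistor.
    """

    def __init__(self):
        self._drivers = set()  # nodes that currently activate the signal

    def activate(self, node):
        self._drivers.add(node)

    def deactivate(self, node):
        self._drivers.discard(node)

    @property
    def active(self):
        # Logical OR on activation: a single driver is enough.
        # Logical AND on deactivation: inactive only if nobody drives.
        return bool(self._drivers)

    @property
    def line_level(self):
        # Active-low: the wire is pulled low (0) while active; the pull-up
        # returns it to high (1) once every node has released it.
        return 0 if self.active else 1
```

With two nodes activating the signal, it stays active until the second one has deactivated it as well.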

Although these two mandatory signals suffice to implement the protocol, some optional features require further signals and communication interfaces:

  • UP/DN - GPIO to the adjacent module (neighbor) up/down
  • BCB - a communication bus with broadcast capability

Note that a heterogeneous setup, in which some modules support the optional stages and others do not, is fully compatible.
However, these optional features only work if all modules support them.
Hence, the system must not rely on the additional information, but may take advantage of it, if it is available.

In order to make the protocol adaptable to any system, it uses a parameter T.
This parameter defines a time period, which the protocol uses for synchronization and to detect timeouts.
It must be identical for all nodes within a system, or at least similar: the ratio between the largest and smallest value in the system must be smaller than ten.
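
The constraint on T can be written as a simple check. The following is a minimal sketch; the function name periods_compatible and its argument are assumptions made for illustration.

```python
def periods_compatible(periods):
    """Return True if the per-node values of T are close enough.

    `periods` is an iterable of positive durations in a common unit.
    Rule from the text: largest / smallest must be smaller than ten.
    """
    values = list(periods)
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive period")
    # Written without division to avoid rounding issues.
    return max(values) < 10 * min(values)
```

For example, periods of 2 and 19 time units are compatible, whereas 1 and 10 are not, because the ratio must be strictly smaller than ten.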

An additional parameter F defines the frequency at which S is toggled during the Operation Phase.


Startup Phase

All modules must initialize the signals so that S is active and PD is inactive.
Although only S is used for startup, PD must be inactive during the startup phase; otherwise the shutdown phase will be initiated, either immediately by the bootloader or by the operating system as soon as it is running.

Each module executes the following steps:

  1. basic initialization
    1. initialization of required signals, voltages, or other hardware
      This first stage is very module specific and strongly depends on the hardware configuration.
      When a module has finished this stage, it sets S to inactive.
      In order to prevent erroneous behavior due to incorrect signals during the initialization, this stage takes at least one period T.
    2. waiting for synchronization
      Each module waits for S to become inactive (all modules are initialized) as a first synchronization.
    3. synchronous start of stage 2
      As soon as S is inactive, the master node activates it again in order to start the next stage.
      To ensure that each module has had enough time to detect the inactive state of S, the master node must delay the activation by at least one period T.
  2. operating system initialization
    1. complete system startup
      Each module activates S again and fully initializes (e.g. starts the operating system).
      As soon as it is ready, it deactivates S again.
      When a module indicates that it is ready, at least the main communication channel (for AMiRo this is CAN) must be fully operational.
      Again, S must be active for at least one period T, so every module can detect the activation.
    2. waiting for synchronization
      Each module waits for S to become inactive (all modules are ready).
      Only now it is safe to use the main communication channel and all modules are able to receive messages correctly.
  3. module stack initialization [optional]
    This stage is optional and can only be applied if all modules can read from and write to the main communication channel (BCB) and if two additional signals (UP and DN), which connect neighboring modules, exist for each AMiRo Module.
    Furthermore, the first and last node of this 'module stack' must be known beforehand.
    In case of the AMiRo, for instance, the DiWheelDrive and PowerManagement are defined to be the lowermost modules, and the LightRing always finalizes the stack at the top.
    1. initiation of this stage
      The master node initiates this stage by broadcasting a unique command via BCB to all modules, so they can interpret the upcoming communication via the neighbor signals (UP and DN) correctly.
    2. starting the sequence
      One of the known nodes at the end of the module stack broadcasts its own stack number (e.g. 1) via BCB.
      Right after that, it signals its neighboring module to continue by setting the neighbor signal active for at least one period T.
      Note that the smallest and largest numbers (0 and 255 for 8-bit addressing) are reserved and must not be used by any module.
      These 'stack IDs' can later be used to represent a hierarchy within the system.
    3. counting the modules
      When a module is triggered by the activation of the neighbor signal, it broadcasts its own stack number, which is defined to be greater than the last one.
      Then again, it triggers the next module in the stack to continue via the other neighbor signal.
      This step is repeated until one of the following terminating conditions is fulfilled.
    4. termination of this stage
      There are two ways this stage can be terminated: either it is completed correctly, or it is aborted because of a timeout.
      • completion
        The stage is completed correctly if the signal is propagated to the known node on the other end of the module stack.
        1. broadcast of final ID
          The final (known) module broadcasts its own stack number (which is computed like the others before) plus the information that the module stack initialization is done.
        2. successful termination of the stage
          The master node broadcasts a message which indicates the successful termination of the stage.
          In this case, all nodes adopt their ID and can use it for later identification.
      • abortion
        The stage is aborted if more than ten periods T have passed since the last ID was broadcast.
        Such a timeout can only occur if a module does not support the propagation process, or because of hardware issues.
        When the master node detects such a delay, it broadcasts a message, which indicates the unsuccessful termination of the stage.
        As a result, all stack numbers must be considered unreliable, and identification is therefore not supported.
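
The outcome of the optional stage 3 can be sketched as a small simulation. This is not a definitive implementation: the function enumerate_stack, the (name, supports_stage3) tuple layout, the module names, and the increment-by-one numbering are assumptions for illustration. The protocol itself only requires each ID to be greater than the previous one and the reserved values to stay unused.

```python
RESERVED_IDS = (0, 255)  # smallest and largest 8-bit values, never assigned


def enumerate_stack(modules, first_id=1):
    """Simulate stage 3 for a stack listed from one known end to the other.

    `modules` is a list of (name, supports_stage3) tuples (hypothetical
    layout). Returns a name -> ID mapping on completion, or None if the
    stage is aborted and all stack numbers are unreliable.
    """
    if first_id <= RESERVED_IDS[0]:
        raise ValueError("first ID must not be a reserved value")
    ids = {}
    next_id = first_id
    for name, supports_stage3 in modules:
        if not supports_stage3:
            # This module never broadcasts an ID or triggers its neighbor.
            # The master node would detect the timeout (ten periods T)
            # and broadcast the unsuccessful termination.
            return None
        if next_id >= RESERVED_IDS[1]:
            # Assumption of this sketch: running out of IDs is an error.
            raise ValueError("no unreserved stack ID left")
        ids[name] = next_id  # module broadcasts its stack number via BCB
        next_id += 1         # neighbor signal triggers the next module
    return ids
```

A stack in which every module supports the stage yields increasing IDs starting at 1, while a single unsupporting module in the middle makes the whole result None.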

At the end of the startup phase (more precisely after stage 2) both signals, S and PD, are inactive.
Note that a module which does not implement the protocol will not interfere or cause errors as long as it does not activate S.
However, such a module might cause errors after the startup phase, if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
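
The timing of the two mandatory startup stages can be sketched as follows. The function startup_timeline and its parameters are hypothetical. The sketch assumes that all modules start at time 0, that every delay is chosen as short as the protocol allows, and it returns the earliest possible points in time.

```python
def startup_timeline(basic_init, os_init, period_t=1.0):
    """Earliest times of the synchronization points of stages 1 and 2.

    `basic_init` and `os_init` hold, per module, how long the respective
    initialization takes (same unit as `period_t`).
    Returns (stage1_done, stage2_start, stage2_done).
    """

    def release_time(durations):
        # Each module keeps S active for at least one period T, and the
        # shared signal only becomes inactive once all modules released it.
        return max(max(d, period_t) for d in durations)

    stage1_done = release_time(basic_init)
    # The master node delays the reactivation of S by at least one period T.
    stage2_start = stage1_done + period_t
    stage2_done = stage2_start + release_time(os_init)
    return stage1_done, stage2_start, stage2_done
```

In both stages the slowest module determines when S becomes inactive, so fast modules simply wait at the synchronization point.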


Operation Phase

All AMiRo Modules are kept in sync during operation by toggling S at frequency F.
Hence, all modules must act as slaves, and there may be at most one master node.
Since S gets activated when a shutdown is initiated (see Shutdown Phase), modules must synchronize on the deactivation (logical falling edge) of S.
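
Synchronizing on the deactivation of S amounts to detecting transitions from active to inactive. A minimal sketch, assuming S is available as a sequence of polled logical states (the function name sync_points and the sampling approach are assumptions for illustration):

```python
def sync_points(samples):
    """Return the indices at which S changes from active to inactive.

    `samples` is a sequence of booleans (True = S active), e.g. obtained
    by polling the GPIO; the sampling itself is outside this sketch.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] and not samples[i]
    ]
```

Activations of S are ignored on purpose, since an activation may also mark the beginning of the shutdown phase.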

Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol will not interfere or cause errors as long as it does not activate S.
However, such a module might run out of sync, which in turn may cause errors during operation.


Shutdown Phase

Since the PD signal must not be used during system operation, it is defined to be inactive.
The state of S is undefined, because it was used for synchronization during operation.
Any module can initiate the shutdown phase by activation of PD.
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:

  1. shutdown of high-level operation
    1. initiation of module shutdown
      As soon as the activation of PD is detected, each module activates S.
      The module which initiated the system shutdown by activating PD must activate S as well.
      The module which acted as master node during operation must stop toggling S as soon as PD is activated.
    2. shutdown of high-level operation (e.g. the operating system)
      Each module stops all computation in a safe manner, so it can be shut down without data loss or other issues.
      As soon as this is done, it deactivates S.
      In order to ensure that every module had a chance to detect the activation of PD, this step must take at least one period T.
    3. waiting for synchronization
      Each module waits for S to become inactive (all modules are done).
  2. system shutdown or restart
    1. evaluation of PD signal
      When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
      Hence, the initiating module, which activated PD, must have set it to the corresponding state before it deactivated S.
      The implication of the PD state at this point is defined as follows:
      • active: A system shutdown is requested.
      • inactive: A system restart is requested.
    2. disambiguation procedure
      Since there may be multiple ways to shut down or restart the system, this ambiguity is resolved by the following procedure.
      For this to work, the identifiers which encode the exact shutdown/restart procedure to be executed (see below) must be unambiguous.
      These identifiers, however, depend on the platform and implementation and hence are not defined by SSSP.
      1. serial broadcast of identifier
        The module which initiated the shutdown/restart phase broadcasts an arbitrary number of 'pulses' via S.
        Each 'pulse' starts with S inactive: the module activates S for at least one period T and then deactivates it again for at least one more period T.
        All modules can count the number of pulses, which encodes the exact shutdown/restart procedure to be used.
        Note that S must be inactive for at least one period T before the first pulse (after PD was evaluated).
      2. termination of the serial broadcast
        The broadcast is terminated by a timeout of ten periods T since the last change of S from active to inactive state.
        This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
        Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
    3. final shutdown or restart
      Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
      • shutdown
        Each module completely stops itself and enters low-power mode.
        The details (e.g. which signals and sensors are still active) depend on the result of the disambiguation procedure.
      • restart
        If a restart was requested, each module starts with the first step of the startup phase.
        The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure.
        In order to minimize risk of errors, all modules can power off, except for a master node, which resets the whole system and forces a clean startup.
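
The evaluation of PD and the pulse counting of the disambiguation procedure can be sketched as follows. The function names, the event format, and the parameter names are assumptions for illustration. In particular, the sketch assumes that the timeout before the first pulse is measured from the moment PD was evaluated.

```python
def requested_action(pd_active):
    """Interpret PD at the moment S becomes inactive."""
    return "shutdown" if pd_active else "restart"


def decode_identifier(events, period_t=1.0, start=0.0):
    """Count the pulses on S until the timeout of ten periods T expires.

    `events` is a time-ordered list of (timestamp, active) level changes
    of S, observed after PD was evaluated at time `start` with S inactive.
    Returns the identifier; 0 selects the default procedure.
    """
    timeout = 10 * period_t
    pulses = 0
    last_inactive = start  # time of the last change to the inactive state
    was_active = False
    for timestamp, active in events:
        if not was_active and timestamp - last_inactive > timeout:
            break  # the broadcast has already terminated
        if was_active and not active:
            pulses += 1  # a pulse is complete when S is deactivated again
            last_inactive = timestamp
        was_active = active
    return pulses
```

An empty event list yields the reserved identifier 0, and activity on S after the timeout no longer changes the result.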

Again, a module which does not implement the protocol will cause no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it will unnecessarily draw energy and might not end up in a defined state like the rest of the system.
Most importantly, the latter might corrupt system operation if the undefined state of modules that do not implement SSSP causes unwanted side effects such as stalled communication buses or duplicate IDs.