Overview of synchronization using Raft Consensus
This article provides a conceptual explanation of how data is saved and synchronized between nodes (servers) in Raft. For a general overview of Raft Consensus, please refer to the previous article (Consensus for AERGO Private Environment).
Raft Consensus treats each node as a state machine and provides a consensus algorithm that keeps these nodes synchronized so that they maintain identical states. To do this, Raft records changes to the state machine as logs and ensures that all nodes apply the same logs in the same order.
Each AERGO node is equivalent to a state machine, and the block saved at each height can be regarded as a state. To operate this way, each block is included as the data of a Raft log entry.
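The state-machine view can be sketched in Go. This is a minimal illustration under stated assumptions, not AERGO's code: `Block`, `Entry`, `Node`, `Apply`, and `SameState` are hypothetical names, and a block is reduced to a height and a hash label. Each node accepts log entries only in index order, so two nodes fed the same log end up with identical chains.

```go
package main

import "fmt"

// Block is a hypothetical, simplified block: only its height and a hash label.
type Block struct {
	Height uint64
	Hash   string
}

// Entry is a simplified Raft log entry whose data is a block.
type Entry struct {
	Index uint64
	Block Block
}

// Node models one node as a state machine whose state is its chain.
type Node struct {
	chain   []Block
	applied uint64 // index of the last applied log entry
}

// Apply applies entries strictly in index order and rejects out-of-order entries.
func (n *Node) Apply(e Entry) error {
	if e.Index != n.applied+1 {
		return fmt.Errorf("entry %d out of order: last applied is %d", e.Index, n.applied)
	}
	n.chain = append(n.chain, e.Block)
	n.applied = e.Index
	return nil
}

// SameState reports whether two nodes hold identical chains.
func SameState(a, b *Node) bool {
	if len(a.chain) != len(b.chain) {
		return false
	}
	for i := range a.chain {
		if a.chain[i] != b.chain[i] {
			return false
		}
	}
	return true
}

func main() {
	log := []Entry{
		{Index: 1, Block: Block{Height: 1, Hash: "b1"}},
		{Index: 2, Block: Block{Height: 2, Hash: "b2"}},
	}
	a, b := &Node{}, &Node{}
	for _, e := range log {
		_ = a.Apply(e)
		_ = b.Apply(e)
	}
	fmt.Println(SameState(a, b)) // true
}
```

Rejecting out-of-order entries is what makes "same log, same order" translate into "same state".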
For Raft to operate, it relies on one assumption:
Each node is honest, meaning that it does not lie.
The information contained in the messages sent between nodes must always be accurate. In other words, Raft is not Byzantine fault tolerant. For example, a non-leader node must never send messages as if it were the leader. Raft is therefore a consensus algorithm suited to private chains, where security is ensured and the risk of malicious nodes joining the cluster is kept low.
Data saving in Raft Consensus
The data saved for Raft Consensus can be broadly divided into the Raft log and block-related data. This data is broadcast through the Raft protocol and saved in each server's repository. Blocks are saved as the data of Raft log entries. After a block goes through the commit process and is added to the blockchain, related information such as account information and chain linkage information is saved.
Fig 1. Illustration of logs saved in Raft
Each Raft log entry has the following features.
A term is a sequence number given to a new leader; it increases by 1 with every new leader election, so it serves as an ID for each leader. An index is a sequence number that increases by 1 every time a new log entry is saved. Since a leader generates only one log entry for a particular index within its term, a log entry can be uniquely identified by its term and index.
Data refers to the information shared through Raft; in AERGO, this data is a block.
Because a term keeps the same value until the leader changes, the number of log entries generated under a single term can vary from term to term.
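A minimal Go sketch of these fields, assuming hypothetical `EntryID` and `LogEntry` types (not AERGO's actual types), with `Data` standing in for an encoded block. The (term, index) pair is comparable, so it can serve directly as an entry's identity, and the hypothetical `EntriesPerTerm` helper shows that the number of entries per term is not fixed.

```go
package main

import "fmt"

// EntryID identifies a log entry by the leader's term and the entry's index.
type EntryID struct {
	Term  uint64 // increases by 1 with each new leader election
	Index uint64 // increases by 1 with each new log entry
}

// String formats the ID in the term/index notation used in the figures.
func (id EntryID) String() string { return fmt.Sprintf("%d/%d", id.Term, id.Index) }

// LogEntry pairs an ID with its data; Data stands in for an encoded block.
type LogEntry struct {
	ID   EntryID
	Data []byte
}

// EntriesPerTerm counts the entries each term produced. The counts differ
// because a term lasts only as long as its leader.
func EntriesPerTerm(log []LogEntry) map[uint64]int {
	counts := make(map[uint64]int)
	for _, e := range log {
		counts[e.ID.Term]++
	}
	return counts
}

func main() {
	log := []LogEntry{
		{ID: EntryID{1, 1}}, {ID: EntryID{1, 2}}, {ID: EntryID{1, 3}},
		{ID: EntryID{2, 4}},
	}
	fmt.Println(log[2].ID, log[3].ID) // 1/3 2/4
	fmt.Println(EntriesPerTerm(log))  // map[1:3 2:1]
}
```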
Fig 2. An example of a log status in the term-changing situation
A critical feature of the Raft log is that a log entry committed through the Raft protocol can never be deleted or modified. In other words, a committed log cannot be lost or replaced by another log, even when the leader changes or the system fails. This property guarantees immediate finality once a block is added to the chain.
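The invariant can be expressed as a local guard in a hypothetical in-memory log (`Log`, `Commit`, and `TruncateFrom` are illustrative names, not a real API). Real Raft does not depend on such a check: the guarantee comes from its election and commit rules, which ensure that a new leader already holds every committed entry. The sketch only makes the property explicit.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCommitted is returned when a change would touch a committed entry.
var ErrCommitted = errors.New("cannot modify a committed log entry")

// Entry is a simplified log entry in term/index form.
type Entry struct {
	Term, Index uint64
}

// Log is a hypothetical in-memory Raft log that tracks its commit index.
type Log struct {
	entries   []Entry // entries[i] has Index i+1
	committed uint64  // highest committed index
}

// Append adds an entry at the end of the log.
func (l *Log) Append(e Entry) { l.entries = append(l.entries, e) }

// Commit marks every entry up to index as committed.
func (l *Log) Commit(index uint64) {
	if index > l.committed && index <= uint64(len(l.entries)) {
		l.committed = index
	}
}

// TruncateFrom deletes entries from index onward but refuses to touch
// committed entries, which is what gives committed blocks finality.
func (l *Log) TruncateFrom(index uint64) error {
	if index <= l.committed {
		return ErrCommitted
	}
	if index <= uint64(len(l.entries)) {
		l.entries = l.entries[:index-1]
	}
	return nil
}

func main() {
	l := &Log{}
	for i := uint64(1); i <= 3; i++ {
		l.Append(Entry{Term: 1, Index: i})
	}
	l.Commit(2)
	fmt.Println(l.TruncateFrom(2)) // cannot modify a committed log entry
	fmt.Println(l.TruncateFrom(3)) // <nil>
}
```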
Synchronization between nodes using Raft protocol
Let us now examine how Raft protocol synchronizes the logs between nodes.
The Raft protocol is defined simply, so nodes do not have to maintain complex context.
The fundamental protocol consists of two types of messages sent from the leader to the followers.
The AppendEntry message is used by the leader to broadcast new logs to the followers and, when a follower is not synchronized, to search for the point from which synchronization should proceed. Because blocks are included in the log, saving a log also saves its block. However, a block saved in the log is not yet connected to the blockchain.
The Heartbeat message is sent regularly from the leader to the followers to check liveness and to broadcast commit information. After a follower receives the commit information, it executes the blocks saved in the committed logs and adds them to the blockchain.
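The two message types, and the split between saving a block and connecting it to the chain, can be sketched as follows. The struct and field names are hypothetical, not AERGO's wire format, and the AppendEntry consistency check is deliberately omitted: entries are simply saved, and only a Heartbeat carrying a commit index connects them to the chain.

```go
package main

import "fmt"

// Entry is a simplified log entry carrying a block (represented by its hash).
type Entry struct {
	Term, Index uint64
	BlockHash   string
}

// AppendEntry carries new log entries plus the term/index of the entry before them.
type AppendEntry struct {
	Term      uint64
	PrevTerm  uint64
	PrevIndex uint64
	Entries   []Entry
}

// Heartbeat proves the leader is alive and carries its commit index.
type Heartbeat struct {
	Term   uint64
	Commit uint64
}

// Follower keeps saved entries separate from the blockchain.
type Follower struct {
	log     []Entry  // saved log entries: blocks saved but not yet connected
	chain   []string // blocks connected to the blockchain
	applied uint64   // last index connected to the chain
}

// OnAppendEntry saves the entries; the PrevTerm/PrevIndex check is omitted here.
func (f *Follower) OnAppendEntry(m AppendEntry) {
	f.log = append(f.log, m.Entries...)
}

// OnHeartbeat connects every saved block up to the leader's commit index.
func (f *Follower) OnHeartbeat(hb Heartbeat) {
	for f.applied < hb.Commit && f.applied < uint64(len(f.log)) {
		e := f.log[f.applied] // entry with Index applied+1
		f.chain = append(f.chain, e.BlockHash)
		f.applied++
	}
}

func main() {
	f := &Follower{}
	f.OnAppendEntry(AppendEntry{Term: 1, Entries: []Entry{{1, 1, "b1"}, {1, 2, "b2"}}})
	fmt.Println(len(f.log), len(f.chain)) // 2 0
	f.OnHeartbeat(Heartbeat{Term: 1, Commit: 1})
	fmt.Println(len(f.log), len(f.chain)) // 2 1
}
```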
Now let us review the synchronization process using AppendEntry.
There are two essential things to note.
– For the follower to save a new log sent from the leader, it must already have saved the log that precedes it.
– If it has not, the follower sends its currently saved log status to the leader as a hint, so that the leader can send an appropriate log.
Next, let us see how the synchronization point is found.
1. The leader sends the newly generated log in an AppendEntry message. Along with the new log, the message carries the meta information (term/index) of the previous entry (the last log before the new log entry).
2. The follower checks whether its log repository contains a log matching the meta information of the previous entry.
3. If there is a match, the follower saves the new log entry and sends a success response through the AppendEntryResponse message.
Fig 3. An example of a successful AppendEntry by a new log
4. If there is no match, the follower returns a failure response in an AppendEntryResponse message, together with the meta information of its last saved log as a hint. The hint contains the rejected previous index (Reject) and the follower's last saved index (Last), so that the leader can send an appropriate log in its next request.
5. When the follower returns a failure response, the leader reviews the follower's last log information and sends an appropriate log in a new AppendEntry message.
Fig 4. An example of a failed AppendEntry by a new log
In the example in Fig 4, the follower has not saved the previous entry (2/6), so it treats the request as a match failure and returns an error. In its response, the follower includes the index that caused the match failure (6) and the index of its last log (3) as a hint for the next request. If the leader uses this hint and sends log entry (2/4) in the next AppendEntry, that request will succeed.
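Steps 2 to 4 on the follower side can be sketched in Go. The types are hypothetical and reduced to term/index pairs, and the example data assumes that the follower's first three entries are from term 1 and match the leader's. Under that assumption, the first call reproduces the rejection in Fig 4 and the second reproduces the successful retry with (2/4).

```go
package main

import "fmt"

// ID is the term/index meta information of a log entry.
type ID struct{ Term, Index uint64 }

// AppendEntry carries one new entry and the meta information of the entry before it.
type AppendEntry struct {
	Prev  ID
	Entry ID
}

// AppendEntryResponse reports success or, on failure, a hint for the leader.
type AppendEntryResponse struct {
	Success bool
	Reject  uint64 // previous index that failed to match
	Last    uint64 // follower's last saved index
}

// Follower stores the IDs of its saved entries; log[i] has Index i+1.
type Follower struct{ log []ID }

// matches reports whether the follower saved an entry equal to id.
// Index 0 stands for the empty log and always matches.
func (f *Follower) matches(id ID) bool {
	if id.Index == 0 {
		return true
	}
	return id.Index <= uint64(len(f.log)) && f.log[id.Index-1] == id
}

// HandleAppendEntry checks the previous entry, then saves or rejects.
// It assumes m.Entry.Index == m.Prev.Index+1.
func (f *Follower) HandleAppendEntry(m AppendEntry) AppendEntryResponse {
	if !f.matches(m.Prev) {
		// Step 4: fail and send the (Reject, Last) hint.
		return AppendEntryResponse{Reject: m.Prev.Index, Last: uint64(len(f.log))}
	}
	pos := m.Prev.Index // slice position of the new entry
	if pos < uint64(len(f.log)) && f.log[pos] != m.Entry {
		f.log = f.log[:pos] // drop a conflicting entry and everything after it
	}
	if pos == uint64(len(f.log)) {
		f.log = append(f.log, m.Entry) // Step 3: save the new entry
	}
	return AppendEntryResponse{Success: true, Last: uint64(len(f.log))}
}

func main() {
	f := &Follower{log: []ID{{1, 1}, {1, 2}, {1, 3}}}
	// The previous entry (2/6) is missing, so the follower rejects with a hint.
	fmt.Printf("%+v\n", f.HandleAppendEntry(AppendEntry{Prev: ID{2, 6}, Entry: ID{2, 7}})) // {Success:false Reject:6 Last:3}
	// Using the hint, the leader sends (2/4), whose previous entry is (1/3).
	fmt.Printf("%+v\n", f.HandleAppendEntry(AppendEntry{Prev: ID{1, 3}, Entry: ID{2, 4}})) // {Success:true Reject:0 Last:4}
}
```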
By repeating this process, the leader finds the last log entry it shares with each follower and then synchronizes the follower sequentially from the entry after it.
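The leader's side of this loop can be sketched as follows, reusing the numbers from Fig 4. The rule for picking the next index from the hint (the smaller of Reject and Last+1) is an assumption made for this sketch, not necessarily AERGO's exact rule, and `follower` is a minimal hypothetical stand-in that simply overwrites anything after the matched entry.

```go
package main

import "fmt"

// ID is the term/index meta information of a log entry.
type ID struct{ Term, Index uint64 }

// Response is a simplified AppendEntryResponse carrying the (Reject, Last) hint.
type Response struct {
	Success      bool
	Reject, Last uint64
}

// follower saves an entry only if it already holds the previous one; log[i] has Index i+1.
type follower struct{ log []ID }

func (f *follower) receive(prev, entry ID) Response {
	ok := prev.Index == 0 || (prev.Index <= uint64(len(f.log)) && f.log[prev.Index-1] == prev)
	if !ok {
		return Response{Reject: prev.Index, Last: uint64(len(f.log))}
	}
	f.log = append(f.log[:prev.Index], entry) // simplified: overwrite anything after prev
	return Response{Success: true, Last: uint64(len(f.log))}
}

// minU64 returns the smaller of two indexes.
func minU64(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// syncFollower starts at index next, backs off using the hint on failure, and
// moves forward one entry at a time on success. It returns the messages sent.
func syncFollower(leaderLog []ID, f *follower, next uint64) int {
	sent := 0
	for next <= uint64(len(leaderLog)) {
		prev := ID{} // index 0: before the first entry
		if next > 1 {
			prev = leaderLog[next-2]
		}
		r := f.receive(prev, leaderLog[next-1])
		sent++
		if r.Success {
			next++
			continue
		}
		// Reject == next-1, so next strictly decreases on every failure.
		next = minU64(r.Reject, r.Last+1)
	}
	return sent
}

// equal reports whether two logs hold the same entries.
func equal(a, b []ID) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	leader := []ID{{1, 1}, {1, 2}, {1, 3}, {2, 4}, {2, 5}, {2, 6}, {2, 7}}
	f := &follower{log: []ID{{1, 1}, {1, 2}, {1, 3}}}
	// The leader starts with its newest entry (2/7), whose previous entry is (2/6).
	sent := syncFollower(leader, f, 7)
	fmt.Println("messages sent:", sent, "in sync:", equal(leader, f.log)) // messages sent: 5 in sync: true
}
```

One rejected request is enough here because the hint lets the leader jump straight from index 7 back to index 4 instead of retrying one index at a time.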
Processes after a change in leadership
When the leader changes, logs generated by the previous leader and logs generated by the new leader may conflict. This can happen when some nodes did not receive the last log of the previous leader because of a system failure such as a network disconnection.
Fig 5.
In the example of Fig 5, a log is replaced because of a change in leadership. The follower node holds the log (1/6) created by the previous leader (term 1), but when it receives the new log (2/6) from the new leader, it replaces the previous log with the new one and saves it. This replacement is possible only because log (1/6) was never committed; as noted above, a committed log can never be replaced.
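The replacement in Fig 5 can be sketched as follows. The `Follower` type, its `committed` field, and the assumption that only index 5 is committed are hypothetical. The extra check that refuses to replace a committed entry encodes the finality property of committed logs and should never fire in a correct Raft cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// ID is the term/index meta information of a log entry.
type ID struct{ Term, Index uint64 }

// Follower keeps saved entries (log[i] has Index i+1) and its commit index.
type Follower struct {
	log       []ID
	committed uint64
}

// Receive saves entry after prev. If the follower already holds a different
// entry at that index, that uncommitted entry and everything after it are replaced.
func (f *Follower) Receive(prev, entry ID) error {
	if prev.Index > uint64(len(f.log)) || (prev.Index > 0 && f.log[prev.Index-1] != prev) {
		return errors.New("previous entry does not match")
	}
	pos := prev.Index // slice position of the new entry
	if pos < uint64(len(f.log)) && f.log[pos] != entry {
		if entry.Index <= f.committed {
			// Raft guarantees this never happens: committed entries are final.
			return errors.New("conflict with a committed entry")
		}
		f.log = f.log[:pos]
	}
	if pos == uint64(len(f.log)) {
		f.log = append(f.log, entry)
	}
	return nil
}

func main() {
	// The follower saved (1/6) from the term-1 leader, but only index 5 is committed.
	f := &Follower{log: []ID{{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}}, committed: 5}
	// The term-2 leader sends its own entry for index 6.
	if err := f.Receive(ID{1, 5}, ID{2, 6}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(f.log[5]) // {2 6}
}
```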
Raft Consensus is a simple protocol that synchronizes nodes by repeatedly sending AppendEntry messages and receiving the responses. Because Raft is simple and easy to understand, it is easier to develop a stable implementation of it.
We hope this explanation of Raft also helps you understand the concepts of synchronization in distributed environments and other consensus protocols.
For more detailed information on Raft Consensus for AERGO, please refer to our other article, Raft: Consensus for AERGO Private Environment.
The resources below offer both an accessible and an in-depth understanding of Raft.
The following site provides an animation of Raft synchronization, giving a visual illustration of the synchronization process.
Also, refer to the slides created by Diego Ongaro and John Ousterhout for detailed information about Raft.