Anomaly Detection for Industrial Internet of Things Devices Based on Self-Adaptive Blockchain Sharding and Federated Learning
Algorithm 1: Birch-KMeans based on Federated Learning

Input: input data $D_N$; balancing factors $B$, $L$; threshold $\mathcal{T}$
1: Create the $CF$ tree by traversing the dataset and adding nodes one by one, obtaining $K = L$ leaf nodes $\left\{CF_{1\,\text{Node}}, CF_{2\,\text{Node}}, \cdots, CF_{K\,\text{Node}}\right\}$, $\forall CF_{i\,\text{Node}} \in D_N$;
2: Initialize the $K$ initial centroids of K-means: $\left\{CF_{1\,\text{Node}}^{\text{init}}, \cdots, CF_{K\,\text{Node}}^{\text{init}}\right\} = \left\{CF_{1\,\text{Node}}, \cdots, CF_{K\,\text{Node}}\right\}$;
3: procedure
4: while $\text{iter} < E$ do
5:   for $D_i$ in $D_N$ do
6:     Calculate the distance $d$ to the $K$ centroids as in equation (1);
7:     Assign $D_i$ to the cluster of the nearest centroid;
8:   end for
9:   for each cluster $k$ in $\mathcal{K}$ do
10:    Calculate the new cluster centroids as in equation (2): $\left\{CF_{1\,\text{Node}}^{\text{new}}, \cdots, CF_{K\,\text{Node}}^{\text{new}}\right\}$;
11:    if the cluster centers do not change then
12:      Output the grouping results $\mathcal{K}$;
13:    else
14:      $\left\{CF_{1\,\text{Node}}, \cdots, CF_{K\,\text{Node}}\right\} \leftarrow \left\{CF_{1\,\text{Node}}^{\text{new}}, \cdots, CF_{K\,\text{Node}}^{\text{new}}\right\}$;
15:    end if
16:  end for
17: end while
18: return $\left\{CF_{1\,\text{Node}}^{\text{new}}, \cdots, CF_{K\,\text{Node}}^{\text{new}}\right\}$
19: end procedure
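As a concrete illustration, the sketch below shows how the two phases of Algorithm 1 could be realized with scikit-learn; the `Birch`/`KMeans` calls and the fallback when the CF tree yields fewer subclusters than groups are implementation choices of this sketch, with the parameters mapping loosely to the balancing factors and threshold above.

```python
# A minimal sketch of the Birch-KMeans grouping step, assuming scikit-learn;
# branching_factor and threshold correspond roughly to B/L and T above.
import numpy as np
from sklearn.cluster import Birch, KMeans

def birch_kmeans(data, n_groups, branching_factor=50, threshold=0.5):
    # Phase 1: build the CF tree; its subcluster centers serve as the
    # initial centroids for K-means (Algorithm 1, lines 1-2).
    cf_tree = Birch(branching_factor=branching_factor,
                    threshold=threshold, n_clusters=None)
    cf_tree.fit(data)
    centers = cf_tree.subcluster_centers_

    # Keep at most n_groups centers as the K-means initialization.
    init = centers[:n_groups] if len(centers) >= n_groups else data[:n_groups]

    # Phase 2: refine the grouping with K-means (Algorithm 1, lines 4-17).
    km = KMeans(n_clusters=len(init), init=init, n_init=1)
    return km.fit_predict(data)

# Hypothetical usage on random IIoT feature vectors.
labels = birch_kmeans(np.random.rand(200, 4), n_groups=8)
```

Seeding K-means with the CF-tree subcluster centers is what lets it start from a better initial estimate and converge in fewer iterations.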
Here $CF_i$ denotes the sub-cluster located at the corresponding child node. Each leaf node $CF_{i\,\text{Node}}$, where $i = 1, 2, \cdots, L$ and $L$ represents the maximum number of leaf nodes in the $CF$ tree, serves as an initial clustering center for the subsequent K-means.
The Euclidean distance between data objects in space is

$$d\left(x, CF_{i\,\text{Node}}\right) = \sqrt{\sum_{j=1}^{m} \left(x_j - CF_{ij}\right)^2} \tag{1}$$

where $x$ is the data object, $CF_{i\,\text{Node}}$ is the $i$th clustering center, $m$ is the dimension of the data, and $x_j$ and $CF_{ij}$ are the $j$th attribute values of $x$ and $CF_{i\,\text{Node}}$, respectively.
The centers are then updated within each cluster: the point with the smallest average distance to all other points in the cluster is chosen as the new cluster center,

$$CF_{i\,\text{Node}}^{\text{new}} = \arg\min_{r \in K_i} \frac{1}{\left|K_i\right|} \sum_{s \in K_i} \delta_i \, d(r, s) \tag{2}$$
$r$ and $s$ are data objects inside the $i$th cluster $K_i$, and $\delta_i$ is the federated learning impact factor of client $i$, which determines the influence of FL accuracy on the spatial distance. $r_k$ is the model correlation coefficient of the last round of the shard where client $i$ is located, $\frac{n_i}{n}$ is the ratio of the samples with which client $i$ participated in training to the total samples in this round of FL training, and the scaling factor $\beta_i$ together with $S(x) = \frac{1}{1+e^{-x}}$ is used to adjust for proportionality effects. $\delta_i$ takes values in $(0, 1)$. Highly correlated models on homogeneous datasets may show little variation in $r_k$, whereas heterogeneous datasets will likely show significant differences; ensuring accurate calculation of $r_k$ is therefore crucial for maintaining the balance between collaboration and individual model performance. The sensitivity of the scaling function is governed by $\beta_i$, which needs to be tuned according to the dataset size and distribution: for larger datasets a smaller $\beta_i$ may suffice, while smaller datasets may require a larger $\beta_i$ to ensure meaningful scaling.
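The paper does not spell out how $\delta_i$ is assembled from $r_k$, $\frac{n_i}{n}$, and $\beta_i$, so the following is only a plausible sketch in which the sigmoid-scaled sample ratio is weighted by the shard's last-round correlation; the composition itself is an assumption.

```python
# A minimal sketch of the federated-learning impact factor delta_i; the
# combination r_k * S(beta_i * n_i / n) is assumed for illustration and
# keeps delta_i inside (0, 1) as required above.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def impact_factor(r_k, n_i, n, beta_i):
    # Scale the client's sample share, then weight by model correlation.
    return r_k * sigmoid(beta_i * n_i / n)

# Hypothetical example: a well-correlated client holding 10% of the data.
print(impact_factor(r_k=0.9, n_i=100, n=1000, beta_i=5.0))
```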
Evaluating the quality of clustering results is an important task; for this reason, both external and internal indicators are used in this paper. NMI, AMI, and ARI are used to compare the similarity between the clustering results and the real labels [25], while the silhouette coefficient assesses the grouping quality of the Birch-KMeans algorithm based on the characteristics and statistical information of the clustering results themselves [26].
The silhouette coefficient is an evaluation index of the degree of intra-class density and inter-class dispersion, defined as follows:

$$a(i) = \frac{1}{n_c - 1} \sum_{j \in c,\, j \neq i} d(i, j), \qquad b(i) = \min_{c' \neq c} \frac{1}{n_{c'}} \sum_{j \in c'} d(i, j)$$

$$S_i = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

Here $a(i)$ is the average distance between data point $i$ and all other points in the same cluster $c$, and $b(i)$ is the minimum, over all other clusters, of the average distance between $i$ and all points of that cluster. $n_c$ denotes the number of data objects in cluster $c$, and $d(i, j)$ is the Euclidean distance between $i$ and $j$. $S_i$ takes values in $[-1, 1]$, and the larger the value, the more reasonable the assignment of the data object to its cluster [27]. The data are first processed by Birch; the K-means algorithm is then iterated $H$ times, the distance matrix $\operatorname{dist}(j)$ of each iteration is recorded, and all individual silhouette coefficients are averaged to obtain $S_{avg}$, which is fed into the DRL reward function to optimize the selection of Birch's initial parameters.
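A minimal sketch of how $S_{avg}$ could be computed over $H$ K-means runs using scikit-learn's `silhouette_score`; the function name and data here are illustrative, not the paper's code.

```python
# Collect the average silhouette coefficient S_avg across H runs; the
# per-run score already averages S_i over all samples.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def average_silhouette(data, k, runs=5):
    scores = []
    for _ in range(runs):
        labels = KMeans(n_clusters=k, n_init=1).fit_predict(data)
        scores.append(silhouette_score(data, labels))
    return float(np.mean(scores))

# Hypothetical usage; S_avg would then feed the DRL reward function.
s_avg = average_silhouette(np.random.rand(300, 4), k=8)
```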
To determine the optimal number of clusters $K$ for different datasets when integrating Birch and K-means, we employ a combination of the Birch algorithm's $CF$ tree structure and evaluation metrics such as the silhouette coefficient. The initial cluster centers provided by Birch improve the accuracy and efficiency of the K-means algorithm: starting from a more accurate initial estimate, K-means converges faster and yields better clustering results. Using the silhouette coefficient as an evaluation metric ensures that the clustering quality is continuously assessed and adjusted, which helps identify the number of clusters that best represents the underlying data structure.
One of the key challenges with blockchain sharding is the increase in cross-shard transactions, which introduce communication overhead and complexity. To mitigate this, our architecture uses a grouping mechanism that dynamically assigns IIoT devices to different shards based on network conditions, data characteristics, and load-balancing requirements. This grouping mechanism ensures that devices generating related data or tasks are placed in the same shard, thereby reducing the need for cross-shard communication. By minimizing cross-shard transactions, we reduce the associated overhead, improve the efficiency of the system, and maintain a high level of performance even as the network scales.
All blockchain verification nodes are assigned to different shards, and a shard ID is assigned to each node in a decentralized manner. In this paper, the blockchain system consists of $K+1$ shards denoted as $\left\{B_0, B_1, \cdots, B_K\right\}$, where $B_0$ is the joint committee and each shard consists of $\frac{M}{K+1}$ nodes. Each shard of the network has complete blockchain functionality: it independently processes the transactions in its corresponding transaction pool, and can independently generate a new data block and verify the integrity of the transactions in the block through intra-shard consensus. Blockchain network sharding proceeds in two steps: first, a certain number of nodes are elected to the joint committee; then the network is sharded and the remaining nodes are assigned to different shards.
Joint committee members are elected every epoch. At the end of the current term, each member can apply to be a candidate for the next epoch's committee, and each applicant is required to pledge a portion of their assets as collateral for rewards and penalties, deterring malicious nodes. During the election process, the collateralized values of all candidates are ranked, and the top $\frac{3M}{2(K+1)}$ members are selected to form the nomination pool. These $\frac{3M}{2(K+1)}$ members are then allowed to solve the PoW puzzle [28].
The node $ID$ is computed by solving the PoW puzzle:

$$ID = H\left(\text{EpochRand} \,\|\, PK \,\|\, IP \,\|\, \text{Nonce}\right) \leq D$$

where EpochRand denotes the random seed of PoW that is generated whenever the shards are reconstructed. The public key $PK$, the $IP$ address, and the random number Nonce are additionally used to compute the node $ID$. $H$ is the hash operator and $D$ is the difficulty level of the PoW algorithm.
The first $\frac{M}{K+1}$ nodes to complete the PoW puzzle are elected to the joint committee and are responsible for managing the allocation of subsequent nodes. After the election of the joint committee members is completed, the nodes that are not part of the joint committee continue to solve the PoW puzzle and broadcast the computed $ID$ value, the Nonce value, and the node's own $IP$ to the joint committee. When the joint committee receives the Nonce and $IP$ of a node, it sorts the list of nodes based on the $ID$ value and assigns nodes to the sharding network. For example, with a preset number of shards $K$, after sorting the $ID$s of the non-committee nodes, the first $\frac{M}{K+1}$ are assigned to shard 1, the next $\frac{M}{K+1}$ to shard 2, and so on.
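A minimal sketch of the election and assignment steps, assuming SHA-256 as the hash operator $H$ and a simple concatenation order inside the hash; both are illustrative assumptions.

```python
# PoW-based node ID computation and sorted shard assignment.
import hashlib

def node_id(epoch_rand: bytes, pk: bytes, ip: bytes, nonce: int) -> int:
    h = hashlib.sha256(epoch_rand + pk + ip + nonce.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big")

def solve_pow(epoch_rand, pk, ip, difficulty):
    # Search for a Nonce whose hash falls below the difficulty target D.
    nonce = 0
    while node_id(epoch_rand, pk, ip, nonce) > difficulty:
        nonce += 1
    return nonce

def assign_shards(sorted_ids, shard_size):
    # After sorting by ID, slice consecutive groups of M/(K+1) nodes
    # into shard 1, shard 2, and so on.
    return [sorted_ids[i:i + shard_size]
            for i in range(0, len(sorted_ids), shard_size)]
```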
At the same time, in order to prevent malicious collusion among nodes from falsifying block data validation, it is necessary to ensure the random distribution of nodes. Therefore, after the final consensus process of each epoch is completed, the above joint committee election and node allocation process is executed again: the old joint committee nodes are replaced with the newly elected ones, forming a new joint committee of $\frac{M}{K+1}$ nodes to guide the next round of the consensus protocol.
1) Intra-shard consensus: When the blockchain network sharding is completed, the data transactions uploaded by the IIoT devices are divided, according to the number of shards, into multiple transaction pools connected to the respective shard networks. Each IIoT device registers on the blockchain and writes the FL-trained models, grouping configurations, and local training rounds via smart contracts. The transaction is then broadcast to all IIoT devices in the blockchain. Each device is matched to its shard according to the assigned blockchain network address, which means the transaction pool is divided into $K$ pools corresponding to the blockchain shards. When a node in a shard receives the IIoT data in its corresponding transaction pool, the shard leader node packs this transaction into a local block and runs the PBFT consensus protocol.
2) Joint committee mechanism: The joint committee verifies the model parameters obtained from each shard and compares them with the global aggregation model parameters of the previous round to obtain a relevance score; only when the relevance score exceeds a certain threshold is the update deemed qualified and merged, based on PBFT consensus, with the local blocks uploaded by other shards. In addition, the joint committee generates an update block containing the training round $t_{fed}$, the global model parameters of the previous round, the global model parameters of the current round, and each shard address with its corresponding score. Finally, the final block formed by combining all the blocks is packed onto the blockchain, the committee is elected again, the blockchain network is re-partitioned, and the next round of training starts.
Thus, nodes within each shard store both the local blocks of the shard and the final blocks that are eventually merged; local model parameters and global aggregated model parameters are stored on each shard subchain, so nodes can access the latest aggregated model parameters for training at any time. If a malicious node participates in the training, its uploaded model parameters will not take part in the final aggregation, because the consensus mechanism of the joint committee defends against model poisoning attacks.
3) Incentive mechanism: In order to incentivize committee members to act honestly, a reward/penalty mechanism is introduced in this paper. In the joint committee election process, committee member candidates must pledge some assets on the blockchain. Assuming that the value of the pledged assets is $c$, the judgment is based on the ratio of the number of aggregated model parameters $K_B$ to the number of shards $K$ in the final consensus process.
$\lambda_a$ is used to determine whether a committee member's operation is invalid. If $\frac{K_B}{K}$ is less than the threshold $\lambda_a$, the representative's operation is judged invalid and all collateralized value is forfeited; $\lambda_a$ is determined from historical data on malicious node behavior, ensuring effective identification of inefficient or malicious operations. $\lambda_b$ is used to determine whether a committee member's operation is valid and eligible for rewards. If $\frac{K_B}{K}$ is greater than $\lambda_a$ but less than $\lambda_b$, the representative's operation is also ruled invalid, but the collateralized value is not forfeited; the reason is that the validation results may contain some errors that should not be attributed to the representative. If $\frac{K_B}{K}$ is greater than the threshold $\lambda_b$, the representative's payoff is positively correlated with the pledged value, the ratio $\frac{K_B}{K}$, and the model accuracy $a_t$. If the model accuracy drops significantly compared to the global model accuracy of the previous round, the value becomes negative, i.e., the reward turns into a penalty. $\lambda_b$ is likewise determined from the distribution of efficient operations in historical data, ensuring that committee members are incentivized to actively participate and improve system performance. In practical applications, $\lambda_a$ and $\lambda_b$ can be dynamically adjusted based on network load and device status to balance system security and incentive effectiveness.
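A minimal sketch of the reward/penalty judgment described above; the exact payoff formula in the valid case (collateral times ratio times accuracy change) is an assumption for illustration, chosen only to respect the stated correlations.

```python
# Reward/penalty judgment against the thresholds lambda_a and lambda_b.
def judge_committee(c, k_b, k, acc_t, acc_prev, lambda_a, lambda_b):
    ratio = k_b / k
    if ratio < lambda_a:
        return -c                     # invalid: forfeit all collateral
    if ratio < lambda_b:
        return 0.0                    # invalid, but collateral is kept
    # Valid: payoff grows with collateral, ratio, and accuracy change;
    # a significant accuracy drop turns the reward into a penalty.
    return c * ratio * (acc_t - acc_prev)

print(judge_committee(c=100, k_b=7, k=8, acc_t=0.91, acc_prev=0.88,
                      lambda_a=0.3, lambda_b=0.6))
```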
This incentive mechanism is designed to motivate the committee members to perform their tasks honestly and efficiently, ensuring the accuracy of the global model and the overall reliability of the system. By incorporating asset staking, we encourage stakeholders to maintain the integrity of the aggregation process, which ultimately improves the quality and accuracy of the trained models.
1) Initialize: In the initialization phase, we denote the set of IIoT devices as $N_{\mathcal{KS}} = \left\{N_{11}, \cdots, N_{KS}\right\}$, where $K$ is the number of shards and $S$ is the number of IIoT devices within a shard. The corresponding set of datasets is $D_{\mathcal{KS}} = \left\{D_{11}, \cdots, D_{KS}\right\}$, all local models are $w_{\mathcal{KS}} = \left\{w_{11}, \cdots, w_{KS}\right\}$, and the global initial model to be trained is $W^0$ (the superscript denotes the number of iteration rounds, and 0 means that no training has been performed yet).
2) Local model training within a shard: The joint committee obtains the training task and publishes it to each shard. Device $N_{ks}$ in the shard trains the model on its local dataset $D_{ks}$ to obtain the model parameters $w_{ks}$. The goal of the training for device $N_{ks}$ is to minimize $f_{ks}(w)$:

$$f_{ks}(w) = \frac{1}{n_k} \sum_{k=1}^{n_k} l\left(x_k, y_k, w\right)$$

where $l\left(x_k, y_k, w\right)$ represents the loss function of the current data sample, $n_k$ represents the local dataset size, $x_k$ is the feature vector, and $y_k$ is the label vector.
Given the model parameters $w^{t-1}$ at time $t-1$, each participating training node performs local stochastic gradient descent (SGD) to optimize its objective.
3) Intra-shard aggregation: When $N_{ks}$ completes its local training, it uploads the optimal model parameters $w_{ks}$, and the leader node $N_k$ elected by the shard aggregates the parameters within the shard using the federated averaging algorithm.
FedAvg is currently the most commonly used gradient aggregation algorithm; the shard-level model is obtained as the sample-weighted average

$$w_k = \sum_{s=1}^{S} \frac{n_s}{n} w_{ks}$$

where $n_s$ is the number of samples of client $s$ and $n$ is the total number of samples in the shard.
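A minimal sketch of the intra-shard FedAvg step, following the weighted average on line 25 of Algorithm 2; representing a model as a list of parameter tensors is an implementation choice of this sketch.

```python
# Sample-weighted FedAvg over client models within one shard.
import numpy as np

def fedavg(client_params, client_sizes):
    total = sum(client_sizes)
    weights = [n_s / total for n_s in client_sizes]
    # Weighted average across clients for every parameter tensor.
    return [sum(w * p[i] for w, p in zip(weights, client_params))
            for i in range(len(client_params[0]))]

# Hypothetical example with two clients and one weight matrix each.
w_shard = fedavg([[np.ones((2, 2))], [np.zeros((2, 2))]], [300, 100])
```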
4) Global aggregation: The leader nodes of all shards upload the optimal parameters $\left\{w_1, \cdots, w_K\right\}$ to the joint committee for global aggregation, where $K$ denotes the number of shards. The Pearson correlation coefficient between the local model update in the current round and the global model update in the previous round is calculated to assess the performance of the local update. $w_k^\tau$ is the local model parameter of the $k$th shard in the $\tau$th round of training, and $w_G^{\tau-1}$ is the global aggregation model parameter of the previous round $\tau-1$. According to the Pearson correlation coefficient algorithm, the correlation $r$ between the local model and the global model in this round is:

$$r = \frac{\operatorname{cov}\left(w_k^\tau, w_G^{\tau-1}\right)}{\sigma_{w_k^\tau}\, \sigma_{w_G^{\tau-1}}}$$
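A minimal sketch of the relevance score: flattening both parameter sets and taking their Pearson correlation with `numpy.corrcoef`.

```python
# Pearson correlation between a shard's update and the previous global model.
import numpy as np

def relevance_score(w_local, w_global_prev):
    a = np.concatenate([p.ravel() for p in w_local])
    b = np.concatenate([p.ravel() for p in w_global_prev])
    # np.corrcoef returns the 2x2 correlation matrix; take r = r_ab.
    return float(np.corrcoef(a, b)[0, 1])
```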
At the beginning of training, local models may not yet be very similar to the global model. If a large threshold is set at the start, too many local models may be filtered out, whereas a small threshold will let some undesirable models pass unfiltered. We therefore set the threshold dynamically, to the correlation of the unfiltered models in the previous round. Let the threshold after round $\tau$ be $\operatorname{th}(\tau)$; if the correlation $r$ between a local model and the global model is less than $\operatorname{th}(\tau)$, the model parameters are filtered out and do not participate in the global aggregation.
The final global aggregation model is:

$$W_G^{\tau+1} = \frac{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right) w_i^\tau}{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right)}$$
$K_m$ refers to the number of shards that survive filtering, $r_i$ is the model correlation coefficient of the $i$th shard, and the scaling factor $\beta_k$ together with $S(x) = \frac{1}{1+e^{-x}}$ is used to adjust the ratio of samples in shard $i$ to the total samples.
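A minimal sketch of the filtered global aggregation, combining the dynamic threshold $\operatorname{th}(\tau)$ with the sigmoid-scaled weights of the equation above; variable names are illustrative.

```python
# Filter shards below the threshold, then take the normalized weighted sum.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_aggregate(shard_params, r, n_i, n, beta_k, threshold):
    kept = [i for i in range(len(r)) if r[i] >= threshold]
    coef = np.array([r[i] * sigmoid(beta_k * n_i[i] / n) for i in kept])
    coef /= coef.sum()
    # Weighted sum of the surviving shard models, tensor by tensor.
    return [sum(c * shard_params[i][t] for c, i in zip(coef, kept))
            for t in range(len(shard_params[0]))]
```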
The bottleneck of the mechanism is sharding, which relies strongly on the construction of the shards: fixed sharding limits the throughput of FL-enabled blockchain frameworks. In this section, the sharding process and the blockchain parameter tuning are modeled as a Markov decision process (MDP). By defining the state space, action space, and reward function, the MDP agent can jointly optimize the system throughput and the FL accuracy by maximizing the reward function.
The workflow consists of five phases: shard configuration, FL training, shard consensus, aggregation, and broadcast. FL training and shard consensus are intra-shard, while aggregation and broadcast are inter-shard. We therefore divide the processing time of a shard-formation round into the intra-shard FL training time $T_{fed}$, the intra-shard consensus time $T_{con}^k$ of shard $k$, and the final consensus time $T_{fcon}^k$. The DRL training process is executed only before shard formation, and the FL training process starts only after the shards are formed. If the correlation of a shard's model parameters is too low, its parameters are excluded from the full FL aggregation.
Consensus delay includes message propagation and message validation delay. PBFT consensus is used within the shard: in the pre-prepare phase, the leader node of the shard broadcasts to the remaining $S-1$ nodes; in the prepare phase, the $S-1$ nodes broadcast messages to all nodes; and in the commit phase, all nodes broadcast messages to each other,
where $R_t$, $T_v$, and $M$ are the message transmission rate, message verification time, and message size for each stage.
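A minimal sketch of an intra-shard PBFT delay model under the assumption that each phase contributes its message count times a per-message cost $\frac{M}{R_t} + T_v$; the paper's exact expression may differ, so this is illustrative only.

```python
# Per-phase PBFT message counts following the broadcast pattern above.
def pbft_delay(S, M, R_t, T_v):
    per_msg = M / R_t + T_v
    pre_prepare = S - 1              # leader -> S-1 replicas
    prepare = (S - 1) * (S - 1)      # S-1 replicas -> all other nodes
    commit = S * (S - 1)             # every node -> every other node
    return (pre_prepare + prepare + commit) * per_msg

print(pbft_delay(S=13, M=1.0, R_t=10.0, T_v=0.01))
```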
Because the model parameters of each shard are sent to the joint committee, the inter-shard delay is $T_{fcon}^k$:
$T_{fcon}^k$ should be limited to the duration of one epoch, $T_{epoch}$.
Algorithm 2: Fed-Filt

1: procedure JointCommitteeExecutes
2:   initialize $W^0$
3:   for each round $\tau = 1, 2, \cdots, \tau_{\max}$ do
4:     for each shard $k \in \mathcal{K}$ do
5:       $w_k^{\tau+1} \leftarrow \operatorname{ShardUpdate}\left(k, W_G^\tau\right)$
6:     end for
7:     $\left\{r_k\right\}_{k \in \mathcal{K}}^\tau \leftarrow \operatorname{GetScore}()$ as in equation (11)
8:     for each $r_k$ do
9:       if $r_k < \operatorname{th}(\tau)$ then
10:        ditch $w_k^{\tau+1}$ in this round
11:      end if
12:    end for
13:    $W_G^{\tau+1} \leftarrow \frac{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right) w_i^\tau}{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right)}$
14:  end for
15: end procedure
16: procedure ShardUpdate$\left(k, W_G^\tau\right)$
17:   initialize batch size $B$, local rounds $E$
18:   for each client $s$ in shard $k$ do
19:     for each local epoch $t$ in $E$ do
20:       for each batch $b$ of samples based on $B$ do
21:         $w_{ks}^t \leftarrow w_{ks}^{t-1} - \eta \nabla l\left(w_{ks}^{t-1}, b\right)$
22:       end for
23:     end for
24:   end for
25:   $w_k \leftarrow \sum_{s=1}^{S} \frac{n_s}{n} w_{ks}$
26: end procedure
1) State Space: The state space at discrete time $t$ consists of the states of all IIoT devices $N$ in the shards, the data transfer rate $R_t$, the training time $T_{fed}$, and the final FL model accuracy $Acc_{W_G}$. Therefore, the state space can be represented as
2) Action Space: To adapt to the dynamic environment, several tunable parameters are required: the internal node balancing factor $B$, the leaf node balancing factor $L$, the cluster radius $\mathcal{T}$, and the data size $M$. Thus, the action space can be expressed as
3) Reward Function: The agent takes an action to obtain the next state and receives a reward based on the feedback. The reward function is designed to maximize the blockchain throughput and the grouping quality of the mechanism, and can be expressed as
$K = L$ is the number of shards and $M$ is the data size. If constraints $C1$, $C2$, and $C3$ are all satisfied at round $t$, the instantaneous reward is $R^t\left(S^t, A^t\right) = \mu_s S_\theta(t) + \mu_\omega \Omega(t)$; otherwise, it is 0. $S_\theta(t)$ is the evaluation metric for the IIoT user grouping, where $\theta = \{0, 1, 2, \cdots\}$ indexes different clustering evaluation metrics such as NMI, AMI, ARI, and the silhouette coefficient; $\Omega(t) = \frac{MK}{M_t T_{epoch}}$ is the throughput of the sharded blockchain; and $\mu_s$ and $\mu_\omega$ are weighting coefficients representing the objective's bias toward grouping quality or throughput performance, which can be dynamically adjusted according to the actual scenario.
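A minimal sketch of the instantaneous reward, gating the weighted sum of grouping quality and throughput on the constraints $C1$–$C3$; the constraint check is abstracted into a boolean flag.

```python
# Instantaneous reward: mu_s * S_theta(t) + mu_omega * Omega(t) when the
# constraints C1-C3 hold at round t, and 0 otherwise.
def reward(s_theta, M, K, M_t, T_epoch, mu_s, mu_omega,
           constraints_ok=True):
    if not constraints_ok:
        return 0.0
    throughput = (M * K) / (M_t * T_epoch)   # Omega(t)
    return mu_s * s_theta + mu_omega * throughput

print(reward(s_theta=0.8, M=512, K=8, M_t=1.0, T_epoch=10.0,
             mu_s=0.5, mu_omega=0.5))
```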
In order to obtain the best long-term aggregation rewards, the agent maximizes the expected discounted return:

$$\pi^* = \arg\max_\pi \; \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R^t\left(S^t, A^t\right)\right]$$

The temporal discount $\gamma \in [0, 1]$ determines how the agent weighs future rewards: too small a value of $\gamma$ makes the agent short-sighted, seeking to maximize short-term benefits, while a large value of $\gamma$ makes the agent take a long-term view and maximize gains over a longer time frame. $\pi$ is the behavioral strategy, and the goal is to find the optimal strategy $\pi^*$.
The experiments are implemented in Python 3.9 with TensorFlow 2.0; the CPU is an Intel(R) Xeon(R) Silver 4210R @ 2.40 GHz and the GPU an NVIDIA GeForce RTX 3060. The dataset is the Intel Berkeley Research Lab sensor stream, containing information (temperature, humidity, light, and sensor voltage) collected from 54 deployed sensors [55]; the sensor IDs serve as ground-truth class labels, so external evaluation metrics can be used for comparison. We use a subset of 13,477 samples for block training.
TABLE I

| Model | Accuracy (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| Birch-KMeans | 88.85 | 88.07 | 88.46 |
| DBSCAN | 73.07 | 71.09 | 72.08 |
Birch-KMeans achieves significantly higher accuracy (88.85%) and recall (88.07%) than DBSCAN (73.07% and 71.09%). This is attributed to Birch's preprocessing capability, which provides better initial cluster centers for K-means and improves clustering quality. The Birch-KMeans F1-score of 88.46% far exceeds DBSCAN's (72.08%), demonstrating balanced performance in precision and recall.
Fig. 2 shows the comparison of grouping quality under different grouping schemes. NMI measures the similarity between two groupings based on information theory, taking into account the mutual information and entropy of the groupings; a higher value indicates that the two groupings are more similar. AMI is an improved version of NMI that adjusts for the error introduced by random factors and thus measures grouping similarity more accurately; a higher value indicates that the groupings are more consistent. ARI calculates the relative consistency of groupings by
comparing the common points and differences between the actual groupings and the clustering results, with higher values indicating that the clustering results are more consistent with the actual groupings [30]. For DRL, we use the Twin Delayed DDPG (TD3) algorithm [31]. TD3 maintains two independent Q-networks, which estimate the action-value function more accurately and reduce its estimation error, improving the accuracy of strategy evaluation; its separation of the target strategy from the action strategy, adaptive target-strategy noise, standardized action space, and other mechanisms further improve stability, convergence, and exploration. We set $\gamma = 0.99$, $\tau = 0.005$, the learning rate $l_r = 0.0004$, the noise clipping to 0.5, the noise exploration range to 0.2, $\delta_0 = 0.4$, and the maximum number of shards $K$ to 64. We then run 100 episodes each for our proposed method, the variant without Birch preprocessing, and the DBSCAN-based method as comparison algorithms, with 30 DRL training steps per episode; all algorithms eventually converge. Birch preprocessing improves the grouping quality of the initial K-means and speeds up training convergence, yielding better training results. Compared with DBSCAN, our algorithm is better not only in external evaluation metrics but also in internal ones.
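For reference, the TD3 hyperparameters listed above can be collected in a configuration sketch; the dictionary keys are illustrative names rather than a specific library's API.

```python
# TD3 hyperparameters as reported in the text.
td3_config = {
    "gamma": 0.99,          # temporal discount
    "tau": 0.005,           # soft target-network update rate
    "lr": 0.0004,           # learning rate
    "noise_clip": 0.5,      # target-policy noise clipping
    "explore_noise": 0.2,   # exploration noise range
    "delta_0": 0.4,
    "max_shards": 64,       # upper bound on K
    "episodes": 100,
    "steps_per_episode": 30,
}
```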
Fig. 3 and Fig. 4 show the accuracy of FL under this sharding mechanism on the Skoltech Anomaly Benchmark (SKAB) dataset [32], an anomaly benchmark for evaluating anomaly detection algorithms. For the experimental setup, we set the number of shards to 4, the number of devices within a shard to 13, the number of local training rounds $\tau_{\max} = 40$, the number of global aggregation rounds to 100, and the learning rate $l_{rfed} = 0.02$. The comparison of the Fed-Filt, MISO, and FedAvg algorithms under different proportions of malicious nodes further validates their performance: Fig. 3 is under 25% malicious nodes, and Fig. 4 is under 50% malicious nodes. In the experiments, due to low model correlation in the early stages of training, the Fed-Filt algorithm cannot completely filter out malicious nodes under the preset correlation threshold (e.g., 0.45). This leads to some accuracy degradation in the initial phase, as the model is still affected by malicious nodes. However, as the training of normal nodes progresses, the correlation of malicious nodes drops significantly, leading to their dynamic exclusion. This allows the model accuracy to recover and eventually stabilize at a high level.
In comparison, MISO demonstrates a smoother declining trend in the early stages, showing some robustness against interference. However, MISO falls short of Fed-Filt in resisting malicious nodes and recovering in the later stages. Particularly in the final convergence phase, MISO's accuracy is noticeably lower than Fed-Filt's, indicating its limitations in dynamic filtering and model protection. By contrast, Fed-Filt's dynamic filtering mechanism not only enables rapid recovery during the mid-training phase but also achieves higher accuracy in the final convergence stage.
On the other hand, FedAvg, which aggregates all local models equally, fails to distinguish malicious nodes from normal ones effectively. This results in the model being persistently affected by malicious nodes, with accuracy significantly lower than both Fed-Filt and MISO. Overall, Fed-Filt demonstrates clear advantages in resisting poisoning attacks and in recovery capability, while MISO, though better than FedAvg, does not match the performance of Fed-Filt.
In this paper, we study a self-adaptive blockchain sharding strategy based on the grouping of IIoT devices, with FL anomaly detection training carried out once the grouping is completed. We innovate in the grouping algorithm, consensus mechanism, and aggregation process to improve the security of FL training while considering the scalability of the sharded blockchain and the reasonableness of device grouping. Specifically, we adopt Birch preprocessing to improve the convergence of K-means, and combine the distance matrix and FL to aggregate secure nodes together and improve the security of sharding. A joint committee mechanism combining a global-aggregation filtering mechanism with FL is used to improve model accuracy, and the self-adaptive DRL sharding mechanism is obtained by training a TD3 agent. The experimental results demonstrate the effectiveness of the scheme. At the same time, because the blockchain system has to be re-partitioned in every round, problems such as high computational consumption and tied-up miner assets remain, so the cross-shard transaction mechanism and the re-sharding mechanism need to be improved in the future. In this paper, we only consider model relevance as a screening criterion; in future work, we will research more reasonable judging criteria to better defend against model poisoning attacks.
Song Luo received his Master's degree in Communication and Information Systems from the Academy of Telecommunication Science and Technology in 2006. He is the deputy director of the Institute of Industrial Internet and Internet of Things of the China Academy of Information and Communications Technology, the head of the domestic counterpart group of the ITU-T Internet of Things and Smart City Study Group (SG20), and the vice chairman of the ITU-T SG20. He is currently a Ph.D. candidate at Tsinghua University; his research interests include Industrial Internet policies and technologies, basic common technical standards, and the industrial development of the Internet of Things.
Chao Ma received his Ph.D. in Communications Engineering from Aston University, Birmingham, UK in 2014. He joined the China Academy of Information and Communications Technology as the Director of the Business Development Department of the Institute of Industrial Internet and Internet of Things in 2020. He is also the vice chairman of WP2 of the ITU-T SG20, a registered expert of the ISO/IEC JTC1 SC41 IoT Technical Committee, and an expert of the W3C Web of Things. His research areas include IoT and smart city architecture and standards, blockchain technology and applications, and the Industrial Internet.
Yifei Wei received the B.Sc. and Ph.D. degrees in Electronic Engineering from Beijing University of Posts and Telecommunications (BUPT, China) in 2004 and 2009, respectively. He was a visiting Ph.D. student at Carleton University (Canada) from 2008 to 2009 and a postdoctoral research fellow at Dublin City University (Ireland) in 2013. He was the vice dean of the School of Science at BUPT from 2014 to 2016 and a visiting scholar at the University of Houston (USA) from 2016 to 2017. He is currently a Professor and the vice dean of the School of Electronic Engineering at BUPT. His current research interests are in intelligent optimization of network resources, deep learning, and blockchain technology.
Copyright © 2025 JCN.
Open Access Policy: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.