Anomaly Detection for Industrial Internet of Things Devices Based on Self-Adaptive Blockchain Sharding and Federated Learning

Song Luo; Pengyi Zeng; Chao Ma; Yifei Wei

doi:10.23919/JCN.2025.000019

Anomaly Detection for Industrial Internet of Things Devices Based on Self-Adaptive Blockchain Sharding and Federated Learning

Song Luo, Pengyi Zeng, Chao Ma, and Yifei Wei

Abstract: With the rapid growth of the Industrial Internet of Things (IIoT), more and more devices are connecting to the network, generating vast amounts of data, including sensors, actuators, and controllers. Traditional anomaly detection methods often rely on centralized data collection, leading to concerns about privacy leakage and data centralization. To address these challenges, approaches that combine federated learning (FL) with blockchain technology offer an efficient, scalable solution. These methods enable automatic scaling based on system size, adapting to increasing devices and data traffic. However, the limitations of fixed shards and security risks associated with the aggregation of data from different shards introduce new issues. To overcome these challenges, this paper proposes a self-adaptive blockchain sharding strategy, based on IIoT device grouping, which jointly optimizes the number of shards and the security of model updates. This optimization is modeled as a Markov Decision Process (MDP), with deep reinforcement learning (DRL) used to determine the optimal device sharding parameters. Furthermore, a joint committee mechanism is introduced to ensure secure cross-shard transactions, while a federated learning filtering mechanism (Fed-Filt) is applied to enhance the accuracy and security of global model aggregation by screening out malicious nodes. In the clustering experiments, the proposed method shows significant improvements in clustering quality metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), and silhouette score, with faster convergence. Experimental results demonstrate that under 25% and 50% malicious node scenarios, the proposed algorithm can effectively resist poisoning attacks and achieve stable convergence, outperforming the traditional FedAvg algorithm. Specifically, with 50% malicious nodes, the accuracy is improved by approximately 27.8%, and the method exhibits strong resistance and recovery capabilities.

Keywords: Anomaly detection, blockchain sharding, cluster, deep reinforcement learning, federated learning, IIoT

I. INTRODUCTION

WITH the increasing scale of industrial development, the era of Internet of Things (IoT) where everything is interconnected as envisioned by trustworthy, reliable and intelligent 6G communication network technology can provide higher quality of service and further promote the intelligent development of industry. But with that comes an explosion of accessible smart devices for the IIoT, massive amounts of data generated by sensors and other [1], [2]. Any abnormality in the equipment in industrial production is very likely to bring huge losses and even affect the subsequent production and develop- ment. However, the multivariate time-series data generated by sensors and others in the equipment can be used to analyze the stop loss [3]. Blockchain and Federated Learning (FL) are then widely used in this field to intelligently analyze the equipment data and detect the abnormal information in advance under the premise of security and privacy protection to safeguard the efficiency of industrial production.

FL safeguards the security of device data to a certain extent through the distributed training method, provides local collaborative training for IIoT devices, and shares global model parameters by aggregating the local model parameters. Due to the specificity of the FL aggregation method, it is easy to lead to security problems such as single point of failure and model inference attacks, blockchain as a decentralized, trace- able, tamper-proof and trusted network, through the consensus mechanism, encryption technology and so on can effectively solve the above mentioned challenges [4]. Therefore, in many studies, targeted solutions for IoT issues are often based on the integration of both methods. However, due to the blockchain ternary paradox problem, which states that, decentralization, security, and scalability can’t be satisfied at the same time, when facing a large number of IoT accessible nodes, transac- tion validation slows down, and the ledger and block size in the blockchain increase unsustainably. For example, Bitcoin specifies a maximum of no more than 1MB, and by adjusting block generation, the maximum system throughput is about 7–10 transactions. Too low throughput is obviously unable to adapt to the data processing of large-scale IoT devices [5], so focusing on the scalability of the blockchain is a major and important direction for the development of today’s IoT.

Based on existing research, sharding is shown to be one of the most promising ways to achieve horizontal scalability of blockchain networks. In sharding, nodes in the network are divided into many shards, each maintaining an independent blockchain, and multiple shards generate blocks in parallel to improve the overall throughput of the blockchain system [6], thus a large number of IoT systems based on sharding blockchain have emerged. Owing to the self-organizing nature of IoT computing tasks and the uncertainty of IoT device states, self-adaptive sharding strategies based on dynamic clus- tering should be prioritized. On the other hand, the throughput of a sharded blockchain is determined by the number of shards and the consensus efficiency of blocks within each shard. Theoretically, the higher the number of shards, the higher the blockchain throughput, which corresponds to a smaller number of nodes within the shards, while at the same time the cross-shard transactions increase, the consensus efficiency decreases, the throughput decreases, and the latency increases [7]. Therefore the number of shards is a major core point of sharding strategy. Dynamic clustering based sharding strategy determines the number of groupings by guiding the grouping of IIoT devices through machine learning. The most typical of clustering algorithms is the K-means algorithm, for which the initial parameters of clustering is particularly important and affects the effectiveness of subsequent grouping. Therefore, comprehensively considering the efficiency and throughput of anomaly detection, obtaining a more appropriate number of groupings through machine learning training is the focus of the research.

Based on the above problems, this paper proposes an self- adaptive blockchain sharding strategy based on grouping of IIoT devices, which uses deep reinforcement learning (DRL) to dynamically divide the shards and find the optimal parame- ters thus obtaining the maximized federated learning accuracy. The main contributions of this paper are as follows:

· We propose an optimized birch-kmeans clustering algo- rithm to group devices and then dynamically shard the network of blockchain by deep reinforcement learning.

· We propose a joint committee mechanism for cross-shard transactions, which is combined with FL to ensure the security of model updates across clusters. In the global aggregation of FL, the aggregation screening algorithm Fed-Filt is proposed to reduce malicious nodes and im- prove the accuracy of FL.

· We evaluate the algorithm using datasets from real- world sensors. The optimized birch-kmeans has better performance not only in external evaluation metrics but also in internal evaluation metrics. Fed-Filt has a large improvement in resistance to malicious nodes compared to the traditional FedAvg, which ensures the accuracy of the model.

The remainder of this paper is organized as follows. Section II is the latest research direction of blockchain and federal learning in anomaly detection of IIoT devices at home and abroad and the comparison with this paper. Section III is the overall architecture of the system and the partitioning strategy. Section IV describes the joint optimization problem in the industrial scene, which is solved by TD3. Section V is about the setting of experimental parameters and the analysis of experimental results. Section VI is the summary of the work and the prospect of the future.

II. RELATED WORK

A. Blockchain and Federated Learning

The combination of federated learning and blockchain can build a secure and trustworthy anomaly detection sys- tem. Santin et al. [8] proposed a FL anomaly detection architecture integrated into a blockchain, proposing a fully decentralized system with agents collaborating on demand and authentication through the blockchain to ensure security. Yazdinejad et al. [9] proposed a Block Hunter, cluster-based anomaly detection search framework considering the impact of blockchain’s throughput, block size, and so on. The above approaches perceive to some extent that the scalability, con- sensus delay, and security are important influencing factors of blockchain architecture for IIoT, so according to the different needs of different scenarios, the corresponding consensus protocols are designed to optimize on incentives, such as malicious node detection [10], incentive mechanism [11], [12], and so on.

Additionally, T vats et al. [13] emphasizes context-aware clustering and explainability in IIoT systems. Lacks sup- port for dynamic sharding and cross-device collaboration, making it unsuitable for real-time demands in large-scale IIoT networks.Our model optimizes sharding parameters using BIRCH-Kmeans clustering and DRL, supporting dynamic device grouping and load balancing. Gupta et al. [14] discusses using blockchain to secure searchable encrypted data in cloud- based healthcare systems, improving privacy and efficiency for federated learning. But it dosn’t combine federated learning with sharding, failing to improve model training efficiency and attack resistance.Our model integrates federated learning with blockchain sharding, significantly improving training efficiency and attack resistance through the joint committee mechanism and Fed-Filt algorithm. Nguyen et al. [15] inte- grates blockchain with deep learning to enhance the security and privacy of medical data, aligning with the security needs of FL in healthcare.

B. Clustering applied to IIoT

In IIoT, there are a large number of devices and sensors which may belong to different types or have different charac- teristics. Using clustering algorithms, devices can be automat- ically classified and managed by grouping them into different clusters based on their similarities. The Birch algorithm is a clustering algorithm for large-scale datasets, which is able to efficiently handle a large amount of IIoT datability. The goal of the K-means algorithm is to minimize intra-cluster differences and maximize inter-cluster differences [16]. Van- danov et al. [17] proposed a parallel primal pairwise heuristic algorithm to improve and optimize k-medoids on distance matrix computation and GPU parallel algorithms. For the processing of time-series data in IIoT, [18] proposed clustering algorithms such as integrated Ikmeans and birch, which do not require the initial setting of K-parameters to perform clustering and outlier detection. Zhang et al. [19] uses GANs to address class imbalance in IIoT anomaly detection. Does not integrate blockchain sharding with federated learning, limiting its ability to handle data privacy and security challenges in distributed environments. Our model combines adaptive blockchain shard- ing with federated learning to address data privacy issues and enhances security through the Fed-Filt filtering algorithm.

These works highlight the importance of clustering in handling device heterogeneity, optimizing performance, and improving the accuracy of anomaly detection in IIoT environ- ments.

C. Blockchain Sharding

Sharded blockchain is an extended form of blockchain technology that divides the entire blockchain network into multiple shards (or subchains), each of which is responsible for processing a portion of the data and transactions, thereby increasing the throughput and efficiency of the system. The design of a sharded blockchain allows data storage and val- idation to take place on multiple nodes, further enhancing data security. [20] utilizes blockchain to securely authenticate and manage healthcare data, demonstrating a use case for enhancing shard-level data security in real-time applications. But it does not optimize for IIoT anomaly detection and lacks a dynamic filtering mechanism for malicious nodes. Our model optimizes real-time anomaly detection and security in IIoT through adaptive sharding and the Fed-Filt algorithm, dynamically filtering malicious nodes. Furthermore, Luo et al. [21] proposed an autonomously scalable Fission sharding mechanism, a diffusion-based consensus algorithm linearly improves scalability, and a graph division of shards by location reduces cross-shard transactions. The number of nodes and transactions in the blockchain network may change over time.

If the sharding mechanism is static and fixed, it may lead to overloading of certain shards. A dynamic sharding mechanism can automatically adjust the size and composition of the shards according to the load of the network, achieving better load balancing and improving overall performance and throughput. Dynamic programming algorithm for DRL can learn the characteristics, node behaviors, and load conditions of the blockchain network and use the information for sharding strat- egy. Yang et al. [22] used an overlapping sharding architecture in order to solve the centralized problem and reduce the cross- shard transactions, and based on the information of local nodes and the frequency of transactions, they proposed DRL-OSS to implement the self-organizing sharding. Taking into account the FL in the IIoT, the sharding mechanism can be designed by combining the mechanism and efficiency of FL. For exam- ple, [23] proposed DRL-based adaptive sharding mechanism, which dynamically selects the sharding parameters through the reputation and throughput among the vehicles, and at the same time, adopts the two-layer FL framework of main chain and subchain, and reduces the cross-shard transactions based on the pipelined global aggregation approach.

In this paper, we focus on the unique characteristics of clustering in IIoT devices, particularly within the context of manufacturing environments. Specifically, we propose an integrated approach that combines the Birch and K-means clustering algorithms, where the accuracy of the model trained by the IIoT devices is integrated into the distance matrix to improve clustering accuracy. This method not only achieves fast clustering but also enhances the accuracy by considering various factors such as the number of shards, throughput, and the efficiency of federated learning.

III. SYSTEM MODEL

The self-adaptive blockchain sharding architecture inte- grating federated learning (FL) is designed to address the scalability, privacy, and efficiency challenges faced by IIoT systems. As shown in Fig. 1, the architecture consists of several key components, each playing an important role in ensuring the reliable processing and secure management of data and models in the IIoT network.

The system dynamically configures the blockchain network into multiple shards, each capable of independently process- ing a subset of the total transactions and data. This shard- based approach allows the system to scale and handle a larger number of transactions, while maintaining high levels of security and performance.The IIoT devices are the data producers in the system. These devices generate data (such as sensor readings) that need to be processed and used for model training. The tasks assigned to the IIoT devices are processed by the blockchain network, which manages these tasks through a smart contract mechanism. In Fig. 1, the label ”receives tasks” refers to the IIoT devices that are assigned specific tasks by the blockchain network. These tasks involve data collection and the training of local models. The blockchain, through its shard configuration, ensures that tasks are distributed in such a way that each device contributes to the training process while ensuring that the overall system remains scalable and efficient.

The federated learning process is integrated with the blockchain through smart contracts, which are triggered when a shard completes its task. These smart contracts ensure that the tasks, including model training and data aggregation, are executed in a secure and transparent manner. Once a round of training is complete, the updated model parameters from all shards are uploaded to a committee, which performs global ag- gregation. The aggregated parameters are then used to update the global model, which is further refined in subsequent rounds of training. This iterative process continues until the optimal model parameters are achieved. Once the model reaches an optimal state, the next sharding epoch begins, and the system starts the process again, dynamically adjusting the shards and continuing the federated learning cycle. This paper assumes the existence of N

This paper assumes the existence of $$\begin{equation} N \end{equation}$$ IIoT device nodes and $$\begin{equation} M \end{equation}$$ blockchain verification nodes, and DRL adaptive sharding of IIoT devices by clustering, dividing $$\begin{equation} N \end{equation}$$ nodes into $$\begin{equation} \mathcal{K}=\{1, \cdots, k, \cdots, K\} \end{equation}$$ group training shards, each shard has $$\begin{equation} \mathcal{S}=\{1, \cdots, s, \cdots, S\} \end{equation}$$ nodes. Relative to the device nodes, $$\begin{equation} M \end{equation}$$ blockchain verification nodes are divided into $$\begin{equation} K \end{equation}$$ group shards and a group of joint committees according to the shard ID, which is equivalent to the existence of $$\begin{equation} n_{k s} \end{equation}$$ shards. IIoT de- vices in different training shards collect local data as datasets, they receive model training tasks from the blockchain, and then complete the training tasks. Each node $$\begin{equation} n_{k s} \end{equation}$$ (representing the s node in the $$\begin{equation} k \end{equation}$$ shard) acts as a follower in the shard and a worker in FL to collaborate on the training task. Through

Fig. 1.

Adaptive Blockchain Sharding Architecture Incorporating Federated Learning.

the joint committee consensus mechanism, the committee is elected, and then the model parameters of the shards are scored. After screening, the committee aggregates the remaining qualified model parameters to obtain the global aggregation model.

Since the blockchain can dynamically configure the shard- ing, each $$\begin{equation} t_{\text {shard }} \end{equation}$$ will trigger a smart contract when it completes the sharding, issue transaction tasks, and conduct FL. The model training results of multiple shards will be uploaded to the committee for global aggregation, and multiple rounds of training will be conducted until the optimal model parameters of the current round are obtained and the next sharding epoch begins.

A. Grouping Strategy based on Birch-Kmeans

The traditional K-means clustering algorithm first randomly selects $$\begin{equation} K \end{equation}$$ initial cluster centers from the dataset, and then assigns the object to its nearest cluster center cluster based on the Euclidean distance between the rest of the data objects and these $$\begin{equation} K \end{equation}$$ cluster centers. Then these $$\begin{equation} K \end{equation}$$ clustering centers are calculated to be updated according to the sample midpoint average until the convergence state ($$\begin{equation} K \end{equation}$$ clustering centers no longer change) or the maximum number of iterations is reached. The K-means algorithm is sensitive to the selection of initial cluster centroids, and different initial selections may lead to different clustering results. Birch is a fast clustering algorithm based on hierarchical clustering. It clusters data hierarchically by constructing a $$\begin{equation} CF \end{equation}$$ tree, and dynamically divides and merges nodes during the tree construction pro- cess [24]. Eventually, the clustering results can be obtained by traversing the $$\begin{equation} CF \end{equation}$$ tree. Birch can automatically identify the cluster structure in the data, which provides a better initial clustering center for the subsequent K-means, and improves the accuracy of clustering. Through the preprocessing of Birch, the original data can be compressed and summarized, thus reducing the computation of the subsequent K-means.

Birch constructs a $$\begin{equation} CF \end{equation}$$ tree by scanning the $$\begin{equation} N \end{equation}$$ dimensional dataset of the device, where each leaf node $$\begin{equation} C F_{i N o d e} \end{equation}$$ represents a sub-cluster, and the other non-leaf nodes are denoted as $$\begin{equation} \left\{C F_i, \text { child }_i\right\} \end{equation}$$, with $$\begin{equation} \operatorname{child}_i \end{equation}$$ denoting the ith pointed child node,

Table 1.

표 설명

Algorithm 1	Birch-Kmeans based on Federated Learning
Input:	Input data:$$\begin{equation} D_N \end{equation}$$; Balancing factor:$$\begin{equation} B, L \end{equation}$$, L; Threshold:$$\begin{equation} \mathcal{T} \end{equation}$$
1:	Create $$\begin{equation} CF \end{equation}$$ tree, traverse the dataset adding nodes one by one to get $$\begin{equation} K=L \end{equation}$$ leaf nodes: $$\begin{equation} \left\{C F_{1 \text { Node }}, C F_{2 \text { Node }}, \cdots, C F_{K \text { Node }}\right\}, \forall C F_{i \text { Node }} \in D_N \end{equation}$$;
2:	Initialize the initial $$\begin{equation} K \end{equation}$$ centroids of kmeans: $$\begin{equation} \left\{C F_{1 \text { Node }}^{\text {init }}, \cdots, C F_{K \text { Node }}^{\text {init }}\right\}=\left\{C F_{1 \text { Node }}, \cdots, C F_{K \text { Node }}\right\} \end{equation}$$
3:	procedure
4:	while $$\begin{equation} \text { iter }<E \end{equation}$$ do
5:	for $$\begin{equation} D_i \end{equation}$$ in $$\begin{equation} D_N \end{equation}$$ do
6:	Calculate the distance $$\begin{equation} d \end{equation}$$ to the $$\begin{equation} K \end{equation}$$ centroids as in equation(1);
7:	Assign $$\begin{equation} D_i \end{equation}$$ to the cluster corresponding to the nearest centroids;
8:	end for
9:	for each cluster $$\begin{equation} k \end{equation}$$ in $$\begin{equation} \mathcal{K} \end{equation}$$ do
10:	he new clustering centroids are calculated as in equation(2):$$\begin{equation} \left\{C F_{1 N o d e}^{n e w}, \cdots, C F_{K N o d e}^{n e w}\right\} \end{equation}$$
11:	if No change in the center of clustering then
12:	Output the grouping results $$\begin{equation} \mathcal{K} \end{equation}$$
13:	else
14:	$$\begin{equation} \begin{aligned} & \left\{C F_{1 \text { Node }}, \cdots, C F_{\text {KNode }}\right\} \leftarrow \\ & \left\{C F_{1 N o d e}^{\text {new }}, \cdots, C F_{\text {KNowe }}^{\text {new }}\right\} \end{aligned} \end{equation}$$
15:	end if
16:	end for
17:	end while
18:	return $$\begin{equation} \left\{C F_{1 N o d e}^{n e w}, \cdots, C F_{K \text { Node }}^{n e w}\right\} \end{equation}$$
19:	end procedure

and $$\begin{equation} CF_i \end{equation}$$ denoting the sub-cluster that is the location of the child node. Each leaf node $$\begin{equation} C F_{i \text { iNode }} \end{equation}$$, where i$$\begin{equation} i=1,2, \cdots, L \end{equation}$$, and $$\begin{equation} L \end{equation}$$ represents the maximum number of leaf nodes in the $$\begin{equation} CF \end{equation}$$ tree, serves as the initial clustering center for the subsequent K-means.

Euclidean distance formula between data objects in space:

(1)

$$\begin{equation} d\left(x, C F_{i N o d e}\right)=\sqrt{\sum_{l=1}^m\left(x_j-C F_{i j}\right)^2} \end{equation}$$

Where $$\begin{equation} x \end{equation}$$ is the data, $$\begin{equation} C F_{i \text { iNode }} \end{equation}$$ is the $$$$th clustering center, $$$$ is the dimension of the data, and $$\begin{equation} x_j \end{equation}$$, $$\begin{equation} C F_{i j} \end{equation}$$ is the $$\begin{equation} j \end{equation}$$th attribute value of $$\begin{equation} x \end{equation}$$ and $$\begin{equation} C F_{i \text { iNode }} \end{equation}$$.

The following is the update within the clusters, the point within the cluster with the smallest average distance from all nodes is chosen as the center of the cluster

(2)

$$\begin{equation} \begin{aligned} C F_{i N o d e}^{n e w} & =\arg \max _s\left(\sum_{r, s \in K_i} \delta_i d(r, s)\right), \\ \delta_i & =\delta_0 \times r_k \times S\left(\beta_i \times \frac{n_i}{n}\right) . \end{aligned} \end{equation}$$

$$\begin{equation} r, s \end{equation}$$ are all data objects inside the $$\begin{equation} i \end{equation}$$th cluster $$\begin{equation} K_i \end{equation}$$, and $$\begin{equation} \delta_i \end{equation}$$ is the federated learning impact factor of client $$\begin{equation} i \end{equation}$$, which determines the impact of FL accuracy for space distance. $$\begin{equation} r_k \end{equation}$$ is the model correlation coefficient of the last round of shards where client $$\begin{equation} i \end{equation}$$ is located, and $$\begin{equation} \frac{n_i}{n} \end{equation}$$ is the ratio of the samples that client $$\begin{equation} i \end{equation}$$ participated in the training to the total samples in this round of FL training, and scaling factors $$\begin{equation} \beta_i \end{equation}$$ and $$\begin{equation} S(x)=\frac{1}{1+e^{-x}} \end{equation}$$ are used to adjust for proportionality effects. $$\begin{equation} \delta_i \end{equation}$$ takes the value (0–1). Highly correlated models in homogeneous datasets might show less variation in $$\begin{equation} r_k \end{equation}$$, whereas heterogeneous datasets will likely show significant differences. Ensuring accurate calculation of$$\begin{equation} r_k \end{equation}$$ is crucial for maintaining the balance between collaboration and individual model performance. The scaling function’s sensitivity is influenced by $$\begin{equation} \beta_i \end{equation}$$, which needs to be tuned according to the dataset size and distribution. For larger datasets, a smaller $$\begin{equation} \beta_i \end{equation}$$ might suffice, while smaller datasets might require a larger $$\begin{equation} \beta_i \end{equation}$$ to ensure meaningful scaling.

Evaluating the quality of clustering results is an important task, for this reason, both external and internal indicators are used in this paper. NMI, AMI, ARI are used to compare the similarity or difference between the clustering results and the real labels [25], with the help of silhouette coefficient to assess the grouping quality of birch-Kmeans algorithms based on the characteristics and statistical information of the clustering results themselves [26].

The silhouette coefficient is an evaluation index of the degree of density and dispersion of classes and is defined as follows:

(3)

$$\begin{equation} \begin{aligned} S_i & =\frac{b(i)-a(i)}{\max \{a(i), b(i)\}}, \\ a(i) & =\frac{1}{n_c-1} \sum_{i, j \in \mathcal{C} F_c, i \neq j} d(i, j), \\ b(i) & =\min _{p, p \neq c}\left\{\frac{1}{n_p} \sum_{i \in C F_c, j \in C F_p} d(i, j)\right\} . \end{aligned} \end{equation}$$

The average distance between data point $$\begin{equation} i \end{equation}$$ and other points in the same cluster is $$\begin{equation} a(i) \end{equation}$$, and the minimum value of the average distance between each data point $$\begin{equation} i \end{equation}$$ and all points in different clusters is denoted as $$\begin{equation} b(i) \end{equation}$$. Where $$\begin{equation} n_c \end{equation}$$ denotes the number of data objects in cluster $$\begin{equation} c, d(i, j) \end{equation}$$ is the Euclidean distance between $$\begin{equation} i, j, a(i) \end{equation}$$ denotes the average distance between and all data objects in the same cluster as $$\begin{equation} i \end{equation}$$, and $$\begin{equation} b(i) \end{equation}$$ denotes the minimum average distance between and all data objects in different clusters as $$\begin{equation} i \end{equation}$$. $$\begin{equation} S_i \end{equation}$$ takes values in [-1,1], and the larger the value is, the more reasonable is the clustering of the data object into a cluster [27]. The data are initially processed by Birch, then the K-means algorithm is iterated $$\begin{equation} H \end{equation}$$ times, the distance matrix $$\begin{equation} \operatorname{dist}(j) \end{equation}$$ of each iteration is recorded and all the individual silhouette coefficients are averaged to obtain $$\begin{equation} S_{a v g} \end{equation}$$, which is put into the reward function in DRL to optimize the selection of the initial parameters of Birch.

To determine the optimal number of clusters $$\begin{equation} K \end{equation}$$ for different datasets when integrating Birch and K-means, we employ a combination of the Birch algorithm’s $$\begin{equation} CF \end{equation}$$ tree structure and evaluation metrics such as the Silhouette Coefficient. The initial cluster centers provided by Birch help improve the accuracy and efficiency of the K-means algorithm. By starting with a more accurate initial estimation, K-means can converge faster and yield better clustering results. The use of silhouette coefficient as an evaluation metric ensures that the clustering quality is continuously assessed and adjusted. This approach helps in identifying the optimal number of clusters that best represent the underlying data structure.

B. Joint Committee Mechanism

One of the key challenges with blockchain sharding is the increase in cross-shard transactions, which can introduce communication overhead and complexity. To mitigate this, our architecture uses a grouping mechanism that dynamically assigns IIoT devices to different shards based on network conditions, data characteristics, and load balancing require- ments. This grouping mechanism ensures that devices gen- erating related data or tasks are placed in the same shard, thereby reducing the need for cross-shard communication. By minimizing cross-shard transactions, we reduce the associated overhead, improve the efficiency of the system, and maintain a high level of performance even as the network scales.

All the blockchain verification nodes are assigned to dif- ferent shards and shard ID is assigned to each node in a decentralized manner. In this paper, the blockchain system consists of $$\begin{equation} k+1 \end{equation}$$ shards denoted as $$\begin{equation} \left\{B_0, B_1, \cdots, B_k\right\} \end{equation}$$, where $$\begin{equation} B_0 \end{equation}$$ is the joint committee and each shard consists of $$\begin{equation} \frac{M}{K+1} \end{equation}$$ nodes. Each shard of the network has complete blockchain functionality, which means that each shard independently pro- cesses the transactions in the transaction pool corresponding, and can independently generate a new data block and verify the integrity of the transactions in the block through intra-shard consensus. Blockchain network sharding is mainly divided into two steps: first, a certain number of nodes are elected to the joint committee, and then the network is sharded and the remaining nodes are assigned to different shards.

Joint committee members are elected every epoch, and at the end of the current term, each member can apply to be a candidate for the next epoch of committee members, and each applicant is required to pledge a portion of their assets for incentives and disincentives to prevent malicious nodes. During the election process, the collateralized values of all candidates are ranked and the top $$\begin{equation} \frac{3 M}{2(K+1)} \end{equation}$$ members are selected to form the nomination pool.Then these $$\begin{equation} \frac{3 M}{2(K+1)} \end{equation}$$ members are allowed to solve the PoW puzzle [28].

(4)

$$\begin{equation} I D=H(\text { EpochRand }\|I P\| P K \| \text { Nonce })<D . \end{equation}$$

Where $$\begin{equation} \text { EpochRand } \end{equation}$$ denotes the random seed of PoW that is generated whenever reconstructing the shard. The public key $$\begin{equation} PK \end{equation}$$, $$\begin{equation} IP \end{equation}$$ and the random number Nonceare additionally used to compute the node $$\begin{equation} ID \end{equation}$$. $$\begin{equation} H \end{equation}$$ is the hash operator and $$\begin{equation} D \end{equation}$$ is the difficulty level of the PoW algorithm.

The first $$\begin{equation} \frac{M}{K+1} \end{equation}$$ nodes to complete the PoW puzzle will be elected to the joint committee and will be responsible for managing the allocation of subsequent nodes. After the election of the joint committee members is completed, the nodes that are not part of the joint committee will continue to complete the PoW puzzle and broadcast the computed correct $$\begin{equation} I D \end{equation}$$ value, $$\begin{equation} \text { Nonce } \end{equation}$$ value and the node’s own $$\begin{equation} I P \end{equation}$$ to the joint committee. When the joint committee receives $$\begin{equation} \text { Nonce } \end{equation}$$ and $$\begin{equation} I P \end{equation}$$ of the node, it will sort the list of nodes based on the $$\begin{equation} I D \end{equation}$$ value and assign nodes to the sharding network. For example, the preset number of shards is $$\begin{equation} K \end{equation}$$. After sorting the $$\begin{equation} I D \end{equation}$$ of each non joint committee, the first $$\begin{equation} \frac{M}{K+1} \end{equation}$$ is assigned to shard 1, followed by $$\begin{equation} \frac{M}{K+1} \end{equation}$$ to shard 2, and so on.

At the same time, in order to prevent malicious collusion among nodes from causing falsification of block data valida- tion, it is necessary to ensure the random distribution of nodes. Therefore, after the final consensus process of each epoch is completed, the above joint committee election and node network allocation process will be executed again to replace the old joint committee nodes with the newly elected joint committee nodes and form a new joint committee consisting of $$\begin{equation} \frac{M}{K+1} \end{equation}$$ nodes to guide the next round of the consensus protocol.

1) Intra-shard consensus: When the blockchain network sharding is completed, according to the number of shards, the data transactions uploaded by the IIoT devices will be divided into multiple data transaction pools, which are connected to multiple shard networks respectively. Each IIoT device registers on the blockchain and writes the FL-trained models, grouping configurations, and local training rounds via smart contracts. The transaction is then broadcasted to all IIoT devices in the blockchain. Each device is matched to the corresponding shard according to the assigned blockchain network address, which means that the transaction pool is divided into the number of corresponding blockchain shards $$\begin{equation} K \end{equation}$$. When a node in a shard receives the IIoT data in its corresponding transaction pool, the shard leader node will pack this transaction into the local block and run the PBFT consensus protocol.

2) Joint committee mechanism: The joint committee ver- ifies the model parameters obtained from each shard, and then the model parameters are compared with the global aggregation model parameters of the previous round to obtain a relevance score, and only when the relevance score is larger than a certain threshold can it be proved that this is a qualified update, which is merged with the local blocks uploaded by other shards based on the PBFT consensus. In addition to this, the joint committee generates an update block containing the training round $$\begin{equation} t_{f e d} \end{equation}$$, the global model parameters of the previous round, the global model parameters of the current round, the shard address and its corresponding score. Finally, the final block generated by combining all the blocks together is packed onto the blockchain, the committee will be elected again, the blockchain network is re-partitioned, and the next round of training starts.

Thus nodes within each shard store the local blocks of the shard as well as the final blocks that are eventually merged together, and local model parameters and global aggregated model parameters are stored on each shard subchain, so nodes have access to the latest aggregated model parameters for training at any time. If a malicious node participates in the training, the uploaded model parameters will not participate in the final aggregation due to the consensus mechanism of the joint committee to defend against model poisoning attacks.

3) Incentive mechanism: In order to incentivize committee members to act honestly, a reward/penalty mechanism is in- troduced in this paper. In the joint committee election process, committee member candidates must pledge some assets on the blockchain. Assuming that the value of the pledged assets is $$\begin{equation} c \end{equation}$$, the judgment is based on the ratio of the number of aggregation model parameters $$\begin{equation} K_B \end{equation}$$, to the number of shards $$\begin{equation} K \end{equation}$$, in the final consensus process.

(5)

$$\begin{equation} R P=\left\{\begin{array}{cl} -c, & 0<\frac{K_B}{K} \leq \lambda_a \\ 0, & \lambda_a <\frac{K_B}{K}<\lambda_b \\ \varphi\left(a_t\right) c \frac{K_B}{K}, & \frac{K_B}{K} \geq \lambda_b \end{array}\right. \end{equation}$$

$$\begin{equation} \lambda_a \end{equation}$$ is used to determine whether the committee member’s operation is invalid. If $$\begin{equation} \frac{K_B}{K} \end{equation}$$ is less than the threshold $$\begin{equation} \lambda_a \end{equation}$$, the representative’s operation will be judged invalid and all collateralized value will be forfeited. $$\begin{equation} \lambda_a \end{equation}$$ is determined based on historical data of malicious node behavior, ensuring ef- fective identification of inefficient or malicious operations. $$\begin{equation} \lambda_b \end{equation}$$ is used to determine whether the committee member’s operation is valid and eligible for rewards. If $$\begin{equation} \frac{K_B}{K} \end{equation}$$ is greater than the threshold $$\begin{equation} \lambda_a \end{equation}$$ and less than the threshold $$\begin{equation} \lambda_b \end{equation}$$, the operation of the representative will also be ruled invalid, but the collateralized value will not be forfeited. The reason for this is that the validation results may contain some errors that should not be attributed to the representative. If $$\begin{equation} \frac{K_B}{K} \end{equation}$$ is greater than the threshold $$\begin{equation} \lambda_b \end{equation}$$, the representative’s payoff is positively correlated with the mortgage value, the ratio of $$\begin{equation} \frac{K_B}{K} \end{equation}$$ , and the model accuracy $$\begin{equation} a_t \end{equation}$$. If the model accuracy drops significantly compared to the global model accuracy in the previous round, the value will be negative, i.e., the reward will change to a penalty. $$\begin{equation} \lambda_b \end{equation}$$ is also determined based on the distribution of efficient operations in historical data, ensuring that committee members are incentivized to actively participate and improve system performance. In practical applications, $$\begin{equation} \lambda_a \end{equation}$$ and $$\begin{equation} \lambda_b \end{equation}$$ can be dynamically adjusted based on network load and device status to balance system security and incentive effectiveness.

This incentive mechanism is designed to motivate the com- mittee members to perform their tasks honestly and efficiently, ensuring the accuracy of the global model and the overall reliability of the system. By incorporating asset staking, we encourage stakeholders to maintain the integrity of the ag- gregation process, which ultimately improves the quality and accuracy of the trained models.

C. Fed-Filt

1) Initialize: In the initialization phase, we consider the IIoT set of devices in the $$\begin{equation} i \end{equation}$$th subgroup as $$\begin{equation} N_{\mathcal{K S}}=\left\{N_{11}, \cdots, N_{K S}\right\}, \end{equation}$$ where $$$$ is the number of shards, and inline-formula>$$\begin{equation} S \end{equation}$$ is the number of IIoT devices within the shard. The set of this dataset is $$\begin{equation} D_{\mathcal{K S}}=\left\{D_{11}, \cdots, D_{K S}\right\}, \end{equation}$$, all local models are $$\begin{equation} w_{\mathcal{K S}}=\left\{w_{11}, \cdots, w_{K S}\right\}, \end{equation}$$, and the global initial model to be trained is $$\begin{equation} W^0 \end{equation}$$ (the subscripts denote the number of iteration rounds, and 0 means that no training has been performed yet).

2) Local model training within a shard: The joint commit- tee obtains the training task and publishes the training task to each shard. Devices $$\begin{equation} N_{k s} \end{equation}$$ in the shard train the model using the local dataset $$\begin{equation} D_{k s} \end{equation}$$ to obtain the model parameters inline-formula>$$\begin{equation} w \end{equation}$$. The goal of the training for device $$\begin{equation} N_{k s} \end{equation}$$ is to minimize $$\begin{equation} f_{k s}(w) \end{equation}$$:

(6)

$$\begin{equation} f_{k s}(w)=\frac{1}{n_k} \sum_{t=1}^{n_k} l\left(x_k, y_k, w\right) \end{equation}$$

$$\begin{equation} l\left(x_k, y_k, w\right) \end{equation}$$ represents the loss function of the current data sample, $$\begin{equation} n_k \end{equation}$$ represents the local dataset size, $$\begin{equation} x_k \end{equation}$$ is the feature vector, and $$\begin{equation} y_k \end{equation}$$ is the label vector.

Get the gradient $$\begin{equation} w^{t-1} \end{equation}$$ at the moment $$\begin{equation} t-1 \end{equation}$$, and each participating training node performs local stochastic gradient descent (SGD) to optimize its objective.

(7)

$$\begin{equation} w^t=w^{t-1}-\eta \nabla f_{k s}\left(w^{t-1}\right) \end{equation}$$

(8)

$$\begin{equation} \min l=\sum_{t=0}^n l\left(y_k, f\left(x_k\right)\right) \end{equation}$$

3) Intra-shard aggregation: When $$\begin{equation} N_{k s} \end{equation}$$ completes the local training, $$\begin{equation} N_{k s} \end{equation}$$ uploads the optimal model parameters $$\begin{equation} w_{k s} \end{equation}$$, and the leader node $$\begin{equation} N_k \end{equation}$$ elected by the shard will use the federated averaging algorithm to aggregate the best parameters in the shard.

(9)

$$\begin{equation} F(w)=\sum_{s=1}^{|S|} \frac{D_{k s}}{\sum_{i=1}^{|S|} D_{k i}} l_{k s}(w, x, y) . \end{equation}$$

The gradient aggregation process is currently the most commonly used FedAvg algorithm. The gradient $$\begin{equation} w^{t-1} \end{equation}$$ at the moment $$\begin{equation} t-1 \end{equation}$$ is first obtained, and each participating training node performs local stochastic gradient descent (SGD) to optimize its objective.

(10)

$$\begin{equation} w_{k s}^t=w_{k s}^{t-1}-\eta \nabla F_{k s}\left(w_{k s}^{t-1}\right) . \end{equation}$$

4) Global aggregation: The leader nodes of all the shards will upload the optimal parameters $$\begin{equation} \left\{w_1, \cdots, w_K\right\} \end{equation}$$to the joint committee for global aggregation, and $$$$$ denotes the number of shards. The Pearson correlation coefficient between the local model update in the current round and the global model update in the previous round is calculated to determine the performance of the local update. $$\begin{equation} w_k^\tau \end{equation}$$ is the local model param- eter of the kth shard in the $$\begin{equation} \tau \end{equation}$$th round of training, and $$\begin{equation} w_G^{\tau-1} \end{equation}$$ is the global aggregation model parameter of the previous $$\begin{equation} \tau-1 \end{equation}$$ round. According to the Pearson correlation coefficient algorithm, the Pearson correlation coefficient $$\begin{equation} r \end{equation}$$ between the local model and the global model in this round is:

(11)

$$\begin{equation} \begin{aligned} r & =\frac{\operatorname{cov}\left(w_k^\tau, W_G^{\tau-1}\right)}{\sigma_{w_k^\tau} \sigma_{W_G^{\tau-1}}} \\ & =\frac{E\left[\left(w_k^\tau-\mu_{w_k^\tau}\right)\left(W_G^{\tau-1}-\mu_{W_G^{\tau-1}}\right)\right]}{\sqrt{\sigma_{w_k^\tau} \sigma_{W_G^{\tau-1}}^2}} . \end{aligned} \end{equation}$$

At the beginning of training, local models may not be too similar to the global model. Therefore, if a large threshold is set at the beginning, too many local models may be filtered. Whereas if a small threshold is set at the beginning, some unnecessary models will not be filtered. We choose to set the threshold dynamically to the correlation at the location of the unfiltered models in the previous round. Assuming that the threshold after round $$\begin{equation} tau \end{equation}$$ is $$\begin{equation} \operatorname{th}(\tau) \end{equation}$$ and the correlation $$\begin{equation} r \end{equation}$$ between local and global models is less than $$\begin{equation} \operatorname{th}(\tau) \end{equation}$$, the model parameters are filtered and do not participate in the global aggregation.

The final global aggregation model is:

(12)

$$\begin{equation} W_G^\tau=\frac{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right) w_i^\tau}{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right)} . \end{equation}$$

$$\begin{equation} K_m \end{equation}$$ refers to the number of filtered shards, $$\begin{equation} r_i \end{equation}$$ is the model cor- relation coefficient of the $$\begin{equation} i \end{equation}$$th shard, and the scaling factors $$\begin{equation} \beta_k \end{equation}$$ and $$\begin{equation} S(x)=\frac{1}{1+e^{-x}} \end{equation}$$ are used to adjust the ratio of samples in shard $$\begin{equation} i \end{equation}$$ to the total samples.

IV. DRL-BASED DYNAMIC SHARDING SCHEME

A. Performance Analysis

The bottleneck of the mechanism is sharding, which strongly relies on the construction of the shards. Fixed sharding will limit the throughput of FL-enabled blockchain frameworks. In this section, the process of sharding and the blockchain parameter tuning are modeled as Markov decision- making process. By defining the state space, action space, and reward function, the MDP agents can achieve the joint optimization of the system throughput and the FL accuracy by maximizing the reward function.

The workflow consists of five phases: shard configuration, FL training, shard consensus, aggregation and broadcast. FL training and shard consensus belong to intra-shard, and inter- shard aggregation and broadcast. Therefore, we divide the processing time in the shard formation round into the intra- shard FL training time $$\begin{equation} T_{f e d}, \end{equation}$$, the intra-shard delay time $$\begin{equation} T_{c o n}^k \end{equation}$$ for $$\begin{equation} k \end{equation}$$ shards, and the consensus time $$\begin{equation} T_{c o n}^k \end{equation}$$ for the final consensus process. The DRL training process is executed only before shard formation, and the FL training process is started only after the shard formation. Then, if the correlation of the shard model parameters is too low, the FL full aggregation cannot be executed.

(13)

$$\begin{equation} \begin{aligned} T_{\text {shard }} & =T_{\text {con }}^k+T_{\text {fcon }}^k, \\ T_{\text {con }}^k & =T_{\text {prop }}^k+T_{\text {val }}^k+\tau_{\text {max }} T_{\text {fed }}^k, \\ T_{\text {fcon }}^m & =T_{\text {fprop }}^m+T_{\text {fval }}^m . \end{aligned} \end{equation}$$

Consensus delay includes message propagation and message validation delay. PBFT consensus is used within the shard, and in the pre-preparation phase, the leader node within the shard will broadcast to the remaining $$\begin{equation} S-1 \end{equation}$$ nodes. In the preparation phase, the $$\begin{equation} S-1 \end{equation}$$ nodes broadcast messages to all the nodes. And in the commit phase, all the nodes broadcast messages to each other,

(14)

$$\begin{equation} \begin{aligned} T_{p r o p}^k & =2 S(S-1) \frac{M}{R_t} \\ T_{v a l}^k & =3 T_v \end{aligned} \end{equation}$$

where $$\begin{equation} R_t \end{equation}$$, $$\begin{equation} T_v \end{equation}$$ and $$\begin{equation} M \end{equation}$$ are the message transmission rate, message verification time and message size for each stage.

Because of the model parameters for each shard are sent to the joint committee, the inter-shard delay $$\begin{equation} \begin{equation} T_{\text {fcon }}^k \end{equation} \end{equation}$$:

(15)

$$\begin{equation} T_{f c o n}^k=[K+2 K(K+1)] \frac{M}{R_t}. \end{equation}$$

$$\begin{equation} \begin{equation} T_{\text {fcon }}^k \end{equation} \end{equation}$$ should be limited to the duration of $$\begin{equation} \begin{equation} T_{\text {fcon }}^k \end{equation}. \end{equation}$$.

Algorithm 2	Fed-Filt
1:	procedure JOINT COMMITTEE EXECUTES
2:	initialize $$\begin{equation} W^0 \end{equation}$$
3:	for each round $$\begin{equation} \tau=1,2, \cdots, \tau_{\max } \end{equation}$$ do
4:	for each shard $$\begin{equation} k \in \mathcal{K} \end{equation}$$ do
5:	$$\begin{equation} w_k^{\tau+1} \leftarrow \operatorname{ShardUpdate}\left(k, W_G^\tau\right) \end{equation}$$
6:	end for
7:	$$\begin{equation} \left\{r_k\right\}_{k \in \mathcal{K}}^\tau \leftarrow \operatorname{GetScore}() \end{equation}$$ as equation(11)
8:	for each $$\begin{equation} r_k \end{equation}$$ do
9:	if $$\begin{equation} r_k>\operatorname{th}(\tau) \end{equation}$$ then
10:	Ditch $$\begin{equation} w_k^{\tau+1} \end{equation}$$ in this round
11:	end if
12:	end for
13:	$$\begin{equation} W_G^{\tau+1} \leftarrow \frac{\sum_{i=1}^{K_m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right) w_i^\tau}{\sum_{i=1}^{K m} r_i \times S\left(\beta_k \times \frac{n_i}{n}\right)} \end{equation}$$
14:	end for
15:	end procedure
16:	procedure SHARDUPDATE$$\begin{equation} \left(k, W_G^\tau\right) \end{equation}$$
17:	initialize batch size $$\begin{equation} B \end{equation}$$, local round $$\begin{equation} E \end{equation}$$
18:	for each client $$\begin{equation} s \end{equation}$$ in the shard $$\begin{equation} k \end{equation}$$ do
19:	for each local epoch $$\begin{equation} t \end{equation}$$ in $$\begin{equation} E \end{equation}$$ do
20:	for batch $$\begin{equation} b \end{equation}$$ samples based on $$\begin{equation} B \end{equation}$$ do
21:	$$\begin{equation} w_{k s}^t \leftarrow w_{k s}^{t-1}-\eta \nabla l\left(w_{k s}^{t-1}, b\right) \end{equation}$$
22:	end for
23:	end for
24:	end for
25:	$$\begin{equation} w_k \leftarrow \sum_{s=1}^S \frac{n_s}{n} w_{k s} \end{equation}$$
26:	end procedure

B. Model Designq

1) State Space: The state space at the discrete-time calen- dar element as the states of all IIoT devices N in the shard, the data transfer rate $$\begin{equation} R_t \end{equation}$$, the training time $$\begin{equation} T_fed \end{equation}$$, and the FL final model accuracy $$\begin{equation} A c c_{W_G} \end{equation}$$. Therefore, the state space can be represented as

(16)

$$\begin{equation} S^t=\left[N, R_t, T_{f e d}, A c c_{W_G}\right]^t \end{equation}$$

2) Action Space: To adapt to the dynamic environment, several measures are required. They include internal node balancing factor $$\begin{equation} B \end{equation}$$, leaf node balancing factor $$\begin{equation} L \end{equation}$$, cluster radius $$\begin{equation} \mathcal{T} \end{equation}$$, and data size $$\begin{equation} M \end{equation}$$. Thus, the action space can be expressed as

(17)

$$\begin{equation} A^t=[\mathrm{B}, \mathrm{~L}, \mathcal{T}, \mathrm{M}]^t . \end{equation}$$

3) Reward Function: The slice takes action to obtain the next state and gets a reward function based on the feedback. The reward function is used to maximize the blockchain throughput and the grouping quality of the mechanism, which can be expressed as

(18)

$$\begin{equation} \begin{array}{ll} P 1: & \max _A \mathcal{Q}(S, A), \\ C 1: & T_{\text {con }}^k \leq \rho T_{\text {fcon }}^k, \\ C 2: & K \leq K_{\max }, \\ C 3: & M \leq M_{\max } . \end{array} \end{equation}$$

$$\begin{equation} K=L \end{equation}$$ is the number of shards and $$\begin{equation} M \end{equation}$$ is the data size. If $$\begin{equation} C1 \end{equation}$$, $$\begin{equation} C2 \end{equation}$$, and $$\begin{equation} C3 \end{equation}$$ are all satisfied at round $$\begin{equation} t \end{equation}$$, then the instantaneous reward function is $$\begin{equation} R^t\left(S^t, A^t\right)=\mu_s S_\theta(t)+\mu_\omega \Omega(t) \end{equation}$$. If neither is satisfied, it is 0. $$\begin{equation} S_\theta(t) \end{equation}$$ is the evaluation metrics for IIoT user groupings, where $$\begin{equation} \theta=\{0,1,2, \cdots\} \end{equation}$$ denotes different evaluation metrics for clustered groupings, such as NMI, AMI, ARI, silhouette coefficient, $$\begin{equation} \Omega(t)=\frac{M K}{M_t T_{\text {eopch }}} \end{equation}$$ is the throughput of the sharded blockchain, and $$\begin{equation} \mu_s \end{equation}$$ and $$\begin{equation} \mu_w \end{equation}$$ are weighting coefficients representing the objective’s bias on grouping quality and throughput performance, which can be dynamically adjusted according to the actual scenario.

In order to receive the best long-term aggregation rewards,

(19)

$$\begin{equation} \begin{aligned} & \mathcal{Q}^*(S, A)= \\ & \quad \mathbb{E}_{\pi^*}\left[\sum_{t=0}^{\infty} \gamma^t R^t\left(S^t, A^t\right) \mid S^{(0)}=S, A^{(0)}=A, \pi\right] \end{aligned} \end{equation}$$

The temporal discount $$\begin{equation} \gamma \in[0,1] \end{equation}$$ determines how the agent views future rewards. Too small a value of $$\begin{equation} \gamma \end{equation}$$ will cause the agent to be short-term and seek to maximize short-term benefits, and a large value of $$\begin{equation} \gamma \end{equation}$$ will cause the agent to take a long-term view and maximize gains over a longer time frame. $$\begin{equation} \pi \end{equation}$$ is the behavioral strategy, and the goal is to find the optimal strategy $$\begin{equation} \pi^* \end{equation}$$.

V. SIMULATION RESULT

The simulation tool for this experiment is python3.9, tensorflow2.0, CPU is Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz, GPU is NVIDIA GeForce RTX 3060, and the dataset is Sensor stream contains information (temperature, humidity, light, and sensor voltage) collected from 54 sensors deployed in Intel Berkeley Research Lab [55], where the sensor IDs are used as class labeled real signatures, so that external evaluation metrics can be used as a comparison. In this we use a subset of 13477 as a block training.

TABLE I

PERFORMANCE COMPARISON.

Model	Accuracy(%)	Recall(%)	F1-score(%)
Birch-Kmeans	88.85	88.07	88.46
DBSCAN	73.07	71.09	72.08

Birch-Kmeans achieves significantly higher accuracy is 88.85% and recall is 88.07% compared to DBSCAN (73.07% and 71.09%). This is attributed to Birch-Kmeans preprocessing capability, which provides better initial cluster centers for Kmeans, improving clustering quality. Birch-Kmeans F1-score is 88.46% which is far exceeds DBSCAN (72.08%), demon- strating its balanced performance in precision and recall.

Fig. 2 shows the comparison of the quality of different subgroups under different grouping schemes. NMI measures the similarity between two subgroups based on the concept of information theory, which is able to take into account the mutual information and entropy of the subgroups, and a higher value indicates that the two subgroups are more similar. AMI is an improved version of NMI, which is able to more accurately measure the similarity of the subgroups by adjusting for the error generated by random factors, and a higher value indicates that the subgroups are more consis- tent. ARI calculates the relative consistency of groupings by

Fig. 2.

Comparison of evaluation metrics for different subgroups. (a) is a comparison of NMI, (b) is a comparison of AMI, (c) is a comparison of ARI, and (d) is a comparison of Silhouette Coefficient.

comparing the common points and differences between the actual groupings and the clustering results, with higher values indicating that the clustering results are more consistent with the actual groupings [30]. For DRL, we use Twin Delayed DDPG (TD3) algorithm [31], TD3 uses Twin Q-networks, which can estimate the action value function more accurately by maintaining two independent Q-networks, reducing the error estimation of the action value function, thus improving the accuracy of strategy evaluation and improving the accuracy of strategy evaluation in terms of the separation of the target strategy from the action strategy separation, adaptive target strategy noise and standardized action space, and other mech- anisms are improved, with better stability, convergence and explorability. We set $$\begin{equation} \gamma=0.99 \end{equation}$$, $$\begin{equation} t a u=0.005 \end{equation}$$, the learning rate as $$\begin{equation} l_r=0.0004 \end{equation}$$, the noise trimming as 0.5, the noise exploration range as 0.2, $$\begin{equation} \delta_0=0.4 \end{equation}$$, and the maximum number of shard $$\begin{equation} K \end{equation}$$ as 64, and after that, we respectively perform 100 episodes using our proposed method without birch preprocessing and using DBSCAN method as a comparison algorithm, and each round has 30 rounds of stepnum DRL training, it can be seen that all the algorithms converge successfully in the end. The preprocessing of birch algorithm makes the grouping quality of the initial k-means to be improved and speeds up the convergence of the training, and better training results are obtained. Also compared to DBSCAN, our algorithm is better not only in external evaluation metrics but also in internal evaluation metrics.

Fig. 3 and Fig. 4 show the accuracy of FL under the valida- tion of this sharding mechanism with the Skoltech Anomaly Benchmark (SKAB) dataset [32], which is an anomaly bench-

Fig. 3.

Model accuracy and loss within 25% malicious nodes.

mark for evaluating anomaly detection algorithms. For the experimental setup, we set the number of shards to 4, the number of devices within a slice to 13, the number of local training rounds $$\begin{equation} \tau_{\max }=40 \end{equation}$$, the number of global aggregation rounds to 100, and the learning rate $$\begin{equation} l_{r f e d}=0.02 \end{equation}$$. The comparison of Fed-Filt, MISO, and FedAvg algorithms under different proportions of malicious nodes further validates their performance. Fig. 3 is under 25% malicious nodes, and Fig. 4 is under 50% malicious nodes. In the experiments, due to low model correlation in the early stages of training, the Fed- Filt algorithm cannot completely filter out malicious nodes under the preset correlation threshold (e.g., 0.45). This leads to some accuracy degradation in the initial phase, as the model is still affected by malicious nodes. However, as the training of normal nodes progresses, the correlation of malicious nodes drops significantly, leading to their dynamic exclusion. This allows the model accuracy to recover and eventually stabilize at a high level.

In comparison, the MISO demonstrates a smoother declin- ing trend in the early stages, showing some robustness against interference. However, MISO falls short in resisting malicious nodes and recovering in the later stages, compared to Fed-Filt. Particularly in the final convergence phase, MISO’s accuracy is noticeably lower than Fed-Filt, indicating its limitations in dynamic filtering and model protection. By contrast, Fed- Filt’s dynamic filtering mechanism not only enables rapid recovery during the mid-training phase but also achieves higher accuracy in the final convergence stage.

On the other hand, FedAvg, which aggregates all local mod- els equally, fails to distinguish malicious nodes from normal ones effectively. This results in the model being persistently affected by malicious nodes, with accuracy significantly lower than both Fed-Filt and MISO. Overall, Fed-Filt demonstrates clear advantages in resisting poisoning attacks and recovery capabilities, while MISO, though better than FedAvg, does not match the performance of Fed-Filt.

VI. CONCLUSION

In this paper, we study the self-adaptive blockchain sharding strategy based on the grouping of IIoT devices, and the FL anomaly detection training is carried out when the grouping

Fig. 4.

Model accuracy and loss within 50% malicious nodes.

is completed. We innovate in the grouping algorithm, con- sensus mechanism and aggregation process to improve the security of FL training while considering the scalability of the sharded blockchain and the reasonableness of device grouping. Specifically, we adopt Birch for preprocessing to improve the convergence of K-means, and combine distance matrix and FL to aggregate secure nodes together to improve the security of sharding. A joint committee mechanism is used to improve the accuracy of the model by combining the filtering mechanism of global aggregation with FL. Self-adaptive DRL sharding mechanism is obtained by training with TD3 agent. The experimental results also show the effectiveness of the scheme. At the same time, because the blockchain system has to be re-partitioned in every round, there are problems such as high computational consumption and miner assets, so the transaction mechanism and the partitioning mechanism between the partitions have to be improved in the future. In this paper, we only consider model relevance as a screening criterion, and in the future, we will research to find more reasonable judging criteria to better defend against model poisoning attacks.

Biography

Song Luo

Song Luo received his Master degree in Communication and Information System from the Academy of Telecommunication Science and Technology in 2006. He is the deputy director of the Institute of Industrial Internet and Internet of Things of the China Academy of Information and Communications Technology, the head of the domestic counterpart group of the ITU-T Internet of Things and Smart City Study Group (SG20), and the vice chairman of the ITU-T SG20. He is currently a Ph.D. candidate in Tsinghua University, his research interests include Industrial Internet policies and technologies, basic common technical standards and industrial development of the Internet of Things.

Biography

Pengyi Zeng

Pengyi Zeng is recently studing for a Master’s degree in Electronic Engineering at Beijing University of Posts and Telecommunications. Her research areas are blockchain, industrial Internet of Things, and federal learning.

Biography

Chao Ma

Chao Ma received his Ph.D. in Communications Engineering from Aston University, Birmingham, UK in 2014. He joined the China Academy of Information and Communications Technology, as the Director of the Business Development Department of the Institute of Industrial Internet and Internet of Things in 2020. He is also the vice chairman of WP2 of the ITU-T SG20, the registered expert of ISO/IEC JTC1 SC41 IoT Technical Committee, the expert of W3C Web of Things, Research areas include IoT and smart city architecture and standards, blockchain technology and applications, industrial Internet.

Biography

Yifei Wei

Yifei Wei received the B.Sc. and Ph.D. degrees in Electronic Engineering from Beijing University of Posts and Telecommunications (BUPT, China), in 2004 and 2009, respectively. He was a visiting Ph.D. student in Carleton University (Canada) from 2008 to 2009. He was a postdoctoral research fellow in the Dublin City University (Ireland) in 2013. He was the vice dean of school of science in BUPT from 2014 to 2016. He was a visiting scholar in the University of Houston (USA) from 2016 to 2017. He is currently a Professor and the vice-dean of school of electronic engineering at BUPT. His current research interests are in intelligent optimization of network resources, deep learning and blockchain technology.

References

1. S. De, M. Bermudez-Edo, H. Xu, and Z. Cai, "Deep generative models in the industrial internet of things: A survey,"IEEETrans.Ind.Informat., vol. 18, no. 9, pp. 5728-5737, 2022.
2. M. Alabadi, A. Habbal, and X. Wei, "Industrial internet of things: Requirements, architecture, challenges, and future research directions," IEEE Access, vol. 10, pp. 66 374-66 400, 2022.
3. S. Misra, C. Roy, T. Sauter, A. Mukherjee, and J. Maiti, "Industrial internet of things for safety management applications: A survey," IEEE Access, vol. 10, pp. 83 415-83 439, 2022.
4. B. Chhetri, S. Gopali, R. Olapojoye, S. Dehbashi, and A. S. Namin, "A survey on blockchain-based federated learning and data privacy," in Proc. COMPSAC, 2023.
5. R. Huoetal., "A comprehensive survey on blockchain in industrial internet of things: Motivations, research progresses, and future challenges," IEEE Commun. Surveys Tuts., vol. 24, no. 1, pp. 88-122, 2022.
6. V . K, J. R, G. Kommineni, M. Tanna, and G. Prerna, "Sharding in blockchain systems: Concepts and challenges," in Proc. SMART GENCON, 2022.
7. H. Mao, T. Nie, H. Sun, D. Shen, and G. Yu, "A survey on crosschain technology: Challenges, development, and prospect,"IEEEAccess, vol. 11, pp. 45 527-45 546, 2023.
8. G. Santin, I. Skarbovsky, F. Fournier, and B. Lepri, "A framework for verifiable and auditable collaborative anomaly detection," IEEE Access, vol. 10, pp. 82 896-82 909, 2022.
9. A. Yazdinejad et al., "Block hunter: Federated learning for cyber threat hunting in blockchain-based iiot networks," IEEE Trans. Ind. Informat., vol. 18, no. 11, pp. 8356-8366, 2022.
10. Y . Xu et al., "Besifl: Blockchain-empowered secure and incentive federated learning paradigm in iot," IEEE Internet Things J., vol. 10, no. 8, pp. 6561-6573, 2023.
11. Z. Wang, Q. Hu, R. Li, M. Xu, and Z. Xiong, "Incentive mechanism design for joint resource allocation in blockchain-based federated learning," IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 5, pp. 1536-1547, 2023.
12. Y . Chen et al., "Dim-ds: Dynamic incentive model for data sharing in federated learning based on smart contracts and evolutionary game theory,"IEEEInternetThingsJ., vol. 9, no. 23, pp. 24 572-24 584, 2022.
13. T. Vats et al., "Explainable context-aware iot framework using human digital twin for healthcare," Multimedia Tools and Appl., pp. 1-25, 2023.
14. B. B. Gupta et al., "Blockchain-assisted secure fine-grained searchable encryption for a cloud-based healthcare cyber-physical system," IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 12, pp. 1877-1890, 2021.
15. G. N. Nguyenetal., "Secure blockchain enabled cyber-physical systems in healthcare using deep belief network with resnet model," J. Parallel Distrib. Comput., vol. 153, pp. 150-160, 2021.
16. Y . Yin, J. Jang-Jaccard, F. Sabrina, and J. Kwak, "Improving multilayerperceptron(mlp)-based network anomaly detection with birch clustering on cicids-2017 dataset," in Proc. IEEE CSCWD, 2023.
17. S. Vandanov, A. Plyasunov, and A. Ushakov, "Parallel clustering algorithm for the k-medoids problem in high-dimensional space for largescale datasets," in Proc. OPCS, 2023.
18. C. Pacher et al., "Awt - clustering meteorological time series using an aggregated wavelet tree," in Proc. IEEE DSAA, 2022.
19. L. Zhang, S. Jiang, X. Shen, B. B. Gupta, and Z. Tian, "Pwg-ids: an intrusion detection model for solving class imbalance in iiot networks using generative adversarial networks," arXiv preprint arXiv:2110.03445, 2021.
20. A. Raj and S. Prakash, "A privacy-preserving authentic healthcare monitoring system using blockchain," International J. Softw. Sci. Comput. Intell., vol. 14, no. 1, pp. 1-23, 2022.
21. C. Luo et al., "Fission: Autonomous, scalable sharding for iot blockchain," in Proc. IEEE COMPSAC, 2022.
22. X. Yang et al., "An overlapping self-organizing sharding scheme based on drl for large-scale iiot blockchain," IEEE Internet Things J., vol. 11, no. 4, pp. 5681-5695, 2024.
23. Y . Lin et al., "Drl-based adaptive sharding for blockchain-based federated learning," IEEE Trans. Commun., vol. 71, no. 10, pp. 5992-6004, 2023.
24. N. D. Sugiharto et al., "Mall customer clustering using gaussian mixture model, k-means, and birch algorithm," in Proc. ICOIACT, 2023.
25. Y . Tang, Y . Xie, C. Zhang, Z. Zhang, and W. Zhang, "One-step multiview subspace segmentation via joint skinny tensor learning and latent clustering," IEEE Trans. Cybern., vol. 52, no. 9, pp. 9179-9193, 2022.
26. M. Wati et al., "Implementation of the k-means clustering for the public health center data," in Proc. ICEECIT, 2022.
27. H. Shin, E. Jang, S. Kim, and K. Y . Lee, "Sdcg: Silhouette-based deep clustering with gnn for improved graph node clustering," in Proc. IEEE/ACIS SERA, 2023.
28. Zilliqa. The zilliqa technical whitepaper. 2017. (Online). Available: https://docs.zilliqa.com/whitepaper.pdf
29. X. Zhu, "Stream data mining repositoryr," http://www.cse.fau.edu/ xqzhu/stream.html, 2010.
30. Q. Li, R. Ren, and L. Zhao, "An adaptive spectral clustering algorithm for high-dimensional data," in Proc. CSECS, 2023.
31. Y . Cui, T. Lv, and Y . Cao, "Drl-based resource management in risassisted uplink cell-free network," in Proc. IEEE GC Wkshps, 2022.
32. I. D. Katser and V . O. Kozitsin, "Skoltech anomaly benchmark (skab)," https://www.kaggle.com/dsv/1693952, 2020.

Journal of Communications and Networks

Received: March 27 2024

Revision received: March 1 2025

Accepted: April 6 2025

Published (Electronic): April 30 2025

Song Luo, Tsinghua University, Department of Computer Science and Technology, and also with China Academy of Information and Communi- cations Technology (CAICT), Institute of Industrial Internet and Internet of Things, Beijing, China, luosong@caict.ac.cn.

Pengyi Zeng, School of Electronic Engineering, Beijing University of Posts and Telecommunications Ringgold standard institution, Beijing, China, zengpy@bupt.edu.cn

Chao Ma, China Academy of Information and Communications Technology (CAICT), Institute of Industrial Internet and Internet of Things, Beijing, China, machao@caict.ac.cn

Yifei Wei, School of Electronic Engineering, Beijing University of Posts and Telecommunications Ringgold standard institution, Beijing, China, weiyifei@bupt.edu.cn

Article information Continued

Copyright © 2025 JCN.
Open Access Policy: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.