Key Indicator Prediction

A method for predicting key indicators of industrial processes based on LSTM, Transformer, and Continual Learning (CL) theory

© Haodong Li

Overview

Technology route and project development plan
  • In modern industrial processes, it is necessary to monitor the quality variables and critical variables of the production process in order to improve product quality and yield, maintain production safety, and save energy.
  • In this project, we collaborated with a company to apply cutting-edge AI techniques to key indicator prediction, using data they provided from the blast furnace ironmaking production process: changes in the content of several chemical elements in the hot metal, and changes in production environment factors such as CO concentration and air supply temperature.
  • The project proceeded in four steps:
    • First, we analyzed the overall data and designed a corresponding data pre-processing method;
    • second, we built a deep learning framework and 4 benchmark models based on long short-term memory (LSTM) networks for experiments;
    • third, we built a model based on the Transformer architecture, applied it to time-series indicator prediction, and conducted experiments;
    • fourth, drawing on continual learning (CL) theory, which mimics the hippocampus and neocortex of the human brain, we built a continual-learning temporal-data prediction model based on Multi-Head Self-Attention, 1-dimensional Residual Neural Network (ResNet), LSTM, and Multilayer Perceptron (MLP), and conducted experiments.
  • Finally, this project recorded and summarized the experimental results. For 6-dimensional key variable prediction, the best of the 4 LSTM-based models built in this project achieves an \(R^2\) accuracy of 0.9570 with an RMSE loss of 0.03959; the Transformer-based model achieves 0.9524 / 0.05201; the CL-based model achieves 0.9474 / 0.05312. In multi-step forecasting, however, the CL-based model has the highest \(R^2\) accuracy for time steps greater than 2, reaching 0.9146 at time step 3; for time steps greater than 8 it also has the lowest RMSE loss, reaching 0.06789 at step 9. In 1-dimensional quality variable prediction, the current best method (SoTA) achieves 0.9334 / 0.03596; the best LSTM-based model built in this project achieves 0.9024 / 0.03928; the Transformer-based model achieves 0.9902 / 0.00923, outperforming SoTA by about 5.68%; the CL-based model achieves 0.9352 / 0.03656, almost equal to SoTA.
  • Code is available at: https://github.com/lebronlihd/key-indicator_prediction.
  • Keywords: Blast furnace ironmaking; Temporal data prediction; Continual Learning; Attention Mechanism; Transformer; LSTM; Residual Neural Network

Data Analysis & Pre-processing

  • The dataset we used covers 29070 time steps of 115 dimensions; 6 of these are key indicators and the remaining 109 are auxiliary indicators.
Data characteristics, correlation distribution & frequency-domain distribution of the 6 selected key variables: Hot metal Si (01), Hot metal S (53), Hot metal Mn (54), Hot metal P (55), Hot metal C (56), Hot metal Ti (57)
  • The first row shows the data characteristics & correlation distribution among variables.
    • The upper part of each graph is the distribution line of the data: yellow is the original data, and red is the data after mean smoothing;
    • the lower half of each graph shows the distribution of correlation coefficients between that indicator and all 115 indicators.
  • Images in the second row show the frequency-domain distribution of the 6 key variables.
  • The data pre-processing we applied is max-min normalization plus numpy.nan_to_num; a minimal sketch follows.
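For reference, below is a hedged sketch of this pre-processing (plus the mean smoothing shown in the figure above). The function names, smoothing window, and epsilon term are illustrative assumptions, not the project's exact code.

```python
import numpy as np

def preprocess(raw: np.ndarray) -> np.ndarray:
    """Clean and scale a raw indicator matrix of shape (time_steps, 115)."""
    data = np.nan_to_num(raw)                     # replace NaN/inf with finite values
    col_min = data.min(axis=0, keepdims=True)     # per-indicator minimum
    col_max = data.max(axis=0, keepdims=True)     # per-indicator maximum
    return (data - col_min) / (col_max - col_min + 1e-8)  # scale to [0, 1]

def mean_smooth(series: np.ndarray, window: int = 25) -> np.ndarray:
    """Moving-average smoothing (the red curves in the figure above)."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")
```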

LSTM-based Model Design

  • This project designed 4 variants based on LSTM, called Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM, detailed below; a sketch of the simplest variant appears after the figure.
Basic architectures of Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM (EfficientNetV2_LSTM combines tf.keras.applications.EfficientNetV2S with an LSTM)
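To make the designs concrete, here is a hedged sketch of the simplest variant in tf.keras; the layer widths, depth, and optimizer settings are assumptions, with a 50-step input window matching the sequence length used throughout this project. CNN_LSTM and ResNet_LSTM would prepend convolutional (respectively residual) feature extractors to a similar LSTM stack, and EfficientNetV2_LSTM uses the EfficientNetV2S backbone as noted in the caption above.

```python
import tensorflow as tf

def build_simple_lstm(seq_len: int = 50, n_features: int = 6, n_targets: int = 6) -> tf.keras.Model:
    """Stacked LSTM encoder followed by a dense regression head."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_features)),
        tf.keras.layers.LSTM(128, return_sequences=True),  # full output sequence
        tf.keras.layers.LSTM(64),                          # keep last hidden state only
        tf.keras.layers.Dense(n_targets),                  # 6 key indicators
    ])

model = build_simple_lstm()
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
```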

Transformer-based Model Design

  • Transformer is a major innovation in deep learning in recent years. Although it was born for Neural Machine Translation (NMT), we believe that the Transformer, built on Multi-Head Self-Attention, is very likely to be competent for the key-variable prediction problem in the blast furnace ironmaking scenario. This chapter introduces the Transformer-based model designed in this project.
  • As shown in the figures below, the Transformer-based model built in this project adopts a Sequence-to-Sequence DataLoader design: the input data is shifted to the right, and the prediction step serves as the corresponding target output (see the window-construction sketch after the figures). The sequence length is set to 50, consistent with the 4 LSTM-based models and the CL-based model described below. Unlike them, however, the Transformer-based model contains two input channels:
    • one channel (Channel A) passes through Positional Encoding and the Transformer Encoder; the data shape changes from 50×6 to 512×6, and it enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Key and Value;
    • the other channel (Channel B), after Positional Encoding, likewise changes shape to 512×6 and enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Query. After the Transformer Decoder computation, an output of shape 50×6 is obtained.
Overview of the Transformer-based model
Detailed architecture of the Transformer-based model (left: Encoder, i.e. Channel A; right: Decoder, i.e. Channel B + Decoder)
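Below is a minimal sketch of the Sequence-to-Sequence window construction described above: the decoder input (Channel B) is the prediction window shifted right by one step. The helper name and the `horizon` parameter are assumptions; the project's exact shifting scheme may differ.

```python
import numpy as np

def make_seq2seq_windows(series: np.ndarray, seq_len: int = 50, horizon: int = 1):
    """Build (encoder_input, decoder_input, target) windows from a (T, 6) array."""
    enc_in, dec_in, target = [], [], []
    for t in range(len(series) - seq_len - horizon):
        src = series[t : t + seq_len]                      # Channel A: observed window
        tgt = series[t + horizon : t + horizon + seq_len]  # window to be predicted
        enc_in.append(src)
        dec_in.append(np.vstack([src[-1:], tgt[:-1]]))     # Channel B: shifted right
        target.append(tgt)
    return np.asarray(enc_in), np.asarray(dec_in), np.asarray(target)
```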

CL-based Model Design

  • The human brain realizes learning through two systems, the “hippocampus” and the “neocortex”. The hippocampus focuses on rapid learning, and through memory consolidation, memories in the hippocampus are transferred to the neocortex over time to form long-term memories. Note, however, that Continual Learning (CL) in the current academic literature is mainly aimed at Domain Adaptation (DA) and Transfer Learning (TL) across datasets in the field of Computer Vision (CV), especially image classification benchmarks such as CIFAR100. The nature of the data used in this project is special, and conditions for model-transferability research are lacking. Therefore, the CL-based model introduced in this chapter is still a time-series prediction model, but it is inspired by CL. The main idea is that two branches, FastNet and SlowNet, learn from New Data and Memory respectively.
Overview of the CL-based model
  • Technically, the CL-based model designed in this project mainly applies Multi-Head Self-Attention, 1-dim ResNet, LSTM, and MLP. As shown in the figures above and below, the sequence length of New Data is set to 50, the sequence length of Memory is set to 500, and New Data is continually appended to Memory. Macroscopically, the model consists of two branches, FastNet and SlowNet; each branch contains two sub-models (FastNet_1 and FastNet_2; SlowNet_1 and SlowNet_2), and finally MLP_End outputs the predictions for the 6 key variables. A structural sketch follows the figure below.
Detailed architecture of the CL-based model (sub-models: FastNet_1, SlowNet_1, FastNet_2 & SlowNet_2, MLP_End)
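The two-branch structure can be sketched with the Keras functional API as below; the attention heads, convolution widths, and fusion sizes are assumptions, and the 1-dim ResNet branch is reduced to a single residual convolution block for brevity.

```python
import tensorflow as tf

def build_cl_model(new_len: int = 50, mem_len: int = 500, n_vars: int = 6) -> tf.keras.Model:
    """FastNet consumes the 50-step New Data; SlowNet consumes the 500-step Memory."""
    new_data = tf.keras.Input(shape=(new_len, n_vars))  # FastNet input
    memory = tf.keras.Input(shape=(mem_len, n_vars))    # SlowNet input

    # FastNet: multi-head self-attention over the recent window, then LSTM.
    fast = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(new_data, new_data)
    fast = tf.keras.layers.LSTM(64)(fast)

    # SlowNet: residual 1-D convolution over the long memory, then LSTM.
    slow = tf.keras.layers.Conv1D(64, 7, padding="same", activation="relu")(memory)
    skip = tf.keras.layers.Conv1D(64, 1, padding="same")(memory)  # projection shortcut
    slow = tf.keras.layers.LSTM(64)(tf.keras.layers.Add()([slow, skip]))

    # MLP_End fuses both branches and predicts the 6 key variables.
    merged = tf.keras.layers.Concatenate()([fast, slow])
    hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
    out = tf.keras.layers.Dense(n_vars)(hidden)
    return tf.keras.Model(inputs=[new_data, memory], outputs=out)
```

At run time, each incoming 50-step window would both feed FastNet and be appended to the 500-step Memory consumed by SlowNet, matching the description above.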

Experiments

Model                     RMSE Loss            \(R^2\) Accuracy
CNN_LSTM                  0.04746 ± 3.58e-3    0.9458 ± 6.59e-3
CL-based Model            0.05312 ± 8.71e-4    0.9474 ± 2.04e-3
Transformer-based Model   0.05201 ± 1.50e-3    0.9524 ± 2.88e-3
EfficientNetV2_LSTM       0.04318 ± 2.08e-3    0.9532 ± 3.95e-3
ResNet_LSTM               0.04069 ± 2.65e-4    0.9558 ± 3.10e-4
Simple_LSTM               0.03959 ± 1.43e-4    0.9570 ± 1.75e-4
Loss and accuracy results on 6 key indicators prediction (time_step = 0: only the value at the next time step is predicted)
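For clarity, the RMSE loss and \(R^2\) accuracy reported in these tables can be computed as follows (a conventional definition; the project's exact averaging over the 6 variables may differ):

```python
import numpy as np
from sklearn.metrics import r2_score

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error over all variables and time steps."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 = 1 - SS_res / SS_tot, averaged uniformly across outputs."""
    return float(r2_score(y_true, y_pred))
```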
\(R^2\) Accuracy, time steps 1–7:
Model                        1         2         3         4         5         6         7
CNN_LSTM                  0.908867  0.886703  0.899269  0.892492  0.887989  0.891364  0.887657
ResNet_LSTM               0.929079  0.913988  0.907212  0.900088  0.896830  0.889898  0.888573
Transformer-based Model   0.924479  0.905764  0.890927  0.887882  0.879788  0.877154  0.867402
EfficientNetV2_LSTM       0.926505  0.909907  0.865817  0.846057  0.893718  0.879946  0.888579
Simple_LSTM               0.932606  0.917350  0.908768  0.904000  0.899892  0.896070  0.893260
CL-based Model            0.926022  0.915719  0.914630  0.911664  0.907598  0.904705  0.905521

Time steps 8–14:
Model                        8         9        10        11        12        13        14
CNN_LSTM                  0.890357  0.852183  0.878856  0.876410  0.882148  0.876686  0.867974
ResNet_LSTM               0.887636  0.883562  0.879436  0.875042  0.879181  0.879674  0.873856
Transformer-based Model   0.868933  0.862485  0.859569  0.853113  0.847314  0.847392  0.846858
EfficientNetV2_LSTM       0.885395  0.888055  0.877716  0.880843  0.880393  0.868341  0.878926
Simple_LSTM               0.891410  0.887690  0.883684  0.883256  0.881757  0.882073  0.878427
CL-based Model            0.909347  0.908416  0.903857  0.905245  0.903743  0.903044  0.899642

Time steps 15–20:
Model                       15        16        17        18        19        20
CNN_LSTM                  0.873978  0.860266  0.865002  0.864405  0.873603  0.872325
ResNet_LSTM               0.875850  0.876290  0.867004  0.870304  0.869956  0.871067
Transformer-based Model   0.846743  0.838788  0.840361  0.835966  0.836579  0.830244
EfficientNetV2_LSTM       0.862367  0.871897  0.871534  0.878088  0.874382  0.864462
Simple_LSTM               0.876781  0.877600  0.874004  0.872749  0.874959  0.871901
CL-based Model            0.898448  0.902504  0.903457  0.900310  0.897716  0.890362
\(R^2\) accuracy results on 6 key indicators prediction over multiple time steps (1–20)
RMSE Loss, time steps 1–7:
Model                        1         2         3         4         5         6         7
CNN_LSTM                  0.062719  0.069669  0.065025  0.068296  0.070231  0.068054  0.069016
ResNet_LSTM               0.052612  0.058794  0.061268  0.063988  0.065415  0.068184  0.068628
Transformer-based Model   0.065352  0.073093  0.078722  0.079686  0.082578  0.083615  0.086860
EfficientNetV2_LSTM       0.054545  0.059823  0.077574  0.084122  0.065652  0.072862  0.068345
Simple_LSTM               0.050440  0.056559  0.060024  0.061961  0.063950  0.065591  0.066960
CL-based Model            0.062023  0.064925  0.065745  0.066262  0.067535  0.068989  0.068911

Time steps 8–14:
Model                        8         9        10        11        12        13        14
CNN_LSTM                  0.067706  0.081297  0.073171  0.073568  0.071251  0.073282  0.075785
ResNet_LSTM               0.069234  0.070170  0.072334  0.073452  0.071944  0.072439  0.074497
Transformer-based Model   0.086253  0.088425  0.089364  0.091480  0.093046  0.092977  0.093226
EfficientNetV2_LSTM       0.068981  0.068923  0.071301  0.071452  0.072011  0.075173  0.071815
Simple_LSTM               0.067253  0.068736  0.070218  0.070494  0.071119  0.070932  0.072555
CL-based Model            0.067742  0.067886  0.069549  0.068565  0.069679  0.069927  0.069841

Time steps 15–20:
Model                       15        16        17        18        19        20
CNN_LSTM                  0.072892  0.079644  0.078324  0.077303  0.074049  0.074703
ResNet_LSTM               0.073929  0.073369  0.076450  0.075966  0.075809  0.075544
Transformer-based Model   0.093441  0.095786  0.095165  0.096494  0.096241  0.098178
EfficientNetV2_LSTM       0.077215  0.074327  0.075223  0.071996  0.073958  0.077426
Simple_LSTM               0.073133  0.073189  0.074096  0.074467  0.073924  0.075296
CL-based Model            0.071422  0.070228  0.069699  0.070448  0.071425  0.073082
RMSE loss results on 6 key indicators prediction over multiple time steps (1–20)
Accuracy trend and loss trend of all models (left: accuracy trend; right: loss trend)
Model                     RMSE Loss   \(R^2\) Accuracy
CNN_LSTM                  0.04059     0.8959
Simple_LSTM               0.03922     0.9020
ResNet_LSTM               0.03928     0.9024
Baseline (SoTA)           0.03596     0.9334
CL-based Model            0.03656     0.9352
Transformer-based Model   0.00923     0.9902
Loss and accuracy results on 1-dimensional key indicator prediction (Hot metal Si (01) only, time_step = 0). EfficientNetV2_LSTM requires the number of selected key variables to be divisible by 3 (presumably so the variables can be stacked into the 3-channel image-like input expected by the EfficientNetV2S backbone), so it is not tested here.
  • Below are the training log and prediction result visualization (taking EfficientNetV2_LSTM in the 6-key-indicator scenario with time_step = 0 as an example).
Training log of EfficientNetV2_LSTM
Prediction result visualization of EfficientNetV2_LSTM for the 6 key variables: Hot metal Si (01), Hot metal S (53), Hot metal Mn (54), Hot metal P (55), Hot metal C (56), Hot metal Ti (57)
  • This project targets the real-time prediction needs of key variables in the blast furnace ironmaking production scenario and designed several deep-learning-based models: Simple_LSTM, CNN_LSTM, ResNet_LSTM, EfficientNetV2_LSTM, the Transformer-based model, and the CL-based model. The results show that the deep learning models designed in this project performed well in predicting key variables in blast furnace ironmaking.