Key Indicator Prediction

A method for predicting key indicators of industrial processes based on LSTM, Transformer, and Continual Learning (CL) theory

© Haodong Li

Overview

Technology route and project development plan
  • In modern industrial processes, it is necessary to monitor the quality variables and critical variables of the production process in order to improve product quality and yield, maintain production safety, and save energy.
  • In this project, we collaborated with a company to apply cutting-edge AI techniques to key indicator prediction, using data they provided from the blast furnace ironmaking production process: changes in the content of several chemical elements in the hot metal, and changes in production environment factors such as CO concentration and air supply temperature.
  • The project proceeded in four steps:
    • First, we analyzed the overall data and designed a corresponding data pre-processing method;
    • second, we built a deep learning framework and 4 benchmark models based on long short-term memory (LSTM) networks for experiments;
    • third, we built a model based on the Transformer architecture, applied it to time-series indicator prediction, and conducted experiments;
    • fourth, drawing on continual learning (CL) theory, which mimics the hippocampus and neocortex of the human brain, we built a continual-learning temporal-data prediction model based on Multi-Head Self-Attention, 1-dimensional Residual Neural Network (ResNet), LSTM, and Multilayer Perceptron (MLP), and conducted experiments.
  • Finally, this project recorded and summarized the experimental results. For 6-dimensional key variable prediction, the best of the 4 LSTM-based models built in this project achieves an \(R^2\) accuracy of 0.9570 with an RMSE loss of 0.03959; the Transformer-based model achieves 0.9524 / 0.05201; the CL-based model achieves 0.9474 / 0.05312. In multi-step forecasting, however, the CL-based model has the highest \(R^2\) accuracy for time steps greater than 2, reaching 0.9146 at time step 3; for time steps greater than 8 it also has the lowest RMSE loss, reaching 0.06789 at step 9. In 1-dimensional quality variable prediction, the current best method (SoTA) achieves 0.9334 / 0.03596; the best LSTM-based model built in this project achieves 0.9024 / 0.03928; the Transformer-based model achieves 0.9902 / 0.00923, outperforming SoTA by about 5.68%; the CL-based model achieves 0.9352 / 0.03656, almost equal to SoTA.
  • Code is available at: https://github.com/lebronlihd/key-indicator_prediction.
  • Keywords: Blast furnace ironmaking; Temporal data prediction; Continual Learning; Attention Mechanism; Transformer; LSTM; Residual Neural Network

Data Analysis & Pre-processing

  • The dataset we used covers 29070 time steps of 115 dimensions; 6 of these are key indicators and the remaining 109 are auxiliary indicators.
Data characteristics, correlation distribution & frequency-domain distribution of the 6 selected key variables: Hot metal Si (01), Hot metal S (53), Hot metal Mn (54), Hot metal P (55), Hot metal C (56), Hot metal Ti (57)
  • The first row shows the data characteristics & correlation distribution among variables.
    • The upper part of each graph is the distribution line of the data: yellow is the original data, and red is the data after mean smoothing;
    • the lower half of each graph shows the distribution of correlation coefficients between that indicator and all 115 indicators.
  • Images in the second row show the frequency-domain distribution of the 6 key variables.
  • The data pre-processing we applied is max-min normalization plus numpy.nan_to_num; a minimal sketch follows.
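For reference, below is a hedged sketch of this pre-processing (plus the mean smoothing shown in the figure above). The function names, smoothing window, and epsilon term are illustrative assumptions, not the project's exact code.

```python
import numpy as np

def preprocess(raw: np.ndarray) -> np.ndarray:
    """Clean and scale a raw indicator matrix of shape (time_steps, 115)."""
    data = np.nan_to_num(raw)                     # replace NaN/inf with finite values
    col_min = data.min(axis=0, keepdims=True)     # per-indicator minimum
    col_max = data.max(axis=0, keepdims=True)     # per-indicator maximum
    return (data - col_min) / (col_max - col_min + 1e-8)  # scale to [0, 1]

def mean_smooth(series: np.ndarray, window: int = 25) -> np.ndarray:
    """Moving-average smoothing (the red curves in the figure above)."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")
```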

LSTM-based Model Design

  • This project designed 4 variants based on LSTM, called Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM, detailed below; a sketch of the simplest variant appears after the figure.
Basic architectures of Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM (EfficientNetV2_LSTM combines tf.keras.applications.EfficientNetV2S with an LSTM)
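To make the designs concrete, here is a hedged sketch of the simplest variant in tf.keras; the layer widths, depth, and optimizer settings are assumptions, with a 50-step input window matching the sequence length used throughout this project. CNN_LSTM and ResNet_LSTM would prepend convolutional (respectively residual) feature extractors to a similar LSTM stack, and EfficientNetV2_LSTM uses the EfficientNetV2S backbone as noted in the caption above.

```python
import tensorflow as tf

def build_simple_lstm(seq_len: int = 50, n_features: int = 6, n_targets: int = 6) -> tf.keras.Model:
    """Stacked LSTM encoder followed by a dense regression head."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_features)),
        tf.keras.layers.LSTM(128, return_sequences=True),  # full output sequence
        tf.keras.layers.LSTM(64),                          # keep last hidden state only
        tf.keras.layers.Dense(n_targets),                  # 6 key indicators
    ])

model = build_simple_lstm()
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
```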

Transformer-based Model Design

  • Transformer is a major innovation in deep learning in recent years. Although it was born for Neural Machine Translation (NMT), we believe that the Transformer, built on Multi-Head Self-Attention, is very likely to be competent for the key-variable prediction problem in the blast furnace ironmaking scenario. This chapter introduces the Transformer-based model designed in this project.
  • As shown in the figures below, the Transformer-based model built in this project adopts a Sequence-to-Sequence DataLoader design: the input data is shifted to the right, and the prediction step serves as the corresponding target output (see the window-construction sketch after the figures). The sequence length is set to 50, consistent with the 4 LSTM-based models and the CL-based model described below. Unlike them, however, the Transformer-based model contains two input channels:
    • one channel (Channel A) passes through Positional Encoding and the Transformer Encoder; the data shape changes from 50×6 to 512×6, and it enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Key and Value;
    • the other channel (Channel B), after Positional Encoding, likewise changes shape to 512×6 and enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Query. After the Transformer Decoder computation, an output of shape 50×6 is obtained.
Overview of the Transformer-based model
Detailed architecture of the Transformer-based model (left: Encoder, i.e. Channel A; right: Decoder, i.e. Channel B + Decoder)
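Below is a minimal sketch of the Sequence-to-Sequence window construction described above: the decoder input (Channel B) is the prediction window shifted right by one step. The helper name and the `horizon` parameter are assumptions; the project's exact shifting scheme may differ.

```python
import numpy as np

def make_seq2seq_windows(series: np.ndarray, seq_len: int = 50, horizon: int = 1):
    """Build (encoder_input, decoder_input, target) windows from a (T, 6) array."""
    enc_in, dec_in, target = [], [], []
    for t in range(len(series) - seq_len - horizon):
        src = series[t : t + seq_len]                      # Channel A: observed window
        tgt = series[t + horizon : t + horizon + seq_len]  # window to be predicted
        enc_in.append(src)
        dec_in.append(np.vstack([src[-1:], tgt[:-1]]))     # Channel B: shifted right
        target.append(tgt)
    return np.asarray(enc_in), np.asarray(dec_in), np.asarray(target)
```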

CL-based Model Design

  • The human brain realizes learning through two systems, the “hippocampus” and the “neocortex”. The hippocampus focuses on rapid learning, and through memory consolidation, memories in the hippocampus are transferred to the neocortex over time to form long-term memories. Note, however, that Continual Learning (CL) in the current academic literature is mainly aimed at Domain Adaptation (DA) and Transfer Learning (TL) across datasets in the field of Computer Vision (CV), especially image classification benchmarks such as CIFAR100. The nature of the data used in this project is special, and conditions for model-transferability research are lacking. Therefore, the CL-based model introduced in this chapter is still a time-series prediction model, but it is inspired by CL. The main idea is that two branches, FastNet and SlowNet, learn from New Data and Memory respectively.
Overview of the CL-based model
  • Technically, the CL-based model designed in this project mainly applies Multi-Head Self-Attention, 1-dim ResNet, LSTM, and MLP. As shown in the figures above and below, the sequence length of New Data is set to 50, the sequence length of Memory is set to 500, and New Data is continually appended to Memory. Macroscopically, the model consists of two branches, FastNet and SlowNet; each branch contains two sub-models (FastNet_1 and FastNet_2; SlowNet_1 and SlowNet_2), and finally MLP_End outputs the predictions for the 6 key variables. A structural sketch follows the figure below.
Detailed architecture of the CL-based model (sub-models: FastNet_1, SlowNet_1, FastNet_2 & SlowNet_2, MLP_End)
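The two-branch structure can be sketched with the Keras functional API as below; the attention heads, convolution widths, and fusion sizes are assumptions, and the 1-dim ResNet branch is reduced to a single residual convolution block for brevity.

```python
import tensorflow as tf

def build_cl_model(new_len: int = 50, mem_len: int = 500, n_vars: int = 6) -> tf.keras.Model:
    """FastNet consumes the 50-step New Data; SlowNet consumes the 500-step Memory."""
    new_data = tf.keras.Input(shape=(new_len, n_vars))  # FastNet input
    memory = tf.keras.Input(shape=(mem_len, n_vars))    # SlowNet input

    # FastNet: multi-head self-attention over the recent window, then LSTM.
    fast = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(new_data, new_data)
    fast = tf.keras.layers.LSTM(64)(fast)

    # SlowNet: residual 1-D convolution over the long memory, then LSTM.
    slow = tf.keras.layers.Conv1D(64, 7, padding="same", activation="relu")(memory)
    skip = tf.keras.layers.Conv1D(64, 1, padding="same")(memory)  # projection shortcut
    slow = tf.keras.layers.LSTM(64)(tf.keras.layers.Add()([slow, skip]))

    # MLP_End fuses both branches and predicts the 6 key variables.
    merged = tf.keras.layers.Concatenate()([fast, slow])
    hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
    out = tf.keras.layers.Dense(n_vars)(hidden)
    return tf.keras.Model(inputs=[new_data, memory], outputs=out)
```

At run time, each incoming 50-step window would both feed FastNet and be appended to the 500-step Memory consumed by SlowNet, matching the description above.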

Experiments

Model                     RMSE Loss            \(R^2\) Accuracy
CNN_LSTM                  0.04746 ± 3.58e-3    0.9458 ± 6.59e-3
CL-based Model            0.05312 ± 8.71e-4    0.9474 ± 2.04e-3
Transformer-based Model   0.05201 ± 1.50e-3    0.9524 ± 2.88e-3
EfficientNetV2_LSTM       0.04318 ± 2.08e-3    0.9532 ± 3.95e-3
ResNet_LSTM               0.04069 ± 2.65e-4    0.9558 ± 3.10e-4
Simple_LSTM               0.03959 ± 1.43e-4    0.9570 ± 1.75e-4
Loss and accuracy results on 6 key indicators prediction (time_step = 0: only the value at the next time step is predicted)
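For clarity, the RMSE loss and \(R^2\) accuracy reported in these tables can be computed as follows (a conventional definition; the project's exact averaging over the 6 variables may differ):

```python
import numpy as np
from sklearn.metrics import r2_score

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error over all variables and time steps."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 = 1 - SS_res / SS_tot, averaged uniformly across outputs."""
    return float(r2_score(y_true, y_pred))
```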
\(R^2\) Accuracy, time steps 1–7:
Model                        1         2         3         4         5         6         7
CNN_LSTM                  0.908867  0.886703  0.899269  0.892492  0.887989  0.891364  0.887657
ResNet_LSTM               0.929079  0.913988  0.907212  0.900088  0.896830  0.889898  0.888573
Transformer-based Model   0.924479  0.905764  0.890927  0.887882  0.879788  0.877154  0.867402
EfficientNetV2_LSTM       0.926505  0.909907  0.865817  0.846057  0.893718  0.879946  0.888579
Simple_LSTM               0.932606  0.917350  0.908768  0.904000  0.899892  0.896070  0.893260
CL-based Model            0.926022  0.915719  0.914630  0.911664  0.907598  0.904705  0.905521

Time steps 8–14:
Model                        8         9        10        11        12        13        14
CNN_LSTM                  0.890357  0.852183  0.878856  0.876410  0.882148  0.876686  0.867974
ResNet_LSTM               0.887636  0.883562  0.879436  0.875042  0.879181  0.879674  0.873856
Transformer-based Model   0.868933  0.862485  0.859569  0.853113  0.847314  0.847392  0.846858
EfficientNetV2_LSTM       0.885395  0.888055  0.877716  0.880843  0.880393  0.868341  0.878926
Simple_LSTM               0.891410  0.887690  0.883684  0.883256  0.881757  0.882073  0.878427
CL-based Model            0.909347  0.908416  0.903857  0.905245  0.903743  0.903044  0.899642

Time steps 15–20:
Model                       15        16        17        18        19        20
CNN_LSTM                  0.873978  0.860266  0.865002  0.864405  0.873603  0.872325
ResNet_LSTM               0.875850  0.876290  0.867004  0.870304  0.869956  0.871067
Transformer-based Model   0.846743  0.838788  0.840361  0.835966  0.836579  0.830244
EfficientNetV2_LSTM       0.862367  0.871897  0.871534  0.878088  0.874382  0.864462
Simple_LSTM               0.876781  0.877600  0.874004  0.872749  0.874959  0.871901
CL-based Model            0.898448  0.902504  0.903457  0.900310  0.897716  0.890362
\(R^2\) accuracy results on 6 key indicators prediction over multiple time steps (1–20)
RMSE Loss, time steps 1–7:
Model                        1         2         3         4         5         6         7
CNN_LSTM                  0.062719  0.069669  0.065025  0.068296  0.070231  0.068054  0.069016
ResNet_LSTM               0.052612  0.058794  0.061268  0.063988  0.065415  0.068184  0.068628
Transformer-based Model   0.065352  0.073093  0.078722  0.079686  0.082578  0.083615  0.086860
EfficientNetV2_LSTM       0.054545  0.059823  0.077574  0.084122  0.065652  0.072862  0.068345
Simple_LSTM               0.050440  0.056559  0.060024  0.061961  0.063950  0.065591  0.066960
CL-based Model            0.062023  0.064925  0.065745  0.066262  0.067535  0.068989  0.068911

Time steps 8–14:
Model                        8         9        10        11        12        13        14
CNN_LSTM                  0.067706  0.081297  0.073171  0.073568  0.071251  0.073282  0.075785
ResNet_LSTM               0.069234  0.070170  0.072334  0.073452  0.071944  0.072439  0.074497
Transformer-based Model   0.086253  0.088425  0.089364  0.091480  0.093046  0.092977  0.093226
EfficientNetV2_LSTM       0.068981  0.068923  0.071301  0.071452  0.072011  0.075173  0.071815
Simple_LSTM               0.067253  0.068736  0.070218  0.070494  0.071119  0.070932  0.072555
CL-based Model            0.067742  0.067886  0.069549  0.068565  0.069679  0.069927  0.069841

Time steps 15–20:
Model                       15        16        17        18        19        20
CNN_LSTM                  0.072892  0.079644  0.078324  0.077303  0.074049  0.074703
ResNet_LSTM               0.073929  0.073369  0.076450  0.075966  0.075809  0.075544
Transformer-based Model   0.093441  0.095786  0.095165  0.096494  0.096241  0.098178
EfficientNetV2_LSTM       0.077215  0.074327  0.075223  0.071996  0.073958  0.077426
Simple_LSTM               0.073133  0.073189  0.074096  0.074467  0.073924  0.075296
CL-based Model            0.071422  0.070228  0.069699  0.070448  0.071425  0.073082
RMSE loss results on 6 key indicators prediction over multiple time steps (1–20)
Accuracy trend and loss trend of all models (left: accuracy trend; right: loss trend)
Model                     RMSE Loss   \(R^2\) Accuracy
CNN_LSTM                  0.04059     0.8959
Simple_LSTM               0.03922     0.9020
ResNet_LSTM               0.03928     0.9024
Baseline (SoTA)           0.03596     0.9334
CL-based Model            0.03656     0.9352
Transformer-based Model   0.00923     0.9902
Loss and accuracy results on 1-dimensional key indicator prediction (Hot metal Si (01) only, time_step = 0). EfficientNetV2_LSTM requires the number of selected key variables to be divisible by 3 (presumably so the variables can be stacked into the 3-channel image-like input expected by the EfficientNetV2S backbone), so it is not tested here.
  • Below are the training log and prediction result visualization (taking EfficientNetV2_LSTM in the 6-key-indicator scenario with time_step = 0 as an example).
Training log of EfficientNetV2_LSTM
Prediction result visualization of EfficientNetV2_LSTM for the 6 key variables: Hot metal Si (01), Hot metal S (53), Hot metal Mn (54), Hot metal P (55), Hot metal C (56), Hot metal Ti (57)
  • This project targets the real-time prediction needs of key variables in the blast furnace ironmaking production scenario and designed several deep-learning-based models: Simple_LSTM, CNN_LSTM, ResNet_LSTM, EfficientNetV2_LSTM, the Transformer-based model, and the CL-based model. The results show that the deep learning models designed in this project performed well in predicting key variables in blast furnace ironmaking.