© Haodong Li
Overview
Technology route and project development plan
- In modern industrial processes, it is necessary to monitor the quality variables and critical variables of the production process in order to improve product quality and yield, maintain production safety, save energy, etc.
- In this project, we collaborated with a company to apply cutting-edge AI techniques to predict key indicators from the data they provided on the blast furnace ironmaking production process: changes in the content of several chemical elements in the hot metal, and changes in production-environment factors such as CO concentration and air-supply temperature.
- We first analyzed the overall data and designed the corresponding data pre-processing method; second, we built a deep learning framework and 4 benchmark models based on the Long Short-Term Memory network (LSTM) for experiments; third, we built a model based on the Transformer architecture, applied it to time-series indicator prediction, and conducted experiments; fourth, we built a continual-learning temporal-data prediction model based on Multi-Head Self-Attention, 1-dimensional Residual Neural Network (ResNet), LSTM, and Multilayer Perceptron (MLP), with reference to the continual learning (CL) theory that mimics the hippocampus and neocortex of the human brain, and conducted experiments.
- Finally, this project recorded and summarized the experimental results. For 6-dimensional key variable prediction, the best of the 4 LSTM-based models built in this project achieves an \(R^2\) accuracy of 0.9570 with an RMSE loss of 0.03959; the Transformer-based model achieves 0.9524 / 0.05201; the CL-based model achieves 0.9474 / 0.05312. In multi-step forecasting, however, when the time step is greater than 2, the \(R^2\) accuracy of the CL-based model is the highest, reaching 0.9146 at time step 3; when the time step is greater than 8, the RMSE loss of the CL-based model is the lowest, reaching 0.06789 at time step 9. In 1-dimensional quality variable prediction, the current best method (SoTA) achieves 0.9334 / 0.03596; the best LSTM-based model built in this project achieves 0.9024 / 0.03928; the Transformer-based model achieves 0.9902 / 0.00923, outperforming SoTA by about 5.68%; the CL-based model achieves 0.9352 / 0.03656, almost equal to SoTA.
- Code is available at: https://github.com/lebronlihd/key-indicator_prediction.
- Keywords: Blast furnace ironmaking; Temporal data prediction; Continual Learning;
Attention Mechanism; Transformer; LSTM; Residual Neural Network
Data Analysis & Pre-processing
- The dataset we used covers 29,070 time steps with 115 dimensions; 6 of these are key indicators and the remaining 109 are auxiliary indicators.
Figure panels: Hot metal Si (01) | Hot metal S (53) | Hot metal Mn (54) | Hot metal P (55) | Hot metal C (56) | Hot metal Ti (57)
Data characteristics & correlation distribution & frequency domain distribution of selected 6 key variables
- The first row shows the data characteristics and the correlation distribution among variables.
- The upper part of each graph is the distribution line of the data: yellow is the original data, red is the data after mean smoothing.
- The lower part of each graph shows the distribution of correlation coefficients between that indicator and all 115 indicators.
- The images in the second row show the frequency-domain distribution of the 6 key variables.
- The data pre-processing method we applied is Max-Min Normalization & numpy.nan_to_num, as sketched below.
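The following is a minimal sketch of this pre-processing step: NaN/inf cleanup via numpy.nan_to_num, followed by column-wise max-min scaling. The function name and the column-wise scaling choice are assumptions for illustration, not the project's exact code.

```python
# Sketch of the pre-processing described above (assumed column-wise scaling).
import numpy as np

def preprocess(data: np.ndarray) -> np.ndarray:
    """Replace NaN/inf values, then scale each column to [0, 1]."""
    data = np.nan_to_num(data)                     # NaN -> 0.0, +/-inf -> large finite values
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (data - col_min) / span

# Example: 29,070 time steps x 115 indicators, as in the dataset above.
raw = np.random.rand(29070, 115)
scaled = preprocess(raw)
```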
LSTM-based Model Design
- This project designed 4 variants based on LSTM, called Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM; details follow.
Figure panels: Simple_LSTM | ResNet_LSTM | CNN_LSTM | EfficientNetV2_LSTM (tf.keras.applications.EfficientNetV2S + LSTM)
Basic architecture of Simple_LSTM, CNN_LSTM, ResNet_LSTM, and EfficientNetV2_LSTM
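As a reference point, here is a minimal Keras sketch of the Simple_LSTM idea: a stacked LSTM over a 50-step window predicting the 6 key variables. The layer sizes, and the choice to feed all 115 indicators into the window, are illustrative assumptions rather than the project's exact configuration.

```python
# Minimal Simple_LSTM-style sketch (sizes are assumptions).
import tensorflow as tf

SEQ_LEN, N_FEATURES, N_KEYS = 50, 115, 6

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64),            # last hidden state summarizes the window
    tf.keras.layers.Dense(N_KEYS),       # one output per key variable
])
model.compile(optimizer="adam", loss=tf.keras.losses.MeanSquaredError())
```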
Transformer-based Model Design
- Transformer is a major innovation in deep learning in recent years. Although it was born for Neural Machine Translation (NMT), we believe that the Multi-Head Self-Attention-based Transformer is very likely to be competent for the key-variable prediction problem in the blast furnace ironmaking scenario. This chapter introduces the Transformer-based model designed in this project.
- As shown in the figures below, the Transformer-based model built in this project adopts a Sequence-to-Sequence DataLoader design: the input data is shifted to the right and the prediction step is used as the corresponding target output (see the sketch after this list). The sequence length is set to 50, consistent with the 4 LSTM-based models and the CL-based model described below. The difference is that the Transformer-based model contains two input channels:
- one channel (Channel A) passes through Positional Encoding and the Transformer Encoder, its data shape changing from 50×6 to 512×6, and enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Key and Value;
- the other channel (Channel B), after Positional Encoding, also changes shape to 512×6 and enters the Multi-Head Self-Attention of the first Transformer Decoder layer as Query. After the Transformer Decoder's computation, the 50×6 output is obtained.
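A sketch of the shifted-window dataset construction described above follows; the helper name and the off-by-one details of the horizon handling are assumptions for illustration.

```python
# Sequence-to-Sequence windowing sketch: input window of length 50 paired
# with the same window shifted right by the forecast horizon as target.
import numpy as np

def make_seq2seq_windows(series: np.ndarray, seq_len: int = 50, horizon: int = 1):
    """series: (T, 6) array of key variables -> (inputs, targets)."""
    xs, ys = [], []
    for start in range(len(series) - seq_len - horizon):
        xs.append(series[start : start + seq_len])                      # input window
        ys.append(series[start + horizon : start + seq_len + horizon])  # shifted target window
    return np.stack(xs), np.stack(ys)

keys = np.random.rand(29070, 6)        # the 6 key variables
x, y = make_seq2seq_windows(keys)      # x, y: (N, 50, 6)
```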
Overview of the Transformer-based model
Figure panels: Encoder (or Channel A) | Decoder (or Channel B + Decoder)
Detailed architecture of the Transformer-based model
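Below is a hedged sketch of the two-channel design: Channel A is embedded, positionally encoded, and run through encoder self-attention, with its output feeding the decoder's attention as Key and Value, while Channel B supplies the Query. The document reports shapes of 512×6; this sketch reads the 512 as a model width that the 6 features are projected into, so the exact dimension handling, head count, and layer depth here are assumptions.

```python
# Two-channel Transformer sketch (dimensions and depth are assumptions).
import numpy as np
import tensorflow as tf

SEQ_LEN, N_KEYS, D_MODEL, N_HEADS = 50, 6, 512, 8

def positional_encoding(length: int, depth: int) -> np.ndarray:
    pos = np.arange(length)[:, None]
    i = np.arange(depth)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / depth)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle)).astype("float32")

inp_a = tf.keras.layers.Input(shape=(SEQ_LEN, N_KEYS))   # Channel A (encoder side)
inp_b = tf.keras.layers.Input(shape=(SEQ_LEN, N_KEYS))   # Channel B (decoder side)
pe = positional_encoding(SEQ_LEN, D_MODEL)

a = tf.keras.layers.Dense(D_MODEL)(inp_a) + pe           # embed + positional encoding
a = tf.keras.layers.MultiHeadAttention(
    num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(a, a) # encoder self-attention

b = tf.keras.layers.Dense(D_MODEL)(inp_b) + pe
# Decoder cross-attention: Channel B as Query, encoder output as Key/Value.
x = tf.keras.layers.MultiHeadAttention(
    num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(query=b, value=a, key=a)
out = tf.keras.layers.Dense(N_KEYS)(x)                   # back to 50 x 6 predictions

transformer_model = tf.keras.Model([inp_a, inp_b], out)
```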
CL-based Model Design
- The human brain realizes learning through two systems, the “hippocampus” and the “neocortex”. The hippocampus focuses on rapid learning, and through memory consolidation, memories in the hippocampus are transferred to the neocortex over time to form long-term memories. Note, however, that Continual Learning (CL) in the current academic field is mainly aimed at research on Domain Adaptation (DA) and Transfer Learning (TL) across datasets in Computer Vision (CV), especially image classification benchmarks such as CIFAR100. The nature of the data used in this project is special, and the conditions for model-transferability research are not met. Therefore, the CL-based model introduced in this chapter is still a time-series data prediction model, but it is inspired by CL. The main point: the two branches, FastNet and SlowNet, learn from New Data and Memory respectively.
Overview of the CL-based model
- Technically speaking, the CL-based model designed in this project mainly applies Multi-Head Self-Attention, 1-dimensional ResNet, LSTM, and MLP. As shown in the figures above and below, the sequence length of New Data in the CL-based model is set to 50, the sequence length of Memory is set to 500, and New Data continually supplements Memory. Macroscopically, the CL-based model consists of two branches, FastNet and SlowNet; each branch contains two sub-models (FastNet_1, FastNet_2 and SlowNet_1, SlowNet_2), and finally MLP_End outputs the prediction results of the 6 key variables.
Figure panels: FastNet_1 | FastNet_2 & SlowNet_2 | SlowNet_1 | MLP_End
Detailed architecture of the CL-based model
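A minimal sketch of the two-branch idea follows: FastNet reads the 50-step New Data window, SlowNet reads the 500-step Memory window, and MLP_End fuses both into the 6 key-variable predictions. The sub-model internals shown here (a plain LSTM for the fast branch, a 1-D conv stack standing in for the 1-dimensional ResNet) are simplified assumptions, not the project's exact architecture.

```python
# Two-branch CL-style sketch (internals are simplified assumptions).
import tensorflow as tf

NEW_LEN, MEM_LEN, N_KEYS = 50, 500, 6

new_data = tf.keras.layers.Input(shape=(NEW_LEN, N_KEYS))
memory = tf.keras.layers.Input(shape=(MEM_LEN, N_KEYS))

# FastNet: rapid adaptation to recent data (hippocampus analogue).
fast = tf.keras.layers.LSTM(64)(new_data)

# SlowNet: consolidated long-range patterns (neocortex analogue).
slow = tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu")(memory)
slow = tf.keras.layers.GlobalAveragePooling1D()(slow)

# MLP_End: fuse both branches into the 6 key-variable predictions.
merged = tf.keras.layers.Concatenate()([fast, slow])
hidden = tf.keras.layers.Dense(64, activation="relu")(merged)
out = tf.keras.layers.Dense(N_KEYS)(hidden)

cl_model = tf.keras.Model([new_data, memory], out)
```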
Experiments
| Model | RMSE Loss | \(R^2\) Accuracy |
| --- | --- | --- |
| CNN_LSTM | 0.047457 ± 3.58e-3 | 0.945798 ± 6.59e-3 |
| CL-based Model | 0.053124 ± 8.71e-4 | 0.947392 ± 2.04e-3 |
| Transformer-based Model | 0.052008 ± 1.50e-3 | 0.952418 ± 2.88e-3 |
| EfficientNetV2_LSTM | 0.043179 ± 2.08e-3 | 0.953158 ± 3.95e-3 |
| ResNet_LSTM | 0.040685 ± 2.65e-4 | 0.955819 ± 3.10e-4 |
| Simple_LSTM | 0.039591 ± 1.43e-4 | 0.956986 ± 1.75e-4 |

Loss and Accuracy results on 6 key indicators prediction (time_step = 0; only values at the next time step are predicted)
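For reference, the two reported metrics can be computed as below; this is a sketch assuming NumPy arrays of true and predicted key-variable values, not the project's exact evaluation code.

```python
# RMSE and R^2 as used in the tables (assumed flattened-array definitions).
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```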
| \(R^2\) Accuracy | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.908867 | 0.886703 | 0.899269 | 0.892492 | 0.887989 | 0.891364 | 0.887657 |
| ResNet_LSTM | 0.929079 | 0.913988 | 0.907212 | 0.900088 | 0.896830 | 0.889898 | 0.888573 |
| Transformer-based Model | 0.924479 | 0.905764 | 0.890927 | 0.887882 | 0.879788 | 0.877154 | 0.867402 |
| EfficientNetV2_LSTM | 0.926505 | 0.909907 | 0.865817 | 0.846057 | 0.893718 | 0.879946 | 0.888579 |
| Simple_LSTM | 0.932606 | 0.917350 | 0.908768 | 0.904000 | 0.899892 | 0.896070 | 0.893260 |
| CL-based Model | 0.926022 | 0.915719 | 0.914630 | 0.911664 | 0.907598 | 0.904705 | 0.905521 |

| \(R^2\) Accuracy | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.890357 | 0.852183 | 0.878856 | 0.876410 | 0.882148 | 0.876686 | 0.867974 |
| ResNet_LSTM | 0.887636 | 0.883562 | 0.879436 | 0.875042 | 0.879181 | 0.879674 | 0.873856 |
| Transformer-based Model | 0.868933 | 0.862485 | 0.859569 | 0.853113 | 0.847314 | 0.847392 | 0.846858 |
| EfficientNetV2_LSTM | 0.885395 | 0.888055 | 0.877716 | 0.880843 | 0.880393 | 0.868341 | 0.878926 |
| Simple_LSTM | 0.891410 | 0.887690 | 0.883684 | 0.883256 | 0.881757 | 0.882073 | 0.878427 |
| CL-based Model | 0.909347 | 0.908416 | 0.903857 | 0.905245 | 0.903743 | 0.903044 | 0.899642 |

| \(R^2\) Accuracy | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.873978 | 0.860266 | 0.865002 | 0.864405 | 0.873603 | 0.872325 |
| ResNet_LSTM | 0.875850 | 0.876290 | 0.867004 | 0.870304 | 0.869956 | 0.871067 |
| Transformer-based Model | 0.846743 | 0.838788 | 0.840361 | 0.835966 | 0.836579 | 0.830244 |
| EfficientNetV2_LSTM | 0.862367 | 0.871897 | 0.871534 | 0.878088 | 0.874382 | 0.864462 |
| Simple_LSTM | 0.876781 | 0.877600 | 0.874004 | 0.872749 | 0.874959 | 0.871901 |
| CL-based Model | 0.898448 | 0.902504 | 0.903457 | 0.900310 | 0.897716 | 0.890362 |

Accuracy results on 6 key indicators prediction at multiple time steps (1~20)
| RMSE Loss | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.062719 | 0.069669 | 0.065025 | 0.068296 | 0.070231 | 0.068054 | 0.069016 |
| ResNet_LSTM | 0.052612 | 0.058794 | 0.061268 | 0.063988 | 0.065415 | 0.068184 | 0.068628 |
| Transformer-based Model | 0.065352 | 0.073093 | 0.078722 | 0.079686 | 0.082578 | 0.083615 | 0.086860 |
| EfficientNetV2_LSTM | 0.054545 | 0.059823 | 0.077574 | 0.084122 | 0.065652 | 0.072862 | 0.068345 |
| Simple_LSTM | 0.050440 | 0.056559 | 0.060024 | 0.061961 | 0.063950 | 0.065591 | 0.066960 |
| CL-based Model | 0.062023 | 0.064925 | 0.065745 | 0.066262 | 0.067535 | 0.068989 | 0.068911 |

| RMSE Loss | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.067706 | 0.081297 | 0.073171 | 0.073568 | 0.071251 | 0.073282 | 0.075785 |
| ResNet_LSTM | 0.069234 | 0.070170 | 0.072334 | 0.073452 | 0.071944 | 0.072439 | 0.074497 |
| Transformer-based Model | 0.086253 | 0.088425 | 0.089364 | 0.091480 | 0.093046 | 0.092977 | 0.093226 |
| EfficientNetV2_LSTM | 0.068981 | 0.068923 | 0.071301 | 0.071452 | 0.072011 | 0.075173 | 0.071815 |
| Simple_LSTM | 0.067253 | 0.068736 | 0.070218 | 0.070494 | 0.071119 | 0.070932 | 0.072555 |
| CL-based Model | 0.067742 | 0.067886 | 0.069549 | 0.068565 | 0.069679 | 0.069927 | 0.069841 |

| RMSE Loss | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- |
| CNN_LSTM | 0.072892 | 0.079644 | 0.078324 | 0.077303 | 0.074049 | 0.074703 |
| ResNet_LSTM | 0.073929 | 0.073369 | 0.076450 | 0.075966 | 0.075809 | 0.075544 |
| Transformer-based Model | 0.093441 | 0.095786 | 0.095165 | 0.096494 | 0.096241 | 0.098178 |
| EfficientNetV2_LSTM | 0.077215 | 0.074327 | 0.075223 | 0.071996 | 0.073958 | 0.077426 |
| Simple_LSTM | 0.073133 | 0.073189 | 0.074096 | 0.074467 | 0.073924 | 0.075296 |
| CL-based Model | 0.071422 | 0.070228 | 0.069699 | 0.070448 | 0.071425 | 0.073082 |

Loss results on 6 key indicators prediction at multiple time steps (1~20)
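One common way to produce forecasts 1 to 20 steps ahead, as evaluated above, is recursive forecasting: predict one step, append the prediction to the window, and repeat. The sketch below assumes a model that predicts next-step values for the same features it consumes; the project's exact multi-step protocol may differ.

```python
# Recursive multi-step forecasting sketch (protocol is an assumption).
import numpy as np

def recursive_forecast(model, window: np.ndarray, n_steps: int) -> np.ndarray:
    """window: (seq_len, n_features); returns (n_steps, n_features) forecasts."""
    window = window.copy()
    preds = []
    for _ in range(n_steps):
        step = model.predict(window[None, ...], verbose=0)[0]  # next-step prediction
        preds.append(step)
        window = np.vstack([window[1:], step])                 # slide the window forward
    return np.stack(preds)
```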
Figure panels: Accuracy trend | Loss trend
Accuracy trend and loss trend of all models
| Model | RMSE Loss | \(R^2\) Accuracy |
| --- | --- | --- |
| CNN_LSTM | 0.040586 | 0.895916 |
| Simple_LSTM | 0.039218 | 0.901961 |
| ResNet_LSTM | 0.039276 | 0.902359 |
| Baseline | 0.03596 | 0.9334 |
| CL-based Model | 0.036562 | 0.935236 |
| Transformer-based Model | 0.009229 | 0.990184 |

Loss and Accuracy results on 1 key indicator prediction (only Hot metal Si (01), time_step = 0). EfficientNetV2_LSTM requires the number of selected key variables to be divisible by 3, so it is not tested here
- Below are the training log & prediction result visualization (taking EfficientNetV2_LSTM in the 6-key-indicator scenario with time_step = 0 as an example).
Training log of EfficientNetV2_LSTM
Figure panels: Hot metal Si (01) | Hot metal S (53) | Hot metal Mn (54) | Hot metal P (55) | Hot metal C (56) | Hot metal Ti (57)
Prediction result visualization of EfficientNetV2_LSTM
- This project addressed the real-time prediction needs of key variables in the blast furnace ironmaking production scenario and designed several algorithms based on deep learning methods, namely Simple_LSTM, CNN_LSTM, ResNet_LSTM, EfficientNetV2_LSTM, the Transformer-based Model, and the CL-based Model. The results show that the deep learning models designed in this project performed well in the prediction of key variables in blast furnace ironmaking.