Process Data Analytics: Leveraging the Data Revolution


  • Sirish L. Shah
  • Bhushan Gopaluni


  • Bhushan Gopaluni, University of British Columbia, Canada
  • Sirish L. Shah, University of Alberta, Canada
  • Biao Huang, University of Alberta, Canada
  • Alf Isaksson, ABB Future Labs, Sweden
  • Manabu Kano, Kyoto University, Japan
  • Arun Tangirala, IIT Madras, India
  • Nina Thornhill, Imperial College, UK


We are at the cusp of the fourth industrial revolution (4IR), or Industry 4.0, which is poised to reshape all sectors of the economy and society with unprecedented depth and breadth. Process industries are in a unique position to benefit from Industry 4.0: they have the right infrastructure and possess massive amounts of heterogeneous industrial data. Industry 4.0 promises economic and competitive advantages in the face of ever-increasing demands on energy, environment and quality by providing a level of automation and efficiency never seen before. The process industries have been using data analytics in various forms for more than three decades. In particular, statistical techniques such as principal components analysis (PCA), partial least squares (PLS) and canonical variate analysis (CVA), and time-series modelling methods such as maximum-likelihood and prediction-error methods, have been successfully applied to industrial data. Recent developments in artificial intelligence, machine learning and advanced analytics provide new openings for leveraging industrial data to solve complex systems engineering problems.
The emphasis in this tutorial workshop will be on tools and techniques that help in understanding data and discovering information, leading to predictive monitoring and diagnosis of process faults, design of soft-sensors, process performance monitoring and on-line modeling methods.
Highly interconnected process plants are now common, and monitoring and root-cause analysis of process abnormalities, including predictive risk analysis, is non-trivial. Extracting information from the fusion of process data, alarm and event data, and process connectivity should form the backbone of a viable process data analytics strategy, and this will be the main focus of this tutorial workshop.



The emphasis in this workshop will be on tools and techniques that help with understanding data and discovering information and patterns in routine process data. The objective is to deliver a coherent and coordinated workflow so that the audience knows which tools to use, when to use them, and which pitfalls to avoid.

The goal is to inform the audience about how to accomplish the following steps to succeed in an analytics project that will ultimately lead to predictive monitoring and diagnosis of process faults, design of soft-sensors, process performance monitoring and on-line modeling methods:

  • Define the problem and ask the right questions; define clear objectives;
  • Get good data in context of the problem;
  • Get to know your data: visualize, explore and analyze;
  • Find the features that affect the outcome of interest;
  • Build meaningful models for soft-sensing and for process and performance monitoring;
  • Make the models operational and maintain them.
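
The steps above can be sketched end to end on synthetic data. The following is a minimal illustration only, not a method taken from the workshop: the five "sensors", the quality variable, the 0.3 correlation threshold and the 400/100 split are all assumptions made for the example, and a simple correlation screen with least squares stands in for whatever feature selection and modelling technique a real project would choose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Get data: 500 synthetic samples from 5 hypothetical sensors.
n = 500
X = rng.normal(size=(n, 5))
# Hypothetical quality variable driven by sensors 0 and 2 only, plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)

# Get to know the data: a basic quality check for missing values.
assert not np.isnan(X).any()

# Find relevant features: screen sensors by absolute correlation with y.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = np.flatnonzero(corr > 0.3)  # threshold is an arbitrary assumption

# Build a soft sensor: least squares with an intercept on a training split.
n_train = 400
A_train = np.column_stack([X[:n_train][:, selected], np.ones(n_train)])
coef, *_ = np.linalg.lstsq(A_train, y[:n_train], rcond=None)

# Validate on held-out data before putting the model into operation.
A_test = np.column_stack([X[n_train:][:, selected], np.ones(n - n_train)])
rmse = float(np.sqrt(np.mean((A_test @ coef - y[n_train:]) ** 2)))

print("selected sensors:", selected.tolist())
print("hold-out RMSE:", round(rmse, 3))
```

In practice the hold-out error would also be tracked after deployment, since model maintenance is part of the last step.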

Towards this end, the workshop will discuss the following commonly used methodologies for data analytics, supplemented with successful industrial case studies. In any analytics project there are always some “do’s” and “don’ts”, and these too will be discussed.

  • Data visualization and quality checks; steps in data ingestion and data management;
  • Unsupervised learning using classical clustering methods such as kNN (k nearest neighbours); Principal Components Analysis (PCA);
  • Supervised learning using: Multivariate linear regression and its variants including LASSO; Logistic regression; Classification and Regression Trees (CART) including Random Forests; Support Vector Classification and Regression methods; Gaussian Process Regression, kernel methods, model maintenance and feature extraction;
  • Causality analysis and process topology reconstruction methods;
  • Reinforcement Learning methods.
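
As one illustration of how a method from this list is used for process monitoring, the sketch below fits a PCA model to normal-operation data and flags a sample whose squared prediction error (SPE, also called the Q statistic) exceeds an empirical limit. It is a hedged example under stated assumptions, not the workshop's own material: the three correlated "sensors", the single retained component, the 99th-percentile limit and the fault sample are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic normal-operation data: 3 hypothetical sensors driven by one latent factor.
t = rng.normal(size=(300, 1))
X_normal = t @ np.array([[1.0, 0.8, -0.6]]) + 0.1 * rng.normal(size=(300, 3))

# Standardize using statistics estimated from normal operation only.
mean = X_normal.mean(axis=0)
std = X_normal.std(axis=0, ddof=1)
Z = (X_normal - mean) / std

# PCA via SVD of the scaled data; keep k principal components (k = 1 is an assumption).
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
k = 1
P = Vt[:k].T  # loading matrix, shape (3, k)

def spe(x):
    """Squared prediction error of one sample with respect to the PCA model."""
    z = (x - mean) / std
    residual = z - P @ (P.T @ z)
    return float(residual @ residual)

# Empirical alarm limit: 99th percentile of SPE over the normal-operation data.
spe_limit = float(np.percentile([spe(x) for x in X_normal], 99))

# Hypothetical faulty sample: the third sensor no longer follows the usual correlation.
x_fault = np.array([1.0, 0.8, 1.5])
print("fault detected:", spe(x_fault) > spe_limit)
```

SPE reacts to samples that break the correlation structure learned from normal data; a complete monitoring scheme would usually pair it with a statistic on the retained scores, such as Hotelling's T².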


  • Industry 4.0 and analytics: A Vendor’s perspective, Alf Isaksson
  • Data checks and preparation; unsupervised learning and clustering analysis with industrial application(s); Lessons learnt from the application of analytics over the last 2 decades, Nina Thornhill
  • Data Visualization with examples; Broad overview of supervised learning; Classification and regression trees; Random Forests with applications; Support vector machines with applications, Sirish Shah
  • Soft-sensor design; Preliminary analysis; Do’s and don’ts; Image based soft-sensors with industrial applications, Biao Huang
  • Integration of domain knowledge and data analytics for process modeling and optimization with engineering and medical applications, Manabu Kano
  • Deep and Reinforcement learning and the future of analytics, Bhushan Gopaluni
  • Causality analysis for reconstruction of process network / topology, Arun Tangirala
  • Panel discussion, All speakers + more:
    • Data science education for undergraduate or graduate school?
    • Is industry hard-wired and ready with ‘digital plumbing’ to do analytics?
    • What software tools are ready for deployment?
    • Other topics