Biblio
Open Science Big Data is emerging as an important area of research and software development. Although there are several high quality frameworks for Big Data, additional capabilities are needed for Open Science Big Data. These include data provenance, citable reusable data, data sources providing links to research literature, relationships to other data and theories, transparent analysis/reproducibility, data privacy, new optimizations/advanced algorithms, data curation, data storage and transfer. An important part of science is explanation of results, ideally leading to theory formation. In this paper, we examine means for supporting the use of theory in big data analytics as well as using big data to assist in theory formation. One approach is to fit data in a way that is compatible with some theory, existing or new. Functional Data Analysis allows precise fitting of data as well as penalties for lack of smoothness or even departure from theoretical expectations. This paper discusses principal differential analysis and related techniques for fitting data where, for example, a time-based process is governed by an ordinary differential equation. Automation in theory formation is also considered. Case studies in the fields of computational economics and finance are considered.
Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operations of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lack of symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of the accuracy in recovering strongly skewed signals. In addition, turbofan engine data are also analyzed to verify the capability of the proposed method in recovering non-skewed signals.
Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operations of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lack of symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of the accuracy in recovering strongly skewed signals. In addition, turbofan engine data are also analyzed to verify the capability of the proposed method in recovering non-skewed signals.