We are seeking an experienced Machine Learning Engineer / Data Scientist specializing in industrial time-series analytics to develop and deploy advanced AI solutions using OSIsoft PI System and SQL Server data. The role involves building scalable ETL pipelines, engineering features from high-frequency sensor data, and developing predictive models for anomaly detection, predictive maintenance, and process optimization. The ideal candidate will have strong expertise in machine learning, time-series modeling, SQL Server integration, and MLOps practices, along with the ability to operationalize models in production environments for real-time industrial applications.
Key Responsibilities:
1. Industrial Time-Series Data Engineering & Integration
- Design and implement robust ETL/ELT pipelines that extract high-volume, high-velocity data from OSIsoft PI Tags/Events through interfaces such as PI AF and ODBC.
- Perform complex feature engineering on time-series data, including handling irregular sampling intervals, sensor gaps, and outliers, and filtering noise.
- Synchronize PI System data with structured relational data in SQL Server to create rich training datasets.
- Optimize data retrieval strategies from PI System to ensure low-latency access for model training and real-time inference.
2. Machine Learning Model Development
- Develop, train, and validate machine learning models specifically for industrial time-series problems, such as:
  - Predictive Maintenance (remaining useful life, fault detection).
  - Anomaly Detection in sensor streams.
  - Process Parameter Optimization and Yield Prediction.
- Apply advanced statistical methods and ML algorithms (e.g., ARIMA, LSTM, XGBoost, Random Forest, Isolation Forest).
- Perform feature selection and dimensionality reduction tailored to temporal dependencies.
3. SQL Server Integration & Deployment
- Write efficient T-SQL queries and stored procedures to aggregate, summarize, and join PI data with SQL Server tables.
- Deploy models into production environments, for example by leveraging SQL Server or by serving models via REST APIs integrated with SQL backends.
- Ensure seamless data flow between the PI System (historical/time-series) and SQL Server (transactional/relational) for model retraining pipelines.
4. MLOps & Operationalization
- Implement MLOps best practices for versioning time-series datasets and models.
- Monitor model performance and data drift, particularly accounting for changes in sensor behavior or process conditions.
Qualifications & Requirements:
- Academic: Bachelor's degree in Computer Science or a related field.
- Experience: Minimum of 7 years of experience.