Role Overview
Crimson Energy is looking for a highly capable AI/ML Engineer with strong expertise in Natural Language Processing (NLP), deterministic algorithm design, and production-grade system development. This role requires a hybrid skill set combining classical algorithmic problem-solving with modern machine learning approaches, particularly transformer-based architectures such as BERT.
The candidate will be responsible for building end-to-end intelligence pipelines that process structured and unstructured data (including maritime and OSINT datasets), leveraging both deterministic logic and machine learning models. A key responsibility is pretraining and fine-tuning BERT models for domain-specific use cases, ensuring high performance and contextual accuracy.
Key Responsibilities
1. NLP & Machine Learning Development
- Design, develop, and optimize NLP models using transformer architectures (BERT, RoBERTa, DistilBERT, etc.) via Hugging Face.
- Pretrain and fine-tune BERT models using domain-specific corpora (e.g., maritime intelligence, OSINT data).
- Implement tasks such as:
  - Named Entity Recognition (NER)
  - Text classification and clustering
  - Semantic search and similarity
  - Event and anomaly detection
- Handle multilingual and noisy datasets with custom preprocessing and tokenization strategies.
2. Deterministic Algorithm Engineering
- Develop rule-based and deterministic pipelines for data processing, filtering, and event detection.
- Build hybrid systems combining rule-based logic with ML outputs for explainability and reliability.
- Optimize algorithms for real-time or near-real-time data ingestion and processing.
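The hybrid rule-plus-model pattern described above can be sketched in a few lines. The event fields, thresholds, and score semantics here are hypothetical, purely to show how a deterministic rule can fire first (giving an explainable reason) while an ML score handles cases the rules cannot articulate.

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str
    speed_knots: float
    ml_anomaly_score: float  # score from an upstream model, in [0, 1]

# Hypothetical thresholds; real values would be tuned per deployment.
MAX_PLAUSIBLE_SPEED = 40.0
ML_SCORE_THRESHOLD = 0.8

def flag_event(event: Event) -> tuple[bool, str]:
    """Combine a deterministic rule with an ML score.

    The rule is checked first so every rule-based flag carries an
    explainable reason; the model only escalates the remainder.
    """
    if event.speed_knots > MAX_PLAUSIBLE_SPEED:
        return True, "rule: speed exceeds plausible maximum"
    if event.ml_anomaly_score >= ML_SCORE_THRESHOLD:
        return True, "model: anomaly score above threshold"
    return False, "no flag"

print(flag_event(Event("AIS", 55.0, 0.1)))   # rule fires
print(flag_event(Event("AIS", 12.0, 0.93)))  # model escalates
```

Ordering the deterministic checks ahead of the model is what keeps the system's most common decisions auditable.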
3. Data Pipeline & System Design
- Architect scalable data pipelines for ingestion, transformation, and model inference.
- Work with streaming and batch data sources (AIS, logs, APIs, OSINT feeds).
- Ensure robustness, fault tolerance, and performance optimization in pipelines.
4. Backend & API Development
- Build and maintain high-performance APIs using FastAPI for model serving and system integration.
- Develop microservices for NLP inference and algorithmic processing.
- Ensure secure and efficient API design with proper documentation.
5. Model Deployment & MLOps
- Deploy models in production environments using Docker and containerized workflows.
- Manage version control and collaboration using Git.
- Implement CI/CD pipelines for model and service deployment.
- Monitor model performance and implement retraining pipelines.
Required Technical Skills
Core AI/ML & NLP
- Strong experience with:
  - Hugging Face Transformers
  - BERT pretraining (masked language modeling, next sentence prediction) and fine-tuning
  - PyTorch (preferred) or TensorFlow
- Deep understanding of:
  - Tokenization strategies (WordPiece, BPE)
  - Embedding techniques and contextual representations
  - Evaluation metrics (F1, precision, recall, perplexity)
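As a concrete illustration of the tokenization knowledge listed above, the core of WordPiece is a greedy longest-match-first segmentation. The sketch below uses a toy vocabulary chosen purely for demonstration; real BERT vocabularies are learned from the corpus.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first subword tokenization, WordPiece-style.

    Non-initial subwords carry the '##' continuation prefix; if no
    prefix of the remaining text is in the vocabulary, the whole
    word falls back to the unknown token.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid segmentation exists
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary purely for illustration.
vocab = {"tank", "##er", "##s", "port"}
print(wordpiece_tokenize("tankers", vocab))  # ['tank', '##er', '##s']
```

This longest-match behavior is why domain-specific pretraining often benefits from extending the vocabulary: unseen domain terms otherwise shatter into many short subwords.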
Programming & Frameworks
- Python (advanced proficiency)
- FastAPI for backend/API development
- Experience with data processing libraries (Pandas, NumPy)
Deterministic Systems
- Strong foundation in algorithms and data structures
- Experience building rule-based systems and hybrid ML pipelines
DevOps & Deployment
- Git (version control, branching strategies)
- Docker (containerization and deployment)
- Basic understanding of CI/CD pipelines
Data Engineering (Preferred)
- Experience with:
  - Message queues (Kafka / RabbitMQ)
  - Databases (PostgreSQL, Neo4j)
  - Vector databases (Milvus, FAISS)
Preferred Qualifications
- Experience working with maritime, geospatial, or OSINT datasets.
- Understanding of knowledge graphs and entity linking.
- Exposure to distributed training and large-scale data processing.
- Familiarity with model optimization techniques (quantization, pruning).
What We Offer
- Opportunity to work on cutting-edge AI systems in real-world intelligence applications.
- High-impact role with ownership across the ML lifecycle.
- Collaborative and innovation-driven engineering environment.