
Our Threat Research Team's mission is aggressive: achieve near-total coverage of global breach and leak data with 99%+ automation. Your work directly enables HEROIC's ability to identify exposures before they are weaponized.
Architect and operate large-scale, distributed crawling and discovery systems across:
Surface web, deep web, and dark web
Hacker forums, underground marketplaces, and breach communities
Chat platforms (Telegram, Discord, IRC, WhatsApp, etc.)
Paste sites, code repositories, and social platforms used for breach disclosure
Continuously discover, archive, and download newly released datasets, logs, credentials, and artifacts the moment they appear
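Large-scale discovery of this kind usually rests on a deduplicating URL frontier with per-domain politeness. A minimal sketch of that building block, in plain Python; the class name, politeness interval, and structure are illustrative assumptions, not a description of HEROIC's actual system:

```python
import time
from collections import deque
from urllib.parse import urlparse

class Frontier:
    """Deduplicating URL queue with a per-domain politeness delay."""

    def __init__(self, delay=2.0):
        self.delay = delay    # minimum seconds between hits to one domain
        self.seen = set()     # every URL ever enqueued (dedup)
        self.queue = deque()
        self.last_hit = {}    # domain -> timestamp of last fetch

    def add(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_url(self, now=None):
        """Return the next URL whose domain has cooled down, else None."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            domain = urlparse(url).netloc
            if now - self.last_hit.get(domain, float("-inf")) >= self.delay:
                self.last_hit[domain] = now
                return url
            self.queue.append(url)  # domain still cooling down; requeue
        return None
```

In a real cluster the `seen` set and `last_hit` table would live in shared storage (e.g. Redis) so many workers can pull from one frontier.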
Build automated collectors and archivers for anonymized and decentralized networks including:
Tor (.onion), I2P, ZeroNet, Freenet, IPFS, GNUnet, Lokinet, Yggdrasil, and similar systems
Design resilient workflows for unreliable, adversarial, or ephemeral data sources
Normalize and index data from non-traditional network protocols and formats
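A recurring pattern when collecting across anonymized networks is routing each request through the correct local proxy daemon. A hedged sketch assuming the common defaults of a Tor SOCKS proxy on 9050 and an I2P HTTP proxy on 4444; the routing table and function name are illustrative and would be tuned per deployment:

```python
from urllib.parse import urlparse

# Common local proxy defaults: Tor SOCKS on 9050, I2P HTTP proxy on 4444.
# "socks5h" matters for Tor: the hostname must be resolved inside the circuit.
PROXY_ROUTES = {
    ".onion": "socks5h://127.0.0.1:9050",
    ".i2p": "http://127.0.0.1:4444",
}

def proxy_for(url):
    """Pick the local proxy a collector should use for a given URL.

    Returns None for clearnet URLs (fetched directly)."""
    host = urlparse(url).hostname or ""
    for suffix, proxy in PROXY_ROUTES.items():
        if host.endswith(suffix):
            return proxy
    return None
```

Other overlay networks (ZeroNet, IPFS, Lokinet, ...) would each get their own entry or a dedicated gateway client rather than a simple proxy URL.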
Develop automated scanning systems to identify:
Unsecured databases (Elasticsearch, MySQL, PostgreSQL, MongoDB, etc.)
Exposed cloud storage (Amazon S3, Azure Blob Storage, Google Cloud Storage, DigitalOcean Spaces)
Open FTP servers, backups, and misconfigured archives
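As one concrete example of such a scanner check: an Elasticsearch node that answers its root endpoint with its cluster banner (rather than a 401) is typically accepting unauthenticated requests. A minimal sketch; the banner fields match Elasticsearch's real root response, while the helper name and heuristic are our assumptions:

```python
import json

def looks_like_open_elasticsearch(status, body):
    """Heuristic: HTTP 200 on the root endpoint with Elasticsearch's
    banner fields suggests the node answered without authentication
    (a locked-down node typically returns 401 instead)."""
    if status != 200:
        return False
    try:
        info = json.loads(body)
    except ValueError:
        return False
    return "cluster_name" in info and "version" in info

# Abridged example of the banner an unauthenticated node returns
banner = ('{"cluster_name": "es-prod", '
          '"version": {"number": "7.10.2"}, '
          '"tagline": "You Know, for Search"}')
```

Analogous probes exist for each service class: a MongoDB `listDatabases` without credentials, an S3 bucket listing that returns XML instead of AccessDenied, and so on.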
Monitor and ingest data from file hosting and distribution platforms commonly used for breach dumps
Build ETL pipelines to clean, normalize, enrich, and index structured and unstructured data
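The ETL stage above can be sketched as a small normalize-and-dedupe pass over credential records. Field names, the hashing choice, and the dedup key are illustrative assumptions, not HEROIC's schema:

```python
import hashlib

def normalize_record(raw):
    """Normalize one credential record: trim and lowercase the email,
    store only a SHA-256 of the password, drop rows without an email."""
    email = (raw.get("email") or "").strip().lower()
    if "@" not in email:
        return None
    password = raw.get("password") or ""
    return {
        "email": email,
        "password_sha256": hashlib.sha256(password.encode()).hexdigest(),
        "source": raw.get("source", "unknown"),
    }

def dedupe(records):
    """Drop exact duplicates by (email, password hash)."""
    seen, out = set(), []
    for rec in filter(None, map(normalize_record, records)):
        key = (rec["email"], rec["password_sha256"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

Enrichment steps (domain extraction, breach attribution, credential-stuffing risk scoring) would slot in after normalization, before indexing.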
Implement advanced anti-bot evasion strategies (proxy rotation, fingerprinting, CAPTCHA mitigation, session management)
Integrate collected intelligence into centralized databases and search systems
Design APIs and internal tooling to support downstream analysis and AI/ML workflows
Automate deployment, scaling, and monitoring using Docker, Kubernetes, and cloud infrastructure
Continuously optimize performance, reliability, and cost efficiency of crawler clusters
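The anti-bot bullet above (proxy rotation, session management) can be illustrated with a minimal rotating proxy pool that benches proxies after repeated failures. A sketch under assumed names and thresholds, not a production implementation:

```python
import itertools

class ProxyRotator:
    """Round-robin proxy pool that benches a proxy after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self.max_failures = max_failures
        self.failures = {p: 0 for p in proxies}  # proxy -> consecutive failures
        self._cycle = itertools.cycle(proxies)

    def get(self):
        """Return the next healthy proxy, or None if every proxy is benched."""
        for _ in range(len(self.failures)):
            proxy = next(self._cycle)
            if self.failures[proxy] < self.max_failures:
                return proxy
        return None

    def report_failure(self, proxy):
        self.failures[proxy] += 1

    def report_success(self, proxy):
        self.failures[proxy] = 0
```

In practice the pool would also rotate TLS/browser fingerprints and cookies alongside the exit IP, since sites correlate all three.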
Minimum 4 years of hands-on experience in data engineering, intelligence collection, crawling, or distributed data pipelines
Strong Python expertise and experience with frameworks such as Scrapy, Playwright, Selenium, or custom async systems
Proven experience operating high-volume, automated data collection systems in production
Deep understanding of web protocols, HTTP, DOM parsing, and adversarial scraping environments
Experience with asynchronous, concurrent, and distributed architectures
Familiarity with SQL and NoSQL databases (PostgreSQL, MongoDB, Elasticsearch, Cassandra)
Strong Linux/Unix, shell scripting, and Git-based workflows
Experience deploying and operating systems using Docker, Kubernetes, AWS, or GCP
Excellent analytical, debugging, and problem-solving skills
Strong written and verbal communication skills
Direct experience with dark web intelligence, breach data, OSINT, or threat research
Familiarity with Tor, I2P, underground forums, stealer logs, or credential ecosystems
Experience processing large breach datasets or stealer logs
Background working in adversarial data environments
Exposure to AI/ML-driven intelligence platforms
Job ID: 146838359