Multimodal Generative AI Engineer

ANI Calls India Private Limited

Hyderabad

1–5 Years

Posted 7 hours ago

Job Description

About the Role

We are seeking a Multimodal Generative AI Engineer to design, build, and support intelligent applications that combine text, images, audio, video, and enterprise documents. The ideal candidate will collaborate with business, data, and engineering teams to deliver secure, scalable, and measurable multimodal AI solutions.

Key Responsibilities

Design and develop multimodal AI applications integrating text, images, audio, video, and document data.
Build and optimize solutions using multimodal and vision-language models.
Develop and integrate APIs to enable seamless AI application workflows.
Design effective prompts and interaction strategies to improve model outputs and user experiences.
Collaborate with business stakeholders, data teams, and engineering teams to understand requirements and deliver AI solutions.
Evaluate model performance, accuracy, and user experience across multiple data modalities.
Implement scalable and secure AI pipelines for enterprise applications.
Support deployment, monitoring, and continuous improvement of multimodal AI systems.
Maintain documentation, experiment results, and best practices for model development and deployment.

Required Skills

Experience with multimodal AI models
Knowledge of vision-language models (VLMs)
Strong programming skills in Python
Experience developing and integrating APIs
Expertise in prompt design and optimization
Understanding of Generative AI and multimodal workflows

Experience Requirements

Up to 5 years of overall experience
Minimum 1–2 years of relevant hands-on experience in multimodal AI, Generative AI, computer vision, NLP, or related technologies