Director of Data and ML Engineering - Infinia

Job ID: 2024-4993
Name Linked: Remote: US
Country: United States
City: Remote
Worker Type: Regular Full-Time Employee

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

We are seeking an experienced and accomplished Director of Data and ML Engineering to lead our ML Engineering organization. In this role, you will oversee the design, deployment, and optimization of large-scale AI/ML training and inference pipelines that use Infinia as their foundational data storage, as well as the development of connectors to the major open-source frameworks for data ingestion and streaming, such as Delta Lake, Apache Iceberg, Mosaic Streaming, and Ray Data. You will guide a talented organization of engineers focused on an advanced end-to-end storage platform for data ingestion, transformation, preparation, and streaming in high-performance AI applications. Collaborating closely with software developers, product teams, and partners, you will lead experiments with state-of-the-art models using open-source tools and cloud platforms.

Key Responsibilities:

Leadership & Management:

Lead, mentor, and grow a team of senior ML and data engineers, fostering a culture of innovation and excellence.
Set strategic direction for the ML engineering team in alignment with company goals.
Lead strategic partnerships in all areas of AI, from conception through execution to delivery, and communicate complex technical concepts effectively to non-technical stakeholders.
Track, report, and manage the team’s performance against project milestones, ensuring on-time delivery of high-quality solutions.
Partner with architects, engineers, and cross-functional teams to ensure the delivery of innovative, high-quality technical designs.
Implement and refine engineering best practices, driving continuous improvements in quality, performance, and operational efficiency.

Technical Oversight:

Oversee the design and deployment of large-scale AI/ML training pipelines utilizing tools like Apache Spark and Apache Airflow.
Guide the integration of MLflow with DDN’s Infinia product for comprehensive experiment tracking, model versioning, and deployment.
Lead the integration of data ingestion and streaming pipelines with open-source tools such as Delta Lake, Apache Iceberg, Ray Data, Mosaic Streaming, tf.data, and the PyTorch DataLoader.
Drive the implementation and scaling of Retrieval-Augmented Generation (RAG) pipelines to enhance generative model performance.
Stay abreast of the latest developments in MLOps, AI/ML frameworks, and tooling.
Identify and implement solutions to optimize pipeline performance, runtime, and resource utilization on Infinia.

Required Qualifications:

Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
12+ years of experience in machine learning engineering, with at least 10 years in a leadership role.
Proven track record of building and scaling AI/ML pipelines and managing high-performing engineering teams.
Extensive experience with Apache Spark, Apache Airflow, and MLflow or equivalent tools.
Deep understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo).
Experience deploying open-source vector databases at scale.
Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code (Terraform, Ansible).
Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
Excellent problem-solving and troubleshooting abilities with a keen eye for performance optimization.
Strong leadership, communication, and interpersonal skills.
Ability to drive strategic initiatives and manage multiple projects simultaneously.

Preferred Skills:

Experience with large-scale data processing and storage solutions (Hadoop, Hive, HDFS, Trino).
Knowledge of NLP techniques and tools for model deployment.
Implementation-level understanding of ML frameworks, data loaders, data formats, and table formats.
Experience with scaling RAG pipelines and integrating them with generative AI models.
Experience in operationalizing AI/ML models in production environments.

This role offers an exceptional opportunity to lead a high-impact engineering organization at the core of DDN’s cutting-edge storage solutions. If you are passionate about solving complex technical challenges and driving innovation in high-performance systems, we encourage you to apply.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

Coding assessment: Often in a language of your choice.
Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
