This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC
“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA
DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.
This focus on innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in AI and data storage.
Job Description
Location: Pune, India
About the Role
We are looking for a Principal DevOps Engineer to join our high-impact team in Pune, India. You will lead the design and implementation of scalable, secure, and highly available infrastructure across both cloud and on-premises environments. This role demands a deep understanding of Linux systems, infrastructure automation, and performance tuning, especially in high-performance computing (HPC) setups.
As a technical leader, you'll collaborate closely with development, QA, and operations teams to drive DevOps best practices, tool adoption, and overall infrastructure reliability.
Key Responsibilities
• Design, build, and maintain Linux-based infrastructure across cloud (primarily AWS) and physical data centers.
• Implement and manage Infrastructure as Code (IaC) using tools such as CloudFormation, Terraform, Ansible, and Chef.
• Develop and manage CI/CD pipelines using Jenkins, Git, and Gerrit to support continuous delivery.
• Automate provisioning, configuration, and software deployments with Bash, Python, Ansible, and similar tools.
• Set up and manage monitoring and logging systems such as Prometheus, Grafana, and the ELK stack.
• Optimize system performance and troubleshoot critical infrastructure issues related to networking, filesystems, and services.
• Configure and maintain storage and filesystems, including ext4, XFS, LVM, NFS, iSCSI, and potentially Lustre.
• Manage PXE boot infrastructure using Cobbler/Kickstart, and create/maintain custom ISO images.
• Implement infrastructure security best practices, including IAM, encryption, and firewall policies.
• Act as a DevOps thought leader, mentor junior engineers, and recommend tooling and process improvements.
• Maintain clear and concise documentation of systems, processes, and best practices.
• Collaborate with cross-functional teams to ensure reliable and scalable application delivery.
Required Skills & Experience
• 9+ years of experience in DevOps, SRE, or Infrastructure Engineering.
• Deep expertise in Linux system administration, especially around storage, networking, and process control.
• Strong proficiency in scripting (e.g., Bash, Python) and configuration management tools (Chef, Ansible).
• Proven experience managing on-premises data center infrastructure, including provisioning and PXE boot tools.
• Familiarity with CI/CD systems, Agile workflows, and Git-based source control (Gerrit/GitHub).
• Experience with cloud services, preferably AWS, and hybrid cloud models.
• Knowledge of virtualization (e.g., KVM, Vagrant) and containerization (Docker, Podman, Kubernetes).
• Excellent communication, collaboration, and documentation skills.
Nice to Have
• Hands-on with Lustre or other distributed/parallel filesystems.
• Experience in HPC (High-Performance Computing) environments.
• Familiarity with Kubernetes deployments in hybrid clusters.
Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.
Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear this interview, you will enter the main process, which can consist of up to four interviews in total.
DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.