Sr Staff Engineer – Quality Engineering - Infinia

Job Locations: IN-MH-Pune
Job ID: 2025-5510
Name Linked: Office: Pune
Country: India
City: Pune
Worker Type: Regular Full-Time Employee
Posting Location (State/Province): MH

Overview

This is an opportunity to join a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader that powers many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

 

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC 

 

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments." – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

 

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. 

 

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. 

 

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. 

Job Description

Role Overview

We are seeking a highly skilled and technically strong Senior Staff Quality Engineer to drive the end-to-end quality engineering efforts for Infinia, DDN’s large-scale distributed data platform.

In this role, you will be a senior technical authority responsible for designing, implementing, and validating complex test infrastructures that ensure the correctness, performance, and resilience of Infinia’s distributed architecture. You will work across core subsystems—including the I/O path, memory management, networking stack, scheduling layers, multi-tenant services, and NVMe-backed storage patterns—to ensure platform quality at scale.

This is a hands-on, high-impact IC role for someone who can solve hard problems, automate at scale, and elevate quality engineering across the organization.

Key Responsibilities

Quality Engineering & System Validation

  • Design detailed test strategies and validation plans for distributed system components such as task scheduling, tracing, memory, SPDK data path, and platform services.
  • Create scalable, automated test suites that validate multi-tenant behavior, concurrency, data consistency, and system-level performance.

Automation Frameworks & Tooling

  • Build and maintain robust automation using tools such as Pytest and container-based environments built on Docker, Jenkins, and Kubernetes.
  • Develop reusable automation templates, harnesses, and utilities to accelerate test creation and reduce engineering overhead.

Performance, Reliability & Scale Testing

  • Construct and execute performance tests covering I/O throughput, system latency, NVMe access patterns, concurrency limits, and long-running workload stability.
  • Use advanced tools (profilers, fuzzers, failure-injection frameworks, trace analyzers) to uncover issues in distributed workflows.
  • Analyze CPU, memory, disk, and network utilization to diagnose performance bottlenecks and identify regression risks.

Cross-Functional Quality Leadership

  • Work closely with architects, developers, release engineering, DevOps, and customer engineering to drive quality-first design decisions.
  • Participate in feature design reviews, ensuring testability, observability, and resilience are built into system components.
  • Lead root cause analysis (RCA) for complex issues and propose long-term improvements to engineering practices and platform stability.

Documentation & Quality Standards

  • Produce clear, detailed test plans, automation guides, design-review feedback, and quality metrics reports.
  • Contribute to the development and maintenance of internal QA standards, best practices, and onboarding materials.

Required Qualifications

  • 10+ years of experience in software quality engineering, with a strong focus on distributed systems, system-level testing, or infrastructure platforms.
  • Hands-on expertise in test automation using Python, Bash, and modern CI/CD tooling (Git, Jenkins, etc.).
  • Strong understanding of:
    • Distributed concurrency
    • File systems and I/O stack behavior
    • Storage performance analysis (NVMe, SPDK)
    • Networking, tracing, and system observability
  • Experience with large-scale performance testing, stress testing, and reliability validation.
  • Demonstrated skill in diagnosing complex system issues across logs, traces, network captures, and profiling tools.
  • ISTQB or equivalent certification preferred.

Preferred Qualifications

  • Experience validating large-scale data platforms, storage engines, or distributed scheduling systems.
  • Familiarity with observability technologies such as OpenTelemetry, Grafana, and Prometheus.
  • Background in compliance or security testing (e.g., access control, backup/restore workflows, Section 508/HIPAA/PCI).
  • Contributions to open-source test frameworks or distributed systems validation tools.

Success Metrics – First 30 Days

Technical Ramp-Up

  • Develop a deep understanding of Infinia’s architecture, core subsystems, and existing quality gaps.
  • Deliver an assessment of current test coverage, automation maturity, and high-risk areas.

Early Impact

  • Implement or enhance a test automation component for a critical subsystem.
  • Identify 2–3 performance, reliability, or test infrastructure improvements and propose actionable plans.

Team Integration

  • Begin partnering with Dev, QE, Release, and SRE teams to integrate quality checks into design and implementation workflows.

Success Metrics – Beyond 30 Days

  • Increased automated coverage across core platform areas, including reliability, performance, and concurrency validations.
  • Measurable reduction in escaped defects, regressions, and late-cycle quality issues.
  • Introduction of new frameworks, tools, or validation approaches adopted by multiple teams.
  • Recognition across engineering as a go-to technical expert for distributed system quality, automation, and performance validation.

 

Join us to deliver the quality backbone of a world-class distributed platform—where scale, correctness, and reliability define success.

 

DDN

 

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
