Job Description

AI Inference Engineer – Stealth Startup | San Francisco Onsite

Compensation: $200K–$300K + equity

Join a stealth-stage team backed by prominent academic research and successful technical founders, working at the bleeding edge of AI infrastructure. As generative AI continues to scale rapidly, the bottleneck is no longer training—it’s inference. This team is rebuilding the core systems that power inference, from kernel-level GPU optimizations to full-stack distributed deployment.

This role is ideal for engineers who want to go deep: working on quantization, KV caching, attention mechanisms like FlashAttention, and designing new strategies for parallelism across heterogeneous compute. You'll contribute to an integrated software-hardware stack that enables large-scale model deployment with dramatically improved performance, efficiency, and quality—at production scale.

What You’ll Be Doing:

Research and implement state-of-the-art techniques to improve AI model inference speed and quality
Architect and optimize distributed AI infrastructure across both GPU kernel and software layers
Profile, benchmark, and debug system performance across varied hardware environments
Drive improvements in model execution through compiler-level tuning, caching, and runtime strategies

What They’re Looking For:

Bachelor's degree in Computer Science, Engineering, Applied Math, or a related field
Strong experience with performance optimization and systems-level thinking
Proficiency in Python, C++, and CUDA
Familiarity with AI frameworks like PyTorch, TensorFlow, ONNX, or vLLM

Nice to Have:

Graduate degree in a technical field
Experience with MLIR or other compiler frameworks
Hands-on work with large-scale GPU infrastructure or custom kernels

This is a hands-on, foundational role in a fast-moving environment, offering the chance to shape the backbone of the next generation of AI systems.

AI Inference Engineer Job at Signify Technology, Santa Clara, CA
