Our premier Silicon Valley AI startup client is hiring a Senior Software Engineer to design and implement high-performance compute kernels for their AI system. This role will include writing code for Linux kernel memory applications.
Note: All candidates MUST have at least 5 years of experience writing Linux kernel applications.
Responsibilities:
- Design and implement high-performance compute kernels for AI primitives such as GEMM, attention, normalization, and convolution.
- Optimize for throughput, latency, and memory hierarchy across heterogeneous compute units (SIMD, matrix engines, DMA).
- Collaborate with compiler and runtime teams to integrate kernels into Triton, PyTorch, or SYCL pipelines.
- Profile and tune kernels using tools like Perfetto, VTune, Tracy, or custom simulators.
- Prototype and evaluate precision formats (FP16/BF16/FP8/e5m2, etc.) and stochastic rounding.
- Contribute to micro-architecture feedback loops, helping co-design ISA and memory features with the hardware team.
- Write clear, well-structured, and reusable code (C++/CUDA/Triton/LLVM MLIR).
Requirements:
- Bachelor’s or Master’s in Computer Science, Computer Engineering, or a related field.
- 7+ years of experience writing code for Linux-based kernel applications.
- Strong background in parallel programming (CUDA, Triton, SYCL, OpenCL, Metal, POSIX Threads, or OpenMP).
- Experience with optimization of irregular algorithms, such as graph computations or sparse numerical linear algebra, combining high-level data structure design with low-level SIMD and synchronization optimizations.
- Deep understanding of memory layout, vectorization, thread/block scheduling, and cache behavior.
- Proficiency in C++11 or higher, with strong knowledge of standard algorithms, data structures, and generic programming paradigms.
- Experience with code generation for high-performance computations and knowledge of frameworks like BLAS/BLIS/Torch.
- Hands-on experience profiling and optimizing compute or AI workloads (e.g., GEMM, softmax, attention).
- Solid grasp of numerical stability, precision formats, and mixed precision arithmetic.
Visa Requirements:
- Will transfer H-1B and TN visas for candidates who reside in the San Francisco Bay Area/Silicon Valley.
#linux #kernel #softwareengineering #cplus #artificialintelligence #startup #embeddedsystems #diversity #hiring #semiconductor
Job Category: Artificial Intelligence, Embedded System Engineering, Linux Kernel, Software Engineering, Startup
Job Location: Silicon Valley