Do you know how to take existing AI models and create end-to-end applications? If yes, welcome : ) If not, sorry…
Basemark is an AR, automotive, defense and compute & graphics software product company (say that 5 times fast). We are based in Helsinki, with satellite offices in Germany and the US, and a team of around 80 stellar human beings worldwide. Real-time 3D graphics, GPU-compute programming and design are what we do, and we ship the platform that productizes that expertise into, wow.
The Role
You will help us build production AI agent systems that handle complex, multi-step, end-to-end engineering workflows. As a part of this role, you’ll design, build & operate agentic AI systems that accelerate product development, running autonomously against real backlogs & codebases.
Let’s break it down like a fraction. You will:
- Design multi-agent architectures that decompose ambiguous tasks, gather missing context, plan, execute, and self-verify
- Integrate agents with the full SDLC: ticketing, version control, CI/CD, code review tooling
- Build eval harnesses, traces & dashboards to understand & optimize agent performance
- Define the human-in-the-loop boundary: what agents do alone, what to escalate, what to hand back
- Optimize for cost, latency, and reliability at production scale
- Partner with Product, Engineering & DevOps teams to ship safely
What you bring
- 5+ years shipping any production software; you’ve owned services, not just notebooks
- Fluent in Python and at least one statically typed language (C++, Rust, TypeScript, etc.)
- Deep working knowledge of Git workflows, GitHub / Azure DevOps Pipelines, code review practices, and the testing pyramid
- Ability to read & modify unfamiliar codebases, tooling, infra-as-code & build systems
- Hands-on experience building with a production agent framework (Claude Agent SDK, LangGraph, OpenAI Agents SDK, AutoGen, CrewAI, or equivalent)
- Solid grasp of tool use / function calling, structured outputs, and modern agent planning and orchestration patterns
- Excellent understanding of context engineering & prompt engineering techniques
- Built evals for non-deterministic systems and run them as regression suites in CI
- Comfortable with tracing and observability tooling for LLM systems; you treat agent runs as debuggable artifacts
- You can measure & reduce hallucinations, tool-use failures, and silent regressions
- You can design & ship long-running asynchronous systems with human-in-the-loop approval
Nice to have
- Experience with RAG architectures, fine-tuning, or hybrid retrieval pipelines
- Background in developer tooling, IDEs, or platform engineering
- Contributions to open-source agent frameworks or evals tooling
- Familiarity with EU AI Act and enterprise compliance constraints
