[Remote] Senior AI Software Engineer, Agent Systems
Note: This is a remote role open to candidates in the USA. Scale Army Careers is seeking a Senior AI Software Engineer specializing in Agent Systems. The role involves designing self-running agent loops and multi-agent swarms that operate autonomously while remaining subject to verification and oversight. The engineer will build and ship robust agent platforms that perform real work in production environments.
Responsibilities
- Design self-running loops. Define the trigger, scope, action, budget, stop condition, and reporting so an agent runs unattended, stays inside cost and iteration limits, and knows when it is done versus when to escalate
- Build multi-agent swarms. Orchestrator plus specialized agents with clear file and task ownership, shared state or a shared mailbox, quality gates between stages, and handoffs that do not step on each other
- Make verification first-class. Build the part of the system that can say no: the checks, evals, and reviewer agents that catch confident mistakes before they merge. A loop is only as trustworthy as its ability to check its own work
- Own agent state and memory. Persistent on-disk state and per-turn context assembly so long-running tasks survive restarts and the system does not forget what the repo already knows
- Ship the platform around the agents. APIs, services, queues, and integrations in TypeScript and Node, deployed to AWS, with real tests, tracing, and observability for long multi-iteration runs
- Keep humans in the loop where it counts. Plan approval and pull request review, and active management of comprehension debt so the team understands what the swarm ships, not just that it shipped
Skills
- Strong engineering fundamentals. 5+ years writing production software that other engineers depend on. (Adjustable; we care more about what you have shipped than the number.)
- Hands-on loop engineering. You have designed agent loops with explicit stop conditions, budgets, retries, and self-verification. You can explain the difference between a task on repeat and a real loop, and you know why the verifier matters as much as the maker
- Multi-agent or swarm experience. You have built or operated systems where multiple agents coordinate: orchestration, handoffs, shared state, ownership or locking, and quality gates
- Fluency with modern agent tooling. Claude Code or Codex-style agents, sub-agents, persistent memory and skills files, tool and function calling, MCP, and reason-act-observe loop patterns
- Solid TypeScript and Node. Comfort with a service framework (NestJS or similar) and a typed data layer (Prisma or similar)
- Cloud and delivery. AWS (ECS, Fargate, or similar), Docker, and CI/CD. You can take something from repo to production yourself
- A verification mindset. You treat 'done' as a claim to be proven, and you build the checks that prove it
- Running 10+ parallel agents and managing token and cost budgets at scale
- Distributed systems, queues, and event-driven design
- React for agent-facing interfaces
- Prior work on developer tooling, orchestration frameworks, or internal agent platforms
- Familiarity with where loop engineering is heading next, including continual learning systems
- SOC 2 or ISO 27001 awareness for handling client data
Benefits
- REMOTE
- This role is open to candidates based in LATAM, Africa, and Eastern Europe. Please note that as this role supports U.S.-based clients, candidates must be available to work during U.S. business hours aligned with the client’s time zone.
Company Overview