Distributed Systems Engineer Job at Magic, San Francisco, CA

  • Magic
  • San Francisco, CA

Job Description

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

About the role:

As a distributed systems engineer, you will build the data and coordination systems that enable ultra-long context inference and training on Magic’s GPU clusters. 

What you might work on: 

  • High-performance storage and caching systems to support long-context inference and training

  • Hacking on the internals of deep learning frameworks in the distributed setting

  • Automating fault detection and recovery systems to enable highly available training

  • Troubleshooting complex issues across GPUs, network, storage, OS, and cloud environments

What we’re looking for: 

  • Deep knowledge of distributed systems design and public cloud platforms

  • Experience designing and operating highly available, high-throughput data systems

  • Experience with the internals of distributed DBMS, batch and stream processing systems, and/or distributed file systems

  • Exceptional problem-solving skills up and down the stack

Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

Our culture:

  • Integrity. Words and actions should be aligned

  • Hands-on. At Magic, everyone is building 

  • Teamwork. We move as one team, not N individuals

  • Focus. Safely deploy AGI. Everything else is noise

  • Quality. Magic should feel like magic

Compensation, benefits and perks (US):

  • Annual salary range: $100K - $550K

  • Equity is a significant part of total compensation, in addition to salary

  • 401(k) plan with 6% salary matching

  • Generous health, dental and vision insurance for you and your dependents

  • Unlimited paid time off

  • Visa sponsorship and relocation stipend to bring you to SF, if possible

  • A small, fast-paced, highly focused team

Job Tags

Remote job, Relocation

Similar Jobs

ACS Consultancy Services

Network Security Engineer Job at ACS Consultancy Services

 ...candidates who meet the following qualifications Qualifications: Experience in administering and engineering Palo Alto and/or Fortinet firewall solutions. Hands-on experience with centralized firewall management platforms such as Panorama and/or FortiManager.... 

Advanced Recovery Systems

Psychiatrist- Medical Director Job at Advanced Recovery Systems

 ...leading the medical staff, dietitians, and physician services coordinator (where applicable) while building a collaborative team with nursing and clinical departments. Provides excellent medical care with the ARS philosophy outlined by the Chief Medical Officer. This... 

Better Talent

Liability Claims Adjuster Job at Better Talent

 ...COMPANY OVERVIEW: Proper Insurance is a specialty property & casualty program manager (MGA) focused on short-term rental risks (e.g., Airbnb, VRBO). We're known for white-glove claims service and pragmatic underwriting. If you like high standards, direct communication,... 

C&S Wholesale Grocers

Fleet Maintenance Manager Job at C&S Wholesale Grocers

 ...Position Overview The Fleet Maintenance Manager oversees the distribution center garage and directs a team of skilled mechanics responsible for inspecting, maintaining, and repairing tractors, trailers, automobiles, and light trucks. This role ensures all equipment... 

Openkyber

Cisco Firepower Engineer Job at Openkyber

 ...environments. Configure, administer, and maintain Cisco FTD, Cisco ISE, Meraki, Catalyst, and Nexus platforms. Manage Fortinet FortiGate firewalls, including NAT, VPNs, ACLs, and network segmentation. Operate and optimize Cisco Wireless Controllers...