We include an inefficient reference PyTorch implementation in gpt_oss/torch/design.py. This code uses standard PyTorch operators to show the exact model architecture. It adds one small feature, tensor parallelism in the MoE layers, so that the larger model can run with this code (e.… Our only difficulty using this t…
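The following is a minimal pure-Python sketch of what tensor parallelism in an MoE layer can look like. It is not the code in gpt_oss/torch/design.py. It assumes each expert's MLP is sharded along its intermediate dimension, which is one common way to split an MLP across devices. All function names here are hypothetical, and SiLU is a stand-in activation. The ranks are simulated in one process, and the final per-expert sum plays the role of an all-reduce. In a real multi-GPU PyTorch setup, that step would be a collective such as `torch.distributed.all_reduce`.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    """Elementwise SiLU (stand-in activation). Any elementwise activation keeps the split exact."""
    return [z / (1.0 + math.exp(-z)) for z in v]


def expert_full(w1, w2, x):
    """Unsharded expert MLP: w2 @ act(w1 @ x)."""
    return matvec(w2, silu(matvec(w1, x)))


def shard_expert(w1, w2, world_size, rank):
    """Give one rank a contiguous slice of the intermediate dimension.

    w1 has shape (intermediate, hidden): split its rows.
    w2 has shape (hidden, intermediate): split its columns the same way.
    """
    inter = len(w1)
    assert inter % world_size == 0, "intermediate size must divide evenly"
    size = inter // world_size
    lo, hi = rank * size, (rank + 1) * size
    return w1[lo:hi], [row[lo:hi] for row in w2]


def expert_tensor_parallel(w1, w2, x, world_size):
    """Run every rank's shard and sum the partial outputs (simulated all-reduce)."""
    partials = []
    for rank in range(world_size):
        w1_s, w2_s = shard_expert(w1, w2, world_size, rank)
        partials.append(expert_full(w1_s, w2_s, x))
    return [sum(vals) for vals in zip(*partials)]


def moe_forward(gate, experts, x, top_k, world_size):
    """Route x to its top_k experts, softmax their gate scores, and mix outputs."""
    scores = matvec(gate, x)
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    m = max(scores[e] for e in chosen)
    exps = [math.exp(scores[e] - m) for e in chosen]
    total = sum(exps)
    out = [0.0] * len(x)
    for e, weight in zip(chosen, exps):
        w1, w2 = experts[e]
        y = expert_tensor_parallel(w1, w2, x, world_size)
        out = [o + (weight / total) * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    hidden, inter, n_experts = 4, 8, 3

    def rand_matrix(rows, cols):
        return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    experts = [(rand_matrix(inter, hidden), rand_matrix(hidden, inter)) for _ in range(n_experts)]
    gate = rand_matrix(n_experts, hidden)
    x = [random.uniform(-1, 1) for _ in range(hidden)]
    single = moe_forward(gate, experts, x, top_k=2, world_size=1)
    split = moe_forward(gate, experts, x, top_k=2, world_size=4)
    # The two results should agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(single, split)))
```

The split is exact because each shard computes the activation on its own slice of the intermediate vector. Summing the shards' contributions through `w2` then reconstructs the full matrix product. This is why a single reduction per MoE layer is enough to combine the ranks.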