Enabling cache for torch.compile

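SGLang uses the max-autotune-no-cudagraphs mode of torch.compile. Autotuning can be slow. If you want to deploy a model on many different machines, you can ship the torch.compile cache to those machines and skip the compilation step.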

This approach is based on https://pytorch.ac.cn/tutorials/recipes/torch_compile_caching_tutorial.html

To generate the cache, set TORCHINDUCTOR_CACHE_DIR and run the model once:

TORCHINDUCTOR_CACHE_DIR=/root/inductor_root_cache python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --enable-torch-compile
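Shipping the cache to other machines could be done by archiving the cache directory and unpacking it on each target. The following is a minimal sketch, not part of SGLang: the function names `pack_inductor_cache` and `unpack_inductor_cache` and the archive path are hypothetical, and how the archive is transferred (for example with scp or rsync) is left out.

```shell
# Hypothetical helpers for illustration; names and paths are assumptions.

# Pack the Inductor cache directory into a tarball on the machine
# that has already run the compilation once.
pack_inductor_cache() {
    cache_dir="$1"   # e.g. /root/inductor_root_cache
    archive="$2"     # e.g. /tmp/inductor_cache.tar.gz
    tar -czf "$archive" -C "$cache_dir" .
}

# Unpack the tarball into the cache directory on a target machine.
unpack_inductor_cache() {
    archive="$1"
    cache_dir="$2"
    mkdir -p "$cache_dir"
    tar -xzf "$archive" -C "$cache_dir"
}
```

After unpacking, launch the server on the target machine with TORCHINDUCTOR_CACHE_DIR pointing at the unpacked directory. Cache hits most likely require the target to match the source machine in GPU model and in PyTorch and Triton versions; otherwise the server may recompile.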