Offline Engine API

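SGLang provides a direct inference engine that runs without an HTTP server. This is especially useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

  • Offline batch inference

  • A custom server built on top of the engine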

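This document focuses on offline batch inference and demonstrates four inference modes:

  • Non-streaming synchronous generation

  • Streaming synchronous generation

  • Non-streaming asynchronous generation

  • Streaming asynchronous generation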

You can also build a custom server on top of the SGLang offline engine. A detailed example that runs as a Python script is available in custom_server.

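Nest Asyncio

Note that if you want to use the Offline Engine in IPython or in any other code that already runs inside an event loop, you need to add the following code: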

import nest_asyncio

nest_asyncio.apply()
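
The reason for this patch can be seen with the standard library alone. The sketch below is illustrative and does not use SGLang; `inner` and `outer` are hypothetical names. It shows that plain asyncio refuses to start a second event loop from inside a running one, which is the situation in IPython and Jupyter.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, as in a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # nested call: plain asyncio rejects this
    except RuntimeError as exc:
        coro.close()  # close the unused coroutine to avoid a "never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

With nest_asyncio.apply() in place, such nested calls are permitted, which is why the notebook cells on this page can call asyncio.run(main()) directly.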

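Advanced Usage

The engine supports VLM inference and hidden-state extraction.

See the examples for more use cases.

Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.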

[1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  2.17it/s]

Non-streaming Synchronous Generation

[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text:  Cindy. I am 16 years old. My favorite subject is English. I like to write stories and read books. I love the thrill of adventure. I love the people. I like to stay at home and watch movies on my computer. I am 14 years old and my favorite subject is science. I like to play with my friends and watch cartoons. My mom and dad like to play with me and I like to write stories and do homework. I love my family and my parents. I have a friend called Taylor. He is 15 years old. He likes to play with his friends and read books.
===============================
Prompt: The president of the United States is
Generated text:  an elected office. The United States president is elected by the people and is the chief executive of the United States government. The president's main role is to make decisions that will help the government to operate effectively. They are the head of the executive branch. Some of the duties of the president are making decisions in the national defense, foreign policy and policy regarding the budget. The vice president is the president's deputy. The vice president is also a member of the United States Senate. The president is usually elected to a four-year term. In 2012, the president was 82 years old. The vice president was
===============================
Prompt: The capital of France is
Generated text:  ________. A. Paris B. London C. Moscow D. Stockholm
Answer:
A

As a child, I wanted to be a model. My parents told me that my parents are not very rich and that they do not want me to become a model. However, I was still determined to become a model and the only way to do so was to work very hard. After some time, I decided to become a model. After finishing high school, I decided to go to a model school, but the financial burden was too heavy. I had to earn my own money to pay for the school fees, which is a difficult task
===============================
Prompt: The future of AI is
Generated text:  bright, but it also presents a number of challenges, including cybersecurity and privacy concerns. What are the top 10 cybersecurity challenges that AI poses for organizations?

As AI and machine learning continue to evolve, cybersecurity becomes an increasingly important consideration for organizations. With the rise of artificial intelligence and machine learning systems, there are several cybersecurity challenges that AI poses for organizations, including:

1. Privacy and data security: With AI systems relying heavily on data, organizations must ensure that they protect sensitive information and comply with data privacy regulations such as GDPR and CCPA. This includes implementing strong access controls, encryption, and other security measures to prevent unauthorized access to

Streaming Synchronous Generation

[3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about your interests and what you're looking for in a job. What can I do for you today? [Name] is looking for a [job title] at [company name]. [Name] is interested in [job title] at [company name]. [Name] is looking for a [job title] at [company name]. [Name] is interested in [job title] at [company name]. [Name] is looking for a [job title] at [company name]. [Name] is

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, which is known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and Louvre Museum. It is also a major center for art, culture, and politics, and is home to many world-renowned museums, theaters, and other cultural institutions. Paris is a popular tourist destination, known for its rich history, beautiful architecture, and vibrant culture. It is the largest city in France and a major economic and political center in Europe. The city is also home to many international organizations and institutions, including UNESCO and the European Union. Paris is a city of contrasts, with its modern architecture and historical landmarks

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be characterized by a number of trends that are expected to shape the technology's direction. Here are some of the most likely trends:

1. Increased focus on ethical considerations: As AI becomes more integrated into our daily lives, there will be a greater emphasis on ethical considerations. This will include issues such as bias, privacy, and transparency.

2. Greater use of AI in healthcare: AI is already being used to improve the accuracy of medical diagnoses and treatment plans. As AI becomes more advanced, we may see even more widespread use in healthcare.

3. Increased use of AI in manufacturing: AI is already being used to optimize production processes
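
The cell above mentions "overlap removal" when merging streamed chunks. The sketch below shows one way such a merge could work, assuming consecutive chunks may repeat the tail of the text received so far. It is not the implementation behind stream_and_merge; `merge_with_overlap` is a hypothetical helper written for illustration.

```python
def merge_with_overlap(existing: str, chunk: str) -> str:
    """Append chunk to existing, dropping the part of chunk that repeats existing's tail."""
    max_overlap = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return existing + chunk[size:]
    return existing + chunk  # no overlap found


# Hypothetical chunks whose beginnings repeat earlier text.
chunks = ["The capital", "capital of France", " France is Paris."]
merged = ""
for chunk in chunks:
    merged = merge_with_overlap(merged, chunk)
print(merged)  # The capital of France is Paris.
```

Checking the longest candidate overlap first keeps the merge from leaving a partial duplicate when a shorter suffix also happens to match.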

Non-streaming Asynchronous Generation

[4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name], and I am a [job title] at [Company Name]. I'm excited to be here and make a difference in [objective].

What brings you to [Company Name] and what makes you unique to the role?

Please share your story and how you got here. What inspired you to pursue this career path?

I want to know about the challenges you've faced and how you've overcome them.

Additionally, how do you balance your work and personal life?

Lastly, I would love to know your vision for [Company Name]. Can you explain your mission statement and how you believe it will shape

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is the largest city in the country and is home to many of France's cultural and historical landmarks. It is the seat of government, the heart of the European Union, and a popular tourist destination. Paris is known for its vibrant nightlife, rich cuisine, and striking architecture. The city is also home to many notable historical sites and museums, such as the Louvre and Notre-Dame Cathedral. Its status as a global financial hub and a major cultural center has made Paris a major global city. The city is often referred to as "the city of a thousand joys" and is also considered one of the most beautiful cities in

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a rapidly evolving field with many potential trends. Some of the most significant trends include:

1. Integration of AI into diverse industries: AI is already being used in a wide range of industries, but it's likely to continue to expand its reach in the future. This could include healthcare, transportation, retail, and finance, among others.

2. Increased use of AI in autonomous vehicles: As autonomous vehicles become more common, AI will play a more significant role in their operation and safety. This could lead to more efficient use of resources and reduced accidents.

3. Development of more advanced AI models: AI models are becoming increasingly sophisticated, and researchers
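
The cell above awaits a single call for the whole prompt list. When requests arrive separately, a common asyncio pattern is to await several calls concurrently. The sketch below illustrates that pattern with a stand-in coroutine; `fake_async_generate` is hypothetical and is not part of SGLang, so treat this as an outline of the control flow rather than of the engine's behavior.

```python
import asyncio


async def fake_async_generate(prompt: str) -> dict:
    # Hypothetical stand-in for an awaitable generation call.
    await asyncio.sleep(0.01)
    return {"text": f" <completion for: {prompt}>"}


async def run_all(prompts: list[str]) -> list[dict]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_async_generate(p) for p in prompts))


demo_prompts = ["Hello, my name is", "The capital of France is"]
results = asyncio.run(run_all(demo_prompts))
for prompt, output in zip(demo_prompts, results):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Because gather() preserves input order, the results can be paired with their prompts using zip, as in the cell above.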

Streaming Asynchronous Generation

[5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [insert name]. I’m a person who enjoys [insert hobby or interest]. I spend a lot of time exploring the world, reading and writing, and learning about different cultures and perspectives. I’m always open to new experiences and ideas, and I try to use my knowledge to help people in my personal and professional life. I’m passionate about sharing my experiences and learning with others, and I’m always up for a good challenge. Thank you for taking the time to meet me.

Please note that the name and profession should be fictional and not have any specific meanings or connotations. Your response should be a short, neutral self-int

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.

To verify this statement, I will:
1. Search for information about France's capital city.
2. Look for the name of Paris.
3. Check if it's a significant city in France.
4. Confirm it's the capital city of France.

After researching, I can confirm that Paris is indeed the capital city of France. Therefore, I can summarize the information in the following way:

The capital of France is Paris. This statement is factual and accurate. It is widely recognized as the official and historical center of France, serving as the seat of government, administrative, cultural, and commercial activities. Paris is known for

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  complex and will likely involve many different trends and developments. Here are some possible trends:

1. Increased use of AI in healthcare: AI is already being used in medical diagnosis, drug development, and patient care, but it is likely to become even more widespread in the coming years. AI will be used to improve the accuracy of diagnoses, reduce the risk of errors, and provide personalized treatment plans.

2. AI in manufacturing: AI is already being used in manufacturing, from automating production lines to optimizing supply chains. As AI technology continues to advance, it is likely to be used even more extensively in manufacturing to improve efficiency, reduce costs,
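
The async for loop above prints each chunk as it arrives. If the full text is also needed afterwards, the chunks can be collected while they are printed. The sketch below shows this with a stand-in async generator; `fake_stream` and `collect` are hypothetical names and do not correspond to SGLang APIs.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(text: str, size: int = 5) -> AsyncIterator[str]:
    # Hypothetical stand-in for a stream of already de-duplicated chunks.
    for start in range(0, len(text), size):
        await asyncio.sleep(0)  # yield control between chunks
        yield text[start:start + size]


async def collect(stream: AsyncIterator[str]) -> str:
    pieces = []
    async for piece in stream:
        print(piece, end="", flush=True)  # show output incrementally
        pieces.append(piece)
    print()  # newline once the stream ends
    return "".join(pieces)


full_text = asyncio.run(collect(fake_stream(" Paris is the capital of France.")))
```
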
[6]:
llm.shutdown()