Rate Limits

The DeepSeek API dynamically limits each user's concurrency based on server load. When you reach the concurrency limit, you immediately receive an HTTP 429 response.
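
One common way for a client to cope with an HTTP 429 response is to retry with exponential backoff and jitter. The sketch below is only an illustration of that general technique, not a documented DeepSeek recommendation. The names call_with_backoff and send, and the delay values, are hypothetical: send stands for any zero-argument function of your own that performs the request and returns a (status_code, body) pair.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    `send` is a hypothetical zero-argument callable that performs the HTTP
    request and returns a (status_code, body) tuple. The retry counts and
    delays are illustrative defaults, not values taken from the API docs.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients do not all retry at the same moment.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    # Retries exhausted: hand the last 429 back to the caller.
    return status, body
```

Passing sleep as a parameter keeps the function easy to test. In real use you would leave the default time.sleep in place and tune max_retries and the delays to your own workload.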

⚠️ Waiting Behavior

After your request is sent, it may take some time to receive a response. During this period:

  • Non-streaming requests: the server continuously returns empty lines
  • Streaming requests: the server continuously returns SSE keep-alive comments (: keep-alive)

These empty lines and comments do not affect parsing of the JSON body. If you parse HTTP responses yourself, make sure to skip them. If a request has not started inference after 10 minutes, the server closes the connection.
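
If you do parse responses yourself, the handling could look roughly like the sketch below. It rests on a few assumptions that are not stated on this page: each streaming event carries a single "data:" line containing JSON, the stream ends with a "data: [DONE]" sentinel, and the input has already been decoded to text. The function names parse_non_streaming and iter_sse_events are hypothetical.

```python
import json


def parse_non_streaming(raw_body):
    """Parse a non-streaming response body that may start with empty lines.

    Leading blank lines are plain whitespace, which json.loads ignores.
    """
    return json.loads(raw_body)


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Empty lines and comment lines (those starting with ':' such as
    ': keep-alive') are skipped. Assumes one 'data:' line per event and a
    'data: [DONE]' end marker; both are assumptions of this sketch.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # event separator or keep-alive comment
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            yield json.loads(payload)
```

A caller would typically feed iter_sse_events the decoded lines of the HTTP response as they arrive. Lines with other SSE field names are ignored here for brevity.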