
How To Fix OpenAI Rate Limits & Timeout Errors.


LLMs are now used across a wide variety of use cases: translation, sentiment analysis, code generation, blog posts, emails, and more. However, integrating the OpenAI API directly into production comes with challenges, as the service is still relatively new. The APIs come with no SLAs or guarantees of uptime or performance, and there are rate limits on both tokens per minute and requests per minute.

OpenAI recommends using various techniques to mitigate this. Let's explore a few of them briefly.

Exponential Backoff

Exponential backoff is a strategy for handling rate limits: when a request is rate-limited, you retry it after a delay that grows exponentially with each attempt. Below is an example in Node.js:

const axios = require('axios'); // Make sure to install axios with npm or yarn.

const BASE_URL = 'https://api.openai.com/v1/chat/completions';

async function makeRequestWithBackoff(endpoint, params, retries = 3, backoffDelay = 500) {
  try {
    const response = await axios.post(endpoint, params, {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer YOUR_OPENAI_API_KEY`,
      },
    });
    return response.data;
  } catch (error) {
    if (error.response && error.response.status === 429 && retries > 0) { // 429 is the HTTP status code for Too Many Requests
      // Full jitter: wait a random delay within the current backoff
      // window, which doubles on each retry
      const delay = Math.random() * backoffDelay;
      console.log(`Rate limit hit, retrying in ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
      return makeRequestWithBackoff(endpoint, params, retries - 1, backoffDelay * 2);
    } else {
      // If it's not a rate limit error or we ran out of retries, throw the error
      throw error;
    }
  }
}

const params = {
  model: "gpt-3.5-turbo",
  messages: [
    { role: "user", content: "Hi, who are you?" }
  ],
  max_tokens: 50,
};

makeRequestWithBackoff(BASE_URL, params)
  .then(data => console.log(data))
  .catch(error => console.error(error));

You can also tweak the logic to use a linear or random backoff instead of an exponential one, as sketched below.
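For instance, here is a minimal sketch of pluggable delay strategies; the strategy table and the attempt/base parameters are illustrative, not part of the OpenAI API:

// Pluggable delay strategies; attempt starts at 0, base is in milliseconds.
const delayStrategies = {
  exponential: (attempt, base) => base * 2 ** attempt,
  linear: (attempt, base) => base * (attempt + 1),
  random: (attempt, base) => Math.random() * base,
};

// Example: compute the wait before retry number `attempt`.
// const delay = delayStrategies.linear(attempt, 500);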

Batching

OpenAI also allows batching multiple prompts into a single request on the /completions endpoint. This helps if you are hitting the requests-per-minute limit but still have headroom on tokens per minute. Keep in mind, though, that this endpoint is deprecated. Using the same helper as above:

const BASE_URL = "https://api.openai.com/v1/completions";
const params = {
  model: "curie",
  prompt: [ // The completions endpoint accepts an array of prompts.
    "Once upon a time there was a dog",
    "Once upon a time there was a cat",
    "Once upon a time there was a human"
  ]
};

makeRequestWithBackoff(BASE_URL, params)
  .then(data => console.log(data))
  .catch(error => console.error(error));

There are other techniques that you can use over and above these.

Caching

Often, many of your users are querying the same thing. A simple or semantic caching layer in front of your requests can save cost and response time, and in this context it reduces the number of calls made to OpenAI.
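As a minimal illustration, here is an exact-match in-memory cache wrapped around the makeRequestWithBackoff helper from above; a production setup would more likely use Redis or a semantic cache keyed on embeddings:

// Exact-match cache keyed on the serialized request.
const cache = new Map();

async function cachedRequest(endpoint, params) {
  const key = JSON.stringify({ endpoint, params });
  if (cache.has(key)) {
    return cache.get(key); // Served locally: no API call, no rate-limit cost.
  }
  const data = await makeRequestWithBackoff(endpoint, params);
  cache.set(key, data);
  return data;
}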

Switching Between OpenAI and Azure

You can apply for access to Azure's OpenAI Service and load-balance across the two providers. That way, even if one of them is down or slow, you can switch to the other.
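A rough failover sketch follows. The Azure endpoint shape and header below are indicative only; your actual resource name, deployment name, and api-version come from the Azure portal, and details of the request body differ slightly between the two providers:

// Ordered list of providers; the first healthy one wins.
const PROVIDERS = [
  {
    url: 'https://api.openai.com/v1/chat/completions',
    headers: { 'Authorization': 'Bearer YOUR_OPENAI_API_KEY' },
  },
  {
    // Azure OpenAI uses an 'api-key' header instead of a Bearer token.
    url: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2023-05-15',
    headers: { 'api-key': 'YOUR_AZURE_API_KEY' },
  },
];

async function requestWithFailover(params) {
  let lastError;
  for (const provider of PROVIDERS) {
    try {
      const response = await axios.post(provider.url, params, {
        headers: { 'Content-Type': 'application/json', ...provider.headers },
        timeout: 15000, // Treat a slow provider as unavailable and move on.
      });
      return response.data;
    } catch (error) {
      lastError = error; // Fall through to the next provider.
    }
  }
  throw lastError;
}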

Always Stream Responses

The OpenAI API provides a streaming feature that delivers partial model responses in real time, as they are generated. This has a significant advantage over non-streaming calls, where you might remain unaware of a timeout until the entire response window has elapsed, which varies with parameters such as request complexity and the max_tokens you specify.

With streaming, regardless of the request's size or the max_tokens set, the model typically begins delivering tokens within the first 5–6 seconds. A delay beyond this window is an early indicator that the request may time out or was not processed as expected, so you can terminate it and retry.
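Here is a rough sketch of that pattern with axios. The SSE parsing is elided (real responses arrive as 'data: {...}' lines that need to be parsed), and the 6-second budget is an assumption you should tune:

// Abort the request if the first chunk doesn't arrive in time.
async function streamWithFirstTokenTimeout(params, firstTokenMs = 6000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), firstTokenMs);

  const response = await axios.post(
    'https://api.openai.com/v1/chat/completions',
    { ...params, stream: true },
    {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer YOUR_OPENAI_API_KEY',
      },
      responseType: 'stream',
      signal: controller.signal,
    }
  );

  return new Promise((resolve, reject) => {
    let raw = '';
    response.data.on('data', (chunk) => {
      clearTimeout(timer); // First bytes arrived: the request is alive.
      raw += chunk.toString(); // In practice, parse the SSE 'data:' lines here.
    });
    response.data.on('end', () => resolve(raw));
    response.data.on('error', reject);
  });
}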

Setting Up Fallbacks

For use cases where it is acceptable to get responses from other models, you can set up fallbacks. Good alternatives include Llama 70B, Gemini, or smaller models like Mixtral 8x7B and Claude Instant, to name a few.
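A minimal fallback loop might look like the sketch below. For illustration it assumes a single OpenAI-compatible endpoint serving every listed model, whereas in practice each provider has its own endpoint, authentication, and request shape; the model names are placeholders:

// Try models in order of preference until one succeeds.
const FALLBACK_MODELS = ['gpt-3.5-turbo', 'mixtral-8x7b', 'claude-instant-1'];

async function completeWithFallback(messages) {
  let lastError;
  for (const model of FALLBACK_MODELS) {
    try {
      return await makeRequestWithBackoff(BASE_URL, { model, messages });
    } catch (error) {
      lastError = error; // This model failed; fall through to the next one.
    }
  }
  throw lastError;
}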

These are some common techniques for mitigating errors in production-grade applications. That's it, thank you for reading, and follow Merlin on Twitter. At Merlin API we provide all of these features and a lot more, with 20+ models to choose from. We focus on the reliability of the API and handle the switching, fallbacks, caching, and rate limits for you, through one unified API with a single response format across all models.

A small example of how to use the Merlin API with Node.js:

import { Merlin } from "merlin-node"; // npm install merlin-node
 
// WARNING: test api key.
// Replace with your API key from Merlin Dashboard
// https://api.getmerlin.in
const apiKey = "merlin-test-3b7d-4bad-9bdd-2b0d7b3dcb6d";
const merlin = new Merlin({ merlinConfig: { apiKey } });
 
const initChat = { 
  role: "system", 
  content: "You are a helpful assistant." 
}
 
async function createCompletion() {
  try {
    const completion = await merlin.chat.completions.create({
      messages: [initChat],
      model: "gpt-3.5-turbo", // choose any of the 20+ models
    });
    console.log(completion);
  } catch (error) {
    console.error("Error creating completion:", error);
  }
}
 
createCompletion();

Experience the full potential of ChatGPT with Merlin

Author
Kalpna Thakur

Our marketing powerhouse crafts innovative solutions for every growth challenge, all while keeping the fun in our team!
