    How To Fix OpenAI Rate Limits & Timeout Errors

    LLMs are being used across a wide variety of use cases, including translation, sentiment analysis, and generating code, blogs, emails, and more. However, integrating the OpenAI API directly into production has some problems, as the service is still relatively new. Their APIs provide no SLAs or guarantees of uptime, or even of the performance of the service, and there are rate limits on tokens per minute and requests per minute.

    OpenAI recommends using various techniques to mitigate this. Let's explore a few of them briefly.

    Exponential Backoff

    Exponential backoff is a strategy for handling rate limits by gradually increasing the time between retries after each rate-limiting error. Below is an example in Node.js:

    const axios = require('axios'); // Make sure to install axios with npm or yarn.
    
    const BASE_URL = 'https://api.openai.com/v1/chat/completions';
    
    async function makeRequestWithBackoff(endpoint, params, retries = 3, backoffDelay = 500) {
      try {
        const response = await axios.post(endpoint, params, {
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer YOUR_OPENAI_API_KEY`,
          },
        });
        return response.data;
      } catch (error) {
        if (error.response && error.response.status === 429 && retries > 0) { // 429 is the HTTP status code for Too Many Requests
          // Wait for a jittered delay that grows exponentially with each retry.
          // (The base delay plus a random component avoids near-zero waits and
          // de-synchronizes clients that are retrying at the same time.)
          const delay = backoffDelay + Math.random() * backoffDelay;
          console.log(`Rate limit hit, retrying in ${delay}ms`);
          await new Promise((resolve) => setTimeout(resolve, delay));
          return makeRequestWithBackoff(endpoint, params, retries - 1, backoffDelay * 2);
        } else {
          // If it's not a rate limit error or we ran out of retries, throw the error
          throw error;
        }
      }
    }
    
    const params = {
      messages: [
        { role: "user", content: "Hi, who are you?" }
      ],
      max_tokens: 50,
      model: "gpt-3.5-turbo"
    };
    
    makeRequestWithBackoff(BASE_URL, params)
      .then(data => console.log(data))
      .catch(error => console.error(error));
    

    You can also modify the logic to use a linear or randomized backoff instead of an exponential one, as sketched below.
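
    For instance, here is a minimal sketch of interchangeable delay strategies you could swap into makeRequestWithBackoff. The helper names are illustrative, not part of any library:

    // Hypothetical delay strategies; `attempt` is the 0-based retry count.
    function exponentialDelay(attempt, base = 500) {
      return base * 2 ** attempt; // 500, 1000, 2000, ...
    }

    function linearDelay(attempt, base = 500) {
      return base * (attempt + 1); // 500, 1000, 1500, ...
    }

    function fullJitterDelay(attempt, base = 500) {
      // Random delay anywhere up to the exponential cap.
      return Math.random() * exponentialDelay(attempt, base);
    }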

    Batching

    OpenAI also allows batching multiple prompts into a single request on the /completions endpoint. This can work if you are hitting the requests-per-minute limit but still have headroom on tokens per minute. But remember, this legacy endpoint is being deprecated. Using the same example as above:

    const BASE_URL = "https://api.openai.com/v1/completions";
    const params = {
      model: "curie",
      prompt: [ // The completions endpoint takes `prompt`, which may be an array of strings
        "Once upon a time there was a dog",
        "Once upon a time there was a cat",
        "Once upon a time there was a human"
      ]
    };
    
    makeRequestWithBackoff(BASE_URL, params)
      .then(data => console.log(data))
      .catch(error => console.error(error));
    

    There are other techniques that you can use over and above these.

    Caching

    A lot of the time, your users are querying the same thing. Some type of simple or semantic caching layer in front of your requests can help you save cost and request time; in this context, it also reduces the number of calls made to OpenAI.
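
    As a minimal sketch, an exact-match in-memory cache can sit in front of the makeRequestWithBackoff function from earlier. A production setup might use Redis or an embedding-based semantic cache instead; this version only de-duplicates identical requests:

    const cache = new Map();

    async function cachedRequest(endpoint, params) {
      const key = JSON.stringify({ endpoint, params });
      if (cache.has(key)) {
        return cache.get(key); // Served from cache: no API call, no rate-limit cost
      }
      const data = await makeRequestWithBackoff(endpoint, params);
      cache.set(key, data);
      return data;
    }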

    Switching Between OpenAI and Azure

    You can apply for Azure's OpenAI Service and set up load balancing between the two providers. This way, even if one of them is down or slow, you can switch to the other.
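
    A rough failover sketch is shown below, reusing axios from the earlier example. The Azure resource name, deployment name, and api-version are placeholders; substitute the values from your own Azure OpenAI deployment.

    const PROVIDERS = [
      {
        url: 'https://api.openai.com/v1/chat/completions',
        headers: { 'Authorization': `Bearer YOUR_OPENAI_API_KEY` },
      },
      {
        // Placeholder Azure endpoint: fill in your resource, deployment & api-version
        url: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2024-02-01',
        headers: { 'api-key': 'YOUR_AZURE_API_KEY' },
      },
    ];

    async function requestWithFailover(params) {
      let lastError;
      for (const provider of PROVIDERS) {
        try {
          const response = await axios.post(provider.url, params, {
            headers: { 'Content-Type': 'application/json', ...provider.headers },
          });
          return response.data;
        } catch (error) {
          lastError = error; // This provider failed or is rate-limited; try the next one
        }
      }
      throw lastError;
    }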

    Always Stream Responses

    The OpenAI API provides a streaming feature that lets us receive partial model responses in real time, as they are generated. This offers a significant advantage over non-streaming calls, where you might remain unaware of a timeout until the entire response duration has elapsed, which varies with your initial parameters, such as request complexity and the max_tokens specified.

    Streaming ensures that, regardless of the request's size or the max_tokens set, the model typically begins to deliver tokens within the first 5–6 seconds. A delay beyond this brief window is an early indicator that the request may time out or may not have been processed as expected, so we can terminate such requests and retry them.
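
    Below is a sketch of a streaming call with a time-to-first-token timeout, reusing axios and BASE_URL from the earlier example. Parsing of the server-sent-event chunks is omitted; they arrive as raw `data:` lines.

    async function streamWithTimeout(params, firstTokenTimeoutMs = 6000) {
      const controller = new AbortController();
      // Abort if nothing has arrived within the first-token window.
      const timer = setTimeout(() => controller.abort(), firstTokenTimeoutMs);

      const response = await axios.post(BASE_URL, { ...params, stream: true }, {
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer YOUR_OPENAI_API_KEY`,
        },
        responseType: 'stream',
        signal: controller.signal,
      });

      response.data.on('data', (chunk) => {
        clearTimeout(timer); // First bytes arrived, so the request is alive
        process.stdout.write(chunk.toString());
      });

      return new Promise((resolve, reject) => {
        response.data.on('end', resolve);
        response.data.on('error', reject);
      });
    }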

    Setting Up Fallbacks

    For specific use cases where it is okay to get responses from other models, you can set up fallbacks to other models. The best alternatives could be Llama-70B, Gemini, or other smaller models like Mixtral 8x7B, Claude Instant, etc., to name a few; a sketch follows below.
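
    As a sketch, a fallback chain can reuse the retry helper from earlier. The model names here are illustrative, and the assumption that one endpoint can route all of them only holds behind a gateway; against OpenAI directly, each fallback model would need its own provider URL.

    const MODEL_CHAIN = ["gpt-3.5-turbo", "mixtral-8x7b", "claude-instant-1"];

    async function completeWithFallback(messages) {
      let lastError;
      for (const model of MODEL_CHAIN) {
        try {
          return await makeRequestWithBackoff(BASE_URL, { model, messages, max_tokens: 50 });
        } catch (error) {
          lastError = error; // This model failed; fall through to the next one
        }
      }
      throw lastError;
    }

    These are some common techniques that can be used to mitigate errors in production-grade applications.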

    That's it, thank you for reading, and follow Merlin on Twitter. We at Merlin API provide all of these features and a lot more, with 20+ models to choose from. We focus on the reliability of the API: we do all the switching, fallbacks, caching, and rate-limit handling for you, and we provide one unified API with one response format across all the models.

    A small example of how to use the Merlin API with Node.js:

    import { Merlin } from "merlin-node"; // npm install merlin-node
     
    // WARNING: test api key.
    // Replace with your API key from Merlin Dashboard
    // https://api.getmerlin.in
    const apiKey = "merlin-test-3b7d-4bad-9bdd-2b0d7b3dcb6d";
    const merlin = new Merlin({ merlinConfig: { apiKey } }); // Don't shadow the imported Merlin class

    const initChat = {
      role: "system",
      content: "You are a helpful assistant."
    };
     
    async function createCompletion() {
      try {
        const completion = await merlin.chat.completions.create({
          messages: [initChat],
          model: "gpt-3.5-turbo", // 20+ models as needed
        });
        console.log(completion); // One unified response format across models
      } catch (error) {
        console.error("Error creating completion:", error);
      }
    }
     
    createCompletion();
    
    

    Author
    Kalpna Thakur

    Our marketing powerhouse crafts innovative solutions for every growth challenge - all while keeping the fun in our team!
