    How To Fix OpenAI Rate Limits & Timeout Errors

    LLMs are being used across a wide variety of use cases, including translation, sentiment analysis, and generating code, blogs, and emails. However, integrating the OpenAI API directly into production has some problems, as the service is relatively new: the APIs come with no SLAs or guarantees of uptime or performance, and there are rate limits on both tokens per minute and requests per minute.

    OpenAI recommends using various techniques to mitigate this. Let's explore a few of them briefly.

    Exponential Backoff

    Exponential backoff is a strategy for handling rate limits by gradually increasing the wait time between retries after a rate-limiting error. Below is an example in Node.js:

    const axios = require('axios'); // Make sure to install axios with npm or yarn.
    
    const BASE_URL = 'https://api.openai.com/v1/chat/completions';
    
    async function makeRequestWithBackoff(endpoint, params, retries = 3, backoffDelay = 500) {
      try {
        const response = await axios.post(endpoint, params, {
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer YOUR_OPENAI_API_KEY`,
          },
        });
        return response.data;
      } catch (error) {
        if (error.response && error.response.status === 429 && retries > 0) { // 429 is the HTTP status code for Too Many Requests
          // Wait for a jittered delay whose base doubles with each retry
          const delay = backoffDelay + Math.random() * backoffDelay;
          console.log(`Rate limit hit, retrying in ${delay}ms`);
          await new Promise((resolve) => setTimeout(resolve, delay));
          return makeRequestWithBackoff(endpoint, params, retries - 1, backoffDelay * 2);
        } else {
          // If it's not a rate limit error or we ran out of retries, throw the error
          throw error;
        }
      }
    }
    
    const params = {
      messages: [
        { role: "user", content: "Hi, who are you?" }
      ],
      max_tokens: 50,
      model: "gpt-3.5-turbo"
    };
    
    makeRequestWithBackoff(BASE_URL, params)
      .then(data => console.log(data))
      .catch(error => console.error(error));
    

    You can even modify the delay calculation to make the backoff linear or purely random instead of exponential, as in the sketch below.
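
    For illustration, here are a few interchangeable delay strategies (the helper names and base values are just examples, not part of the API):

    // Alternative delay strategies for makeRequestWithBackoff (illustrative).
    // `attempt` starts at 0 and increases with each retry.
    const linearDelay = (attempt, base = 500) => base * (attempt + 1);      // 500, 1000, 1500, ...
    const exponentialDelay = (attempt, base = 500) => base * 2 ** attempt;  // 500, 1000, 2000, ...
    const randomDelay = (attempt, base = 500) =>
      Math.random() * exponentialDelay(attempt, base);                      // "full jitter"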

    Batching

    OpenAI also allows batching requests to the /completions endpoint by passing an array of prompts. This can help if you are hitting the requests-per-minute limit but still have headroom on tokens per minute. But remember, this endpoint is being deprecated. Using the same example as above:

    const BASE_URL = "https://api.openai.com/v1/completions";
    const params = {
      model: "curie", // legacy completions-era model
      prompt: [
        "Once upon a time there was a dog",
        "Once upon a time there was a cat",
        "Once upon a time there was a human"
      ]
    };
    
    makeRequestWithBackoff(BASE_URL, params)
      .then(data => console.log(data))
      .catch(error => console.error(error));
    

    There are other techniques that you can use over and above these.

    Caching

    Often, your users are querying the same thing. A simple or semantic caching layer in front of your requests can save cost and response time; in this context, it also reduces the number of calls made to OpenAI.
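
    As a minimal sketch, an exact-match cache keyed on the request body might look like the following (a production setup would add TTLs, size limits, and possibly embedding-based semantic matching; the helper names here are illustrative):

    const crypto = require('crypto');

    const cache = new Map(); // simple in-memory, exact-match cache

    function cacheKey(params) {
      // Hash the request body so identical requests map to the same entry
      return crypto.createHash('sha256').update(JSON.stringify(params)).digest('hex');
    }

    async function cachedRequest(endpoint, params) {
      const key = cacheKey(params);
      if (cache.has(key)) return cache.get(key); // cache hit: no OpenAI call
      const data = await makeRequestWithBackoff(endpoint, params);
      cache.set(key, data);
      return data;
    }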

    Switching Between OpenAI and Azure

    You can apply for Azure's OpenAI Service and set up load balancing between the two providers. That way, if one of them is down or slow, you can switch to the other.
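
    A minimal failover sketch, assuming an Azure OpenAI resource is already provisioned (the resource name, deployment name, and api-version below are placeholders to substitute with your own):

    // Try OpenAI first, then fall back to Azure OpenAI if the call fails.
    const providers = [
      {
        url: 'https://api.openai.com/v1/chat/completions',
        headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      },
      {
        // Placeholder resource/deployment names
        url: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2024-02-01',
        headers: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
      },
    ];

    async function requestWithFailover(params) {
      let lastError;
      for (const provider of providers) {
        try {
          const response = await axios.post(provider.url, params, {
            headers: { 'Content-Type': 'application/json', ...provider.headers },
          });
          return response.data;
        } catch (error) {
          lastError = error; // provider down or rate-limited: try the next one
        }
      }
      throw lastError;
    }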

    Always Stream Responses

    The OpenAI API provides a streaming feature that lets you receive partial model responses in real time, as they are generated. This offers a significant advantage over non-streaming calls, where you might remain unaware of a timeout until the entire response duration elapses, which varies with parameters such as request complexity and the number of max_tokens specified.

    Streaming ensures that, regardless of the request's size or the max_tokens set, the model typically begins delivering tokens within the first 5–6 seconds. A delay beyond this brief window is an early indicator that the request may time out or was not processed as expected, so you can terminate such requests and retry them.
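
    A minimal sketch using the official openai Node SDK (v4-style streaming; the 6-second first-token window is an assumption you should tune):

    const OpenAI = require('openai'); // npm install openai

    const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    async function streamWithFirstTokenTimeout(params, firstTokenTimeoutMs = 6000) {
      const stream = await client.chat.completions.create({ ...params, stream: true });
      let text = '';
      let gotFirstToken = false;

      // If no token arrives within the window, abort so the caller can retry
      const timer = setTimeout(() => stream.controller.abort(), firstTokenTimeoutMs);

      for await (const chunk of stream) {
        if (!gotFirstToken) { clearTimeout(timer); gotFirstToken = true; }
        text += chunk.choices[0]?.delta?.content ?? '';
      }
      return text;
    }

    An aborted stream surfaces as an error from the for-await loop, which you can catch and retry with the same backoff helper shown earlier.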

    Setting Up Fallbacks

    For specific use cases where it is acceptable to get responses from other models, you can set up fallbacks. Good alternatives include Llama 70B, Gemini, or smaller models such as Mixtral 8x7B and Claude Instant. A simple fallback chain is sketched below.
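
    A rough sketch of such a fallback chain (the endpoints and model identifiers are placeholders for whichever providers host these models in your stack):

    // Try models in order until one succeeds, reusing makeRequestWithBackoff from above.
    const fallbacks = [
      { endpoint: 'https://api.openai.com/v1/chat/completions', model: 'gpt-3.5-turbo' },
      { endpoint: 'https://YOUR_ALTERNATE_PROVIDER/v1/chat/completions', model: 'mixtral-8x7b' },
    ];

    async function completeWithFallback(messages) {
      let lastError;
      for (const { endpoint, model } of fallbacks) {
        try {
          return await makeRequestWithBackoff(endpoint, { model, messages });
        } catch (error) {
          lastError = error; // fall through to the next model
        }
      }
      throw lastError;
    }

    These are some common techniques that can be used to mitigate errors in production-grade applications.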

    That's all for now; thank you for reading, and follow Merlin on Twitter. We at Merlin API provide all of these features and a lot more, with 20+ models to choose from. We focus on the reliability of the API: we handle all the switching, fallbacks, caching, and rate-limit handling, and we provide one unified API with a single response format across all the models.

    A small example of how to use the Merlin API with Node.js:

    import { Merlin } from "merlin-node"; // npm install merlin-node
     
    // WARNING: test api key.
    // Replace with your API key from Merlin Dashboard
    // https://api.getmerlin.in
    const apiKey = "merlin-test-3b7d-4bad-9bdd-2b0d7b3dcb6d";
    const merlin = new Merlin({ merlinConfig: { apiKey } });
     
    const initChat = {
      role: "system",
      content: "You are a helpful assistant."
    };
     
    async function createCompletion() {
      try {
        const completion = await merlin.chat.completions.create({
          messages: [initChat],
          model: "gpt-3.5-turbo", // 20+ models as needed
        });
        console.log(completion.choices[0].message.content); // unified OpenAI-style response format
      } catch (error) {
        console.error("Error creating completion:", error);
      }
    }
     
    createCompletion();
    
    

    Author
    Kalpna Thakur

    Our marketing powerhouse, she crafts innovative solutions for every growth challenge, all while keeping the fun in our team!
