Using the Reranker API

This document explains how to call the ClovisReranker model through the rerank endpoint.

POST /v1/rerank

Request format

Request body

{
  "model": "ClovisReranker",
  "query": "What is the capital of France?",
  "documents": [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain."
  ],
  "top_n": 3
}

Response

{
  "id": "rerank-XXXXXXXXXXXXX",
  "results": [
    {
      "index": 0,
      "score": 0.98,
      "document": {
        "text": "Paris is the capital of France."
      }
    },
    {
      "index": 2,
      "score": 0.21,
      "document": {
        "text": "Madrid is the capital of Spain."
      }
    },
    {
      "index": 1,
      "score": 0.04,
      "document": {
        "text": "Berlin is the capital of Germany."
      }
    }
  ],
  "meta": {
    "billed_units": {
      "total_tokens": 54
    },
    "tokens": {
      "input_tokens": 54
    }
  }
}

Interpreting results

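Index	Document	Score
0	Paris is the capital of France.	0.98
2	Madrid is the capital of Spain.	0.21
1	Berlin is the capital of Germany.	0.04

Each index is the zero-based position of the document in the documents array of the request, so it maps every score back to the original text. The sample response lists results from highest to lowest score. If results arrive in another order, sort them by descending score to obtain the final ranking.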
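The mapping from index back to the original text can be sketched as follows. This is a minimal illustration rather than part of the API: rank_documents and read_score are hypothetical helper names, the results list is hand-written (and shuffled on purpose), and the score field is assumed to be called "score" as in the sample response, with "relevance_score" (the name read by the Python example) accepted as a fallback.

```python
from typing import Any


def read_score(item: dict[str, Any]) -> float:
    """Read the relevance score from one result entry."""
    # Assumption: the field is named "score" as in the sample response;
    # "relevance_score" is accepted as a fallback.
    if "score" in item:
        return float(item["score"])
    return float(item["relevance_score"])


def rank_documents(
    documents: list[str], results: list[dict[str, Any]]
) -> list[tuple[int, float, str]]:
    """Return (index, score, text) tuples sorted by descending score."""
    ranked = []
    for item in results:
        idx = item["index"]  # position in the original documents list
        ranked.append((idx, read_score(item), documents[idx]))
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked


documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
# Hand-written results, deliberately out of order to show the sort.
results = [
    {"index": 1, "score": 0.04},
    {"index": 0, "score": 0.98},
    {"index": 2, "score": 0.21},
]

ranking = rank_documents(documents, results)
for position, (idx, score, text) in enumerate(ranking, 1):
    print(f"{position}. index={idx} score={score:.2f} {text}")
```

Looking the text up through index keeps the sketch working even when the response omits the document object.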

Example

Python

#!/usr/bin/env python3
"""
Minimal script demonstrating how to use the reranking model.
"""

import httpx

# Configuration
LITELLM_BASE_URL = "https://llm-gateway.clovis-ai.fr"
RERANK_MODEL = "ClovisReranker"
API_KEY = "YOUR-API-KEY"

# Example query
query = "What is machine learning?"

# Example documents to rerank
documents = [
    "Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
    "Python is a popular programming language for data science and web development.",
    "Deep learning uses neural networks with multiple layers to process complex patterns.",
    "The weather today is sunny with a temperature of 25 degrees Celsius.",
    "Supervised learning algorithms require labeled training data to make predictions.",
]

# Number of top results to return
top_n = 3

print(f"Query: {query}")
print(f"\nReranking {len(documents)} documents...\n")

# Set up the HTTP client; the context manager closes it when the block ends
headers = {"Authorization": f"Bearer {API_KEY}"}
with httpx.Client(base_url=LITELLM_BASE_URL, timeout=30.0) as client:
    # Make the reranking request
    response = client.post(
        "/v1/rerank",
        json={
            "model": RERANK_MODEL,
            "query": query,
            "documents": documents,
            "top_n": top_n,
        },
        headers=headers,
    )
    # Raise an exception on 4xx/5xx responses
    response.raise_for_status()
    data = response.json()

results = data.get("results", [])

# Display results
if results:
    print("Ranked results:")
    for rank, result in enumerate(results, 1):
        # "index" is the position of the document in the request list
        index = result["index"]
        # The sample response names the score field "score"; read
        # "relevance_score" first and fall back to "score".
        score = result["relevance_score"] if "relevance_score" in result else result["score"]
        document = documents[index]
        print(f"\n{rank}. Score: {score:.4f}")
        print(f"   Document: {document}")
else:
    print("No results returned.")

Best practices

Ideally, send 10 to 30 documents per request.
Do not exceed 50 documents per request.
Perform vector retrieval first, then rerank only the retrieved candidates.
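
One way to apply these limits is to trim the vector retrieval hits before building the request body. The sketch below is only an illustration under stated assumptions: select_candidates, PREFERRED_DOCUMENTS and MAX_DOCUMENTS are hypothetical names, and the hits are synthetic (text, similarity) pairs standing in for the output of whatever vector store is in use.

```python
MAX_DOCUMENTS = 50        # hard limit from the best practices above
PREFERRED_DOCUMENTS = 30  # upper end of the recommended 10 to 30 range


def select_candidates(
    hits: list[tuple[str, float]], limit: int = PREFERRED_DOCUMENTS
) -> list[str]:
    """Keep the `limit` most similar passages from vector retrieval."""
    if limit > MAX_DOCUMENTS:
        raise ValueError(f"limit must not exceed {MAX_DOCUMENTS} documents")
    best_first = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [text for text, _similarity in best_first[:limit]]


# Synthetic retrieval output: 80 passages, least similar first.
hits = [(f"passage {i}", 1.0 - i / 100) for i in reversed(range(80))]

candidates = select_candidates(hits)

# Request body in the shape shown in the "Request body" section.
payload = {
    "model": "ClovisReranker",
    "query": "What is machine learning?",
    "documents": candidates,
    "top_n": 3,
}
print(f"Sending {len(payload['documents'])} of {len(hits)} retrieved passages")
```

Raising an error above 50 documents, instead of silently truncating, is a design choice of this sketch; a caller that prefers truncation could clamp the limit instead.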