Getting Started with LLM APIs in Python

Large Language Models (LLMs) like GPT, Claude, and LLaMA have revolutionized the way developers interact with text, allowing applications to perform tasks such as summarization, question-answering, and semantic search. With Python, you can easily harness the power of LLM APIs for your projects. This guide covers everything from basic usage to advanced practices like embeddings, retrieval-augmented generation (RAG), and integrating with frameworks like LangChain and Hugging Face Transformers.

1. Understanding LLM APIs

LLM APIs allow you to access pre-trained models without building them from scratch. These APIs provide endpoints for text generation, chat, embeddings, and even fine-tuning.

Popular LLM APIs:

  • OpenAI GPT API (gpt-3.5-turbo, gpt-4)
  • Cohere
  • Anthropic Claude
  • Hugging Face Inference API
  • Frameworks for orchestration: LangChain, LlamaIndex

2. Basic Usage with Python

Using OpenAI’s GPT API is straightforward:

# Requires the openai>=1.0 SDK
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion in Python with an example."}
    ],
    max_tokens=200
)

print(response.choices[0].message.content)

Key Tips:

  • messages maintains conversation context across turns (see the sketch below).
  • temperature controls randomness (0 → near-deterministic, higher → more creative).
  • max_tokens caps response length, which also caps cost.
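A minimal sketch of the first two tips, reusing the client from the block above (the follow-up question is just an illustration): append each reply to messages so the model sees the whole conversation, and set temperature=0 for near-deterministic output.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain recursion in Python with an example."}
]
reply = client.chat.completions.create(
    model="gpt-4", messages=messages, temperature=0, max_tokens=200
)

# Keep context by appending the assistant's answer before the next question
messages.append({"role": "assistant", "content": reply.choices[0].message.content})
messages.append({"role": "user", "content": "Now show an iterative version."})
follow_up = client.chat.completions.create(
    model="gpt-4", messages=messages, temperature=0, max_tokens=200
)
print(follow_up.choices[0].message.content)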

3. Reusing Existing Models

Instead of training models from scratch, you can reuse pre-trained LLMs:

  1. Embeddings: convert text into vectors for similarity search or clustering.

    # Reuses the client from section 2
    embedding = client.embeddings.create(
        model="text-embedding-3-large",
        input="Python is a versatile language."
    )
    vector = embedding.data[0].embedding
  2. Fine-tuning / Adapters (see the LoRA sketch after this list)
    • OpenAI allows fine-tuning GPT-3.5 models with task-specific data.
    • Hugging Face offers LoRA / PEFT for lightweight fine-tuning.
  3. Prompt Engineering: design prompts that provide context, rules, and examples to guide model output without retraining (a few-shot sketch follows below).
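For the adapter route, a minimal LoRA sketch with Hugging Face PEFT; gpt2 and the hyperparameters are placeholders, and you would still need a Trainer and a dataset to actually fine-tune:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train

And for prompt engineering, a minimal few-shot sketch reusing the client from section 2; the tutor persona and examples are illustrative:

few_shot_prompt = """You are a Python tutor. Answer in one sentence.

Q: What is a list comprehension?
A: A compact syntax for building a list from an iterable in a single expression.

Q: What is a generator?
A:"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": few_shot_prompt}],
    max_tokens=60
)
print(response.choices[0].message.content)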

4. Useful Python Frameworks

  1. LangChain
    • Orchestrate LLMs with prompt templates, memory, chains, and agents.

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import PromptTemplate

    llm = ChatOpenAI(model="gpt-4")
    prompt = PromptTemplate(
        input_variables=["topic"],
        template="Explain {topic} in simple terms."
    )
    response = llm.invoke(prompt.format(topic="Python decorators"))
    print(response.content)
  2. LlamaIndex (formerly GPT Index)
    • Build indexes over your own documents and query them with LLMs (see the sketch after this list).
  3. Hugging Face Transformers
    • Use models locally or via API:

    from transformers import pipeline

    generator = pipeline('text-generation', model='gpt2')
    print(generator("Explain Python decorators", max_length=100))
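A minimal LlamaIndex sketch, assuming llama-index>=0.10 and a placeholder docs/ folder of text files; it calls the OpenAI API under the hood and reads OPENAI_API_KEY from the environment:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load local files and build an in-memory vector index over them
documents = SimpleDirectoryReader("docs/").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What is recursion?"))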

5. Retrieval-Augmented Generation (RAG)

RAG combines embeddings with LLMs to provide accurate, context-aware responses:

  1. Generate embeddings for your documents.
  2. Store vectors in a vector database (FAISS, Pinecone).
  3. Retrieve top-k similar documents based on query.
  4. Feed retrieved documents + query to the LLM.

Example of in-memory RAG:

from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

client = OpenAI(api_key="YOUR_API_KEY")

def get_embedding(text):
    # Embed a single string with the OpenAI embeddings endpoint
    response = client.embeddings.create(model="text-embedding-3-large", input=text)
    return response.data[0].embedding

# Example documents
docs = ["Python is a versatile language.", "Recursion is when a function calls itself."]
doc_embeddings = [get_embedding(doc) for doc in docs]

def retrieve_similar_docs(query, docs, doc_embeddings, top_k=1):
    # Rank documents by cosine similarity to the query embedding
    query_vec = get_embedding(query)
    sims = cosine_similarity([query_vec], doc_embeddings)[0]
    top_indices = np.argsort(sims)[::-1][:top_k]
    return [docs[i] for i in top_indices]

retrieved_docs = retrieve_similar_docs("What is recursion?", docs, doc_embeddings)
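Step 4, feeding the retrieved context to the model, can look like this minimal sketch, reusing client and retrieved_docs from above; the system instruction is illustrative:

query = "What is recursion?"
context = "\n".join(retrieved_docs)

answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
    ]
)
print(answer.choices[0].message.content)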

6. Python Cheat Sheet for LLMs

Here’s a ready-to-use template covering chat, embeddings, RAG, and LangChain:

from openai import OpenAI

# Set up the client (openai>=1.0 SDK)
client = OpenAI(api_key="YOUR_API_KEY")

# Chat with GPT
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain Python decorators"}],
    max_tokens=200
)
print(response.choices[0].message.content)

# Create embeddings
embedding = client.embeddings.create(model="text-embedding-3-large", input="Python is versatile").data[0].embedding

# Simple RAG retrieval
# cosine_similarity between query and document embeddings (see previous example)

# LangChain prompt template
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4")
prompt_template = PromptTemplate(input_variables=["topic"], template="Explain {topic} in simple terms.")
response = llm.invoke(prompt_template.format(topic="Python generators"))
print(response.content)

7. Best Practices

  • Cache responses to avoid paying for repeated identical calls (see the sketch below).
  • Batch process prompts when possible.
  • Use embeddings + retrieval for repeated queries.
  • Limit tokens to control cost.
  • Design effective prompts to improve output quality.
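A minimal caching sketch using an in-process lru_cache, reusing the client from the cheat sheet; a production system might use a persistent cache (e.g. Redis or disk) instead:

from functools import lru_cache

@lru_cache(maxsize=256)
def cached_completion(prompt, model="gpt-4"):
    # Identical (prompt, model) pairs hit the cache instead of the API
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200
    )
    return response.choices[0].message.content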

8. Next Steps

  1. Explore OpenAI Python SDK.
  2. Experiment with Hugging Face Transformers locally.
  3. Build LangChain pipelines for multi-step workflows.
  4. Implement RAG with vector databases for large datasets.
  5. Fine-tune models with LoRA / PEFT or OpenAI fine-tuning.

This guide provides a comprehensive overview of using LLM APIs in Python, covering chat, embeddings, RAG, LangChain, and practical coding examples. Whether you are building chatbots, summarizers, or knowledge assistants, these tools and frameworks allow you to leverage pre-trained models efficiently.
