Token Limits and Rate Management

Understanding and optimizing token usage, rate limits, and costs in UniSync.


Understanding Token Limits Per Model

Each LLM model has different context window sizes and pricing. The context window is the maximum number of tokens (input + output) a single request can use.

| Model | Context Window | Relative Cost | Best For |
| --- | --- | --- | --- |
| GPT-4o | 128K tokens | Medium | General content generation, high-quality articles |
| GPT-4o-mini | 128K tokens | Low | Classification, keyword mapping, shorter tasks |
| GPT-4 Turbo | 128K tokens | High | Complex reasoning, detailed briefs |
| GPT-3.5 Turbo | 16K tokens | Very Low | Simple classification, formatting |

Key points:

  • Input tokens (your prompt + context) and output tokens (the model's response) both count toward the limit and cost.
  • Output tokens are typically 2-4x more expensive than input tokens.
  • UniSync tracks both input and output tokens separately in Token Analytics.
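Because input and output tokens are priced differently, a quick estimate helps when comparing models. The sketch below shows the arithmetic; the prices used are purely illustrative (output priced 4x input, per the note above), not any provider's actual rates.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the dollar cost of one request from token counts
    and per-1,000-token prices."""
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# Illustrative prices only; check your provider's pricing page
# for real numbers.
cost = estimate_cost(input_tokens=2_000, output_tokens=500,
                     input_price_per_1k=0.005, output_price_per_1k=0.020)
```

Note that even though the example request sends 4x more input tokens than it receives back, the output tokens account for half the cost.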

Cost Optimization Strategies

Use the Right Model for Each Task

Not every task needs the most powerful model. UniSync uses different models for different pipeline phases:

  • Classification and keyword mapping: Use a smaller, cheaper model (GPT-4o-mini or GPT-3.5 Turbo). These tasks have structured outputs and do not need advanced reasoning.
  • Content generation: Use GPT-4o for the best quality-to-cost ratio. Reserve GPT-4 Turbo for content that requires nuanced understanding.
  • SEO metadata (titles, descriptions): Smaller models handle these well since the output is short and templated.
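The routing described above can be expressed as a simple lookup table. This is a hypothetical sketch (the task names and fallback choice are assumptions, not UniSync configuration keys), but it captures the principle: default to the cheap model and escalate only where quality demands it.

```python
# Hypothetical task-to-model routing table reflecting the guidance above.
MODEL_FOR_TASK = {
    "classification": "gpt-4o-mini",
    "keyword_mapping": "gpt-4o-mini",
    "seo_metadata": "gpt-4o-mini",
    "content_generation": "gpt-4o",
    "detailed_brief": "gpt-4-turbo",
}

def pick_model(task: str) -> str:
    # Unknown task types fall back to the cheapest capable model.
    return MODEL_FOR_TASK.get(task, "gpt-4o-mini")
```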

Reduce Input Token Usage

  1. Keep system prompts concise. Every token in the system prompt is sent with every request. Trim unnecessary instructions.
  2. Limit context injection. When generating content, only include relevant research data -- not everything available.
  3. Use keyword maps efficiently. The keyword map generator classifies keywords into secondary and long-tail categories before content generation, so the content prompt only includes relevant keywords rather than the full list.
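Point 3 above can be sketched as a filtering step before prompt assembly. The keyword-map shape and function name here are hypothetical, but the idea is the same: only keywords already classified as relevant to the topic reach the content prompt.

```python
def build_content_prompt(topic, keyword_map, max_keywords=10):
    """Inject only keywords mapped to this topic instead of the
    full list. The keyword_map shape here is an assumption."""
    relevant = [kw for kw, meta in keyword_map.items()
                if meta.get("topic") == topic][:max_keywords]
    return (f"Write an article about {topic}. "
            f"Target keywords: {', '.join(relevant)}")
```

A map of hundreds of keywords might contribute only a dozen to any single request this way.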

Reduce Output Token Usage

  1. Set appropriate max_tokens. Do not request 4,000 tokens of output for a task that needs 500.
  2. Use structured output formats. JSON responses with defined schemas tend to be more concise than free-form text.
  3. Batch similar operations. One request that processes 5 keywords is cheaper than 5 separate requests due to system prompt repetition.
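The savings from batching (point 3) come from sending the shared system prompt once instead of once per item. A rough model of the input-token math, with made-up token counts:

```python
def total_input_tokens(system_tokens, per_item_tokens, n_items, batched):
    """Compare input-token usage: a batched request sends the system
    prompt once; separate requests resend it for every item."""
    if batched:
        return system_tokens + per_item_tokens * n_items
    return (system_tokens + per_item_tokens) * n_items

# With a 500-token system prompt and 100 tokens per keyword,
# batching 5 keywords uses 1,000 input tokens instead of 3,000.
```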

Rate Limit Backoff

UniSync handles API rate limits automatically. You do not need to configure anything.

How it works:

  1. When a 429 Too Many Requests response is received, UniSync waits before retrying.
  2. The wait time increases exponentially with each retry (exponential backoff).
  3. After several retries, if the rate limit persists, the operation fails and is logged.
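The retry loop above is standard exponential backoff. UniSync's internal implementation is not exposed, but the behavior it describes looks roughly like this sketch (the `RateLimitError` class is a stand-in for the provider's 429 error):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's 429 Too Many Requests error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry on rate-limit errors, doubling the wait each time,
    with a little jitter so parallel callers do not retry in sync."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # rate limit persists: fail and let the caller log it
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

With `base_delay=1.0`, the waits grow roughly 1s, 2s, 4s, 8s before the operation is abandoned.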

What you can control:

  • Number of concurrent agents: Fewer agents running simultaneously means fewer parallel API calls.
  • Agent run intervals: Space out agent runs to distribute API usage over time.
  • Autopilot scheduling: Stagger autopilot start times across agents rather than running all at once.
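Staggering start times, as the last point suggests, just means offsetting each agent by a fixed gap. A small illustrative helper (not a UniSync API, and the 15-minute gap is an arbitrary example):

```python
from datetime import datetime, timedelta

def staggered_starts(agent_names, first_start, gap_minutes=15):
    """Assign each agent a start time offset from the previous one,
    so their API bursts do not overlap."""
    return {name: first_start + timedelta(minutes=gap_minutes * i)
            for i, name in enumerate(agent_names)}
```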

When rate limits are frequent:

  • Check your API plan's rate limits on the provider dashboard.
  • Consider upgrading to a higher tier for increased limits.
  • Reduce the number of agents running in the same time window.
  • The Token Analytics dashboard can help identify peak usage times.

Monthly Budget Management with Token Analytics

UniSync includes a Token Analytics dashboard that tracks all LLM API spending.

Viewing Token Usage

  • Navigate to Token Analytics from the main menu.
  • View usage broken down by:
    • Agent: See which agents consume the most tokens.
    • Environment: Compare spending across different publish environments.
    • Time period: Daily, weekly, and monthly views.
    • Model: See cost distribution across different models.
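The breakdowns above are all the same operation applied to different keys: group usage records and sum their token counts. A sketch with a hypothetical record shape (the dashboard's actual data model is not documented here):

```python
from collections import defaultdict

def usage_by(records, key):
    """Sum token counts grouped by 'agent', 'model', or 'environment'.
    Record shape is an assumption for illustration."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    return dict(totals)
```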

Setting Budgets

  • Use the provider's billing settings to set hard spending limits.
  • Monitor the Token Analytics dashboard regularly to stay ahead of limits.
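Staying ahead of a hard limit usually means acting at a warning threshold, not at the limit itself. A minimal sketch of that logic (the 80% threshold is illustrative, not a UniSync or provider setting):

```python
def budget_status(spent, monthly_budget, warn_fraction=0.8):
    """Classify current spend against a monthly budget.
    Thresholds here are illustrative assumptions."""
    if spent >= monthly_budget:
        return "over"
    if spent >= warn_fraction * monthly_budget:
        return "warning"
    return "ok"
```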

Cost Reduction Checklist

  • [ ] Use GPT-4o-mini for classification and keyword tasks instead of GPT-4o
  • [ ] Review agents with highest token usage -- are they running too frequently?
  • [ ] Check for agents in error loops (failing and retrying repeatedly wastes tokens)
  • [ ] Ensure autopilot intervals are reasonable (not running every few minutes)
  • [ ] Verify content strategies are approved before running production agents (avoids wasted generation)
  • [ ] Set billing alerts on your API provider dashboard
  • [ ] Review Token Analytics weekly to catch unexpected usage spikes

UniSync Documentation