Basic Usage
Message Roles
Chat messages support three roles:

System Messages
Set the assistant’s behavior and context:
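A minimal sketch of a request body containing a system message, assuming an OpenAI-style JSON layout; the model name and exact field names are illustrative, not confirmed by this page:

```python
# Hypothetical request body; "example-model" is a placeholder, and the
# role/content message shape follows the common OpenAI-style convention.
system_payload = {
    "model": "example-model",
    "messages": [
        {"role": "system",
         "content": "You are a concise technical assistant. Answer in plain English."},
        {"role": "user", "content": "What does HTTP 429 mean?"},
    ],
}
```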
User Messages

Messages from the user:
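A user message is a sketch like the following; the message shape is an assumption based on the common role/content convention:

```python
# A single user turn: the "user" role carries the end user's input.
user_message = {
    "role": "user",
    "content": "Summarize this changelog in three bullet points.",
}
```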
Assistant Messages

Previous responses from the assistant (for conversation history):
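Assistant messages are replayed as history so the model sees its own earlier turns. A sketch of a messages list combining all three roles (contents are illustrative):

```python
# Conversation history: system sets context, then user and assistant
# turns alternate; the assistant turn is a previous model response.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is idempotency?"},
    {"role": "assistant",
     "content": "An operation is idempotent if repeating it has no further effect."},
    {"role": "user", "content": "Give an HTTP example."},
]
```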
Parameters

Temperature
Control randomness (0.0 to 2.0):
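A sketch of setting `temperature` on a request, assuming an OpenAI-style body; lower values make output more deterministic, higher values more varied:

```python
# Low temperature for factual answers, higher for creative ones.
factual_request = {
    "model": "example-model",  # placeholder model name
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "List the HTTP safe methods."}],
}
creative_request = {
    "model": "example-model",
    "temperature": 1.0,
    "messages": [{"role": "user", "content": "Write a haiku about caching."}],
}
```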
Max Tokens

Limit response length:
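A sketch of capping output length with `max_tokens` (field name assumed from the common convention); the cap also bounds per-request cost:

```python
# Cap the completion at 150 tokens; generation stops when the cap is hit.
capped_request = {
    "model": "example-model",  # placeholder
    "max_tokens": 150,
    "messages": [{"role": "user", "content": "Explain DNS in one paragraph."}],
}
```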
Top P (Nucleus Sampling)

Alternative to temperature (0.0 to 1.0):
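A sketch of nucleus sampling: `top_p` restricts sampling to the smallest set of tokens whose cumulative probability reaches the threshold. It is commonly recommended to adjust `top_p` or `temperature`, not both:

```python
# top_p = 0.9 samples only from the top 90% probability mass.
nucleus_request = {
    "model": "example-model",  # placeholder
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "Suggest a name for a CLI tool."}],
}
```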
Stop Sequences

Stop generation at specific tokens:
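A sketch of stop sequences, assuming a `stop` field that accepts a list of strings; generation halts as soon as any of them would be produced:

```python
# Generation stops at a blank line or at the literal marker "END".
stop_request = {
    "model": "example-model",  # placeholder
    "stop": ["\n\n", "END"],
    "messages": [{"role": "user", "content": "Write one sentence, then END."}],
}
```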
Frequency Penalty

Reduce repetition (-2.0 to 2.0):
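A sketch of the frequency penalty: positive values penalize tokens in proportion to how often they have already appeared, discouraging verbatim repetition (field name assumed):

```python
# A moderate positive penalty to reduce repeated phrasing.
freq_request = {
    "model": "example-model",  # placeholder
    "frequency_penalty": 0.5,
    "messages": [{"role": "user", "content": "Describe three sorting algorithms."}],
}
```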
Presence Penalty

Encourage new topics (-2.0 to 2.0):
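A sketch of the presence penalty: positive values penalize any token that has appeared at all, nudging the model toward new topics rather than just less repetition (field name assumed):

```python
# A positive presence penalty encourages introducing new material.
presence_request = {
    "model": "example-model",  # placeholder
    "presence_penalty": 0.6,
    "messages": [{"role": "user", "content": "Brainstorm blog post ideas."}],
}
```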
Completion Window

Time window for completing the request. Can be null, a duration string (e.g., ‘1s’, ‘24h’, ‘7d’), or ‘now’ for immediate processing. Example:
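A sketch of setting a completion window, along with an illustrative validator for the duration-string form described above; the `completion_window` field name and the validator are assumptions, not part of a documented client:

```python
import re

def looks_like_duration(value):
    """Illustrative check: null, 'now', or digits plus a unit (s/m/h/d)."""
    return value is None or value == "now" or bool(re.fullmatch(r"\d+[smhd]", value))

# Ask for the request to complete within 24 hours.
window_request = {
    "model": "example-model",  # placeholder
    "completion_window": "24h",
    "messages": [{"role": "user", "content": "Classify these 10k records."}],
}
```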
Webhook URL

Optional webhook URL to receive completion notifications. Must be a valid HTTPS URL. Example:
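A sketch of attaching a webhook, assuming a `webhook_url` field; the URL shown is a placeholder, and note the HTTPS requirement stated above:

```python
# The service would POST a completion notification to this HTTPS endpoint.
webhook_request = {
    "model": "example-model",  # placeholder
    "webhook_url": "https://example.com/hooks/chat-complete",
    "messages": [{"role": "user", "content": "Run this long analysis."}],
}
```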
Notification Email

Optional email address to receive completion notifications.
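For completeness, a sketch assuming a `notification_email` field (the field name and address are illustrative):

```python
# Send a completion notification to this address when the request finishes.
email_request = {
    "model": "example-model",  # placeholder
    "notification_email": "alerts@example.com",
    "messages": [{"role": "user", "content": "Process the nightly batch."}],
}
```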
Streaming
Stream responses token-by-token for better UX:
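A sketch of consuming a stream, assuming the server emits OpenAI-style server-sent events (`data:` lines carrying JSON chunks, terminated by a `[DONE]` sentinel); the sample lines below stand in for what would normally arrive over an HTTP response:

```python
import json

# Hypothetical SSE lines as a streaming chat API might emit them.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Concatenate content deltas until the [DONE] sentinel."""
    text = []
    for line in lines:
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)
```

In a real client you would set `"stream": True` on the request and iterate over the response body line by line, rendering each delta as it arrives.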
Multi-turn Conversations

Maintain conversation context:
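A sketch of maintaining context across turns: keep one growing messages list, append each user turn and each assistant reply, and send the whole list on every request (the helper and model name are illustrative):

```python
def ask(history, user_text):
    """Append the user turn and build the next request body."""
    history.append({"role": "user", "content": user_text})
    return {"model": "example-model", "messages": history}  # placeholder model

conversation = [{"role": "system", "content": "Be brief."}]
first_request = ask(conversation, "Name one HTTP verb.")
# ...send first_request, then store the assistant's reply in the history:
conversation.append({"role": "assistant", "content": "GET"})
second_request = ask(conversation, "And one more?")
```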
Response Format

Standard Response
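An assumed shape for a non-streaming response, following the common OpenAI-style layout; the field names and values are illustrative, not confirmed by this page:

```python
# Hypothetical non-streaming response body; the reply text lives at
# choices[0].message.content in this layout.
standard_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello! How can I help?"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}
reply = standard_response["choices"][0]["message"]["content"]
```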
Streaming Response
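When streaming, each event carries a partial `delta` instead of a full `message`; a sketch of one assumed chunk, again following the common OpenAI-style layout:

```python
# Hypothetical streaming chunk: only the incremental content is present,
# and finish_reason stays None until the final chunk.
stream_chunk = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "choices": [
        {"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}
    ],
}
```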
Error Handling
Handle API errors gracefully:
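A sketch of a retry policy for transient failures; the status codes carry their usual HTTP meanings, and the backoff schedule is illustrative rather than prescribed by this API:

```python
# Retry rate limits (429) and server errors (5xx) with exponential backoff;
# treat client errors such as 400/401 as permanent and surface them.
RETRYABLE = {429, 500, 502, 503, 504}

def should_retry(status_code, attempt, max_attempts=3):
    """Retry only transient statuses, up to max_attempts tries."""
    return status_code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt, base=1.0):
    """Exponential backoff: 1s, 2s, 4s, ... for attempts 0, 1, 2, ..."""
    return base * (2 ** attempt)
```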
Best Practices

Use system messages effectively
- Set clear instructions in system message
- Define the assistant’s role and constraints
- Include examples if needed
Manage conversation history
- Truncate old messages to stay within context window
- Keep important context at the beginning
- Use summarization for very long conversations
Optimize parameters
- Lower temperature (0.3-0.5) for factual tasks
- Higher temperature (0.7-1.0) for creative tasks
- Use max_tokens to control costs
Stream for better UX
- Always stream for user-facing applications
- Show typing indicators during generation
- Allow users to stop generation