Overview
The AI layer provides:
- Chat Completion - Resume analysis, interview questions, and conversational responses
- Text Embeddings - Vector embeddings for knowledge base search (text-embedding-v3)
- Structured Output - Parsing AI responses into typed Java objects with retry logic
- Streaming SSE - Real-time streaming responses for chat interfaces
InterviewGuide uses Spring AI’s OpenAI-compatible client with DashScope’s compatibility endpoint, allowing seamless integration with Qwen models.
Quick Setup
Get API Key
- Go to Aliyun DashScope Console
- Sign up or log in with your Alibaba Cloud account
- Create an API Key in the API-KEY section
- Copy the key (starts with sk-)
API Key Configuration
API key for Aliyun DashScope (百炼).
- Required: Yes. The application will not start without it.
- Format: Starts with sk- followed by alphanumeric characters
- Example: sk-1234567890abcdef1234567890abcdef
- Maps to: spring.ai.openai.api-key in application.yml
Getting Your API Key
- Visit the Aliyun DashScope Console
- Navigate to API-KEY in the left sidebar
- Click Create API Key
- Copy the generated key immediately (you won’t be able to view it again)
- Store it securely in your environment variables or secrets manager
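Because the application will not start without a key, a fail-fast format check can surface a missing or mistyped value early. The following is a minimal sketch, not InterviewGuide's actual startup code: the class ApiKeyCheck and the environment variable name DASHSCOPE_API_KEY in the usage note are hypothetical, and the pattern only encodes the format described above (sk- followed by alphanumeric characters).

```java
import java.util.regex.Pattern;

/** Hypothetical helper: sanity-checks a DashScope API key before it is used. */
final class ApiKeyCheck {
    // Format described above: "sk-" followed by alphanumeric characters.
    private static final Pattern KEY_FORMAT = Pattern.compile("sk-[A-Za-z0-9]+");

    private ApiKeyCheck() {}

    /** Returns true when the value is non-null and matches the documented format. */
    static boolean looksValid(String apiKey) {
        return apiKey != null && KEY_FORMAT.matcher(apiKey.trim()).matches();
    }

    /** Returns the trimmed key, or throws with a message that never echoes the key. */
    static String requireValid(String apiKey) {
        if (!looksValid(apiKey)) {
            throw new IllegalStateException(
                "DashScope API key is missing or malformed (expected it to start with sk-)");
        }
        return apiKey.trim();
    }
}
```

A caller might run ApiKeyCheck.requireValid(System.getenv("DASHSCOPE_API_KEY")) at startup, where the variable name is only an assumption. A format check cannot tell whether the key is actually accepted by DashScope; only a real request can.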
DashScope Pricing
DashScope offers a free tier and pay-as-you-go pricing. Check the official pricing page for current rates.
| Model | Free Tier | Pricing (after free tier) |
|---|---|---|
| qwen-turbo | 1M tokens/month | ¥0.001 / 1K tokens |
| qwen-plus | 1M tokens/month | ¥0.004 / 1K tokens |
| qwen-max | 100K tokens/month | ¥0.04 / 1K tokens |
| qwen-long | - | ¥0.0005 / 1K tokens |
| text-embedding-v3 | 10M tokens/month | ¥0.0007 / 1K tokens |
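To turn the table into a rough monthly estimate, subtract the free tier from the tokens used and bill the remainder per 1K tokens. The sketch below assumes the free tier resets monthly and that input and output tokens are billed at the same rate, which is a simplification; CostEstimate is a hypothetical helper, and the rates passed in are the ones listed above, which may be out of date.

```java
import java.math.BigDecimal;

/** Hypothetical helper: rough monthly cost from the pricing table. */
final class CostEstimate {
    private CostEstimate() {}

    /**
     * Estimated cost in yuan: tokens beyond the free tier, billed per 1K tokens.
     * Usage at or below the free tier costs nothing.
     */
    static BigDecimal monthlyCostYuan(long tokensUsed, long freeTierTokens,
                                      BigDecimal yuanPer1kTokens) {
        long billable = Math.max(0L, tokensUsed - freeTierTokens);
        return yuanPer1kTokens
                .multiply(BigDecimal.valueOf(billable))
                .divide(BigDecimal.valueOf(1000)); // dividing by 1000 is always exact
    }
}
```

For example, 2M tokens on qwen-plus with a 1M free tier leaves 1M billable tokens, so monthlyCostYuan(2_000_000L, 1_000_000L, new BigDecimal("0.004")) comes to 4 yuan.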
Model Selection
The Qwen model to use for chat completions.
Maps to: spring.ai.openai.chat.options.model in application.yml
Available Models
qwen-plus (Recommended)
Balanced performance and cost
- Context window: 32K tokens
- Best for: General-purpose tasks, resume analysis, interviews
- Speed: Fast
- Cost: ¥0.004 / 1K tokens
- Free tier: 1M tokens/month
qwen-max (Highest Quality)
Maximum capability and accuracy
- Context window: 32K tokens
- Best for: Complex analysis, detailed evaluations
- Speed: Slower than qwen-plus
- Cost: ¥0.04 / 1K tokens (10x more expensive)
- Free tier: 100K tokens/month
qwen-long (Long Context)
Optimized for long documents
- Context window: 1M tokens (longest)
- Best for: Processing very long documents, extensive knowledge bases
- Speed: Optimized for throughput
- Cost: ¥0.0005 / 1K tokens (cheapest for long contexts)
qwen-turbo (Fastest)
Speed-optimized, lower quality
- Context window: 8K tokens
- Best for: Simple tasks, high-throughput scenarios
- Speed: Fastest
- Cost: ¥0.001 / 1K tokens (cheapest after qwen-long)
- Free tier: 1M tokens/month
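One way to apply these trade-offs in code is to default to the recommended model and switch to the long-context model only when the input would not fit. This is a sketch under stated assumptions rather than InterviewGuide's behaviour: ModelPicker is hypothetical, the token estimate is supplied by the caller, and the context windows are the figures listed above read conservatively as 32,000 and 1,000,000 tokens.

```java
/** Hypothetical helper: picks a model name from an estimated input size. */
final class ModelPicker {
    // Context windows as listed above, read conservatively.
    static final int PLUS_CONTEXT_TOKENS = 32_000;
    static final int LONG_CONTEXT_TOKENS = 1_000_000;

    private ModelPicker() {}

    /** Returns qwen-plus when the input fits its window, otherwise qwen-long. */
    static String pickModel(int estimatedInputTokens) {
        if (estimatedInputTokens < 0) {
            throw new IllegalArgumentException("token estimate must not be negative");
        }
        if (estimatedInputTokens <= PLUS_CONTEXT_TOKENS) {
            return "qwen-plus";
        }
        if (estimatedInputTokens <= LONG_CONTEXT_TOKENS) {
            return "qwen-long";
        }
        throw new IllegalArgumentException("input exceeds the largest listed context window");
    }
}
```

The context window also has to hold the model's reply, so in practice the estimate should leave headroom for output tokens.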
Spring AI Configuration
The Spring AI integration is configured in application.yml.
Chat Configuration
OpenAI-compatible endpoint for DashScope. This allows Spring AI’s OpenAI client to work seamlessly with Qwen models.
Do not change this unless you’re using a different AI provider.
Default chat model. Overridden by the AI_MODEL environment variable.
Sampling temperature for AI responses. Range: 0.0 to 2.0
- 0.0 - Deterministic, consistent output
- 0.2 - Low randomness (current setting, good for factual tasks)
- 0.7 - Balanced creativity and consistency
- 1.0+ - High creativity, more variation
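The AI_MODEL override and the temperature range described above can be checked before a request is made. The sketch below is illustrative only: ChatSettings is a hypothetical helper, the set of names is the one listed on this page, and the default passed in by the caller (qwen-plus in the usage note) is an assumption rather than a confirmed default.

```java
import java.util.Set;

/** Hypothetical helper: resolves and validates chat settings. */
final class ChatSettings {
    // Model names listed on this page.
    static final Set<String> KNOWN_MODELS =
            Set.of("qwen-plus", "qwen-max", "qwen-long", "qwen-turbo");

    private ChatSettings() {}

    /** The environment value wins when set and non-blank; otherwise the default applies. */
    static String resolveModel(String envValue, String defaultModel) {
        String model = (envValue == null || envValue.isBlank()) ? defaultModel : envValue.trim();
        if (!KNOWN_MODELS.contains(model)) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        return model;
    }

    /** Rejects temperatures outside the documented 0.0 to 2.0 range. */
    static double requireTemperature(double temperature) {
        if (Double.isNaN(temperature) || temperature < 0.0 || temperature > 2.0) {
            throw new IllegalArgumentException("temperature must be between 0.0 and 2.0");
        }
        return temperature;
    }
}
```

A typical call would be ChatSettings.resolveModel(System.getenv("AI_MODEL"), "qwen-plus"), which rejects a mistyped model name locally instead of waiting for a 404 from the API.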
Embedding Configuration
The embedding model for converting text to vectors.
Aliyun’s text-embedding-v3:
- Dimensions: 1024
- Max input: 2048 tokens
- Optimized for Chinese and English
- Cost: ¥0.0007 / 1K tokens
The embedding dimension must match the dimensions: 1024 setting in the vector store configuration.
Retry Configuration
Number of automatic retry attempts for failed AI requests. Set to 1 (no retries) to let exceptions propagate immediately. Retries are handled at the business layer for structured output parsing.
Whether to retry on 4xx client errors. Disabled because client errors (like invalid API keys) won’t be fixed by retrying.
Structured Output Configuration
InterviewGuide includes custom retry logic for parsing structured JSON responses from the AI. See StructuredOutputInvoker.java for implementation details.
Maximum retry attempts when AI output fails to parse.
Environment Variable: APP_AI_STRUCTURED_MAX_ATTEMPTS
When the AI returns JSON that doesn’t match the expected schema (e.g., resume analysis results), the system will:
- Show the error to the AI
- Ask it to fix the output
- Retry up to this many times
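The retry flow in the steps above, together with the APP_AI_STRUCTURED_INCLUDE_LAST_ERROR behaviour covered in this section, can be sketched as a small loop. This is not the code in StructuredOutputInvoker.java: the class StructuredRetry, the Parser interface, and the wording of the correction prompt are all hypothetical, and the model is represented as a plain function from prompt to raw text.

```java
import java.util.function.Function;

/** Hypothetical sketch of parse-and-retry for structured output. */
final class StructuredRetry {
    /** Parses raw model output; throws IllegalArgumentException when it does not match. */
    interface Parser<T> {
        T parse(String raw);
    }

    private StructuredRetry() {}

    static <T> T invoke(Function<String, String> model, Parser<T> parser, String prompt,
                        int maxAttempts, boolean includeLastError) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String currentPrompt = prompt;
        IllegalArgumentException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String raw = model.apply(currentPrompt);
            try {
                return parser.parse(raw);
            } catch (IllegalArgumentException e) {
                lastError = e;
                // Optionally show the parsing error to the model and ask for a fix.
                currentPrompt = includeLastError
                        ? prompt + "\n\nYour previous output could not be parsed: "
                            + e.getMessage()
                            + "\nReturn only valid JSON that matches the requested format."
                        : prompt;
            }
        }
        throw new IllegalStateException(
                "No parseable output after " + maxAttempts + " attempts", lastError);
    }
}
```

Each attempt is a full model call, so the attempt limit directly bounds the extra token cost of a malformed response.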
Include parsing error details in retry prompts.
Environment Variable: APP_AI_STRUCTURED_INCLUDE_LAST_ERROR
When enabled, the previous parsing error is sent back to the AI. This helps the AI understand what went wrong and produce valid output.
Prompt Templates
AI prompts are stored in the resources directory.
Customizing Prompts
To modify AI behavior, edit the prompt template files.
Advanced Configuration
Connection Timeouts
For slow networks or high-latency regions, you may need to increase HTTP timeouts. Spring AI uses the underlying HTTP client’s defaults. Configure via application properties.
Streaming Responses
For real-time chat experiences, InterviewGuide uses Server-Sent Events (SSE) streaming.
Token Usage Tracking
To monitor API costs, enable usage metadata in responses.
Production Checklist
Secure API key
- Use environment variables or secret management (AWS Secrets Manager, HashiCorp Vault)
- Never commit API keys to version control
- Rotate keys periodically
Choose appropriate model
- Start with qwen-plus for balanced performance
- Upgrade to qwen-max only if quality issues arise
- Monitor token usage and costs
Set up monitoring
- Track API response times
- Monitor token usage and costs
- Set up alerts for failures or rate limits
Implement rate limiting
- DashScope has rate limits per API key
- Implement application-level rate limiting for users
- Handle 429 (Too Many Requests) errors gracefully
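Application-level rate limiting can be as simple as a fixed window per user. The sketch below is one possible approach, not InterviewGuide's implementation: UserRateLimiter is a hypothetical class, the limits are chosen by the caller, and the current time is passed in explicitly so the behaviour is easy to test. It keeps state in memory, so it only suits a single application instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fixed-window rate limiter keyed by user id. */
final class UserRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    // userId -> {window start in millis, requests counted in that window}
    private final Map<String, long[]> windows = new HashMap<>();

    UserRateLimiter(int maxRequests, long windowMillis) {
        if (maxRequests < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request may proceed; false means reject it (for example with 429). */
    synchronized boolean tryAcquire(String userId, long nowMillis) {
        long[] window = windows.computeIfAbsent(userId, id -> new long[] {nowMillis, 0L});
        if (nowMillis - window[0] >= windowMillis) {
            window[0] = nowMillis; // start a new window
            window[1] = 0L;
        }
        if (window[1] >= maxRequests) {
            return false;
        }
        window[1]++;
        return true;
    }
}
```

In production code the caller would pass System.currentTimeMillis() and reject the request before any DashScope call is made, so a single user cannot exhaust the shared API key quota.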
Configure retry logic
- Keep spring.ai.retry.max-attempts: 1
- Let business layer handle retries with backoff
- Log all failures for analysis
Troubleshooting
Error: Unauthorized (401)
Error: Rate limit exceeded (429)
Too many requests to DashScope:
- Wait a few minutes and retry
- Implement exponential backoff in your application
- Check your usage in DashScope Console
- Consider upgrading to a higher tier or spreading requests over time
- Implement user-level rate limiting
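Exponential backoff, as suggested in the list above, doubles the wait after each failed attempt up to a cap. The following is a minimal sketch with hypothetical names (Backoff, delayMillis, callWithRetry); deciding which exceptions count as a 429 is left to the caller's predicate, because the exception type raised by the client is not specified here.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper: retries a call with capped exponential backoff. */
final class Backoff {
    private Backoff() {}

    /** Delay before retry number `retry` (1-based): base * 2^(retry-1), capped at maxMillis. */
    static long delayMillis(int retry, long baseMillis, long maxMillis) {
        if (retry < 1 || baseMillis < 1 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        for (int i = 1; i < retry && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls until success, a non-retryable error, or maxAttempts is reached. */
    static <T> T callWithRetry(Callable<T> call, Predicate<Exception> retryable,
                               int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // give up: out of attempts or not worth retrying
                }
                Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
            }
        }
    }
}
```

With a base of 500 ms and a cap of 8,000 ms the waits are 500, 1,000, 2,000, 4,000, and then 8,000 ms for every later retry. Adding random jitter to each delay is a common refinement when many clients retry at once.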
Error: Model not found (404)
Invalid model name:
- Verify AI_MODEL is one of: qwen-plus, qwen-max, qwen-long, qwen-turbo
- Check for typos in model name
- Ensure model is available in your region
Slow response times
Network latency or model performance:
- Check your network connection to Aliyun servers
- Consider using qwen-turbo for faster responses
- Increase timeout settings if needed
- Monitor DashScope service status
- For China regions, ensure you’re using China region endpoints
Structured output parsing failures
AI returning invalid JSON:
- Increase APP_AI_STRUCTURED_MAX_ATTEMPTS (e.g., to 3)
- Verify APP_AI_STRUCTURED_INCLUDE_LAST_ERROR: true to help AI fix errors
- Check prompt templates have clear format instructions
- Review logs for specific parsing errors
- Consider using qwen-max for better instruction-following
High token costs
Optimizing API usage:
- Reduce prompt lengths where possible
- Lower temperature for more deterministic outputs (this reduces variation, not necessarily length)
- Use qwen-turbo for simple tasks
- Implement caching for repeated queries
- Monitor usage with include-usage: true
- Set usage alerts in DashScope Console
Vector store dimension mismatch
Embedding model changed but vector store not updated:
- Verify spring.ai.vectorstore.pgvector.dimensions: 1024
- This must match text-embedding-v3 output (1024 dimensions)
- If you change embedding models, you must:
  - Drop existing vector store table
  - Re-embed all documents
  - Or set remove-existing-vector-store-table: true once
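A mismatch is easier to diagnose when it is caught before the vector reaches the store. The sketch below is a hypothetical guard (EmbeddingDimensions is not part of InterviewGuide or Spring AI) that compares an embedding's length with the configured dimension, 1024 for text-embedding-v3 as stated above.

```java
/** Hypothetical guard: checks an embedding against the configured vector store dimension. */
final class EmbeddingDimensions {
    // Matches text-embedding-v3 and the dimensions: 1024 vector store setting.
    static final int EXPECTED = 1024;

    private EmbeddingDimensions() {}

    /** Returns the embedding unchanged, or throws with both sizes in the message. */
    static float[] requireDimensions(float[] embedding, int expected) {
        if (embedding == null) {
            throw new IllegalArgumentException("embedding must not be null");
        }
        if (embedding.length != expected) {
            throw new IllegalArgumentException(
                "embedding has " + embedding.length + " dimensions, vector store expects "
                + expected);
        }
        return embedding;
    }
}
```

Calling EmbeddingDimensions.requireDimensions(vector, EmbeddingDimensions.EXPECTED) before each insert turns a database-level error into a message that names both sizes.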
See Also
- Environment Variables - AI configuration environment variables
- DashScope Documentation - Official Aliyun DashScope docs
- Spring AI Reference - Spring AI framework documentation
- Qwen Models - Model specifications and capabilities
