InterviewGuide uses Aliyun’s DashScope platform (百炼) to access Qwen large language models for resume analysis, interview generation, and RAG-based knowledge queries. This page covers AI model configuration, API setup, and Spring AI integration.

Overview

The AI layer provides:
  • Chat Completion - Resume analysis, interview questions, and conversational responses
  • Text Embeddings - Vector embeddings for knowledge base search (text-embedding-v3)
  • Structured Output - Parsing AI responses into typed Java objects with retry logic
  • Streaming SSE - Real-time streaming responses for chat interfaces
InterviewGuide uses Spring AI’s OpenAI-compatible client with DashScope’s compatibility endpoint, allowing seamless integration with Qwen models.

Quick Setup

Step 1: Get API Key

  1. Go to Aliyun DashScope Console
  2. Sign up or log in with your Alibaba Cloud account
  3. Create an API Key in the API-KEY section
  4. Copy the key (starts with sk-)

Step 2: Set environment variable

export AI_BAILIAN_API_KEY=sk-your-key-here
Or add to .env file:
AI_BAILIAN_API_KEY=sk-your-key-here

Step 3: Choose model (optional)

Default is qwen-plus. For higher quality:
export AI_MODEL=qwen-max

Step 4: Start application

./mvnw spring-boot:run
The application will connect to DashScope on startup.

API Key Configuration

AI_BAILIAN_API_KEY
string
required
API key for Aliyun DashScope (百炼).
  • Required: Yes - the application will not start without it
  • Format: starts with sk- followed by alphanumeric characters
  • Example: sk-1234567890abcdef1234567890abcdef
  • Maps to: spring.ai.openai.api-key in application.yml
Keep this key secret! Never commit to version control. Use environment variables or secret management systems.

Getting Your API Key

  1. Visit the Aliyun DashScope Console
  2. Navigate to API-KEY in the left sidebar
  3. Click Create API Key
  4. Copy the generated key immediately (you won’t be able to view it again)
  5. Store it securely in your environment variables or secrets manager
DashScope offers a free tier and pay-as-you-go pricing:
Model             | Free Tier         | Pricing (after free tier)
------------------|-------------------|--------------------------
qwen-turbo        | 1M tokens/month   | ¥0.001 / 1K tokens
qwen-plus         | 1M tokens/month   | ¥0.004 / 1K tokens
qwen-max          | 100K tokens/month | ¥0.04 / 1K tokens
qwen-long         | -                 | ¥0.0005 / 1K tokens
text-embedding-v3 | 10M tokens/month  | ¥0.0007 / 1K tokens
Check the official pricing page for current rates.
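As a back-of-the-envelope check, the per-1K-token rates in the table above translate to cost like this. This is an illustrative helper, not part of InterviewGuide; always verify rates against the official pricing page before budgeting.

```java
import java.util.Map;

// Illustrative cost estimator using the per-1K-token rates listed above.
// Rates change; treat these constants as examples, not authoritative pricing.
class QwenCostEstimator {
    private static final Map<String, Double> PRICE_PER_1K_CNY = Map.of(
            "qwen-turbo", 0.001,
            "qwen-plus", 0.004,
            "qwen-max", 0.04,
            "qwen-long", 0.0005,
            "text-embedding-v3", 0.0007);

    // Returns the estimated cost in CNY for the given model and token count.
    static double estimate(String model, long tokens) {
        Double rate = PRICE_PER_1K_CNY.get(model);
        if (rate == null) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        return tokens / 1000.0 * rate;
    }
}
```

For example, 5M tokens on qwen-plus comes to roughly ¥20, while the same volume on qwen-max would be about ¥200 - a 10x difference worth keeping in mind when choosing a model.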

Model Selection

AI_MODEL
string
default:"qwen-plus"
The Qwen model to use for chat completions.
Maps to: spring.ai.openai.chat.options.model in application.yml

Available Models

The default, qwen-plus, balances quality, speed, and cost (see the pricing table above). The alternatives:

qwen-max - maximum capability and accuracy
  • Context window: 32K tokens
  • Best for: Complex analysis, detailed evaluations
  • Speed: Slower than qwen-plus
  • Cost: ¥0.04 / 1K tokens (10x more expensive)
  • Free tier: 100K tokens/month
Use when you need the highest quality output and cost is less of a concern.
AI_MODEL=qwen-max
qwen-long - optimized for long documents
  • Context window: 1M tokens (longest)
  • Best for: Processing very long documents, extensive knowledge bases
  • Speed: Optimized for throughput
  • Cost: ¥0.0005 / 1K tokens (cheapest for long contexts)
Use when working with extremely long documents that exceed 32K tokens.
AI_MODEL=qwen-long
qwen-turbo - speed-optimized, lower quality
  • Context window: 8K tokens
  • Best for: Simple tasks, high-throughput scenarios
  • Speed: Fastest
  • Cost: ¥0.001 / 1K tokens (cheapest)
  • Free tier: 1M tokens/month
Use when speed is critical and quality requirements are lower.
AI_MODEL=qwen-turbo

Spring AI Configuration

The Spring AI integration is configured in application.yml:
spring:
  ai:
    openai:
      base-url: https://dashscope.aliyuncs.com/compatible-mode
      api-key: ${AI_BAILIAN_API_KEY}
      chat:
        options:
          model: ${AI_MODEL:qwen-plus}
          temperature: 0.2
      embedding:
        options:
          model: text-embedding-v3
    retry:
      max-attempts: 1
      on-client-errors: false

Chat Configuration

spring.ai.openai.base-url
string
default:"https://dashscope.aliyuncs.com/compatible-mode"
OpenAI-compatible endpoint for DashScope.
This allows Spring AI's OpenAI client to work seamlessly with Qwen models.
Do not change this unless you’re using a different AI provider.
spring.ai.openai.chat.options.model
string
default:"qwen-plus"
Default chat model. Overridden by AI_MODEL environment variable.
spring.ai.openai.chat.options.temperature
float
default:"0.2"
Sampling temperature for AI responses.
Range: 0.0 to 2.0
  • 0.0 - Deterministic, consistent output
  • 0.2 - Low randomness (current setting, good for factual tasks)
  • 0.7 - Balanced creativity and consistency
  • 1.0+ - High creativity, more variation
Lower temperature is preferred for interview and resume analysis to ensure consistent evaluation criteria.

Embedding Configuration

spring.ai.openai.embedding.options.model
string
default:"text-embedding-v3"
The embedding model for converting text to vectors.
Aliyun's text-embedding-v3:
  • Dimensions: 1024
  • Max input: 2048 tokens
  • Optimized for Chinese and English
  • Cost: ¥0.0007 / 1K tokens
This model must match the dimensions: 1024 setting in the vector store configuration.
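Because a dimension mismatch only surfaces at insert/query time, it can be useful to fail fast at startup instead. The following is a hypothetical guard sketch (not part of InterviewGuide); the idea is to embed one sample string and compare the vector length against the configured dimensions.

```java
// Hypothetical startup guard: fail fast if the embedding model's output dimension
// does not match the vector store's configured dimension (1024 for text-embedding-v3).
class EmbeddingDimensionCheck {
    static void requireDimensions(float[] sampleEmbedding, int configuredDimensions) {
        if (sampleEmbedding.length != configuredDimensions) {
            throw new IllegalStateException(
                    "Embedding dimension " + sampleEmbedding.length
                    + " does not match configured dimensions=" + configuredDimensions
                    + "; re-embed all documents after changing the embedding model");
        }
    }
}
```

In a real Spring app this check would run once at startup, using the embedding returned for a short probe string.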

Retry Configuration

spring.ai.retry.max-attempts
integer
default:"1"
Number of automatic retry attempts for failed AI requests.
Set to 1 (no retries) to let exceptions propagate immediately. Retries are handled at the business layer for structured output parsing.
spring.ai.retry.on-client-errors
boolean
default:"false"
Whether to retry on 4xx client errors.
Disabled because client errors (such as an invalid API key) won't be fixed by retrying.

Structured Output Configuration

InterviewGuide includes custom retry logic for parsing structured JSON responses from the AI:
app:
  ai:
    structured-max-attempts: ${APP_AI_STRUCTURED_MAX_ATTEMPTS:2}
    structured-include-last-error: ${APP_AI_STRUCTURED_INCLUDE_LAST_ERROR:true}
See StructuredOutputInvoker.java for implementation details.
app.ai.structured-max-attempts
integer
default:"2"
Maximum retry attempts when AI output fails to parse.
Environment variable: APP_AI_STRUCTURED_MAX_ATTEMPTS
When the AI returns JSON that doesn't match the expected schema (e.g., resume analysis results), the system will:
  1. Show the error to the AI
  2. Ask it to fix the output
  3. Retry up to this many times
Higher values increase reliability but cost more API tokens.
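The retry flow above can be sketched as a plain loop. This is a simplified, self-contained illustration of the pattern - the actual logic lives in StructuredOutputInvoker.java, and the names below are illustrative, not the project's real API.

```java
import java.util.function.Function;

// Simplified sketch of the structured-output retry flow described above.
// chat: prompt -> raw model output; parser: raw output -> typed object (throws on bad JSON).
class StructuredRetrySketch {
    static <T> T invoke(Function<String, String> chat,
                        Function<String, T> parser,
                        String prompt,
                        int maxAttempts,
                        boolean includeLastError) {
        String currentPrompt = prompt;
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String raw = chat.apply(currentPrompt);
            try {
                return parser.apply(raw);   // success: return the typed result
            } catch (RuntimeException e) {
                lastError = e;
                // On failure, optionally feed the parsing error back to the model
                currentPrompt = prompt + (includeLastError
                        ? "\n\nYour previous output failed to parse with error:\n\""
                          + e.getMessage() + "\"\n\nPlease fix the output and try again."
                        : "\n\nPlease return valid JSON matching the expected schema.");
            }
        }
        throw new IllegalStateException(
                "Structured output failed after " + maxAttempts + " attempts", lastError);
    }
}
```

Each failed attempt costs a full round of prompt plus completion tokens, which is why structured-max-attempts defaults to a small value.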
app.ai.structured-include-last-error
boolean
default:"true"
Include parsing error details in retry prompts.
Environment variable: APP_AI_STRUCTURED_INCLUDE_LAST_ERROR
When enabled, the previous parsing error is sent back to the AI:
Your previous output failed to parse with error:
"Expected field 'score' to be a number, got string"

Please fix the output and try again.
This helps the AI understand what went wrong and produce valid output.

Prompt Templates

AI prompts are stored in the resources directory:
app/src/main/resources/
├── prompts/
│   ├── resume-analysis.st        # Resume evaluation prompt
│   ├── interview-question.st     # Interview question generation
│   ├── interview-evaluation.st   # Answer evaluation
│   └── rag-query.st             # Knowledge base query rewriting
Prompts use StringTemplate format with Spring AI’s prompt template system.

Customizing Prompts

To modify AI behavior, edit the prompt template files. For example, the resume evaluation prompt:
You are an expert resume reviewer.

Analyze the following resume and provide:
1. Overall score (0-100)
2. Strengths (list)
3. Areas for improvement (list)
4. Recommended job roles (list)

Resume content:
{resumeContent}

Respond in JSON format:
{format}
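At render time, placeholders such as {resumeContent} and {format} are substituted with runtime values - in the real app Spring AI's prompt template system does this. The following is a minimal pure-Java sketch of the substitution idea, for illustration only:

```java
import java.util.Map;

// Minimal illustration of {placeholder} substitution as performed when
// rendering the .st templates above. Not the real Spring AI implementation.
class TemplateRenderSketch {
    static String render(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Replace every occurrence of "{key}" with its value
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }
}
```

When editing templates, keep placeholder names exactly as the code expects them - a renamed placeholder silently survives rendering as literal text and confuses the model.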

Advanced Configuration

Connection Timeouts

For slow networks or high-latency regions, you may need to increase HTTP timeouts. Spring AI uses the underlying HTTP client’s defaults. Configure via application properties:
spring:
  ai:
    openai:
      http:
        connect-timeout: 10s
        read-timeout: 60s

Streaming Responses

For real-time chat experiences, InterviewGuide uses Server-Sent Events (SSE) streaming:
// Streaming is enabled automatically for endpoints returning Flux<String>
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamResponse(@RequestParam String query) {
    return chatClient.stream(query);
}
No additional configuration required - Spring AI handles streaming automatically.

Token Usage Tracking

To monitor API costs, enable usage metadata in responses:
spring:
  ai:
    openai:
      chat:
        options:
          include-usage: true
This adds token count information to AI responses, useful for cost tracking and optimization.
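With usage metadata enabled, an application can accumulate prompt and completion token counts across requests to get a running cost picture. A minimal tracker sketch (hypothetical, not part of InterviewGuide; in practice you would feed it the usage numbers read from each chat response):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative thread-safe accumulator for token usage reported per response.
class TokenUsageTracker {
    private final AtomicLong promptTokens = new AtomicLong();
    private final AtomicLong completionTokens = new AtomicLong();

    // Record the usage counts reported for one AI response.
    void record(long prompt, long completion) {
        promptTokens.addAndGet(prompt);
        completionTokens.addAndGet(completion);
    }

    long totalTokens() {
        return promptTokens.get() + completionTokens.get();
    }
}
```

Combined with the per-model rates from the pricing table, the running total converts directly into an estimated spend.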

Production Checklist

Step 1: Secure API key

  • Use environment variables or secret management (AWS Secrets Manager, HashiCorp Vault)
  • Never commit API keys to version control
  • Rotate keys periodically

Step 2: Choose appropriate model

  • Start with qwen-plus for balanced performance
  • Upgrade to qwen-max only if quality issues arise
  • Monitor token usage and costs

Step 3: Set up monitoring

  • Track API response times
  • Monitor token usage and costs
  • Set up alerts for failures or rate limits

Step 4: Implement rate limiting

  • DashScope has rate limits per API key
  • Implement application-level rate limiting for users
  • Handle 429 (Too Many Requests) errors gracefully
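Handling 429 gracefully usually means retrying with exponential backoff. A small sketch of the delay calculation (parameters are illustrative; pick base and cap to suit your rate limits):

```java
// Sketch of capped exponential backoff for 429 (Too Many Requests) responses.
// attempt is 1-based; the delay doubles per attempt up to maxMillis.
class BackoffPolicy {
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long exp = baseMillis << Math.min(attempt - 1, 20);   // shift cap guards overflow
        return Math.min(exp, maxMillis);
    }
}
```

A caller would sleep for delayMillis(attempt, 500, 30_000) before retrying, ideally adding random jitter so concurrent clients don't retry in lockstep.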

Step 5: Configure retry logic

  • Keep spring.ai.retry.max-attempts: 1
  • Let business layer handle retries with backoff
  • Log all failures for analysis

Step 6: Optimize prompts

  • Keep prompts concise to reduce token usage
  • Use few-shot examples sparingly
  • Test prompt changes thoroughly

Troubleshooting

Invalid or missing API key:
  1. Verify AI_BAILIAN_API_KEY is set correctly
  2. Check the key is valid in DashScope Console
  3. Ensure no extra spaces or quotes in the environment variable
  4. Regenerate key if necessary
Too many requests to DashScope:
  1. Wait a few minutes and retry
  2. Implement exponential backoff in your application
  3. Check your usage in DashScope Console
  4. Consider upgrading to a higher tier or spreading requests over time
  5. Implement user-level rate limiting
Invalid model name:
  1. Verify AI_MODEL is one of: qwen-plus, qwen-max, qwen-long, qwen-turbo
  2. Check for typos in model name
  3. Ensure model is available in your region
Network latency or model performance:
  1. Check your network connection to Aliyun servers
  2. Consider using qwen-turbo for faster responses
  3. Increase timeout settings if needed
  4. Monitor DashScope service status
  5. For China regions, ensure you’re using China region endpoints
AI returning invalid JSON:
  1. Increase APP_AI_STRUCTURED_MAX_ATTEMPTS (e.g., to 3)
  2. Verify APP_AI_STRUCTURED_INCLUDE_LAST_ERROR: true to help AI fix errors
  3. Check prompt templates have clear format instructions
  4. Review logs for specific parsing errors
  5. Consider using qwen-max for better instruction-following
Optimizing API usage:
  1. Reduce prompt lengths where possible
  2. Lower the temperature for more deterministic outputs
  3. Use qwen-turbo for simple tasks
  4. Implement caching for repeated queries
  5. Monitor usage with include-usage: true
  6. Set usage alerts in DashScope Console
Embedding model changed but vector store not updated:
  1. Verify spring.ai.vectorstore.pgvector.dimensions: 1024
  2. This must match text-embedding-v3 output (1024 dimensions)
  3. If you change embedding models, you must:
    • Drop existing vector store table
    • Re-embed all documents
    • Or set remove-existing-vector-store-table: true once
