FaceNet Android provides several configuration options to optimize performance on different devices. This guide covers GPU acceleration, threading, and delegate options for both FaceNet and anti-spoofing models.
The app displays real-time performance metrics on the detection screen:
- Face detection - Time to locate faces in the frame
- Face embedding - Time to generate FaceNet embeddings
- Vector search - Time to find nearest neighbors
- Spoof detection - Time to analyze for presentation attacks
These metrics are captured in the RecognitionMetrics data class in DataModels.kt:36-41:
```kotlin
data class RecognitionMetrics(
    val timeFaceDetection: Long,
    val timeVectorSearch: Long,
    val timeFaceEmbedding: Long,
    val timeFaceSpoofDetection: Long,
)
```
Monitor these metrics while testing different configurations to find the optimal settings for your target devices.
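As an illustration, per-stage timings like these can be captured with `measureTimeMillis` from the Kotlin standard library. The stage lambdas below are placeholders standing in for the real pipeline steps, not the app's actual code:

```kotlin
import kotlin.system.measureTimeMillis

data class RecognitionMetrics(
    val timeFaceDetection: Long,
    val timeVectorSearch: Long,
    val timeFaceEmbedding: Long,
    val timeFaceSpoofDetection: Long,
)

// Time each stage of the pipeline; the lambdas are placeholders
// for face detection, embedding, vector search, and spoof analysis.
fun timePipeline(
    detect: () -> Unit,
    embed: () -> Unit,
    search: () -> Unit,
    spoof: () -> Unit,
): RecognitionMetrics {
    val tDetect = measureTimeMillis { detect() }
    val tEmbed = measureTimeMillis { embed() }
    val tSearch = measureTimeMillis { search() }
    val tSpoof = measureTimeMillis { spoof() }
    return RecognitionMetrics(tDetect, tSearch, tEmbed, tSpoof)
}
```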
FaceNet model optimization
The FaceNet model is configured in FaceNet.kt:25-62 with several acceleration options.
GPU acceleration
GPU delegation can significantly improve inference speed:
```kotlin
val faceNet = FaceNet(
    context = context,
    useGpu = true,      // Enable GPU
    useXNNPack = false, // Disable when GPU is enabled
)
```
The implementation checks GPU compatibility:
```kotlin
val interpreterOptions = Interpreter.Options().apply {
    if (useGpu) {
        if (CompatibilityList().isDelegateSupportedOnThisDevice) {
            addDelegate(GpuDelegate(CompatibilityList().bestOptionsForThisDevice))
        }
    } else {
        numThreads = 4
    }
    useXNNPACK = useXNNPack
    useNNAPI = true
}
```
GPU acceleration is enabled by default. Some devices may not support GPU delegates, in which case the implementation falls back to CPU execution.
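If you want to decide at the app level whether to request the GPU at all, the same compatibility check can be run up front. This is a sketch, assuming the `org.tensorflow.lite.gpu` dependency is on the classpath:

```kotlin
import org.tensorflow.lite.gpu.CompatibilityList

// Request the GPU only where the delegate is actually supported;
// otherwise FaceNet uses its CPU path directly.
val gpuSupported = CompatibilityList().isDelegateSupportedOnThisDevice
val faceNet = FaceNet(
    context = context,
    useGpu = gpuSupported,
    useXNNPack = !gpuSupported, // XNNPACK only applies on the CPU path
)
```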
CPU threading
When GPU is disabled, configure the number of CPU threads:
```kotlin
val interpreterOptions = Interpreter.Options().apply {
    numThreads = 4 // Adjust based on device capabilities
    useXNNPACK = true
    useNNAPI = true
}
```
Thread count guidelines:
- 2 threads: Low-end devices, battery optimization
- 4 threads: Mid-range devices (default)
- 8 threads: High-end devices with 8+ cores
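One way to pick a thread count automatically is to derive it from the reported core count. A sketch, with tiers mirroring the guidelines above:

```kotlin
// Map available cores to the guideline tiers: 2, 4, or 8 threads.
fun chooseThreadCount(
    cores: Int = Runtime.getRuntime().availableProcessors()
): Int = when {
    cores >= 8 -> 8
    cores >= 4 -> 4
    else -> 2
}
```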
XNNPACK acceleration
XNNPACK provides CPU-optimized operations:
```kotlin
val faceNet = FaceNet(
    context = context,
    useGpu = false,
    useXNNPack = true, // Enable for CPU optimization
)
```
XNNPACK is automatically disabled when GPU delegation is active. It provides 2-3x speedup on CPU execution for supported operations.
NNAPI delegation
NNAPI leverages hardware accelerators when available:
```kotlin
interpreterOptions.useNNAPI = true // Enabled by default
```
NNAPI is enabled by default in FaceNet.kt:59 and automatically uses:
- GPU (if available and compatible)
- DSP accelerators
- Neural processing units (NPUs)
- CPU fallback
Spoof detection optimization
The anti-spoofing models can be tuned separately from FaceNet in FaceSpoofDetector.kt:37-88.
Default configuration
```kotlin
val spoofDetector = FaceSpoofDetector(
    context = context,
    useGpu = false,     // CPU by default
    useXNNPack = false, // Disabled by default
    useNNAPI = false,   // Disabled by default
)
```
GPU configuration
```kotlin
val interpreterOptions = Interpreter.Options().apply {
    if (useGpu) {
        if (CompatibilityList().isDelegateSupportedOnThisDevice) {
            addDelegate(GpuDelegate(CompatibilityList().bestOptionsForThisDevice))
        }
    } else {
        numThreads = 4
    }
    useXNNPACK = useXNNPack
    this.useNNAPI = useNNAPI
}
```
The spoof detection models are small (80×80 input). GPU overhead may exceed benefits. Test on your target devices before enabling GPU.
Typical inference times on a mid-range device (Snapdragon 730):
FaceNet-512 model
| Configuration | Inference Time |
|---|---|
| GPU + NNAPI | 35-45ms |
| CPU (4 threads) + XNNPACK | 55-70ms |
| CPU (4 threads) only | 80-100ms |
| CPU (2 threads) only | 120-150ms |
FaceNet-128 model
| Configuration | Inference Time |
|---|---|
| GPU + NNAPI | 25-35ms |
| CPU (4 threads) + XNNPACK | 40-55ms |
| CPU (4 threads) only | 60-80ms |
Spoof detection (both models)
| Configuration | Inference Time |
|---|---|
| CPU (4 threads) | 15-25ms |
| NNAPI | 10-20ms |
| GPU | 20-30ms |
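Figures like the ones above are easiest to reproduce with a warm-up phase followed by an averaged timing loop. A sketch, where `runInference` is a placeholder for a single model invocation:

```kotlin
import kotlin.system.measureNanoTime

// Average inference time in milliseconds after a short warm-up,
// so delegate initialization doesn't skew the numbers.
fun benchmark(warmup: Int = 10, runs: Int = 50, runInference: () -> Unit): Double {
    repeat(warmup) { runInference() }
    val totalNanos = (1..runs).sumOf { measureNanoTime { runInference() } }
    return totalNanos / runs / 1_000_000.0
}
```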
Image preprocessing optimization
The FaceNet model normalizes input images in FaceNet.kt:38-42:
```kotlin
private val imageTensorProcessor = ImageProcessor
    .Builder()
    .add(ResizeOp(imgSize, imgSize, ResizeOp.ResizeMethod.BILINEAR))
    .add(NormalizeOp())
    .build()
```
The custom NormalizeOp divides pixels by 255:
```kotlin
class NormalizeOp : TensorOperator {
    override fun apply(p0: TensorBuffer?): TensorBuffer {
        val pixels = p0!!.floatArray.map { it / 255f }.toFloatArray()
        val output = TensorBufferFloat.createFixedSize(p0.shape, DataType.FLOAT32)
        output.loadArray(pixels)
        return output
    }
}
```
This preprocessing runs on CPU and is not affected by GPU/NNAPI delegates.
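The same scaling can be expressed as a plain function, which makes it easy to verify in isolation, independent of the TFLite tensor types:

```kotlin
// Scale 0..255 pixel values into the 0..1 range expected by the model.
fun normalizePixels(pixels: FloatArray): FloatArray =
    FloatArray(pixels.size) { i -> pixels[i] / 255f }
```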
Vector search optimization
See the Vector search guide for details on HNSW vs flat search performance tuning.
Recommended configurations
Maximum performance (high-end devices)
```kotlin
// FaceNet
FaceNet(
    context = context,
    useGpu = true,
    useXNNPack = false,
)

// Spoof detection
FaceSpoofDetector(
    context = context,
    useGpu = false,
    useXNNPack = true,
    useNNAPI = true,
)
```
Balanced (mid-range devices)
```kotlin
// FaceNet (default)
FaceNet(
    context = context,
    useGpu = true,
    useXNNPack = true,
)

// Spoof detection (default)
FaceSpoofDetector(
    context = context,
    useGpu = false,
    useXNNPack = false,
    useNNAPI = false,
)
```
Battery optimized (low-end devices)
```kotlin
// FaceNet
FaceNet(
    context = context,
    useGpu = false,
    useXNNPack = true,
)

// Modify in FaceNet.kt
numThreads = 2 // Reduce thread count

// Spoof detection
FaceSpoofDetector(
    context = context,
    useGpu = false,
    useXNNPack = false,
    useNNAPI = false,
)

// Modify in FaceSpoofDetector.kt
numThreads = 2 // Reduce thread count
```
Start with the balanced configuration and adjust based on your performance metrics and target devices.
Use Android Profiler to analyze performance:
- Open View > Tool Windows > Profiler in Android Studio
- Start a profiling session
- Monitor CPU, memory, and energy usage during face recognition
- Look for bottlenecks in model inference vs preprocessing
Common optimization pitfalls
Avoid these common mistakes:
- Enabling both GPU and XNNPACK (they conflict)
- Using too many threads (exceeds device CPU cores)
- Enabling GPU for small models (overhead exceeds benefit)
- Not testing on actual target devices (emulators don’t reflect real performance)
Further optimizations
For advanced users:
- Model quantization - Convert models to INT8 for faster inference and smaller size
- Resolution reduction - Process lower resolution camera frames
- Frame skipping - Run recognition every 2-3 frames instead of every frame
- Batching - Process multiple faces in a single inference call
These require modifying the source code beyond configuration changes.
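Of these, frame skipping is the simplest to prototype. A counter-based gate like the following runs recognition on every Nth frame (a sketch; `interval` is a tunable parameter of this example, not an existing setting in the app):

```kotlin
// Run recognition only on every `interval`-th frame;
// shouldProcess() returns true when this frame should be processed.
class FrameGate(private val interval: Int) {
    private var counter = 0
    fun shouldProcess(): Boolean = (counter++ % interval) == 0
}
```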