
Overview

Streamlit enables rapid development of interactive web UIs for machine learning models without frontend expertise. This implementation supports both single predictions and batch processing.

Implementation

Application Structure

The Streamlit app (serving/ui_app.py) provides two interfaces:
serving/ui_app.py
import pandas as pd
import streamlit as st
from serving.predictor import Predictor

@st.cache_data
def get_model() -> Predictor:
    return Predictor.default_from_model_registry()

predictor = get_model()

def main():
    st.header("UI serving demo")
    tab1, tab2 = st.tabs(["Single prediction", "Batch prediction"])
    with tab1:
        single_pred()
    with tab2:
        batch_pred()

if __name__ == "__main__":
    main()
Key features:
  • Model caching with @st.cache_data for fast reloads
  • Tabbed interface for different use cases
  • Automatic model loading from W&B registry
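The UI only relies on a small `Predictor` surface: a registry constructor and a `predict()` that returns per-class probabilities. A minimal stub sketch of that assumed interface (the real class lives in `serving/predictor.py`; this stub is illustrative only):

```python
import numpy as np

class StubPredictor:
    """Minimal stand-in with the interface the UI relies on: a registry
    constructor and a predict() that returns per-class probabilities."""

    @classmethod
    def default_from_model_registry(cls) -> "StubPredictor":
        # The real Predictor downloads weights from the W&B model registry here.
        return cls()

    def predict(self, sentences: list[str]) -> np.ndarray:
        # One [p_incorrect, p_correct] row per input sentence.
        return np.tile([0.5, 0.5], (len(sentences), 1))

predictor = StubPredictor.default_from_model_registry()
print(predictor.predict(["hello"]).shape)  # (1, 2)
```

A stub like this is also handy for unit-testing the UI without network access to the registry.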

Single Prediction Interface

Implementation

serving/ui_app.py
def single_pred():
    input_sent = st.text_input(
        "Type english sentence",
        value="This is example input"
    )
    if st.button("Run inference"):
        st.write("Input:", input_sent)
        pred = predictor.predict([input_sent])
        st.write("Pred:", pred)

User Experience

  1. Enter text: The user types or pastes text into the input field
  2. Run inference: Clicking the button triggers a prediction
  3. View results: Probability distributions are displayed immediately
Example output:
Input: This is example input
Pred: [[0.23 0.77]]
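The returned array holds class probabilities, so the UI can map it to a human-readable label with an argmax. A sketch (the label names and class order are assumptions, not confirmed by the model):

```python
import numpy as np

LABELS = ["incorrect", "correct"]  # assumed class order

def to_label(pred_row: np.ndarray) -> tuple[str, float]:
    """Pick the highest-probability class and its confidence."""
    idx = int(np.argmax(pred_row))
    return LABELS[idx], float(pred_row[idx])

label, conf = to_label(np.array([0.23, 0.77]))
print(label, conf)  # correct 0.77
```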

Batch Prediction Interface

Implementation

serving/ui_app.py
def batch_pred():
    uploaded_file = st.file_uploader("Choose a CSV file", type=["csv"])
    if uploaded_file:
        dataframe = pd.read_csv(uploaded_file)
        st.write("Input dataframe")
        st.write(dataframe)
        
        dataframe_with_pred = predictor.run_inference_on_dataframe(dataframe)
        st.write("Result dataframe")
        st.write(dataframe_with_pred)

Batch Predictor Method

serving/predictor.py
from tqdm import tqdm  # imported at the top of predictor.py

def run_inference_on_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
    correct_sentence_conf = []
    for idx in tqdm(range(len(df))):
        sentence = df.iloc[idx]["sentence"]
        conf = self.predict([sentence]).flatten()[1]
        correct_sentence_conf.append(conf)
    df["correct_sentence_conf"] = correct_sentence_conf
    return df
Features:
  • Upload CSV files via drag-and-drop
  • Preview input dataframe
  • Progress tracking with tqdm
  • Results displayed in interactive table
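`run_inference_on_dataframe` calls `predict` once per row. Since `predict` already accepts a list of sentences, a single batched call would avoid the per-row overhead for large CSVs. A hedged sketch against a stub predictor (the real `Predictor` may have batch-size limits this ignores):

```python
import numpy as np
import pandas as pd

class StubPredictor:
    def predict(self, sentences: list[str]) -> np.ndarray:
        # Fake probabilities; the real model returns [p_incorrect, p_correct] rows.
        return np.tile([0.2, 0.8], (len(sentences), 1))

def run_inference_on_dataframe(predictor, df: pd.DataFrame) -> pd.DataFrame:
    # One model call for the whole column instead of len(df) calls.
    probs = predictor.predict(df["sentence"].tolist())
    df = df.copy()
    df["correct_sentence_conf"] = probs[:, 1]
    return df

df = pd.DataFrame({"sentence": ["Great work!", "This is bad example"]})
print(run_inference_on_dataframe(StubPredictor(), df)["correct_sentence_conf"].tolist())
```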

Example Usage

Input CSV:
sentence
This is a good example
This is bad example
Great work!
Output:
sentence,correct_sentence_conf
This is a good example,0.89
This is bad example,0.23
Great work!,0.95

Local Development

Using Make

make run_app_streamlit
This command:
  1. Builds Docker image with app-streamlit target
  2. Runs container on port 8081
  3. Forwards to internal port 8080
  4. Mounts W&B credentials

Using Docker

# Build
docker build -f Dockerfile \
  -t app-streamlit:latest \
  --target app-streamlit .

# Run
docker run -it -p 8081:8080 \
  -e WANDB_API_KEY=${WANDB_API_KEY} \
  app-streamlit:latest

Access the UI

Open browser to http://localhost:8081

Kubernetes Deployment

Manifest

k8s/app-streamlit.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-streamlit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-streamlit
  template:
    metadata:
      labels:
        app: app-streamlit
    spec:
      containers:
        - name: app-streamlit
          image: ghcr.io/kyryl-opens-ml/app-streamlit:latest
          env:
          - name: WANDB_API_KEY
            valueFrom:
              secretKeyRef:
                name: wandb
                key: WANDB_API_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: app-streamlit
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: app-streamlit
Configuration notes:
  • Single replica (Streamlit maintains session state)
  • ClusterIP service for internal access
  • W&B API key injected from Kubernetes secret
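Because the pod pulls the model from W&B at startup, a missing `WANDB_API_KEY` otherwise fails late with an opaque error. A small sketch of an early check the app could run before loading the model (the helper name is illustrative, not part of the codebase):

```python
import os

def require_env(name: str) -> str:
    """Fail fast with a clear message if a required variable is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the Kubernetes secret mount")
    return value

# e.g. call require_env("WANDB_API_KEY") before Predictor.default_from_model_registry()
```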

Deployment Steps

  1. Create cluster:
     kind create cluster --name ml-in-production
  2. Configure secrets:
     export WANDB_API_KEY='your-key'
     kubectl create secret generic wandb \
       --from-literal=WANDB_API_KEY=$WANDB_API_KEY
  3. Deploy application:
     kubectl create -f k8s/app-streamlit.yaml
  4. Monitor deployment:
     kubectl get pods -l app=app-streamlit
     kubectl logs -l app=app-streamlit -f
  5. Access UI:
     kubectl port-forward --address 0.0.0.0 svc/app-streamlit 8080:8080
     Open http://localhost:8080 in a browser

Caching Strategy

Model Caching

@st.cache_data
def get_model() -> Predictor:
    return Predictor.default_from_model_registry()
Benefits:
  • Model loads once per session
  • Faster page reloads during development
  • Shared across all users in production
Use @st.cache_resource for models in production to share state across sessions.

Best Practice

@st.cache_resource
def get_model() -> Predictor:
    return Predictor.default_from_model_registry()
Differences:
  • cache_data: Serializes return value (slower, safer)
  • cache_resource: Shares object reference (faster, use for models)
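The practical difference can be mimicked in plain Python: `cache_data` hands each caller a copy of the cached value, while `cache_resource` hands every caller the same object. An analogy sketch (not Streamlit's actual implementation):

```python
import copy

_cache: dict = {}

def cache_data_style(key, factory):
    # cache_data semantics: compute once, return a fresh copy per call.
    if key not in _cache:
        _cache[key] = factory()
    return copy.deepcopy(_cache[key])

def cache_resource_style(key, factory):
    # cache_resource semantics: compute once, share the same reference.
    if key not in _cache:
        _cache[key] = factory()
    return _cache[key]

model = {"weights": [1, 2, 3]}
a = cache_data_style("m1", lambda: model)
b = cache_resource_style("m2", lambda: model)
print(a is model, b is model)  # False True
```

Copying a large model on every rerun is wasteful, which is why `cache_resource` is the right choice for model objects.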

Testing Streamlit Apps

Streamlit provides testing utilities:
tests/test_ui_app.py
from streamlit.testing.v1 import AppTest

def test_single_prediction():
    at = AppTest.from_file("serving/ui_app.py")
    at.run()
    
    # Simulate user input
    at.text_input[0].set_value("test sentence").run()
    at.button[0].click().run()
    
    # Assert output appears
    assert "Pred:" in at.text[0].value

def test_batch_prediction():
    at = AppTest.from_file("serving/ui_app.py")
    at.run()
    
    # Upload file
    at.file_uploader[0].upload_file("test.csv").run()
    
    # Check results displayed
    assert "correct_sentence_conf" in at.dataframe[1].value.columns
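`test_batch_prediction` assumes a `test.csv` fixture exists. One way to generate it, e.g. from a pytest fixture (the column name matches what `run_inference_on_dataframe` expects; the helper name is illustrative):

```python
import pandas as pd

def make_test_csv(path: str) -> str:
    """Write a tiny CSV with the 'sentence' column the batch endpoint expects."""
    pd.DataFrame(
        {"sentence": ["This is a good example", "Great work!"]}
    ).to_csv(path, index=False)
    return path

make_test_csv("test.csv")
print(pd.read_csv("test.csv").columns.tolist())  # ['sentence']
```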

Production Considerations

Session State

Streamlit keeps per-user session state in the server process, so horizontal scaling requires sticky sessions that pin each user to one replica.
Configuration for load balancers:
apiVersion: v1
kind: Service
metadata:
  name: app-streamlit
spec:
  sessionAffinity: ClientIP

Performance Optimization

Fragment caching for components:
@st.cache_data
def expensive_computation(data):
    return process(data)

def single_pred():
    input_sent = st.text_input("Type sentence")
    if st.button("Run"):
        result = expensive_computation(input_sent)  # Cached
        st.write(result)

Error Handling

Graceful error display:
def single_pred():
    input_sent = st.text_input("Type english sentence")
    if st.button("Run inference"):
        try:
            pred = predictor.predict([input_sent])
            st.success("Prediction complete!")
            st.write("Pred:", pred)
        except Exception as e:
            st.error(f"Prediction failed: {str(e)}")
            st.exception(e)

UI Enhancements

Visualization

Add charts for probability distributions:
import matplotlib.pyplot as plt

def single_pred():
    input_sent = st.text_input("Type english sentence")
    if st.button("Run inference"):
        pred = predictor.predict([input_sent])[0]
        
        # Display as bar chart
        fig, ax = plt.subplots()
        ax.bar(["Negative", "Positive"], pred)
        ax.set_ylabel("Probability")
        st.pyplot(fig)

Configuration Sidebar

def main():
    st.sidebar.header("Configuration")
    threshold = st.sidebar.slider(
        "Confidence threshold",
        min_value=0.0,
        max_value=1.0,
        value=0.5
    )
    
    st.header("UI serving demo")
    # Use threshold in predictions

Comparison: Streamlit vs Gradio

Feature                   | Streamlit   | Gradio
Learning curve            | Low         | Very low
Customization             | High        | Limited
Layout control            | Excellent   | Basic
HuggingFace integration   | Manual      | Built-in
Deployment                | Self-hosted | HF Spaces
Choose Streamlit when:
  • Building internal tools
  • Need custom layouts
  • Require data exploration features
  • Have existing Python codebase
Choose Gradio when:
  • Quick demos for HuggingFace
  • Simple input/output interfaces
  • Want hosted deployment

Best Practices

  • Use caching: Cache expensive operations with @st.cache_data
  • Progress indicators: Show st.spinner() for long-running tasks
  • Input validation: Validate user input before processing
  • Error messages: Display helpful errors with st.error()

Next Steps

Triton Inference Server: Deploy high-performance inference with Triton
