
Theoretical Concepts

Before diving into implementation, it’s essential to understand the theoretical foundations that underpin robotic arm systems with computer vision. This section covers the key concepts you’ll apply throughout the course.
A comprehensive PDF covering these concepts in detail is available in the course materials: theorical_concepts.pdf

Learning Objectives

After completing this section, you will understand:
  • Core robotics principles and coordinate systems
  • Computer vision fundamentals and object detection
  • Control system architectures
  • Communication protocol design principles
  • Real-time system considerations

1. Robotics Fundamentals

Coordinate Systems and Kinematics

Robotic arms operate in three-dimensional space using several coordinate systems:
Cartesian Coordinates (X, Y, Z)
  • World frame: Fixed reference point in the environment
  • Robot base frame: Origin at the robot’s mounting point
  • End-effector frame: Tool center point (TCP)
Joint Space
  • Each motor/actuator has an angular position
  • Robot configuration defined by joint angles: θ₁, θ₂, θ₃…
Forward Kinematics: Calculate the end-effector position from joint angles
Inverse Kinematics: Calculate the joint angles required to reach a target position
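For a planar two-link arm, forward kinematics reduces to two lines of trigonometry. The sketch below uses hypothetical link lengths `l1` and `l2` in metres for illustration; your robot's actual geometry will differ:

```python
import math

def forward_kinematics(theta1, theta2, l1=0.10, l2=0.08):
    """End-effector (x, y) of a planar 2-link arm.

    theta1, theta2 are joint angles in radians; l1, l2 are
    link lengths in metres (made-up values for illustration).
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Arm fully extended along X: both joint angles at 0 radians
x, y = forward_kinematics(0.0, 0.0)
print(round(x, 3), round(y, 3))  # 0.18 0.0
```

Inverse kinematics runs the other way (target position in, joint angles out) and generally has zero, one, or multiple solutions, which is why it is the harder of the two problems.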

Degrees of Freedom (DOF)

The number of independent movements a robot can make:
  • 3-DOF: Position in 3D space (X, Y, Z)
  • 6-DOF: Position + orientation (roll, pitch, yaw)
  • VEX IQ robots typically have 3-4 DOF

Robot Control Architecture

In this course, the Vision System runs on Raspberry Pi, while Joint Controllers execute on VEX Brain. Communication between them is critical!

2. Computer Vision Basics

Image Representation

Digital images are arrays of pixels:
  • Grayscale: Single channel (0-255 intensity)
  • RGB Color: Three channels (Red, Green, Blue)
  • Resolution: Width × Height pixels
# Example: Image dimensions
import cv2
image = cv2.imread('object.jpg')  # returns None if the file cannot be read
height, width, channels = image.shape  # e.g., (480, 640, 3)

Object Detection Fundamentals

Object detection identifies and locates objects in images.
Key Components:
  1. Classification: What is the object? (apple, orange, bottle)
  2. Localization: Where is it? (bounding box coordinates)
  3. Confidence Score: How certain is the detection? (0.0 to 1.0)
Bounding Boxes:
  • Rectangular regions containing detected objects
  • Represented as: (x1, y1, x2, y2) or (x, y, width, height)
  • x1, y1: Top-left corner
  • x2, y2: Bottom-right corner
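A small sketch of converting between the two box conventions, plus the box centre, which is ultimately what the arm aims at. The coordinate values are made up for illustration:

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) -> (x, y, width, height), top-left origin."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

def box_center(box):
    """Pixel centre of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

print(xyxy_to_xywh((100, 50, 300, 200)))  # (100, 50, 200, 150)
print(box_center((100, 50, 300, 200)))    # (200.0, 125.0)
```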
In this course, we use YOLO (You Only Look Once) for real-time object detection. YOLO processes the entire image in a single forward pass, making it fast enough for robotics applications.

Deep Learning for Vision

Neural Networks:
  • Learn features automatically from training data
  • Hierarchical layers: edges → shapes → objects
  • Trained using labeled datasets
Convolutional Neural Networks (CNNs):
  • Specialized for image processing
  • Convolutional layers detect spatial patterns
  • Pooling layers reduce dimensionality
YOLO Architecture:
  • Single-stage detector (fast inference)
  • Divides image into grid cells
  • Each cell predicts bounding boxes and class probabilities
  • Output: List of detections with class, confidence, and location

Model Inference Pipeline

# Conceptual inference flow
image = capture_frame()           # Get image from camera
preprocessed = resize_and_normalize(image)  # Prepare for model
results = model.predict(preprocessed)       # Run inference
detections = parse_results(results)         # Extract bounding boxes
best = select_highest_confidence(detections)  # Choose target
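The final step of that pipeline can be sketched in plain Python. The detection dict layout used here (`class`, `confidence`, `box` keys) is an assumption for illustration, not the model's actual output format:

```python
def select_highest_confidence(detections, min_conf=0.5):
    """Pick the most confident detection above a threshold, or None.

    Each detection is assumed (for this sketch) to be a dict like
    {'class': 'apple', 'confidence': 0.91, 'box': (x1, y1, x2, y2)}.
    """
    candidates = [d for d in detections if d['confidence'] >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d['confidence'])

detections = [
    {'class': 'apple',  'confidence': 0.91, 'box': (120, 80, 220, 180)},
    {'class': 'orange', 'confidence': 0.64, 'box': (300, 90, 380, 170)},
    {'class': 'bottle', 'confidence': 0.32, 'box': (40, 40, 90, 200)},
]
print(select_highest_confidence(detections)['class'])  # apple
```

The `min_conf` threshold discards low-confidence noise so the arm never chases a detection the model barely believes in.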

3. Control Systems

Feedback Control Loop

Robotic systems use feedback to achieve accurate positioning.
Components:
  • Setpoint: Desired target position
  • Controller: Calculates corrective actions
  • Plant: The physical robot
  • Feedback: Sensor measurements of actual position
  • Error: Difference between target and actual position
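These components can be wired together in a toy closed loop. The plant here is deliberately simplified to a position that moves exactly as commanded; a real arm has dynamics, delay, and noise:

```python
def proportional_step(setpoint, measured, kp=0.5):
    """One closed-loop update: command proportional to the error."""
    error = setpoint - measured
    return kp * error

# Toy plant: each command moves the position by exactly that amount
position = 0.0
setpoint = 10.0
for _ in range(20):
    position += proportional_step(setpoint, position)
print(round(position, 3))  # 10.0 -- the error halves on every iteration
```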

Control Strategies

Open-Loop Control:
  • Send commands without feedback
  • Simple but less accurate
  • Example: “Move motor forward for 2 seconds”
Closed-Loop Control:
  • Use sensor feedback to correct errors
  • More accurate and robust
  • Example: “Move arm until camera detects object centered”
PID Control (Proportional-Integral-Derivative):
  • P: Correct based on current error
  • I: Correct based on accumulated past errors
  • D: Correct based on rate of error change
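A minimal discrete PID sketch showing how the three terms combine. The gains and the 20 ms timestep are illustrative values, not tuned settings for any real motor:

```python
class PID:
    """Discrete PID controller (illustrative sketch, not the VEX firmware)."""

    def __init__(self, kp, ki, kd, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt                   # I: accumulated error
        derivative = (error - self.prev_error) / self.dt   # D: rate of change
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.2, ki=0.1, kd=0.05)
print(round(pid.update(setpoint=90.0, measured=75.0), 2))  # 55.53
```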
VEX Brain controllers often include built-in PID control for motor positioning. In this course, you’ll focus on high-level vision-based control logic.

4. Serial Communication Concepts

Why Serial Communication?

Serial communication sends data one bit at a time over a single wire:
  • Simple hardware: Fewer wires than parallel communication
  • Long distances: More reliable over extended cables
  • Universal: Supported by most embedded systems

UART Protocol

UART (Universal Asynchronous Receiver-Transmitter) is the standard protocol for serial communication.
Key Parameters:
  • Baud Rate: Bits per second (common: 9600, 115200)
  • Data Bits: Usually 8 bits per byte
  • Stop Bits: Signal end of byte (usually 1)
  • Parity: Error checking (often none)
Both devices must use the same baud rate to communicate successfully. In this course, we use 115200 baud for fast communication.

Communication Patterns

  • Simplex: One-way (sensor → computer)
  • Half-Duplex: Two-way, but not simultaneous
  • Full-Duplex: Simultaneous bidirectional (Raspberry Pi ↔ VEX Brain)

Message Framing

How to identify where one message ends and another begins:
Delimiter-Based:
Hello\n    # Newline character marks message end
Length-Prefixed:
5:Hello     # Number indicates message length
Fixed-Length:
HELLO       # Always exactly 5 characters
In this course, we use newline-delimited JSON messages for clear structure and easy debugging.
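Newline-delimited framing can be sketched in a few lines of pure Python: encode each message as one JSON line, and split the receive buffer on newlines, keeping any incomplete tail for the next read. The `cmd` field names are hypothetical, not the course's actual message schema:

```python
import json

def frame_message(payload: dict) -> bytes:
    """Serialize a dict as one newline-delimited JSON message."""
    return (json.dumps(payload) + '\n').encode('utf-8')

def split_messages(buffer: bytes):
    """Split a receive buffer on newlines; return (messages, leftover bytes)."""
    *lines, leftover = buffer.split(b'\n')
    return [json.loads(line) for line in lines if line], leftover

wire = frame_message({'cmd': 'pick', 'x': 320, 'y': 240})
msgs, rest = split_messages(wire + b'{"cmd": "sta')  # second message incomplete
print(msgs)  # [{'cmd': 'pick', 'x': 320, 'y': 240}]
print(rest)  # b'{"cmd": "sta'
```

Keeping the leftover bytes matters: serial reads can return a message in fragments, so the tail is prepended to the next chunk before splitting again.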

5. Real-Time System Considerations

Latency and Timing

Latency Sources:
  • Camera capture: 30-60ms (30 FPS)
  • Model inference: 50-200ms (depends on model size)
  • Serial transmission: 1-5ms
  • Motor response: 10-100ms
Total System Latency: 100-400ms from detection to action
For pick-and-place tasks, 200-300ms latency is acceptable since objects are typically stationary. For tracking moving objects, optimization is critical.

Threading and Concurrency

Robotic systems often need to do multiple things simultaneously:
  • Main Thread: Capture frames and process vision
  • Communication Thread: Listen for incoming serial messages
  • Control Thread: Send commands and monitor status
from threading import Thread

# Separate threads for reading and writing; the two loop functions
# are defined elsewhere. Daemon threads exit with the main program.
read_thread = Thread(target=serial_read_loop, daemon=True)
write_thread = Thread(target=vision_processing_loop, daemon=True)

read_thread.start()
write_thread.start()

Error Handling

Robust systems must handle failures gracefully:
  • Connection Loss: Retry serial connection automatically
  • Invalid Data: Validate and discard malformed messages
  • Timeout: Don’t wait forever for responses
  • Sensor Failure: Continue operating with degraded performance
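Two of these patterns sketched in plain Python: a generic retry helper for flaky connections and a validator that discards malformed incoming lines. Function names are illustrative, not part of any library:

```python
import json
import time

def retry(connect, attempts=3, delay=0.5):
    """Call a connect function, retrying on OSError with a fixed delay."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise          # out of attempts: propagate the failure
            time.sleep(delay)

def parse_message(line: str):
    """Validate one incoming line; return a dict, or None if malformed."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return None
    return msg if isinstance(msg, dict) else None

print(parse_message('{"status": "done"}'))  # {'status': 'done'}
print(parse_message('garbage'))             # None
```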

6. System Integration

Component Interaction

Our complete system architecture:
Data Flow:
  1. Camera captures frame → Raspberry Pi
  2. YOLO model processes image → detections
  3. Best detection formatted as JSON → serial TX
  4. VEX Brain receives and parses message
  5. Motion planning executes arm movement
  6. Status message sent back → Raspberry Pi

Design Principles

Modularity: Each component has a single responsibility
class ModelLoader: ...    # Only loads models
class DetectionModel: ... # Only runs inference
class SerialComm: ...     # Only handles communication
Abstraction: Hide implementation details
from abc import ABC, abstractmethod
import numpy as np

class DetectionModelInterface(ABC):
    @abstractmethod
    def inference(self, image: np.ndarray):
        ...
Robustness: Handle errors without crashing
import logging
log = logging.getLogger(__name__)

def safe_predict(model, image, default_result=None):
    try:
        return model.predict(image)
    except Exception as e:
        log.error('Inference failed: %s', e)
        return default_result

Knowledge Check

Before moving to implementation, ensure you can answer:
  1. What is the difference between forward and inverse kinematics?
  2. How does YOLO differ from traditional object detection methods?
  3. Why is full-duplex communication important for robotics?
  4. What causes latency in vision-based robot control?
  5. Why do we use threading in serial communication?
Review the theorical_concepts.pdf in the course materials for detailed diagrams, equations, and additional examples.

Next Steps

Now that you understand the theoretical foundations, you’re ready to start building! Begin with the Serial Protocol lesson to establish communication between your devices.
