The moment when a model you trained on your laptop runs on a Raspberry Pi and makes a correct prediction in real time — that is the moment most students realise what Embedded AI actually means. It is not abstract anymore. It is a physical device, in your hands, making intelligent decisions at the edge.
This guide walks you through the complete process — from setting up the environment to writing inference code to deploying across four real use cases we have built at Knowx Innovations. All code here has been tested on Raspberry Pi 4 running Raspberry Pi OS Bullseye (64-bit).
What is TensorFlow Lite and Why Use It on Raspberry Pi
TensorFlow Lite (TFLite) is a lightweight version of TensorFlow designed specifically for inference on resource-constrained devices: Raspberry Pi, mobile phones, edge processors, and (through the separate TensorFlow Lite for Microcontrollers runtime) microcontrollers. Unlike full TensorFlow, which is optimised for training on powerful hardware, TFLite models are stored in a compact format and can be quantised, with weights compressed from 32-bit float to 8-bit integer. Quantisation makes a model roughly 2–4x smaller (the upper end applies when every weight goes from 32 bits to 8) and significantly faster to run on devices with limited memory and compute power.
A full TensorFlow model is impractical on a Raspberry Pi for most real-time applications: it is too large and too slow. TFLite solves this by compressing the model without significantly reducing accuracy. The workflow is: train your model on a laptop or in the cloud, convert it to .tflite format, copy it to the Pi, and run inference using the tflite-runtime library.
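To make the float-to-integer compression concrete, here is a minimal NumPy sketch of affine (scale and zero-point) quantisation, the general idea behind 8-bit TFLite tensors. The function names and the random weights are illustrative assumptions, not part of any TFLite API, and the real converter is more sophisticated than this.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 values onto int8 using a scale and a zero point."""
    w_min = min(float(weights.min()), 0.0)  # keep 0.0 inside the range
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative weights only (not taken from a real model)
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(256, 256)).astype(np.float32)

q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

size_ratio = weights.nbytes / q.nbytes
max_error = float(np.abs(weights - restored).max())
print(f"Size ratio: {size_ratio:.0f}x, max rounding error: {max_error:.5f}")
```

Each weight shrinks from four bytes to one, and the error introduced per weight stays within about one quantisation step, which is why accuracy usually drops only slightly.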
| Aspect | TensorFlow (Full) | TensorFlow Lite |
|---|---|---|
| Primary Use | Training on GPU/Cloud | Inference on edge devices |
| Model Size | 100MB–1GB+ | 1MB–10MB (quantised) |
| RAM Required | 4GB+ | 256KB–2MB |
| Runs on Raspberry Pi | Slow / impractical | Yes — 3–30 fps |
| Supported Hardware | GPU, TPU, Server | Pi, Mobile; Arduino and ESP32 via TFLite for Microcontrollers |
| Python Package | tensorflow (large) | tflite-runtime (tiny) |
Step-by-Step Setup — Raspberry Pi + TFLite Environment
This assumes you have a Raspberry Pi 4 (2GB or 4GB recommended) running Raspberry Pi OS Bullseye 64-bit. If you are on a fresh Pi, run these commands in sequence.
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-dev python3-venv
sudo apt install -y libatlas-base-dev libopencv-dev
```
Always use a virtual environment on Raspberry Pi. System-level pip installs can conflict with OS packages and break things in unexpected ways. We learned this the hard way during our training sessions — a bare pip install corrupted a student's Pi OS and required a full reflash.
```bash
python3 -m venv ~/tflite-env
source ~/tflite-env/bin/activate
# Your prompt should now show (tflite-env)
```
Do not install the full tensorflow package on Raspberry Pi — it is enormous, slow to install, and often fails on ARM. The tflite-runtime package is all you need for inference and installs in under 2 minutes.
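```bash
# Install tflite-runtime (for Pi 64-bit OS)
pip install tflite-runtime

# Install supporting packages
pip install numpy opencv-python pillow

# Pi Camera module support. Picamera2 depends on the system libcamera
# bindings, so if this pip install fails, install it with
# `sudo apt install -y python3-picamera2` and create the virtual environment
# with `python3 -m venv --system-site-packages ~/tflite-env` instead.
pip install picamera2

# Verify installation
python3 -c "import tflite_runtime.interpreter as tflite; print('TFLite OK')"
```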
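If you want a slightly more thorough check than the one-liner, the sketch below reports whether a virtual environment is active and which packages are still missing. The helper names and the `REQUIRED` list are assumptions made for this example; adjust the list to match what your project imports.

```python
import importlib.util
import sys

def in_virtualenv():
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names (not pip package names) used in this guide
REQUIRED = ["tflite_runtime", "numpy", "cv2", "PIL"]

if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtualenv()}")
    missing = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the environment activated. Anything listed as missing can then be installed with pip before you move on.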
You have three options for getting a .tflite model: convert your own trained TensorFlow model, download a pre-trained model from TensorFlow Hub, or export from Edge Impulse (which we use extensively at Knowx for student projects — it generates the .tflite file directly after training).
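```python
import tensorflow as tf  # run this on your laptop or in the cloud, not on the Pi

# Load your trained SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model('my_model/')

# Apply default post-training quantisation for faster inference on the Pi.
# On its own, Optimize.DEFAULT quantises the weights to 8-bit (dynamic range
# quantisation); full INT8 quantisation also needs a representative dataset.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert and save
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

print(f"Model size: {len(tflite_model) / 1024:.1f} KB")
```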
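With `Optimize.DEFAULT` alone, the converter quantises the weights but keeps activations in float. For a fully integer model, the converter also needs a representative dataset to calibrate activation ranges. The following is a sketch under stated assumptions: the function names, the `model_int8.tflite` output path, and the 224×224×3 input shape are hypothetical, and the random calibration data is a placeholder that should be replaced with real, preprocessed training samples.

```python
import numpy as np

def representative_dataset(num_samples=100, shape=(1, 224, 224, 3)):
    """Yield calibration batches for the converter.

    Random data is a placeholder only: feed real, preprocessed
    training images here to get usable accuracy.
    """
    rng = np.random.default_rng(0)
    for _ in range(num_samples):
        yield [rng.random(shape, dtype=np.float32)]

def convert_full_int8(saved_model_dir, out_path="model_int8.tflite"):
    """Convert a SavedModel to a fully integer-quantised .tflite file."""
    import tensorflow as tf  # imported lazily: only needed on the training machine

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Restrict the model to integer-only ops with uint8 input and output
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tflite_model = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

Calling `convert_full_int8('my_model/')` on the training machine would write the quantised file and return its size in bytes. A model converted this way takes uint8 input in the [0, 255] range.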
Copy the model.tflite file to your Raspberry Pi using SCP or a USB drive (the ~/tflite-project directory must already exist on the Pi):

```bash
scp model.tflite pi@raspberrypi.local:~/tflite-project/
```
Basic Inference Code — How It Works
The TFLite inference pipeline is the same regardless of model type — classification, detection, or regression. Once you understand this pattern, deploying any model follows the same structure.
```python
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite

# ── 1. Load the model ──────────────────────────────────────
interpreter = tflite.Interpreter(
    model_path='model.tflite',
    num_threads=4  # use all 4 Pi cores
)
interpreter.allocate_tensors()

# ── 2. Get input and output details ────────────────────────
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Check expected input shape
input_shape = input_details[0]['shape']  # e.g. [1, 224, 224, 3]
input_dtype = input_details[0]['dtype']  # float32 or uint8
print(f"Input shape: {input_shape}, dtype: {input_dtype}")

# ── 3. Preprocess your input ───────────────────────────────
# Example: resize image to match model input
height, width = int(input_shape[1]), int(input_shape[2])
img = cv2.imread('test_image.jpg')
if img is None:
    raise FileNotFoundError('test_image.jpg could not be read')
img = cv2.resize(img, (width, height))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Normalise: float32 model expects [0,1], uint8 expects [0,255]
if input_dtype == np.float32:
    input_data = (np.expand_dims(img, axis=0) / 255.0).astype(np.float32)
else:
    input_data = np.expand_dims(img, axis=0).astype(np.uint8)

# ── 4. Run inference ───────────────────────────────────────
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

# ── 5. Read output ─────────────────────────────────────────
output_data = interpreter.get_tensor(output_details[0]['index'])
scores = output_data[0].astype(np.float32)

# Quantised models return integer scores; map them back to real values
scale, zero_point = output_details[0]['quantization']
if scale > 0:
    scores = (scores - zero_point) * scale

predicted_class = int(np.argmax(scores))
confidence = float(np.max(scores))
print(f"Predicted class: {predicted_class}, Confidence: {confidence:.2%}")
```
Always set num_threads=4 when loading the interpreter. On Raspberry Pi 4, this single line can make inference 2–3x faster because it uses all four CPU cores instead of one. We see this in almost every first deployment in our training program.
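Rather than taking the speed-up on trust, you can measure it on your own model. The helper below is a minimal timing sketch: `mean_latency_ms` and `dummy_workload` are names invented for this example, and the dummy workload exists only so the snippet runs without a model file.

```python
import time

def mean_latency_ms(fn, warmup=3, runs=20):
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

def dummy_workload():
    """Stand-in for interpreter.invoke so the sketch runs anywhere."""
    sum(i * i for i in range(10_000))

latency = mean_latency_ms(dummy_workload)
print(f"{latency:.2f} ms per call ({1000.0 / latency:.1f} calls/s)")

# On the Pi, compare thread counts by timing interpreter.invoke instead:
#   for n in (1, 4):
#       interp = tflite.Interpreter(model_path='model.tflite', num_threads=n)
#       interp.allocate_tensors()
#       interp.set_tensor(interp.get_input_details()[0]['index'], input_data)
#       print(n, mean_latency_ms(interp.invoke))
```

The warm-up calls matter because the first few invocations often include one-off setup cost that would otherwise skew the average.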
4 Real Use Cases with Code — From Knowx Projects
These are not invented examples. Each one is a pattern we have implemented on real Raspberry Pi hardware for actual client projects or student program builds at Knowx Innovations.
Use Case 1 — Image Classification with Pi Camera
This is the pattern we use for crop disease detection (PlantVillage dataset), waste segregation (plastic/organic/metal classification), and quality inspection. The Pi Camera feeds frames continuously — the model classifies each frame and triggers an action based on the result.
```python
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite
from picamera2 import Picamera2

# Class labels: update for your project
LABELS = ['Healthy', 'Early Blight', 'Late Blight', 'Nutrient Deficiency']

# Load model
interpreter = tflite.Interpreter('crop_model.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Start camera
cam = Picamera2()
cam.configure(cam.create_preview_configuration(
    main={"size": (640, 480), "format": "RGB888"}
))
cam.start()

print("Running... Press q in the preview window or Ctrl+C to stop")
try:
    while True:
        # Picamera2's "RGB888" format stores pixels in B, G, R order,
        # which is the order OpenCV expects for display
        frame = cam.capture_array()

        # Preprocess: convert to RGB for the model, then resize and scale
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        img = cv2.resize(rgb, (224, 224))
        input_data = (np.expand_dims(img, axis=0) / 255.0).astype(np.float32)

        # Inference
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output = interpreter.get_tensor(output_details[0]['index'])[0]

        label = LABELS[np.argmax(output)]
        confidence = np.max(output)

        # Display result
        cv2.putText(frame, f"{label}: {confidence:.1%}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
        cv2.imshow('Crop Health Monitor', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
except KeyboardInterrupt:
    pass
finally:
    cam.stop()
    cv2.destroyAllWindows()
```
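Per-frame predictions tend to flicker, so triggering an action on a single frame can cause false alarms. One common approach, shown here as a sketch and not necessarily how the Knowx projects do it, is to act only after the same label has been predicted confidently for several consecutive frames. The class name, thresholds, and sample frames are all assumptions made for illustration.

```python
class StablePrediction:
    """Report a label only after it wins `required` consecutive confident frames."""

    def __init__(self, required=5, min_confidence=0.8):
        self.required = required
        self.min_confidence = min_confidence
        self._last_label = None
        self._streak = 0

    def update(self, label, confidence):
        """Feed one frame's result; return the label once it is stable, else None."""
        if confidence < self.min_confidence:
            # A low-confidence frame breaks the streak
            self._last_label, self._streak = None, 0
            return None
        if label == self._last_label:
            self._streak += 1
        else:
            self._last_label, self._streak = label, 1
        return label if self._streak >= self.required else None

# Simulated per-frame results (label, confidence)
detector = StablePrediction(required=3, min_confidence=0.8)
frames = [("Healthy", 0.95), ("Early Blight", 0.90), ("Early Blight", 0.85),
          ("Early Blight", 0.92), ("Early Blight", 0.60)]
decisions = [detector.update(label, conf) for label, conf in frames]
print(decisions)
```

In the camera loop, you would call `detector.update(label, confidence)` after each inference and trigger the relay, alert, or log entry only when it returns a label.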
Use Case 2 — YOLO Object Detection on Pi Camera
We use YOLOv8 nano for waste segregation (detecting plastic, metal, and organic items on a conveyor) and safety compliance (PPE detection). YOLOv8 nano runs at 4–6 fps on Raspberry Pi 4, which is sufficient for most embedded applications.
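```python
from ultralytics import YOLO
import cv2

# Load YOLO nano, the fastest variant on Pi
model = YOLO('yolov8n.pt')

# Export to TFLite for faster Pi inference (smaller imgsz = faster).
# export() returns the path of the exported file. The export step pulls in
# full TensorFlow, so it is usually easier to run it once on a laptop.
tflite_path = model.export(format='tflite', imgsz=320)

# Load the exported TFLite model so inference actually uses it
tflite_model = YOLO(tflite_path, task='detect')

# Run on a camera stream
cap = cv2.VideoCapture(0)  # 0 = USB cam, or use Pi Camera

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Run detection
        results = tflite_model(frame, imgsz=320, conf=0.5, verbose=False)

        # Draw bounding boxes
        annotated = results[0].plot()
        cv2.imshow('Waste Detection', annotated)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```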
Use Case 3 — Anomaly Detection from Sensor Data
This is the pattern behind our predictive maintenance systems — vibration from an accelerometer, temperature from a sensor, fed into an LSTM model to detect anomalous patterns that precede equipment failure. No camera needed — pure numerical sensor data.
```python
import numpy as np
import tflite_runtime.interpreter as tflite
from collections import deque
import board
import busio
import adafruit_mpu6050  # accelerometer library

WINDOW = 50       # samples per inference
THRESHOLD = 0.75  # anomaly score threshold

interpreter = tflite.Interpreter('anomaly_model.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Sliding window buffer
buffer = deque(maxlen=WINDOW)

# I2C sensor setup
i2c = busio.I2C(board.SCL, board.SDA)
mpu = adafruit_mpu6050.MPU6050(i2c)

while True:
    # Read accelerometer XYZ
    ax, ay, az = mpu.acceleration
    buffer.append([ax, ay, az])

    if len(buffer) == WINDOW:
        window_data = np.array(buffer, dtype=np.float32)
        input_data = np.expand_dims(window_data, axis=0)  # [1, 50, 3]

        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        score = interpreter.get_tensor(output_details[0]['index'])[0][0]

        status = "⚠️ ANOMALY DETECTED" if score > THRESHOLD else "✅ Normal"
        print(f"Anomaly score: {score:.3f} — {status}")
```
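Once the buffer is full, the loop above runs inference on every new sample. If that is more often than you need, a strided window cuts the load by emitting a window only every few samples. This is a sketch: `StridedWindow` and its `stride` parameter are hypothetical names, and the right stride depends on how quickly your anomalies develop.

```python
from collections import deque

class StridedWindow:
    """Sliding window that hands back a full window every `stride` new samples."""

    def __init__(self, size=50, stride=10):
        self.size = size
        self.stride = stride
        self._buffer = deque(maxlen=size)
        self._since_last = 0

    def push(self, sample):
        """Add one sample; return a list copy of the window when one is due, else None."""
        self._buffer.append(sample)
        self._since_last += 1
        if len(self._buffer) == self.size and self._since_last >= self.stride:
            self._since_last = 0
            return list(self._buffer)
        return None

# Simulated accelerometer stream: 100 samples of [x, y, z]
window = StridedWindow(size=50, stride=10)
emitted = 0
first_window = None
for i in range(100):
    result = window.push([float(i), 0.0, 9.8])
    if result is not None:
        emitted += 1
        if first_window is None:
            first_window = result
print(f"Windows emitted: {emitted}")
```

With a window of 50 and a stride of 10, 100 samples produce six inference calls instead of 51.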
Use Case 4 — Energy Consumption Prediction
This is the AI energy meter pattern: reading power consumption data from current and voltage sensors, running a time-series prediction model, and forecasting next-period consumption. It is the exact architecture behind the client project where we identified 18% energy savings potential, and one of the projects students build in our Embedded AI & IoT Product Engineer program.
```python
import time
from collections import deque

import numpy as np
import tflite_runtime.interpreter as tflite
import Adafruit_ADS1x15  # ADC for current sensor

# Load LSTM prediction model
interpreter = tflite.Interpreter('energy_lstm.tflite', num_threads=2)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# ADC setup (ADS1115)
adc = Adafruit_ADS1x15.ADS1115()
GAIN = 1              # ±4.096 V full-scale range
ZERO_CURRENT_V = 2.5  # ACS712 output at 0 A on a 5 V supply (assumed; calibrate on your board)

def read_power():
    raw = adc.read_adc(0, gain=GAIN)
    voltage = raw * (4.096 / 32768)
    # ACS712 30A: 66mV/A, centred on the zero-current offset
    current = abs(voltage - ZERO_CURRENT_V) / 0.066
    # Note: this is a single instantaneous sample. For AC loads, average
    # many samples (RMS) to get a meaningful reading.
    return round(current * 230, 2)  # Watts (230V India)

# Collect 24 hours of hourly readings then predict
readings = deque(maxlen=24)

while True:
    power = read_power()
    readings.append(power)

    if len(readings) == 24:
        input_seq = np.array(readings, dtype=np.float32)
        input_seq = np.expand_dims(np.expand_dims(input_seq, -1), 0)  # [1, 24, 1]

        interpreter.set_tensor(input_details[0]['index'], input_seq)
        interpreter.invoke()
        prediction = interpreter.get_tensor(output_details[0]['index'])[0][0]

        print(f"Current: {power}W | Next hour prediction: {prediction:.1f}W")

    time.sleep(3600)  # Read every hour
```
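The ACS712 outputs a voltage centred on roughly half its supply voltage, and mains current is AC, so a single ADC sample only gives an instantaneous value. A more meaningful reading comes from the RMS of many samples. The sketch below shows the arithmetic on a synthetic sine wave; the 2.5 V offset assumes a 5 V sensor supply, the function names are made up for this example, and the result is apparent power (it ignores power factor).

```python
import math

ZERO_OFFSET_V = 2.5  # sensor output at 0 A (assumed; calibrate on your board)
SENSITIVITY = 0.066  # volts per amp for the ACS712 30A variant
MAINS_V = 230.0      # nominal mains voltage

def rms_current(voltages):
    """RMS current in amps from a sequence of sensor output voltages."""
    squares = [((v - ZERO_OFFSET_V) / SENSITIVITY) ** 2 for v in voltages]
    return math.sqrt(sum(squares) / len(squares))

def apparent_power(voltages):
    """Apparent power in volt-amps, assuming a constant mains voltage."""
    return rms_current(voltages) * MAINS_V

# Synthetic test signal: 5 A peak sine wave, 200 samples over one full cycle
peak_amps = 5.0
samples = [
    ZERO_OFFSET_V + SENSITIVITY * peak_amps * math.sin(2 * math.pi * k / 200)
    for k in range(200)
]

print(f"{rms_current(samples):.3f} A RMS, {apparent_power(samples):.1f} VA")
```

On real hardware, `samples` would be a burst of ADC readings converted to volts, ideally spanning several mains cycles.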
Common Errors and Exactly How to Fix Them
These are the errors we see most frequently in our training program — every one of these has been hit by students at Knowx and fixed by our mentors.
| Error | Cause | Fix |
|---|---|---|
| ValueError: Cannot set tensor | Input shape or dtype mismatch | Print input_details and match your array shape exactly. Common fix: add np.expand_dims(img, axis=0) to add batch dimension |
| ImportError: No module named tflite_runtime | Wrong Python environment or not installed | Activate your virtual environment first: source ~/tflite-env/bin/activate, then pip install tflite-runtime |
| Inference very slow (1+ seconds per frame) | num_threads not set, or FLOAT32 model | Set num_threads=4 in Interpreter(). Convert model to INT8 quantised .tflite for 2–4x speedup |
| cv2.imshow() — display not working | Running over SSH without X forwarding | Either use VNC, add -X flag to SSH, or replace imshow with saving frames: cv2.imwrite('output.jpg', frame) |
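| Picamera2 error: no cameras available | Camera not detected, or the legacy camera stack is enabled | Check the ribbon cable, then run libcamera-hello --list-cameras. Picamera2 needs the libcamera stack, so in sudo raspi-config → Interface Options make sure Legacy Camera is disabled. Reboot. |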
| Model output values all near zero | Input not normalised correctly | FLOAT32 models expect [0.0, 1.0] range. Divide by 255.0. UINT8 models expect [0, 255] — no division needed |
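The shape, dtype, and normalisation fixes from the table can be folded into one helper that reads what the model expects from its input details. This is a sketch: `prepare_input` is a hypothetical name, it assumes the model was trained on pixels scaled to [0, 1], and the int8 branch relies on the `quantization` (scale, zero point) pair that `get_input_details()` reports.

```python
import numpy as np

def prepare_input(image, input_detail):
    """Shape and scale an HxWx3 uint8 RGB image for a TFLite input tensor.

    `input_detail` is one entry from interpreter.get_input_details().
    """
    batch = np.expand_dims(image, axis=0)  # add the batch dimension
    dtype = input_detail["dtype"]
    if dtype == np.float32:
        return (batch / 255.0).astype(np.float32)
    if dtype == np.uint8:
        return batch.astype(np.uint8)  # already in [0, 255]
    if dtype == np.int8:
        scale, zero_point = input_detail["quantization"]
        real = batch / 255.0
        quantised = np.round(real / scale) + zero_point
        return np.clip(quantised, -128, 127).astype(np.int8)
    raise ValueError(f"Unsupported input dtype: {dtype}")

# Stand-in input details so the sketch runs without a model file
white = np.full((224, 224, 3), 255, dtype=np.uint8)
float_input = prepare_input(white, {"dtype": np.float32})
int8_input = prepare_input(white, {"dtype": np.int8, "quantization": (1 / 255, -128)})
print(float_input.shape, float_input.dtype, int8_input.dtype)
```

With a real interpreter, the call becomes `prepare_input(img, input_details[0])`, and the result can go straight into `set_tensor`.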
How to Speed Up TFLite Inference on Raspberry Pi
- Set num_threads=4: uses all four Pi cores
- Use an INT8 quantised model: 2–4x faster than FLOAT32
- Reduce input image resolution: 224×224 instead of 640×640
- Use YOLO nano over the standard model: about 3x faster, minimal accuracy loss
- Process every 2nd or 3rd frame for video streams
- Add a Google Coral USB Accelerator: 10–30x speedup for compatible models (fully INT8 quantised and compiled for the Edge TPU)
- Use Pi 4 (4GB) over Pi 3: significant inference speed difference
- Cool the Pi properly: thermal throttling under load can cut performance by around 40%
- Use a Class 10 / A2 SD card: faster storage means faster model loading
- Overclock Pi 4 to 2.0GHz: modest but measurable gain
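Frame skipping is the easiest of these tips to add to an existing loop. The sketch below runs inference on every n-th frame and reuses the latest result in between; `run_with_frame_skip` and `fake_infer` are illustrative names, and the integer "frames" stand in for camera images.

```python
def run_with_frame_skip(frames, infer, every_n=3):
    """Yield (frame, result) pairs, running infer() on every n-th frame only.

    Frames in between reuse the most recent result.
    """
    last_result = None
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last_result = infer(frame)
        yield frame, last_result

# Stand-in inference function that records which frames it processed
calls = []

def fake_infer(frame):
    calls.append(frame)
    return f"label-for-{frame}"

results = list(run_with_frame_skip(range(7), fake_infer, every_n=3))
print(f"Inference calls: {len(calls)} of {len(results)} frames")
```

The display still updates on every frame, so the video looks smooth while the model does a third of the work.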
At Knowx Innovations, students deploy TFLite models on Raspberry Pi from Week 3 — on real projects including crop disease detection, waste segregation, predictive maintenance, and energy monitoring. Online and offline batches available with weekday and weekend schedules. University-compliant internship certificate included.