Data & ML on device¶
tempestroid runs real CPython on the device (not a subset). That opens a door
ordinary Android apps don't have: running Python's scientific stack — numpy,
scikit-learn, polars, ONNX inference — inside the app, in the same
interpreter that builds the UI.
This page shows what already runs, how you enable each piece, and where the limits are. It is the roadmap's Trilho G.
Where this was proven
Everything here is device-verified on an x86_64 emulator (the hardware-free test stack; see Running on a device). The ship target is arm64 — the path is identical (recipes are per-ABI), but the arm64 wheels for the heavy libs are still pending. Per-piece status is in the table at the end.
Two paths¶
A native lib can reach the device in two ways, and tempestroid uses both:
- Cross-compiled CPython wheel — the lib is compiled as an Android wheel
(the same pattern as
pydantic-core) and runs in the embedded interpreter. This is the path fornumpy,scipy,scikit-learnandpolars. - Native library + bridge — the lib runs as native code (a Kotlin/C++ AAR)
and Python talks to it over the JNI bridge. This is the path for ONNX
inference (
onnxruntime-android), avoiding the heavy C++ wheel build.
Why it matters
Cross-compiling a wheel solves plain import x; the native bridge avoids the
weight of compiling giant C++ engines. The choice is per-lib, recorded under
docs/research/.
numpy¶
numpy is the critical path — almost the whole stack depends on it. The Android
wheel is cross-compiled with cibuildwheel (recipe
toolchain/build_numpy_x86.sh).
import numpy as np
arr = np.arange(1, 11, dtype=np.float64)
total = float(arr.sum()) # 55
dot = float(np.dot(arr, arr)) # 385
Run it on the emulator with the ready-made example:
make stage-x86 # stage the x86_64 CPython + base (numpy included)
make apk-x86 # build the emulator APK
tempest serve examples/onnxspike/app.py # shows "numpy OK" on device
Polars — the device DataFrame¶
For tabular data, use Polars, not pandas. Polars is a Rust core (the
pydantic-core class), cross-compiles to an abi3 wheel (one wheel for all
CPython ≥3.10), has a dependency-free core, and reads/writes
CSV/JSON/Parquet natively — no numpy/pyarrow required.
import io
import polars as pl
frame = pl.DataFrame({"team": ["a", "b", "a"], "points": [10, 7, 3]})
totals = frame.group_by("team").agg(pl.col("points").sum())
# Reading/writing: a CSV round-trip, entirely in memory
csv_text = frame.write_csv()
restored = pl.read_csv(io.StringIO(csv_text))
Enable it (opt-in — the Rust core is large):
make stage-polars # stage the polars-runtime-32 wheel (abi3) + the wrapper
make apk-x86
tempest serve examples/polarsspike/app.py
pandas is discouraged
If your app imports pandas, the loader emits a warning steering you to
Polars (tempestroid/cli/advisories.py). pandas drags heavy Cython/C
extensions + scientific deps into the APK and is awkward to cross-compile;
Polars is the choice that fits the device. The import still runs in the
simulator — it's a warning, not an error.
Build recipe
toolchain/build_polars_x86.sh cross-compiles polars-runtime-32 via
maturin. The details (Android-safe features, strip, the clipboard blocker)
are in docs/research/g-polars-feasibility.md.
scikit-learn + scipy¶
Classic ML runs on the device. scipy and scikit-learn cross-compile with
clang only, zero Fortran (the historical "Achilles' heel" is gone upstream:
OpenBLAS in C + a Fortran-free scipy), with OpenMP via the NDK's libomp.
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.arange(0, 10, dtype=np.float64).reshape(-1, 1)
y = (x.ravel() >= 5).astype(np.int64)
model = LogisticRegression().fit(x, y)
preds = model.predict(np.array([[2.0], [8.0]])) # [0, 1]
Enable it (opt-in — scipy + sklearn + deps are heavy):
make stage-science # scipy + scikit-learn + joblib/threadpoolctl/narwhals
make apk-x86
tempest serve examples/sklearnspike/app.py
ONNX inference (vision)¶
To run .onnx models (classification, detection, segmentation) use the
ort-vision-sdk with
tempestroid's native backend: the SDK runs inference through the onnxruntime-android
AAR over the bridge (path 2), with numpy pre/post in Python.
from ort_vision_sdk import Classifier
from tempestroid.native.inference import AarBackend
clf = Classifier("squeezenet1.1.onnx", backend=AarBackend())
result = clf.predict("banana.jpg")[0] # top-1 on device
The image path needs no OpenCV: tempestroid.native.image.decode_image decodes
via the host's BitmapFactory → ndarray. Models can be embedded or
downloaded+cached (tempestroid.native.model_store.ensure_model, with sha256
verification, off the UI thread). tempest optimize model.onnx -q int8 quantizes
+ converts to .ort on the host (build time).
Staging recipes (summary)¶
The heavy libs are opt-in — the default build carries none. Each has a per-ABI recipe:
| Lib | Enable | Wheel/recipe |
|---|---|---|
| numpy | make stage-x86 (base) |
toolchain/build_numpy_x86.sh |
| polars | make stage-polars |
toolchain/build_polars_x86.sh |
| scipy + sklearn | make stage-science |
toolchain/build_{openblas,scipy,sklearn}_x86.sh |
| onnxruntime | the vision build feature |
onnxruntime-android AAR (no wheel) |
APK size¶
The scientific stack is heavy. Trilho G7 trims what it safely can:
noCompress("so")— asset.soare not compressed (AGP's compressor crashes on a large.so; they're extracted at runtime anyway).- strip — Rust/C
.soship stripped (polars' drops from ~2.4 GB to ~200 MB). - single ABI — only the target ABI's
.sois packaged (the build doesn't leak the other ABI). - numpy trim —
numpy/tests,f2py,*.pyistubs (runtime-dead) are dropped.
Per-piece status¶
| Piece | x86_64 (emulator) | arm64 (ship) |
|---|---|---|
| numpy | ✅ import + compute | ⏳ rebuild |
| scipy + scikit-learn | ✅ import + fit/predict |
⏳ rebuild |
| Polars | ✅ build + import (op-path PySeries pending) |
⏳ rebuild |
| ONNX (ort-vision-sdk via AAR) | ✅ real Classifier (squeezenet) |
⏳ physical device |
| pandas | 🚫 discouraged → Polars | 🚫 |
Recap¶
- tempestroid runs real CPython on device → Python's scientific stack runs inside the app.
- Polars is the device DataFrame (Rust, abi3, light); pandas is discouraged (automatic warning).
numpy,scipy/scikit-learnand ONNX inference (via the AAR) run on the emulator today; each heavy lib is opt-in via amake stage-*recipe.- Trilho G7 trims the APK (noCompress/strip/single-ABI/trim).
- All proven on the x86_64 emulator; arm64 (the real ship target) is next.