Quick start
This guide shows the shortest path to a typed result on both platforms. Every
`predict()` call returns a list with one result envelope per input image, so a
single-image call is indexed with `[0]`. The envelope is iterable: looping over it
yields the detected instances.
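To make that return shape concrete, here is a minimal sketch of a list-of-envelopes result. The classes `Instance` and `Envelope` and the function `fake_predict` are hypothetical stand-ins, not the SDK's actual types; they only mimic the behavior described above: one envelope per image, and an envelope that yields its instances when iterated.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Tuple, Union


@dataclass
class Instance:
    """Hypothetical per-instance record: class name, confidence, xyxy box."""
    name: str
    conf: float
    xyxy: Tuple[float, float, float, float]


@dataclass
class Envelope:
    """Hypothetical per-image result; iterating over it yields its instances."""
    instances: List[Instance] = field(default_factory=list)

    def __iter__(self) -> Iterator[Instance]:
        return iter(self.instances)

    def __len__(self) -> int:
        return len(self.instances)


def fake_predict(images: Union[str, Sequence[str]]) -> List[Envelope]:
    """Hypothetical predict(): always returns one Envelope per input image."""
    if isinstance(images, str):
        images = [images]  # a single path is treated as a batch of one
    return [Envelope([Instance("dog", 0.9, (10.0, 20.0, 110.0, 220.0))]) for _ in images]


result = fake_predict("dog.jpg")[0]  # list of length 1, hence the [0]
for inst in result:
    print(inst.name, inst.conf, inst.xyxy)
```

Returning a list even for one image keeps the signature identical for single images and batches, which is why the `[0]` pattern appears throughout this guide.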
Python
Classification
```python
from ort_vision_sdk import Classifier

clf = Classifier(
    "resnet50.onnx",
    labels="imagenet_labels.txt",  # one class per line, or a list/dict
    input_size=(224, 224),         # default
    apply_softmax=True,            # False if your model already outputs probs
)
r = clf.predict("dog.jpg")[0]
print(r.cls, r.conf, r.name)  # top-1, Ultralytics-style
print(r.probs.top5)           # array of the top-5 indices
print(r.probs.top5conf)       # corresponding probabilities
```
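The `apply_softmax` flag and the `probs.top5` / `probs.top5conf` fields can be illustrated with plain NumPy. This is a sketch of the usual post-processing applied to a classifier's raw logits, not the SDK's own code; the helpers `softmax` and `top_k` are hypothetical names.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(probs: np.ndarray, k: int = 5):
    """Return (indices, probabilities) of the k largest entries, highest first."""
    idx = np.argsort(probs)[::-1][:k]
    return idx, probs[idx]


# Fake logits for a 10-class model (an ImageNet model would produce 1000).
logits = np.array([1.0, 3.0, 0.5, 2.0, -1.0, 0.0, 4.0, 0.2, 1.5, -0.5])
probs = softmax(logits)         # skip this step if the model already outputs probs
top5, top5conf = top_k(probs, 5)
cls = int(top5[0])              # top-1 class index
conf = float(top5conf[0])       # top-1 probability
print(cls, round(conf, 3), top5, top5conf)
```

Softmax preserves the ordering of the logits, so the top-k indices are the same with or without it; only the confidence values change, which is why the flag matters for `conf` but not for `cls`.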
Detection
```python
from ort_vision_sdk import Detector

det = Detector(
    "yolov8n.onnx",
    labels="coco",  # default: 80-class COCO preset
    conf_threshold=0.25,
    iou_threshold=0.45,
)
result = det.predict("street.jpg")[0]

# Bulk view: Ultralytics' Boxes interface
print(result.boxes.xyxy.shape)  # (N, 4) absolute pixels
print(result.boxes.cls, result.boxes.conf)

# Per-instance dataclasses
for d in result:
    print(d.name, d.conf, d.box.xyxy)
    # d.cropped_image is an HWC uint8 RGB ndarray of the box crop
```
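`conf_threshold` and `iou_threshold` are the two knobs of standard YOLO-style post-processing: drop low-confidence boxes, then run non-maximum suppression (NMS). The NumPy sketch below shows greedy, class-agnostic NMS on `xyxy` boxes. It illustrates the technique only; the helper names are hypothetical, and the SDK's actual implementation (for example, whether it suppresses per class) may differ.

```python
import numpy as np


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one xyxy box and an (N, 4) array of xyxy boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def filter_and_nms(boxes, scores, conf_threshold=0.25, iou_threshold=0.45):
    """Indices of boxes kept after confidence filtering and greedy NMS."""
    candidates = np.flatnonzero(scores >= conf_threshold)
    order = candidates[np.argsort(scores[candidates])[::-1]]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard remaining boxes that overlap the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([
    [10, 10, 110, 110],    # A
    [12, 12, 112, 112],    # B: near-duplicate of A
    [200, 200, 260, 260],  # C: separate object
    [0, 0, 50, 50],        # D: low confidence
], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.1])
print(filter_and_nms(boxes, scores))
```

Raising `iou_threshold` lets more overlapping boxes survive; raising `conf_threshold` removes weak boxes before NMS even runs.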
Segmentation
```python
from ort_vision_sdk import Segmenter

seg = Segmenter("yolov8n-seg.onnx", labels="coco", mask_threshold=0.5)
result = seg.predict("street.jpg")[0]
for inst in result:
    print(inst.name, inst.conf, inst.box.xyxy)
    print(inst.mask.shape)             # (h, w) uint8 ∈ {0, 255}, cropped to the bbox
    print(inst.segmented_image.shape)  # (h, w, 3) RGB with background zeroed
```
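The crop, `mask`, and `segmented_image` fields relate to each other in a simple way. The NumPy sketch below reproduces the documented shapes and value ranges, assuming the model yields a per-pixel probability map already aligned with the box region; `crop_and_mask` is a hypothetical helper, not part of the SDK.

```python
import numpy as np


def crop_and_mask(image, xyxy, prob_map, mask_threshold=0.5):
    """Crop an (H, W, 3) uint8 RGB image to an xyxy box and apply a binary mask.

    Assumes prob_map is an (h, w) float array matching the clipped crop.
    Returns (cropped_image, mask, segmented_image).
    """
    h_img, w_img = image.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in xyxy)
    # Clip to the image bounds so slicing never goes out of range.
    x1, x2 = max(0, x1), min(w_img, x2)
    y1, y2 = max(0, y1), min(h_img, y2)
    cropped = image[y1:y2, x1:x2]
    if prob_map.shape != cropped.shape[:2]:
        raise ValueError("prob_map must match the cropped region's (h, w)")
    # Binary mask with values in {0, 255}, same (h, w) as the crop.
    mask = np.where(prob_map > mask_threshold, 255, 0).astype(np.uint8)
    # Zero the background: keep RGB only where the mask is set.
    segmented = (cropped * (mask[..., None] > 0)).astype(np.uint8)
    return cropped, mask, segmented


image = np.full((100, 100, 3), 200, dtype=np.uint8)  # flat gray test image
prob_map = np.zeros((40, 50))
prob_map[10:30, 10:40] = 0.9                          # "object" inside the box
cropped, mask, segmented = crop_and_mask(image, (20, 30, 70, 70), prob_map)
print(cropped.shape, mask.shape, segmented.shape)
```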
Async inference
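Each class exposes two async variants: `async_predict()`, the default async path,
which offloads the synchronous call to a worker thread via `asyncio.to_thread`; and
`ort_async_predict()`, which uses ONNX Runtime's `InferenceSession.run_async` and
targets high-concurrency workloads. See the Python guide for the difference.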
```python
# `await` only works inside an async function (or an async-aware REPL/notebook)
result = (await det.async_predict("street.jpg"))[0]
```
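Why offloading keeps an async server responsive can be shown with the standard library alone. In this sketch, `blocking_predict` is a hypothetical stand-in for a synchronous inference call (simulated with `time.sleep`), and the local `async_predict` wrapper is likewise illustrative; it uses `asyncio.to_thread`, the mechanism described above for the SDK's `async_predict()`.

```python
import asyncio
import time


def blocking_predict(image: str) -> list:
    """Hypothetical sync inference: blocks the calling thread for ~0.2 s."""
    time.sleep(0.2)
    return [f"envelope for {image}"]


async def async_predict(image: str) -> list:
    """Run the blocking call in a worker thread (asyncio.to_thread, Python 3.9+)."""
    return await asyncio.to_thread(blocking_predict, image)


async def main() -> list:
    start = time.perf_counter()
    # Four requests run concurrently in threads instead of one after another,
    # and the event loop stays free to serve other coroutines meanwhile.
    results = await asyncio.gather(*(async_predict(f"img{i}.jpg") for i in range(4)))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} results in {elapsed:.2f}s")
    return [r[0] for r in results]


if __name__ == "__main__":
    asyncio.run(main())
```

Calling `blocking_predict` directly inside `main()` would instead stall the whole event loop for each call, which is the "frozen server" symptom listed under Common problems.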
Web (browser)
The API mirrors Python. Each task class (`Classifier`, `Detector`, `Segmenter`) is
created with `await Task.create(...)`, which loads the model asynchronously, and
`predict()` is always async.
Classification
```javascript
import { Classifier } from "@mauriciobenjamin700/ort-vision-sdk-web";

const clf = await Classifier.create("/models/resnet50.onnx", {
  labels: ["tench", "goldfish", /* ... 1000 ImageNet labels */],
});
const r = (await clf.predict("/images/dog.jpg", { topK: 5 }))[0];
console.log(r.cls, r.conf, r.name);
console.log(r.probs.top5, r.probs.top5conf);
```
Detection
```javascript
import { Detector } from "@mauriciobenjamin700/ort-vision-sdk-web";

// labels defaults to "coco" (80 classes)
const det = await Detector.create("/models/yolov8n.onnx");
const result = (await det.predict("/images/street.jpg", { confThreshold: 0.4 }))[0];
for (const d of result) {
  console.log(d.className, d.confidence, d.bbox.asXyxy());
  // d.croppedImage is an RGBImage of just that bounding box region
}
```
Segmentation
```javascript
import { Segmenter } from "@mauriciobenjamin700/ort-vision-sdk-web";

const seg = await Segmenter.create("/models/yolov8n-seg.onnx", { maskThreshold: 0.5 });
const result = (await seg.predict("/images/street.jpg"))[0];
for (const inst of result) {
  console.log(inst.className, inst.confidence, inst.bbox.asXyxy());
  console.log(inst.mask.width, inst.mask.height);
}
```
Common problems
| Symptom | Cause / fix |
|---|---|
| `ModuleNotFoundError: No module named 'ort_vision_sdk'` | The package is not installed in the active environment. Run `pip install ort-vision-sdk`. See Installation. |
| `onnxruntime-web` import error in the browser | It is a peer dependency: install it alongside the SDK and serve the matching `.wasm` files. |
| Model file not found / load failure | Check the `.onnx` path. Export one with `yolo export model=yolov8n.pt format=onnx` (see Installation). |
| `predict(...)` seems to return a weird "list" | By design: every `predict` returns a list with one envelope per image; take `[0]`. |
| Labels come out as `class_0`, `class_1`, … | The model carried no class names and no `labels` argument was passed. Pass `labels="coco"`, a list, a dict, or a file; see the Python guide. |
| Inference freezes the async server | Don't call the sync `predict` in an async handler; use `async_predict`. See the Python guide. |
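The labels row comes down to how a `labels` argument is turned into an index-to-name mapping. The sketch below is a hypothetical `resolve_labels` helper, not the SDK's code: it accepts a list, a dict, or a path to a one-class-per-line file, and falls back to `class_<i>` names when nothing is provided, matching the symptom in the table. The `"coco"` preset is deliberately left out, since its contents are not reproduced here.

```python
from collections.abc import Mapping, Sequence
from pathlib import Path
from typing import Dict, Optional, Union


def resolve_labels(
    labels: Optional[Union[str, Sequence[str], Mapping[int, str]]],
    num_classes: int,
) -> Dict[int, str]:
    """Map class indices to names, defaulting to class_<i> for anything missing."""
    names: Dict[int, str] = {}
    if isinstance(labels, Mapping):
        names = {int(k): str(v) for k, v in labels.items()}
    elif isinstance(labels, str):
        # Assumption: any string reaching here is a path to a one-name-per-line file.
        lines = [ln.strip() for ln in Path(labels).read_text(encoding="utf-8").splitlines()]
        names = dict(enumerate(ln for ln in lines if ln))  # blank lines skipped
    elif labels is not None:
        names = dict(enumerate(labels))
    # Fallback: any index without a name becomes class_<i>.
    return {i: names.get(i, f"class_{i}") for i in range(num_classes)}


print(resolve_labels(None, 3))            # {0: 'class_0', 1: 'class_1', 2: 'class_2'}
print(resolve_labels(["cat", "dog"], 3))  # {0: 'cat', 1: 'dog', 2: 'class_2'}
```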
Recap
- Each task is a class (`Classifier`, `Detector`, `Segmenter`); `predict` returns a list of one envelope per image, so use `[0]`.
- Iterate the envelope for items, or use the bulk views (`.boxes`, `.probs`).
- The API is the same in Python and the browser; only `snake_case` ↔ `camelCase` changes.
Next steps
- Per-task guides: classification, detection, segmentation.
- Platform differences: Python and Web.
- Full reference: Python API and Web API.