Python guide

Specifics of the Python ort-vision-sdk package: accepted inputs, label resolution, execution providers, and the three inference variants.

Accepted inputs

Every predict() accepts the same set of image inputs:

from pathlib import Path
from PIL import Image
import numpy as np
from ort_vision_sdk import Classifier

clf = Classifier("model.onnx", labels=["cat", "dog", "fox"])  # any task class accepts the same inputs

clf.predict("dog.jpg")                                # str path
clf.predict(Path("dog.jpg"))                          # pathlib.Path
clf.predict(Path("dog.jpg").read_bytes())             # raw bytes (PNG, JPEG, ...)
clf.predict(Image.open("dog.jpg"))                    # PIL image, converted to RGB
clf.predict(np.zeros((480, 640, 3), dtype=np.uint8))  # HWC uint8 RGB ndarray

To decode an image once and reuse it across calls, use load_image, the same loader predict() uses internally:

from ort_vision_sdk import load_image
img = load_image("dog.jpg")   # HWC uint8 RGB
clf.predict(img)
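
Arrays from other libraries often do not arrive as HWC uint8 RGB (OpenCV returns BGR, many pipelines produce floats in [0, 1]). The helper below is a minimal sketch of how such arrays could be coerced before calling predict(). to_hwc_uint8_rgb is a hypothetical name and not part of ort-vision-sdk, and the assumption that float inputs lie in [0, 1] is ours.

```python
import numpy as np


def to_hwc_uint8_rgb(arr: np.ndarray, *, bgr: bool = False) -> np.ndarray:
    """Coerce a common image array layout to HWC uint8 RGB (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.ndim == 2:
        # Grayscale HW: replicate the single channel three times.
        arr = np.stack([arr] * 3, axis=-1)
    if arr.ndim != 3 or arr.shape[2] not in (3, 4):
        raise ValueError(f"expected HW, HWC3 or HWC4 array, got shape {arr.shape}")
    if arr.shape[2] == 4:
        # Drop the alpha channel.
        arr = arr[..., :3]
    if bgr:
        # OpenCV-style channel order: reverse to RGB.
        arr = arr[..., ::-1]
    if arr.dtype != np.uint8:
        if np.issubdtype(arr.dtype, np.floating):
            # Assumption: float images are scaled to [0, 1].
            arr = np.rint(np.clip(arr, 0.0, 1.0) * 255.0)
        else:
            # Other integer types: clamp to the uint8 range.
            arr = np.clip(arr, 0, 255)
        arr = arr.astype(np.uint8)
    return np.ascontiguousarray(arr)
```

With this in place, an OpenCV frame could be passed as clf.predict(to_hwc_uint8_rgb(frame, bgr=True)).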

Labels

Tasks resolve labels at construction time via resolve_labels:

from ort_vision_sdk import Classifier, Detector, COCO_CLASSES, resolve_labels

# 1) Built-in preset (currently: "coco")
det = Detector("yolov8n.onnx", labels="coco")

# 2) Explicit list / tuple
clf = Classifier("model.onnx", labels=["cat", "dog", "fox"])

# 3) Sparse dict — gaps filled with "class_<id>"
clf = Classifier("model.onnx", labels={0: "cat", 2: "fox"})

# 4) File path — one class per line
clf = Classifier("model.onnx", labels="imagenet_labels.txt")

# 5) None — auto-generates "class_0", "class_1", ... (only when the model's
#    output shape is statically known)
clf = Classifier("model.onnx", labels=None)

Every result exposes names, the canonical dict[int, str] mapping of class id to label (mirroring Ultralytics' model.names).
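
To make the list, sparse-dict, and None cases concrete, here is a minimal sketch of how such inputs could be normalized into a dict[int, str]. normalize_labels and its num_classes parameter are hypothetical and are not the SDK's resolve_labels. Presets and file paths are left out, and filling gaps only up to the largest key (or num_classes) is an assumption.

```python
from collections.abc import Mapping, Sequence


def normalize_labels(labels, num_classes=None):
    """Illustrative stand-in for label normalization (not the SDK's resolve_labels)."""
    if labels is None:
        # Auto-generated names need a known class count.
        if num_classes is None:
            raise ValueError("num_classes is required when labels is None")
        return {i: f"class_{i}" for i in range(num_classes)}
    if isinstance(labels, Mapping):
        # Sparse dict: fill missing ids with "class_<id>".
        size = max(labels) + 1 if labels else 0
        if num_classes is not None:
            size = max(size, num_classes)
        return {i: labels.get(i, f"class_{i}") for i in range(size)}
    if isinstance(labels, Sequence) and not isinstance(labels, (str, bytes)):
        # Explicit list or tuple: position is the class id.
        return dict(enumerate(labels))
    raise TypeError(f"unsupported labels type: {type(labels).__name__}")
```

For example, normalize_labels({0: "cat", 2: "fox"}) yields an entry for id 1 named "class_1".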

Execution providers

By default the SDK picks the first available provider in ONNX Runtime (ORT) preference order. To pin a specific backend, pass providers= with short aliases or canonical ORT names:

det = Detector("yolov8n.onnx", providers=["cuda", "cpu"])
det = Detector("yolov8n.onnx", providers=["tensorrt", "cuda", "cpu"])
det = Detector("yolov8n.onnx", providers=["CUDAExecutionProvider"])  # canonical name

Supported aliases: "cpu", "cuda", "tensorrt", "directml", "coreml", "openvino", "rocm". Anything else is forwarded verbatim to ORT.
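
The alias handling can be pictured as a plain lookup with a pass-through fallback. The sketch below is an assumption about how the aliases map, not the SDK's actual table. The right-hand values are the execution provider names ONNX Runtime itself uses, and expand_providers is a hypothetical name.

```python
# Assumed alias table; values are ONNX Runtime's canonical provider names.
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "tensorrt": "TensorrtExecutionProvider",
    "directml": "DmlExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
    "rocm": "ROCMExecutionProvider",
}


def expand_providers(providers):
    """Replace known aliases; forward anything else unchanged."""
    return [PROVIDER_ALIASES.get(p, p) for p in providers]
```

To see which providers a given onnxruntime build actually offers, compare the expanded list against onnxruntime.get_available_providers().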

For fine-grained control (graph optimization, threading, profiling), pass an ort.SessionOptions:

import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 4

det = Detector("yolov8n.onnx", session_options=opts)

Async inference

Each class exposes two async variants of predict() matching the sync signature. Pick the one that matches your concurrency profile:

| Method | Mechanism | Use when |
| --- | --- | --- |
| `predict()` | Synchronous | Scripts, notebooks, batch pipelines without an event loop. |
| `async_predict()` | `asyncio.to_thread` | Default async path for FastAPI/AnyIO/Quart handlers. Off-loads the whole pipeline (pre + run + post) to the asyncio default executor's thread pool, freeing the event loop. One Python thread per in-flight inference. |
| `ort_async_predict()` | `InferenceSession.run_async` | High concurrency: many simultaneous awaits share a single thread pool. Pre-/post-processing run on the event-loop thread; the model run is dispatched to the ONNX Runtime internal pool configured via `SessionOptions`. Requires `onnxruntime>=1.16`. |

The same split exists on the underlying session — OrtSession.async_run / OrtSession.ort_async_run — for callers building their own pipelines.
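
The asyncio.to_thread mechanism behind the default async path can be shown without the SDK. In this sketch, blocking_predict is a stand-in for a synchronous predict() call (it only sleeps), and the wrapper hands it to the default executor so the event loop stays free. The function names are illustrative only.

```python
import asyncio
import threading
import time


def blocking_predict(image: str) -> str:
    """Stand-in for a synchronous predict(): blocks the calling thread."""
    time.sleep(0.05)
    return f"{image} handled on {threading.current_thread().name}"


async def async_predict(image: str) -> str:
    # The whole blocking call runs on the default executor's thread pool.
    return await asyncio.to_thread(blocking_predict, image)


async def main() -> list[str]:
    # Four calls overlap; each occupies one worker thread while in flight.
    return await asyncio.gather(*(async_predict(f"img_{i}.jpg") for i in range(4)))


results = asyncio.run(main())
```

Each in-flight call holds one worker thread, which is why this path suits request handlers better than very large fan-outs.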

FastAPI handler (default async)

from fastapi import FastAPI, UploadFile
from ort_vision_sdk import Detector

app = FastAPI()
det = Detector("yolov8n.onnx")

@app.post("/detect")
async def detect(file: UploadFile) -> dict[str, list[dict[str, float | int | str]]]:
    image_bytes = await file.read()
    result = (await det.async_predict(image_bytes))[0]
    return {
        "detections": [
            {"name": d.name, "conf": d.conf, "x1": d.box.x1, "y1": d.box.y1,
             "x2": d.box.x2, "y2": d.box.y2}
            for d in result
        ]
    }

High-concurrency batch (ORT pool)

import asyncio
import onnxruntime as ort
from ort_vision_sdk import Detector

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.inter_op_num_threads = 1

det = Detector("yolov8n.onnx", session_options=opts)

async def detect_all(paths: list[str]) -> list[list]:
    return await asyncio.gather(*(det.ort_async_predict(p) for p in paths))

results = asyncio.run(detect_all([f"img_{i}.jpg" for i in range(200)]))
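
Launching every inference at once means all inputs are decoded and queued together, so it may be worth capping how many are in flight. The following is a minimal sketch using asyncio.Semaphore. FakeDetector is a stand-in written for this example (it is not the SDK's Detector), and the limit value is arbitrary.

```python
import asyncio


class FakeDetector:
    """Stand-in with an ort_async_predict-like coroutine; tracks peak concurrency."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def ort_async_predict(self, path: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.001)  # simulated model run
        self.in_flight -= 1
        return path


async def detect_bounded(det, paths: list[str], limit: int = 32) -> list:
    sem = asyncio.Semaphore(limit)

    async def one(path: str):
        # At most `limit` coroutines pass this point at the same time.
        async with sem:
            return await det.ort_async_predict(path)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(one(p) for p in paths))


fake = FakeDetector()
paths = [f"img_{i}.jpg" for i in range(20)]
bounded_results = asyncio.run(detect_bounded(fake, paths, limit=4))
```

A sensible limit depends on image size and available memory; the value would need tuning against the real workload.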

Rule of thumb

One-off async call inside a request handler → async_predict.
Hundreds of concurrent inferences (queue worker, batch endpoint) → ort_async_predict.