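ENGINEERING · Published April 22, 2026

Fitting a PyTorch Model into a Flutter App: An Engineering Log

From ONNX export and ARM64 build flags to moving inference onto an isolate: the full path EyeRace Pro took to bring its cloud model onto mobile.

3 min read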

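EyeRace Pro originally ran all ML inference in the cloud, on Flask with a GPU instance. Then a demo for a pigeon-racing club was a slap in the face: their loft is up in the mountains, the 4G signal is unreliable, and every photo meant waiting 10 seconds for the upload and another 5 seconds for the result. It was simply unusable.

That is what kicked off the work of moving the model back onto mobile.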

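Why ONNX instead of TFLite

Getting a PyTorch-trained EfficientNet-B0 onto mobile usually means one of three routes:

PyTorch Mobile: the Lite variant, but bundling it into an iOS binary is awkward and community activity has been declining
TFLite: Google's flagship, but it requires converting PyTorch → ONNX → TFLite, and the extra conversion layer adds loss
ONNX Runtime: export ONNX directly from PyTorch, then run it with ONNX Runtime mobile

I went with the third route, ONNX Runtime. The reasons:

Fewer conversion steps: a single torch.onnx.export call does it
Mature runtime: Microsoft actively maintains ONNX Runtime, and its shape inference and quantization tooling are both in place
Cross-platform: iOS and Android share one .onnx file, which avoids the cost of maintaining two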

Export script

export_to_onnx.py

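import torch
from model import EyeClassifier

m = EyeClassifier()
# map_location="cpu" lets a GPU-trained checkpoint load on a machine without CUDA
m.load_state_dict(torch.load("checkpoints/best.pt", map_location="cpu"))
m.eval()  # switch to inference mode (equivalent to m.train(False))

# Run one dummy forward pass at the real input size so the exporter can trace the graph
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    m,
    dummy,
    "eye_classifier.onnx",
    input_names=["pixel_values"],
    output_names=["scores"],
    # Keep the batch dimension dynamic so the same file serves any batch size
    dynamic_axes={
        "pixel_values": {0: "batch"},
        "scores": {0: "batch"},
    },
    opset_version=17,  # supported by ONNX Runtime mobile 1.16+
)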

Flutter integration

There is no official Dart binding for ONNX Runtime, but the community package onnxruntime_flutter fills the gap. The integration has three main pitfalls:

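Pitfall 1: ARM64 vs x86_64

The iOS Simulator runs x86_64 on Intel Macs but arm64 on Apple Silicon. Pin the package version in pubspec.yaml, and set the simulator architecture explicitly in the Podfile:

pubspec.yaml

dependencies:
  onnxruntime_flutter: ^1.16.0

# In the Podfile's post_install block, add:
# config.build_settings['EXCLUDED_ARCHS[sdk=iphonesimulator*]'] = 'arm64'
# Otherwise the app will not run on the Apple Silicon simulator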

Pitfall 2: Inference on an isolate

ONNX Runtime inference is CPU-bound, so running it on Flutter's main thread stalls the UI. It has to be moved onto an isolate:

lib/inference/eye_inference.dart

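import 'dart:typed_data';

import 'package:flutter/foundation.dart'; // provides compute()

Future<Map<String, double>> inferEye(Uint8List imageBytes) async {
  // compute() spawns an isolate and runs _runInference on it
  return compute(_runInference, imageBytes);
}

Map<String, double> _runInference(Uint8List bytes) {
  // Isolates do not share memory: _modelBytes has to be initialized inside
  // this isolate (or passed in with the message) for the cache to be visible.
  final session = OrtSession.fromBytes(_modelBytes);
  final input = _preprocess(bytes); // resize to 224x224 + normalize
  final outputs = session.run({'pixel_values': input});
  return _decodeScores(outputs['scores']);
}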
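The Dart snippet leaves _preprocess and _decodeScores undefined. As a rough illustration of what those two steps typically involve, here is a minimal Python sketch (Python to match the export script). It makes several assumptions not stated in the article: normalization uses the standard ImageNet mean and standard deviation, the image has already been resized, the model output is raw logits, and the label names are hypothetical.

```python
import math

# Assumed ImageNet statistics; the article only says "normalize".
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_to_chw(pixels):
    """Convert an H x W x 3 grid of 0-255 RGB values to 3 x H x W normalized floats."""
    height, width = len(pixels), len(pixels[0])
    return [
        [
            [(pixels[y][x][c] / 255.0 - MEAN[c]) / STD[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]


def decode_scores(logits, labels):
    """Numerically stable softmax over raw logits, keyed by label."""
    if len(logits) != len(labels):
        raise ValueError("one label per logit is required")
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


tiny = [[(255, 0, 128), (0, 0, 0)]]  # 1 x 2 image, for illustration only
chw = normalize_to_chw(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 1 2

# Hypothetical label names; the real classes are not given in the article.
print(decode_scores([2.0, 0.5, -1.0], ["grade_a", "grade_b", "grade_c"]))
```

The channel-first layout matters because the exported model expects input shaped (batch, 3, 224, 224), while most image decoders hand back height x width x channel data.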

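Pitfall 3: Model size and quantization

The original EfficientNet-B0 is about 50MB. That is well under the App Store's 200MB cellular-download threshold for "huge apps", but it still feels heavy from a user-experience standpoint.

I applied int8 quantization: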

import onnxruntime.quantization as q
 
q.quantize_dynamic(
    "eye_classifier.onnx",
    "eye_classifier_int8.onnx",
    weight_type=q.QuantType.QInt8,
)

Results:

Size: 50MB → 12MB
Inference: 380ms → 220ms (M1 iPad)
Accuracy: F1 score 0.917 → 0.875
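To put the three results on a common footing, the short sketch below works out the relative changes. It uses only the figures reported above; the helper name relative_change is introduced here for illustration.

```python
def relative_change(before, after):
    """Signed fractional change going from `before` to `after`."""
    return (after - before) / before


size_change = relative_change(50, 12)          # model size in MB
latency_speedup = 380 / 220                    # fp32 ms divided by int8 ms
f1_abs_drop = 0.917 - 0.875                    # absolute F1 points lost
f1_rel_drop = -relative_change(0.917, 0.875)   # loss relative to the fp32 score

print(f"size: {size_change:+.0%}")             # size: -76%
print(f"speedup: {latency_speedup:.2f}x")      # speedup: 1.73x
print(f"F1 drop: {f1_abs_drop:.3f} absolute, {f1_rel_drop:.1%} relative")
# F1 drop: 0.042 absolute, 4.6% relative
```

In other words, quantization buys a 76% smaller file and roughly 1.7x faster inference at the cost of about 4 F1 points.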

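A drop of about 4 points of F1 (0.917 → 0.875) is unacceptable for racing-pigeon evaluation. In the end I went with a hybrid architecture: use the cloud fp32 model when a network is available, and fall back to the on-device int8 model when it is not.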
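The routing logic behind such a hybrid setup could look roughly like the sketch below. It is not the app's actual code (which lives in Dart); the function name classify_with_fallback, the injected callables, and the label names are all hypothetical. It also assumes that a failed cloud request surfaces as ConnectionError or TimeoutError.

```python
from typing import Callable, Dict, Tuple

Scores = Dict[str, float]


def classify_with_fallback(
    image: bytes,
    cloud_infer: Callable[[bytes], Scores],
    local_infer: Callable[[bytes], Scores],
    is_online: Callable[[], bool],
) -> Tuple[str, Scores]:
    """Prefer the cloud fp32 model; use on-device int8 when offline or on failure."""
    if is_online():
        try:
            return "cloud_fp32", cloud_infer(image)
        except (ConnectionError, TimeoutError):
            # Flaky 4G: the request started but did not finish, so fall through.
            pass
    return "mobile_int8", local_infer(image)


def flaky_cloud(_image: bytes) -> Scores:
    """Stand-in for a cloud call that stalls on a weak signal."""
    raise TimeoutError("upload stalled")


source, scores = classify_with_fallback(
    b"jpeg-bytes",
    cloud_infer=flaky_cloud,
    local_infer=lambda _image: {"grade_a": 0.8, "grade_b": 0.2},
    is_online=lambda: True,
)
print(source)  # mobile_int8
```

Returning the source alongside the scores lets the UI tell the user which model produced the result, which is useful when the int8 path is known to be less accurate.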

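What I learned

GIGO (Garbage In, Garbage Out) matters more in an ML pipeline than model optimization does. EyeRace's "smart capture engine" (checking stability before the shot and image quality after it) improved final accuracy more than twice as much as backend model optimization did.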

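If you are also planning to move cloud ML onto mobile, the order is roughly:

Get the pipeline working end to end first (even if the model is 50MB)
Then profile which step is slow (preprocessing? inference? postprocessing?)
Only then touch quantization / pruning
Always keep a cloud fallback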
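The profiling step in the list above can start as simply as timing each stage separately. The sketch below shows one way to do it; the timed and slowest_stage helpers are illustrative names, and the time.sleep calls are stand-ins for the real preprocessing, inference, and postprocessing work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_ms = defaultdict(list)


@contextmanager
def timed(stage):
    """Record wall-clock milliseconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_ms[stage].append((time.perf_counter() - start) * 1000.0)


def slowest_stage(timings):
    """Return the stage with the highest mean latency."""
    return max(timings, key=lambda s: sum(timings[s]) / len(timings[s]))


# Stand-in workloads; replace with the real pipeline calls.
for _ in range(3):
    with timed("preprocess"):
        time.sleep(0.002)
    with timed("inference"):
        time.sleep(0.02)
    with timed("postprocess"):
        time.sleep(0.001)

for stage, samples in stage_ms.items():
    print(f"{stage}: {sum(samples) / len(samples):.1f} ms avg over {len(samples)} runs")
print("slowest:", slowest_stage(stage_ms))
```

Averaging over several runs matters on mobile, where the first inference often includes one-off session and cache warm-up costs.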

Don't let size and speed constrain you from the start, or you will lose the ability to experiment with models quickly.
