hce_kmu Year 114 Introduction to Computer Science and Programming

Question 3

Suppose you are using the MNIST dataset to train a neural network model to recognize handwritten digits. Which of the following settings is practical?

A To accelerate convergence, you can set the learning rate to 256 initially.
B To make the model more generalized, you can randomly apply image mirroring with a probability of $1/2$.
C To avoid gradient explosion, you can limit the gradient size to 2.
D To avoid difficulty in training the model, the order of the images should be fixed during training.
E Since the goal of training is to classify digits from 0 to 9, which is a regression problem, you should set the number of output nodes to 1 and use mean square error as the loss function.

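Guided Thinking

Imagine you are adjusting a very sensitive faucet: if the water pressure suddenly surges, it could burst the pipes. To protect the system, what kind of "pressure-limiting valve" would you install so that the water flow (the numerical update) never exceeds a safe upper bound?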


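AI Detailed Explanation (AI Personal Tutor)

Well done! Being able to steer clear of these practical pitfalls shows that you have a solid grasp of the details of deep learning training. This question mainly tests parameter tuning and preprocessing logic when working on MNIST handwritten digit recognition.

Training Stability and Gradient Handling

When training deep neural networks, gradient explosion is a common pain point: weight updates become too large and the model fails to converge. The gradient clipping mentioned in option (C), which limits the gradient magnitude to a reasonable range (such as 2), is a very effective practical way to keep training numerically stable. By contrast, the learning rate of $256$ in option (A) is far higher than the commonly used $10^{-3}$ or $10^{-4}$ and would make training diverge outright; the mirroring in option (B) is not suitable for digit recognition, because a digit such as $2$ or $5$ is no longer a valid character once flipped.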
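To make the idea in option (C) concrete, here is a minimal sketch of gradient clipping in plain Python. It assumes that "limit the gradient size to 2" means capping the L2 norm of the gradient vector at 2; clipping each component to the range [-2, 2] is another common reading. The function names `clip_by_norm` and `sgd_step` are hypothetical and chosen only for illustration.

```python
import math


def clip_by_norm(grads, max_norm=2.0):
    """Rescale a gradient vector so its L2 norm is at most max_norm.

    Assumption: "gradient size" is read as the L2 norm of the whole vector.
    Returns the (possibly rescaled) gradients and the original norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: leave the gradient unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    # Same direction, shorter length.
    return [g * scale for g in grads], total_norm


def sgd_step(weights, grads, lr=1e-3, max_norm=2.0):
    """One plain SGD update that uses the clipped gradient (hypothetical helper)."""
    clipped, _ = clip_by_norm(grads, max_norm)
    return [w - lr * g for w, g in zip(weights, clipped)]


if __name__ == "__main__":
    exploding = [30.0, 40.0]  # L2 norm is 50, far above the limit of 2
    clipped, norm_before = clip_by_norm(exploding, max_norm=2.0)
    print("norm before clipping:", norm_before)
    print("clipped gradient:", clipped)
    print("updated weights:", sgd_step([1.0, 1.0], exploding, lr=1e-3))
```

Clipping by norm keeps the direction of the update and only shortens its length, which is why it acts like the pressure-limiting valve in the hint: the size of each step has an upper bound regardless of how large the raw gradient becomes.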
