hce_kmu Year 109 Introduction to Computer Science and Programming

Question 30

Gradient descent has been run for 15 iterations with learning rate $a=0.3$, and the corresponding loss function $J(\theta)$ is computed after each iteration. You find that the value of $J(\theta)$ decreases quickly and then levels off. Based on this observation, which one of the following conclusions seems most plausible?

A Rather than using the current value of $a$, use a larger value of $a$ (say $a=1.0$)
B Rather than using the current value of $a$, use a smaller value of $a$ (say $a=0.1$)
C $a=0.3$ is an effective choice of learning rate
D Overfitting. Rather than using the current definition of $J$, a better loss function should be chosen.
E None of the above

Guided Thinking

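Imagine you are descending a mountain at night, looking for the bottom of the valley. With every step you can feel the altitude drop noticeably, and after walking for a while you find that the terrain around you has become flat, with no obvious change in height. At that point, would you judge the stride length you have been using to be too large, too small, or just right for the goal of reaching the valley floor?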

AI Explanation

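Well done! Being able to judge accurately whether a learning rate is appropriate shows a solid understanding of the convergence behavior of gradient descent, which is an important step toward mastering the core logic of machine learning algorithms.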

Gradient Descent and the Convergence Curve

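In machine learning, the learning rate $a$ determines the step size taken toward the minimum of the loss function $J(\theta)$ during optimization. A well-chosen learning rate makes the loss drop sharply early in training and then flatten out naturally as the gradient shrinks near the optimum, which is exactly the "decreases quickly and then levels off" pattern described in the question, so the observation supports option C. If $a$ is too large, the loss typically oscillates violently or even diverges (the value rises instead of falling). If $a$ is too small, the descent is very slow and is unlikely to reach a stable plateau within only 15 iterations.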
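The three regimes described above can be reproduced with a small sketch. Everything in it is an assumption chosen for illustration and is not taken from the exam: the toy loss $J(\theta) = \tfrac{1}{2} k \theta^2$ with $k = 2.5$, the starting point $\theta_0 = 4$, the helper names `loss`, `grad`, and `run_gradient_descent`, and the comparison rates 0.01 and 1.0. The thresholds at which a rate becomes "too large" depend entirely on the curvature of the actual loss, so the numbers here only illustrate the shapes of the curves.

```python
# Toy illustration (assumed setup, not from the exam): J(theta) = 0.5 * k * theta^2.
K = 2.5  # hypothetical curvature of the toy loss


def loss(theta, k=K):
    """Quadratic loss with its minimum at theta = 0."""
    return 0.5 * k * theta ** 2


def grad(theta, k=K):
    """Derivative of the quadratic loss with respect to theta."""
    return k * theta


def run_gradient_descent(a, iterations=15, theta0=4.0):
    """Run plain gradient descent and return the loss after each iteration."""
    theta = theta0
    history = []
    for _ in range(iterations):
        theta -= a * grad(theta)  # update rule: theta <- theta - a * dJ/dtheta
        history.append(loss(theta))
    return history


if __name__ == "__main__":
    for a in (0.01, 0.3, 1.0):
        history = run_gradient_descent(a)
        print(f"a={a}: loss after 1 step = {history[0]:.4g}, "
              f"after 15 steps = {history[-1]:.4g}")
```

On this toy loss each update multiplies $\theta$ by $1 - a k$. With $a = 0.3$ that factor is 0.25, so the loss falls fast and then flattens near zero. With $a = 0.01$ the factor is 0.975 and the loss is still far from its minimum after 15 iterations. With $a = 1.0$ the factor is $-1.5$, so $\theta$ changes sign on every step and the loss grows.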
