
2025/01/10

Checkpoints in Machine Learning

Q: What is a checkpoint in machine learning and deep learning?

A: A checkpoint is a model saved during the training process, preserving the intermediate state of the model.

With the development of large language models (LLMs), models are becoming increasingly large, so keeping checkpoints has become important. Some researchers are studying how to resume interrupted training from a saved checkpoint rather than starting over.
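As a minimal sketch (assuming PyTorch; the model, optimizer, epoch value, and file name below are hypothetical), a checkpoint saves the model and optimizer states so training can resume later:

import torch
import torch.nn as nn

# Hypothetical one-layer model and optimizer, for illustration only.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
epoch = 5  # suppose training has reached epoch 5

# Save a checkpoint: model weights, optimizer state, and progress so far.
torch.save({'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict()},
           'checkpoint.pt')

# Later: restore the states and resume training from the saved epoch,
# instead of starting the whole training run over.
ckpt = torch.load('checkpoint.pt')
model.load_state_dict(ckpt['model_state_dict'])
optimizer.load_state_dict(ckpt['optimizer_state_dict'])
start_epoch = ckpt['epoch'] + 1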


References:

Machine Learning Checkpointing (deepchecks)

Resume Training from Checkpoint Network (Matlab)

Rojas, E., Kahira, A. N., Meneses, E., Gomez, L. B., & Badia, R. M. (2020). A study of checkpointing in large scale training of deep neural networks. arXiv preprint arXiv:2012.00825.

Xiang, L., Lu, X., Zhang, R., & Hu, Z. (2024, May). SSDC: A Scalable Sparse Differential Checkpoint for Large-scale Deep Recommendation Models. In 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-5). IEEE.

2021/12/20

Math for Deep Learning: arg min

arg min f(x) = the value of x at which f(x) attains its minimum


For example, consider the following values:

f(0) = 3

f(0.9) = 2.1

f(1) = 2

f(1.1) = 2.5

f(2) = 5

f(3) = 46

Because the minimum value of f(x) is 2, attained at x = 1,
arg min f(x) = 1.
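A quick way to check this in Python (a small sketch using the sample values above):

# f evaluated at the sample points above
f = {0: 3, 0.9: 2.1, 1: 2, 1.1: 2.5, 2: 5, 3: 46}

x_min = min(f, key=f.get)  # the x whose f(x) is smallest
print(x_min)               # 1
print(f[x_min])            # 2, the minimum value of f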

Reference:

Explanation on arg min

2021/08/26

Deep Learning: threshold θ and bias b

The output y of a single-node neural network can be represented as:

y = a(Σ w_i x_i - θ)

where a is the activation function, x_i is the i-th input, w_i is the i-th weight, and θ is the threshold.

When the weighted sum of inputs Σ w_i x_i is less than θ, the neuron does not fire. Hence θ is called the threshold, analogous to the threshold potential of a biological neuron.

We may replace θ with -b to get

y = a(Σ w_i x_i + b)

where b is called the bias parameter.

So b = -θ expresses the threshold in a more general form.
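A minimal sketch in Python (the inputs, weights, and step activation below are invented for illustration) showing that the two forms produce identical outputs when b = -θ:

import numpy as np

def step(z):
    # Step activation: fire (output 1) only when the net input is positive.
    return 1 if z > 0 else 0

x = np.array([0.5, 0.3])  # hypothetical inputs
w = np.array([0.4, 0.6])  # hypothetical weights
theta = 0.2               # threshold

y_threshold = step(np.dot(w, x) - theta)  # y = a(Σ w_i x_i - θ)
b = -theta                                # bias form of the same threshold
y_bias = step(np.dot(w, x) + b)           # y = a(Σ w_i x_i + b)
assert y_threshold == y_bias == 1         # identical outputs, as expected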

References:

A Beginner’s Guide to Neural Networks: Part Two

Hinton Neural Networks課程筆記2b:第一代神經網路之感知機

深度學習的數學:用數學開啟深度學習的大門(博碩) p.12-16

2020/09/01

Deep Learning @ Mobile Devices

Some ways to implement deep learning on mobile devices:

1. Device - Inference Only

Models pre-trained in cloud data centers are loaded onto the device. The model can be fixed and preloaded, or it can be updated later over the network or by service personnel.

2. Device - Inference, with Data Collection for Distributed Training

The device collects data and transmits it to cloud data centers, which use it to retrain the model; the updated model is then downloaded back to the device.

3. Federated Learning

The model is downloaded to the device and trained on local data, with inference also running on the device. The training results are uploaded to the cloud to update the shared model, and the new model can then be downloaded to the devices again.
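As a rough sketch of the federated idea (a simplified variant of the FedAvg scheme in McMahan et al., referenced below; the linear model and client data here are invented for illustration), each device trains locally and the server only averages the returned weights:

import numpy as np

rng = np.random.default_rng(0)
global_w = np.zeros(3)  # weights of the shared linear model

# Hypothetical local datasets on three devices: features X, targets y.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]

def local_train(w, X, y, lr=0.1, steps=10):
    # On one device: a few gradient-descent steps on its own local data.
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

# One federated round: each device trains locally, then the server
# averages the returned weights; raw data never leaves the devices.
local_models = [local_train(global_w, X, y) for X, y in clients]
global_w = np.mean(local_models, axis=0)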

References:

Wang, J., Cao, B., Yu, P., Sun, L., Bao, W., & Zhu, X. (2018, July). Deep learning towards mobile applications. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS) (pp. 1385-1393). IEEE.

王眾磊、陳海波, 行動裝置上的AI:使用TensorFlow on iOS、Android及樹莓派, 深智數位, pp. 3-1~3-3.

McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (pp. 1273-1282). PMLR.

分布式機器學習時代即將來臨?谷歌推出「Federated Learning」

2020/05/08

Deep Learning Textbook by Ian Goodfellow, Yoshua Bengio and Aaron Courville

A good online textbook for getting started with deep learning is available here:

https://www.deeplearningbook.org/

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Authors:

Ian Goodfellow - Apple. Previous: Google Brain, OpenAI
Yoshua Bengio - Université de Montréal
Aaron Courville - Université de Montréal

Video introduction to deep learning:

Learn Deep Learning in 6 Weeks (YouTube video by Siraj Raval)

2019/07/28

Machine Learning - Batch Size and Epoch

Batch Size

The number of training examples present in a single batch.
Because the entire dataset usually cannot be fed through the neural network at once, it is divided into batches; for example, batch_size = 100 means each batch contains 100 samples.

Epoch

One pass of the entire dataset forward and backward through the neural network.

Iteration

One iteration is one forward and backward pass over a single batch. Completing one epoch therefore takes several iterations: the dataset size divided by the batch size.

Example

Consider a dataset of 3000 training samples.
If the 3000 samples are split with batch_size = 500, one epoch takes 6 iterations.
Each iteration trains on one batch of 500 samples.
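A small sketch of this bookkeeping in Python (the dataset here is just a list of numbers, and train_step is a hypothetical placeholder):

dataset = list(range(3000))                        # 3000 training samples
batch_size = 500
iterations_per_epoch = len(dataset) // batch_size  # 6 iterations per epoch

for epoch in range(2):                      # two full passes over the dataset
    for i in range(iterations_per_epoch):   # each iteration uses one batch
        batch = dataset[i * batch_size:(i + 1) * batch_size]
        # train_step(batch)                 # hypothetical training step

print(iterations_per_epoch)  # 6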

References:
神經網路中Epoch、Iteration、Batchsize相關理解和說明
深度學習中的 epoch iteration batch-size
Epoch vs Batch Size vs Iterations

2019/04/11

Jacobian Matrix

The matrix of all first-order partial derivatives of a vector-valued function of a vector.

Let vector y be a function f of vector x:

y = f(x)
x = [x1, x2, ..., xn]
y = [y1, y2, ..., ym]

Jacobian matrix:

J = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]

  = [ ∂y1/∂x1  ∂y1/∂x2  ...  ∂y1/∂xn ]
    [ ∂y2/∂x1  ∂y2/∂x2  ...  ∂y2/∂xn ]
    [   ...                          ]
    [ ∂ym/∂x1  ∂ym/∂x2  ...  ∂ym/∂xn ]

(The formula follows Wikipedia: Jacobian matrix and determinant.)
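To check a Jacobian numerically, PyTorch's autograd (see the tutorial referenced below) can compute it directly; the vector function f here is made up for illustration:

import torch
from torch.autograd.functional import jacobian

def f(x):
    # Example vector-valued function: y1 = x1*x2, y2 = x1 + x3
    return torch.stack([x[0] * x[1], x[0] + x[2]])

x = torch.tensor([1.0, 2.0, 3.0])
J = jacobian(f, x)  # 2x3 matrix of partial derivatives ∂y_i/∂x_j
print(J)
# tensor([[2., 1., 0.],
#         [1., 0., 1.]])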

References:

Jacobian matrix and determinant (Wikipedia)
Autograd: Automatic Differentiation (PyTorch Official Tutorial)

2019/01/19

Hello World with TensorFlow

After TensorFlow is installed, you can test a Hello World with the following commands in Python:

Type python to enter the Python environment:

python

Type the following commands:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, World!')
>>> sess = tf.Session()
>>> sess.run(hello)
b'Hello, World!'
>>> 


Note that b'Hello, World!' is a byte literal.

If you want to print the message from a Python script, use this command:

print(sess.run(hello))
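Note that tf.Session belongs to the TensorFlow 1.x API. In TensorFlow 2.x, eager execution is enabled by default and there is no Session; a rough equivalent would be:

import tensorflow as tf

hello = tf.constant('Hello, World!')
print(hello.numpy())  # b'Hello, World!' under eager execution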

References:

helloworld.py (GitHub:aymericdamien/TensorFlow-Examples)
What does the 'b' character do in front of a string literal? (Stack Overflow)

2018/08/21

Mathematics for Machine Learning

According to Siraj Raval's video, machine learning mainly involves the following areas of mathematics:

Calculus - for optimization

Linear Algebra - for implementing the algorithms

Probability - for estimating the outcomes

Statistics - for identifying the objectives

----

References:

Mathematics of Machine Learning (Siraj Raval)
Logistic regression (Wikipedia)