Applied Mathematics in Machine Learning
From First Principles to Optimization
Machine learning is fundamentally an application of mathematics—especially linear algebra, probability theory, and optimization.
In this article, we derive key concepts from first principles.
1. Problem Setup
We define a supervised learning dataset:
$$ \mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m} $$
where:
$$ x^{(i)} \in \mathbb{R}^n $$
$$ y^{(i)} \in \mathbb{R} $$
We want to learn a function:
$$ h_\theta : \mathbb{R}^n \rightarrow \mathbb{R} $$
2. Linear Model
We define:
$$ h_\theta(x) = \theta^T x + b $$
Expanded:
$$ h_\theta(x) = \sum_{j=1}^{n} \theta_j x_j + b $$
To keep later formulas compact, we absorb the bias into $\theta$ by appending a constant feature $x_0 = 1$, so that $h_\theta(x) = \theta^T x$.
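As a concrete sketch (the function name `h` and all values here are illustrative, not from the derivation), the linear model in NumPy:

```python
import numpy as np

def h(theta, b, x):
    """Linear hypothesis h_theta(x) = theta^T x + b."""
    return float(np.dot(theta, x) + b)

print(h(np.array([1.0, 2.0]), 0.5, np.array([3.0, 4.0])))  # 1*3 + 2*4 + 0.5 = 11.5
```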
3. Loss Function
Mean Squared Error:
$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$
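A minimal NumPy sketch of this loss (names and data are illustrative):

```python
import numpy as np

def mse(theta, b, X, y):
    """J(theta) = (1/m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    return float(np.mean((X @ theta + b - y) ** 2))

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(mse(np.array([2.0]), 0.0, X, y))  # perfect fit: 0.0
print(mse(np.array([1.0]), 0.0, X, y))  # (1 + 4 + 9) / 3
```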
4. Optimization Objective
We solve:
$$ \theta^* = \arg\min_\theta J(\theta) $$
5. Gradient Descent
Gradient:
$$ \frac{\partial J}{\partial \theta} = \frac{2}{m} \sum_{i=1}^{m} x^{(i)} \left( \theta^T x^{(i)} - y^{(i)} \right) $$
Update rule:
$$ \theta := \theta - \alpha \nabla_\theta J(\theta) $$
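The per-example gradient and the update rule above can be sketched as follows; the dataset, learning rate, and iteration count are illustrative choices, not prescribed by the derivation:

```python
import numpy as np

def grad(theta, X, y):
    """dJ/dtheta = (2/m) * sum_i x^(i) * (theta^T x^(i) - y^(i))."""
    m = len(y)
    g = np.zeros_like(theta)
    for i in range(m):
        g += X[i] * (np.dot(theta, X[i]) - y[i])
    return (2.0 / m) * g

# Tiny synthetic problem with y = 3x, so theta should approach 3.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
theta = np.zeros(1)
alpha = 0.05
for _ in range(200):
    theta = theta - alpha * grad(theta, X, y)
print(theta)  # converges to [3.]
```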
6. Vectorized Form
$$ J(\theta) = \frac{1}{m} (X\theta - y)^T (X\theta - y) $$
Gradient:
$$ \nabla_\theta J(\theta) = \frac{2}{m} X^T (X\theta - y) $$
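The vectorized gradient is easy to sanity-check against finite differences; this sketch assumes the same $J(\theta)$ as above (central differences are exact up to rounding for a quadratic loss):

```python
import numpy as np

def J(theta, X, y):
    """Vectorized loss (1/m)(X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return float(r @ r) / len(y)

def grad_vec(theta, X, y):
    """Vectorized gradient (2/m) X^T (X theta - y)."""
    return (2.0 / len(y)) * X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)
g = grad_vec(theta, X, y)

# Compare each partial derivative with a central-difference estimate.
eps = 1e-6
for j in range(3):
    e = np.zeros(3); e[j] = eps
    fd = (J(theta + e, X, y) - J(theta - e, X, y)) / (2 * eps)
    assert abs(fd - g[j]) < 1e-5
print("gradient check passed")
```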
7. Normal Equation
Closed-form solution, valid when $X^T X$ is invertible:
$$ \theta^* = (X^T X)^{-1} X^T y $$
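A sketch of the closed form on noise-free synthetic data (all values illustrative). Solving the linear system $(X^T X)\theta = X^T y$ is preferred over forming the inverse explicitly, for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                       # noise-free, so the fit is exact

# Solve (X^T X) theta = X^T y rather than inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # recovers [1.0, -2.0, 0.5]
```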
8. Regularization
L2 regularization:
$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \|\theta\|_2^2 $$
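Setting the regularized gradient to zero also yields a closed form; under the $1/m$ scaling used here it is $\theta = (X^T X + m\lambda I)^{-1} X^T y$. A sketch (function name and data are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Minimizer of (1/m)||X theta - y||^2 + lam * ||theta||_2^2,
    i.e. theta = (X^T X + m * lam * I)^{-1} X^T y."""
    m, n = X.shape
    return np.linalg.solve(X.T @ X + m * lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)

# lam = 0 recovers ordinary least squares; lam > 0 shrinks the weights.
ols = ridge(X, y, 0.0)
shrunk = ridge(X, y, 1.0)
assert np.linalg.norm(shrunk) < np.linalg.norm(ols)
```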
9. Probabilistic View
Assume:
$$ y = \theta^T x + \epsilon $$
where:
$$ \epsilon \sim \mathcal{N}(0, \sigma^2) $$
Likelihood:
$$ p(y|x,\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \theta^T x)^2}{2\sigma^2}\right) $$
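For an i.i.d. dataset, maximizing this likelihood over $\theta$ is equivalent to minimizing the mean squared error; taking the negative log of the product of per-example likelihoods makes this explicit:

$$ -\log \prod_{i=1}^{m} p(y^{(i)} | x^{(i)}, \theta) = \frac{m}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2 $$

The first term does not depend on $\theta$, so the maximum-likelihood estimate coincides with the minimizer of $J(\theta)$.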
10. Neural Networks Extension
$$ a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)}) $$
Backpropagation applies the chain rule layer by layer, through the pre-activation $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$:
$$ \frac{\partial J}{\partial W^{(l)}} = \frac{\partial J}{\partial a^{(l)}} \, \frac{\partial a^{(l)}}{\partial z^{(l)}} \, \frac{\partial z^{(l)}}{\partial W^{(l)}} $$
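A minimal sketch of the forward and backward passes for one hidden layer with sigmoid activations and a squared loss; all shapes, names, and values are illustrative. The backpropagated gradient is verified against a finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
x = rng.normal(size=(2, 1))              # one input with n = 2 features
y = np.array([[0.7]])                    # scalar target
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

def forward(W1, b1, W2, b2):
    """Two-layer forward pass; returns activations and squared loss."""
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return a1, a2, float((a2 - y) ** 2)

a1, a2, loss = forward(W1, b1, W2, b2)

# Backward pass: dJ/dz = dJ/da * sigma'(z), with sigma'(z) = a * (1 - a),
# and dJ/dW^(l) = dJ/dz^(l) @ (a^(l-1))^T.
dz2 = 2 * (a2 - y) * a2 * (1 - a2)
dW2 = dz2 @ a1.T
dz1 = (W2.T @ dz2) * a1 * (1 - a1)
dW1 = dz1 @ x.T

# Finite-difference check of one entry of dW1.
eps = 1e-6
E = np.zeros_like(W1); E[0, 0] = eps
fd = (forward(W1 + E, b1, W2, b2)[2] - forward(W1 - E, b1, W2, b2)[2]) / (2 * eps)
assert abs(fd - dW1[0, 0]) < 1e-6
print("backprop check passed")
```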
11. General Learning Objective
$$ \min_{\theta} \; \mathbb{E}_{(x,y)\sim \mathcal{D}} \left[ \ell(h_\theta(x), y) \right] $$
12. Key Insight
$$ \text{Machine Learning} = \text{Optimization} + \text{Statistics} + \text{Linear Algebra} $$
Conclusion
We built the mathematical foundation of machine learning from:
- Linear models
- Loss functions
- Optimization
- Probability
This is the backbone of modern AI systems.

