
Applied Mathematics in Machine Learning

From First Principles to Optimization


Machine learning is fundamentally an application of mathematics, above all linear algebra, probability theory, and optimization.

In this article, we derive key concepts from first principles.

1. Problem Setup

We define a supervised learning dataset:

$$ \mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m} $$

where:

$$ x^{(i)} \in \mathbb{R}^n $$

$$ y^{(i)} \in \mathbb{R} $$

We want to learn a function:

$$ h_\theta : \mathbb{R}^n \rightarrow \mathbb{R} $$


2. Linear Model

We define:

$$ h_\theta(x) = \theta^T x + b $$

Expanded:

$$ h_\theta(x) = \sum_{j=1}^{n} \theta_j x_j + b $$
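The linear hypothesis can be sketched in a few lines of NumPy (the function name `predict` and the parameter values are illustrative):

```python
import numpy as np

def predict(theta, b, x):
    """Linear hypothesis h_theta(x) = theta^T x + b."""
    return np.dot(theta, x) + b

theta = np.array([2.0, -1.0])  # n = 2 parameters
b = 0.5
x = np.array([3.0, 4.0])       # one input vector
# h = 2*3 + (-1)*4 + 0.5 = 2.5
print(predict(theta, b, x))
```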


3. Loss Function

Mean Squared Error:

$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$
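A minimal sketch of the MSE loss, with examples stored as rows of a matrix `X` (the toy data below is chosen so a perfect fit exists):

```python
import numpy as np

def mse_loss(theta, b, X, y):
    """Mean squared error J(theta) over the m examples in the rows of X."""
    residuals = X @ theta + b - y   # h_theta(x^(i)) - y^(i) for each i
    return np.mean(residuals ** 2)

X = np.array([[1.0], [2.0], [3.0]])  # m = 3, n = 1
y = np.array([2.0, 4.0, 6.0])        # generated by y = 2x
# The perfect fit theta = [2], b = 0 gives zero loss
print(mse_loss(np.array([2.0]), 0.0, X, y))  # 0.0
```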


4. Optimization Objective

We solve:

$$ \theta^* = \arg\min_\theta J(\theta) $$


5. Gradient Descent

Gradient (absorbing the bias $b$ into $\theta$ by appending a constant feature $x_0 = 1$):

$$ \nabla_\theta J(\theta) = \frac{2}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right) x^{(i)} $$

Update rule:

$$ \theta := \theta - \alpha \nabla_\theta J(\theta) $$
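The update rule translates directly into a loop. This is a minimal sketch: the bias is absorbed into $\theta$ via a column of ones, and the step size and iteration count are arbitrary choices, not tuned values:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=500):
    """Minimize MSE by repeating theta := theta - alpha * grad."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = (2.0 / m) * X.T @ (X @ theta - y)  # gradient from Section 5
        theta -= alpha * grad
    return theta

# Bias absorbed: the first column of ones plays the role of b (x_0 = 1)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])        # generated by y = 2x + 1
print(gradient_descent(X, y))        # converges toward [1., 2.]
```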


6. Vectorized Form

$$ J(\theta) = \frac{1}{m} (X\theta - y)^T (X\theta - y) $$

Gradient:

$$ \nabla_\theta J(\theta) = \frac{2}{m} X^T (X\theta - y) $$
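A useful sanity check on the vectorized gradient is to compare it against central finite differences of the vectorized loss; the two should agree to high precision (random test data here is illustrative):

```python
import numpy as np

def loss(theta, X, y):
    r = X @ theta - y
    return (r @ r) / len(y)            # (1/m) (X theta - y)^T (X theta - y)

def grad(theta, X, y):
    return (2.0 / len(y)) * X.T @ (X @ theta - y)

# Analytic gradient vs. central finite differences
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)
eps = 1e-6
num = np.array([(loss(theta + eps * e, X, y) - loss(theta - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(grad(theta, X, y), num, atol=1e-5))  # True
```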


7. Normal Equation

Closed-form solution:

$$ \theta^* = (X^T X)^{-1} X^T y $$
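In code it is numerically preferable to solve the linear system $X^T X \theta = X^T y$ rather than form the inverse explicitly; this sketch assumes $X^T X$ is invertible:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta* = (X^T X)^{-1} X^T y without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column of ones
y = np.array([3.0, 5.0, 7.0])                        # y = 2x + 1 exactly
print(normal_equation(X, y))                         # close to [1., 2.]
```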


8. Regularization

L2 regularization:

$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \|\theta\|_2^2 $$
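A sketch of the regularized loss and its gradient; the penalty term $\lambda\|\theta\|_2^2$ simply adds $2\lambda\theta$ to the gradient. (In practice the bias term is usually excluded from the penalty; this toy example ignores that detail.)

```python
import numpy as np

def ridge_loss(theta, X, y, lam):
    r = X @ theta - y
    return (r @ r) / len(y) + lam * theta @ theta   # MSE + lambda * ||theta||_2^2

def ridge_grad(theta, X, y, lam):
    # The derivative of lambda * ||theta||^2 is 2 * lambda * theta
    return (2.0 / len(y)) * X.T @ (X @ theta - y) + 2.0 * lam * theta

theta = np.array([1.0, -2.0])
X = np.eye(2); y = np.zeros(2); lam = 0.5
# MSE term = (1 + 4)/2 = 2.5, penalty = 0.5 * 5 = 2.5, total 5.0
print(ridge_loss(theta, X, y, lam))  # 5.0
```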


9. Probabilistic View

Assume:

$$ y = \theta^T x + \epsilon $$

where:

$$ \epsilon \sim \mathcal{N}(0, \sigma^2) $$

Likelihood:

$$ p(y|x,\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \theta^T x)^2}{2\sigma^2}\right) $$
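Under this model, maximizing the likelihood is equivalent to minimizing squared error: the negative log-likelihood differs from the sum of squared residuals only by constants in $\theta$. A quick numerical sketch (with $\sigma^2 = 1$ and arbitrary random data) confirms the two differ by an affine transformation:

```python
import numpy as np

def nll(theta, X, y, sigma2=1.0):
    """Negative log-likelihood under y = theta^T x + eps, eps ~ N(0, sigma2)."""
    r = y - X @ theta
    m = len(y)
    return 0.5 * m * np.log(2 * np.pi * sigma2) + (r @ r) / (2 * sigma2)

# NLL and squared error share a minimizer: their differences are proportional
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2)); y = rng.normal(size=4)
t1, t2 = rng.normal(size=2), rng.normal(size=2)
sq = lambda t: np.sum((y - X @ t) ** 2)
print((nll(t1, X, y) - nll(t2, X, y)) - 0.5 * (sq(t1) - sq(t2)))  # ~0.0
```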


10. Neural Networks Extension

$$ a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)}) $$

Backpropagation:

$$ \frac{\partial J}{\partial W^{(l)}} = \frac{\partial J}{\partial a^{(l)}} \cdot \frac{\partial a^{(l)}}{\partial W^{(l)}} $$
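The layer recurrence above can be sketched as a forward pass. Here $\sigma$ is the logistic sigmoid (the same symbol as the noise standard deviation in Section 9, but a different object), and the layer sizes and random weights are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, layers):
    """Apply a^(l) = sigma(W^(l) a^(l-1) + b^(l)) for each layer in order."""
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(2)
layers = [(rng.normal(size=(3, 2)), np.zeros(3)),   # layer 1: 2 -> 3
          (rng.normal(size=(1, 3)), np.zeros(1))]   # layer 2: 3 -> 1
print(forward(np.array([0.5, -0.5]), layers))       # a value in (0, 1)
```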


11. General Learning Objective

$$ \min_{\theta} \; \mathbb{E}_{(x,y)\sim \mathcal{D}} \left[ \ell(h_\theta(x), y) \right] $$


12. Key Insight

$$ \text{Machine Learning} = \text{Optimization} + \text{Statistics} + \text{Linear Algebra} $$


Conclusion

We built the mathematical foundation of machine learning from:

  • Linear models
  • Loss functions
  • Optimization
  • Probability

This is the backbone of modern AI systems.

