Add tutorial

This commit is contained in:
Ryan 2025-01-08 20:20:25 +08:00
parent 363d2cb925
commit 8a233ffa97

tutorials.md Normal file

@@ -0,0 +1,51 @@
# Tutorials
## Week 1 tutorial
- Calculation (formulas are given on the test paper):
  - $\text{Accuracy} = \frac{\text{Correct Classifications}}{\text{Total Classifications}} = \frac{TP + TN}{TP + TN + FP + FN}$
  - $F1 = \frac{2}{\text{recall}^{-1} + \text{precision}^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}$
- Accuracy vs. F1:
  - Accuracy: use when TP and TN are both important (and the classes are roughly balanced)
  - F1: use when FP and FN are more important; preferred for imbalanced classes, because TN does not appear in the formula
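
A minimal Python sketch of both formulas, computed from confusion-matrix counts. The function names and the counts below are hypothetical. They were chosen to show how accuracy can look excellent on an imbalanced test set while F1 exposes that most positives are missed.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """2*TP / (2*TP + FP + FN); TN never appears in the formula."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced test set: 1000 samples, only 10 positives.
tp, fn = 2, 8     # the model finds 2 of the 10 positives
fp, tn = 5, 985   # and raises 5 false alarms among the 990 negatives

print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")  # accuracy = 0.987
print(f"F1       = {f1_score(tp, fp, fn):.3f}")      # F1       = 0.235
```

The large number of true negatives dominates accuracy. F1 ignores TN, so it drops to 4/17 here.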
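## Week 2 tutorial
- IQR (interquartile range): the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a dataset, $\text{IQR} = Q_3 - Q_1$
  - It measures the spread of the middle 50% of values
- Popular method of identifying outliers:
  - Find the median, Q1, Q3, the upper bound and the lower bound; values outside the bounds are outliers
  - Commonly, $\text{Upper bound} = Q_3 + 1.5 \times \text{IQR}$ and $\text{Lower bound} = Q_1 - 1.5 \times \text{IQR}$
  - Method: https://www.scribbr.com/statistics/interquartile-range/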
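
A minimal sketch of the outlier procedure in Python, assuming the common 1.5 × IQR fences. Quartiles come from the standard library's `statistics.quantiles(..., method="inclusive")`. Quartile conventions differ between textbooks and tools, so a hand calculation may give slightly different Q1 and Q3. `iqr_outliers` and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(data: list[float], k: float = 1.5):
    """Return (Q1, median, Q3, lower, upper, outliers) using the k * IQR rule."""
    # n=4 splits the data into quarters, giving the three cut points Q1, Q2, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return q1, median, q3, lower, upper, outliers


values = [2, 4, 4, 5, 6, 7, 8, 9, 40]  # hypothetical sample
q1, med, q3, lower, upper, outliers = iqr_outliers(values)
print(q1, med, q3, lower, upper, outliers)  # 4.0 6.0 8.0 -2.0 14.0 [40]
```

Here IQR = 8 - 4 = 4, so the bounds are 4 - 6 = -2 and 8 + 6 = 14. Only 40 lies outside them.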
## Week 3 tutorial
- K-means clustering:
  - Choose K, the number of clusters
  - Pick K random data points as the initial centroids
  - Assign each data point to its closest centroid
  - For each cluster, calculate the mean of its points and move the centroid there (the new centroid does not have to be a data point)
  - Repeat the assignment and update steps until the centroids no longer change
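
The steps above can be sketched in plain Python. This sketch assumes points are equal-length tuples and uses Euclidean distance. `kmeans`, `max_iter` and `seed` are hypothetical names. Centroids are initialised by sampling K data points, as in the tutorial; libraries such as scikit-learn default to k-means++ initialisation instead. The handling of an empty cluster is also an assumption.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Pick k distinct data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to its closest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its assigned points;
        # the mean need not coincide with any data point.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D data with two obvious groups.
data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

On this data every possible initial pair converges to the same split: the first three points form one cluster and the last three the other. Which cluster gets label 0 depends on the seed. The `max_iter` cap guards against slow convergence.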
## Week 4 tutorial
- Euclidean distance: $d(A,B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + ... + (A_n - B_n)^2}$
- Cosine similarity
  - Useful for applications with sparse data, because:
    - Even if two objects are far apart in Euclidean distance, the angle between them can still be small
    - Positions where both vectors are 0 (0-0 matches) contribute nothing to the dot product or the magnitudes, so they are effectively ignored
  - Example applications:
    - Word documents (NLP)
    - Market transaction data
    - Recommendation systems
    - Images on a computer
  - Values:
    - Cos close to 1: similar
    - Cos close to 0: orthogonal, not related
    - Cos close to -1: opposite
  - Calculation: $\text{Similarity}(A,B) = \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \times \lVert B \rVert}$, where
    - $\theta$ is the angle between the vectors
    - $A \cdot B$ is the dot product, $A_1 B_1 + A_2 B_2 + ... + A_n B_n$
    - $\lVert A \rVert$ is the magnitude of vector $A$, $\sqrt{A_1^2 + A_2^2 + ... + A_n^2}$
  - Recover the angle with $\theta = \arccos(\text{Similarity}(A,B))$
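
A minimal Python sketch of both measures and of recovering the angle, using plain lists as vectors. The function names and the word-count vectors are hypothetical. It illustrates the sparse-data point: two documents can be far apart in Euclidean distance yet have a cosine similarity of 1. Positions where both vectors are 0 add nothing to the sums.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||); undefined for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def angle_degrees(a, b):
    """theta = arccos(similarity), clamped to [-1, 1] to absorb rounding error."""
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))  # acos returns radians


# Hypothetical word counts: doc_b uses the same words as doc_a, ten times as often.
doc_a = [1, 0, 2, 0, 0, 1]
doc_b = [10, 0, 20, 0, 0, 10]
doc_c = [0, 3, 0, 1, 0, 0]

print(euclidean(doc_a, doc_b))          # large: about 22
print(cosine_similarity(doc_a, doc_b))  # about 1: same direction
print(cosine_similarity(doc_a, doc_c))  # 0.0: no shared words, orthogonal
print(angle_degrees(doc_a, doc_c))      # about 90 degrees
```

The clamp in `angle_degrees` matters in practice. Floating-point rounding can push a similarity slightly above 1, and `math.acos` would then raise an error.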