# Tutorials

## Week 1 tutorial

- Calculation (formulas are given on the test paper):
  - $Accuracy = \frac{\text{Correct Classifications}}{\text{Total Classifications}} = \frac{TP + TN}{TP + TN + FP + FN}$
  - $F1 = \frac{2}{recall^{-1} + precision^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}$
- Accuracy vs. F1:
  - Accuracy: rewards every correct prediction, so TP and TN count equally; most informative when the classes are roughly balanced
  - F1: driven by the errors FP and FN (TN does not appear in the formula), so it is preferred for imbalanced classes

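
To see why the two metrics can disagree, here is a minimal sketch that evaluates both formulas on a made-up confusion matrix. The counts are hypothetical and chosen only to show the effect of class imbalance; the function names are not from the tutorial.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; TN does not appear.

    Undefined (division by zero) when tp, fp and fn are all 0.
    """
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical imbalanced test set: 990 negatives, 10 positives.
# The classifier finds only 1 of the 10 positives and raises no false alarms.
tp, tn, fp, fn = 1, 990, 0, 9

acc = accuracy(tp, tn, fp, fn)  # 991 / 1000
f1 = f1_score(tp, fp, fn)       # 2 / 11

print(f"accuracy = {acc:.3f}")  # accuracy = 0.991
print(f"F1       = {f1:.3f}")   # F1       = 0.182
```

Accuracy looks excellent because the 990 true negatives dominate the count, while F1 exposes that nine of the ten positives were missed.
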
## Week 2 tutorial

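- IQR (interquartile range): difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a dataset, $IQR = Q3 - Q1$
  - The spread of the middle 50% of values
  - Popular method of identifying outlying observations:
    - Find the median, Q1, Q3, the upper bound and the lower bound (commonly $Q3 + 1.5 \times IQR$ and $Q1 - 1.5 \times IQR$); values outside the bounds are treated as outliers
    - Method: https://www.scribbr.com/statistics/interquartile-range/
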
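
The procedure above can be sketched in a few lines of Python. This is one possible convention, not necessarily the one used in the tutorial: quartiles are taken as the medians of the lower and upper halves (the middle value is left out when the count is odd), and the bounds use the common 1.5 × IQR multiplier. The dataset and the function names are made up for illustration.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Needs at least two values; other quartile conventions give
    slightly different results on small datasets.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the middle position
    upper = data[half + n % 2:]  # values above the middle position
    return median(lower), median(data), median(upper)


def iqr_bounds(values, k=1.5):
    """Return (lower bound, upper bound) for flagging outliers."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 4, 5, 7, 8, 9, 11, 40]  # hypothetical sample

q1, med, q3 = quartiles(data)      # 4.5, 7.5, 10.0
low, high = iqr_bounds(data)       # -3.75, 18.25
outliers = [x for x in data if x < low or x > high]

print(f"Q1={q1}, median={med}, Q3={q3}, IQR={q3 - q1}")
print(f"bounds=({low}, {high}), outliers={outliers}")  # outliers=[40]
```
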
## Week 3 tutorial

- K-means clustering:
  - Choose the number of clusters K
  - Pick K random points as the initial centroids
  - Assign each data point to its closest centroid
  - Compute the mean of each cluster and move that cluster's centroid to it (the centroid doesn't have to lie on a data point)
  - Repeat the assignment and update steps until the centroids no longer change

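
The steps above can be sketched in plain Python. This is a minimal illustration under a few assumptions that are not stated in the tutorial: the initial centroids are drawn from the data points themselves, closeness is measured with squared Euclidean distance, an empty cluster keeps its previous centroid, and the sample points are invented.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups.

    Returns (centroids, clusters). Requires k <= len(points).
    """
    rng = random.Random(seed)
    # Pick K random data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]

    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # empty cluster: keep old centroid

        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Note that the final centroids are cluster means, so they generally do not coincide with any of the input points. On less clearly separated data the result can depend on the random initial centroids.
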
## Week 4 tutorial

- Euclidean distance
- Cosine similarity
  - Useful for applications with sparse data: even if two objects are far apart in Euclidean distance, the angle between them can still be small. Dimensions where both vectors are 0 (0-0 matches) add nothing to the dot product, so they are effectively ignored.
    - Text documents (NLP)
    - Market transaction data
    - Recommendation systems
    - Images stored on a computer
  - Values:
    - Cos close to 1: similar
    - Cos close to 0: orthogonal, not related
    - Cos close to -1: opposite
  - Calculation:
    $Similarity(A,B) = \cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}$
    - $\theta$ is the angle between the vectors
    - $A \cdot B$ is the dot product, $A_1 B_1 + A_2 B_2 + ... + A_n B_n$
    - $||A||$ is the magnitude of the vector, $\sqrt{A_1^2 + A_2^2 + ... + A_n^2}$
    - Calculate the angle with $\theta = \arccos(Similarity(A,B))$

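
As a rough illustration of the formula and of the sparse-data point, the sketch below computes cosine similarity and Euclidean distance for three made-up word-count vectors. The vectors and function names are hypothetical; the code assumes neither vector is all zeros, since the magnitude would then be 0 and the division would fail.

```python
import math


def dot(a, b):
    """Dot product A_1*B_1 + ... + A_n*B_n."""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Vector length, sqrt(A_1^2 + ... + A_n^2)."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """cos(theta) between two non-zero vectors."""
    return dot(a, b) / (magnitude(a) * magnitude(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical word counts over a 5-word vocabulary.
short_doc = [1, 2, 0, 0, 0]    # short text on one topic
long_doc = [10, 20, 0, 0, 0]   # much longer text, same word proportions
other_doc = [0, 0, 3, 0, 1]    # shares no words with the first two

sim_same = cosine_similarity(short_doc, long_doc)    # about 1: same direction
sim_other = cosine_similarity(short_doc, other_doc)  # 0: orthogonal
dist_same = euclidean_distance(short_doc, long_doc)  # sqrt(405), about 20.1

# Recover the angle; clamp to [-1, 1] to guard against rounding error.
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, sim_same))))

print(f"cosine(short, long)    = {sim_same:.3f}")
print(f"cosine(short, other)   = {sim_other:.3f}")
print(f"euclidean(short, long) = {dist_same:.2f}")
print(f"angle(short, long)     = {angle_deg:.3f} degrees")
```

The two documents on the same topic are far apart in Euclidean distance only because one is longer, yet their cosine similarity is essentially 1. The positions where both vectors are 0 contribute nothing to the dot product.
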