Tutorials
Week 1 tutorial
- Calculation (formulas are given on the test paper):
Accuracy = \frac{\text{Correct classifications}}{\text{Total classifications}} = \frac{TP + TN}{TP + TN + FP + FN}
F1 = \frac{2}{recall^{-1} + precision^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}
- Accuracy vs. F1:
- Accuracy: counts TP and TN equally; use it when TP and TN are what matter most (roughly balanced classes)
- F1: ignores TN and focuses on FP and FN; use it for imbalanced classes
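A minimal Python sketch of the two formulas, using made-up confusion-matrix counts (not from the tutorial). The function names are illustrative. The second case shows why F1 is preferred for imbalanced classes: a model that always predicts "negative" still gets high accuracy, but F1 drops to 0.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN): every correct prediction counts equally."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """2TP / (2TP + FP + FN): true negatives do not appear at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # no positives present or predicted


def f1_from_precision_recall(precision, recall):
    """Harmonic-mean form: 2 / (recall^-1 + precision^-1)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)


# Hypothetical balanced case: 40 TP, 50 TN, 5 FP, 5 FN
print(accuracy(40, 50, 5, 5))                      # 0.9
print(f1_score(40, 5, 5))                          # about 0.889
print(f1_from_precision_recall(40 / 45, 40 / 45))  # about 0.889, same as f1_score

# Hypothetical imbalanced case: 95 negatives, 5 positives, model always says "negative"
print(accuracy(0, 95, 0, 5))  # 0.95
print(f1_score(0, 0, 5))      # 0.0
```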
Week 2 tutorial
- IQR: the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a dataset, i.e. IQR = Q3 - Q1
- Measures the spread of the middle 50% of values
- Popular method for identifying outlying observations (outliers):
- Find the median, Q1 and Q3, then the upper and lower bounds
- Method: https://www.scribbr.com/statistics/interquartile-range/
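A minimal Python sketch of the IQR steps on a made-up sample. Two assumptions not stated in these notes: the quartiles use the median-of-halves ("exclusive") method, which leaves the median out of both halves when the count is odd, and the bounds use the common 1.5 × IQR rule (lower bound = Q1 - 1.5 × IQR, upper bound = Q3 + 1.5 × IQR). Other quartile conventions (e.g. inclusive) can give slightly different Q1/Q3. The names `quartiles`, `iqr_outliers` and `k` are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    For an odd count the median itself is excluded from both halves
    (the "exclusive" convention).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]          # values below the median
    upper = data[half + n % 2:]  # values above the median (skip it when n is odd)
    return median(lower), median(data), median(upper)


def iqr_outliers(values, k=1.5):
    """Return (IQR, lower bound, upper bound, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    outliers = [v for v in values if v < lower_bound or v > upper_bound]
    return iqr, lower_bound, upper_bound, outliers


data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up sample
print(quartiles(data))     # (3.5, 6, 8.5)
print(iqr_outliers(data))  # (5.0, -4.0, 16.0, [30])
```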
Week 3 tutorial
- K-means clustering:
- Choose the number of clusters, K
- Pick K random data points as the initial centroids
- Assign each data point to its closest centroid
- Compute the mean of each cluster and move that cluster's centroid to it (the new centroid does not have to be a data point)
- Repeat the assignment and update steps until the centroids no longer change
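A minimal pure-Python sketch of the K-means steps, assuming points are equal-length tuples, "closest" means Euclidean distance, and a fixed random seed for repeatability. Keeping an empty cluster's centroid in place is a choice made here, not something from the tutorial; `kmeans`, `seed` and `max_iter` are hypothetical names.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means on a list of equal-length tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Pick K data points at random as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to its closest centroid (Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (need not be a data point)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # empty cluster: leave its centroid in place
        # Stop once no centroid moves
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # two made-up groups
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # roughly [(1.33, 1.33), (8.33, 8.33)]
```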
Week 4 tutorial
- Euclidean distance
- Cosine similarity
- Useful for applications with sparse data: two objects can be far apart in Euclidean distance yet still have a small angle between them.
- Word documents (NLP)
- Market transaction data
- Recommendation systems
- Images on a computer
- Because dimensions where both vectors are 0 (a 0, 0 pair) add nothing to the dot product, they are effectively ignored
- Values:
- Cos close to 1: similar
- Cos close to 0: orthogonal, not related
- Cos close to -1: opposite
- Calculation:
Similarity(A,B) = \cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}
- \theta is the angle between vectors A and B
- A \cdot B is the dot product, A_1 B_1 + A_2 B_2 + ... + A_n B_n
- ||A|| is the magnitude of vector A, \sqrt{A_1^2 + A_2^2 + ... + A_n^2}
- Calculate the angle with
\theta = \arccos(Similarity(A,B))
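A minimal Python sketch of the cosine similarity formula and the angle calculation, compared with Euclidean distance on made-up word-count vectors. `doc_b` is `doc_a` scaled by 10, so the two are far apart in Euclidean distance but point in the same direction; the positions where both are 0 add nothing to the dot product. All function and variable names are illustrative.

```python
import math


def dot(a, b):
    """A . B = A_1 B_1 + A_2 B_2 + ... + A_n B_n"""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """||A|| = sqrt(A_1^2 + A_2^2 + ... + A_n^2)"""
    return math.sqrt(sum(x * x for x in a))


def cosine_similarity(a, b):
    denom = magnitude(a) * magnitude(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom


def angle_degrees(a, b):
    # theta = arccos(similarity); clamp to [-1, 1] to absorb floating-point rounding
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up word counts over a 6-word vocabulary
doc_a = [1, 0, 2, 0, 0, 1]
doc_b = [10, 0, 20, 0, 0, 10]
print(euclidean_distance(doc_a, doc_b))     # about 22.05: far apart
print(cosine_similarity(doc_a, doc_b))      # about 1.0: same direction
print(angle_degrees([1, 0], [0, 1]))        # about 90: orthogonal
print(cosine_similarity([1, 2], [-1, -2]))  # about -1.0: opposite
```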