# Tutorials

## Week 1 tutorial

- Calculation: (formulas are given on the test paper)
  - $Accuracy = \frac{\text{Correct Classifications}}{\text{Total Classifications}} = \frac{TP + TN}{TP + TN + FP + FN}$
  - $F1 = \frac{2}{recall^{-1} + precision^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}$
- Accuracy vs. F1:
  - Accuracy: rewards correct predictions of both kinds (TP and TN)
  - F1: ignores TN and penalizes FP and FN, so it is preferred for imbalanced classes

## Week 2 tutorial

- IQR: difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a dataset, $IQR = Q3 - Q1$
  - The spread of the middle 50% of values
  - Popular method of identifying outlying observations (outliers):
    - Find the median, Q1, Q3, upper bound, and lower bound (commonly $Q3 + 1.5 \times IQR$ and $Q1 - 1.5 \times IQR$)
    - Method: https://www.scribbr.com/statistics/interquartile-range/

## Week 3 tutorial

- K-means clustering:
  - Choose the number of clusters K
  - Pick K random points as the initial centroids
  - Assign each data point to its closest centroid
  - Compute the mean of each cluster and move that cluster's centroid there (the centroid does not have to sit on a data point)
  - Repeat the assignment and update steps until the centroids no longer change

## Week 4 tutorial

- Euclidean distance
- Cosine similarity
  - Useful for applications with sparse data, since two objects that are far apart in Euclidean distance can still have a small angle between them:
    - Word documents (NLP)
    - Market transaction data
    - Recommendation systems
    - Images on a computer
  - Because 0-0 matches (attributes that are zero in both vectors) are ignored
  - Values:
    - Cos close to 1: similar
    - Cos close to 0: orthogonal, not related
    - Cos close to -1: opposite
  - Calculation: $Similarity(A,B) = \cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}$
    - $\theta$ is the angle between the vectors
    - $A \cdot B$ is the dot product, $A_1 B_1 + A_2 B_2 + ... + A_n B_n$
    - $||A||$ is the magnitude of the vector, $\sqrt{A^2_1 + A^2_2 + ... + A^2_n}$
    - Calculate the angle with $\theta = \arccos(Similarity(A,B))$
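
The cosine similarity calculation from Week 4 can be sketched in plain Python. This is a minimal illustration rather than the tutorial's own code: the function names `cosine_similarity` and `angle_between` and the sample vectors `doc_a`, `doc_b`, `doc_c` are made up for this example, and raising an error for a zero vector is an assumption, since the notes do not say how to handle that case.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: A_1*B_1 + A_2*B_2 + ... + A_n*B_n
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitudes: square root of the sum of squared components
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as an error (division by zero)
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def angle_between(a, b):
    """Return the angle theta in radians, i.e. arccos of the similarity."""
    # Clamp to [-1, 1] to guard against floating-point rounding
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos_theta)


# Hypothetical sparse term-count vectors for three documents
doc_a = [3, 0, 0, 2, 0]
doc_b = [6, 0, 0, 4, 0]  # same direction as doc_a, twice as long
doc_c = [0, 5, 0, 0, 1]  # shares no non-zero attribute with doc_a

print(cosine_similarity(doc_a, doc_b))  # close to 1: similar
print(cosine_similarity(doc_a, doc_c))  # 0: orthogonal, not related
print(angle_between(doc_a, doc_c))      # pi/2 radians
```

`doc_a` and `doc_b` are far apart in Euclidean distance but point in the same direction, so their similarity is close to 1. Positions where both vectors are zero add nothing to the dot product or the magnitudes, which is why 0-0 matches have no effect on the result.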