root/EBU6504_smart_arch_notes

Ryan b5a734a1b7 Update dot product

2025-01-08 20:22:19 +08:00

Tutorials

Week 1 tutorial

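Calculation (formulas are given in the test paper):
- Accuracy = \frac{\text{Correct classifications}}{\text{Total classifications}} = \frac{TP + TN}{TP + TN + FP + FN}
- F1 = \frac{2}{\text{recall}^{-1} + \text{precision}^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}, where precision = \frac{TP}{TP + FP} and recall = \frac{TP}{TP + FN}
Accuracy vs. F1:
- Accuracy: rewards TP and TN equally, so on imbalanced data it can look high just by always predicting the majority class
- F1: ignores TN and penalizes both FP and FN on the positive class, so it is preferred for imbalanced classes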
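A minimal sketch of both formulas, computed from confusion-matrix counts. The function names and the counts below are hypothetical illustrations; the example shows why F1 is preferred when classes are imbalanced, and checks that the harmonic-mean form and the count form of F1 agree.

```python
import math

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN): share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """2TP / (2TP + FP + FN). TN does not appear at all."""
    return 2 * tp / (2 * tp + fp + fn)

def f1_harmonic(tp: int, fp: int, fn: int) -> float:
    """Same score via the harmonic mean 2 / (recall^-1 + precision^-1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 / (1 / recall + 1 / precision)

# Hypothetical imbalanced test set: 95 negatives, 5 positives,
# and a model that always predicts "negative".
print(accuracy(tp=0, tn=95, fp=0, fn=5))  # 0.95, looks good
print(f1_score(tp=0, fp=0, fn=5))         # 0.0, exposes the useless model

# Both F1 formulas agree (TP=8, FP=2, FN=4 gives 16/22, about 0.727).
print(math.isclose(f1_score(8, 2, 4), f1_harmonic(8, 2, 4)))  # True
```

`f1_harmonic` is only a cross-check: it divides by zero when TP = 0, which is one reason the count form is handier in exams.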

Week 2 tutorial

IQR = Q3 - Q1: the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a dataset
- Measures the spread of the middle 50% of values
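- Popular method of identifying outlier observations:
  - Find the median, Q1, Q3, the upper bound (Q3 + 1.5 \times IQR) and the lower bound (Q1 - 1.5 \times IQR); values outside the bounds are outliers
  - Method: https://www.scribbr.com/statistics/interquartile-range/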
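A short sketch of the quartile and bound calculation using Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and tools; the `exclusive` method used here matches the hand method that leaves the median out of both halves for odd-sized data such as this sample, but other methods may give slightly different quartiles. The 1.5 × IQR bound factor, the helper name `iqr_bounds` and the sample values are illustrative assumptions.

```python
import statistics

def iqr_bounds(data: list[float]) -> dict[str, float]:
    """Hypothetical helper: median, quartiles, IQR and outlier bounds."""
    # With n=4, statistics.quantiles returns the three cut points [Q1, median, Q3].
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        # Assumed 1.5 x IQR rule for the lower and upper bounds.
        "lower": q1 - 1.5 * iqr,
        "upper": q3 + 1.5 * iqr,
    }

values = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # hypothetical sample
b = iqr_bounds(values)
outliers = [x for x in values if x < b["lower"] or x > b["upper"]]
print(b)         # Q1 = 4.0, median = 9.0, Q3 = 14.0, IQR = 10.0, bounds -11.0 and 29.0
print(outliers)  # [100]
```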

Week 3 tutorial

K-means clustering:
- Choose K, the number of clusters
- Pick K random data points as the initial centroids
- Assign each data point to its closest centroid
- Move each cluster's centroid to the mean of its points (the new centroid doesn't have to be a data point)
- Repeat the assignment and update steps until the centroids no longer change
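A minimal pure-Python sketch of the steps above, assuming points stored as tuples and Euclidean distance for "closest centroid". The names `kmeans`, `initial` and `seed` are hypothetical, and keeping a centroid in place when its cluster becomes empty is this sketch's own assumption (the steps above don't cover that case).

```python
import math
import random
from typing import Optional

Point = tuple[float, ...]

def centroid_of(points: list[Point]) -> Point:
    """Component-wise mean; the result need not be one of the data points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(data: list[Point], k: int,
           initial: Optional[list[Point]] = None,
           seed: Optional[int] = None,
           max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Hypothetical helper: returns (centroids, cluster label of each point)."""
    # Steps 1-2: K is given; pick K random data points as the starting centroids
    # (or use caller-supplied ones so a run can be reproduced).
    if initial is not None:
        centroids = list(initial)
    else:
        centroids = random.Random(seed).sample(data, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Step 3: assign each point to the index of its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in data]
        # Step 4: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(centroid_of(members) if members else centroids[c])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(points, k=2, initial=[(1, 1), (9, 9)])
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the starting centroids are random, different seeds can converge to different clusterings (a local optimum); passing `initial` makes a run reproducible.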

Week 4 tutorial

Euclidean distance: d(A,B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + ... + (A_n - B_n)^2}
Cosine similarity
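- Useful for sparse (mostly zero) data: two objects can be far apart in Euclidean distance, for example because one document is much longer than the other, yet still have a small angle between them. Typical applications:
  - Text documents (NLP)
  - Market transaction data
  - Recommendation systems
  - Images
- This works because 0-0 matches (dimensions where both vectors are 0) add nothing to the dot product, so they are ignored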
- Values of cos(\theta):
  - Close to 1: similar (pointing in nearly the same direction)
  - Close to 0: orthogonal, not related
  - Close to -1: opposite
- Calculation: Similarity(A,B) = cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}
  - \theta is the angle between the vectors
  - A \cdot B is the dot product, A_1 B_1 + A_2 B_2 + ... + A_n B_n
  - ||A|| is the magnitude (Euclidean norm) of vector A, \sqrt{A^2_1 + A^2_2 + ... + A^2_n}
- Recover the angle with \theta = \arccos(\text{Similarity}(A,B))
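To make the formulas concrete, here is a small pure-Python sketch of the dot product, magnitude, cosine similarity, angle and, for contrast, Euclidean distance. The function names and the two term-count vectors are hypothetical illustrations, and the clamp before arccos is a numerical safeguard added in this sketch, not part of the formula.

```python
import math
from typing import Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """A . B = A_1 B_1 + ... + A_n B_n; dimensions where both are 0 contribute nothing."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def magnitude(a: Sequence[float]) -> float:
    """||A|| = sqrt(A_1^2 + ... + A_n^2)."""
    return math.sqrt(sum(x * x for x in a))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    denom = magnitude(a) * magnitude(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom

def angle_degrees(a: Sequence[float], b: Sequence[float]) -> float:
    # theta = arccos(similarity); clamp first because rounding can push the
    # ratio a hair outside [-1, 1], which math.acos would reject.
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical term counts: doc_b is doc_a with every count doubled
# (think of the same text pasted twice); both share the same zero entries.
doc_a = [3, 0, 2, 0, 1]
doc_b = [6, 0, 4, 0, 2]
print(euclidean_distance(doc_a, doc_b))  # about 3.74, so "far apart"
print(cosine_similarity(doc_a, doc_b))   # about 1.0, same direction
print(angle_degrees(doc_a, doc_b))       # about 0 degrees
```

The two vectors are several units apart in Euclidean distance yet have (almost) no angle between them, which is the sparse-data point made in the notes.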