EBU6504_smart_arch_notes/tutorials.md
2025-01-08 20:20:25 +08:00

Tutorials

Week 1 tutorial

  • Calculation: (formulas are given on the test paper)
    • Accuracy = \frac{Correct Classifications}{Total Classifications} = \frac{TP + TN}{TP + TN + FP + FN}
    • F1 = \frac{2}{recall^{-1} + precision^{-1}} = \frac{2 \times TP}{2 \times TP + FP + FN}
  • Accuracy vs. F1:
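    • Accuracy: counts every correct prediction (TP and TN) equally, so it can look high when one class dominates
    • F1: focuses on the positive class, penalizing FP and FN and ignoring TN; used for imbalanced classes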
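
The difference between the two metrics can be checked with a small sketch. The function names and the confusion-matrix counts below are made up for illustration (an imbalanced set of 1000 samples with only 10 positives); they are not from the tutorial.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); TN does not appear at all."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumption: report 0.0 when there are no positives and no positive predictions.
        return 0.0
    return 2 * tp / denominator


# Hypothetical counts: 10 real positives, of which only 1 is found.
tp, tn, fp, fn = 1, 985, 5, 9

print(accuracy(tp, tn, fp, fn))  # 0.986
print(f1_score(tp, fp, fn))      # 0.125
```

Accuracy stays near 1 because the 985 true negatives dominate the count, while F1 drops to 0.125 because the classifier misses 9 of the 10 positives.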

Week 2 tutorial

Week 3 tutorial

  • K-means clustering:
    • Choose K, the number of clusters
    • Pick K random points as the initial centroids
    • Assign each data point to its closest centroid
    • Calculate the mean of each cluster and move that cluster's centroid there (the centroid doesn't have to be on a data point)
    • Repeat the assignment and update steps until the centroids no longer change
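
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the use of squared Euclidean distance, and the choice to keep the old centroid when a cluster ends up empty are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Pick K random data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assign each data point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(sorted(centroids))
```

On this data the centroids settle at the means of the two groups, which are not data points themselves.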

Week 4 tutorial

  • Euclidean distance
  • Cosine similarity
    • Useful for applications with sparse data, since even if two objects are far apart in Euclidean distance, they can still have a small angle between them.
      • Word documents (NLP)
      • Market transaction data
      • Recommendation system
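      • Images on a computer
    • This works because dimensions where both vectors are 0 (0-0 matches) add nothing to the dot product or the magnitudes, so they are ignored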
    • Values:
      • Cos close to 1: similar
      • Cos close to 0: orthogonal, not related
      • Cos close to -1: opposite
    • Calculation: Similarity(A,B) = cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}
      • \theta is the angle between the vectors
      • A \cdot B is the dot product, A_1 B_1 + A_2 B_2 + ... + A_n B_n
      • ||A|| is the magnitude of vector A, \sqrt{A_1^2 + A_2^2 + ... + A_n^2}
    • Calculate the angle with \theta = arccos(Similarity(A,B))
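
A short sketch can make the comparison between the two measures concrete. The helper names and the three sparse vectors below are hypothetical (think of them as word counts for three documents); raising an error for all-zero vectors is also an assumption, since cosine similarity is undefined for them.

```python
import math


def dot(a, b):
    """Dot product A_1*B_1 + ... + A_n*B_n."""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Magnitude ||A|| = sqrt(A_1^2 + ... + A_n^2)."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    norms = magnitude(a) * magnitude(b)
    if norms == 0:
        # Assumption: treat an all-zero vector as an error rather than returning a value.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / norms


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_degrees(a, b):
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical sparse vectors.
doc_a = [1, 0, 2, 0, 0]
doc_b = [10, 0, 20, 0, 0]  # same direction as doc_a, ten times longer
doc_c = [0, 3, 0, 0, 1]    # shares no non-zero dimension with doc_a

print(euclidean_distance(doc_a, doc_b))  # about 20.1: far apart
print(cosine_similarity(doc_a, doc_b))   # about 1: same direction, similar
print(cosine_similarity(doc_a, doc_c))   # 0.0: orthogonal, not related
print(angle_degrees(doc_a, doc_c))       # 90.0
```

`doc_a` and `doc_b` are far apart in Euclidean distance but point in the same direction, so their cosine similarity is close to 1. The positions where both vectors are 0 contribute nothing to either the dot product or the magnitudes.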