EBU6504_smart_arch_notes/1-intro-to-smart-infra.md
2025-01-09 15:17:40 +08:00


Smart Infrastructure

General Methodology

  1. Business understanding: What problem are we trying to solve?
    • What is holding Londoners back from cycling?
  2. Analytics approach: How can data be used to answer the question from step 1?
    • The data show that only 2% of trips are made by bicycle. Why? Try to learn the causes from the data.
    • Types of analytics:
      • Descriptive: What happened
      • Diagnostic: Why did it happen
      • Predictive: What will happen
      • Prescriptive: How to make it happen
  3. Data requirements: What existing data do I need to analyze the problem?
    • Cyclist casualty data
    • City data
    • Cycle theft data
  4. Data collection: collect new data
    • Try to collect data using sensors
  5. Data understanding: Verify whether the collected data can answer the question
    • Use tools such as univariate statistics, pairwise correlations, and histograms
  6. Data preparation (may loop back to data collection): Decide whether the data is usable as-is or must be prepared first
    • Possible problems:
      • Structural errors
      • Merging of data
      • Outlier analysis
      • Redundancy
    • The collected data contains observations (values) and attributes / features (keys); attributes can be:
      • Continuous or Discrete
      • Numeric or nominal (labels like "London" or "Beijing")
  7. Modeling: Model and visualize the data to answer the question
    • When using ML, split the dataset into training, validation, and test sets:
      • Train: used to fit the model
      • Validate: provides an unbiased evaluation of the model fit while tuning hyper-parameters
      • Test: provides an unbiased evaluation of the final model fit
  8. Evaluation: Does the model answer the question, or is a change needed?
  9. Deployment: Using the model in practice
  10. Feedback (loop back to modeling): Use feedback and new data to re-train or fine-tune the model if needed, and answer the initial question.
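The 2% cycling figure in step 2 is an example of descriptive analytics, because it describes what happened. The Python sketch below shows one way such a mode share could be computed from a trip log. The `mode_share` function and the trip counts are hypothetical. They are made up for illustration and do not come from real London data.

```python
from collections import Counter

def mode_share(trip_modes):
    """Descriptive analytics: percentage of trips made by each transport mode."""
    counts = Counter(trip_modes)
    total = sum(counts.values())
    return {mode: 100 * n / total for mode, n in counts.items()}

# Hypothetical trip log (NOT real data): 50 trips, 1 of them by bicycle
trips = ["bus"] * 20 + ["car"] * 17 + ["walk"] * 12 + ["cycle"]
shares = mode_share(trips)
print(f"cycling share: {shares['cycle']:.1f}%")  # cycling share: 2.0%
```

Diagnostic and predictive analytics then build on numbers like these, asking why the share is low and how it might change.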
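The attribute types described in step 6 can be told apart with a rough heuristic. This sketch assumes that values have already been parsed into Python `str`, `int` or `float`. `describe_attribute` is a hypothetical helper. In practice domain knowledge is also needed. For example, a postcode stored as an integer is still nominal, and an integer-valued measurement may really be continuous.

```python
def describe_attribute(values):
    """Heuristic only: classify as (numeric|nominal, discrete|continuous)."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return ("nominal", "discrete")  # labels such as "London" or "Beijing"
    if all(float(v).is_integer() for v in values):
        return ("numeric", "discrete")  # whole-number counts, e.g. cycle thefts
    return ("numeric", "continuous")  # measurements, e.g. rainfall in mm

print(describe_attribute(["London", "Beijing"]))  # ('nominal', 'discrete')
print(describe_attribute([3, 0, 5]))  # ('numeric', 'discrete')
print(describe_attribute([4.2, 0.0, 1.5]))  # ('numeric', 'continuous')
```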
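The sketch below covers the data preparation problems from step 6 using hypothetical records. All function names and fields are illustrative assumptions. It handles each problem as follows:
- Structural errors: normalise inconsistent city labels.
- Redundancy: drop repeated rows.
- Merging: join trip counts with weather data by day.
- Outlier analysis: flag values outside Tukey's IQR fences, which is one common rule among several.

```python
import statistics

def fix_structural_errors(records):
    """Normalise inconsistent labels, e.g. ' london' / 'LONDON' -> 'London'."""
    return [{**r, "city": r["city"].strip().title()} for r in records]

def drop_duplicates(records):
    """Remove redundant (exactly repeated) records, keeping the first one."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def merge_on(left, right, key):
    """Inner join of two record lists on `key` (assumes `key` is unique in `right`)."""
    index = {r[key]: r for r in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]

def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical raw data, for illustration only
trips = [
    {"city": " london", "day": 1, "trips": 210},
    {"city": "LONDON", "day": 2, "trips": 230},
    {"city": "LONDON", "day": 2, "trips": 230},  # duplicate row
    {"city": "London", "day": 3, "trips": 225},
]
weather = [{"day": 1, "rain_mm": 4}, {"day": 2, "rain_mm": 0}, {"day": 3, "rain_mm": 1}]

clean = merge_on(drop_duplicates(fix_structural_errors(trips)), weather, "day")
print(clean)
print(iqr_outliers([210, 215, 220, 225, 230, 235, 240, 2500]))  # [2500]
```

The structural fix runs before de-duplication on purpose. Until " london" and "LONDON" are normalised to the same label, rows that are really duplicates of each other would not match.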
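A toy model can illustrate the train / validate / test roles from step 7. This sketch assumes synthetic data and uses 1-D ridge regression purely as an example model. The three sets play these roles:
- The training set fits the model.
- The validation set chooses the ridge penalty `lam`, which is a hyper-parameter.
- The test set is used once, at the end.

The names `train_val_test_split` and `fit_ridge_1d`, and the 70/15/15 split, are assumptions and not part of the original notes.

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test sets."""
    if val_frac + test_frac >= 1:
        raise ValueError("val_frac + test_frac must be below 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    return items[n_test + n_val:], items[n_test:n_test + n_val], items[:n_test]

def fit_ridge_1d(points, lam):
    """Closed-form 1-D ridge regression y ~ w*x + b; only w is penalised."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    return w, my - w * mx

def mse(model, points):
    """Mean squared error of a (w, b) line on a list of (x, y) points."""
    w, b = model
    return sum((y - (w * x + b)) ** 2 for x, y in points) / len(points)

# Hypothetical synthetic data: y = 3x + Gaussian noise
rng = random.Random(42)
data = [(x, 3 * x + rng.gauss(0, 2)) for x in range(100)]
train, val, test = train_val_test_split(data)

# Training set fits each candidate; validation set picks the hyper-parameter
candidates = [0.0, 10.0, 1000.0, 100000.0]
best_lam = min(candidates, key=lambda lam: mse(fit_ridge_1d(train, lam), val))
final = fit_ridge_1d(train, best_lam)

# Test set is touched only once, to report the final model's error
print(best_lam, round(mse(final, test), 2))
```

Choosing `lam` on the validation set rather than the test set keeps the final test error an unbiased estimate. Many workflows also refit on train + validation before the final test, which this sketch skips.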