# Smart Infrastructure

<!--toc:start-->
- [Smart Infrastructure](#smart-infrastructure)
  - [General Methodology](#general-methodology)
<!--toc:end-->

## General Methodology

1. Business understanding: What is the problem to solve?
    - What is holding Londoners back from cycling?
1. Analytics approach: How can data be used to answer the question from step 1?
    - The data shows that only 2% of trips are made by bicycle. Why? Try to
      learn the causes from the data
    - Types of analytics:
        - Descriptive: what happened
        - Diagnostic: why it happened
        - Predictive: what will happen
        - Prescriptive: how to make it happen
1. Data requirements: What existing data do I need to analyze the problem?
    - Cyclist casualty data
    - City data
    - Cycle thefts
1. Data collection: Collect new data
    - For example, collect data using sensors
1. Data understanding: Verify whether the collected data can solve the problem
    - Using tools like _univariate statistics_, _pairwise correlation_, and
      histograms
1. Data preparation (loop back to data collection): Decide whether the data is
   usable as-is or must be prepared first
    - Possible problems:
        - Structural errors
        - Merging of data
        - Outliers
        - Redundancy
    - The collected data contains observations (values) and attributes /
      features (keys). Attributes can be:
        - Continuous or discrete
        - Numeric or nominal (labels like "London" or "Beijing")
1. Modeling: Visualizing the data to answer questions
    - Using ML: split the dataset into train, validation, and test sets
        - Train: used to fit the model
        - Validate: provides an unbiased evaluation while training (tuning
          hyper-parameters)
        - Test: provides an unbiased evaluation of the final model fit
1. Evaluation: Does the model answer the question, or is a change needed?
1. Deployment: Using the model in practice
1. Feedback (loop back to modeling): Use feedback and new data to re-train or
   fine-tune the model where needed, and answer the initial question.
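
The data understanding tools named above (univariate statistics, pairwise correlation, and a histogram) can be sketched with the Python standard library alone. This is a minimal illustration, not the course's method: the `trips` and `temperature_c` values are made up, and the helper names `univariate_summary`, `pearson_correlation`, and `histogram` are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical sample (made-up values): daily cycling trips and mean temperature.
trips = [120, 135, 150, 160, 110, 170, 180, 95]
temperature_c = [8.0, 10.0, 13.0, 15.0, 7.0, 17.0, 19.0, 5.0]
BIN_WIDTH = 25  # assumed bin size for the histogram


def univariate_summary(values):
    """Summarise a single attribute: centre, spread, and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long attributes."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


summary = univariate_summary(trips)
correlation = pearson_correlation(trips, temperature_c)
bins = histogram(trips, BIN_WIDTH)

print(summary)
print(round(correlation, 3))
for start, count in bins.items():
    print(f"{start:>4}-{start + BIN_WIDTH - 1}: {'#' * count}")
```

A correlation close to 1 or -1 suggests the two attributes move together and are worth a closer look. It does not by itself show that one causes the other.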
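
The train / validation / test split from the modeling step can be sketched as follows. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are illustrative assumptions, not values taken from the notes.

```python
import random


def split_dataset(rows, train_frac=0.7, validation_frac=0.15, seed=0):
    """Shuffle rows and cut them into train, validation, and test lists.

    The default 70/15/15 proportions are an assumption for illustration.
    """
    if not 0 < train_frac + validation_frac < 1:
        raise ValueError("train_frac + validation_frac must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train_end = round(len(shuffled) * train_frac)
    validation_end = train_end + round(len(shuffled) * validation_frac)
    return (
        shuffled[:train_end],                # used to fit the model
        shuffled[train_end:validation_end],  # used while tuning hyper-parameters
        shuffled[validation_end:],           # held back for the final evaluation
    )


rows = list(range(100))  # stand-in for 100 observations
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

Shuffling before cutting avoids a split that follows the order in which the data was collected. The test set should stay untouched until the final model is chosen, otherwise its evaluation is no longer unbiased.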