diff --git a/1-intro-to-smart-infra.md b/1-intro-to-smart-infra.md
new file mode 100644
index 0000000..3b37a0c
--- /dev/null
+++ b/1-intro-to-smart-infra.md
@@ -0,0 +1,38 @@
+# Smart Infrastructure
+
+## General Methodology
+
+1. Business understanding: What is the problem to solve?
+   - What is holding Londoners back from cycling?
+1. Analytics approach: How can I use data to answer the question from step 1?
+   - The data show that only 2% of trips are made by bicycle. Why? Try to learn
+     the causes from the data
+1. Data requirements: What existing data do I need to analyze the problem?
+   - Cyclist casualty data
+   - City data
+   - Cycle theft data
+1. Data collection: Collect new data
+   - Try to collect data using sensors
+1. Data understanding: Verify whether the collected data can solve the problem
+   - Tools: _univariate analysis_, _pairwise correlation_, and histograms
+1. Data preparation (loop back to data collection): Check whether the data is
+   usable as is, or whether preparation must be done first
+   - Possible problems:
+     - Structural errors
+     - Merging of data
+     - Outlier analysis
+     - Redundancy
+   - Collected data contains observations (values) and attributes / features
+     (keys), which can be:
+     - Continuous or discrete
+     - Numeric or nominal
+1. Modeling: Visualizing the data to answer questions
+   - Using ML: split the dataset into train, validation, and test sets
+     - Train: to fit the model
+     - Validate: provide an unbiased evaluation while training (tuning
+       hyperparameters)
+     - Test: provide an evaluation of the final model fit
+1. Evaluation: Does the model answer the question, or is a change needed?
+1. Deployment: Using the model in practice
+1. Feedback (loop back to modeling): Use feedback and new data to possibly
+   re-train or fine-tune the model, and answer the initial question.
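
The train / validate / test split described in the modeling step can be sketched in a few lines of Python. This is only an illustration, not part of the notes: the function name `split_dataset`, the 60/20/20 proportions, and the fixed `seed` are all assumptions chosen for the example, and it uses only the standard library.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults (assumptions),
    not values taken from the notes. Whatever is left after the train
    and validation parts becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used while tuning hyperparameters
    test = shuffled[n_train + n_val:]          # used once, on the final model
    return train, val, test


# Hypothetical dataset: 100 observation ids.
observations = list(range(100))
train, val, test = split_dataset(observations)
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before slicing keeps any ordering in the collected data (for example, by date or by sensor) from leaking into one particular subset. The test set is kept separate so that it is only touched once, for the final evaluation.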