Data analytics
Feature engineering
Definition
- The process that attempts to create additional relevant features from existing raw features, to increase the predictive power of algorithms
- Alternative definition: transforming raw data into features that better represent the underlying problem, so that the accuracy of the predictive model improves.
- Important to machine learning
Sources of features
- Different features are needed for different problems, even in the same domain
Feature engineering in ML
- Process of ML iterations:
- Baseline model -> Feature engineering -> Model 2 -> Feature engineering -> Final model
- Example: data needed to predict house price
- ML can make this prediction given sufficient features
- Reason for feature engineering: raw data is rarely useful as-is
- It must be mapped into a feature vector
- Good feature engineering usually takes up most of the time spent on an ML project
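The mapping from raw data to a feature vector can be sketched as follows. This is a minimal illustration for the house-price example, not a prescribed pipeline: the field names (`area_sqft`, `bedrooms`, `built`, `has_garage`), the sample values, and the fixed reference date are all assumptions made up for the sketch.

```python
from datetime import date

# Hypothetical raw record for one house (field names and values are assumptions).
raw_house = {
    "area_sqft": "1,850",   # number stored as formatted text
    "bedrooms": 3,
    "built": "1998-06-15",  # ISO date string
    "has_garage": "yes",    # free-text flag
}


def to_feature_vector(record, today=date(2024, 1, 1)):
    """Map one raw record to a list of floats that a model can consume."""
    area = float(record["area_sqft"].replace(",", ""))
    bedrooms = float(record["bedrooms"])
    # Age in whole calendar years relative to the (assumed) reference date.
    age_years = float(today.year - date.fromisoformat(record["built"]).year)
    has_garage = 1.0 if record["has_garage"] == "yes" else 0.0
    return [area, bedrooms, age_years, has_garage]


features = to_feature_vector(raw_house)
print(features)
```

None of the raw fields could be fed to a model directly; each one needed a small, deliberate transformation before it became a usable number.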
Types of feature engineering
- Indicator variable to isolate information
- Highlighting interactions between features
- Representing the feature in a different way
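The three types listed above can be illustrated with one small sketch. The records, field names, and thresholds below are hypothetical and chosen only to show one instance of each type.

```python
# Hypothetical house records (all field names and values are assumptions).
houses = [
    {"bedrooms": 3, "bathrooms": 2, "schools_nearby": 2,
     "school_score": 8.0, "sold": "2023-07-14"},
    {"bedrooms": 1, "bathrooms": 1, "schools_nearby": 3,
     "school_score": 5.0, "sold": "2023-12-02"},
]


def engineer(house):
    """Derive one feature of each type from a raw record."""
    return {
        # Indicator variable: isolates "family-sized" homes as a 0/1 flag.
        "is_family_home": int(house["bedrooms"] >= 3 and house["bathrooms"] >= 2),
        # Interaction: combines two features whose product carries more signal.
        "school_quality": house["schools_nearby"] * house["school_score"],
        # Different representation: sale date reduced to its month (seasonality).
        "sold_month": int(house["sold"][5:7]),
    }


engineered = [engineer(h) for h in houses]
for row in engineered:
    print(row)
```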
Good feature:
- Related to objective (important)
- Example: the number of concrete blocks around a house is not related to its price
- Known at prediction-time
- Some data is known immediately, while other data is not available in real time: a feature can't be fed to a model if it isn't available at prediction time
- Feature definition shouldn't change over time
- Example: if sales data only becomes available with a 3-day lag, then same-day sales data can't be used for training, because at prediction time the model only has 3-day-old data to work with
- Numeric with meaningful magnitude:
- This does not mean that categorical features can't be used in training: they simply need to be transformed first, through a process called one-hot encoding
- Example: font category (Arial, Times New Roman)
- Have enough samples
- Have at least five examples of any value before using it in your model
- If feature values are poorly distributed or unbalanced, the trained model is likely to be biased
- Bring human insight to the problem
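The points above on categorical features and sample counts can be sketched together. This is a minimal one-hot encoder using only the standard library. The threshold of five comes from the notes; grouping rarer values into an `other` column is an assumption of this sketch, not something the notes prescribe, and "Comic Sans" is a made-up extra value added to show the rare case.

```python
from collections import Counter

MIN_EXAMPLES = 5  # "at least five examples of any value"

# Hypothetical font column; "Comic Sans" appears too rarely to keep.
fonts = ["Arial"] * 6 + ["Times New Roman"] * 5 + ["Comic Sans"] * 2


def one_hot_encode(values, min_examples=MIN_EXAMPLES):
    """Return (column names, rows of 0/1 flags) for a categorical column."""
    counts = Counter(values)
    # Keep only values seen often enough; everything else shares one column
    # (an assumption of this sketch).
    kept = sorted(v for v, n in counts.items() if n >= min_examples)
    columns = kept + ["other"]
    rows = []
    for value in values:
        key = value if value in kept else "other"
        rows.append([1 if column == key else 0 for column in columns])
    return columns, rows


columns, rows = one_hot_encode(fonts)
print(columns)
print(rows[0], rows[6], rows[-1])
```

Each row contains exactly one 1, so the encoded columns carry no misleading order or magnitude between font names.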