13. Feature Engineering#

References#

  1. Burkov, A. (2019). The Hundred-Page Machine Learning Book. Required textbook.

  2. Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. This book provides insights into understanding AI-generated models and results, reinforcing the importance of interpretability in human-AI collaboration.

Feature Engineering in Machine Learning#

In traditional statistical modeling, particularly regression analysis, selecting explanatory variables is a crucial step. Analysts identify the independent variables that best explain variability in the dependent variable while checking assumptions such as linearity, independence, and low multicollinearity. This process, known as variable selection, focuses on choosing and transforming numerical features to maximize predictive accuracy while keeping the model interpretable.

In Machine Learning (ML), the concept of feature engineering extends beyond traditional variable selection. ML models can handle diverse feature types, including:

  • Real-valued variables (e.g., temperature, stock prices, rainfall)

  • Categorical variables (e.g., country, product category, job title)

  • Text features (e.g., words, phrases, document embeddings)

  • Image features (e.g., pixel values, edge maps, deep learning embeddings)

  • Time series data (e.g., sequences, lagged values)

  • Graph-based features (e.g., social network connections, road network distances)

In modern ML applications, features are not necessarily pre-defined by human experts; they can be automatically extracted, transformed, and optimized through computational techniques. This is particularly true for deep learning, where models learn hierarchical representations of raw data (e.g., convolutional layers for images or word embeddings for text).

Regardless of the model type, the core goal of feature engineering remains the same: to create representations of input data that improve predictive performance. This involves:

  • Feature Selection – Identifying the most relevant features.

  • Feature Transformation – Encoding categorical variables, normalizing numerical variables, or applying polynomial expansions.

  • Feature Extraction – Automatically learning new features, such as word embeddings for text or principal components for dimensionality reduction.

  • Feature Construction – Combining raw variables into more meaningful representations, such as aggregating time-based trends or deriving interaction terms.

The ability of ML models to process and integrate mixed data types (numbers, text, images, etc.) sets them apart from traditional statistical models. Effective feature engineering can substantially improve model accuracy and efficiency, and in practice it often matters more than the choice of algorithm.

Feature Engineering Methods by Data Type#

Numeric Data (Continuous or Discrete)#

These methods are typically used for structured numerical data, reducing dimensionality, extracting patterns, or transforming raw features.

Dimensionality Reduction & Projection Methods#

  • Principal Component Analysis (PCA) – Projects data into uncorrelated principal components.

  • Empirical Orthogonal Functions (EOF) – Similar to PCA, commonly used in climate and geospatial analysis.

  • Independent Component Analysis (ICA) – Separates independent signals from mixed signals.

  • Factor Analysis – Extracts latent variables that explain observed variance.

  • t-SNE (t-Distributed Stochastic Neighbor Embedding) – Nonlinear dimensionality reduction for visualization.

  • UMAP (Uniform Manifold Approximation and Projection) – Nonlinear method similar to t-SNE that often preserves more of the global structure.
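
As a rough illustration of the projection idea behind PCA, the sketch below centers a small synthetic dataset and projects it onto its leading principal directions using NumPy's SVD. The function name `pca`, the synthetic data, and the choice of two components are assumptions made for this example; in practice a library implementation would normally be used.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)              # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T           # coordinates in the new basis
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, components, ratio

rng = np.random.default_rng(0)
# Synthetic data (assumption): two strongly correlated columns plus one noise column
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(size=200)])

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio)          # share of total variance captured by each component
```

The resulting score columns are uncorrelated with each other, which is the property the PCA bullet above describes.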

Transformation & Scaling#

  • Log Transform – Helps stabilize variance and handle skewed distributions.

  • Box-Cox Transform – Power transform that makes skewed, strictly positive data more nearly normal.

  • Min-Max Scaling – Scales features to a specific range (e.g., [0,1]).

  • Z-Score Normalization – Standardizes features to have mean 0 and standard deviation 1.

  • Binning (Discretization) – Converts continuous variables into categorical bins (e.g., quantile bins).
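
Several of the transformations listed above can be sketched in a few lines of NumPy. The toy array `x` and the choice of quartile bin edges are assumptions for illustration; Box-Cox is omitted because it needs a fitted power parameter.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 100.0])      # skewed toy data (assumption)

log_x = np.log1p(x)                             # log(1 + x), defined at x = 0
minmax = (x - x.min()) / (x.max() - x.min())    # rescale to [0, 1]
zscore = (x - x.mean()) / x.std()               # mean 0, population std 1

# Quantile binning: edges at the 25th, 50th and 75th percentiles give bins 0..3
edges = np.quantile(x, [0.25, 0.5, 0.75])
bins = np.digitize(x, edges)

print(minmax)
print(zscore)
print(bins)
```

When these transforms feed a model, the statistics (minimum, maximum, mean, standard deviation, bin edges) should be computed on the training data only and then reused on validation and test data.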

Feature Extraction & Construction#

  • Fourier Transform / Wavelet Transform – Used in signal processing to extract frequency components.

  • Autoregressive Features (AR, MA, ARMA, ARIMA) – Lagged values and model-derived terms, common in time series modeling.

  • Polynomial Features – Generates polynomial terms for regression models.

  • Interaction Features – Combines two or more features multiplicatively or additively.
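
The sketch below builds two kinds of constructed features from the list above: polynomial and interaction terms for a pair of variables, and lagged values for a short time series. The helper `lag_features` and the toy arrays are hypothetical names and data introduced only for this example.

```python
import numpy as np

# Polynomial and interaction terms for two toy variables (assumption)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
X_poly = np.column_stack([a, b, a ** 2, b ** 2, a * b])   # columns: a, b, a^2, b^2, a*b

def lag_features(series, lags):
    """Return (X, y) where column j of X holds the series shifted by lags[j].

    Rows that lack a full history are dropped, so X and y stay aligned.
    """
    max_lag = max(lags)
    cols = [series[max_lag - k : len(series) - k] for k in lags]
    return np.column_stack(cols), series[max_lag:]

y = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
X_lag, target = lag_features(y, lags=[1, 2])

print(X_poly)
print(X_lag)    # first row holds y[1] and y[0], used to predict y[2]
print(target)
```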

Categorical Data (Nominal or Ordinal)#

Methods for converting categorical variables into numerical representations.

Encoding Methods#

  • One-Hot Encoding – Converts categorical variables into binary vectors.

  • Ordinal Encoding – Assigns integer values based on category order.

  • Target Encoding (Mean Encoding) – Replaces categories with their mean response variable value.

  • Frequency Encoding – Replaces categories with their occurrence frequency.

  • Embedding Representation – Uses dense vector representations (learned in deep learning models).
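
A minimal sketch of one-hot, frequency, and target encoding using only the standard library is shown below. The toy `colors` and `target` lists are assumptions. Note that the target encoding here is computed on the full sample for brevity; in a real pipeline it should be fitted on training folds only to avoid leaking the response into the features.

```python
from collections import Counter, defaultdict

colors = ["red", "blue", "red", "green", "blue", "red"]   # toy categorical column
target = [1, 0, 1, 0, 1, 0]                                # toy binary response

# One-hot encoding: one binary column per category, in sorted order
categories = sorted(set(colors))
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Frequency encoding: relative frequency of each category
counts = Counter(colors)
freq = [counts[c] / len(colors) for c in colors]

# Target (mean) encoding: mean response per category
sums = defaultdict(float)
for c, y in zip(colors, target):
    sums[c] += y
target_mean = {c: sums[c] / counts[c] for c in counts}
target_enc = [target_mean[c] for c in colors]

print(categories)
print(one_hot[0])
print(freq)
print(target_enc)
```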

Grouping & Aggregation#

  • Rare Category Grouping – Groups infrequent categories into an “Other” category.

  • Domain-Specific Binning – Groups categorical variables based on meaningful criteria (e.g., age groups).
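
Rare category grouping can be sketched as a simple count threshold. The function name `group_rare`, the threshold `min_count`, and the sample country codes are assumptions chosen for illustration.

```python
from collections import Counter

def group_rare(values, min_count=2, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

countries = ["US", "US", "FR", "DE", "US", "FR"]
grouped = group_rare(countries, min_count=2)
print(grouped)
```

Grouping before one-hot encoding keeps the number of generated columns manageable when a variable has many infrequent levels.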

Text Data#

Methods for extracting numerical features from text.

Traditional Text Feature Extraction#

  • Bag of Words (BoW) – Represents text as a sparse count matrix.

  • TF-IDF (Term Frequency-Inverse Document Frequency) – Weighs words by importance in a document corpus.

  • N-grams – Extracts word sequences (bigrams, trigrams, etc.).

Embedding & Deep Learning-Based Methods#

  • Word2Vec (CBOW, Skip-Gram) – Learns word embeddings from text.

  • GloVe (Global Vectors for Word Representation) – Generates word embeddings using co-occurrence statistics.

  • FastText – A variation of Word2Vec that includes subword information.

  • BERT (Bidirectional Encoder Representations from Transformers) – Context-aware deep learning embeddings.

  • Sentence Transformers (SBERT, Sentence-T5, etc.) – Generates sentence-level embeddings.
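
The count-based representations listed above (bag of words, TF-IDF, and n-grams) can be sketched without any external library. The three toy documents are assumptions, and the weighting uses the plain textbook form tf × log(N / df); library implementations typically apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran fast"]   # toy corpus (assumption)
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# Bag of words: one count vector per document, columns follow vocab order
bow = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

# TF-IDF with tf = count / document length and idf = log(N / document frequency)
n_docs = len(tokenized)
df = {w: sum(w in doc for doc in tokenized) for w in vocab}
idf = {w: math.log(n_docs / df[w]) for w in vocab}
tfidf = [[(Counter(doc)[w] / len(doc)) * idf[w] for w in vocab] for doc in tokenized]

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(vocab)
print(bow[0])
print(tfidf[1])
print(ngrams(tokenized[0], 2))
```

A word such as "the" that appears in every document gets an IDF of zero under this definition, which is how TF-IDF downweights uninformative terms.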

Advanced NLP Feature Engineering#

  • Topic Modeling (LDA, Latent Semantic Analysis) – Extracts underlying topics from a corpus.

  • Sentiment Analysis Scores – Maps text to a sentiment scale.

  • Named Entity Recognition (NER) – Identifies entities like names, locations, and organizations.

Image Data#

Methods for feature extraction from image-based data.

Traditional Feature Extraction#

  • Histogram of Oriented Gradients (HOG) – Captures edge directions and gradients.

  • Scale-Invariant Feature Transform (SIFT) – Extracts keypoints for object recognition.

  • Local Binary Patterns (LBP) – Captures texture patterns.
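
As one concrete instance of a hand-crafted image descriptor, the sketch below computes a basic 3×3 Local Binary Pattern histogram with NumPy. It is a simplified version (no rotation invariance or uniform-pattern mapping), and the function name `lbp_histogram` and the random test image are assumptions for illustration.

```python
import numpy as np

def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 LBP codes for a 2-D grayscale array."""
    h, w = image.shape
    center = image[1:-1, 1:-1]                   # pixels that have all 8 neighbours
    codes = np.zeros(center.shape, dtype=np.uint8)
    # Eight neighbours, clockwise from top-left; each one contributes one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))      # stand-in for a grayscale image
features = lbp_histogram(image)
print(features.shape)   # (256,)
```

The histogram is a fixed-length feature vector regardless of image size, which makes it usable as input to a conventional classifier.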

Deep Learning-Based Feature Extraction#

  • Convolutional Neural Networks (CNNs) – Extract hierarchical image features.

  • Pretrained CNN Embeddings (ResNet, VGG, Inception, EfficientNet) – Uses deep models trained on large datasets to extract features.

  • Autoencoders – Learns compressed latent representations of images.

  • GAN-Based Feature Learning (StyleGAN, BigGAN) – Uses the latent space or intermediate layers of a trained GAN as image features.

Graph-Based Data (Networks, Relationships)#

Methods for encoding structured graph relationships.

Graph Feature Extraction#

  • Node Degree, Betweenness Centrality, Closeness Centrality – Captures node importance in a graph.

  • Eigenvector Centrality and PageRank – Measure the influence of a node from the importance of its neighbors; PageRank is a damped variant of eigenvector centrality.

  • Graph Kernels (Weisfeiler-Lehman, Shortest Path Kernel) – Measures graph similarity.
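
The node-level measures above can be turned into features with a few lines of plain Python. The sketch below computes degree centrality and a power-iteration PageRank on a small star graph. It assumes an undirected graph stored as an adjacency dictionary in which every node has at least one neighbor; the function names, the damping value of 0.85, and the example graph are choices made for illustration.

```python
def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def pagerank(adj, damping=0.85, n_iter=100):
    """PageRank by power iteration; assumes no node has an empty neighbor list."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(n_iter):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(adj[v])   # split v's rank among its neighbors
            for u in adj[v]:
                new[u] += share
        rank = new
    return rank

# Star graph: "a" is connected to every other node
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
deg = degree_centrality(adj)
pr = pagerank(adj)
print(deg)
print(pr)
```

Both measures rank the hub node "a" highest, and the resulting per-node values can be appended as columns to a tabular feature matrix.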

Graph Embedding Methods#

  • Node2Vec – Learns node representations via biased random walks.

  • DeepWalk – Applies Word2Vec-style skip-gram training to uniform random walks over the graph.

  • Graph Convolutional Networks (GCNs) – Extends deep learning to graphs.

  • Graph Attention Networks (GATs) – Uses attention mechanisms for node embeddings.
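
The random-walk step shared by DeepWalk and Node2Vec can be sketched as follows. This covers only the uniform walk generation used by DeepWalk; the walks would then be passed to a skip-gram model as if they were sentences, a step not shown here. The function name `random_walks`, its parameters, and the toy graph are assumptions, and every node is assumed to have at least one neighbor.

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks starting from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))   # step to a random neighbor
            walks.append(walk)
    return walks

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = random_walks(adj)
for walk in walks[:3]:
    print(walk)
```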

Mixed Data Types (Multimodal Learning)#

Methods for combining heterogeneous data sources (text, images, structured data, etc.).

Feature Fusion Approaches#

  • Concatenation-Based Fusion – Directly concatenates numerical, categorical, text, and image embeddings.

  • Autoencoder-Based Fusion – Learns a unified representation from multiple modalities.

  • Transformer-Based Multimodal Learning – Models like CLIP (Contrastive Language-Image Pretraining) combine vision and text.

  • Graph Neural Networks (GNNs) for Multimodal Data – Represents heterogeneous data in a structured way.
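
Concatenation-based fusion, the simplest of the approaches above, can be sketched as stacking per-modality feature blocks side by side. Standardizing each block first is a design choice assumed here so that no modality dominates by scale. The function name `fuse_by_concatenation` and the toy blocks, including the random stand-in for a text embedding, are hypothetical.

```python
import numpy as np

def fuse_by_concatenation(*blocks):
    """Standardize each block column-wise, then concatenate along the feature axis."""
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        std = block.std(axis=0)
        std[std == 0] = 1.0                      # leave constant columns unscaled
        scaled.append((block - block.mean(axis=0)) / std)
    return np.hstack(scaled)

rng = np.random.default_rng(0)
numeric = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0]])   # structured features
one_hot = np.array([[1, 0], [0, 1], [1, 0]])                     # encoded category
text_emb = rng.normal(size=(3, 4))                               # stand-in for a text embedding

fused = fuse_by_concatenation(numeric, one_hot, text_emb)
print(fused.shape)   # (3, 8)
```

Each row of `fused` is a single feature vector for one sample, ready to be passed to any model that accepts tabular input.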

Importance of Feature Engineering#

Feature engineering is a critical step in Machine Learning. Whether working with structured numerical data, categorical data, or unstructured data (text, images, graphs), choosing the right transformation, encoding, or embedding method can significantly improve predictive performance.