# Model Selection Techniques: An Overview

@article{Ding2018ModelST, title={Model Selection Techniques: An Overview}, author={Jie Ding and Vahid Tarokh and Yuhong Yang}, journal={IEEE Signal Processing Magazine}, year={2018}, volume={35}, pages={16-34} }

In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such…
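As a rough illustration of selecting a model from a set of candidates, the sketch below scores polynomial regressions of increasing degree with the Gaussian forms of AIC and BIC and keeps the lowest-scoring degree under each criterion. It is not taken from the paper: the synthetic data, the helper names `fit_polynomial` and `select_polynomial_degree`, and the choice of polynomial candidates are all assumptions made for the example.

```python
import math
import random


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upward.
    """
    m = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def select_polynomial_degree(x, y, max_degree=5):
    """Score degrees 0..max_degree with Gaussian AIC and BIC."""
    n = len(y)
    scores = {}
    for degree in range(max_degree + 1):
        coeffs = fit_polynomial(x, y, degree)
        rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coeffs))) ** 2
                  for xi, yi in zip(x, y))
        k = degree + 2  # polynomial coefficients plus the noise variance
        neg2_loglik = n * math.log(rss / n)  # -2 log-likelihood up to a constant
        scores[degree] = {"aic": neg2_loglik + 2 * k,
                          "bic": neg2_loglik + k * math.log(n)}
    best_aic = min(scores, key=lambda d: scores[d]["aic"])
    best_bic = min(scores, key=lambda d: scores[d]["bic"])
    return best_aic, best_bic, scores


# Synthetic data (an assumption for this sketch): a quadratic trend plus noise.
random.seed(0)
x = [random.uniform(-2.0, 2.0) for _ in range(200)]
y = [1.0 - 0.5 * xi + 1.5 * xi ** 2 + random.gauss(0.0, 0.5) for xi in x]

best_aic, best_bic, scores = select_polynomial_degree(x, y)
print("AIC selects degree", best_aic)
print("BIC selects degree", best_bic)
```

Because BIC charges log(n) per parameter against AIC's 2, it never picks a larger model than AIC on the same nested candidates once n exceeds about 7, which is one concrete way the choice of criterion changes the selected model.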

#### 91 Citations

On Statistical Efficiency in Learning

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2021

A generalized notion of Takeuchi’s information criterion is proposed and it is proved that the proposed method can asymptotically achieve the optimal out-sample prediction loss under reasonable assumptions.

Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error

- Computer Science
- Mach. Learn. Knowl. Extr.
- 2019

This paper discusses criterion-based, step-wise selection procedures and resampling methods for model selection, noting that cross-validation provides the simplest and most generic means of computationally estimating all required entities.

Selection of Heteroscedastic Models: A Time Series Forecasting Approach

- Mathematics
- Applied Mathematics
- 2019

To overcome the weaknesses of in-sample model selection, this study adopted an out-of-sample model selection approach for selecting models with improved forecasting accuracy and performance. Daily…

Targeted Cross-Validation

- Computer Science, Mathematics
- ArXiv
- 2021

This work proposes targeted cross-validation (TCV) to select models or procedures based on a general weighted L2 loss and shows that TCV is consistent in selecting the best performing candidate under the weighted L2 loss.

Variable Grouping Based Bayesian Additive Regression Tree

- Computer Science, Engineering
- ArXiv
- 2019

A two-stage method, variable grouping based Bayesian additive regression tree (GBART), is proposed to enhance the predictive performance of ensemble methods for regression; a well-developed Python package, gbart, is available.

Model family selection for classification using Neural Decision Trees

- Computer Science, Mathematics
- ArXiv
- 2020

This paper proposes a method to reduce the scope of exploration needed for model selection by progressively relaxing the decision boundaries of the initial decision trees (the RMs) as long as this is beneficial in terms of performance measured on an analyzed dataset.

On improvability of model selection by model averaging

- Mathematics
- 2021

In regression, model averaging (MA) provides an alternative to model selection (MS), and asymptotic efficiency theories have been derived for both MS and MA. Basically, under sensible…

Controlling the error probabilities of model selection information criteria using bootstrapping

- Computer Science
- 2019

The Error Control for Information Criteria (ECIC) method is presented, a bootstrap approach to controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions.

Consistent model selection criteria and goodness-of-fit test for affine causal processes

- Mathematics
- 2019

This paper studies the model selection problem in a large class of causal time series models, which includes both the ARMA or AR(∞) processes, as well as the GARCH or ARCH(∞), APARCH, ARMA-GARCH and…

Statistical Methods for the Automatization of Basic Loss Model Calibration

- Computer Science
- 2019

A method to detect trends and change points in the loss triangles of basic loss portfolios in order to ensure an appropriate assessment of the claims reserve and the premium risk and reserve risk based on these data.

#### References

Variable Selection Diagnostics Measures for High-Dimensional Regression

- Mathematics
- 2014

Many exciting results have been obtained on model selection for high-dimensional data in both efficient algorithms and theoretical developments. The powerful penalized regression methods can give…

Model selection and multimodel inference : a practical information-theoretic approach

- Computer Science
- 2003

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference). A philosophy is…

Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

Toward an objective and reproducible model choice via variable selection deviation.

- Mathematics, Medicine
- Biometrics
- 2017

For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates that have a higher chance of being confirmed in future studies, based on variable selection deviation.

On Model Selection Consistency of Lasso

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2006

It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.

Model Selection and the Principle of Minimum Description Length

- Mathematics
- 2001

This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL…

Parametric or nonparametric? A parametricness index for model selection

- Mathematics
- 2011

In the model selection literature, two classes of criteria perform well asymptotically in different situations: Bayesian information criterion (BIC) (as a representative) is consistent in selection when…

Model selection by MCMC computation

- Mathematics, Computer Science
- Signal Process.
- 2001

This paper addresses the MCMC methods from the second group, which allow for generation of samples from probability distributions defined on unions of disjoint spaces of different dimensions and shows why sampling from such distributions is a nontrivial task.

Extended Bayesian information criteria for model selection with large model spaces

- Mathematics
- 2008

The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an…

PREDICTION/ESTIMATION WITH SIMPLE LINEAR MODELS: IS IT REALLY THAT SIMPLE?

- Mathematics
- Econometric Theory
- 2006

Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and information criteria can…