Model selection as an approach

3.6 Explanation

3.6.1 The basics

Model selection compares models that differ in the suite of variables they include but are otherwise comparable.

The models are compared using a single score (such as AIC, AICc, or BIC, defined later) and ranked. The “best” (more accurately, the highest-ranked) models are the best of the set you tested, not the best of all possible models. You may not have included all the variables needed to explain your data.
This approach can be extended with “model averaging”, which allows you to combine several of the highest-ranked models.
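The ranking step can be sketched in a few lines of Python. This is only an illustration using the standard textbook formulas for AIC, AICc, and BIC; the model names, log-likelihoods, parameter counts, and sample size are made up for the example and do not come from any real analysis.

```python
import math

def aic(log_lik, k):
    """AIC = -2 * log-likelihood + 2 * K (K = number of estimated parameters)."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2K(K+1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > K + 1")
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """BIC = -2 * log-likelihood + K * ln(n)."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical candidate set: name -> (maximized log-likelihood, K).
candidates = {
    "null": (-131.2, 2),
    "habitat": (-120.4, 3),
    "habitat + climate": (-115.9, 5),
}
n_obs = 60  # hypothetical sample size

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores.values())

# Rank from lowest (best-supported) to highest score; delta is the gap to the top model.
ranking = sorted(
    ((name, score, score - best) for name, score in scores.items()),
    key=lambda row: row[1],
)

for name, score, delta in ranking:
    print(f"{name:20s} AICc = {score:8.2f}  delta = {delta:6.2f}")
```

The top row is only the highest-ranked model among the three candidates supplied; a model that was never put in the set cannot appear in the ranking.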

Read chapter 8 of the foundational book first to see if the approach is right for you (Kenneth P. Burnham and Anderson 2008).

3.6.2 More technical

3.6.2.1 Questions and data types

Any model fitted by maximum likelihood (linear regression, generalized linear models, ANOVA, ANCOVA, and more) for which an AIC can be calculated can work.
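As a small illustration for the simplest case, the sketch below compares an intercept-only model with a one-predictor linear regression. It assumes normally distributed errors, for which AIC can be written (up to an additive constant) as n ln(RSS/n) + 2K, with K counting the regression coefficients plus the residual variance. The data are invented for the example, and the values are only comparable between models fitted to the same response data.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares when the only parameter is the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_simple_regression(x, y):
    """Residual sum of squares for the least-squares fit y = a + b * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def aic_least_squares(rss, n, k):
    """AIC for a Gaussian least-squares fit, additive constants dropped."""
    return n * math.log(rss / n) + 2 * k

# Invented data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(y)

# K = 2 for the null model (intercept + variance), K = 3 with a slope added.
aic_null = aic_least_squares(rss_intercept_only(y), n, k=2)
aic_slope = aic_least_squares(rss_simple_regression(x, y), n, k=3)

print(f"null model  AIC = {aic_null:.2f}")
print(f"slope model AIC = {aic_slope:.2f}")
```

In practice a statistics package would report the AIC directly; the point here is only that each candidate model yields one comparable number.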

3.6.2.2 Key assumptions

You provide selected, well-thought-out hypotheses and compare them
- This is best suited for comparing suites of variables instead of all possible combinations
- The highest-ranked model may not be the best possible model, only the best of the models you’ve tested
You give up significance (p-values) with this approach
- To find out how well a model explains your data relative to the others in the set, you can look at the weights
- You don’t get to interpret the coefficients by significance as you do with frequentist statistics
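The weights mentioned above are usually Akaike weights. A minimal sketch of how they are computed from a list of AIC (or AICc) scores follows; the scores are hypothetical and the function name is chosen only for this example.

```python
import math

def akaike_weights(scores):
    """Turn AIC-type scores into weights that sum to 1.

    Each weight is exp(-delta / 2) divided by the sum of that quantity over
    all models, where delta is the gap to the lowest score in the set.
    """
    best = min(scores)
    relative_likelihoods = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]

# Hypothetical AICc scores for three candidate models.
example_scores = [242.9, 247.2, 266.6]
weights = akaike_weights(example_scores)

for score, weight in zip(example_scores, weights):
    print(f"AICc = {score:7.1f}  weight = {weight:.3f}")
```

A weight describes the relative support for a model within the candidate set only, so adding or removing a model changes every weight.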

3.6.2.4 Common terminology confusions

Some papers describe the approach as “information-theoretic” (Anderson and Burnham 2002), while others refer to the methods as “model selection” (Kenneth P. Burnham and Anderson 2008). Model selection is one type of information-theoretic approach, but “information-theoretic approach” is often used as shorthand for it in casual writing or conversation.
Model averaging is one type of multi-model inference, but the two terms are sometimes used interchangeably in casual writing or conversation.

3.6.2.5 Cite these

Kenneth P. Burnham and Anderson (2008) is the classic baseline. The summary chapter (8) is a good starting place to understand if the approach is right for your data and problem. If you plan to use the models, then go ahead and read the whole book, followed by the implementation papers listed below.

3.6.2.6 Implementations and controversies

The AICcmodavg R package has a vignette with examples and details (Mazerolle 2023). Open-source software.
Basic difficulties are covered in Anderson and Burnham (2002).
Kenneth P. Burnham, Anderson, and Huyvaert (2011) provide a basic summary and explain how to use multi-model inference (which includes model averaging).
Harrison et al. (2018) covers multi-model inference as well as using mixed models as the base models to compare. Open access.
How to implement model averaging correctly is still debated (Cade 2015).
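To make the idea of model averaging concrete, here is a minimal sketch that averages the predictions of several models using their weights. It deliberately averages predictions rather than regression coefficients, since how to handle coefficients is part of the debate; the predictions and weights are invented, and the function name is hypothetical.

```python
def model_averaged_prediction(predictions, weights):
    """Weighted mean of per-model predictions for one observation.

    The weights are renormalized so the result is still a weighted mean
    when only a subset of the candidate models is being averaged.
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Invented example: three models predict the same quantity for one site.
site_predictions = [12.0, 14.5, 9.0]
model_weights = [0.60, 0.30, 0.10]

averaged = model_averaged_prediction(site_predictions, model_weights)
print(f"model-averaged prediction: {averaged:.2f}")
```

This shows only the arithmetic of the point estimate; the uncertainty of a model-averaged value needs separate treatment, which the implementation papers and package documentation cover.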

3.7 Examples “in the wild”

Citations and what is useful in the paper. To be found.

3.8 Reporting results

Anderson et al. (2001) compare how to present the results of typical frequentist (p-value) analyses, information-theoretic (model selection) analyses, and Bayesian analyses. The paper focuses mainly on the advantages of model selection and on how to present its methods and results. The authors strongly advise against mixing frequentist (p-value and test statistic) results with information-theoretic (model selection) results.

References

Anderson, David R., and Kenneth P. Burnham. 2002. “Avoiding Pitfalls When Using Information-Theoretic Methods.” The Journal of Wildlife Management 66 (3): 912–18. https://doi.org/10.2307/3803155.

Anderson, David R., William A. Link, Douglas H. Johnson, and Kenneth P. Burnham. 2001. “Suggestions for Presenting the Results of Data Analyses.” The Journal of Wildlife Management 65 (3): 373. https://doi.org/10.2307/3803088.

Burnham, Kenneth P., David R. Anderson, and Kathryn P. Huyvaert. 2011. “AIC Model Selection and Multimodel Inference in Behavioral Ecology: Some Background, Observations, and Comparisons.” Behavioral Ecology and Sociobiology 65 (1): 23–35. https://doi.org/10.1007/s00265-010-1029-6.

Burnham, Kenneth P., and David R. Anderson. 2008. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York, NY: Springer.

Cade, Brian S. 2015. “Model Averaging and Muddled Multimodel Inferences.” Ecology 96 (9): 2370–82. https://doi.org/10.1890/14-1639.1.

Harrison, Xavier A., Lynda Donaldson, Maria Eugenia Correa-Cano, Julian Evans, David N. Fisher, Cecily E. D. Goodwin, Beth S. Robinson, David J. Hodgson, and Richard Inger. 2018. “A Brief Introduction to Mixed Effects Modelling and Multi-Model Inference in Ecology.” PeerJ 6 (May): e4794. https://doi.org/10.7717/peerj.4794.

Hegyi, Gergely, and László Zsolt Garamszegi. 2011. “Using Information Theory as a Substitute for Stepwise Regression in Ecology and Behavior.” Behavioral Ecology and Sociobiology 65 (1): 69–76. https://doi.org/10.1007/s00265-010-1036-7.

Mazerolle, Marc J. 2023. “Model Selection and Multimodel Inference Using the ‘AICcmodavg’ Package.” CRAN. https://cran.r-project.org/web/packages/AICcmodavg/vignettes/AICcmodavg.pdf.

Mundry, Roger, and Charles L. Nunn. 2009. “Stepwise Model Fitting and Statistical Inference: Turning Noise into Signal Pollution.” American Naturalist 173 (1): 119–23. https://doi.org/10.1086/593303.

Olden, Julian D., Joshua J. Lawler, and N. LeRoy Poff. 2008. “Machine Learning Methods Without Tears: A Primer for Ecologists.” The Quarterly Review of Biology 83 (2): 171–93. http://www.jstor.org/stable/10.1086/587826.

Smith, Gary. 2018. “Step Away from Stepwise.” Journal of Big Data 5 (1): 32. https://doi.org/10.1186/s40537-018-0143-6.