.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_gaussian_model_selection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_gaussian_model_selection.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_gaussian_model_selection.py:


Using AIC and BIC for Model Selection
-------------------------------------

This example demonstrates how the Akaike Information Criterion (AIC) and the
Bayesian Information Criterion (BIC) can be used to select the number of
components for a model.

1) We train models with varying numbers of ``n_components``.
2) For each ``n_components`` we train several models with different random
   initializations; the best model is kept.
3) We plot the AIC and BIC values for each ``n_components``. A clear minimum
   appears at ``n_components=4``. We also see that the log-likelihood of the
   training data is not suitable for model selection, as it keeps increasing
   with the number of components.

.. GENERATED FROM PYTHON SOURCE LINES 17-24

.. code-block:: Python

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.utils import check_random_state

    from hmmlearn.hmm import GaussianHMM

    rs = check_random_state(546)

.. GENERATED FROM PYTHON SOURCE LINES 25-26

Our model to generate sample data from:

.. GENERATED FROM PYTHON SOURCE LINES 26-39

.. code-block:: Python

    model = GaussianHMM(4, init_params="")
    model.n_features = 1  # the observations are univariate
    model.startprob_ = np.array([1/4., 1/4., 1/4., 1/4.])
    model.transmat_ = np.array([[0.3, 0.4, 0.2, 0.1],
                                [0.1, 0.2, 0.3, 0.4],
                                [0.5, 0.2, 0.1, 0.2],
                                [0.25, 0.25, 0.25, 0.25]])
    model.means_ = np.array([[-2.5], [0], [2.5], [5.]])
    model.covars_ = np.sqrt([[0.25], [0.25], [0.25], [0.25]])
    X, _ = model.sample(1000, random_state=rs)
    lengths = [X.shape[0]]  # a single observation sequence

.. GENERATED FROM PYTHON SOURCE LINES 40-44

Search over a range of ``n_components`` and examine the AIC, BIC, and the
log-likelihood (LL) of the data. For each ``n_components``, train several
models with different random initializations and keep the one with the
best LL.

.. GENERATED FROM PYTHON SOURCE LINES 44-62

.. code-block:: Python

    aic = []
    bic = []
    lls = []
    ns = [2, 3, 4, 5, 6]
    for n in ns:
        best_ll = None
        best_model = None
        for i in range(10):
            h = GaussianHMM(n, n_iter=200, tol=1e-4, random_state=rs)
            h.fit(X)
            score = h.score(X)
            # Keep the restart with the highest log-likelihood.
            if best_ll is None or best_ll < score:
                best_ll = score
                best_model = h
        aic.append(best_model.aic(X))
        bic.append(best_model.bic(X))
        lls.append(best_model.score(X))

.. GENERATED FROM PYTHON SOURCE LINES 63-65

Visualize our results: a clear minimum is seen at 4 components, which matches
our expectation.

.. GENERATED FROM PYTHON SOURCE LINES 65-79

.. code-block:: Python

    fig, ax = plt.subplots()
    ln1 = ax.plot(ns, aic, label="AIC", color="blue", marker="o")
    ln2 = ax.plot(ns, bic, label="BIC", color="green", marker="o")
    ax2 = ax.twinx()
    ln3 = ax2.plot(ns, lls, label="LL", color="orange", marker="o")

    ax.legend(handles=ax.lines + ax2.lines)
    ax.set_title("Using AIC/BIC for Model Selection")
    ax.set_ylabel("Criterion Value (lower is better)")
    ax2.set_ylabel("LL (higher is better)")
    ax.set_xlabel("Number of HMM Components")
    fig.tight_layout()
    plt.show()

.. image-sg:: /auto_examples/images/sphx_glr_plot_gaussian_model_selection_001.png
   :alt: Using AIC/BIC for Model Selection
   :srcset: /auto_examples/images/sphx_glr_plot_gaussian_model_selection_001.png
   :class: sphx-glr-single-img
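For reference, hmmlearn's ``aic`` and ``bic`` methods follow the standard
definitions of the two criteria, where :math:`\hat{L}` is the maximized
likelihood of the model, :math:`k` the number of free parameters, and
:math:`n` the number of observations:

.. math::

   \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
   \mathrm{BIC} = k\ln(n) - 2\ln\hat{L}

Both criteria reward fit (through :math:`\ln\hat{L}`) and penalize model size
(through :math:`k`); BIC's :math:`\ln(n)` factor penalizes extra parameters
more heavily than AIC's constant 2 whenever :math:`n > e^2 \approx 7.4`, so
BIC tends to prefer smaller models on long sequences.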
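As a minimal follow-up sketch (reusing the ``ns``, ``aic``, and ``bic`` lists
collected above), the selected number of components can also be read off
programmatically instead of from the plot:

.. code-block:: Python

    # The criterion is lowest at the selected model size, so the index of
    # the minimum maps back into ``ns``.
    best_n_aic = ns[int(np.argmin(aic))]
    best_n_bic = ns[int(np.argmin(bic))]
    print(f"AIC selects n_components={best_n_aic}; "
          f"BIC selects n_components={best_n_bic}")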
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 12.218 seconds)


.. _sphx_glr_download_auto_examples_plot_gaussian_model_selection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_gaussian_model_selection.ipynb <plot_gaussian_model_selection.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gaussian_model_selection.py <plot_gaussian_model_selection.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_