.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_multinomial_hmm.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_multinomial_hmm.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_multinomial_hmm.py:


A simple example demonstrating Multinomial HMM
----------------------------------------------

The Multinomial HMM is a generalization of the Categorical HMM, with some
key differences:

- a Categorical__ (or generalized Bernoulli/multinoulli) distribution models
  the outcome of a die with ``n_features`` possible values, i.e. it is a
  generalization of the Bernoulli distribution where there are
  ``n_features`` categories instead of the binary success/failure outcome;
  a Categorical HMM has the emission probabilities for each component
  parameterized by Categorical distributions.

- a Multinomial__ distribution models the outcome of ``n_trials``
  independent rolls of a die, each roll having ``n_features`` possible
  values, i.e.:

  - when ``n_trials = 1`` and ``n_features = 2``, it is a Bernoulli
    distribution,
  - when ``n_trials > 1`` and ``n_features = 2``, it is a Binomial
    distribution,
  - when ``n_trials = 1`` and ``n_features > 2``, it is a Categorical
    distribution.

  The emission probabilities for each component of a Multinomial HMM are
  parameterized by Multinomial distributions.

__ https://en.wikipedia.org/wiki/Categorical_distribution
__ https://en.wikipedia.org/wiki/Multinomial_distribution
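To make the special cases above concrete, here is a minimal sketch (not part
of the generated example) that draws from each distribution with NumPy; the
seed and the probability vectors are arbitrary choices for illustration:

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(seed=0)
    p = [0.25, 0.1, 0.4, 0.25]  # probabilities over n_features = 4 categories

    # Categorical: a single roll of the die -> one category index in {0, ..., 3}.
    rng.choice(len(p), p=p)

    # Multinomial: n_trials = 5 independent rolls -> a length-4 vector of
    # per-category counts summing to 5.
    rng.multinomial(n=5, pvals=p)

    # Special cases with n_features = 2:
    rng.multinomial(n=1, pvals=[0.3, 0.7])  # Bernoulli (a one-hot count pair)
    rng.multinomial(n=5, pvals=[0.3, 0.7])  # Binomial counts over two outcomes

Note that a Multinomial HMM observes exactly such count vectors, which is why
the example below feeds the model bag-of-words counts rather than individual
word indices.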
.. GENERATED FROM PYTHON SOURCE LINES 31-130

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MultinomialHMM has undergone major changes. The previous version was implementing a CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution (e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution). See these issues for details:
    https://github.com/hmmlearn/hmmlearn/issues/335
    https://github.com/hmmlearn/hmmlearn/issues/340
    Topics discussed:
    ['cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog', 'cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog', 'cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog', 'cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog', 'cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog']
    Learned emission probs:
    [[2.57129200e-01 2.86190571e-02 4.28541642e-01 2.85710101e-01]
     [1.33352852e-01 7.33292496e-01 2.67548571e-05 1.33327897e-01]]
    Learned transition matrix:
    [[0.71429762 0.28570238]
     [0.50007593 0.49992407]]
    MultinomialHMM has undergone major changes. The previous version was implementing a CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution (e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution). See these issues for details:
    https://github.com/hmmlearn/hmmlearn/issues/335
    https://github.com/hmmlearn/hmmlearn/issues/340

    New Model
    Topics discussed:
    ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat']
    Learned emission probs:
    [[1.99798408e-01 5.96625425e-01 5.09102765e-02 1.52665890e-01]
     [2.33585307e-01 1.74056427e-04 4.67509588e-01 2.98731049e-01]]
    Learned transition matrix:
    [[0.66887532 0.33112468]
     [0.33454555 0.66545445]]


|

.. code-block:: Python


    import numpy as np

    from hmmlearn import hmm

    # For this example, we will model the stages of a conversation,
    # where each sentence is "generated" with an underlying topic, "cat" or "dog"
    states = ["cat", "dog"]
    id2topic = dict(zip(range(len(states)), states))
    # we are more likely to talk about cats first
    start_probs = np.array([0.6, 0.4])

    # For each topic, the probability of saying certain words can be modeled by
    # a distribution over the vocabulary associated with the categories
    vocabulary = ["tail", "fetch", "mouse", "food"]
    # if the topic is "cat", we are more likely to talk about "mouse";
    # if the topic is "dog", we are more likely to talk about "fetch"
    emission_probs = np.array([[0.25, 0.1, 0.4, 0.25],
                               [0.2, 0.5, 0.1, 0.2]])

    # Also assume it's more likely to stay in a state than transition to the other
    trans_mat = np.array([[0.8, 0.2], [0.2, 0.8]])

    # Pretend that every sentence we speak only has a total of 5 words,
    # i.e. we independently utter a word from the vocabulary 5 times per sentence.
    # We observe the following bag of words (BoW) for 10 sentences:
    observations = [["tail", "mouse", "mouse", "food", "mouse"],
                    ["food", "mouse", "mouse", "food", "mouse"],
                    ["tail", "mouse", "mouse", "tail", "mouse"],
                    ["food", "mouse", "food", "food", "tail"],
                    ["tail", "fetch", "mouse", "food", "tail"],
                    ["tail", "fetch", "fetch", "food", "fetch"],
                    ["fetch", "fetch", "fetch", "food", "tail"],
                    ["food", "mouse", "food", "food", "tail"],
                    ["tail", "mouse", "mouse", "tail", "mouse"],
                    ["fetch", "fetch", "fetch", "fetch", "fetch"]]

    # Convert "sentences" to numbers:
    vocab2id = dict(zip(vocabulary, range(len(vocabulary))))

    def sentence2counts(sentence):
        # Count the occurrences of each vocabulary word, in vocabulary order,
        # turning a sentence into a bag-of-words count vector.
        ans = []
        for word, idx in vocab2id.items():
            count = sentence.count(word)
            ans.append(count)
        return ans

    X = []
    for sentence in observations:
        row = sentence2counts(sentence)
        X.append(row)

    data = np.array(X, dtype=int)

    # pretend this is repeated, so we have more data to learn from:
    lengths = [len(X)] * 5
    sequences = np.tile(data, (5, 1))

    # Set up model:
    model = hmm.MultinomialHMM(n_components=len(states),
                               n_trials=len(observations[0]),
                               n_iter=50, init_params='')

    model.n_features = len(vocabulary)
    model.startprob_ = start_probs
    model.transmat_ = trans_mat
    model.emissionprob_ = emission_probs
    model.fit(sequences, lengths)
    logprob, received = model.decode(sequences)

    print("Topics discussed:")
    print([id2topic[x] for x in received])

    print("Learned emission probs:")
    print(model.emissionprob_)

    print("Learned transition matrix:")
    print(model.transmat_)

    # Try to reset and refit:
    new_model = hmm.MultinomialHMM(n_components=len(states),
                                   n_trials=len(observations[0]),
                                   n_iter=50, init_params='ste')

    new_model.fit(sequences, lengths)
    logprob, received = new_model.decode(sequences)

    print("\nNew Model")
    print("Topics discussed:")
    print([id2topic[x] for x in received])

    print("Learned emission probs:")
    print(new_model.emissionprob_)

    print("Learned transition matrix:")
    print(new_model.transmat_)
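Going the other way, a fitted model can also generate new bag-of-words
"sentences". The short sketch below is not part of the generated example; it
reuses ``model``, ``vocabulary`` and ``id2topic`` from the script above, and
the number of samples is an arbitrary choice:

.. code-block:: Python

    # Draw 3 new count vectors and their hidden states from the fitted model;
    # `sample` returns the emissions and the hidden-state sequence.
    X_new, Z_new = model.sample(3)
    for counts, state in zip(X_new, Z_new):
        # Expand each count vector back into a bag of words.
        words = [w for w, c in zip(vocabulary, counts) for _ in range(c)]
        print(id2topic[state], words)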
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.044 seconds)


.. _sphx_glr_download_auto_examples_plot_multinomial_hmm.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_multinomial_hmm.ipynb <plot_multinomial_hmm.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_multinomial_hmm.py <plot_multinomial_hmm.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_