### HMM Model

$$\max P(o|\lambda) =\max P(o_1 o_2 \dots o_n|\lambda_1 \lambda_2 \dots \lambda_n)$$

$$P(o_1 o_2 \dots o_n|\lambda_1 \lambda_2 \dots \lambda_n) = P(o_1|\lambda_1)P(o_2|\lambda_2)\dots P(o_n|\lambda_n)$$
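
Under this independence assumption, maximizing the product reduces to choosing the best tag for each character on its own. A minimal sketch, reading $o_k$ as the tag of the $k$-th character $\lambda_k$ (the reading the implementation in this post suggests). The table `tag_given_char` is hypothetical, with numbers invented for illustration rather than estimated from data:

```python
# Hypothetical P(o_k | lambda_k) for the two characters of a toy input.
# The numbers are invented for illustration only.
tag_given_char = {
    '天': {'s': 0.2, 'b': 0.4, 'm': 0.1, 'e': 0.3},
    '气': {'s': 0.1, 'b': 0.5, 'm': 0.1, 'e': 0.3},
}

def independent_decode(sentence, table):
    """Pick the most probable tag for each character separately."""
    return ''.join(max(table[ch], key=table[ch].get) for ch in sentence)

tags = independent_decode('天气', tag_given_char)
print(tags)
```

With these toy numbers the decoder emits `bb`, a word-begin tag followed by another word-begin tag, which is not a valid segmentation. Nothing in the per-character product prevents this, which is what the transition terms $P(o_k|o_{k-1})$ in the later factorization can penalize.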

$$P(o|\lambda)=\frac{P(o,\lambda)}{P(\lambda)}=\frac{P(\lambda|o)P(o)}{P(\lambda)}$$

$$P(\lambda|o)P(o)$$
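
The denominator can be dropped because $\lambda$ is the fixed input while the maximization runs over $o$, so $P(\lambda)$ is a constant factor. Written out as one step, under that reading of the notation:

$$\arg\max_{o} P(o|\lambda)=\arg\max_{o}\frac{P(\lambda|o)P(o)}{P(\lambda)}=\arg\max_{o} P(\lambda|o)P(o)$$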

$$P(\lambda|o)=P(\lambda_1|o_1)P(\lambda_2|o_2)\dots P(\lambda_n|o_n)$$

$$P(o)=P(o_1)P(o_2|o_1)P(o_3|o_1,o_2)\dots P(o_n|o_1,o_2,\dots,o_{n-1})$$

$$P(o)=P(o_1)P(o_2|o_1)P(o_3|o_2)\dots P(o_n|o_{n-1})\sim P(o_2|o_1)P(o_3|o_2)\dots P(o_n|o_{n-1})$$
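
The first-order factors $P(o_k|o_{k-1})$ can be estimated by counting adjacent tag pairs. The sketch below assumes a tagged corpus is available as strings over `sbme`; the two-item `tagged` list is a made-up stand-in, and the function name `estimate_transitions` is hypothetical. The implementation in this post uses hand-set transition values instead.

```python
from collections import Counter, defaultdict

def estimate_transitions(tag_sequences):
    """Estimate P(o_k | o_{k-1}) by counting adjacent tag pairs."""
    pair_counts = defaultdict(Counter)
    for seq in tag_sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[prev][cur] += 1
    probs = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        for cur, n in counter.items():
            # Key format 'be' means "b followed by e".
            probs[prev + cur] = n / total
    return probs

# Hypothetical tag strings for two already segmented sentences.
tagged = ['bebes', 'sbmebe']
probs = estimate_transitions(tagged)
print(probs)
```

Pairs that never occur in the corpus, such as `bs`, are simply absent from the result, so a decoder can treat a missing key as a forbidden transition.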

$$P(\lambda|o)P(o)\sim P(\lambda_1|o_1) P(o_2|o_1) P(\lambda_2|o_2) P(o_3|o_2) \dots P(o_n|o_{n-1}) P(\lambda_n|o_n)$$
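
In log space this product becomes a sum of emission terms $\log P(\lambda_k|o_k)$ and transition terms $\log P(o_k|o_{k-1})$. A minimal sketch that scores one candidate tag sequence; `sequence_log_score` is a hypothetical helper, and `emit` and `trans` are toy tables with invented numbers:

```python
from math import log

def sequence_log_score(chars, tags, emit, trans):
    """Sum of log P(char | tag) plus log P(tag | previous tag)."""
    score = 0.0
    for k, (ch, tag) in enumerate(zip(chars, tags)):
        score += log(emit[tag][ch])
        if k > 0:
            pair = tags[k - 1] + tag
            if pair not in trans:
                return float('-inf')  # transition not allowed
            score += log(trans[pair])
    return score

# Toy emission table P(char | tag) over a two-character vocabulary.
emit = {
    'b': {'天': 0.8, '气': 0.2},
    'e': {'天': 0.3, '气': 0.7},
    's': {'天': 0.5, '气': 0.5},
}
# Toy transition table; pairs that are absent count as forbidden.
trans = {'be': 0.7, 'es': 0.3, 'eb': 0.7, 'ss': 0.3, 'sb': 0.7}

print(sequence_log_score('天气', 'be', emit, trans))
print(sequence_log_score('天气', 'bs', emit, trans))
```

The second call returns negative infinity because `bs` is not in the toy transition table, so that tagging can never win the maximization.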

### Python Implementation

    from collections import Counter
    from math import log

    # Character counts per tag: s = single-character word, b = word begin,
    # m = word middle, e = word end.
    hmm_model = {i: Counter() for i in 'sbme'}

    # Each line of dict.txt is expected to start with a word and its
    # frequency, separated by spaces.
    with open('dict.txt', encoding='utf-8') as f:
        for line in f:
            lines = line.split(' ')
            if len(lines[0]) == 1:
                hmm_model['s'][lines[0]] += int(lines[1])
            else:
                hmm_model['b'][lines[0][0]] += int(lines[1])
                hmm_model['e'][lines[0][-1]] += int(lines[1])
                # Interior characters of the word (not of the split line).
                for m in lines[0][1:-1]:
                    hmm_model['m'][m] += int(lines[1])

    # Log of the total count per tag, used to normalize emission scores.
    log_total = {i: log(sum(hmm_model[i].values())) for i in 'sbme'}

    # Hand-set transition probabilities; pairs not listed are forbidden.
    trans = {'ss': 0.3, 'sb': 0.7,
             'bm': 0.3, 'be': 0.7,
             'mm': 0.3, 'me': 0.7,
             'es': 0.3, 'eb': 0.7}

    trans = {i: log(j) for i, j in trans.items()}

    def viterbi(nodes):
        # paths maps a tag string to its log score; after each step it keeps
        # only the best path ending in each tag.
        paths = nodes[0]
        for l in range(1, len(nodes)):
            paths_ = paths
            paths = {}
            for i in nodes[l]:
                nows = {}
                for j in paths_:
                    if j[-1] + i in trans:
                        nows[j + i] = paths_[j] + nodes[l][i] + trans[j[-1] + i]
                best = max(nows, key=nows.get)
                paths[best] = nows[best]
        return max(paths, key=paths.get)

    def hmm_cut(s):
        # Emission score per character and tag: log((count + 1) / total),
        # so characters unseen under a tag do not produce log(0).
        nodes = [{i: log(j[t] + 1) - log_total[i] for i, j in hmm_model.items()}
                 for t in s]
        tags = viterbi(nodes)
        words = [s[0]]
        for i in range(1, len(s)):
            if tags[i] in ['b', 's']:
                words.append(s[i])  # start a new word
            else:
                words[-1] += s[i]   # extend the current word
        return words

    >>> print(' '.join(hmm_cut(u'今天天气不错')))
    >>> print(' '.join(hmm_cut(u'李想是一个好孩子')))
    >>> print(' '.join(hmm_cut(u'小明硕士毕业于中国科学院计算所')))
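
One way to gain confidence in the `viterbi` routine above is to compare it against exhaustive search on a short input. The sketch below enumerates every tag sequence and keeps the best one under the same scoring rule (emission log-score plus transition log-score, with missing transitions treated as forbidden). `brute_force` is a hypothetical helper and the `nodes` values are invented; in practice they would come from the emission scores built in `hmm_cut`.

```python
from itertools import product
from math import log

def brute_force(nodes, trans):
    """Return the best-scoring tag string by trying all 4**n candidates."""
    best_tags, best_score = None, float('-inf')
    for cand in product('sbme', repeat=len(nodes)):
        score = nodes[0][cand[0]]
        for k in range(1, len(cand)):
            pair = cand[k - 1] + cand[k]
            if pair not in trans:
                break  # forbidden transition, skip this candidate
            score += trans[pair] + nodes[k][cand[k]]
        else:
            # Reached only when no forbidden transition was hit.
            if score > best_score:
                best_tags, best_score = ''.join(cand), score
    return best_tags

# Same hand-set transition values as in the listing above, in log space.
trans = {k: log(v) for k, v in {
    'ss': 0.3, 'sb': 0.7, 'bm': 0.3, 'be': 0.7,
    'mm': 0.3, 'me': 0.7, 'es': 0.3, 'eb': 0.7}.items()}

# Hypothetical per-character log emission scores for a 3-character input.
nodes = [
    {'s': log(0.2), 'b': log(0.6), 'm': log(0.1), 'e': log(0.1)},
    {'s': log(0.1), 'b': log(0.1), 'm': log(0.2), 'e': log(0.6)},
    {'s': log(0.7), 'b': log(0.1), 'm': log(0.1), 'e': log(0.1)},
]
print(brute_force(nodes, trans))
```

On the same `nodes`, `viterbi(nodes)` would be expected to return the same tag string. A mismatch on inputs this short would point to a bug in the dynamic program rather than in the model. The exhaustive search grows as $4^n$, so it is only practical as a check on very short inputs.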