The basic HMM assumes a first-order Markov chain with multinomial emission and transition probabilities, and is trained with the standard Baum-Welch algorithm to maximize the joint probability.
However, there are many variations: of the Markov assumption (higher-order Markov chains), of the emission/transition probabilities (exponential distributions, or even distributions simulated by a neural network), and of the network structure (pairwise HMM, factorial HMM; see
http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html). More theoretically minded people also consider maximizing conditional probabilities.
It is not fair to compare MEMM only to the basic HMM.
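
For readers who want the basic HMM described above made concrete, here is a minimal sketch in Python. It only shows the quantity such a model assigns: the joint probability of a state path and an observation sequence under first-order Markov and multinomial assumptions, plus the forward recursion that sums it over all paths. It does not implement Baum-Welch. The parameter values and the names INIT, TRANS, EMIT, joint_probability and forward_likelihood are made up for illustration.

```python
import math
from itertools import product

# Hypothetical toy parameters: 2 hidden states, 3 observation symbols.
INIT = [0.6, 0.4]                          # P(state_1 = i)
TRANS = [[0.7, 0.3], [0.4, 0.6]]           # P(state_t = j | state_{t-1} = i)
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # P(obs_t = k | state_t = i)


def joint_probability(states, obs, init=INIT, trans=TRANS, emit=EMIT):
    """P(states, obs) for a first-order HMM with multinomial tables."""
    p = init[states[0]] * emit[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        # First-order assumption: the next state depends only on the previous one.
        p *= trans[prev][cur] * emit[cur][o]
    return p


def forward_likelihood(obs, init=INIT, trans=TRANS, emit=EMIT):
    """P(obs), summed over all state paths with the forward recursion."""
    n = len(init)
    # alpha[i] = P(obs_1..obs_t, state_t = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


obs = [0, 1, 2]
# Brute-force check: summing the joint over every state path should match
# the forward recursion.
brute = sum(joint_probability(path, obs)
            for path in product(range(len(INIT)), repeat=len(obs)))
print(math.isclose(brute, forward_likelihood(obs)))
```

The variations listed above would change different parts of this sketch: a higher-order chain changes what TRANS is indexed by, a different emission family replaces the EMIT table lookup, and a conditional objective changes what is maximized during training rather than this factorization.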