Dialog act (DA) tags are useful for many applications in natural language processing and automatic speech recognition. In this work, we introduce hidden backoff models (HBMs), in which a large generalized backoff model is trained, using an embedded expectation-maximization (EM) procedure, on data that is only partially observed. We use HBMs as word models conditioned both on DAs and on (hidden) DA-segments. Experimental results on the ICSI Meeting Recorder Dialog Act (MRDA) corpus show that our embedded EM algorithm can strictly increase the log likelihood of the training data and can effectively reduce the error rate on test data. The degree of improvement varies with the number of hidden states used per DA. In the best case, test error is reduced by 6.1% relative to our baseline, and the model is competitive with other approaches even without the use of acoustic prosody.