Dia is a 1.6B-parameter text-to-speech model that generates highly realistic dialogue in a single pass. Conditioning the output on audio gives control over emotion and tone, and the model can also produce nonverbal communication.