Microsoft introduces VibeVoice, a Whisper-style speech-to-text model with speaker diarization.