Clarification on Audio Injection & AI Bot Capabilities – Zoom Meeting SDK for Windows
Hello Zoom Developer Support Team,
I hope you are doing well.
I am currently working on an AI-led interview platform that integrates with the Zoom Meeting SDK for Windows (Zoom Native SDK). I would like clarification on some SDK-level capabilities related to audio handling and AI automation.
What we have implemented so far:
Created Zoom meetings programmatically using Zoom Meeting REST APIs
Enabled cloud recording and auto-transcription
Successfully retrieved recordings and transcript files (VTT) after meeting completion
Integrated Zoom Meeting SDK for Windows (C++)
The AI bot is able to:
Authenticate successfully
Join Zoom meetings as a participant
Enable microphone and speaker
Participate in meetings like a normal user
Our requirement:
We are building an AI interviewer that:
Joins Zoom meetings automatically as a participant
Uses AWS Polly to convert AI questions into speech (PCM/WAV)
Speaks those questions to the candidate during the meeting
Listens to candidate responses
Relies on Zoom cloud recording and transcription for post-meeting analysis
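To make the audio format concrete: Polly's `pcm` output is raw 16-bit signed little-endian mono audio with no container, so any path that expects a WAV file needs a header added first. The following is only a minimal sketch of that step, not our production code. The function name `wrap_pcm_as_wav` and the 16 kHz default are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append `bytes` bytes of `value` in little-endian order.
static void put_le(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF" or "data".
static void put_tag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Wrap raw 16-bit PCM in a canonical 44-byte RIFF/WAVE header.
// Assumption: the input is 16-bit signed little-endian PCM.
std::vector<uint8_t> wrap_pcm_as_wav(const std::vector<uint8_t>& pcm,
                                     uint32_t sample_rate = 16000,
                                     uint16_t channels = 1) {
    assert(sample_rate > 0 && channels > 0);
    const uint16_t bits_per_sample = 16;
    const uint16_t block_align =
        static_cast<uint16_t>(channels * bits_per_sample / 8);
    const uint32_t byte_rate = sample_rate * block_align;
    const uint32_t data_size = static_cast<uint32_t>(pcm.size());

    std::vector<uint8_t> wav;
    wav.reserve(44 + pcm.size());
    put_tag(wav, "RIFF");
    put_le(wav, 36 + data_size, 4);   // size of everything after this field
    put_tag(wav, "WAVE");
    put_tag(wav, "fmt ");
    put_le(wav, 16, 4);               // fmt chunk size for plain PCM
    put_le(wav, 1, 2);                // audio format 1 = uncompressed PCM
    put_le(wav, channels, 2);
    put_le(wav, sample_rate, 4);
    put_le(wav, byte_rate, 4);
    put_le(wav, block_align, 2);
    put_le(wav, bits_per_sample, 2);
    put_tag(wav, "data");
    put_le(wav, data_size, 4);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

Calling `wrap_pcm_as_wav(pcmBytes)` on a Polly response body would yield a buffer that can be written to disk as a `.wav` file or handed to a player feeding a virtual audio device.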
Where we are stuck:
We want to confirm whether the Zoom Meeting SDK for Windows supports SDK-level audio injection. Specifically:
Is it possible to inject AI-generated PCM or WAV audio directly into a Zoom meeting using the Meeting SDK, without using OS-level or virtual audio devices?
Does the Meeting SDK provide any API similar to sendAudioFrame, injectPCM, or any mechanism to push audio buffers programmatically into the meeting?
If this is not supported, is routing AI audio via an OS-recognized microphone (physical or virtual) the only supported approach?
Are there any official recommendations or best practices from Zoom for implementing AI voice bots in Zoom meetings using the Meeting SDK?
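To illustrate the kind of buffer push we have in mind, here is a minimal sketch. `AudioFrameSink` is a hypothetical interface that stands in for whatever the SDK may or may not expose; it is not a Zoom type. `push_pcm_in_frames`, `CountingSink`, and the 20 ms frame length are likewise assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL: placeholder for an SDK-level "send audio buffer" entry point.
struct AudioFrameSink {
    virtual ~AudioFrameSink() = default;
    // Returns false if the sink cannot accept more audio.
    virtual bool push(const uint8_t* data, std::size_t length, int sample_rate) = 0;
};

// Split mono 16-bit PCM into fixed-duration frames and push each one.
// The last frame is zero-padded (silence) to the full frame length.
// Returns the number of frames the sink accepted.
std::size_t push_pcm_in_frames(const std::vector<uint8_t>& pcm,
                               int sample_rate,
                               int frame_ms,
                               AudioFrameSink& sink) {
    assert(sample_rate > 0 && frame_ms > 0);
    // samples per frame * 2 bytes per 16-bit sample
    const std::size_t frame_bytes =
        static_cast<std::size_t>(sample_rate) * frame_ms / 1000 * 2;
    assert(frame_bytes > 0);

    std::vector<uint8_t> frame(frame_bytes);
    std::size_t pushed = 0;
    for (std::size_t off = 0; off < pcm.size(); off += frame_bytes) {
        const std::size_t n = std::min(frame_bytes, pcm.size() - off);
        std::copy(pcm.data() + off, pcm.data() + off + n, frame.data());
        std::fill(frame.data() + n, frame.data() + frame.size(), uint8_t{0});
        if (!sink.push(frame.data(), frame.size(), sample_rate)) {
            break;  // stop on the first rejected frame
        }
        ++pushed;
    }
    return pushed;
}

// Stand-in sink for local testing: it only counts what it receives.
struct CountingSink : AudioFrameSink {
    std::size_t frames = 0;
    std::size_t bytes = 0;
    bool push(const uint8_t*, std::size_t length, int) override {
        ++frames;
        bytes += length;
        return true;
    }
};
```

A real sender would also need to pace the frames in real time (one 20 ms frame roughly every 20 ms) rather than pushing them in a tight loop. The sketch leaves that out to keep the framing logic visible.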
We want to ensure that our implementation aligns with Zoom's supported capabilities and policies. If direct audio injection is not supported, we would also like to understand whether that is by design and whether any alternative SDK-supported approaches exist.
SDK and environment details:
Platform: Windows
SDK: Zoom Meeting SDK for Windows (C++)
Use case: AI-led automated interviews
Audio source: AWS Polly (Neural TTS)
We would greatly appreciate confirmation or guidance from the Zoom engineering team on this topic, as it is a critical architectural decision for our product.
Thank you for your time and support.
We look forward to your clarification.
Thanks,
