Clarification on Audio Injection & AI Bot Capabilities – Zoom Meeting SDK for Windows

Question

I am currently working on an AI-led interview platform that integrates with Zoom Meeting SDK for Windows (Zoom Native SDK), and I would like clarification on some SDK-level capabilities related to audio handling and AI automation.

What we have implemented so far
Created Zoom meetings programmatically using Zoom Meeting REST APIs

Enabled cloud recording and auto-transcription

Successfully retrieved recordings and transcript files (VTT) after meeting completion

Integrated Zoom Meeting SDK for Windows (C++)

The AI bot is able to:

Authenticate successfully

Join Zoom meetings as a participant

Enable microphone and speaker

Participate in meetings like a normal user

Our requirement
We are building an AI interviewer that:

Joins Zoom meetings automatically as a participant

Uses AWS Polly to convert AI questions into speech (PCM/WAV)

Speaks those questions to the candidate during the meeting

Listens to candidate responses

Relies on Zoom cloud recording and transcription for post-meeting analysis
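
A note on the "PCM/WAV" step above: as far as we understand, Polly's `pcm` output format is headerless 16-bit signed little-endian mono audio (8 kHz or 16 kHz), so a WAV file has to be built by prepending a header ourselves. A minimal sketch of that step, assuming 16 kHz mono input; the function name `wrapPcmAsWav` and its helpers are our own illustration, not part of any Zoom or AWS API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append `bytes` bytes of `value` in little-endian order.
static void appendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF" or "data".
static void appendTag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: wrap raw 16-bit mono PCM in a canonical
// 44-byte RIFF/WAVE header. Assumes the input is already 16-bit
// little-endian mono at `sampleRate` Hz.
std::vector<std::uint8_t> wrapPcmAsWav(const std::vector<std::uint8_t>& pcm,
                                       std::uint32_t sampleRate = 16000) {
    const std::uint16_t channels = 1;
    const std::uint16_t bitsPerSample = 16;
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const std::uint32_t byteRate = sampleRate * blockAlign;
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + pcm.size());
    appendTag(wav, "RIFF");
    appendLE(wav, 36 + dataSize, 4);   // size of everything after this field
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLE(wav, 16, 4);              // fmt chunk size for plain PCM
    appendLE(wav, 1, 2);               // audio format 1 = uncompressed PCM
    appendLE(wav, channels, 2);
    appendLE(wav, sampleRate, 4);
    appendLE(wav, byteRate, 4);
    appendLE(wav, blockAlign, 2);
    appendLE(wav, bitsPerSample, 2);
    appendTag(wav, "data");
    appendLE(wav, dataSize, 4);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

The returned bytes can be written to a `.wav` file and played by any standard audio player.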

Where we are stuck
We want to confirm whether the Zoom Meeting SDK for Windows supports SDK-level audio injection, specifically:

Is it possible to inject AI-generated PCM/WAV audio directly into a Zoom meeting using the Meeting SDK (without using OS-level or virtual audio devices)?

Does the Meeting SDK provide any API similar to:

sendAudioFrame()

injectPCM()

or any mechanism to push audio buffers programmatically into the meeting?

If not supported, is routing AI audio via an OS-recognized microphone (physical or virtual) the only supported approach?

Are there any official recommendations or best practices from Zoom for implementing AI voice bots in Zoom meetings using the Meeting SDK?

We want to ensure our implementation aligns with Zoom’s supported capabilities and policies, and to understand whether this limitation is by design or if there are alternative SDK-supported approaches.

SDK & Environment Details
Platform: Windows

SDK: Zoom Meeting SDK for Windows (C++)

Use case: AI-led automated interviews

Audio source: AWS Polly (Neural TTS)

We would greatly appreciate confirmation or guidance from the Zoom engineering team on this topic, as it is a critical architectural decision for our product.

Thank you for your time and support.
Looking forward to your clarification.

Experts · whoJohn · Answer

Hi @Prabu3, please try the developer community forum: https://devforum.zoom.us/

As I am a Zoom Developer Champion too, let me try to answer.

Zoom creates the transcripts, so you could pick up the user's text replies from there? You may have to use RTMS to do this, for which there will be usage charges.

If you have questions to prompt, you could record them in advance, but Zoom has no way to turn text into speech.

"We want to confirm whether the Zoom Meeting SDK for Windows supports SDK-level audio injection, specifically: Is it possible to inject AI-generated PCM/WAV audio directly into a Zoom meeting using the Meeting SDK (without using OS-level or virtual audio devices)?"

There is nothing in Zoom to support this. You would have to create the audio and then push it through a virtual microphone.

All the best,
John
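
To illustrate the virtual microphone route, here is a minimal sketch of one preparatory step: cutting the generated PCM into fixed-duration frames so they can be written to the virtual device at a steady pace. It assumes 16-bit mono PCM at 16 kHz and 20 ms frames. The names `frameBytes` and `splitIntoFrames` are hypothetical helpers, not Zoom SDK calls, and the code that actually writes each frame to the virtual audio device is not shown because it depends on the driver you choose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bytes in one frame of 16-bit mono PCM lasting `frameMs` milliseconds.
std::size_t frameBytes(std::uint32_t sampleRate, std::uint32_t frameMs) {
    return static_cast<std::size_t>(sampleRate) * frameMs / 1000 * 2;
}

// Hypothetical helper: split PCM into equal-sized frames. The last
// frame is padded with zeros (silence) so every frame has the same length.
std::vector<std::vector<std::uint8_t>> splitIntoFrames(
        const std::vector<std::uint8_t>& pcm,
        std::uint32_t sampleRate = 16000,
        std::uint32_t frameMs = 20) {
    const std::size_t size = frameBytes(sampleRate, frameMs);
    assert(size > 0);

    std::vector<std::vector<std::uint8_t>> frames;
    for (std::size_t off = 0; off < pcm.size(); off += size) {
        const std::size_t n = std::min(size, pcm.size() - off);
        std::vector<std::uint8_t> frame(size, 0);  // zero-filled = silence
        std::copy(pcm.begin() + off, pcm.begin() + off + n, frame.begin());
        frames.push_back(std::move(frame));
    }
    return frames;
}
```

The caller would then write one frame to the virtual microphone every `frameMs` milliseconds, so the audio arrives at roughly real-time speed rather than all at once.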
