Smart Name Tags (Video & Voice) not auto-recognizing participants in Zoom Rooms and detecting ghost speakers (Hungarian-language environment)

June 25, 2026

Hello Community,

We are facing a major issue with the Smart Name Tags (both Video and Voice) feature in our Zoom Rooms setup. The system fails to recognize participants automatically, forcing us into a completely manual workflow. On top of that, the detection is highly inaccurate.

Our Setup:

Licenses: Zoom Rooms Subscription + Zoom Workplace Pro.
Profiles: All recurring participants have created and saved their unique audio/voice profiles.
Settings: All necessary admin policies are enabled, and the expected participants are explicitly added to the meeting invitations by name.
Language: 99% of our meetings are conducted in Hungarian. The voice profiles were recorded successfully.

The Problem:

Instead of automatic identification, the system requires us to manually assign names to the detected voices.

This creates a massive workflow bottleneck when hosting meetings with 7–8 people in the room:

Real-time confusion: In the "Voice" management section, participants just show up as Speaker 1, Speaker 2, Speaker 3, etc. During a fast-paced live meeting, the moderator cannot realistically track who is who and assign names in real time.
Post-meeting guesswork: Fixing the labels after the meeting is no better. Unless you remember the exact speaking order, you end up guessing who Speaker 1 actually was, which defeats the purpose of accurate transcription and smart meeting summaries.

"Ghost" speakers and inaccuracy: Speaker detection itself is unreliable. We've had instances where only 2 of us were sitting in the room, yet Smart Name Tags for Voice detected 3 different speakers. This undermines the usefulness of the feature.

The strangest part is that the Video part works to an extent: if I manually tag myself on the video feed, the system tracks me and displays my name on the screen. Logically, since the AI can see who is moving their mouth and the microphone array can detect the spatial direction of the audio source, the system should be able to accurately cross-reference these data points and identify the speaker. Yet, this automatic mapping never happens.
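
To make the cross-referencing idea concrete, here is a rough Python sketch of the kind of matching I would expect. It is purely illustrative: the class names, the angle fields, and the 15-degree tolerance are all my own assumptions, and it says nothing about how Zoom actually implements Smart Name Tags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaggedFace:
    """A face that has already been tagged on the video feed (hypothetical)."""
    name: str
    angle_deg: float     # assumed horizontal angle of the face as seen from the device
    mouth_moving: bool   # assumed output of a lip-movement detector


@dataclass
class VoiceSegment:
    """A detected voice, e.g. 'Speaker 1' (hypothetical)."""
    label: str
    angle_deg: float     # assumed direction of arrival reported by the microphone array


def match_speaker(segment: VoiceSegment,
                  faces: List[TaggedFace],
                  max_diff_deg: float = 15.0) -> Optional[str]:
    """Return the name of the tagged face that best explains the voice segment.

    A face is a candidate only if its mouth is moving and its angle is within
    max_diff_deg of the audio direction. Among the candidates, the closest one wins.
    Returns None when no face qualifies, so the voice stays unassigned.
    """
    candidates = [
        face for face in faces
        if face.mouth_moving
        and abs(face.angle_deg - segment.angle_deg) <= max_diff_deg
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda face: abs(face.angle_deg - segment.angle_deg))
    return best.name


# Example with made-up values: two tagged people in the room.
faces = [
    TaggedFace(name="Anna", angle_deg=-20.0, mouth_moving=True),
    TaggedFace(name="Bela", angle_deg=25.0, mouth_moving=False),
]
print(match_speaker(VoiceSegment("Speaker 1", angle_deg=-18.0), faces))
print(match_speaker(VoiceSegment("Speaker 2", angle_deg=24.0), faces))
```

In this toy example the first voice would be attributed to Anna, while the second stays unassigned because the only nearby face is not speaking. Something along these lines is what I would expect instead of anonymous "Speaker N" labels.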

Could the fact that the spoken language is Hungarian be causing the automated matching to fail entirely? Has anyone encountered a similar issue where auto-recognition fails completely and even generates false speakers?

Appreciate any insights or solutions!