Today's video calls suck for large groups. When someone is speaking in a large Zoom call, they're the only person speaking in the entire room. They hold the metaphorical conversational lock and block everyone else from speaking. Imagine if only a single person in a large room full of people could speak at any given time. That's the current state of large video calls. Once a person is done talking, they release the lock and the next person begins to talk. There can only ever be one conversation happening in a Zoom room at any given time. This single-threaded, blocking approach to conversation is the reason large video calls suck. A room where one person is speaking to 100 people is a lecture. A room where 100 people are speaking amongst themselves is a party.
Unlike a Zoom room, a real-life room is fantastic for parallel conversations. If I'm speaking with someone and I overhear a nearby conversation about something I'm interested in, I can chime in while continuing my existing conversation. If I want to join a new conversation, I can simply walk up to people and begin listening. Conversations aren't supposed to be discrete. They don't have a clear start or end, and the number of people in each conversation constantly changes as people come and go. Two nearby conversations can merge into one large conversation, and a conversation of 4 people might split into 2 one-on-one conversations.
I shouldn't have to invite a user to my conversation, or end a conversation when we're done speaking. There shouldn't be a conversation object in a database somewhere. I don't think the best way to allow parallel conversations is to build conversations as a discrete feature in your video chat app. Instead, you should simulate the environment in which conversations can naturally occur.
The Solution
I think the simplest way to facilitate conversations in a large video chat is to simulate a 2D or isometric view of a large house, where each person has an avatar bubble containing their video feed and can move freely around the space. The friction of starting or leaving a conversation is then just the effort it takes to drag your avatar to a different part of the room.
You would only hear audio from nearby avatars, and each avatar's volume would fall off with its distance from your avatar. This means people could have one-on-one conversations, while also having large group conversations in the same call. An admin could grant access to a megaphone effect that would let anyone be heard at full volume no matter where they are in the room.
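To make the distance-based audio idea a bit more concrete, here's a rough sketch in Python of how per-avatar volume could be computed. Everything in it is an assumption on my part for illustration: the Avatar class, the megaphone flag, the two radii, and the linear falloff are all made up, and a real app would probably want a smoother curve.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    """Hypothetical avatar: a name plus a position in the 2D room."""
    name: str
    x: float
    y: float
    megaphone: bool = False  # assumed flag an admin could grant


def volume_for(listener: Avatar, speaker: Avatar,
               full_radius: float = 1.0, max_radius: float = 10.0) -> float:
    """Return the playback volume in [0, 1] of speaker as heard by listener.

    Assumed model: full volume within full_radius, silence beyond
    max_radius, and a linear falloff in between.
    """
    if speaker.megaphone:
        # Megaphone users are heard at full volume anywhere in the room.
        return 1.0
    distance = math.hypot(speaker.x - listener.x, speaker.y - listener.y)
    if distance <= full_radius:
        return 1.0
    if distance >= max_radius:
        return 0.0
    return 1.0 - (distance - full_radius) / (max_radius - full_radius)


def mix_levels(listener: Avatar, room: list[Avatar]) -> dict[str, float]:
    """Map every other avatar's name to the volume the listener hears."""
    return {a.name: volume_for(listener, a) for a in room if a is not listener}
```

Notice there's no conversation object anywhere in this. Who you're "in a conversation with" is just whoever comes out with a non-zero volume, and that changes by itself as avatars move around.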
I've put together a quick mockup in Figma, and used Miis as people's avatars. In reality, each avatar can just be a bubble containing a user's video feed, with a solid border around the bubble whenever that person is speaking.
This solution wouldn't be ideal for smaller video calls, but it does a pretty decent job of simulating what real conversations are like in a virtual setting, with the least amount of effort. If this environment works well for large video calls, perhaps you could create always-on rooms to let people freely chat with each other for fun. Something akin to a virtual bar or shopping mall. Once you have an environment that lets large groups naturally have conversations amongst themselves, the possibilities are endless.