Answer first
Use this comparison to choose an operating surface, not to force a universal winner.
The better question is which stack best matches the workflow you need to ship and control.
If the goal is a browser-based voice agent, the important considerations are session control, moderation boundaries, where business logic lives, and how the user enters or exits the conversation. Official OpenAI docs now make those operational layers explicit through the Realtime API and sideband control patterns, where the browser carries the audio stream while a trusted server injects control events.
If the goal is to understand what made Seeduplex interesting in the first place, the ByteDance framing centers on attentive listening, robustness under interference, and natural back-and-forth. Teams searching for those qualities usually want a public-facing voice experience or a guided rehearsal product.
For most teams, the practical next step is to define the product shell first: public website agent, training scene, or platform workflow. Once that surface is clear, model choice becomes easier to evaluate without muddy benchmark claims.