Remote medical interpreting is now the default. The infrastructure under it was never built for clinical-grade conversation, and it shows.
Remote interpretation is now the dominant way healthcare institutions deliver language access. The reasons are practical: faster connection times than on-site, broader language coverage, 24/7 availability, and lower cost per minute. The tradeoff is that remote interpretation runs on infrastructure (phone networks, public internet, consumer headsets, vendor data centers) that was never designed to carry clinical-grade conversations between three people who urgently need to understand each other.
The result, reported consistently in our interviews with providers, is two recurring complaints: interpreters sound far away and muffled, and calls drop in the middle of encounters. Both of these are technical problems with technical causes. This post walks through where they come from and what an infrastructure built for medical interpretation actually needs to deliver.
1. What are the main remote medical interpreting modalities, and how do they differ technically?
Remote medical interpreting comes in two main forms, and the difference between them is largely a difference of audio fidelity and network architecture.
1.1 Over-the-Phone Interpreting (OPI)
OPI runs over the Public Switched Telephone Network (PSTN), the same infrastructure that carries traditional landline and mobile telephony. Its strength is reliability. PSTN is one of the most consistent voice networks ever built, with very low jitter and predictable call quality even under load. That is why hospitals have used it for decades.
Its weakness is bandwidth, in the literal acoustic sense. PSTN was standardized around the ITU-T G.711 codec, which passes audio in the frequency band of 300 to 3,400 Hz. That range covers enough of the human voice to make speech intelligible, but it cuts off everything above roughly 3.4 kHz. This is called narrowband audio.
The clinical problem with narrowband is what it loses. The human voice carries identifying acoustic information up to 17 kHz. Consonants like "f," "s," "th," and "sh" are concentrated in the 4 to 8 kHz range, exactly where PSTN cuts the signal off. That is why voices on a phone often sound muffled, and why distinguishing between "fifteen" and "fifty," or "biopsy" and "autopsy," is meaningfully harder over the phone than in person.
1.2 Video Remote Interpreting (VRI)
VRI runs over the public internet, typically using Voice over IP (VoIP) protocols. It can carry wideband audio (also known as HD voice) at frequencies of 50 to 7,000 Hz using the ITU-T G.722 codec or modern alternatives like Opus. That is roughly double the acoustic range of PSTN, and the difference is audible. Video adds non-verbal cues, which research consistently rates as helpful in complex clinical encounters.
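To make the band comparison concrete, here is a small Python sketch. The band edges are the figures quoted above. The function names, and the choice to treat the 4 to 8 kHz consonant region as a simple interval, are illustrative assumptions and not a perceptual model of speech.

```python
# Band edges in Hz, taken from the figures quoted in the text.
NARROWBAND = (300, 3_400)          # G.711 over PSTN
WIDEBAND = (50, 7_000)             # G.722 "HD voice"
CONSONANT_REGION = (4_000, 8_000)  # where "f", "s", "th", "sh" are concentrated


def band_width(band):
    """Width of a frequency band in Hz."""
    low, high = band
    return high - low


def overlap_fraction(passband, region):
    """Fraction of `region` that falls inside `passband` (0.0 to 1.0)."""
    low = max(passband[0], region[0])
    high = min(passband[1], region[1])
    return max(0, high - low) / band_width(region)


width_ratio = band_width(WIDEBAND) / band_width(NARROWBAND)
narrow_cover = overlap_fraction(NARROWBAND, CONSONANT_REGION)
wide_cover = overlap_fraction(WIDEBAND, CONSONANT_REGION)

print(f"wideband / narrowband width: {width_ratio:.2f}x")
print(f"consonant region carried by narrowband: {narrow_cover:.0%}")
print(f"consonant region carried by wideband: {wide_cover:.0%}")
```

Under these assumptions the wideband band is a little over twice as wide, and it is the only one of the two that reaches into the consonant region at all.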
The tradeoffs are real, too. VRI depends on the local Wi-Fi, the hospital's network, the public internet between the institution and the vendor, and the endpoint hardware on both sides. Each of those is a potential point of failure. Video also consumes substantially more bandwidth than audio-only traffic, anywhere from 6x to 40x more depending on resolution and codec, which means VRI is the first thing to degrade when network conditions get bad.
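A rough way to see how a multiple like that can arise is to divide a video bitrate by an audio bitrate. In the sketch below, the 64 kbit/s audio figure is the standard G.711 rate (G.722 also tops out at 64 kbit/s). The video bitrates are hypothetical round numbers chosen only for illustration, and packet overhead is ignored.

```python
AUDIO_KBPS = 64  # G.711 encodes at 64 kbit/s; G.722's highest mode is also 64 kbit/s

# Hypothetical video bitrates in kbit/s. Real values depend on codec,
# resolution, frame rate, and vendor settings.
ASSUMED_VIDEO_KBPS = {
    "low resolution": 400,
    "standard definition": 1_000,
    "high definition": 2_500,
}


def video_to_audio_ratio(video_kbps, audio_kbps=AUDIO_KBPS):
    """How many times more bandwidth a video stream needs than an audio stream."""
    return video_kbps / audio_kbps


for label, kbps in ASSUMED_VIDEO_KBPS.items():
    print(f"{label}: {video_to_audio_ratio(kbps):.1f}x the audio bitrate")
```

The point of the sketch is the shape of the relationship, not the specific numbers: any network that is marginal for audio will be well short of what video needs.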
So the modality choice is not free. OPI gives up audio fidelity for reliability. VRI gives up reliability for fidelity. Neither was designed for the specific demands of medical interpretation.
Figure: Spectrogram of a normal adult male voice, which consistently reaches 10,000 Hz.
2. Why does audio quality matter clinically in medical interpretation?
There is a common assumption that audio quality is an "experience" problem rather than a clinical one. The peer-reviewed evidence suggests otherwise.
A 2024 international survey of 47 professional medical interpreters, published in *Perspectives* by researchers at the University of Surrey, found that approximately 90% of remote interpreters reported common technical issues such as background noise and sound quality affecting their work. The same study found that 43.9% of telephone interpreting (TI) users reported negative performance effects, compared to 22.5% for video interpreting (VI). The interpreters themselves rated TI as more cognitively demanding and stressful than VI, and rated both lower than in-person work on effective communication.
When audio quality drops, three things happen at once:
The interpreter has to ask for repetition more often, which extends the encounter.
The interpreter sometimes does not catch that they misheard, and the misinterpretation flows through to the provider or patient undetected.
Numerical precision degrades. Dosages, frequencies, dates, and named medications carry exactly the high-frequency consonant information that narrowband audio loses.
This is not a marginal issue. It is the failure mode that makes audio quality a patient safety topic, not a user experience topic. The fragmentation of access across the patient visit compounds this further, as we covered in our analysis of the multiple touchpoints problem in non-English speaking patient care.
3. What causes dropped calls and connectivity problems in remote interpreting?
Dropped calls are not a single phenomenon. They are the symptom of several different infrastructure problems that show up the same way on the provider's end.
Interpreter location. Many remote interpretation vendors route calls to interpreters working from home offices, sometimes domestically, often internationally. Every additional network hop between the interpreter and the healthcare institution adds latency and a point at which the call can fail. International routing in particular introduces undersea cable hops that are out of any vendor's direct control.
Interpreter equipment. A consumer-grade headset, a home Wi-Fi router under contention from other devices, or an outdated computer all degrade audio quality before the signal ever reaches the network. Vendors that contract individual interpreters often have limited ability to standardize this layer.
Vendor application and data center architecture. Where the vendor's servers physically sit, how their VoIP infrastructure is configured, and whether they use private network paths or rely on the public internet all materially affect call stability. Many remote interpretation vendors are built on top of generic telephony or video platforms that were not designed for medical use cases.
The healthcare institution's own network. Hospital Wi-Fi is notorious for being uneven. Coverage drops in basements, in shielded rooms (radiology suites, MRI rooms), and during peak usage hours. A remote interpretation call that originates in a clinical environment with weak Wi-Fi is going to fail more often than the vendor's uptime numbers suggest.
The cumulative effect is what providers actually experience: a call that connects fine in the office often drops in the exam room, and a Tuesday afternoon call sounds clean while a Sunday night call comes through garbled.
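The way path length compounds can be sketched in a few lines of Python. Two simplifying assumptions are made here: propagation delay is estimated from distance alone, using the roughly 200 km per millisecond that light covers in optical fiber, and each segment of the path is treated as failing independently. The per-segment probabilities are hypothetical placeholders, not measurements from any vendor.

```python
import math

FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per millisecond


def propagation_delay_ms(distance_km):
    """One-way propagation delay over fiber, ignoring queuing and processing."""
    return distance_km / FIBER_KM_PER_MS


def path_success_probability(segment_success):
    """Probability that every segment holds, assuming independent failures."""
    return math.prod(segment_success)


# Hypothetical per-segment probabilities of surviving one encounter.
short_path = [0.999, 0.999]                      # exam room to a nearby data center
long_path = [0.999, 0.999, 0.995, 0.995, 0.99]   # adds an international leg and home equipment

print(f"1,000 km one way: {propagation_delay_ms(1_000):.0f} ms")
print(f"12,000 km one way: {propagation_delay_ms(12_000):.0f} ms")
print(f"short path survives: {path_success_probability(short_path):.4f}")
print(f"long path survives: {path_success_probability(long_path):.4f}")
```

Even when every individual segment looks reliable on its own, multiplying them together shows why a longer path drops more calls than any single component's uptime figure would suggest.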
Off-hours coverage is a related problem that compounds the audio issue, because the highest-friction encounters (ED visits, complex consults that run long) are the ones most likely to hit network conditions and interpreter pools that produce lower quality. We unpacked this dynamic separately in the waiting time aspect of deploying medical interpreters.
4. How can health systems address audio and connectivity challenges in remote interpreting?
Most of the audio and connectivity problems in remote interpretation come from a single design choice: routing every interpretation through a long network path between the patient room and a remote human interpreter. Solving the problem at scale requires moving the interpretation closer to the point of care, not improving the long-distance path. An infrastructure built for that has three components:
US-based servers, geographically close to the healthcare institution. This minimizes the network distance audio has to travel, which reduces latency and the surface area for packet loss.
AI interpretation that is delivered to the local device with consistent audio output. Because the interpretation itself is generated on the platform rather than transmitted from a remote human in a home office, the audio output on the provider and patient side is consistent in quality across calls. It does not depend on what country the interpreter is in, what headset they bought, or what their home internet looked like that afternoon.
Internet traffic optimization for clinical settings. Specific techniques to reduce jitter, packet loss, and dropped sessions across the hospital network and the public internet path to the data center.
This addresses what the Surrey researchers, and our own provider interviews, identified as the central failure modes of remote interpretation: inconsistent audio fidelity and unreliable connections. It does not replace human interpreters. For high-stakes encounters where a human interpreter is the right call, escalation is built in as a one-button action by either the patient or the provider. See our position on this in human escalation in AI medical interpretation.
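Jitter and packet loss, the two quantities named in the traffic optimization point above, can both be measured from packet timestamps and sequence numbers. The sketch below uses the running jitter estimate defined in RFC 3550 (the RTP specification), with timestamps in milliseconds for readability. The function names and sample values are hypothetical, and this is not a description of any particular vendor's monitoring.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_curr = recv_times_ms[i] - send_times_ms[i]
        d = abs(transit_curr - transit_prev)
        # Smooth with a gain of 1/16 so a single late packet does not dominate.
        jitter += (d - jitter) / 16
    return jitter


def packet_loss_fraction(received_seq, expected_count):
    """Share of expected packets that never arrived."""
    return 1 - len(set(received_seq)) / expected_count


# Hypothetical stream: one packet every 20 ms.
sent = [0, 20, 40, 60, 80]
steady = [50, 70, 90, 110, 130]   # constant 50 ms transit time
uneven = [50, 70, 95, 110, 130]   # third packet arrives 5 ms late

print(f"steady jitter: {interarrival_jitter(sent, steady):.3f} ms")
print(f"uneven jitter: {interarrival_jitter(sent, uneven):.3f} ms")
print(f"loss with one of five missing: {packet_loss_fraction([0, 1, 2, 4], 5):.0%}")
```

Tracking these two numbers per call, rather than relying on aggregate uptime, is one way to see whether a given clinical location is where sessions degrade.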
5. What should health system leaders look for when evaluating remote interpretation infrastructure?
For institutions evaluating remote interpretation vendors or auditing their current setup, four technical questions separate serious infrastructure from generic telephony:
What is the audio codec used? Narrowband (G.711, 300 to 3,400 Hz) is acceptable for some encounters but degrades clinical precision. Wideband (G.722, Opus, or equivalents) should be the baseline for medical use.
Where do the vendor's servers physically sit relative to the institution? Geographic distance is a real factor in call stability. Hosting on US-based servers for US health systems is not a marketing point. It is an architecture decision with measurable latency consequences.
What is the network path from the patient room to the interpreter? Each additional hop is a point of failure. Architectures that minimize network distance, particularly by moving interpretation to the point of care, eliminate entire classes of dropped-call scenarios.
Is there a frictionless human escalation path when AI or any other primary interpretation modality is not the right tool? This is the difference between an interpretation product and a complete language access workflow.
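The codec question in the list above can often be checked directly. VoIP sessions typically advertise their codecs in SDP `a=rtpmap` lines, where G.711 appears as `PCMU` or `PCMA`, G.722 as `G722`, and Opus as `opus`. The Python sketch below classifies the codecs in a session description. The sample offer and the function name are hypothetical, and a real audit would look at the codec actually negotiated, not only the ones offered.

```python
NARROWBAND_CODECS = {"PCMU", "PCMA"}    # G.711 mu-law and A-law
WIDEBAND_OR_BETTER = {"G722", "OPUS"}   # G.722 and Opus


def classify_offered_codecs(sdp_text):
    """Map each codec named in a=rtpmap lines to a bandwidth class."""
    result = {}
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        # Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
        _, _, encoding = line.partition(" ")
        name = encoding.split("/")[0].upper()
        if name in NARROWBAND_CODECS:
            result[name] = "narrowband"
        elif name in WIDEBAND_OR_BETTER:
            result[name] = "wideband or better"
        else:
            result[name] = "unclassified"
    return result


# Hypothetical SDP fragment for illustration only.
sample_offer = """\
m=audio 49170 RTP/AVP 111 9 0 101
a=rtpmap:111 opus/48000/2
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
"""

for codec, bandwidth in classify_offered_codecs(sample_offer).items():
    print(f"{codec}: {bandwidth}")
```

If only `PCMU` or `PCMA` shows up for a vendor's sessions, the audio is narrowband regardless of how the product is marketed.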
If you would like to walk through your institution's specific remote interpretation infrastructure with our team, we are happy to do that as a working session, not a sales pitch. You can book a 30-minute conversation here.
FAQs
1. Why does remote medical interpreting often sound muffled or unclear?
Most over-the-phone interpreting runs on the Public Switched Telephone Network (PSTN), which is limited to narrowband audio in the 300 to 3,400 Hz range. That range cuts off the higher frequencies (4 to 8 kHz) where consonants like "f," "s," "th," and "sh" are concentrated. The result is voices that sound less crisp, and a meaningfully higher risk of mishearing similar-sounding numbers, medications, or dosages.
2. What is the difference between over-the-phone interpreting (OPI) and video remote interpreting (VRI)?
OPI is audio-only, runs on the traditional phone network, and is typically narrowband. VRI runs over the public internet, can support wideband (HD voice) audio and video, and adds non-verbal cues. OPI is more reliable but lower fidelity. VRI is higher fidelity but depends on network stability, hardware quality, and bandwidth availability at the point of care.
3. Why do remote interpretation calls drop in the middle of clinical encounters?
Dropped calls in remote interpreting are usually caused by one of four issues: long network paths between the interpreter and the institution (often involving international routing), inconsistent interpreter-side equipment, vendor infrastructure not optimized for clinical use, or weak Wi-Fi inside the healthcare facility itself (especially in basements, MRI rooms, and shielded areas).
4. Does audio quality in remote interpretation affect medical accuracy?
Yes. A 2024 peer-reviewed survey of 47 medical interpreters published in Perspectives found that around 90% reported technical issues such as background noise and poor sound quality. When audio quality drops, interpreters request repetition more often, sometimes misinterpret without catching it, and lose acoustic precision on exactly the high-frequency consonants that distinguish similar numbers and medication names. This makes audio quality a clinical accuracy issue, not just an experience issue.
5. How does AI medical interpretation address the audio and connectivity issues of remote interpreting?
AI medical interpretation at the point of care shortens the network path. Because the interpretation is generated on a platform with US-based servers rather than transmitted from a remote human interpreter in a home office, audio output is consistent and does not depend on the interpreter's hardware, location, or home internet. Human escalation is always available, with a single button from either the patient or the provider, whether for an emotionally complex encounter or simply because either party prefers a human interpreter. Human choice is a right, not an escalation pathway.
Eyal Heldenberg
Co-founder and CEO, building No Barrier
Eyal has 20+ years in speech-to-speech and voice AI and is the co-founder of No Barrier AI, a HIPAA-compliant medical interpreter platform. Over the past two years, he has led its adoption across healthcare organizations, helping providers bridge dialect gaps, reduce compliance risk and improve patient safety. His mission is simple: ensure health equity by removing language barriers at the point of care.
Audio & Connectivity Challenges in Remote Interpreting
Eyal Heldenberg
Co-founder and CEO, building No Barrier
October 13, 2024