🎀 The Voice Courier: Meet RTP



This content originally appeared on DEV Community and was authored by SIP GAMES

“SDP made the rules, RTP plays the game.”

In the previous episode of SIP GAMES, we peeked inside the SDP offer that tells your opponent how you’d like to play: which codecs, which ports, and which IPs. But who actually carries the media?

🎮 Enter RTP: the Real-time Transport Protocol.

🧳 What is RTP?

Think of RTP as the courier that carries your voice across the network, broken into little time-stamped, sequence-numbered packages.

  • SIP sets up the call
  • SDP describes the media setup
  • RTP sends the actual media (voice/video)

RTP usually runs on top of UDP (User Datagram Protocol). UDP adds no retransmission delay, and real-time audio copes better with an occasional lost packet than with a late one, just like a real conversation.

🧬 RTP Packet Structure

Here’s the basic layout of an RTP packet:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Contributing Source (CSRC) Identifiers (optional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Payload (audio/video)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Let’s decode this header:

🔍 RTP Header Fields

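| Field | Size | What It Means |
| --- | --- | --- |
| V | 2 bits | Version (always 2) |
| P | 1 bit | Padding flag (set if padding bytes are appended to the payload) |
| X | 1 bit | Extension header present |
| CC | 4 bits | CSRC count (used in conferencing) |
| M | 1 bit | Marker bit (e.g. start of a talkspurt) |
| PT | 7 bits | Payload type (codec, e.g. 0 = PCMU, 96-127 = dynamic) |
| Sequence Number | 16 bits | Increments by 1 per packet; used to detect loss and reordering |
| Timestamp | 32 bits | Sampling instant of the first sample in the packet; used for media playback timing |
| SSRC | 32 bits | Sender’s unique stream ID |
| CSRC | 0-15 × 32 bits | IDs of other contributing streams (optional) |
| Payload | variable | Actual audio or video data |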
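
To make the field layout concrete, here is a minimal Python sketch that decodes the fixed 12-byte header plus any CSRC entries. The names `RtpHeader` and `parse_rtp_header` are made up for this example, and the sketch ignores header extensions and padding, so treat it as a starting point rather than a complete parser.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet must be at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc  # each CSRC entry is 4 bytes
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC count")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )


# Hand-built sample header: V=2, PT=0 (PCMU), seq=10567, ts=160000, SSRC=0x789ABC
sample = bytes([0x80, 0x00]) + struct.pack("!HII", 10567, 160000, 0x789ABC)
header = parse_rtp_header(sample)
print(header)
```

The payload starts at byte `12 + 4 * csrc_count` only when the X bit is clear; a fuller parser would also skip the extension header when X is set.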

🕒 Packetization Time (a.k.a. ptime)

What is packetization time?

It’s the duration of audio in each RTP packet, often advertised in SDP using a=ptime:20 (meaning 20 ms per packet).
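
As a rough illustration, the sketch below pulls the ptime value out of an SDP body with a regular expression. The helper name `ptime_from_sdp` and the sample SDP (including its documentation-range IP address) are invented for this example, and it simply returns the first `a=ptime` line it finds rather than tracking which media section the attribute belongs to.

```python
import re
from typing import Optional


def ptime_from_sdp(sdp: str) -> Optional[int]:
    """Return the first a=ptime value in milliseconds, or None if absent."""
    match = re.search(r"^a=ptime:(\d+)\s*$", sdp, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical SDP body for a single PCMU audio stream.
sample_sdp = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "a=ptime:20",
]) + "\r\n"

print(ptime_from_sdp(sample_sdp))
```

If the attribute is missing, the function returns None and the caller has to fall back to the codec’s default packetization.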

Common values:

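| Codec | Typical ptime | Result |
| --- | --- | --- |
| PCMU | 20 ms | 50 packets/sec, 160-byte payload |
| Opus | Variable | Frames from 2.5 ms up to 60 ms; 20 ms is common |
| G.729 | 20 ms | 50 packets/sec, small compressed payload (20 bytes) |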

🧮 Frequency of RTP Transmission

The number of RTP packets per second depends on the ptime in use: packets per second = 1000 / ptime (in ms).

Example:

  • If ptime is 20 ms, that’s 50 packets/second
  • If it’s 30 ms, ~33.3 packets/sec
  • Higher ptime = fewer packets = less header overhead, but more delay and more audio lost per dropped packet
  • Lower ptime = lower latency and smaller gaps on loss, but more packets
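
The trade-off above can be put into numbers. This is a small sketch, with made-up function names, that computes the packet rate and the on-the-wire bit rate for a given ptime. It assumes 40 bytes of headers per packet (20 for IPv4 without options, 8 for UDP, 12 for RTP without CSRCs) and ignores link-layer framing.

```python
def packets_per_second(ptime_ms: float) -> float:
    """Packet rate implied by a packetization time in milliseconds."""
    return 1000.0 / ptime_ms


def stream_bitrate_kbps(codec_kbps: float, ptime_ms: float,
                        overhead_bytes: int = 40) -> float:
    """Codec bit rate plus per-packet header overhead, in kbit/s.

    overhead_bytes defaults to 40: IPv4 (20) + UDP (8) + RTP (12).
    This is an assumption; IPv6, SRTP, or CSRC entries change it.
    """
    overhead_kbps = packets_per_second(ptime_ms) * overhead_bytes * 8 / 1000.0
    return codec_kbps + overhead_kbps


# G.711 produces 64 kbit/s of audio regardless of ptime.
for ptime in (10, 20, 30, 60):
    rate = packets_per_second(ptime)
    total = stream_bitrate_kbps(64, ptime)
    print(f"{ptime:>3} ms -> {rate:5.1f} pkt/s, {total:5.1f} kbit/s")
```

Under these assumptions, G.711 at 20 ms costs 80 kbit/s at the IP layer: 64 kbit/s of audio plus 16 kbit/s of headers.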

🧠 Why Do I Care?

If you’re implementing RTP or trying to debug call quality:

  • Jitter? Check packet arrival times and timestamps
  • Audio out of sync? Sequence or timestamp mismatch
  • Silence or gaps? Packets lost or arriving too late
  • Wrong codec? Check the Payload Type (PT) field
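
Two of the checks above, loss and jitter, can be sketched in a few lines of Python. The jitter estimate follows the running formula from RFC 3550 (J += (|D| - J) / 16); everything else here, including the function names and the input format of (arrival time in ms, RTP timestamp) pairs, is an assumption made for this example. The sketch also assumes packets arrive in order with no duplicates and ignores timestamp wraparound.

```python
from typing import Iterable, List, Tuple


def count_lost(seqs: List[int]) -> int:
    """Count missing packets from 16-bit sequence numbers in arrival order.

    Simplification: assumes no reordering and no duplicates.
    """
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) & 0xFFFF  # handles the 65535 -> 0 wrap
        lost += gap - 1
    return lost


def interarrival_jitter(packets: Iterable[Tuple[int, int]],
                        clock_rate: int = 8000) -> float:
    """Running jitter estimate in RTP timestamp units.

    packets yields (arrival_time_ms, rtp_timestamp) pairs.
    """
    jitter = 0.0
    prev_transit = None
    for arrival_ms, rtp_ts in packets:
        # Express arrival time in the same units as the RTP timestamp.
        transit = arrival_ms * clock_rate / 1000 - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


print(count_lost([10567, 10568, 10570]))  # one packet (10569) is missing
print(interarrival_jitter([(0, 0), (20, 160), (45, 320)]))  # third packet is 5 ms late
```

To report jitter in milliseconds, divide the result by the clock rate and multiply by 1000.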

RTP is everywhere in VoIP, and understanding this header lets you trace, debug, and build your own media streamers.

🛠 Example: A Real RTP Packet (with G.711)

Let’s say we’re using G.711 with a 20 ms ptime.

  • Payload Type: 0 (PCMU)
  • Sequence Number: 10567
  • Timestamp: 160000
  • SSRC: 0x789ABC
  • Payload: 160 bytes of G.711 data (8-bit μ-law samples at 8000 Hz)

That’s 8000 samples/sec × 0.020 sec = 160 samples, and at 1 byte per sample, 160 bytes of payload.
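
Here is a minimal sketch of how that example packet could be assembled in Python. The helper `build_rtp_packet` is invented for this illustration, the payload is just placeholder bytes (0xFF, commonly used as μ-law silence) rather than real encoded audio, and nothing is actually sent on the network.

```python
import struct

SAMPLE_RATE_HZ = 8000
PTIME_MS = 20
BYTES_PER_SAMPLE = 1  # G.711 encodes each sample as one byte

samples_per_packet = SAMPLE_RATE_HZ * PTIME_MS // 1000
# Placeholder payload, not real audio.
payload = bytes([0xFF]) * (samples_per_packet * BYTES_PER_SAMPLE)


def build_rtp_packet(pt: int, seq: int, timestamp: int, ssrc: int,
                     payload: bytes, marker: bool = False) -> bytes:
    """Build an RTP packet with no padding, extension, or CSRC entries."""
    byte0 = 2 << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (pt & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


packet = build_rtp_packet(0, 10567, 160000, 0x789ABC, payload)
# The next packet bumps the sequence number by 1 and the timestamp by 160 samples.
next_packet = build_rtp_packet(0, 10568, 160000 + samples_per_packet, 0x789ABC, payload)

print(samples_per_packet, len(packet))
```

The packet is 12 header bytes plus 160 payload bytes, so 172 bytes before UDP and IP headers are added.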

🎮 TL;DR

  • RTP carries media after SIP/SDP sets things up
  • Each RTP packet has a header with fields for version, PT, seq, timestamp, etc.
  • Ptime defines how much media is in each packet
  • Frequency of packets is based on ptime
  • Use RTP headers to debug and analyze VoIP issues

📦 Up Next in SIP GAMES:

“Spy Tools for VoIP Agents” 🕵️‍♂️

We’ll break down the best open-source tools like Wireshark, sipp, and rtpengine, and show you how to capture, simulate, and troubleshoot your VoIP calls like a pro.

Follow @sip_games to keep leveling up your VoIP game.

