Tutorial – Building an AI Deepfake Detector Chrome Plugin



This content originally appeared on DEV Community and was authored by aurigin

Introduction

Generative AI has become so good and so widespread that it has flooded the web, making it hard for our eyes and ears to tell what is real and what is fake. That’s why, in this tutorial, we’ll build a Chrome extension that helps you check whether the content you’re consuming is real or AI-generated.

When the user clicks a “Scan” button, the extension will listen to the audio playing in the current browser tab for a few seconds, send it to Aurigin.ai’s free AI deepfake detection API, and display the results.

What we will cover

  1. Chrome extension setup: Creating a Chrome extension (Manifest V3) with necessary permissions.

  2. User interface: Building a clean popup UI with a “Scan” button and result display.

  3. Capturing tab audio: Using Chrome’s tab capture API to grab live audio from the active tab.

  4. Recording audio: Recording a 5-10 second clip of the tab’s audio and preparing it for analysis.

  5. Calling the deepfake detection API: Uploading the audio clip to Aurigin’s /predict endpoint, including how to obtain an API key.

  6. Displaying results: Parsing the API response (predictions and confidence scores) and displaying a clear verdict.

By the end, you’ll have a working Chrome extension that can warn you if the audio you are hearing is likely a deepfake. Let’s get started!

1. Setting up the Chrome extension

First, we’ll create a new folder for the extension (e.g. audio-deepfake-detector). This folder will hold the manifest (manifest.json), the popup’s source files (popup.html, popup.css, popup.js), and the images referenced later in the tutorial (icon.png and aurigin_logo_black.png).

Create a file manifest.json with the following content:

{
  "manifest_version": 3,
  "name": "AI Audio Deepfake Detector",
  "version": "1.0",
  "description": "Detects AI-generated (deepfake) audio in the current tab in real-time.",
  "permissions": [
    "tabCapture"
  ],
  "host_permissions": [
    "https://aurigin.ai/*"
  ],
  "action": {
    "default_popup": "popup.html",
    "default_icon": "icon.png"
  }
}

Explanations:

  • We use Manifest V3 (the latest format for Chrome extensions). The extension’s name, version, and description are provided.

  • The extension needs permission to capture tab audio, so we include “tabCapture” in the permissions. Chrome’s Tab Capture API lets us access a MediaStream of the tab’s audio (and video if needed).

  • We also include a host permission for the Aurigin API domain. This allows our extension to make requests to https://aurigin.ai/* (where the deepfake detection API is hosted) from our extension. Without this, the extension’s fetch call to the API may be blocked by Chrome’s extension security policy.

  • The “action” section defines the extension’s browser action (the little toolbar icon). We specify a default_popup HTML file (popup.html) which will serve as the extension’s UI when clicked. We also reference an icon.png (you need to supply a 16×16 or 48×48 icon image for the extension).

With the manifest ready, let’s create the UI.

2. Creating the extension’s UI (popup.html & popup.css)

Our extension’s popup will be a small HTML page with a “Scan” button and an area to display results. We want the UI to be clean and easy to use: one click to scan, then clearly show whether the audio is real or fake along with a confidence score.

Create a file popup.html:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>AI Audio Deepfake Detector</title>
    <link rel="stylesheet" href="popup.css" />
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
  </head>
  <body>
    <div class="container">
      <!-- Header -->
      <div class="header">
        <div class="logo">
          <div class="logo-icon">🤖</div>
          <h1>Deepfake Detector</h1>
        </div>
        <p class="subtitle">AI-powered audio authenticity analysis</p>
      </div>

      <!-- Main Content -->
      <div class="content">
        <!-- Scan Button -->
        <button id="scanBtn" class="scan-button">
          <div class="button-content">
            <svg class="scan-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
              <path d="M21 16V8a2 2 0 0 0-1-1.73l-7-4a2 2 0 0 0-2 0l-7 4A2 2 0 0 0 3 8v8a2 2 0 0 0 1 1.73l7 4a2 2 0 0 0 2 0l7-4A2 2 0 0 0 21 16z"></path>
              <polyline points="3.27,6.96 12,12.01 20.73,6.96"></polyline>
              <line x1="12" y1="22.08" x2="12" y2="12"></line>
            </svg>
            <span class="button-text">Analyze Audio</span>
          </div>
          <div class="button-loading hidden">
            <div class="spinner"></div>
            <span>Analyzing...</span>
          </div>
        </button>

        <!-- Status Display -->
        <div id="status" class="status-message hidden">
          <div class="status-icon">🎯</div>
          <p></p>
        </div>

        <!-- Result Display -->
        <div id="result" class="result-card hidden">
          <div class="result-header">
            <div class="result-icon"></div>
            <h3 class="result-title"></h3>
          </div>
          <div class="result-content">
            <div class="confidence-bar">
              <div class="confidence-fill"></div>
            </div>
            <p class="confidence-text"></p>
            <div class="result-details"></div>
          </div>
        </div>

        <!-- Instructions -->
        <div class="instructions">
          <h4>How it works:</h4>
          <ul>
            <li>Captures 5 seconds of audio from the active tab</li>
            <li>Analyzes using advanced AI detection</li>
            <li>Shows authenticity results with confidence scores</li>
          </ul>
        </div>

        <!-- Aurigin Branding -->
        <div class="branding">
          <p class="branding-text">Made with ❤ by</p>
          <img src="aurigin_logo_black.png" alt="Aurigin" class="aurigin-logo">
        </div>
      </div>
    </div>

    <script src="popup.js"></script>
  </body>
</html>

And popup.css for some basic styling:

/* Modern Chrome Extension Popup Styles */
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
  background: white;
  color: #333;
  width: 350px;
  height: 450px;
  overflow: hidden;
  margin: 0;
  padding: 0;
}

.container {
  background: white;
  overflow: hidden;
}

/* Header Styles */
.header {
  background: linear-gradient(135deg, rgb(0, 15, 60) 0%, rgb(0, 150, 255) 100%);
  color: white;
  padding: 20px 16px;
  text-align: center;
}

.logo {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 10px;
  margin-bottom: 6px;
}

.logo-icon {
  font-size: 24px;
  filter: drop-shadow(0 2px 4px rgba(0, 0, 0, 0.2));
}

h1 {
  font-size: 18px;
  font-weight: 700;
  letter-spacing: -0.5px;
}

.subtitle {
  font-size: 12px;
  opacity: 0.9;
  font-weight: 400;
}

/* Content Styles */
.content {
  padding: 20px 16px;
  height: calc(100% - 120px);
  overflow-y: auto;
}

/* Button Styles */
.scan-button {
  width: 100%;
  background: #E1FF5A;
  color: #131C29;
  border: none;
  padding: 14px 20px;
  border-radius: 10px;
  font-size: 15px;
  font-weight: 600;
  cursor: pointer;
  transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
  box-shadow: 0 4px 12px rgba(225, 255, 90, 0.3);
  position: relative;
  overflow: hidden;
  margin-bottom: 16px;
}

.scan-button:hover:not(:disabled) {
  transform: translateY(-2px);
  box-shadow: 0 8px 20px rgba(225, 255, 90, 0.4);
}

.scan-button:active:not(:disabled) {
  transform: translateY(0);
}

.scan-button:disabled {
  background: #9E9E9E;
  color: #666;
  cursor: not-allowed;
  transform: none;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.button-content {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 8px;
}

.button-loading {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 8px;
}

.scan-icon {
  width: 20px;
  height: 20px;
  transition: transform 0.3s ease;
}

.scan-button:hover .scan-icon {
  transform: scale(1.1);
}

/* Spinner Animation */
.spinner {
  width: 16px;
  height: 16px;
  border: 2px solid rgba(255, 255, 255, 0.3);
  border-top: 2px solid white;
  border-radius: 50%;
  animation: spin 1s linear infinite;
}

@keyframes spin {
  0% { transform: rotate(0deg); }
  100% { transform: rotate(360deg); }
}

/* Status Message */
.status-message {
  background: #f8f9fa;
  border: 1px solid #e9ecef;
  border-radius: 10px;
  padding: 12px;
  margin: 12px 0;
  text-align: center;
  transition: all 0.3s ease;
}

.status-message.loading {
  background: linear-gradient(135deg, #e3f2fd 0%, #bbdefb 100%);
  border-color: #2196f3;
}

.status-message.error {
  background: linear-gradient(135deg, #ffebee 0%, #ffcdd2 100%);
  border-color: #f44336;
}

.status-icon {
  font-size: 24px;
  margin-bottom: 8px;
}

.status-message p {
  font-size: 14px;
  color: #666;
  line-height: 1.5;
}

/* Result Card */
.result-card {
  background: white;
  border-radius: 12px;
  padding: 16px;
  margin: 12px 0;
  box-shadow: 0 4px 16px rgba(0, 0, 0, 0.1);
  border: 1px solid #e9ecef;
  animation: slideIn 0.5s cubic-bezier(0.4, 0, 0.2, 1);
}

@keyframes slideIn {
  from {
    opacity: 0;
    transform: translateY(20px);
  }
  to {
    opacity: 1;
    transform: translateY(0);
  }
}

.result-header {
  display: flex;
  align-items: center;
  gap: 12px;
  margin-bottom: 16px;
}

.result-icon {
  width: 40px;
  height: 40px;
  border-radius: 50%;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 20px;
  font-weight: bold;
  flex-shrink: 0;
  min-width: 40px;
  min-height: 40px;
}

.result-icon.real {
  background: linear-gradient(135deg, #4CAF50 0%, #45a049 100%);
  color: white;
}

.result-icon.fake {
  background: linear-gradient(135deg, #f44336 0%, #d32f2f 100%);
  color: white;
}

.result-icon.error {
  background: linear-gradient(135deg, #ff9800 0%, #f57c00 100%);
  color: white;
}

.result-title {
  font-size: 18px;
  font-weight: 600;
  color: #333;
}

.result-content {
  margin-top: 16px;
}

/* Confidence Bar */
.confidence-bar {
  width: 100%;
  height: 8px;
  background: #e9ecef;
  border-radius: 4px;
  overflow: hidden;
  margin-bottom: 12px;
}

.confidence-fill {
  height: 100%;
  border-radius: 4px;
  transition: width 1s cubic-bezier(0.4, 0, 0.2, 1);
}

.confidence-fill.real {
  background: linear-gradient(90deg, #4CAF50 0%, #45a049 100%);
}

.confidence-fill.fake {
  background: linear-gradient(90deg, #f44336 0%, #d32f2f 100%);
}

.confidence-text {
  font-size: 14px;
  color: #666;
  text-align: center;
  margin-bottom: 12px;
}

.result-details {
  background: #f8f9fa;
  border-radius: 8px;
  padding: 12px;
  font-size: 13px;
  color: #555;
  line-height: 1.4;
}

/* Instructions */
.instructions {
  background: #f8f9fa;
  border-radius: 10px;
  padding: 12px;
  margin-top: 12px;
}

.instructions h4 {
  font-size: 14px;
  font-weight: 600;
  color: #333;
  margin-bottom: 8px;
}

.instructions ul {
  list-style: none;
  padding: 0;
}

.instructions li {
  font-size: 12px;
  color: #666;
  margin-bottom: 4px;
  padding-left: 16px;
  position: relative;
}

.instructions li:before {
  content: "✓";
  position: absolute;
  left: 0;
  color: #4CAF50;
  font-weight: bold;
}

/* Branding */
.branding {
  text-align: center;
  margin-top: 16px;
  padding-top: 12px;
  border-top: 1px solid #e9ecef;
}

.branding-text {
  font-size: 11px;
  color: #666;
  margin: 0 0 8px 0;
  font-weight: 500;
  letter-spacing: 0.3px;
}

.aurigin-logo {
  height: 28px;
  opacity: 0.6;
  transition: opacity 0.2s ease;
}

.aurigin-logo:hover {
  opacity: 0.8;
}

/* Utility Classes */
.hidden {
  display: none !important;
}

/* Responsive adjustments */
@media (max-width: 400px) {
  body {
    width: 320px;
    height: 400px;
  }

  .content {
    padding: 16px 12px;
  }

  .header {
    padding: 16px 12px;
  }
}

/* Smooth transitions for all interactive elements */
* {
  transition: color 0.2s ease, background-color 0.2s ease, border-color 0.2s ease;
}

/* Focus styles for accessibility */
.scan-button:focus {
  outline: 2px solid rgb(0, 150, 255);
  outline-offset: 2px;
}

/* Custom scrollbar */
::-webkit-scrollbar {
  width: 6px;
}

::-webkit-scrollbar-track {
  background: #f1f1f1;
  border-radius: 3px;
}

::-webkit-scrollbar-thumb {
  background: #c1c1c1;
  border-radius: 3px;
}

::-webkit-scrollbar-thumb:hover {
  background: #a8a8a8;
}

UI explained:

  • The popup has a header, an “Analyze Audio” button, a status message area, a result card, a short “How it works” list, and Aurigin branding. The status message and the result card are hidden by default and only appear once a scan starts.

  • The “Analyze Audio” button (id="scanBtn") will trigger the audio capture and analysis. We’ve also included a disabled-state style (grayed out) for when a scan is in progress.

  • The result card (id="result") will be shown once we get a response. We’ll apply the class .fake or .real to its icon and confidence bar so the verdict appears in red or green depending on the outcome.

With the HTML and CSS set up, the popup UI is ready.

Next, let’s implement the logic in popup.js to capture audio and call the API.

3. Capturing audio from the current tab

Chrome provides the chrome.tabCapture API for capturing the content of a tab, including its audio. This API can only be invoked in response to a user action (like clicking our extension), which is perfect since the user will click our “Scan” button to start. We requested the “tabCapture” permission in the manifest, so we can use this API.

In popup.js, first select the UI elements, declare a couple of module-level variables for cleanup, and set up a click handler:

// DOM Elements
const scanBtn = document.getElementById("scanBtn");
const statusText = document.getElementById("status");
const resultText = document.getElementById("result");
const buttonContent = document.querySelector(".button-content");
const buttonLoading = document.querySelector(".button-loading");

// Audio resources kept at module scope so cleanupAudioResources() can stop them between scans
let currentStream = null;
let currentAudioContext = null;

// UI State Management
function updateButtonState(isLoading) {
  if (isLoading) {
    scanBtn.disabled = true;
    buttonContent.classList.add("hidden");
    buttonLoading.classList.remove("hidden");
  } else {
    scanBtn.disabled = false;
    buttonContent.classList.remove("hidden");
    buttonLoading.classList.add("hidden");
  }
}

function updateStatus(message, type = "default") {
  if (message) {
    statusText.innerHTML = `
      <div class="status-icon">${getStatusIcon(type)}</div>
      <p>${message}</p>
    `;
    statusText.className = `status-message ${type}`;
    statusText.classList.remove("hidden");
  } else {
    statusText.classList.add("hidden");
  }
}

function getStatusIcon(type) {
  const icons = {
    default: "🎯",
    loading: "⏳",
    error: "⚠",
    success: "✅"
  };
  return icons[type] || icons.default;
}

function cleanupAudioResources() {
  // Stop and cleanup current stream
  if (currentStream) {
    currentStream.getTracks().forEach(track => track.stop());
    currentStream = null;
  }

  // Close audio context
  if (currentAudioContext && currentAudioContext.state !== 'closed') {
    currentAudioContext.close();
    currentAudioContext = null;
  }
}

// Main scan function
scanBtn.addEventListener("click", async () => {
  // Cleanup any existing resources first
  cleanupAudioResources();

  // Reset UI state
  updateButtonState(true);
  updateStatus("Capturing audio from tab...", "loading");

  try {
    // Capture audio from the active tab
    const stream = await new Promise((resolve, reject) => {
      chrome.tabCapture.capture({
        audio: true,
        video: false,
      }, (stream) => {
        if (chrome.runtime.lastError) {
          reject(new Error(chrome.runtime.lastError.message));
        } else if (!stream) {
          reject(new Error(
            "Could not capture tab audio. Make sure a tab is playing audio and try again."
          ));
        } else {
          resolve(stream);
        }
      });
    });

    // Store the stream for cleanup
    currentStream = stream;

    // Ensure the user can still hear the audio while we capture it
    const audioContext = new AudioContext();
    currentAudioContext = audioContext;
    const source = audioContext.createMediaStreamSource(stream);
    source.connect(audioContext.destination); // route audio back to output

    updateStatus("Recording 5 seconds of audio...", "loading");


  } catch (err) {
    console.error("Capture error:", err);
    showResult({
      type: "error",
      icon: "⚠",
      title: "Capture Failed - Please Try Again",
      confidence: 0,
      details: ""
    });
    updateStatus("Failed to capture audio. Please try again.", "error");
    // Cleanup audio resources
    cleanupAudioResources();
    updateButtonState(false);
  }
});

Explanation:

  • When the user clicks the Analyze Audio button, we disable the button (to prevent double-clicks during the scan) and update the status text to say “Capturing audio from tab…”.

  • We call chrome.tabCapture.capture({ audio: true, video: false }) to get a stream of the current tab’s audio. This returns a MediaStream of the tab’s audio output. We check if we got a stream; if not, we throw an error (maybe no audio is playing, or permission was denied).

Important: Capturing the tab’s audio will mute the tab’s audio output by default (Chrome stops playing it to the user once captured). To preserve a good UX, we immediately create an AudioContext, make a media source from the captured stream, and connect it to the context’s destination (the speakers). This way, the user continues hearing the audio while we’re recording it. It’s essentially wiring the captured audio back to playback:

// Ensure the user can still hear the audio while we capture it
const audioContext = new AudioContext();
currentAudioContext = audioContext;
const source = audioContext.createMediaStreamSource(stream);
source.connect(audioContext.destination); // route audio back to output

Now the user doesn’t notice any interruption in the audio.

Next, we’ll record a 5-second snippet of this audio stream for analysis.

4. Recording 5–10 Seconds of Audio

We’ll use the MediaStream Recording API (MediaRecorder) to record the audio stream. Chrome’s MediaRecorder captures audio in WebM/Opus format by default. Since the Aurigin API expects common formats like WAV, MP3, M4A, FLAC, or OGG, we will later convert the recording to a WAV file before uploading.

Add the following to popup.js inside the try block, after obtaining stream and setting up the AudioContext:


    // Record 5 seconds of audio from the stream using MediaRecorder
    const recorder = new MediaRecorder(stream);
    const chunks = [];

    recorder.ondataavailable = (e) => {
      if (e.data.size > 0) {
        chunks.push(e.data);
      }
    };

    recorder.onstop = async () => {
      try {
        updateStatus("Processing audio...", "loading");

        // Combine chunks into a single Blob
        const audioBlob = new Blob(chunks, { type: recorder.mimeType });
        console.log("Recorded audio blob:", audioBlob);

        // Convert the recorded audio to WAV format
        const wavBlob = await convertToWav(audioBlob);
        console.log("WAV audio blob:", wavBlob);

        updateStatus("Analyzing with AI...", "loading");

        // Send the WAV blob to the deepfake detection API
        const result = await sendToDeepfakeAPI(wavBlob);

        // Display the result
        showResult(result);
        updateStatus("", "success");

      } catch (error) {
        console.error("Processing error:", error);
        showResult({
          type: "error",
          icon: "⚠",
          title: "Analysis Failed - Please Try Again",
          confidence: 0,
          details: ""
        });
        updateStatus("Analysis failed. Please try again.", "error");
      } finally {
        // Cleanup audio resources
        cleanupAudioResources();
        updateButtonState(false);
      }
    };

    // Start recording and automatically stop after 5 seconds
    recorder.start();
    setTimeout(() => recorder.stop(), 5000);

Explanation:

  • We create a MediaRecorder for the captured stream. We don’t specify a MIME type here, letting Chrome use its default (audio/webm; codecs=opus).

  • We collect data chunks in an array as they become available (recorder.ondataavailable). When recording stops, we’ll have one or more Blob chunks of audio data.

  • In recorder.onstop, we combine all chunks into one Blob (our complete recorded audio). We log it for debugging.

  • We then call a function convertToWav(audioBlob) to convert the recorded audio to WAV format. We’ll implement this next.

  • After conversion, we call sendToDeepfakeAPI(wavBlob) to send the audio to Aurigin’s API (we’ll implement this soon as well).

  • Finally, once the API call is done, we re-enable the Scan button so the user can run another scan if desired.

  • We start the recorder and use setTimeout to stop it after 5000 ms (5 seconds). This captures a 5-second audio snippet. You can adjust this duration if needed (minimum ~3s for the API, but 10s gives two chunks of data).

  • We wrap everything in a try/catch to handle any errors (e.g. if capturing fails). In case of error, we log it and show a message on the status text, then re-enable the button.

At this point, when you click “Analyze Audio”, the extension will capture the tab’s audio for 5 seconds. Next, we need to implement the audio conversion and API call functions we referenced: convertToWav and sendToDeepfakeAPI.

5. Converting the Recorded Audio to WAV Format

Chrome’s MediaRecorder gave us audio in a compressed format (WebM/Opus). The Aurigin API supports several formats, including WAV, MP3, M4A, FLAC, and OGG – but not WebM explicitly. To ensure compatibility, we’ll convert our recorded audio Blob to a WAV file (PCM 16-bit). WAV is uncompressed, but since we’re only dealing with a few seconds of audio, the file size is manageable.

We can convert the audio in the browser by decoding the recorded blob to raw audio samples and then constructing a WAV file header. An easy way to decode is to leverage the browser’s audio capabilities: create an AudioContext and use decodeAudioData on the Blob’s array buffer. This gives us an AudioBuffer containing the PCM samples. Then we write those samples into a WAV file binary structure.

Add the following helper function to popup.js (outside of the event listener):

// Converts an audio Blob into a WAV format Blob
async function convertToWav(audioBlob) {
  // Decode audio data to get raw PCM samples
  const arrayBuffer = await audioBlob.arrayBuffer();
  const audioCtx = new AudioContext();
  const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
  audioCtx.close(); // release the temporary context so repeated scans don't pile up AudioContexts

  const numChannels = audioBuffer.numberOfChannels;
  const sampleRate = audioBuffer.sampleRate;
  const numFrames = audioBuffer.length;
  const bitsPerSample = 16;

  // WAV file header specs
  const headerSize = 44;
  const dataSize = numFrames * numChannels * (bitsPerSample / 8);
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  // Helper to write string into DataView
  function writeString(offset, str) {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  }

  // RIFF header
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size - 8
  writeString(8, "WAVE");
  // fmt chunk
  writeString(12, "fmt ");
  view.setUint32(16, 16, true); // PCM chunk size
  view.setUint16(20, 1, true); // format: 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * (bitsPerSample / 8), true); // byte rate
  view.setUint16(32, numChannels * (bitsPerSample / 8), true); // block align
  view.setUint16(34, bitsPerSample, true);
  // data chunk
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  // Write interleaved PCM samples
  let offset = 44;
  for (let i = 0; i < numFrames; i++) {
    for (let ch = 0; ch < numChannels; ch++) {
      let sample = audioBuffer.getChannelData(ch)[i];
      // Convert from [-1,1] range to 16-bit integer
      sample = Math.max(-1, Math.min(1, sample));
      view.setInt16(
        offset,
        sample < 0 ? sample * 0x8000 : sample * 0x7fff,
        true
      );
      offset += 2;
    }
  }

  // Create a Blob with WAV MIME type
  return new Blob([buffer], { type: "audio/wav" });
}

This function might look intense, so let’s break it down:

  • We read the audioBlob into an ArrayBuffer and decode it using AudioContext.decodeAudioData, which gives us an AudioBuffer containing raw PCM data (floating-point samples). We then close the temporary AudioContext since it’s no longer needed.

  • We gather audio metadata: number of channels, sample rate, number of audio frames (samples per channel), and set our target bit depth to 16 bits.

  • We allocate an ArrayBuffer for the WAV file: 44 bytes for the header plus data for all the samples (dataSize). The DataView view will let us insert multi-byte values into the buffer.

  • We define writeString to easily write ASCII strings (like “RIFF”, “WAVE”, etc.) into the buffer.

  • We then construct the WAV header:
    – the RIFF chunk descriptor (“RIFF”, file size, “WAVE”),
    – the fmt subchunk (audio format metadata): PCM format (1), number of channels, sample rate, byte rate, block align, and bits per sample,
    – the data subchunk header with the data size.

  • After the 44-byte header, we interleave the audio samples from the AudioBuffer. We loop through each frame (i) and each channel (ch), retrieve the sample, clamp it to [-1,1], and scale to 16-bit integer range. We use view.setInt16 to write the little-endian 16-bit sample value.

  • Finally, we wrap the buffer in a Blob with type ‘audio/wav’. This Blob now represents a WAV audio file containing our recorded snippet.

Now we have a WAV blob ready to send. Let’s implement the API call.

6. Sending the Audio to Aurigin’s Deepfake Detection API

Aurigin.ai provides a REST API endpoint POST /predict to analyze audio and determine if it’s AI-generated or real. We will use the direct file upload method: send our audio file as multipart/form-data under the file field, along with our API key in the header. The API will respond with a JSON object containing predictions per 5-second chunk and confidence scores.

Link to the documentation

Sign up on Aurigin.ai to obtain your free API key, then replace “YOUR_API_KEY_HERE” in the code below with your actual key.

Aurigin.ai website
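
Before wiring the call into the extension, you can optionally sanity-check your key from the command line. The sketch below is only an illustration, not part of the extension: it assumes Node 18+ (for the built-in fetch, FormData, and Blob) and a short local test file named sample.wav (a placeholder name).

// check_key.mjs — optional sanity check for your Aurigin API key (requires Node 18+)
// "sample.wav" is a placeholder: point it at any short WAV file you have locally.
import { readFile } from "node:fs/promises";

const apiKey = "YOUR_API_KEY_HERE";

// Read the test file and wrap it in a Blob so it can be attached as form data
const bytes = await readFile("sample.wav");
const formData = new FormData();
formData.append("file", new Blob([bytes], { type: "audio/wav" }), "sample.wav");

// Same endpoint, header, and field name the extension will use
const response = await fetch("https://aurigin.ai/api-ext/predict", {
  method: "POST",
  headers: { "x-api-key": apiKey },
  body: formData,
});

console.log(response.status, await response.json());

Run it with node check_key.mjs; a 200 status and a predictions array mean the key and endpoint are working.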

Add the sendToDeepfakeAPI function in popup.js:

async function sendToDeepfakeAPI(wavBlob) {
  // Prepare form data
  const formData = new FormData();
  formData.append("file", wavBlob, "audio.wav");

  try {
    const response = await fetch("https://aurigin.ai/api-ext/predict", {
      method: "POST",
      headers: {
        "x-api-key": "YOUR_API_KEY_HERE",
      },
      body: formData,
    });

    if (!response.ok) {
      throw new Error(
        `API request failed: ${response.status} ${response.statusText}`
      );
    }

    const result = await response.json();
    console.log("API result:", result);

    return handleApiResult(result);

  } catch (err) {
    console.error("API request error:", err);
    throw new Error(`Failed to analyze audio: ${err.message}`);
  }
}

Explanation:

  • We create a FormData object and append our WAV Blob to it under the field name “file”. (The API expects a form field named file containing the audio. We also give it a filename “audio.wav” which isn’t strictly necessary but good practice.)

  • We use fetch to POST to https://aurigin.ai/api-ext/predict. We include the header x-api-key with our API key. The content type will be set automatically by the browser for the multipart/form-data boundary.

  • We check response.ok (an HTTP 2xx status) and throw an error if the status is not OK.

  • If successful, we parse the JSON response. The response will look like this (example from docs):

{
  "error": [null],
  "global_probability": [0.9584],
  "predictions": ["fake"]
}

We call handleApiResult(result) to interpret this data and update our UI (next step). If any network error or exception occurs, we catch it, log it, and show an error in the status text.

7. Displaying the Results to the User

Finally, we need to take the result from the API and show a meaningful message to the user. The API provides an array of predictions for each chunk (“fake” or “real”) and corresponding global_probability scores (0.0 to 1.0), which we will convert into confidence percentages. We should convey whether the audio is likely AI-generated or not, and how confident the model is.

Add the handleApiResult function to popup.js:

function handleApiResult(result) {
  // Check for errors in the response
  if (result.error && result.error.some(e => e !== null)) {
    return {
      type: "error",
      icon: "⚠",
      title: "Analysis Error - Please Try Again",
      confidence: 0,
      details: ""
    };
  }

  // Check if we have valid predictions
  if (!result.predictions || !result.global_probability) {
    return {
      type: "error",
      icon: "⚠",
      title: "Invalid Response - Please Try Again",
      confidence: 0,
      details: ""
    };
  }

  const predictions = result.predictions;
  const probabilities = result.global_probability;

  // Calculate overall confidence and determine result
  const validProbabilities = probabilities.filter(p => p !== null && !isNaN(p));
  const avgProbability = validProbabilities.length > 0 
    ? validProbabilities.reduce((sum, p) => sum + p, 0) / validProbabilities.length 
    : 0.5;

  // Determine if any segment is fake
  const anyFake = predictions.some(p => p === 'fake');

  let confidencePercent;

  if (anyFake) {
    // For fake: probability between 0.5-1, closer to 1 = higher confidence
    // Convert 0.5-1 range to 0-100% confidence
    confidencePercent = Math.round((avgProbability - 0.5) * 200);
  } else {
    // For real: probability between 0-0.5, closer to 0 = higher confidence
    // Convert 0-0.5 range to 0-100% confidence (inverted)
    confidencePercent = Math.round((0.5 - avgProbability) * 200);
  }

  // Ensure confidence is between 0 and 100
  confidencePercent = Math.max(0, Math.min(100, confidencePercent));

  if (anyFake) {
    return {
      type: "fake",
      icon: "🚨",
      title: "This voice is likely AI generated",
      confidence: confidencePercent,
      details: ""
    };
  } else {
    return {
      type: "real",
      icon: "✅",
      title: "This voice is likely Human",
      confidence: confidencePercent,
      details: ""
    };
  }
}

Explanation:

  • We first check if the result.error array contains any non-null entries (meaning a chunk failed to process). If so, we inform the user there was an error for some part of the audio.

  • We extract the predictions and global_probability arrays. For each 5-second segment of audio, predictions[i] will be “fake” or “real”, and global_probability[i] will be a score between 0.0 and 1.0: 0 means real with 100% confidence, 1 means fake with 100% confidence, and 0.5 means the model is completely unsure (0% confidence). That is why the code maps the score onto a 0–100% range; for example, the sample score of 0.9584 with a “fake” prediction becomes (0.9584 − 0.5) × 200 ≈ 92% confidence.

  • We decide an overall verdict: if any chunk was classified as fake, we treat the overall audio as potentially AI-generated. (This is a cautious approach: even a short fake segment means the audio contains deepfake content.)

  • Depending on anyFake, we return a result object whose title either warns that the voice is likely AI-generated or states that it is likely human. Its type field (“fake” or “real”) is used to apply the matching CSS classes so the verdict appears in red or green.

  • The details field is left empty in this version, but you could fill it by iterating over each segment’s prediction and confidence, e.g. “Segment 1: AI-generated (confidence 95.8%)”.

  • Back in the recorder’s onstop handler, the returned object is passed to showResult, which clears the “Analyzing…” status, fills in the result card (icon, title, confidence bar), and removes the hidden class to make it visible.
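
One function we have referenced throughout but not yet written is showResult, which the click handler and the recorder’s onstop callback call with the object returned by handleApiResult. The API doesn’t dictate how results are rendered, so here is one minimal sketch that fills in the result card defined in popup.html and reuses the .real/.fake/.error classes from popup.css — adapt it to your liking. Add it to popup.js:

// Renders the result object ({ type, icon, title, confidence, details })
// into the result card defined in popup.html.
function showResult({ type, icon, title, confidence, details }) {
  const card = document.getElementById("result");
  const iconEl = card.querySelector(".result-icon");
  const titleEl = card.querySelector(".result-title");
  const fillEl = card.querySelector(".confidence-fill");
  const confidenceEl = card.querySelector(".confidence-text");
  const detailsEl = card.querySelector(".result-details");

  // Colored icon circle and verdict text (.real / .fake / .error from popup.css)
  iconEl.textContent = icon;
  iconEl.className = `result-icon ${type}`;
  titleEl.textContent = title;

  // Confidence bar and label (left empty for error results)
  fillEl.className = `confidence-fill ${type}`;
  fillEl.style.width = `${confidence}%`;
  confidenceEl.textContent = type === "error" ? "" : `Confidence: ${confidence}%`;

  // Optional extra details (an empty string in this tutorial)
  detailsEl.textContent = details;

  // Reveal the card
  card.classList.remove("hidden");
}

You may also want to hide the card again at the start of each scan (for example, resultText.classList.add("hidden") at the top of the click handler) so a previous verdict doesn’t linger while a new analysis runs.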

With handleApiResult and showResult in place, our popup.js is complete. Here’s a quick recap of the popup.js structure:

// Element selectors and event listener on Scan button
// ... (see above)

// chrome.tabCapture.capture to get audio stream
// ... (see above)

// MediaRecorder to record 5s, onstop -> convertToWav -> sendToDeepfakeAPI
// ... (see above)

// convertToWav function
// ... (see above)

// sendToDeepfakeAPI function
// ... (see above)

// handleApiResult and showResult functions
// ... (see above)

Now, load the extension in Chrome for testing:

  • Go to chrome://extensions in your browser, enable “Developer mode”, and click “Load unpacked”. Select the extension project folder (audio-deepfake-detector). The extension should load.

  • Ensure a tab is playing audio (e.g., a YouTube video or any audio source). Click the extension’s icon to open the popup and then click “Analyze Audio”.

  • The button will disable and you’ll see “Capturing audio from tab…” followed by “Recording 5 seconds of audio…”. During this time, the audio should continue playing (we routed it back to the output).

  • After recording, you’ll see “Processing audio…” and then “Analyzing with AI…” while the clip is converted and uploaded to the API. Within a couple of seconds, you should see the verdict displayed.

Congratulations, you’ve built a working audio deepfake detector! 🎉

Conclusion

In this tutorial, we created a Chrome extension that captures live audio from a browser tab and uses Aurigin.ai’s cloud AI service to detect deepfake audio in real time. We covered how to use Chrome’s Tab Capture API to get tab audio, record a snippet with MediaRecorder, convert it to a suitable format, and call Aurigin’s Deepfake Detection API.

By following this tutorial, you now have a solid foundation for building browser extensions that interact with media streams and external AI services. With just a few seconds of audio, we can get an answer about authenticity – an impressive feat illustrating the power of combining web technology with AI services. Happy coding, and stay safe from deepfakes!

References:

  • Chrome Developers: chrome.tabCapture API Reference – explains capturing tab media and audio routing.

  • Aurigin.ai Documentation: Analyze audio (Predict) – details on the /predict endpoint, request format and example response. Aurigin’s API processes audio in 5-second chunks and returns one prediction per chunk.

  • Aurigin.ai Website: Deepfake Detection API – notes the service’s 98%+ accuracy and real-time response capabilities.

