How LLMs Transform Language into Vectors: The Power of Embeddings



This content originally appeared on DEV Community and was authored by Cristian Sifuentes

If you’ve ever typed a prompt into ChatGPT and wondered “How on earth does this AI understand me?” — the secret lies in vectors and embeddings.

These mathematical tools allow large language models (LLMs) to interpret, compare, and generate text with astonishing accuracy. In this article, we’ll explore these concepts in depth — but in a way that blends math, real-world analogies, and some Python so you can see it in action.

What Are Vectors in the Context of LLMs?

A vector is simply a list of numbers that places something inside a mathematical space. In school, you may have plotted points using X and Y coordinates. Add Z, and you move into 3D space.

LLMs take this idea further — way further. Instead of 2 or 3 dimensions, they use hundreds or thousands. Each dimension represents some learned feature of meaning.

For example, a vector might look like this:

[0.23, -0.87, 1.12, 0.56, ...]

These numbers aren’t random — they encode relationships, context, and meaning. Words with similar meanings live close together in this vector space.
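To make “close together” concrete, here is a minimal sketch with invented 3-dimensional vectors (real models learn hundreds or thousands of values per token during training):

import numpy as np

# Hand-picked toy vectors, for illustration only
cat = np.array([0.90, 0.80, 0.10])
dog = np.array([0.85, 0.75, 0.20])
car = np.array([0.10, 0.20, 0.95])

def cosine(a, b):
    # cosine similarity = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"cat vs dog: {cosine(cat, dog):.3f}")  # high: related concepts
print(f"cat vs car: {cosine(cat, car):.3f}")  # low: unrelated concepts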

What Are Embeddings?

Think of embeddings as the “soul” of a word in an AI model. They’re n-dimensional vectors that describe words (or tokens) across many hidden dimensions, such as:

  • Semantic meaning (Is it a fruit? Is it an emotion?)
  • Contextual usage (Formal? Informal?)
  • Relationships (Synonyms, antonyms, related concepts)
  • Emotional tone (Positive? Neutral?)

Real-World Analogy:

Imagine reviewing a restaurant in an app. Your rating might consider:

  1. Quality of the food
  2. Ambience
  3. Service
  4. Freshness
  5. Music

Each of these is a dimension. An embedding works the same way, but instead of restaurants, it rates concepts, words, and sentences across many hidden dimensions.
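As a quick sketch of that analogy (all numbers invented), each restaurant becomes a point in a 5-dimensional space, and the distance between points tells you how alike two restaurants are:

import numpy as np

# Hypothetical 5-dimension "restaurant embedding":
# [food, ambience, service, freshness, music]
taqueria   = np.array([0.95, 0.60, 0.80, 0.90, 0.40])
steakhouse = np.array([0.90, 0.85, 0.75, 0.85, 0.55])
dive_bar   = np.array([0.40, 0.20, 0.30, 0.25, 0.90])

# Similar profiles sit close together, just as similar words do in an embedding space
print(np.linalg.norm(taqueria - steakhouse))  # small distance: similar places
print(np.linalg.norm(taqueria - dive_bar))    # large distance: very different places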

Embeddings and Multilingual Magic

Embeddings allow LLMs to connect words across languages without direct translation.

For example:

  • “Aguacate” (Spanish)
  • “Avocado” (English)

Even though the words are different, their embeddings might share dimensions like fruit, green, creamy, edible. This lets models match meaning, not spelling.
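If you want to see this yourself, one option is a multilingual embedding model. The sketch below uses the sentence-transformers library and the paraphrase-multilingual-MiniLM-L12-v2 model purely as an illustration; neither appears elsewhere in this article, and any multilingual embedding model would do:

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# A multilingual model, so Spanish and English words share one vector space
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

vectors = model.encode(["aguacate", "avocado", "bicycle"])

print(cosine_similarity([vectors[0]], [vectors[1]]))  # "aguacate" vs "avocado": high
print(cosine_similarity([vectors[0]], [vectors[2]]))  # "aguacate" vs "bicycle": lower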

Why Every Word in Your Prompt Matters

When you write a prompt, every single word changes the vector path the model takes. Swap one synonym for another, and you could land in a slightly different part of the semantic map — producing a different answer.

It’s like giving directions: “Go left at the café” vs. “Go left at the coffee shop”. Same idea, but the model’s route might shift.

Implementing Embeddings in Python

Let’s get hands-on using Hugging Face Transformers. First, install the dependencies:

pip install transformers torch scikit-learn

Then load a model, embed two sentences, and compare them:

from transformers import AutoTokenizer, AutoModel
import torch
from sklearn.metrics.pairwise import cosine_similarity

# Load model & tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Function to get sentence embeddings
def get_embedding(text):
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**tokens)
        return outputs.last_hidden_state.mean(dim=1)

# Compare two sentences
emb1 = get_embedding("Artificial intelligence is fascinating")
emb2 = get_embedding("AI is amazing")

similarity = cosine_similarity(emb1, emb2)
print(f"Cosine Similarity: {similarity[0][0]:.4f}")

A cosine similarity close to 1 means the two texts are very similar in meaning; a value close to 0 means they are largely unrelated.
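Reusing get_embedding and cosine_similarity from the snippet above, you can also check the “café vs. coffee shop” intuition from earlier: two near-synonymous phrasings land close together, while an unrelated sentence lands much farther away.

# Continuing from the snippet above: swap one synonym and compare
emb_a = get_embedding("Go left at the cafe")
emb_b = get_embedding("Go left at the coffee shop")
emb_c = get_embedding("The stock market fell sharply today")

print(cosine_similarity(emb_a, emb_b))  # high: same idea, slightly different wording
print(cosine_similarity(emb_a, emb_c))  # lower: a different topic entirely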

Real-World Applications of Embeddings

  • Semantic Search: Find relevant results based on meaning, not keywords (see the sketch after this list).
  • Machine Translation: Align words across languages.
  • Chatbots: Understand varied phrasing for the same intent.
  • Recommendation Systems: Suggest similar items based on conceptual closeness.
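As a concrete illustration of the first item, here is a minimal semantic-search sketch that reuses the get_embedding helper defined earlier; the documents and query are hypothetical:

import numpy as np

# A toy in-memory "index": embed every document once, then score a query against all of them
documents = [
    "How to cook pasta at home",
    "Introduction to neural networks",
    "Best hiking trails near the city",
]
doc_vectors = np.vstack([get_embedding(d) for d in documents])

query = "beginner guide to deep learning"
scores = cosine_similarity(get_embedding(query), doc_vectors)[0]

best = int(np.argmax(scores))
print(f"Best match: {documents[best]} (score {scores[best]:.4f})")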

Key Takeaways

  • Vectors are mathematical representations of concepts.
  • Embeddings are learned vectors that capture meaning and relationships.
  • LLMs use embeddings to navigate complex, high-dimensional semantic spaces.
  • Your choice of words directly impacts the embeddings — and the output.

💡 Understanding embeddings isn’t just an academic exercise.

If you’re building AI-powered tools — from semantic search to multilingual assistants — knowing how vectors and embeddings work will help you fine-tune models, improve accuracy, and craft better prompts.

✍ Written by: Cristian Sifuentes – Full-stack dev crafting scalable apps with [NET – Azure], [Angular – React], Git, SQL & AI integrations. Dark mode, clean code, and atomic commits enthusiast.
