Loading One-to-Many relationships efficiently with Spring Data JPA and Postgres



This content originally appeared on DEV Community and was authored by Mircea Cadariu

Intro

One-to-Many, or parent-child, relationships are a common occurrence in application development. Off the cuff, we can think of numerous examples: actual parents and their children, a sports team and its players, and so on. A natural way to store them is to use a relational database and employ foreign key constraints to enforce data integrity.

This post features the following use case: loading the list of parents together with all their corresponding children, in one go. We'll look at how to do it efficiently when using Spring Data JPA and Postgres. We'll start with the slowest approach (hello, N+1!) and show how to make it faster through successive refinements. The code is available in this repo.

Authors and their books

Let's use a familiar scenario: authors and their books, a classic one-to-many relationship. These are the tables:

create table authors (
    id             bigint primary key generated always as identity,
    name           varchar(255) not null,
    bio            text
);

create table books (
    id             bigint primary key generated always as identity,
    title          varchar(255) not null,
    isbn           varchar(13),
    published_year integer,
    author_id      bigint not null,
    constraint fk_books_author foreign key (author_id) references authors(id)
);

create index idx_books_author_id on books(author_id);

Populating the tables

Let's insert some data to work with. We'll generate 1000 authors, each with 30 books.

insert into authors (name, bio)
select
    'Author ' || seq,
    'Bio for author ' || seq
from generate_series(1, 1000) seq;

insert into books (author_id, title, isbn, published_year)
select
    a.id as author_id,
    'Book ' || gs as title,
    random()::bigint::text as isbn,
    (2000 + FLOOR(random() * 25))::int as published_year
from authors a, generate_series(1, 30) as gs;

Entities

We’ll create two entity classes, Author and Book. In order to learn how to map them correctly, read this post.

In the Book class, we’ll reference Author like this.

 @ManyToOne(fetch = LAZY)
 @JoinColumn(name = "author_id")
 private Author author;

Accordingly, in the Author class:

@OneToMany(mappedBy = "author", fetch = LAZY)
private List<Book> books = new ArrayList<>();

At this point, we have the schema, the test data and the entities. So far so good! We’re ready to retrieve our data.

Querying

As always, we want to keep a close eye on the queries Hibernate is generating for us under the hood:

@DynamicPropertySource 
static void registerPgProperties(DynamicPropertyRegistry registry) {
  registry.add("spring.jpa.show_sql", () -> true);
}

Iteration 1

We'll start with a pure Java approach, which actually looks quite elegant. Have a look:

return authorRepository
        .findAll()
        .stream()
        .map(author -> {
            List<Book> books = author.getBooks();
            return new AuthorWithBooksDto(
                    author.getId(),
                    author.getName(),
                    author.getBio(),
                    books.stream()
                            .map(book -> new BookDto(
                                    book.getTitle(),
                                    book.getIsbn(),
                                    book.getPublishedYear()))
                            .collect(toList()));
        })
        .collect(toList());

(screenshot: the console flooded with one query per author)

When running it, we notice our console filling up with queries: one to load the authors, then one more for every author's books. You've just witnessed the N+1 problem. You want to avoid it if you want a fast application.
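To put a number on it: the stream-based approach issues one query for the author list, plus one per author when the lazily loaded books collection is touched. A quick back-of-the-envelope sketch (the class and method names here are illustrative, not part of the repo):

```java
public class NPlusOneCost {

    // one query loads all authors; each author.getBooks() then triggers one more
    static int nPlusOneQueries(int authors) {
        return 1 + authors;
    }

    public static void main(String[] args) {
        int authors = 1000;
        System.out.println("Lazy-loading approach: " + nPlusOneQueries(authors) + " queries");
        System.out.println("Single JOIN FETCH:     1 query");
    }
}
```

With our 1000 authors, that is 1001 round trips to the database for a single request, each one paying network latency on top of query execution.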

Iteration 2

Alright, let’s make this better. This is what we’ll add in the repository class:

 @Query("SELECT a FROM Author a JOIN FETCH a.books")
 List<Author> findAllWithBooks();

Great stuff! It turns out this cuts the time roughly in half. Hibernate now generates only one query; as a rule of thumb, try to load all the data you need with a single query. This is it:

select a1_0.id, a1_0.bio, b1_0.author_id, b1_0.id, b1_0.isbn,
       b1_0.published_year, b1_0.title, a1_0.name
from authors a1_0
join books b1_0 on a1_0.id = b1_0.author_id

Let’s have a look at the explain plan to learn how the database retrieved our data.

 Hash Join  (cost=31.50..631.58 rows=30000 width=65) (actual time=0.476..8.912 rows=30000 loops=1)
   Hash Cond: (b1_0.author_id = a1_0.id)
   Buffers: shared hit=230
   ->  Seq Scan on books b1_0  (cost=0.00..521.00 rows=30000 width=29) (actual time=0.009..2.466 rows=30000 loops=1)
         Buffers: shared hit=221
   ->  Hash  (cost=19.00..19.00 rows=1000 width=36) (actual time=0.431..0.432 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 77kB
         Buffers: shared hit=9
         ->  Seq Scan on authors a1_0  (cost=0.00..19.00 rows=1000 width=36) (actual time=0.010..0.169 rows=1000 loops=1)
               Buffers: shared hit=9

It joined the two tables, authors and books, using a hash join. Note, though, the rows=30000 on the first line of the explain plan: we retrieved 30000 rows from the database. Let's also look at the shape of these rows. Below are the first 10 rows of the result set.

(screenshot: the first rows of the result set; the author columns are duplicated for every book)

Because of the join, the author columns are repeated for every book, so we are fetching a result set into our application that's much larger than necessary, leading to waste.
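The waste is easy to quantify with our test data (class and method names below are illustrative): the join produces one row per (author, book) pair, so every author's columns travel over the wire once per book instead of once overall.

```java
public class JoinResultSize {

    // a JOIN FETCH result set has one row per (author, book) pair
    static int joinRows(int authors, int booksPerAuthor) {
        return authors * booksPerAuthor;
    }

    // each author's columns appear once per book, so all but one copy is redundant
    static int redundantAuthorCopies(int authors, int booksPerAuthor) {
        return joinRows(authors, booksPerAuthor) - authors;
    }

    public static void main(String[] args) {
        System.out.println("Rows fetched:            " + joinRows(1000, 30));
        System.out.println("Redundant author copies: " + redundantAuthorCopies(1000, 30));
    }
}
```

For 1000 authors with 30 books each, that is 30000 rows fetched and 29000 redundant copies of the author columns.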

Iteration 3

Let’s construct the expected shape of the response fully database-side, using Postgres features. We’ll have to write a native query.

@Query(value = """
            SELECT
                a.id,
                a.name,
                a.bio,
                jsonb_agg(
                    jsonb_build_object(
                        'id', b.id,
                        'title', b.title,
                        'isbn', b.isbn,
                        'publishedYear', b.published_year
                    )
                ) AS books
            FROM authors a
            JOIN books b ON b.author_id = a.id
            GROUP BY a.id
        """, nativeQuery = true)
    List<Object[]> findAllWithBooksAsJson();

This is what the result looks like:

 652 | Author 652  | Bio for author 652  | [{"id": 652, "isbn": "0", "title": "Book 1 of author 652", "publishedYear": 2013}, {"id": 1652, "isbn": "1", "title": "Book 2 of author 652", "publishedYear": 2010}, {"id": 2652, "isbn": "1", "title": "Book 3 of author 652", "publishedYear": 2000}, {"id": 3652, "isbn": "0", "title": "Book 4 of author 652", "publishedYear": 2002}, {"id": 4652, "isbn": "1", "title": "Book 5 of author 652", "publishedYear": 2019}, {"id": 5652, "isbn": "1", "title": "Book 6 of author 652", "publishedYear": 2006}, {"id": 6652, "isbn": "1", "title": "Book 7 of author 652", "publishedYear": 2020}, {"id": 7652, "isbn": "1", "title": "Book 8 of author 652", "publishedYear": 2004}, {"id": 8652, "isbn": "1", "title": "Book 9 of author 652", "publishedYear": 2010}, {"id": 9652, "isbn": "1", "title": "Book 10 of author 652", "publishedYear": 2022}, {"id": 10652, "isbn": "1", "title": "Book 11 of author 652", "publishedYear": 2001}, {"id": 11652, "isbn": "1", "title": "Book 12 of author 652", "publishedYear": 2010}, {"id": 12652, "isbn": "1", "title": "Book 13 of author 652", "publishedYear": 2024}, {"id": 13652, "isbn": "1", "title": "Book 14 of author 652", "publishedYear": 2021}, {"id": 14652, "isbn": "0", "title": "Book 15 of author 652", "publishedYear": 2004}, {"id": 15652, "isbn": "1", "title": "Book 16 of author 652", "publishedYear": 2001}, {"id": 16652, "isbn": "1", "title": "Book 17 of author 652", "publishedYear": 2001}, {"id": 17652, "isbn": "0", "title": "Book 18 of author 652", "publishedYear": 2020}, {"id": 18652, "isbn": "0", "title": "Book 19 of author 652", "publishedYear": 2009}, {"id": 19652, "isbn": "1", "title": "Book 20 of author 652", "publishedYear": 2000}, {"id": 20652, "isbn": "1", "title": "Book 21 of author 652", "publishedYear": 2000}, {"id": 21652, "isbn": "1", "title": "Book 22 of author 652", "publishedYear": 2013}, {"id": 22652, "isbn": "1", "title": "Book 23 of author 652", "publishedYear": 2012}, {"id": 23652, 
"isbn": "0", "title": "Book 24 of author 652", "publishedYear": 2014}, {"id": 24652, "isbn": "0", "title": "Book 25 of author 652", "publishedYear": 2001}, {"id": 25652, "isbn": "1", "title": "Book 26 of author 652", "publishedYear": 2016}, {"id": 26652, "isbn": "1", "title": "Book 27 of author 652", "publishedYear": 2014}, {"id": 27652, "isbn": "0", "title": "Book 28 of author 652", "publishedYear": 2024}, {"id": 28652, "isbn": "0", "title": "Book 29 of author 652", "publishedYear": 2016}, {"id": 29652, "isbn": "0", "title": "Book 30 of author 652", "publishedYear": 2012}]
 273 | Author 273  | Bio for author 273  | [{"id": 273, "isbn": "1", "title": "Book 1 of author 273", "publishedYear": 2007}, {"id": 1273, "isbn": "1", "title": "Book 2 of author 273", "publishedYear": 2010}, {"id": 2273, "isbn": "1", "title": "Book 3 of author 273", "publishedYear": 2023}, {"id": 3273, "isbn": "0", "title": "Book 4 of author 273", "publishedYear": 2010}, {"id": 4273, "isbn": "1", "title": "Book 5 of author 273", "publishedYear": 2020}, {"id": 5273, "isbn": "0", "title": "Book 6 of author 273", "publishedYear": 2013}, {"id": 6273, "isbn": "0", "title": "Book 7 of author 273", "publishedYear": 2008}, {"id": 7273, "isbn": "1", "title": "Book 8 of author 273", "publishedYear": 2012}, {"id": 8273, "isbn": "1", "title": "Book 9 of author 273", "publishedYear": 2001}, {"id": 9273, "isbn": "0", "title": "Book 10 of author 273", "publishedYear": 2011}, {"id": 10273, "isbn": "0", "title": "Book 11 of author 273", "publishedYear": 2005}, {"id": 11273, "isbn": "1", "title": "Book 12 of author 273", "publishedYear": 2012}, {"id": 12273, "isbn": "1", "title": "Book 13 of author 273", "publishedYear": 2010}, {"id": 13273, "isbn": "1", "title": "Book 14 of author 273", "publishedYear": 2013}, {"id": 14273, "isbn": "1", "title": "Book 15 of author 273", "publishedYear": 2019}, {"id": 15273, "isbn": "1", "title": "Book 16 of author 273", "publishedYear": 2004}, {"id": 16273, "isbn": "1", "title": "Book 17 of author 273", "publishedYear": 2022}, {"id": 17273, "isbn": "0", "title": "Book 18 of author 273", "publishedYear": 2021}, {"id": 18273, "isbn": "0", "title": "Book 19 of author 273", "publishedYear": 2004}, {"id": 19273, "isbn": "1", "title": "Book 20 of author 273", "publishedYear": 2022}, {"id": 20273, "isbn": "1", "title": "Book 21 of author 273", "publishedYear": 2021}, {"id": 21273, "isbn": "1", "title": "Book 22 of author 273", "publishedYear": 2016}, {"id": 22273, "isbn": "1", "title": "Book 23 of author 273", "publishedYear": 2002}, {"id": 23273, 
"isbn": "0", "title": "Book 24 of author 273", "publishedYear": 2015}, {"id": 24273, "isbn": "1", "title": "Book 25 of author 273", "publishedYear": 2010}, {"id": 25273, "isbn": "1", "title": "Book 26 of author 273", "publishedYear": 2021}, {"id": 26273, "isbn": "1", "title": "Book 27 of author 273", "publishedYear": 2016}, {"id": 27273, "isbn": "0", "title": "Book 28 of author 273", "publishedYear": 2017}, {"id": 28273, "isbn": "0", "title": "Book 29 of author 273", "publishedYear": 2024}, {"id": 29273, "isbn": "1", "title": "Book 30 of author 273", "publishedYear": 2007}]
...
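Since the repository method returns List<Object[]>, we still need a small mapping step in the application. A minimal sketch of what that could look like, assuming a hypothetical AuthorWithBooksJson DTO that keeps the jsonb_agg output as a raw JSON string (the exact Java types the driver hands back for id and the jsonb column can vary, hence the defensive conversions):

```java
public class AuthorRowMapper {

    // hypothetical DTO: the books column is kept as the raw JSON string produced by jsonb_agg
    public record AuthorWithBooksJson(long id, String name, String bio, String booksJson) {}

    // maps one Object[] row returned by findAllWithBooksAsJson() into the DTO
    static AuthorWithBooksJson map(Object[] row) {
        return new AuthorWithBooksJson(
                ((Number) row[0]).longValue(), // id arrives as some numeric type
                (String) row[1],               // name
                (String) row[2],               // bio
                String.valueOf(row[3]));       // jsonb column: driver-dependent type, so stringify
    }

    public static void main(String[] args) {
        Object[] row = {652L, "Author 652", "Bio for author 652", "[{\"id\": 652}]"};
        AuthorWithBooksJson dto = map(row);
        System.out.println(dto.name() + " wrote: " + dto.booksJson());
    }
}
```

From there, the raw JSON string can be deserialized into book DTOs with whatever JSON library the application already uses.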

Let’s have a look at the explain plan as well.

 HashAggregate  (cost=856.58..869.08 rows=1000 width=68) (actual time=68.398..74.411 rows=1000 loops=1)
   Group Key: a.id
   Batches: 1  Memory Usage: 23361kB
   Buffers: shared hit=230
   ->  Hash Join  (cost=31.50..631.58 rows=30000 width=57) (actual time=0.385..7.345 rows=30000 loops=1)
         Hash Cond: (b.author_id = a.id)
         Buffers: shared hit=230
         ->  Seq Scan on books b  (cost=0.00..521.00 rows=30000 width=29) (actual time=0.012..1.821 rows=30000 loops=1)
               Buffers: shared hit=221
         ->  Hash  (cost=19.00..19.00 rows=1000 width=36) (actual time=0.356..0.357 rows=1000 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 77kB
               Buffers: shared hit=9
               ->  Seq Scan on authors a  (cost=0.00..19.00 rows=1000 width=36) (actual time=0.005..0.109 rows=1000 loops=1)
                     Buffers: shared hit=9

The same hash join you saw before, plus the aggregation needed to build the in-line JSON list of books for every author. We're now returning only 1000 rows, 30 times fewer than before. On my laptop, this turned out just a little bit faster than Option 2 above. However, in a typical three-tier architecture, where the database is separated from the application by a network, the much smaller result set makes all the difference: less data needs to be transported over the network for every user request.
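A rough estimate of the savings, under an assumed average width for the repeated author columns (the 50 bytes per author row below is an assumption for illustration, not a measurement):

```java
public class PayloadEstimate {

    // approximate bytes saved by aggregating database-side: the author columns
    // are no longer repeated once per book, only sent once per author
    static long duplicatedAuthorBytes(int authors, int booksPerAuthor, int authorColumnBytes) {
        long redundantCopies = (long) authors * booksPerAuthor - authors;
        return redundantCopies * authorColumnBytes;
    }

    public static void main(String[] args) {
        // assumption: id + name + bio together take ~50 bytes per author row
        long saved = duplicatedAuthorBytes(1000, 30, 50);
        System.out.println("Roughly " + (saved / 1024) + " KB of duplicated author data avoided per request");
    }
}
```

Around a megabyte and a half per request with these assumptions; the exact figure depends on your row widths, but the shape of the saving is the same.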

Conclusion

We have seen three ways to retrieve the data we needed and looked closely at why one approach is more performant than the others. Option 3 is the fastest but also the least portable, because it relies on a native query that leverages Postgres-specific functionality.

I hope you enjoyed reading, and that you can apply these ideas to make your own applications faster.

Thanks for reading! Until next time!


This content originally appeared on DEV Community and was authored by Mircea Cadariu