The problem
Let’s say you have a simple blog system. It’s written in Elixir, uses
Ecto (v3.x) and you have the following schemas: Post
, Author
,
Comment
, Tag
and Tagging
.
- Each Post has one Author.
- Each Post has many Commnents.
- Each Comment has one Author.
- Each Post has many Tags, through join table Tagging.
In multiple places in your application, you will have to list the posts. First, on the blog index page, you will likely want to see all of the most recent posts. Similar in RSS feed. In the blog sidebar, you may want to display some subset of these.
Then you may want to have Author page where a short bio of author appears, and a list of posts they wrote. You may also need to implement search, filtering by tag, pagination etc.
In all of these use cases you are likely to re-use significant portion of the same code, so at some point you might want to de-duplicate your code base and provide some sort of generic Posts finder, that allows you to preload some associations if you know you’ll need them, but also makes it possible to filter - by say specific Tag.
You obviously want to fetch the data and filter it as efficiently as possible. How do you achieve this with Elixir & Ecto?
If you ware using Ecto, you may use Ecto query composition with it’s functional API.
Where Ecto really shines is, however, it’s Ecto.Query domain-specific language (DSL).
The DSL is not only arguably nicer to use than chaining functions, but also is more powerful if you know how to use it correctly, allowing to make complicated joins, filters and preloads.
Our desired API
We want to have a single function, which takes two optional maps as parameters: the first map will tell us which associations to preload, the second one will specify optional filters.
In either case, our top-level function will return list of Posts
.
We want something like this for preloading:
# return list of posts with no filtering and no
# associations preloaded:
list_posts()
# return list of posts with Author association preloaded
list_posts(%{author: true})
# return list of posts with Comment and it's Author preloaded:
list_posts(%{comment: %{author: true}})
# return list of posts with everything related to it preloaded:
list_posts(%{author: true, comment: %{author: true}, tags: true})
And then we want to optionally apply some filters, let’s say to filter list of comments with author by given tag, you’d do:
# return list of posts with Comment and it's Author preloaded and
# filtered by tag "elixir":
list_posts(%{comment: %{author: true}}, %{tag: "elixir"})
And if you want to filter by tag and Post author, you would go with:
list_posts(%{comment: %{author: true}}, %{tag: "elixir", author: "Author 1"})
The API is simple and flexible enough for us to re-use in multiple places in our code base, but how would one go about implementing it?
tl;dr just give me the code
You can find the example to this blog post on GitHub with module implementing all the preloading and filtering API defined here and relevant tests here
Ecto.Query DSL compositions
You can compose Ecto.Query queries before executing them. In all of the examples below we will use the pattern, where we first build a simple query, append additional conditions, joins and preloads, and then execute it whenever it’s ready.
Preloading Ecto associations with SQL JOINs
We can start with a simple query which loads up all the Posts:
base_query = from(posts in Post)
Then, we expand the query with a JOIN, and use the data it returns to
populate Post’s author
association:
base_query = from(posts in Post)
query_with_author_preloaded = from(posts in base_query,
left_join: author in Author,
on: author.id == posts.author_id,
preload: [author: author]
)
If you then execute this code, you will get the list of Posts with their Authors already loaded, but importantly you will perform that operation using single SQL command behind the scenes:
Repo.all(query_with_author_preloaded)
# 12:42:53.206 [debug] QUERY OK source="posts" db=1.1ms decode=2.2ms queue=1.1ms
# SELECT p0."id", p0."title", p0."body", p0."author_id", a1."id", a1."name" FROM "posts" AS p0 LEFT OUTER JOIN "authors" AS a1 ON a1."id" = p0."author_id" []
[
%Post{
__meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
author: %Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
id: 309,
name: "Author 1",
posts: #Ecto.Association.NotLoaded<association :posts is not loaded>
},
author_id: 309,
body: "Post body 1",
comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
id: 305,
taggings: #Ecto.Association.NotLoaded<association :taggings is not loaded>,
tags: #Ecto.Association.NotLoaded<association :tags is not loaded>,
title: "Post 1"
},
%Post{
__meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
author: %Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
id: 310,
name: "Author 2",
posts: #Ecto.Association.NotLoaded<association :posts is not loaded>
},
author_id: 310,
body: "Post body 2",
comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
id: 306,
taggings: #Ecto.Association.NotLoaded<association :taggings is not loaded>,
tags: #Ecto.Association.NotLoaded<association :tags is not loaded>,
title: "Post 2"
}
]
Nice, we got the list of Posts with authors preloaded. Let’s think on
how to build our original list_posts/2
function:
def list_posts(params \\ %{}, filters \\ %{}) do
build_query(params, filters)
|> Repo.all()
end
defp build_query(params, _filters) do
query = base_query()
query
|> maybe_preload_author(params[:author])
end
defp maybe_preload_author(query, nil), do: query
defp maybe_preload_author(query, _) do
from(posts in query,
left_join: author in Author,
on: author.id == posts.author_id,
preload: [author: author]
)
end
What we are doing here is always composing our end result query of
output of base_query/0
function, and optionally append join and
preload clause to fetch author of post for us in one go.
We achieve conditional join and preload, by pattern match in
maybe_preload_author/2
function. Only if params[:author]
evaluates
to true, we preload author in one go.
We want to use the same pattern many times, so our function can grow in number of preloads and filters, and finally provide desired functionality:
defp build_query(params, filters) do
query = base_query()
query
|> maybe_preload_author(params[:author])
# uncomment below as you implement the rest:
# |> maybe_preload_comments(params[:comments])
# |> maybe_preload_tags(params[:tags])
# |> maybe_filter_by_tag(filters[:tag])
end
Preloading Ecto associations with SQL JOINs
Preloading Ecto associations with single SQL command, as tempting as it may sound, is not always a good idea.
Join-preloading something like comments is a really bad idea - if your post has 100 comments this means the body of the post will be sent 100 times over the wire. Joins create a huge n*m table and send it to the application. Ecto dedups the rows, so you don’t see it, but it’s still extremely wasteful - especially with something like posts and comments that are very data-intesive records. That’s the primary reason why ecto does separate queries for preloads
Michał Muskała (@michalmuskala) April 16, 2019
Bummer. We will have to use different strategy here, and execute second SQL query to preload our comments. Luckily, the resulting code is only slightly different from preloading using JOINs:
defp build_query(params, filters) do
query = base_query()
query
|> maybe_preload_author(params[:author])
|> maybe_preload_comments(params[:comments]) # add this line
end
...
defp maybe_preload_comments(query, nil), do: query
defp maybe_preload_comments(query, true), do: maybe_preload_comments(query, %{})
defp maybe_preload_comments(query, params) do
comments_query = from(comment in Comment)
from(posts in query, preload: [comments: ^comments_query])
end
With the above code we can now preload Posts and it’s Authors in one SQL command, and Ecto will load up Comments using separate SQL command as it will be more efficient in this case:
# return list of Posts with Comment and Posts's Author
list_posts(%{author: true, comment: true})
Super cool!
Mixing both types of preloads
What if we want to preload Comment’s Authors too? Since
these are :belongs_to
association, we could load it in the same SQL
query as the one which preloads Comments. The function call we want to
handle looks like this:
# return list of posts with Comment and it's Author preloaded:
list_posts(%{author: true, comment: %{author: true}})
To achieve this, we only slightly need to amend our
maybe_prealod_comments/1
function, and add one condition to preload
Comment’s Authors:
defp maybe_preload_comments(query, params) do
comments_query =
from(comment in Comment)
|> maybe_preload_comment_author(params[:author])
from(posts in query, preload: [comments: ^comments_query])
end
defp maybe_preload_comment_author(query, nil), do: query
defp maybe_preload_comment_author(query, _) do
from(comment in query,
left_join: author in Author,
on: author.id == comment.author_id,
preload: [author: author]
)
end
Filtering the data
We also want to fitler the data: by Tag, and by Author’s name. Note that preloading Tags is different to filtering by tag.
When preloading the Tags, we want to go through Taggings first, so our query to fetch all the Tags will involve JOIN clause:
defp build_query(params, filters) do
query = base_query()
query
|> maybe_preload_author(params[:author])
|> maybe_preload_comments(params[:comments])
|> maybe_preload_tags(params[:tags])
|> maybe_filter_by_tag(filters[:tag])
|> maybe_filter_by_author(filters[:author])
end
...
defp maybe_preload_tags(query, nil), do: query
defp maybe_preload_tags(query, _) do
tags_query =
from(tagging in Tagging,
inner_join: tag in Tag,
on: tag.id == tagging.tag_id,
preload: [tag: tag]
)
query = from(posts in query, preload: [{:taggings, ^tags_query}, :tags])
end
Similar, filtering by Tag’s name will require having to go through Taggings:
defp maybe_filter_by_tag(query, nil), do: query
defp maybe_filter_by_tag(query, tag_name) do
from(posts in query,
inner_join: tagging in Tagging,
on: tagging.post_id == posts.id,
inner_join: tag in Tag,
on: tag.id == tagging.tag_id,
where: tag.name == ^tag_name
)
end
And finally, remaining piece of functionality is to filter by Author’s name:
defp maybe_filter_by_author(query, nil), do: query
defp maybe_filter_by_author(query, author_name) do
from(posts in query,
inner_join: author in Author,
on: author.id == posts.author_id,
where: author.name == ^author_name
)
end
We did it! We can now filter & preload our list of Posts in very flexible, efficient manner!
Example project with implementation & tests
You can find the example to this blog post on GitHub with module implementing all the preloading and filtering API defined here and relevant tests here
Post by Hubert Łępicki
Hubert is partner at AmberBit. Rails, Elixir and functional programming are his areas of expertise.