Building SQL expressions with Sequel
I’ve recently started working on a new project which uses Sequel, and it reminded me how much I love it. For those who don’t know, Sequel is a superb alternative to Active Record. I wrote a gentle introduction to Sequel a while back.
One of the prettiest parts of Sequel for me is the API for building SQL expressions, so in this article I would like to talk more about that. But before we start with Sequel, I first want to talk a little bit about Active Record.
Active Record
In Active Record, equality and inclusion expressions are typically built with simple ruby hashes:
Movie.where(rating: 7, genre: ["Adventure", "Comedy"])
# WHERE movies.rating = 7 AND movies.genre IN ('Adventure', 'Comedy')
Negation is also easy with where.not(...)
:
Movie.where.not(rating: 7, genre: ["Adventure", "Comedy"])
# WHERE movies.rating != 7 AND movies.genre NOT IN ('Adventure', 'Comedy')
A nested hash can be used to qualify columns with a different table name (useful for joined datasets):
Movie.joins(:rating).where(rating: { imdb: 7, netflix: 5..8 })
# WHERE rating.imdb = 7 AND rating.netflix >= 5 AND rating.netflix <= 8
However, as soon as we need something that cannot be expressed by a hash, it’s common to just reach for raw SQL strings:
Movie.where("year >= ? OR name LIKE ?", 2012, "Summer%")
# or
Movie.where("year >= :year OR name LIKE :term", year: 2012, term: "Summer%")
Personally, I find this less readable than the hash version, because we need to use placeholders for values that need to be escaped, and provide actual values after the SQL string. Using raw SQL strings is also not very consistent, because now you have some expressions built in Ruby and other are provided in raw SQL.
To address these disadvantages, many people have discovered Arel, Active Record’s internal query builder. It has an expression builder API that allows us to avoid SQL strings for many types of expressions.
Movie.where(Movie.arel_table[:rating].gteq(7))
# SELECT movies.* FROM movies WHERE movies.rating >= 7
We can chain method calls to create more complex expressions, and add the
arel-helpers gem that aliases #arel_table
to #[]
:
Movie.where(Movie[:rating].gteq(7).or(Movie[:director].eq("Quentin Tarantino")))
# SELECT movies.* FROM movies WHERE movies.rating >= 7 OR movies.director = 'Quentin Tarantino'
However, whether this approach is better than using raw SQL strings is up for
debate. On one hand, Arel allows you to inline the values that need to be
escaped, so you don’t have to use the ?
placeholders. It’s also database
agnostic, since you’re building an abstract syntax tree that will be converted
to the database-specific SQL by the underlying Active Record adapter.
On the other hand, Arel expressions can get difficult to read as they typically
have much more parentheses than their raw SQL counterparts. Also, Arel is
considered private API, so it’s not optimized for convenience and it’s subject
to change. That’s probably the reason why I also found it very scarcely
documented (you won’t find ActiveRecord::Base.arel_table
in the API
documentation).
Sequel
Sequel’s SQL expression builder API is similar to Arel in the sense that it also builds an abstract syntax tree. However, in Sequel this is public API, it’s optimized for convenience, it’s stable, and it’s very well documented.
We will now walk through various SQL expression types, starting from the most common ones, and show how Sequel always has us covered.
Operators
We’ve already seen that Arel uses methods for building expressions with SQL operators, which can create a lot of nested parentheses. Sequel, on the other hand, uses actual Ruby operators, which make the expressions naturally readable.
Equality operators
Sequel can use simple Ruby hashes to do equality and inclusion, just like Active Record:
Movie.where(
director: "Quentin Tarantino", # string equality
genre: ["Adventure", "Drama"], # set inclusion
released: true, # TRUE/FALSE equality
premiered_at: nil, # NULL equality
)
But what if you want to join conditions with OR
instead of AND
? Or you want
to use inequality (>
, >=
, <=
, <
) or numeric (+
, -
, *
, /
)
operators? Or LIKE
conditions? Sure, you can always switch to raw SQL
strings, but with Sequel we can do better.
Sequel gives you the option to break away from hashes by providing an API for building expressions. One of the simplest expression objects is a column identifier:
Sequel[:director]
# Sequel::SQL::Identifier: director
The basic operators defined on expression objects are =~
(meaning “equals”)
and !~
(meaning “not equals”). The result of an operator is again an
expression object.
Sequel[:director] =~ "Quentin Tarantino"
# Sequel::SQL::BooleanExpression: director = 'Quentin Tarantino'
Sequel[:genre] !~ "Drama"
# Sequel::SQL::BooleanExpression: genre != 'Drama'
We can go back and rewrite our hash example using Sequel expressions:
Movie.where(
(Sequel[:director] =~ "Quentin Tarantino") &
(Sequel[:genre] =~ ["Adventure", "Drama"]) &
(Sequel[:released] =~ true) &
(Sequel[:premiered_at] =~ nil)
)
This is of course longer and more verbose than the hash version, it’s just to introduce the API, so bear with me.
Now, having to repeat Sequel[]
whenever we want to create an identifier
object can be cumbersome, so Sequel offers the virtual row block syntax that
automatically creates identifier objects via method_missing
. All you have to
do is pass a block to .where
and then you can omit the Sequel[]
:
Movie.where{
(director =~ "Quentin Tarantino") &
(genre =~ ["Adventure", "Drama"]) &
(released =~ true) &
(premiered_at =~ nil)
}
Active Record users might recognize this syntax if they have ever used the Squeel Active Record extension. Squeel was heavily inspired by Sequel’s virtual row blocks.
Boolean operators
You may have noticed the &
operators between the conditions above. As you
might have guessed, Sequel will translate those into AND
in SQL.
(Sequel[:director] =~ "Quentin Tarantino") & (Sequel[:genre] !~ "Drama")
# SQL: (director = 'Quentin Tarantino') AND (genre != 'Drama')
Up until now there was no benefit of using the expression API over hashes,
because we’ve only dealt with equality conditions joined with AND
. If we
wanted to use an OR
condition instead, in Active Record we would have to
switch to raw SQL. But with Sequel we can just use the |
operator:
(Sequel[:director] =~ "Quentin Tarantino") | (Sequel[:genre] !~ "Drama")
# SQL: (director = 'Quentin Tarantino') OR (genre != 'Drama')
Note that we have to put parentheses around the clauses, because &
and |
operators have higher predence than =~
and !~
.
You might be wondering why we can’t just use &&
and ||
instead of &
and
|
. The reason is that &&
and ||
are keywords built into the Ruby
interpreter, therefore it’s not possible to change their meaning. &
and |
,
on the other hand, are just methods which can be overriden like many other
operators.
def &(other)
# ...
end
def |(other)
# ...
end
The third boolean operator is negation (NOT
). With Sequel we can negate
expressions using the ~
unary operator:
~((Sequel[:director] =~ "Quentin Tarantino") | (Sequel[:genre] !~ "Drama"))
# SQL: (director != 'Quentin Tarantino') AND (genre = 'Drama')
If you prefer and are able to use plain hashes, you can still use the &
, |
,
and ~
operators defined on the Sequel
module:
Movie.where(Sequel.|(
Sequel.&(genre: "Comedy", released: true),
Sequel.~(genre: "Horror"),
))
# SELECT * FROM movies WHERE ((genre = 'Comedy') AND (released IS TRUE)) OR (genre != 'Horror')
Inequality operators
In addition to the equality operators (=
and !=
), Sequel also defines
inequality operators (>
, >=
, <=
, <
) on the expression objects.
Sequel[:rating] >= 7 # SQL: rating >= 7
We can then chain them with boolean operators in the same way:
Movie.where((Sequel[:imdb_rating] > 7.5) & (Sequel[:rotten_tomatoes_rating] >= 6))
# SELECT * FROM movies WHERE imdb_rating > 7.5 AND rotten_tomatoes_rating >= 6
And use virtual row blocks:
Movie.where{(imdb_rating > 7.5) & (rotten_tomatoes_rating >= 6)}
# SELECT * FROM movies WHERE imdb_rating > 7.5 AND rotten_tomatoes_rating >= 6
Numeric and String operators
The expression objects also define some numeric operators we’d expect:
Sequel[:order] + 2 # SQL: order + 2
Sequel[:order] - 2 # SQL: order - 2
Sequel[:order] * 2 # SQL: order * 2
Sequel[:order] / 2 # SQL: order / 2
On string expressions +
is defined as concatenation:
Sequel[:first_name] + " " + Sequel[:last_name]
# SQL: first_name || ' ' || last_name
Qualifying
Sequel doesn’t automatically qualify column identifiers, because SQL doesn’t either.
Movie.where(name: "Matrix")
# ActiveRecord: SELECT movies.* FROM movies WHERE movies.name = 'Matrix'
# Sequel: SELECT * FROM movies WHERE name = 'Matrix'
To qualify a column identifier with a table name, you can call another square brackets on the identifier:
Movie.where(Sequel[:movies][:name] => "Matrix")
Movie.where{movies[:name] =~ "Matrix"}
# SELECT * FROM movies WHERE movies.name = 'Matrix'
You can also qualify whole expressions with Sequel.deep_qualify
:
Sequel.deep_qualify(:movies,
(Sequel[:imdb_rating] + Sequel[:rotten_tomatoes_rating] >= 10) &
(Sequel[:released_at].extract(:year) >= 2000)
)
# SQL: (movies.imdb_rating + movies.rotten_tomatoes_rating >= 10) AND
# (extract(year FROM movies.released_at) >= 2000)
To just qualify all column identifiers in a dataset, use
Sequel::Dataset#qualify
:
Movie.select(:name).where(genre: "Adventure").order(:rating)
# SELECT name FROM movies WHERE (genre = 'Adventure') ORDER BY rating
Movie.select(:name).where(genre: "Adventure").order(:rating).qualify
# SELECT movies.name FROM movies WHERE (movies.genre = 'Adventure') ORDER BY movies.rating
Active Record does have the more convenient hash syntax for qualifying #where
conditions. However, I don’t find it ideal to have to nest conditions that have
the same level of importance.
# ActiveRecord
Movie.joins(:rating)
.where(
released: true, # movies.released IS TRUE
rating: { imdb: 7 } # rating.imdb = 7
)
# Sequel
Movie.association_join(:rating)
.where{
(released =~ true) & # movies.released IS TRUE
(rating[imdb] =~ 7) # rating.imdb = 7
}
Ordering
Ordered expressions can be created with Sequel.asc
and Sequel.desc
:
Sequel.desc(:rating)
# Sequel::SQL::OrderedExpression: rating DESC
Sequel.asc(:rating)
# Sequel::SQL::OrderedExpression: rating ASC
These can then be passed to #order
:
Movie.order(Sequel.desc(:rating))
Movie.order(Sequel[:rating].desc)
Movie.order{rating.desc}
# SELECT * FROM movies ORDER rating DESC
When I first came to Sequel after having used Active Record for many years, I
was wondering if there was something like order(rating: :desc)
in Sequel.
Jeremy explained that this syntax would violate the general rule in
Sequel that hashes are used for equality expressions. I also came to realize
how flexible is that #asc
and #desc
are methods.
For example, that allows Sequel to support NULLS FIRST
and NULLS LAST
via
the :nulls
option:
Movie.order{rating.desc(nulls: :last)}
# SELECT * FROM movies ORDER BY rating DESC NULLS LAST
You can also order by any complex expressions:
Movie.order{(imdb_rating + rotten_tomatoes_rating).desc}
# SELECT * FROM movies ORDER BY (imdb_rating + rotten_tomatoes_rating) DESC
Since ordered expressions are expressions, Sequel will recognize them anywhere, even for example in parameters for a window function:
Movie.select{row_number.function.over(partition: :director, order: released_at.asc)}
# SELECT row_number() OVER (PARTITION BY director ORDER BY released_at ASC) FROM movies
Like
Sequel supports LIKE and ILIKE expressions via #like
and #ilike
methods:
Sequel.like(:title, "Summer%")
# Sequel::SQL::BooleanExpression: title LIKE 'Summer%'
Sequel.ilike(:title, "summer%")
# Sequel::SQL::BooleanExpression: title ILIKE 'Summer%'
They can then be used in #where
:
Movie.where(Sequel.like(:title, "Summer%"))
Movie.where(Sequel[:title].like("Summer%"))
Movie.where{title.like("Summer%")}
# SELECT * FROM movies WHERE title LIKE 'Summer%'
Here is an example of creating filter expression that matches a search term onto multiple columns:
filter = [:title, :summary, :director]
.map { |column| Sequel.ilike(column, "%#{search_term}%") }
.inject(:&)
Movie.where(filter)
# SELECT * FROM movies WHERE title ILIKE '%foo%' AND summary ILIKE '%foo%' AND director ILIKE '%foo%'
Functions
Some SQL functions have their dedicated API in Sequel, but in general Sequel
supports calling any SQL function. Sequel.function
can be used to create
a function expression; the first argument is the function name and the rest
are function arguments.
Sequel.function(:max, :rating) # SQL: max(rating)
They can then be used in places like #select
:
Movie.select(Sequel.function(:max, :rating))
# SELECT max(rating) FROM movies
To create count(*)
, you can use the *
unary operator:
Movie.select(Sequel.function(:count).*)
# SELECT count(*) FROM movies
Sequel’s virtual row blocks make calling SQL functions more convenient via
method_missing
. An undefined method called with arguments will automatically
be converted into a function, whereas to create a function without arguments
you’ll need to add a call to #function
.
Movie.select{max(rating)} # SELECT max(rating) FROM movies
Movie.select{row_number.function} # SELECT row_number() FROM movies
Movie.select{count.function.*} # SELECT count(*) FROM movies
This means you can easily convert even complex SQL queries into Sequel:
SELECT coalesce(first_name, '') || ' ' || coalesce(last_name, '') AS full_name
FROM users
WHERE now() - created_at <= 24*60*60
ORDER BY lower(full_name)
User
.select{(coalesce(:first_name, '') + ' ' + coalesce(:last_name, '')).as(:full_name)}
.where{(now.function - :created_at) <= 24*60*60}
.order{lower(:full_name)}
In Arel you have the Arel::Nodes::NamedFunction
counterpart, but using it is
much more verbose:
# ActiveRecord
User.select(Arel::Nodes::NamedFunction.new("coalesce", [User.arel_table[:first_name], Arel.sql("''")]))
# SELECT coalesce(users.first_name, '') FROM users
Aliasing
Sequel provides support for aliasing via the #as
method. You can alias
anything from column identifiers:
Movie.select(Sequel.as(:imdb_rating, :rating))
# SELECT imdb_rating AS rating FROM movies
to expressions:
User.select(Sequel.join([:first_name, :last_name], ' ').as(:full_name))
# SELECT (first_name || ' ' || last_name) AS full_name FROM users
and even whole datasets:
User.join(UserActivityLog.order{created_at.desc}.limit(100).as(:logs), user_id: :id)
# SELECT * FROM users JOIN (
# SELECT * FROM user_activity_logs ORDER BY created_at DESC LIMIT 100
# ) AS logs ON (logs.user_id = users.id)
Arel also has #as
for aliasing, but for some reason it’s not available for all
types of expressions:
# ActiveRecord
Movie.arel_table[:rating].as("total_rating")
# SQL: rating AS total_rating
(Movie.arel_table[:imdb_rating] + Movie.arel_table[:netflix_rating]).as("total_rating")
# ~> NoMethodError: undefined method `as' for #<Arel::Nodes::Grouping:0x00007fb915477e28>
Case
You can build CASE
statements with Sequel.case
, where first argument is the
hash of WHERE
/THEN
conditions, second argument is the ELSE
default value,
and the third optional argument is the CASE
expression.
Sequel.case( # CASE
{
(Sequel[:rating] >= 8) => "good", # WHEN rating >= 8 THEN 'good'
(Sequel[:rating] >= 6) => "ok", # WHEN rating >= 6 THEN 'ok'
(Sequel[:rating] >= 4) => "bad", # WHEN rating >= 4 THEN 'bad'
},
"abysmal" # ELSE 'abysmal'
)
A few years ago I made a pull request to RubyGems.org, which
added a Rake task that syncs cached download counts from Redis into the
Postgres database. Since the rubygems
table has many rows, I didn’t want
to execute an UPDATE
query for each row, so I built a CASE
statement in
SQL:
case_query = Rubygem.pluck(:name)
.map { |name| "WHEN '#{name}' THEN #{$redis["downloads:rubygem:#{name}"].to_i}" }
.join("\n ")
ActiveRecord::Base.connection.execute <<-SQL.strip_heredoc
UPDATE rubygems
SET downloads = CASE name
#{case_query}
END
SQL
With Sequel this whole expression can be created in Ruby:
counts_by_name = Rubygem.select_map(:name)
.map { |name| [name, $redis["downloads:rubygem:#{name}"].to_i] }
Rubygem.update(downloads: Sequel.case(counts_by_name, :downloads, :name))
Arel also has its own Arel::Nodes::Case
, which is built incrementally.
However, Active Record’s #update_all
method doesn’t seem to accept Arel
expressions, because this case expression gets converted to NULL
.
download_counts = Arel::Nodes::Case.new(Rubygem.arel_table[:name])
Rubygem.pluck(:name)
.map { |name| [name, $redis["downloads:rubygem:#{name}"].to_i] }
.each { |name, count| download_counts.when(name).then(count) }
download_counts.else(Rubygem.arel_table[:downloads])
Rubygem.update_all(downloads: download_counts)
# UPDATE "rubygems" SET "downloads" = NULL
Array & JSON operations
The last type of expressions we’ll cover are operations with Postgres Array and JSON types. These operations include operators and function names that can be difficult to remember, and Sequel provides a convenience API via the pg_array_ops and pg_json_ops extensions (see this document for more on Sequel extensions).
DB.extension :pg_array, :pg_json # load support for array and json types
Sequel.extension :pg_array_ops, :pg_json_ops # load methods for building array/json expressions
We first need to create an array/json expression by calling #pg_array
and
#pg_jsonb
, and then we can call operation methods on that object.
Movie.where{
genres.pg_array.contains(["Adventure", "Comedy"]) &
(imdb.pg_jsonb['Rating'] >= 7)
}
# SELECT * FROM movies WHERE (
# (genres @> ARRAY['Adventure','Comedy']) AND
# ((imdb_data -> 'Rating') >= 7)
# )
If you want to avoid repeating #pg_array
or #pg_jsonb
in your queries, you
can save the array/json expression object into a variable or a constant.
genres = Sequel.pg_array(:genres)
imdb = Sequel.pg_jsonb(:imdb)
Movie.where(genres.contains(["Adventure", "Comedy"]) & (imdb['Rating'] >= 7))
Other uses
Other than for building queries, Sequel supports the expression API in many other areas where SQL expressions are needed.
For instance, you can use them when creating an index:
create_table :users do
primary_key :id
String :handle
DateTime :deleted_at
index Sequel.function(:lower, :handle),
name: :users_unique_handle,
unique: true,
where: Sequel[:deleted_at] !~ nil
# CREATE UNIQUE INDEX users_unique_handle
# ON users (lower(handle))
# WHERE (deleted_at IS NOT NULL)
end
Another example is the USING
statement when changing column type:
alter_table :albums do
set_column_type :artist_id, :integer, using: Sequel[:artist_id].cast(Integer)
# ALTER COLUMN artist_id SET DATA TYPE integer USING CAST(artist_id AS integer)
end
Yet another area where expressions are supported is in Sequel’s UPSERT method:
Movie
.insert_conflict(
update: { imdb: Sequel.pg_jsonb(:imdb).concat(Sequel[:excluded][:imdb]) },
update_where: Sequel[:year] >= 2000,
)
.multi_insert(imported_values)
# INSERT INTO movies VALUES ...
# ON CONFLICT DO
# UPDATE SET imdb = (imdb || excluded.imdb)
# WHERE (year >= 2000)
Conclusion
Sequel’s API for building SQL expressions allows you to consistently stay in Ruby even for very advanced use cases. This has numerous advantages:
- it’s more readable, with different styles to choose from
- it’s more hackable and reusable
- doesn’t require you to look up the correct SQL syntax every time
- eliminates the possibility of SQL injections
We’re much more motivated to use the various features our database has to offer when our database library encourages us to.