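If you use Rails and its Active Record, you probably already know that the #includes method eager-loads data. But you may have noticed that what Rails and Active Record do behind this method depends on the context. Sometimes the resulting query is small and simple, and sometimes it is one comparatively large query in which every column has been aliased. On top of that there are the #preload and #eager_load methods. Do they give the same result? And what changed in Rails 4? If all of this feels a bit fuzzy, let's look at what is really going on behind them.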
step 1
First, let's define the scenario used throughout this post with Active Record classes and associations:
class User < ActiveRecord::Base
  has_many :addresses
end

class Address < ActiveRecord::Base
  belongs_to :user
end
Now let's create some data:
rob = User.create!(name: "Robert Pankowecki", email: "robert@example.org")
bob = User.create!(name: "Bob Doe", email: "bob@example.org")
rob.addresses.create!(country: "Poland", city: "Wrocław", postal_code: "55-555", street: "Rynek")
rob.addresses.create!(country: "France", city: "Paris", postal_code: "75008", street: "8 rue Chambiges")
bob.addresses.create!(country: "Germany", city: "Berlin", postal_code: "10551", street: "Tiergarten")
Rails 3
Typically, when you want eager loading you reach for the #includes method, which Rails has encouraged since Rails 1 or 2. You will see that it issues two queries:
User.includes(:addresses)
# SELECT "users".* FROM "users"
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)
So what does the other method do?
User.preload(:addresses)
# SELECT "users".* FROM "users"
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)
Clearly #preload behaves just like #includes here. Next, let's look at #eager_load:
User.eager_load(:addresses)
# SELECT
# "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
# FROM "users"
# LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id"
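This time the result is completely different from the previous two. This is the interesting part: Rails has two ways of loading the associated data. One runs a separate query per table and then stitches the records together in memory. The other does the stitching inside a single query (with a LEFT JOIN, for example).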
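The two strategies can be imitated in plain Ruby to make the difference concrete. This is only a sketch: the hashes stand in for table rows, the ids are simplified, and the method names load_with_separate_queries and load_with_single_join are made up for this illustration (they are not Rails APIs).

```ruby
# Plain-Ruby illustration; hashes stand in for table rows.
USERS = [
  { id: 1, name: "Robert Pankowecki" },
  { id: 2, name: "Bob Doe" }
].freeze

ADDRESSES = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 1, country: "France" },
  { id: 3, user_id: 2, country: "Germany" }
].freeze

# Strategy 1 (conceptually what #preload does): one "query" per table,
# then match the rows up in memory by foreign key.
def load_with_separate_queries
  users = USERS.map(&:dup)                                        # SELECT * FROM users
  ids = users.map { |u| u[:id] }
  addresses = ADDRESSES.select { |a| ids.include?(a[:user_id]) }  # WHERE user_id IN (ids)
  by_user = addresses.group_by { |a| a[:user_id] }
  users.each { |u| u[:addresses] = by_user.fetch(u[:id], []) }
end

# Strategy 2 (conceptually what #eager_load does): one LEFT OUTER JOIN that
# yields a row per user/address pair, then collapse the repeated users.
def load_with_single_join
  joined = USERS.flat_map do |u|
    matches = ADDRESSES.select { |a| a[:user_id] == u[:id] }
    matches = [nil] if matches.empty?   # LEFT JOIN keeps users without addresses
    matches.map { |a| [u, a] }
  end
  joined.group_by { |u, _| u[:id] }.map do |_, pairs|
    pairs.first.first.merge(addresses: pairs.map(&:last).compact)
  end
end
```

Both methods return the same users with the same nested addresses. What differs is where the matching happens: in Ruby after two queries, or in the database inside one.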
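In other words, #preload means you always want separate queries, while #eager_load means a single query, with more of the work done by the database. So what does #includes stand for? That depends on the context; the decision is left to Rails. Roughly speaking, it looks at the query conditions. Let's see what kind of statement gets delegated to #eager_load: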
User.includes(:addresses).where("addresses.country = ?", "Poland")
User.eager_load(:addresses).where("addresses.country = ?", "Poland")
# SELECT
# "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
# FROM "users"
# LEFT OUTER JOIN "addresses"
# ON "addresses"."user_id" = "users"."id"
# WHERE (addresses.country = 'Poland')
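In the example above, Rails detects that the where condition uses a table that is being preloaded (included), so it delegates #includes to #eager_load. You could get the same result by always calling #eager_load explicitly.
But if you use #preload here directly, you get an error: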
User.preload(:addresses).where("addresses.country = ?", "Poland")
# SELECT "users".* FROM "users" WHERE (addresses.country = 'Poland')
#
# SQLite3::SQLException: no such column: addresses.country
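The difference is that #preload never joins the associated table into the main query, so the addresses.country column is unknown there and the query fails.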
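The decision #includes makes in Rails 3 can be pictured as a small function. What follows is a deliberately simplified sketch, not the Rails source: loading_strategy is a hypothetical helper, and the regular expression only approximates the idea of scanning a string condition for table names.

```ruby
# Hypothetical helper, not part of Rails: a simplified stand-in for the
# Rails 3 heuristic that looks for included table names in string conditions.
def loading_strategy(included_tables, where_sql)
  # Collect every "table" in "table.column" style references.
  referenced = where_sql.scan(/\b([a-z_]+)\.[a-z_]+/i).flatten.map(&:downcase)
  # If a condition mentions an included table, a JOIN is needed.
  (referenced & included_tables.map(&:to_s)).empty? ? :preload : :eager_load
end

loading_strategy([:addresses], "")                       # => :preload
loading_strategy([:addresses], "addresses.country = ?")  # => :eager_load
loading_strategy([:addresses], "users.name = ?")         # => :preload
```

Read this way, the #preload error is unsurprising: calling #preload skips the check entirely and always takes the separate-queries path, even when the condition needs the joined table.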
But is that really the whole picture?
Let's look at this example again:
User.includes(:addresses).where("addresses.country = ?", "Poland")
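You might wonder what the intention behind this statement is. What data is it actually supposed to fetch?
- Fetch the users who have Polish addresses, and preload only their Polish addresses?
- Fetch the users who have Polish addresses, and preload all of their addresses?
- Fetch all users, together with their Polish addresses?
Do you know which of these the example above achieves? That's right, the first one. Can we achieve the second and the third as well?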
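Is #preload any good?
Our current goal: fetch the users who have a Polish address, but preload all of their addresses. In other words, I want to see every address of each user who has at least one address in Poland.
Clearly we only want the users who have a Polish address, and that part is easy: User.joins(:addresses).where("addresses.country = ?", "Poland"). We also want their addresses eager-loaded, so we just append includes(:addresses), right?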
r = User.joins(:addresses).where("addresses.country = ?", "Poland").includes(:addresses)
r[0]
#=> #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">
r[0].addresses
# [
# #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ]
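But that is not the result we wanted: the user's second address is missing. Rails still noticed that the where clause uses the included table and fell back to the #eager_load implementation under the hood. The only difference from the earlier example is that INNER JOIN is used instead of LEFT JOIN, which makes no difference for this query.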
SELECT
"users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
"addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
FROM "users"
INNER JOIN "addresses"
ON "addresses"."user_id" = "users"."id"
WHERE (addresses.country = 'Poland')
So what now? This is where we can outsmart Rails by calling #preload explicitly.
r = User.joins(:addresses).where("addresses.country = ?", "Poland").preload(:addresses)
# SELECT "users".* FROM "users"
# INNER JOIN "addresses" ON "addresses"."user_id" = "users"."id"
# WHERE (addresses.country = 'Poland')
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1)
r[0]
#=> #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">
r[0].addresses
# [
# #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">,
# #<Address id: 3, user_id: 1, country: "France", street: "8 rue Chambiges", postal_code: "75008", city: "Paris", created_at: "2013-12-08 11:36:30", updated_at: "2013-12-08 11:36:30">
# ]
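This is exactly what we wanted. By using #preload we no longer mix up which users we want to fetch with which data we want to preload for them. The first query picks the users, and the second loads all of their addresses.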
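The difference between the two results can be reproduced in plain Ruby. This is a sketch under stated assumptions: the arrays of hashes imitate the two tables with simplified ids, and joined_and_filtered and filtered_then_preloaded are names invented for this illustration, not Rails methods.

```ruby
# One JOIN with a WHERE on the joined table: the filter also trims the
# rows that each user's addresses are built from.
def joined_and_filtered(users, addresses, country)
  rows = users.product(addresses)
              .select { |u, a| a[:user_id] == u[:id] && a[:country] == country }
  rows.group_by { |u, _| u }.map { |u, pairs| u.merge(addresses: pairs.map(&:last)) }
end

# JOIN only to pick the users, then a second "query" for all their addresses.
def filtered_then_preloaded(users, addresses, country)
  wanted_ids = addresses.select { |a| a[:country] == country }
                        .map { |a| a[:user_id] }.uniq
  picked = users.select { |u| wanted_ids.include?(u[:id]) }
  picked.map { |u| u.merge(addresses: addresses.select { |a| a[:user_id] == u[:id] }) }
end

users = [{ id: 1, name: "Robert Pankowecki" }, { id: 2, name: "Bob Doe" }]
addresses = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 2, country: "Germany" },
  { id: 3, user_id: 1, country: "France" }
]

one_query   = joined_and_filtered(users, addresses, "Poland")
two_queries = filtered_then_preloaded(users, addresses, "Poland")

one_query.first[:addresses].map { |a| a[:country] }    # => ["Poland"]
two_queries.first[:addresses].map { |a| a[:country] }  # => ["Poland", "France"]
```

In both cases only Robert is returned. The single-query version loses his French address because the WHERE filter applies to the same rows that are used to build the association.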
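Preloading a subset of an association
Here the goal becomes: fetch all users, together with their Polish addresses.
Honestly, I never like preloading only a subset of an association, because other parts of the application may assume it is fully loaded. It makes some sense only if you are fetching the data just to display it.
I prefer to add the condition to the association definition itself: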
class User < ActiveRecord::Base
  has_many :addresses
  has_many :polish_addresses, conditions: {country: "Poland"}, class_name: "Address"
end
Now we can easily get the result:
r = User.preload(:polish_addresses)
# SELECT "users".* FROM "users"
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."country" = 'Poland' AND "addresses"."user_id" IN (1, 2)
r
# [
# #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">,
# #<User id: 2, name: "Bob Doe", email: "bob@example.org", created_at: "2013-12-08 11:26:25", updated_at: "2013-12-08 11:26:25">
# ]
r[0].polish_addresses
# [
# #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ]
r[1].polish_addresses
# []
Or:
r = User.eager_load(:polish_addresses)
# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
# FROM "users"
# LEFT OUTER JOIN "addresses"
# ON "addresses"."user_id" = "users"."id" AND "addresses"."country" = 'Poland'
r
# [
# #<User id: 1, name: "Robert Pankowecki", email: "robert@example.org", created_at: "2013-12-08 11:26:24", updated_at: "2013-12-08 11:26:24">,
# #<User id: 2, name: "Bob Doe", email: "bob@example.org", created_at: "2013-12-08 11:26:25", updated_at: "2013-12-08 11:26:25">
# ]
r[0].polish_addresses
# [
# #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
# ]
r[1].polish_addresses
# []
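Notice that the #eager_load query puts the country condition in the ON clause rather than in WHERE, which is why Bob is still returned with an empty list. A plain-Ruby imitation of a LEFT OUTER JOIN shows the effect. This is only a sketch: left_join is a hypothetical helper, and the hashes stand in for table rows with simplified ids.

```ruby
# Hypothetical helper imitating LEFT OUTER JOIN users -> addresses.
# `on` filters which addresses may match; `where` filters the joined rows.
def left_join(users, addresses, on: ->(_a) { true }, where: ->(_a) { true })
  rows = users.flat_map do |u|
    matches = addresses.select { |a| a[:user_id] == u[:id] && on.call(a) }
    matches = [nil] if matches.empty?   # unmatched users get a NULL address
    matches.map { |a| [u, a] }
  end
  kept = rows.select { |_u, a| where.call(a) }
  kept.group_by { |u, _a| u[:name] }
      .transform_values { |pairs| pairs.map(&:last).compact.map { |a| a[:country] } }
end

users = [{ id: 1, name: "Robert Pankowecki" }, { id: 2, name: "Bob Doe" }]
addresses = [
  { id: 1, user_id: 1, country: "Poland" },
  { id: 2, user_id: 2, country: "Germany" },
  { id: 3, user_id: 1, country: "France" }
]
polish = ->(a) { !a.nil? && a[:country] == "Poland" }

in_on    = left_join(users, addresses, on: polish)
# => { "Robert Pankowecki" => ["Poland"], "Bob Doe" => [] }
in_where = left_join(users, addresses, where: polish)
# => { "Robert Pankowecki" => ["Poland"] }
```

With the condition in ON, every user survives and only the matching addresses are attached. With the condition in WHERE, users without a Polish address disappear from the result.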
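The ultimate question
You might ask, "Is this really so hard?" I'm not sure, but I think most ORMs are built to help you construct one query and load data from one table. Eager loading complicates things: we want compound data from multiple tables, with multiple conditions. In Rails we use a chainable API to build two or more queries (when using #preload).
So what kind of API would I like? I'm thinking of something like this: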
User.joins(:addresses).where("addresses.country = ?", "Poland").preload do |users|
  users.preload(:addresses).where("addresses.country = ?", "Germany")
  users.preload(:lists) do |lists|
    lists.preload(:tasks).where("tasks.state = ?", "unfinished")
  end
end
What changed in Rails 4
class User < ActiveRecord::Base
  has_many :addresses
  has_many :polish_addresses, -> { where(country: "Poland") }, class_name: "Address"
end
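Rails now encourages you to use the lambda syntax to define association conditions. That is a good thing. It is the same as with scopes, where you are encouraged to use either the lambda form or a class method so that the condition is evaluated every time it is used: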
# Bad, Time.now would be always the time when the class was loaded
# You might not even spot the bug in development because classes are
# automatically reloaded for you after changes.
scope :from_the_past, where("happens_at <= ?", Time.now)
# OK
scope :from_the_past, -> { where("happens_at <= ?", Time.now) }
# OK
def self.from_the_past
  where("happens_at <= ?", Time.now)
end
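The load-time bug can be seen in isolation without Active Record. The sketch below uses a made-up Event class with a tiny scope registry (scope and cutoff here are hypothetical and only mimic the idea), so the only thing being demonstrated is when Time.now gets evaluated.

```ruby
# Hypothetical mini "scope" registry, not ActiveRecord.
class Event
  @scopes = {}

  def self.scope(name, body)
    @scopes[name] = body
  end

  # Returns the cutoff time for a scope, calling the body if it is callable.
  def self.cutoff(name)
    body = @scopes.fetch(name)
    body.respond_to?(:call) ? body.call : body
  end

  scope :eager_cutoff, Time.now          # value captured while the class loads
  scope :lazy_cutoff,  -> { Time.now }   # lambda re-run on every use
end

first = Event.cutoff(:eager_cutoff)
sleep 0.01
Event.cutoff(:eager_cutoff) == first     # => true, the time never moves
Event.cutoff(:lazy_cutoff) > first       # => true, evaluated just now
```

In a long-running production process, the eager version would keep comparing against the moment the class was loaded, which is exactly the bug the comments above warn about.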
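In our example where(country: "Poland") always gives the same condition, whether it is evaluated lazily or once at load time, but it is good that Rails steers us away from this kind of bug. Now let's check whether the queries themselves have changed: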
User.includes(:addresses)
# SELECT "users".* FROM "users"
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)
User.preload(:addresses)
# SELECT "users".* FROM "users"
# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)
User.eager_load(:addresses)
# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,
# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
# FROM "users"
# LEFT OUTER JOIN "addresses"
# ON "addresses"."user_id" = "users"."id"
Nothing has changed. But let's add the kind of condition that caused trouble before:
User.includes(:addresses).where("addresses.country = ?", "Poland")
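Try it and you run into trouble: Rails 4.0 prints a deprecation warning, and from Rails 4.1 on the query fails. The fix is to tell Rails explicitly which table the condition references: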
User.includes(:addresses).where("addresses.country = ?", "Poland").references(:addresses)
I was curious what would happen if we preloaded more tables but referenced only one of them:
User.includes(:addresses, :places).where("addresses.country = ?", "Poland").references(:addresses)
I imagined that addresses would be loaded the #eager_load way and places the #preload way, but that is not what happens. Also, Rails 4 does not warn you to use #references if you call #eager_load explicitly:
User.eager_load(:addresses).where("addresses.country = ?", "Poland")
In other words, these two are equivalent:
User.includes(:addresses).where("addresses.country = ?", "Poland").references(:addresses)
User.eager_load(:addresses).where("addresses.country = ?", "Poland")
And if you use #preload, you still get the same error:
User.preload(:addresses).where("addresses.country = ?", "Poland")
# SELECT "users".* FROM "users" WHERE (addresses.country = 'Poland')
#
# SQLite3::SQLException: no such column: addresses.country: SELECT "users".* FROM "users" WHERE (addresses.country = 'Poland')
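The Rails 4 behaviour described in this section can be summarised as a small decision function. This is a rough sketch rather than the Rails implementation: rails4_loading_strategy is a hypothetical helper, and it only models the cases discussed here (an explicit #eager_load, or #includes combined with #references).

```ruby
# Hypothetical helper, not Rails API: which strategy the relation ends up using.
def rails4_loading_strategy(includes: [], eager_load: [], references: [])
  # An explicit eager_load always means a JOIN.
  return :eager_load unless eager_load.empty?
  # includes switches to a JOIN only when an included table is referenced.
  (includes & references).empty? ? :preload : :eager_load
end

rails4_loading_strategy(includes: [:addresses])
# => :preload
rails4_loading_strategy(includes: [:addresses], references: [:addresses])
# => :eager_load
rails4_loading_strategy(eager_load: [:addresses])
# => :eager_load
```

The point of the sketch is that a string condition no longer influences the choice by itself; the reference has to be declared.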
Summary
To recap the three methods:
- includes
- preload
- eager_load
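#includes is delegated by Rails to either #preload or #eager_load, depending on whether the conditions that follow reference a table that is being preloaded.
#preload always uses separate queries to fetch the associated records.
#eager_load usually uses a single query with LEFT JOIN to eager-load the associated tables.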
ref:
- http://blog.arkency.com/2013/12/rails4-preloading/
- https://www.rubyplus.com/articles/3211-Eager-Loading-in-Rails-5
- https://www.allerin.com/blog/eager-loading-inwith-rails
- http://stackoverflow.com/questions/10084355/eager-loading-and-lazy-loading-in-rails
- http://stackoverflow.com/questions/41654098/rails-5-how-to-form-association-between-tables-on-multiple-shared-attributes