I ran into a speed problem when using daru, and found that one way to speed things up was to do more of the work in the database: grouping and aggregating in ActiveRecord/SQL instead of on the dataframe.
Here is what I wanted to use:
active_record = Provider.left_joins(zip: :district).group(:id)
However, I found that I cannot pass a field like ANY_VALUE(district.id), because it gets converted to a symbol in from_activerecord, and pluck subsequently tries to resolve it as table.column. (At least that's how I understand it.)
So we found a way to bypass this, and I was thinking about adding it to daru, as something like this:
# Load dataframe from AR::Relation
#
# @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
# @param with_sql_methods [Boolean] Enables giving fields with SQL methods
#
# @return A dataframe containing the data in the given relation
def from_activerecord(relation, *fields, with_sql_methods: false)
  fields = relation.klass.column_names if fields.empty?
  # Keep fields as strings when they may contain SQL expressions,
  # so pluck does not treat them as plain column names.
  fields = if with_sql_methods
             fields.map(&:to_s)
           else
             fields.map(&:to_sym)
           end
  result = relation.pluck(*fields).transpose
  Daru::DataFrame.new(result, order: fields).tap(&:update)
end
Now I can create a new DataFrame like this:
data_frame = Daru::DataFrame.from_activerecord(active_record,
                                               "ANY_VALUE(district.id)",
                                               with_sql_methods: true)
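To make the difference between the two modes easier to see, here is a minimal sketch of just the field-normalization step, run against a stand-in object instead of a real relation. The names normalize_fields and FakeRelation are hypothetical and exist only for this illustration; they are not part of daru or ActiveRecord, and the stub's pluck simply returns fixed rows.

```ruby
# Hypothetical helper: isolates the field-normalization step of the
# proposed from_activerecord. Not part of daru's API.
def normalize_fields(fields, with_sql_methods: false)
  if with_sql_methods
    fields.map(&:to_s)   # keep SQL expressions as plain strings
  else
    fields.map(&:to_sym) # current behaviour: everything becomes a symbol
  end
end

# Hypothetical stand-in for ActiveRecord::Relation. It only records the
# arguments passed to pluck and returns two fixed rows of two columns.
class FakeRelation
  attr_reader :plucked

  def pluck(*fields)
    @plucked = fields
    [[1, 10], [2, 20]]
  end
end

relation = FakeRelation.new
fields = normalize_fields([:id, "ANY_VALUE(district.id)"], with_sql_methods: true)

# Same row-to-column step as in from_activerecord.
columns = relation.pluck(*fields).transpose

p fields
p columns
```

With with_sql_methods: true, every field reaches pluck as a string, so an expression such as ANY_VALUE(district.id) is passed through unchanged rather than being turned into a symbol first. How a real ActiveRecord version then handles that string (for example, whether it needs to be wrapped in Arel.sql) depends on the Rails version and is not covered by this sketch.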
What do you think about that?
Originally posted here - SciRuby/daru#523