Add way to generate DataFrame from active_record with aggregated fields #80

@janpeterka

I ran into a performance problem when using daru and found that one way to speed things up is to do more of the work in the database: grouping and aggregating in ActiveRecord instead of on the DataFrame.

Here is what I wanted to use:

active_record = Provider.left_joins(zip: :district).group(:id)

However, I found out that I cannot pass a field like ANY_VALUE(district.id), because it gets converted to a symbol in from_activerecord, and pluck subsequently tries to resolve it as table.column.
(At least that's how I understand it works.)
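
To illustrate the conversion described above, here is a minimal sketch in plain Ruby (no ActiveRecord or daru needed). It only shows what `to_sym` and `to_s` do to such a field; how pluck then treats each form is as I understand it, not something this snippet verifies.

```ruby
# Plain Ruby only: what the to_sym step does to a field that is really
# an SQL expression rather than a column name.
field = "ANY_VALUE(district.id)"

as_symbol = field.to_sym # the form from_activerecord currently hands to pluck
as_string = field.to_s   # the form that keeps the SQL expression as plain text

puts as_symbol.inspect
puts as_string.inspect
```

The symbol form is what pluck appears to interpret as a column name, while the string form is left untouched.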

So we found a way to bypass this, and I was thinking about adding it to daru, something like this:

      # Load dataframe from AR::Relation
      #
      # @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
      # @param with_sql_methods [Boolean] Enables giving fields with SQL methods
      #
      # @return A dataframe containing the data in the given relation
      def from_activerecord(relation, *fields, with_sql_methods: false)
        fields = relation.klass.column_names if fields.empty?

        fields = if with_sql_methods
                   fields.map(&:to_s)
                 else
                   fields.map(&:to_sym)
                 end

        result = relation.pluck(*fields).transpose
        Daru::DataFrame.new(result, order: fields).tap(&:update)
      end
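
The `pluck(...).transpose` step in the method above can be sketched without a database. The rows and field names below are made-up sample data, shaped the way pluck returns results for two fields (one inner array per record):

```ruby
# Made-up sample data standing in for relation.pluck(:id, :city).
rows = [
  [1, "Prague"],
  [2, "Brno"],
  [3, "Ostrava"]
]
fields = %w[id city] # hypothetical field names, for illustration only

# transpose turns the row-oriented result into one array per field,
# which is the column-oriented shape the DataFrame is built from.
columns = rows.transpose

# Pair each field name with its column to make the mapping visible.
by_name = fields.zip(columns).to_h
puts by_name.inspect
```

This only works when every row is itself an array, which is the case when more than one field is plucked.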

Now I can create a new DataFrame like this:

data_frame = Daru::DataFrame.from_activerecord(active_record,
                                              "ANY_VALUE(district.id)",
                                              with_sql_methods: true)

What do you think about that?


Originally posted here - SciRuby/daru#523
