-
Couldn't load subscription status.
- Fork 34
WIP use Tables.Columns instead of columntable
#247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
e6a5365
d7d1283
8a00097
89adab5
3810c48
8dcbf62
6032e6e
59601d8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -112,16 +112,16 @@ julia> sch[term(:y)] | |||||
| y(continuous) | ||||||
| ``` | ||||||
| """ | ||||||
| schema(data, hints=Dict{Symbol,Any}()) = schema(columntable(data), hints) | ||||||
| schema(dt::D, hints=Dict{Symbol,Any}()) where {D<:ColumnTable} = | ||||||
| schema(Term.(collect(fieldnames(D))), dt, hints) | ||||||
| schema(ts::AbstractVector{<:AbstractTerm}, data, hints::Dict{Symbol}) = | ||||||
| schema(ts, columntable(data), hints) | ||||||
| schema(data, hints=Dict{Symbol,Any}()) = | ||||||
| schema(Term.(collect(Tables.columnnames(data))), data, hints) | ||||||
|
|
||||||
| # handle hints: | ||||||
| schema(ts::AbstractVector{<:AbstractTerm}, dt::ColumnTable, | ||||||
| hints::Dict{Symbol}=Dict{Symbol,Any}()) = | ||||||
| sch = Schema(t=>concrete_term(t, dt, hints) for t in ts) | ||||||
| function schema(ts::AbstractVector{<:AbstractTerm}, | ||||||
| data, | ||||||
| hints::Dict{Symbol}=Dict{Symbol,Any}()) | ||||||
| data = Tables.Columns(Tables.columns(data)) | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the advantage of wrapping the result of |
||||||
| sch = Schema(t=>concrete_term(t, data, hints) for t in ts) | ||||||
| end | ||||||
|
|
||||||
| schema(f::TermOrTerms, data, hints::Dict{Symbol}) = | ||||||
| schema(filter(needs_schema, terms(f)), data, hints) | ||||||
|
|
@@ -168,15 +168,15 @@ a(continuous) | |||||
| """ | ||||||
| concrete_term(t::Term, d, hints::Dict{Symbol}) = | ||||||
| concrete_term(t, d, get(hints, t.sym, nothing)) | ||||||
| concrete_term(t::Term, dt::ColumnTable, hint) = | ||||||
| concrete_term(t, getproperty(dt, t.sym), hint) | ||||||
| concrete_term(t::Term, dt::ColumnTable, hints::Dict{Symbol}) = | ||||||
| concrete_term(t, getproperty(dt, t.sym), get(hints, t.sym, nothing)) | ||||||
| concrete_term(t::Term, d, hint) = | ||||||
| concrete_term(t, getcolumn(d, t.sym), hint) | ||||||
| concrete_term(t::Term, d, hints::Dict{Symbol}) = | ||||||
| concrete_term(t, getcolumn(d, t.sym), get(hints, t.sym, nothing)) | ||||||
| concrete_term(t::Term, d) = concrete_term(t, d, nothing) | ||||||
|
|
||||||
| # if the "hint" is already an AbstractTerm, use that | ||||||
| # need this specified to avoid ambiguity | ||||||
| concrete_term(t::Term, d::ColumnTable, hint::AbstractTerm) = hint | ||||||
| concrete_term(t::Term, d::Tables.Columns, hint::AbstractTerm) = hint | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not just
Suggested change
|
||||||
| concrete_term(t::Term, x, hint::AbstractTerm) = hint | ||||||
|
|
||||||
| # second possible fix for #97 | ||||||
|
|
||||||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT
TableOperations.dropmissingoperates row-wise (it calls filter). I'm afraid this is going to kill performance for data frames.Maybe an optimized method for column tables could be added? (EDIT: That's probably doable, as we can use a faster approach than
filtersince we know that the condition can be computed separately for each row.) Another solution would be to definedropmissingin DataAPI, say thatdropmissing(::Any)is owned by TableOperations, but havedropmissing(::DataFrame)be defined in DataFrames.Also,
narrowtypesis a much more costly operation that just doingnonmissingtype(eltype(col))as it requires going over all entries. DataFrames'sdropmissingdoes that by default, maybe TableOperations could take a similar argument.