-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making types less confusing in the library... #96
Comments
Gordon - I appreciate your thoughtful comments. I come to Julia by way of Scheme and Clojure. I really like Julia, but I feel it needs laziness, TOC and other functional language elements. My perfect language would be a marriage of Clojure and Julia. (I would be happy to loose the immutability of Clojure and being locked into Java.) It appears Lazy has clearly been influenced by Clojure. I have been using the Lazy package to provide some of the Clojure functionality in Julia. It’s been my “go to” package for prototyping and playing around with new ideas. I only wish it could go into production. You have clearly given this some thought. Have your proceeded with any of these suggestions? Mike - Great work on this package! Any thoughts on implementing some of these changes? It would be amazing to have Lazy programs capable of Clojure and Haskell type functional performance. Also, I know nothing about how to implement TOC. Is that something that could be built into a Lazy, without having to build it into Base? Thanks |
From one (former) prairie-guy to anther (Saskatchewan), although we have prairies in common, my preferred functional programming languages are not the same as yours - my favourites are F# and Haskell. I have learned Lispish languages (actually more Scheme and Racket) and also Clojure, but my strong preference is for strongly typed languages and I (obviously) like "white space" syntax. As to syntax, I am willing to work with Julia, and recognize that the non-symmetrical "end"'s allow one to make "one-liners" that one can't do in "pure" white space syntax, and also see that some future version of Julia could actually use a white space offside rule to make the "end"'s optional... I don't think this Lazy library is derived in any way from the Clojure implementation, but rather from the general lazy principles that are a key part of the requirements of a truly functional language in being able to defer execution - the API is just that of any general functional implementation. As well, it seems you miss the point of "immutability" as it can be enforced in Clojure, in being able to control or restrict side effects, which is another key point of functional programming. As to strong typing, this "Lazy" library is a prime example where dynamically typed languages can go severely wrong, with the author seemingly never sure whether the library's purpose is to produce non-lazy or lazy lists with the error that what is produced can intermingle nodes of each. One of the reasons I like Julia is that it can produce very fast native generated code -WHEN ONE CAN GUIDE THE COMPILER INTO KNOWING AT COMPILE TIME THE TYPE OF ARGUMENTS, but the problem with this blatant use of dynamic typing is that it takes away that compiler ability in many use cases. When one adds to that the Julia compiler's limitation (at the time I wrote this issue, it may have been fixed since then) for the Julia compiler to know the full signature of procedures and closures used as parameters AS EXPRESSED IN MY SLOWNESS COMMENT ABOVE, a key requirement for true functional programming, I laid my Julia investigations aside and haven't returned to them. Given that this is fixed in newer versions of Julia, one could turn this into a useful library within the limitations as I expressed in my issues, and I guess I would be willing to work on it as I expressed when I wrote this issue up. However, this current library is a mess with intermingling of too many different things, and it would be hard to entirely fix it while preserving the current ABI. As well, lazyiness isn't entirely about lists and streams as many often think of it, but rather about deferring execution for any data structure, and to really be appropriate to its current name it should really support the use of laziness with other data structures. If Julia now supports full type signatures including parameters and return value for procedures and closures so that the same usual native code optimizations are applied as for other types, than Julia would become a reasonable performance language using functional paradigms, only somewhat slow due to the many small memory structure allocations/deallocations required with the LLVM memory management not optimized for such - most are not, with the exception being the Java virtual Machine which does exceptionally well considering that it wasn't designed to support a functional Java. DotNet is many times worse, as are the usual imperative language implementations for C/C++, etc. There are a few things that aren't given much attention in common non-functional languages such as Julia, as follows: 1) Algebraic Data Types (ADT's) and a clean way to do variant case matching, 2) currying of parameters where every function can be considered to be a closure of previous parameters with one additional parameter, and 3) Tail Call Optimization (TCO) (which is what I think you mean by TOC?). Julia has Union types which can be considered as ADT's and while the syntax to do matching based on these may not be as clean as for languages designed to be functional, it it usable, especially with a little library support as in macros, etc., currying is likely not considered and if not carefully optimized can have a performance cost, but again could be implemented when necessary using a library and the LLVM compiler may get smart enough to reduce/eliminate the cost, and TCO, which is a greater need for recursive expressions. If one were to accept the limited TCO of Clojure where tail recursive functions are "lifted" into internal loops rather than the full recursion of Haskell and F#, it isn't all that hard to do but better done by the "base" compiler rather than by a library such as this - it has been done for other fairly simple new languages such as the Elm functional "transpiler" to JavaScript. |
Hey @GordonBGood I'd be interested in working with you on your ideas for a new version (or versions?) of Lazy.jl, if you're still interested! |
Yes, I actually quite like the potential of the Julia language and (with
others) see one of the major problems is the lack of solid libraries for
varous purposes, with one of those lacks as seen here in this Lazy library
for purposes of writing lazy forms of code.
I'm sorry I haven't kept up lately, but one of the reasons I stopped
contributing to Julia libraries is the seeming lack of design forethought
on how to help the compiler properly determine dynamic procedural types at
compile time rather than run time in there was no way to specify that a
variable should be a procedure/function with expected types of parameter
and return types. The run time determination of such first class procedure
types can cost something like a factor of ten performance penalty. Perhaps
this has been addressed while I have been away, but if not it means that
Julia can never be an efficient functional language until it is.
Thus, even if we were to split the current Lazy library into multiple
modules and fix the necessary distinction between normal linked lists and
lazy lists, the code using this library would still be relatively slow
until the Julia compiler implementation fixes this.
Perhaps you know if Julia now provides a function type signature?
…On Wed, Dec 23, 2020, 12:48 ym-han ***@***.***> wrote:
Hey @GordonBGood <https://github.com/GordonBGood> I'd be interested in
working with you on your ideas for a new version (or versions?) of Lazy.jl,
if you're still interested in improving this library!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#96 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACRTMTP7XZOG7HJSD66M3G3SWGABTANCNFSM4FUA2HQQ>
.
|
I don't know, but I could post it on the Julia discourse (or you could), and keep politely asking People Who Would Know About It till we get a response. I'm interested in developing this library further, because I was just exposed to functional programming (and streams!) in an intro CS course, and loved it. And streams are just super useful. Just let me know how you want to proceed with this. Again, I could post about it on the discourse for you if you want --- though I suspect that it'd be better if you did so, lest something go awry in translation. Edit: Also, some googling turned up the following two things: https://discourse.julialang.org/t/enforcing-function-signatures-by-both-argument-return-types/8174/3 and https://docs.julialang.org/en/v1/manual/types/#Type-Declarations |
I had already read the documentation link above but the discourse thread was new to me; however, the discussion is old and nothing seems to have been done as far as compiler support: the workarounds discussed won't solve the performance issues caused by runtime determination of Function parameters and return value types. I did further read the developer documentation on Function's in the closure section on how Closures are made to use nested functions to capture variables, but that is not that helpful other than to confirm that the compiler authors are aware of some sometime usefulness of closures. For use in functional laziness, we need to be able to "type hint" the compilers with a function type signature such as in Haskell and as for the implied signatures in F# as in "Param1type -> Param2type -> ... -> Returntype" or as what might conform more to the Julia idiom - "Function(Param1type, param2type, ...) :: Returntype". That said, and given that this syntax or something like it is necessary to make true functional type programming in Julia perform reasonably efficiently, there likely isn't a real priority to accomplish this for the following reasons:
primes :: () -> [Int] -- the following map performs the sieving by multiples of base primes
primes() = 2 : _Y ( (3:) . minusat 5 . cmpsts . map(\p -> [p*p, p*p+2*p..]) )
where
_Y g = g (_Y g) -- = g (g (g ( ... ))) non-sharing multistage fixpoint combinator
minusat k s@(c:cs) | k < c = k : minusat (k+2) s -- ~= ([k,k+2..] \\ s)
| otherwise = minusat (k+2) cs -- when null(s\\[k,k+2..])
cmpsts ((x:xs):t) = x : (merge xs . cmpsts . pairs) t -- tree-shaped folding big union
where -- ~= nub . sort . concat
merge a@(x:xs) b@(y:ys) = case compare x y of
LT -> x : merge xs b
EQ -> x : merge xs ys
GT -> y : merge a ys
pairs (xs:ys:t) = merge xs ys : pairs t
main :: IO ()
main = do
putStr "First 25 primes: "
putStrLn $ show $ take 25 $ primes()
putStr "Number of primes to a milion: "
putStrLn $ show $ length $ takeWhile ((>=) 1000000) $ primes() In F#, this is almost as concise even though it is does not natively have lazy lists, by using laziness to implement a Co-Inductive Stream (CIS) since full memoization is not required for this algorithm: type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness
let primesTreeFold() =
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
let pmltpls p = let adv = p + p
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec pairs (CIS(cs0, cs0tlf)) =
let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())
let rec minusat n (CIS(c, ctlf) as cs) =
if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
else minusat (n + 2u) (ctlf())
let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))
printf "First 25 primes are: "
primesTreeFold() |> Seq.take 25 |> Seq.iter (printf "%d ")
printf "\nThe number of primes to a million is "
let start = System.DateTime.Now.Ticks
let answr = primesTreeFold() |> Seq.takeWhile (fun p -> p <= 1000000u) |> Seq.length
let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
printfn "%d in %d milliseconds." answr elpsd In Julia, this F# code can be translated as follows: const Thunk = Function # can't define other than as a generalized Function
struct CIS{T}
head :: T
tail :: Thunk # produces the next CIS{T}
CIS{T}(head :: T, tail :: Thunk) where T = new(head, tail)
end
Base.eltype(::Type{CIS{T}}) where T = T
Base.IteratorSize(::Type{CIS{T}}) where T = Base.SizeUnknown()
function Base.iterate(C::CIS{T}, state = C) :: Tuple{T, CIS{T}} where T
state.head, state.tail()
end
function treefoldingprimes()::CIS{Int}
function merge(xs::CIS{Int}, ys::CIS{Int})::CIS{Int}
x = xs.head; y = ys.head
if x < y CIS{Int}(x, () -> merge(xs.tail(), ys))
elseif y < x CIS{Int}(y, () -> merge(xs, ys.tail()))
else CIS{Int}(x, () -> merge(xs.tail(), ys.tail())) end
end
function pmultiples(p::Int)::CIS{Int}
adv :: Int = p + p
next(c::Int)::CIS{Int} = CIS{Int}(c, () -> next(c + adv)); next(p * p)
end
function allmultiples(ps::CIS{Int})::CIS{CIS{Int}}
CIS{CIS{Int}}(pmultiples(ps.head), () -> allmultiples(ps.tail()))
end
function pairs(css :: CIS{CIS{Int}})::CIS{CIS{Int}}
nextcss = css.tail()
CIS{CIS{Int}}(merge(css.head, nextcss.head), ()->pairs(nextcss.tail()))
end
function composites(css :: CIS{CIS{Int}})::CIS{Int}
CIS{Int}(css.head.head, ()-> merge(css.head.tail(),
css.tail() |> pairs |> composites))
end
function minusat(n::Int, cs::CIS{Int})::CIS{Int}
if n < cs.head CIS{Int}(n, () -> minusat(n + 2, cs))
else minusat(n + 2, cs.tail()) end
end
oddprimes()::CIS{Int} = CIS{Int}(3, () -> minusat(5, oddprimes()
|> allmultiples |> composites))
CIS{Int}(2, () -> oddprimes())
end
@time let count = 0; for p in treefoldingprimes() p > 1000000 && break; count += 1 end; println(count) end As can be seen, no other language is as concise as Haskell for this algorithm, nor as fast as Haskell takes only about a hundred milliseconds for sieving to a million where the others are about five times slower each, but the main point is that this is quite a trivial range of primes and the algorithm is only useful to a range in the order of ten million in an execution time of seconds no matter what is done, whereas an optimized page-segmented array based version can sieve ranges of a hundred times larger (billions to a hundred billion) in that amount of time, although at the cost of about ten to a hundred times as many lines of code. Note that the Julia (and F#) implementation avoids the use of closures as for a map function else the Julia implementation would be slower due to this limitation in determining type at runtime. b) The functional determination of the sequence of Hamming (Smooth 5) numbers: in a similar way, the Haskell expression of the non-duplicating (non-Djikstra) functional algorithm is as follows: hamming :: () -> [Integer]
hamming() = 1 : foldr u [] [2,3,5] where
u n s = r where
r = merge s (map (n*) (1:r)) where
merge [] b = b
merge a@(x:xs) b@(y:ys) | x < y = x : merge xs b
| otherwise = y : merge a ys
main :: IO ()
main = do
print $ take 20 (hamming ())
print $ (hamming ()) !! 1690
print $ (hamming ()) !! (1000000-1) Using an equivalent to this Lazy library in an implementation of lazy lists (as the memoization is required for this algorithm) we can convert this algorithm to F# or Julia with neither nearly as concise as this Haskell version and with about the same ratio of slowness of performance as for the above primes examples. However, this algorithm can be written to use growable arrays and array value reduction when the sequence proceeds to where part of the array is no longer used as well as sorting by logarithmic representation to make this almost a hundred times faster, with even faster algorithms even better suited (by a factor of hundreds more) for some applications meaning that this algorithm is mostly suitable only for "toy" benchmark use. A Julia implementation of the "growable/reducable" array algorithm is as follows: function trival((twos, threes, fives))
BigInt(2)^twos * BigInt(3)^threes * BigInt(5)^fives
end
mutable struct Hammings
ham532 :: Vector{Tuple{Float64,Tuple{Int,Int,Int}}}
ham53 :: Vector{Tuple{Float64,Tuple{Int,Int,Int}}}
ndx532 :: Int
ndx53 :: Int
next2 :: Tuple{Float64,Tuple{Int,Int,Int}}
next3 :: Tuple{Float64,Tuple{Int,Int,Int}}
next5 :: Tuple{Float64,Tuple{Int,Int,Int}}
next53 :: Tuple{Float64,Tuple{Int,Int,Int}}
Hammings() = new(
Vector{Tuple{Float64,Tuple{Int,Int,Int}}}(),
Vector{Tuple{Float64,Tuple{Int,Int,Int}}}(),
1, 1,
(1.0, (1, 0, 0)), (log(2, 3), (0, 1, 0)),
(log(2, 5), (0, 0, 1)), (0.0, (0, 0, 0))
)
end
Base.eltype(::Type{Hammings}) = Tuple{Int,Int,Int}
function Base.iterate(HM::Hammings, st = HM) # :: Union{Nothing,Tuple{Tuple{Float64,Tuple{Int,Int,Int}},Hammings}}
log2of2, log2of3, log2of5 = 1.0, log(2,3), log(2,5)
if st.next2[1] < st.next53[1]
push!(st.ham532, st.next2); st.ndx532 += 1
last, (twos, threes, fives) = st.ham532[st.ndx532]
st.next2 = (log2of2 + last, (twos + 1, threes, fives))
else
push!(st.ham532, st.next53)
if st.next3[1] < st.next5[1]
st.next53 = st.next3; push!(st.ham53, st.next3)
last, (_, threes, fives) = st.ham53[st.ndx53]; st.ndx53 += 1
st.next3 = (log2of3 + last, (0, threes + 1, fives))
else
st.next53 = st.next5; push!(st.ham53, st.next5)
last, (_, _, fives) = st.next5
st.next5 = (log2of5 + last, (0, 0, fives + 1))
end
end
len53 = length(st.ham53)
if st.ndx53 > (len53 >>> 1)
nlen53 = len53 - st.ndx53 + 1
copyto!(st.ham53, 1, st.ham53, st.ndx53, nlen53)
resize!(st.ham53, nlen53); st.ndx53 = 1
end
len532 = length(st.ham532)
if st.ndx532 > (len532 >>> 1)
nlen532 = len532 - st.ndx532 + 1
copyto!(st.ham532, 1, st.ham532, st.ndx532, nlen532)
resize!(st.ham532, nlen532); st.ndx532 = 1
end
_, tri = st.ham532[end]
tri, st
# convert(Union{Nothing,Tuple{Tuple{Float64,Tuple{Int,Int,Int}},Hammings}},(st.ham532[end], st))
# (length(st.ham532), length(st.ham53)), st
end
foreach(x -> print(trival(x)," "), (Iterators.take(Hammings(), 20))); println()
let count = 1691; for t in Hammings() count <= 1 && (println(trival(t)); break); count -= 1 end end
let count = 1000000; for t in Hammings() count <= 1 && (println(trival(t)); break); count -= 1 end end As can be seen, pure list based functional algorithms very often aren't the best choice for many production code algorithms as other than for the advantage of being very concise, they aren't the most efficient for "industrial strength" use. For his reason, I haven't spend to time to better implement this library, and I suspect there isn't a lot of incentive for the Julia compiler development team to do the changes to add proper Function type signatures. However, I can advise if someone wants to undertake this work. The example files are taken from the RosettaCode website for the most part as to Sieve of Eratosthenes and Hamming Number Sequence examples, but for the most part I am the contributor to that website for these examples. |
Essentially, this
Lazy
library isn't just aLazyList
library but both aLazyList
library and a (non-lazy)LinkedList
library and as both theLazyList
andLinkedList
types are subtypes of the abstract typeList
, one can create a non-memoized non-lazyLinkedList
that contains memoized lazyLazyList
tails and a memoized lazyLazyList
that contains non-memoized tails and any combination thereof. Thus, one could generate a trail ofList
elements which contain any combination of successiveLazyList
orLinkedList
elements with either terminated by anEmptyList
. This is conceptually and performance-wise wrong, as the run-time must continuously check which type a given instance is.As mentioned in a previous issue, this library is too "rich" in offering too many facilities in one: This is a further case where it should really be separated into at least a further two separate libraries, for performance reasons if nothing else. Yes, that requires specialized functions for each type, but that is what is provided by the particular library, and isn't a concern of the user. If one needs the memoizing of the
LazyList
, then one uses that library, if one doesn't need laziness at all then one can use theLinkedList
library, and I would add a thirdStream
type library which supports laziness without memoization when that is adequate and preferable for performance reasons.Also the
@lazy
macro isn't really properly named in that really generates aLazyList
so why not call it@lazylist
? Then there would be@linkedlist
(or just@list
) and@stream
macros for the other two cases.For instance, the
LazyList{T}
type would be defined as follows:Other than the ambiguity of the name of the library, we could define all three data types of
LazyList
as above,Stream
, andLinkedList
in the same file and handle the differences with specialization in the methods with a common abstract type,AbstractList
much like now, but the difference would be that there would be no possibility of intermingling type in the same list.Of course, all of the internals would be hidden behind macros and function calls as now, with only the actual three types and the functions and macros exposed.
Now I realize that this breaks just about everything about this library as far as back compatibility, so perhaps we should just start again? If not, other than name discrepancies in names of the library, macros, and functions, it could be made to work. This is one big PR, but I can handle it if you agree how we should proceed.
However, testing the above shows a real problem with this code: it's incredibly slow!!! For the simplest of
LazyList
's defined as the infinite series of natural numbers from one,numbers()
, it takes about four billion CPU clock cycles to count to 100,000 or 40,000 clock cycles per iteration. There are a million allocations or 4000 clock cycles per allocation, but I've got other applications that allocate the same amount of memory per allocation ten times as fast. I've checked for type warnings and the native code and don't see any problems other than I think that the problem may be just the raw number of stacked function calls to program "functionally", and if this is the case and can't be optimized away, and I haven't let any unnecessary type ambiguity slip through, this speed makes Julia useless for function programming for models of any size.As well, Julia isn't very fast at allocating many small objects on the heap as is required for functional programming: a million allocations at about 25 bytes each shouldn't take nearly this long. Haskell can allocate these sizes of object 100's of millions of times, or about a 100 times faster; Julia isn't top class as far all allocations, but it isn't the slowest either as it compares about equivalently with Microsoft's DotNet (F#) for this, so that isn't the major bottleneck here.
I believe that the main cause of the slowness in using Julia for functional programming is that
Function
doesn't have a type signature causing some type ambiguity when using anonymous functions to implement deferred execution, which is likely made worse when the functions are used as closures and need to capture free variables. This whole structure needs to be redesigned to overcome some of the problems; this problem of no known type means that the run time needs to run generic checking of arguments and return values until something is produced that can be assigned to a known or inferred type.Unless the speed problems can be found and fixed, there isn't a lot of point improving this library, as currently it can only be used as a "toy" package.
The text was updated successfully, but these errors were encountered: