Julia loops are as slow as R loops
The Julia and R code below is meant to show that the usual estimator of the population variance is biased: its expectation depends on the sample size, and no matter how many times we average over different observations, for a small number of data points it does not equal the variance of the population.
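(For reference: the quantity computed in the loops is the plug-in estimator σ̂² = (1/n)Σxᵢ² − ((1/n)Σxᵢ)², and for i.i.d. data E[σ̂²] = ((n−1)/n)·σ², so the bias only vanishes as the sample size n grows.)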
Julia takes ~10 seconds to finish the two loops, while R does it in ~7 seconds.
If I leave the code inside the loops commented out, the loops in R and Julia take about the same time; and if I only sum the iterators with

s = s + i + j

Julia finishes in ~0.15 s and R in ~0.5 s.
Is it that Julia loops are slow, or has R become fast?
How can I improve the speed of the code below for Julia?
Can the R code become faster?
Julia:
using Plots

trials = 100000
sample_size = 10
sd = Array{Float64}(trials, sample_size-1)   # one row per trial, one column per sample size 2:10
tic()
for i = 2:sample_size                        # sample size
    for j = 1:trials                         # Monte Carlo repetitions
        res = randn(i)
        # plug-in variance estimator: mean of squares minus square of the mean
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)                             # average over trials for each sample size
plot(sd2[1:end])
R:
trials = 100000
sample_size = 10
sd = matrix(, nrow = trials, ncol = sample_size-1)
start_time = Sys.time()
for (i in 2:sample_size) {       # sample size
  for (j in 1:trials) {          # Monte Carlo repetitions
    res <- rnorm(n = i, mean = 0, sd = 1)
    # plug-in variance estimator: mean of squares minus square of the mean
    sd[j,i-1] = (1/(i))*(sum(res*res))-(1/((i)*i))*(sum(res)*sum(res))
  }
}
end_time = Sys.time()
end_time - start_time
sd2 = apply(sd, 2, mean)         # average over trials for each sample size
plot(sqrt(sd2))
The plot, in case anybody is curious! (image omitted)
One way I could achieve much higher speed is to use a parallel loop, which is very easy to implement in Julia:
using Plots

trials = 100000
sample_size = 10
sd = SharedArray{Float64}(trials, sample_size-1)   # writable from all worker processes
tic()
@parallel for i = 2:sample_size
    for j = 1:trials
        res = randn(i)
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)
plot(sd2[1:end])
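(For @parallel to actually distribute the iterations, Julia has to be started with worker processes, e.g. with julia -p 6 or by calling addprocs(6) first; I am using six cores. With a single process the loop still runs, just without any parallelism.)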
R (as jogo suggested in a comment):

my.sd <- function(i) {
  res <- rnorm(n = i, mean = 0, sd = 1)
  mean(res*res) - mean(res)^2
}
sd <- replicate(trials, sapply(2:sample_size, my.sd))
After you wrap this in a function, you will see that almost the entire time is spent inside randn; it has nothing to do with the speed of loops in Julia vs R. You are also writing sum(res)*sum(res) instead of sum(res)^2, and sum(res.^2) instead of sum(abs2, res), both of which waste resources. You can rewrite it as sd[j, i-1] = sum(abs2, res) / i - (sum(res) / i)^2.
– DNF Jun 29 at 12:36
@DNF, all of this is correct, but additionally in Julia loops executed in global scope are slower than loops executed inside a function.
– Bogumił Kamiński Jun 29 at 13:15
1 Answer
Using global variables in Julia is in general slow and should give you speed comparable to R. You should wrap your code in a function to make it fast.
Here is a timing from my laptop (showing only the relevant part):
julia> function test()
           trials = 100000
           sample_size = 10
           sd = Array{Float64}(trials, sample_size-1)
           tic()
           for i = 2:sample_size
               for j = 1:trials
                   res = randn(i)
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test (generic function with 1 method)
julia> test()
elapsed time: 0.243233887 seconds
0.243233887
Additionally, in Julia, if you use randn! instead of randn you can speed things up even more, as you avoid reallocating the res vector (I am not doing other optimizations to the code, as this one is specific to Julia in comparison to R; all other possible speedups in this code would help Julia and R in a similar way):
julia> function test2()
           trials = 100000
           sample_size = 10
           sd = Array{Float64}(trials, sample_size-1)
           tic()
           for i = 2:sample_size
               res = zeros(i)            # allocate the buffer once per sample size
               for j = 1:trials
                   randn!(res)           # fill in place, no new allocation
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test2 (generic function with 1 method)
julia> test2()
elapsed time: 0.154881137 seconds
0.154881137
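As an aside, the random draws can also be batched: one trials×i matrix per sample size, reduced row by row, roughly along the lines of DNF's comment below about moving randn! out of the innermost loop. A minimal sketch (test3 is a hypothetical name, not measured above):

function test3()
    trials = 100000
    sample_size = 10
    sd = Array{Float64}(trials, sample_size-1)
    for i = 2:sample_size
        res = randn(trials, i)      # one batched draw per sample size
        for j = 1:trials
            s = 0.0                 # running sum
            s2 = 0.0                # running sum of squares
            for k = 1:i
                x = res[j,k]
                s += x
                s2 += x*x
            end
            sd[j,i-1] = s2/i - (s/i)^2
        end
    end
    sd
end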
Finally, it is better to use the BenchmarkTools package to measure execution time in Julia. First, the tic and toc functions will be removed in Julia 0.7. Second, you mix compilation and execution time if you use them (when running the test function twice you will see that the time is reduced on the second run, as Julia does not spend time compiling the functions again).
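For example, a minimal sketch of such a measurement, assuming the BenchmarkTools package is installed:

using BenchmarkTools
@btime test()   # runs test() several times and reports the minimum time, excluding compilation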
EDIT:
You can keep trials, sample_size and sd as global variables, but then you should prefix them with const. Then it is enough to wrap the loop in a function like this:
const trials = 100000;
const sample_size = 10;
const sd = Array{Float64}(trials, sample_size-1);

function f()
    for i = 2:sample_size
        for j = 1:trials
            res = randn(i)
            sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
        end
    end
end

tic()
f()
toc()
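An alternative, sketched here with the same names as above, is to pass the globals to the function as arguments instead of declaring them const; the function then specializes on the concrete argument types (f2 is a hypothetical name):

function f2(sd, trials, sample_size)
    for i = 2:sample_size
        for j = 1:trials
            res = randn(i)
            sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
        end
    end
end

sd = Array{Float64}(trials, sample_size-1)
f2(sd, trials, sample_size)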
Now for @parallel:
First, you should use @sync before @parallel to make sure everything works correctly (i.e. that all workers have finished before you move on to the next instruction). To see why this is needed, run the following code on a system with more than one worker:
sd = SharedArray{Float64}(10^6);
@parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # most probably prints 0.0
sleep(1)
minimum(sd) # most probably prints 1.0
whereas this:

sd = SharedArray{Float64}(10^6);
@sync @parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # always prints 1.0
Second, the speed improvement is due to the @parallel macro, not to SharedArray. If you try your code on Julia with one worker, it is also faster. The reason, in short, is that @parallel internally wraps your code inside a function. You can check this by using @macroexpand:
julia> @macroexpand @sync @parallel for i = 2:sample_size
           for j = 1:trials
               res = randn(i)
               sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
           end
       end
quote  # task.jl, line 301:
    (Base.sync_begin)() # task.jl, line 302:
    #19#v = (Base.Distributed.pfor)(begin  # distributedmacros.jl, line 172:
                function (#20#R, #21#lo::Base.Distributed.Int, #22#hi::Base.Distributed.Int) # distributedmacros.jl, line 173:
                    for i = #20#R[#21#lo:#22#hi] # distributedmacros.jl, line 174:
                        begin  # REPL[22], line 2:
                            for j = 1:trials # REPL[22], line 3:
                                res = randn(i) # REPL[22], line 4:
                                sd[j, i - 1] = (1 / i) * sum(res .^ 2) - (1 / (i * i)) * (sum(res) * sum(res))
                            end
                        end
                    end
                end
            end, 2:sample_size) # task.jl, line 303:
    (Base.sync_end)() # task.jl, line 304:
    #19#v
end
Thank you for the answer. When I use a shared array to do the parallel computing, the code becomes fast too (0.1 s). Does this have anything to do with global variables, apart from the fact that I am using six cores? Is it possible to define global variables in a better way, rather than putting the code inside a function?
– MOON Jun 29 at 9:46
Are there any references that explain why global variables make Julia slow? Are there any risks in using global variables?
– MOON Jun 29 at 10:02
It is explained in the Julia Manual: docs.julialang.org/en/latest/manual/performance-tips/…. As for SharedArray: it itself does not help here, @parallel does. I will improve my answer to cover this.
– Bogumił Kamiński Jun 29 at 12:34
The start of the answer makes it sound like global variables make Julia faster, while the opposite is true. Perhaps clarify that. Also, if you move randn! outside the innermost loop, to work on a matrix, it can be multithreaded. Finally, the expression in the innermost loop is suboptimal (e.g. sum(res)*sum(res)).
– DNF Jun 29 at 12:59
I have updated the answer. It is long now, as you asked about many topics in one question. I hope all is clear.
– Bogumił Kamiński Jun 29 at 13:42