Julia loops are as slow as R loops
The Julia and R code below is meant to show that the usual estimator of the population variance is biased: its expectation depends on the sample size, and no matter how many times we average over different observations, for a small number of data points it does not equal the variance of the population.
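(For reference: the quantity computed in the loops is the plug-in estimator σ̂² = (1/n)Σxᵢ² − ((1/n)Σxᵢ)², and for i.i.d. data E[σ̂²] = ((n−1)/n)·σ², so the bias only vanishes as the sample size n grows.)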
Julia takes ~10 seconds to finish the two loops, while R does it in ~7 seconds.
If I leave the code inside the loops commented out, the loops in R and Julia take about the same time; and if I only sum the iterators with

s = s + i + j

Julia finishes in ~0.15 s and R in ~0.5 s.
Is it that Julia loops are slow, or has R become fast?
How can I improve the speed of the code below for Julia?
Can the R code become faster?
Julia:
using Plots

trials = 100000
sample_size = 10
sd = Array{Float64}(trials, sample_size-1)   # one row per trial, one column per sample size 2:10
tic()
for i = 2:sample_size                        # sample size
    for j = 1:trials                         # Monte Carlo repetitions
        res = randn(i)
        # plug-in variance estimator: mean of squares minus square of the mean
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)                             # average over trials for each sample size
plot(sd2[1:end])
R:
trials = 100000
sample_size = 10
sd = matrix(, nrow = trials, ncol = sample_size-1)
start_time = Sys.time()
for (i in 2:sample_size) {       # sample size
  for (j in 1:trials) {          # Monte Carlo repetitions
    res <- rnorm(n = i, mean = 0, sd = 1)
    # plug-in variance estimator: mean of squares minus square of the mean
    sd[j,i-1] = (1/(i))*(sum(res*res))-(1/((i)*i))*(sum(res)*sum(res))
  }
}
end_time = Sys.time()
end_time - start_time
sd2 = apply(sd, 2, mean)         # average over trials for each sample size
plot(sqrt(sd2))
The plot, in case anybody is curious! (image omitted)
One way I could achieve much higher speed is to use a parallel loop, which is very easy to implement in Julia:
using Plots

trials = 100000
sample_size = 10
sd = SharedArray{Float64}(trials, sample_size-1)   # writable from all worker processes
tic()
@parallel for i = 2:sample_size
    for j = 1:trials
        res = randn(i)
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)
plot(sd2[1:end])
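(For @parallel to actually distribute the iterations, Julia has to be started with worker processes, e.g. with julia -p 6 or by calling addprocs(6) first; I am using six cores. With a single process the loop still runs, just without any parallelism.)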
R (as jogo suggested in a comment):

my.sd <- function(i) {
  res <- rnorm(n = i, mean = 0, sd = 1)
  mean(res*res) - mean(res)^2
}
sd <- replicate(trials, sapply(2:sample_size, my.sd))
After you wrap this in a function, you will see that almost the entire time is spent inside randn; it has nothing to do with the speed of loops in Julia vs R. You are also writing sum(res)*sum(res) instead of sum(res)^2, and sum(res.^2) instead of sum(abs2, res), both of which waste resources. You can rewrite it as sd[j, i-1] = sum(abs2, res) / i - (sum(res) / i)^2.
– DNF Jun 29 at 12:36
@DNF, all of this is correct, but additionally in Julia loops executed in global scope are slower than loops executed inside a function.
– Bogumił Kamiński Jun 29 at 13:15
1 Answer
Using global variables in Julia is in general slow and should give you speed comparable to R. You should wrap your code in a function to make it fast.
Here is a timing from my laptop (showing only the relevant part):
julia> function test()
           trials = 100000
           sample_size = 10
           sd = Array{Float64}(trials, sample_size-1)
           tic()
           for i = 2:sample_size
               for j = 1:trials
                   res = randn(i)
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test (generic function with 1 method)
julia> test()
elapsed time: 0.243233887 seconds
0.243233887
Additionally, in Julia, if you use randn! instead of randn you can speed things up even more, as you avoid reallocating the res vector (I am not doing other optimizations to the code, as this one is specific to Julia in comparison to R; all other possible speedups in this code would help Julia and R in a similar way):
julia> function test2()
           trials = 100000
           sample_size = 10
           sd = Array{Float64}(trials, sample_size-1)
           tic()
           for i = 2:sample_size
               res = zeros(i)            # allocate the buffer once per sample size
               for j = 1:trials
                   randn!(res)           # fill in place, no new allocation
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test2 (generic function with 1 method)
julia> test2()
elapsed time: 0.154881137 seconds
0.154881137
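As an aside, the random draws can also be batched: one trials×i matrix per sample size, reduced row by row, roughly along the lines of DNF's comment below about moving randn! out of the innermost loop. A minimal sketch (test3 is a hypothetical name, not measured above):

function test3()
    trials = 100000
    sample_size = 10
    sd = Array{Float64}(trials, sample_size-1)
    for i = 2:sample_size
        res = randn(trials, i)      # one batched draw per sample size
        for j = 1:trials
            s = 0.0                 # running sum
            s2 = 0.0                # running sum of squares
            for k = 1:i
                x = res[j,k]
                s += x
                s2 += x*x
            end
            sd[j,i-1] = s2/i - (s/i)^2
        end
    end
    sd
end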
Finally, it is better to use the BenchmarkTools package to measure execution time in Julia. First, the tic and toc functions will be removed in Julia 0.7. Second, you mix compilation and execution time if you use them (when running the test function twice you will see that the time is reduced on the second run, as Julia does not spend time compiling the functions again).
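For example, a minimal sketch of such a measurement, assuming the BenchmarkTools package is installed:

using BenchmarkTools
@btime test()   # runs test() several times and reports the minimum time, excluding compilation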
EDIT:
You can keep trials, sample_size and sd as global variables, but then you should prefix them with const. Then it is enough to wrap the loop in a function like this:
const trials = 100000;
const sample_size = 10;
const sd = Array{Float64}(trials, sample_size-1);

function f()
    for i = 2:sample_size
        for j = 1:trials
            res = randn(i)
            sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
        end
    end
end

tic()
f()
toc()
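An alternative, sketched here with the same names as above, is to pass the globals to the function as arguments instead of declaring them const; the function then specializes on the concrete argument types (f2 is a hypothetical name):

function f2(sd, trials, sample_size)
    for i = 2:sample_size
        for j = 1:trials
            res = randn(i)
            sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
        end
    end
end

sd = Array{Float64}(trials, sample_size-1)
f2(sd, trials, sample_size)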
Now for @parallel:
First, you should use @sync before @parallel to make sure everything works correctly (i.e. that all workers have finished before you move on to the next instruction). To see why this is needed, run the following code on a system with more than one worker:
sd = SharedArray{Float64}(10^6);
@parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # most probably prints 0.0
sleep(1)
minimum(sd) # most probably prints 1.0
whereas this:

sd = SharedArray{Float64}(10^6);
@sync @parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # always prints 1.0
Second, the speed improvement is due to the @parallel macro, not to SharedArray. If you try your code on Julia with one worker, it is also faster. The reason, in short, is that @parallel internally wraps your code inside a function. You can check this by using @macroexpand:
julia> @macroexpand @sync @parallel for i = 2:sample_size
           for j = 1:trials
               res = randn(i)
               sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
           end
       end
quote  # task.jl, line 301:
    (Base.sync_begin)() # task.jl, line 302:
    #19#v = (Base.Distributed.pfor)(begin  # distributedmacros.jl, line 172:
                function (#20#R, #21#lo::Base.Distributed.Int, #22#hi::Base.Distributed.Int) # distributedmacros.jl, line 173:
                    for i = #20#R[#21#lo:#22#hi] # distributedmacros.jl, line 174:
                        begin  # REPL[22], line 2:
                            for j = 1:trials # REPL[22], line 3:
                                res = randn(i) # REPL[22], line 4:
                                sd[j, i - 1] = (1 / i) * sum(res .^ 2) - (1 / (i * i)) * (sum(res) * sum(res))
                            end
                        end
                    end
                end
            end, 2:sample_size) # task.jl, line 303:
    (Base.sync_end)() # task.jl, line 304:
    #19#v
end
Thank you for the answer. When I use a shared array to do the parallel computing, the code becomes fast too (0.1 s). Does this have anything to do with global variables, apart from the fact that I am using six cores? Is it possible to define global variables in a better way, rather than putting the code inside a function?
– MOON Jun 29 at 9:46
Are there any references that explain why global variables make Julia slow? Are there any risks in using global variables?
– MOON Jun 29 at 10:02
It is explained in the Julia Manual: docs.julialang.org/en/latest/manual/performance-tips/…. As for SharedArray: it itself does not help here, @parallel does. I will improve my answer to cover this.
– Bogumił Kamiński Jun 29 at 12:34
The start of the answer makes it sound like global variables make Julia faster, while the opposite is true. Perhaps clarify that. Also, if you move randn! outside the innermost loop, to work on a matrix, it can be multithreaded. Finally, the expression in the innermost loop is suboptimal (e.g. sum(res)*sum(res)).
– DNF Jun 29 at 12:59
I have updated the answer. It is long now, as you asked about many topics in one question. I hope all is clear.
– Bogumił Kamiński Jun 29 at 13:42