How big is your array? Unless it is zillions of elements long, don't worry about looping through it twice. My preference would be to use the numpy array maths extension to convert your array of arrays into a numpy 2D array and get the standard deviation directly:

```python
>>> x = [ ..., ... ] * 10
>>> numpy.array(x).std(axis=0)
```

If that's not an option and you need a pure Python solution, keep reading:

```python
d = len(x[0])
n = len(x)
sum_x1 = [ sum(v[i] for v in x) for i in range(d) ]
sum_x2 = [ sum(v[i]**2 for v in x) for i in range(d) ]
```

Then the standard deviation for each index i is sqrt(sum_x2[i] / n - (sum_x1[i] / n)**2). If you are determined to loop through your array only once, the running sums can be combined.

If you use a numpy array, it will do the work for you, efficiently:

```python
from numpy import array
# e.g. array(rows).std(axis=0) gives the per-column standard deviations
```

By the way, there's some interesting discussion in this blog post and comments on one-pass methods for computing means and variances:

The Python runstats Module is for just this sort of thing. Install runstats from PyPI: pip install runstats. Runstats summaries can produce the mean, variance, standard deviation, skewness, and kurtosis in a single pass of data. We can use this to create your "running" version:

```python
stat.push(value)  # update the summary with each new value
print('Index', index, 'standard deviation:', stat.stddev())
```

Statistics summaries are based on the Knuth and Welford method for computing standard deviation in one pass as described in the Art of Computer Programming, Vol 2, p. The benefit of this is numerically stable and accurate results. Disclaimer: I am the author of the Python runstats module.

Statistics::Descriptive is a very decent Perl module for these types of calculations:

```perl
#!/usr/bin/perl
use Statistics::Descriptive;

my $stat = Statistics::Descriptive::Full->new;
# You also have the option of using sparse data structures

$stat->add_data($x);    # add each incoming value
printf "Running mean: %f\n", $stat->mean;
printf "Running stdev: %f\n", $stat->standard_deviation;
```

Have a look at PDL (pronounced "piddle!"). This is the Perl Data Language, which is designed for high precision mathematics and scientific computing:

```perl
my ( $mean, $prms, $median, $min, $max, $adev, $rms ) = statsover( $figs );
```

Have a look at PDL::Primitive for more information on the statsover function. This seems to suggest that ADEV is the "standard deviation". However it may be PRMS (which Sinan's Statistics::Descriptive example shows) or RMS (which ars's NumPy example shows). I guess one of these three must be right :-)

Here is a literal pure Python translation of the Welford's algorithm implementation from:

```python
# inside push(x), after self.n has been incremented:
self.new_m = self.old_m + (x - self.old_m) / self.n
self.new_s = self.old_s + (x - self.old_m) * (x - self.new_m)

# variance():
return self.new_s / (self.n - 1) if self.n > 1 else 0.0
```

The basic answer is to accumulate the sum of both x (call it 'sum_x1') and x² (call it 'sum_x2') as you go. The value of the standard deviation is then:

stdev = sqrt((sum_x2 / n) - (mean * mean))

where mean = sum_x1 / n. This is the population standard deviation; you get the sample standard deviation using 'n - 1' instead of 'n' as the divisor. You may need to worry about the numerical stability of taking the difference between two large numbers if you are dealing with large samples. Go to the external references in other answers (Wikipedia, etc.) for more information.
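The accumulate-as-you-go bookkeeping described above can be sketched as a small generator. The function name and the max(..., 0.0) rounding guard are my additions; the sum_x1/sum_x2 names follow the text:

```python
import math

def running_stdev(values):
    """Yield the population standard deviation of the values seen so far,
    keeping only n, sum_x1 and sum_x2 as state (a single pass)."""
    n = 0
    sum_x1 = 0.0  # running sum of x
    sum_x2 = 0.0  # running sum of x**2
    for x in values:
        n += 1
        sum_x1 += x
        sum_x2 += x * x
        mean = sum_x1 / n
        # clamp at zero: rounding can push the difference slightly negative,
        # which is the numerical-stability caveat mentioned above
        yield math.sqrt(max(sum_x2 / n - mean * mean, 0.0))
```

For the sample standard deviation, use (sum_x2 - n * mean * mean) / (n - 1) under the square root instead, once n > 1.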
It's more numerically stable than either the two-pass or online simple sum-of-squares collectors suggested in other responses. The stability only really matters when you have lots of values that are close to each other, as they lead to what is known as "catastrophic cancellation" in the floating point literature. You might also want to brush up on the difference between dividing by the number of samples (N) and by N - 1 in the variance calculation (squared deviation): dividing by N - 1 leads to an unbiased estimate of variance from the sample, whereas dividing by N on average underestimates variance (because it doesn't take into account the variance between the sample mean and the true mean).

I wrote two blog entries on the topic which go into more details, including how to delete previous values online:

You can also take a look at my Java implementation; the javadoc, source, and unit tests are all online:
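A self-contained Python sketch of the numerically stable one-pass collector discussed here, based on Welford's update (the class and method names are my own; this is an illustration, not the author's Java code):

```python
import math

class RunningStat:
    """Welford's one-pass mean/variance; avoids the catastrophic
    cancellation of the naive sum-of-squares approach."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        # divide by n - 1 for the unbiased sample estimate (see N vs N-1 above)
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())
```

Deleting a previously pushed value amounts to running the same update in reverse: decrement n and undo the two accumulator updates.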