Wednesday, September 19, 2012

Systematic Resampling

The idea is simple, but I don't know why I took so much of time to understand it. I am still not sure if I got it right. Anyway, let me explain what I understood.

Please refer to this page to understand the basics of systematic sampling. The starting point is chosen at random and then the other points are selected at regular intervals. 

Let us assume that we have population X[N] with N elements. The population has an associated weight vector W[N]. We need to sample Y[n] of n elements out of X using systematic sampling. The algorithm that I could understand from Section 8.6 of [1] is as follows:
  1. Compute the sum of all weights $ \displaystyle S = \sum_i^N W_i $
  2. Compute the selection interval I = S/n ; 
  3. To initiate the random selection process, select a uniform random number R in the open interval (0, I].
  4. The n samples are given as : R, R+I, R+2I, ... , R+(n-1)I

void systematic_resampling1(gsl_rng *r, double *dest, size_t m, double *src, size_t n, double *w)
{
  if(m > n)
  {
    cerr << "choose m <= n" << endl;
    exit(-1);
  }

  float sum_w = 0.0;
  for(size_t i = 0; i < n; ++i)
    sum_w += w[i];

  // selection interval
  float I = sum_w / m ;  

  //generate a random number between 0 and I
  float u = gsl_rng_uniform_pos(r) * I;

  size_t j = 0;
  while(j < m)
  {            
    float C = 0.0;
    for(size_t i = 0; i < n; ++i)
    {
      C = C + w[i];      

      if(u <= C)
      {
        dest[j] = src[i];
        j = j + 1;
        u = u + I;
      }
    }
  }
}

The other Algorithm [2], that seems to achieve the same is as follows:

void systematic_resampling2(gsl_rng *r, double *dest, size_t m, double *src, size_t n, double *w)
{
  if(m > n)
  {
    exit(-1);
  }
  float sum_w = 0.0;
  for(size_t i = 0; i < n; ++i)
    sum_w += w[i];

  //generate a random number between 0 and 1
  float u = gsl_rng_uniform_pos(r);

  int j = 0;
  float C = 0.0;
  for(size_t i = 0; i < n; ++i)
  {
    C = C + w[i]/sum_w;      
    if(C > 1.0)
      C = 1.0;

    while((u+j)/m <= C)
    {
      dest[j] = src[i];
      j = j + 1;
    }
  }
}



Reference:

[1]  Kirk M. Wolter, "Introduction to Variance Estimation", Statistics for Social and Behavioral Sciences series, Springer, 2nd Edition, 2007.

[2] Murray Pollock, "Introduction to Particle Filtering", Algorithms & Computationally Intensive Inference Reading Group Discussion Notes, 2010. 

Monday, September 17, 2012

Weighted Random Sampling With Replacement in C


The problem is to select k items with-replacement from a list of n items with a probability that depends on the weights associated with the items in the original list.

Input:  x[n], w[n]
Output: x[n]   (x is overwritten)

void wrswr(gsl_rng *r, std::vector &x, const std::vector &w)
{
  if(x.size() != w.size())
  {
    cerr << "w and x must have same length ..." << endl;
    exit(-1);
  }
  else
  {
    size_t N = (size_t)x.size();
    std::vector tempx(N);

    double sum_w = 0.0;
    for(size_t i = 0; i < N; ++i)
    {
      sum_w += w.at(i);
      tempx.at(i) = x.at(i);  // copy of source xk
    }

    for(size_t i = 0; i < N; ++i)
    {
      double u = gsl_rng_uniform(r);

      size_t j = 0;
      double c = w[0] / sum_w;
      while (u > c && j < N)
      {
        j = j + 1;
        c = c + w[j] / sum_w;
      }
      x.at(i) = tempx.at(j); 
    }

  }
}         


Weighted Random Sampling


You need to compile it using GSL library. I implemented the idea available on this post.  For a normal random sampling with replacement algorithm without weights is available in GSL itself. Refer to this page for details on "Shuffling and Sampling" with GSL. 
 
pre { margin: 5px 20px; border: 1px dashed #666; padding: 5px; background: #f8f8f8; white-space: pre-wrap; /* css-3 */ white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ word-wrap: break-word; /* Internet Explorer 5.5+ */ }