Sampling Algorithm
History of sampling 
Random sampling by drawing lots is an old idea, mentioned several times in the Bible. In 1786 Pierre-Simon Laplace estimated the population of France by using a sample, along with a ratio estimator. He also computed probabilistic estimates of the error. These were not expressed as modern confidence intervals but as the sample size that would be needed to achieve a particular upper bound on the sampling error with probability 1000/1001. His estimates used Bayes' theorem with a uniform prior probability and assumed that his sample was random.
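A ratio estimator scales a quantity known for the whole population by a ratio measured in the sample: the estimated total is (sum of y in sample / sum of x in sample) × X, where X is the known population total of the auxiliary variable x. The sketch below assumes, as Laplace's estimate is usually described, that x is the number of births (known nationally from registers) and y is the number of inhabitants in the sampled communes. The commune figures and the national birth count are hypothetical, not Laplace's data.

```python
def ratio_estimate(sample_y, sample_x, total_x):
    """Estimate the population total of y as (sum y / sum x) * X."""
    if len(sample_y) != len(sample_x) or not sample_x:
        raise ValueError("samples must be non-empty and of equal length")
    ratio = sum(sample_y) / sum(sample_x)  # e.g. inhabitants per annual birth
    return ratio * total_x


# Hypothetical sampled communes: inhabitants (y) and annual births (x).
inhabitants = [2_800, 5_600, 1_400]
births = [100, 200, 50]
# Hypothetical national count of annual births, assumed known from registers.
national_births = 1_000_000

estimate = ratio_estimate(inhabitants, births, national_births)
print(estimate)  # 28000000.0
```

The sample's ratio here is 9,800 / 350 = 28 inhabitants per birth, so the estimated population is 28 × 1,000,000.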

In the USA, the Literary Digest's 1936 prediction of a Republican win in the presidential election went badly awry because of severe bias. More than two million people responded to the poll; their names had been obtained from magazine subscription lists and telephone directories. It was not appreciated that these lists were heavily biased towards Republicans, and the resulting sample, though very large, was deeply flawed.
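A small simulation illustrates why a very large sample cannot compensate for a biased frame. Everything below is a hypothetical assumption, not 1936 data: an electorate in which 40% vote Republican, and lists on which Republicans are three times as likely to appear as other voters. The sketch models only frame bias.

```python
import random


def simulate(seed=1936):
    rng = random.Random(seed)
    population = []  # (votes_republican, appears_on_lists) for each voter
    for _ in range(200_000):
        republican = rng.random() < 0.40
        # Assumed coverage: 60% of Republicans vs 20% of others are listed.
        listed = rng.random() < (0.60 if republican else 0.20)
        population.append((republican, listed))

    true_share = sum(r for r, _ in population) / len(population)

    # A huge sample drawn only from the biased frame (the lists).
    frame = [r for r, listed in population if listed]
    big_biased = rng.sample(frame, 50_000)
    biased_share = sum(big_biased) / len(big_biased)

    # A small simple random sample drawn from the whole population.
    small_random = rng.sample(population, 2_000)
    random_share = sum(r for r, _ in small_random) / len(small_random)
    return true_share, biased_share, random_share


true_share, biased_share, random_share = simulate()
print(f"true {true_share:.3f}  biased n=50000: {biased_share:.3f}  random n=2000: {random_share:.3f}")
```

Under these assumptions the frame is about two-thirds Republican, so the 50,000-person sample lands near 0.67, while the 2,000-person random sample stays close to the true 0.40.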

Sampling (statistics)

In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population.

Researchers rarely survey the entire population because the cost of a census is too high. The three main advantages of sampling are that the cost is lower, data collection is faster, and since the data set is smaller it is possible to ensure homogeneity and to improve the accuracy and quality of the data.

Each observation measures one or more properties (such as weight, location, color) of observable bodies distinguished as independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly stratified sampling (blocking). Results from probability theory and statistical theory are employed to guide practice. In business and medical research, sampling is widely used for gathering information about a population.
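As a minimal sketch of design weighting, assume a stratified sample in which each sampled unit in stratum h gets the weight N_h / n_h (stratum population size divided by stratum sample size), so that it "stands for" that many population members. The stratum sizes and measurements are hypothetical. Because stratum B is oversampled, the unweighted mean is pulled towards B, while the weighted mean reflects the population mix.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical stratified design: population counts N_h per stratum.
strata_sizes = {"A": 9_000, "B": 1_000}
# Hypothetical measurements; stratum B is oversampled (8 of 1,000 vs 4 of 9,000).
sample = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [30.0, 34.0, 32.0, 28.0, 31.0, 33.0, 29.0, 35.0],
}

values, weights = [], []
for stratum, ys in sample.items():
    w = strata_sizes[stratum] / len(ys)  # design weight N_h / n_h
    values.extend(ys)
    weights.extend([w] * len(ys))

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted {unweighted:.2f}, weighted {weighted:.2f}")
```

Here the weighted mean is (9,000 × 11.5 + 1,000 × 31.5) / 10,000 = 13.5, whereas the unweighted mean is about 24.8.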

Process
The sampling process comprises several stages:
  •  Defining the population of concern
  •  Specifying a sampling frame, a set of items or events possible to measure
  •  Specifying a sampling method for selecting items or events from the frame
  •  Determining the sample size
  •  Implementing the sampling plan
  •  Sampling and data collecting
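For the step of determining the sample size, one common textbook rule is sketched below. It assumes simple random sampling, estimation of a proportion, and a normal approximation: n0 = z² p(1 − p) / e², optionally reduced by the finite population correction n = n0 / (1 + (n0 − 1) / N). The function name and defaults (p = 0.5 as the most conservative guess, z = 1.96 for roughly 95% confidence) are illustrative choices, not part of the original text.

```python
import math


def sample_size(margin, p=0.5, z=1.96, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumes simple random sampling and a normal approximation.
    If population (N) is given, applies the finite population correction.
    """
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)  # round up so the margin is not exceeded


print(sample_size(0.05))                     # 385
print(sample_size(0.05, population=10_000))  # 370
```

Rounding up with `math.ceil` is deliberate: rounding down would give a margin slightly wider than requested.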
Sampling methods:
Within any type of sampling frame, a variety of sampling methods can be employed, individually or in combination. Factors commonly influencing the choice between these designs include:

1.    Nature and quality of the frame

2.    Availability of auxiliary information about units on the frame

3.    Accuracy requirements, and the need to measure accuracy

4.    Whether detailed analysis of the sample is expected

5.    Cost/operational concerns

Commonly used sampling methods include:
  •   Simple random sampling
  •   Systematic sampling
  •   Stratified sampling
  •   Poststratification
  •   Oversampling
  •   Probability proportional to size sampling
  •   Cluster sampling
  •   Quota sampling
  •   Convenience sampling (also called accidental sampling)
  •   Line-intercept sampling
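Several of the probability-based methods in this list can be sketched in a few lines each. The helper names, the 100-unit frame, the strata, and the clusters below are hypothetical; probability proportional to size is shown with replacement for simplicity; and quota, convenience, and line-intercept sampling are omitted because they are not simple random draws from a list frame.

```python
import random


def simple_random(frame, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(frame, n)


def systematic(frame, n, rng):
    """Systematic sampling: random start, then every k-th unit (k = N // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified(frame, allocation, stratum_of, rng):
    """Stratified sampling: independent SRS of allocation[h] units in each stratum h."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for h, units in strata.items():
        sample.extend(rng.sample(units, allocation[h]))
    return sample


def cluster(frame, n_clusters, cluster_of, rng):
    """Cluster sampling: pick whole clusters at random, keep all their units."""
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def pps_with_replacement(frame, n, size_of, rng):
    """PPS sampling (with replacement): draw probability proportional to size."""
    return rng.choices(frame, weights=[size_of(u) for u in frame], k=n)


# Usage on a hypothetical frame of 100 numbered units.
rng = random.Random(42)
frame = list(range(1, 101))
srs = simple_random(frame, 10, rng)
sys_sample = systematic(frame, 10, rng)
strat = stratified(frame, {"low": 5, "high": 5},
                   lambda u: "low" if u <= 20 else "high", rng)
clus = cluster(frame, 2, lambda u: (u - 1) // 10, rng)  # 10 blocks of 10 units
pps = pps_with_replacement(frame, 5, lambda u: u, rng)  # size = unit number
print(srs, sys_sample, strat, clus, pps, sep="\n")
```

Passing an explicit `random.Random` instance keeps each draw reproducible, which also helps with the later advice to review the process actually followed.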
Sampling and data collection:
Good data collection involves:
  •   Following the defined sampling process
  •   Keeping the data in time order
  •   Noting comments and other contextual events
  •   Recording non-responses
Most sampling books and papers written by non-statisticians focus only on the data collection aspect, which is just a small, though important, part of the sampling process.
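The data collection practices listed above can be built into how observations are stored. The sketch below is one hypothetical way to do it: the class and field names are illustrative, a non-response is recorded as `None` rather than silently dropped, a free-text comment carries contextual notes, and out-of-order entries are rejected so the log stays in time order.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Observation:
    unit_id: str
    collected_at: datetime
    value: Optional[float]  # None records a non-response instead of dropping the unit
    comment: str = ""       # contextual notes, e.g. "not at home"


@dataclass
class CollectionLog:
    observations: List[Observation] = field(default_factory=list)

    def record(self, obs: Observation) -> None:
        # Keep the data in time order: refuse entries older than the last one.
        if self.observations and obs.collected_at < self.observations[-1].collected_at:
            raise ValueError("observations must be recorded in time order")
        self.observations.append(obs)

    def non_responses(self) -> List[str]:
        return [o.unit_id for o in self.observations if o.value is None]


# Usage with hypothetical units and times.
log = CollectionLog()
log.record(Observation("u1", datetime(2012, 9, 1, 9, 0), 42.0))
log.record(Observation("u2", datetime(2012, 9, 1, 9, 30), None, "not at home"))
log.record(Observation("u3", datetime(2012, 9, 1, 10, 0), 37.5))
print(log.non_responses())  # ['u2']
```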

Errors in sample surveys 
 Survey results are typically subject to some error. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic biases as well as random errors.

Sampling errors and biases
Sampling errors and biases are induced by the sample design. They include:

1.    Selection bias: When the true selection probabilities differ from those assumed in calculating the results.

2.    Random sampling error: Random variation in the results due to the elements in the sample being selected at random.
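Random sampling error can be made visible by drawing many simple random samples from the same hypothetical population and comparing the spread of the estimates with the textbook standard error of a proportion under sampling without replacement, sqrt(p(1 − p) / n × (N − n) / (N − 1)). The population (10,000 units, 30% "yes") and the function name are assumptions for illustration. Unlike selection bias, this error shrinks as n grows.

```python
import random
import statistics

# Hypothetical population of N = 10,000 units, 30% of which answer "yes" (coded 1).
POPULATION = [1] * 3_000 + [0] * 7_000


def sampling_error(n, reps=2_000, seed=7):
    """Return (mean estimate, empirical SE, theoretical SE) for SRS of size n."""
    rng = random.Random(seed)
    estimates = [sum(rng.sample(POPULATION, n)) / n for _ in range(reps)]
    p, big_n = 0.30, len(POPULATION)
    # Standard error of a sample proportion under SRS without replacement,
    # including the finite population correction (N - n) / (N - 1).
    theoretical = (p * (1 - p) / n * (big_n - n) / (big_n - 1)) ** 0.5
    return statistics.mean(estimates), statistics.stdev(estimates), theoretical


for n in (100, 400):
    mean_est, emp_se, theo_se = sampling_error(n)
    print(f"n={n}: mean {mean_est:.3f}, empirical SE {emp_se:.4f}, theoretical SE {theo_se:.4f}")
```

The estimates centre on the true 0.30 in both cases, and quadrupling n roughly halves the standard error.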

Non-sampling error 
 Non-sampling errors are caused by other problems in data collection and processing. They include:

1.    Overcoverage: Inclusion of data from outside of the population.

2.    Undercoverage: Sampling frame does not include elements in the population.

3.    Measurement error: E.g. when respondents misunderstand a question, or find it difficult to answer.

4.    Processing error: Mistakes in data coding.

5.    Non-response: Failure to obtain complete data from all selected individuals.

After sampling, the process actually followed (rather than the one intended) should be reviewed, so that the effects of any divergences on subsequent analysis can be studied. Non-response is a particular problem.
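Random selection by itself does not protect against non-response. In the hypothetical simulation below, units are selected at random, but units with the characteristic of interest (value 1) are assumed to respond less often (30% versus 70%). All rates are illustrative assumptions. Under them, the mean among respondents understates the mean of the selected sample, even though selection was random.

```python
import random


def nonresponse_demo(n=20_000, seed=3):
    rng = random.Random(seed)
    # Randomly selected units; about 30% have the characteristic (value 1).
    sample = [1 if rng.random() < 0.30 else 0 for _ in range(n)]
    # Assumed response propensities: 0.3 for units with the characteristic, 0.7 otherwise.
    respondents = [y for y in sample if rng.random() < (0.3 if y else 0.7)]
    full_mean = sum(sample) / n
    respondent_mean = sum(respondents) / len(respondents)
    response_rate = len(respondents) / n
    return full_mean, respondent_mean, response_rate


full_mean, respondent_mean, response_rate = nonresponse_demo()
print(f"selected: {full_mean:.3f}  respondents: {respondent_mean:.3f}  "
      f"response rate: {response_rate:.2f}")
```

With these propensities the expected response rate is 0.3 × 0.3 + 0.7 × 0.7 = 0.58, and the respondents' share with the characteristic falls to about 0.09 / 0.58 ≈ 0.16 instead of 0.30.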

Sources: Wikipedia & NSSO.
