4/28/2011

The influence of "discreteness" of a binomial distribution on the true confidence interval

This started as a homework problem, but I think it is worth exploring further.

Basically, I am trying to compare the coverage of a confidence interval based on the score test for a binomial distribution across different sample sizes.

Here, I assume the true probability is 0.01 and use a 95% score confidence interval to estimate
the true rate.
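
For reference, here is a sketch of the interval and of what "coverage" means. The symbols are my own notation rather than anything from the homework: y is the observed count, \hat{p} = y/n, \pi_0 is the true probability, and z is the 0.975 standard normal quantile (about 1.96). The score interval is usually called the Wilson interval, and as far as I know this is what prop.test returns when correct = FALSE. Since y can only take the values 0, 1, ..., n, the coverage probability is a finite sum of binomial probabilities, which is where the "discreteness" comes in.

```latex
\[
  \frac{\hat{p} + \dfrac{z^2}{2n} \;\pm\; z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}
       {1 + \dfrac{z^2}{n}}
\]

\[
  C(n, \pi_0) \;=\; \sum_{y=0}^{n}
    \mathbf{1}\{\, L(y) \le \pi_0 \le U(y) \,\}
    \binom{n}{y}\, \pi_0^{\,y} (1-\pi_0)^{\,n-y}
\]
```

Here L(y) and U(y) are the lower and upper limits of the interval computed from the count y.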

As we can see from the figure above, small sample sizes result in poor coverage probability. After around n = 200, as the sample size increases, the coverage probabilities wiggle around the 95% line. I am trying to figure out whether the coverage probability still wiggles when the sample size is large.

Code for getting this figure:

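n.pool <- c(10,50,100,150,200,250,500,750,1000,1500,2000,10000) # sample sizes
N <- 400 # number of simulated binomial counts per coverage estimate
m <- 100 # number of coverage estimates per sample size
pi.0 <- 0.01 # true probability
result <- matrix(NA_real_, nrow = m, ncol = length(n.pool)) # one row per repetition, one column per sample size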

for(i in 1:m)
{
 count <- 1 # define a counter
 for(n in n.pool)
 {
  y <- rbinom(N,n,pi.0)
  # 95% score interval for each simulated count (row 1 = lower limit, row 2 = upper limit)
  ci.score <- sapply(y, function(x) prop.test(x, n, p = pi.0, correct = FALSE)$conf.int)
  # ci.capture is TRUE if the score C.I. captures the true pi.0
  ci.capture <- ci.score[1,] <= pi.0 & ci.score[2,] >= pi.0
  # proportion of TRUE; mean() avoids the NA that table(ci.capture)[2] gives when all values are the same
  ci.coverage <- mean(ci.capture)
  result[i,count] <- ci.coverage
  count <- count + 1
 }
}
result <- as.data.frame(result)
colnames(result) <- as.character(n.pool)
bplot <- boxplot(result,xlab='Sample Size',ylab = "Coverage Probability")
abline(h = 0.95, lty = 2) # nominal 95% level
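
One caveat with the simulation above: with N = 400 draws per estimate, each coverage estimate has a Monte Carlo standard error of roughly sqrt(0.95 * 0.05 / 400), about 0.011, so some of the wiggling could simply be simulation noise. A possible way to separate the two is to compute the coverage exactly by summing binomial probabilities over the counts whose interval contains pi.0. The sketch below does that. The helpers wilson.ci and exact.coverage and the grid n.grid are names I made up for illustration, and the interval is coded by hand from the usual Wilson formula on the assumption that it matches prop.test with correct = FALSE.

```r
# Wilson (score) interval, vectorised over the count y.
# Hypothetical helper for illustration, not part of the original simulation.
wilson.ci <- function(y, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p.hat <- y / n
  denom <- 1 + z^2 / n
  centre <- (p.hat + z^2 / (2 * n)) / denom
  half <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / denom
  cbind(lower = centre - half, upper = centre + half)
}

# Exact coverage: total binomial probability of all counts y
# whose interval contains the true pi.0 (no simulation involved).
exact.coverage <- function(n, pi.0, conf.level = 0.95) {
  y <- 0:n
  ci <- wilson.ci(y, n, conf.level)
  covered <- ci[, "lower"] <= pi.0 & ci[, "upper"] >= pi.0
  sum(dbinom(y[covered], n, pi.0))
}

# Exact coverage on a fine grid of sample sizes (grid chosen arbitrarily)
n.grid <- seq(10, 2000, by = 10)
coverage <- sapply(n.grid, exact.coverage, pi.0 = 0.01)

plot(n.grid, coverage, type = "l",
     xlab = "Sample Size", ylab = "Exact Coverage Probability")
abline(h = 0.95, lty = 2) # nominal 95% level
```

Any wiggle left in this curve cannot be simulation noise, because nothing here is random; it would have to come from the discreteness of the binomial distribution itself.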
