Not at all. Even though there is a vast body of literature explaining why some statistical protocols used in drug acceptance tests are inadequate (e.g. overreliance on parametric tests, like Student's t or similar statistics, which discard the possibility of "large outliers" in the underlying distribution), these drug test protocols are relatively reliable. First because, as you mentioned, the samples are larger and random. Splitting them into subgroups via a controlled experimental design is part of statistical theory; it actually improves, rather than decreases, the reliability of the results (in a nutshell, it incorporates external knowledge, e.g. placebos are placebos, into the test protocol). Besides, the aim of medical treatment by drugs is to save lives, so there probably is some kind of risk-benefit trade-off at work here.
Outliers are an issue for quantitative measurements, not for simple measures like alleles or microsatellites. The allowances made for a match are usually too broad for random biological events to matter. The nature of these studies also means they rely on the presence or absence of certain patterns. Anyway, ask me if you don't understand any of these; I can explain each in detail.
Regarding drug tests, you are not necessarily increasing power by splitting the samples, as you can clearly see from the increase in the degrees of freedom. (I can show you how it works, but please start another thread to discuss it.)
But there is a very big difference between a controlled experimental design with 10,000 people and uncontrolled surveys of a few hundred samples, split into groups of 10 or 50. The problem is twofold. First, the sampling is probably neither random nor controlled, which means that statistical laws hold less well. Second, the samples are below the minimum sizes needed for the basic theorems of probability to hold (i.e. the various forms of the law of large numbers, convergence of empirical means to expectations, uniform convergence of frequencies to distributions, or convergence of the underlying residuals to a normal law).
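To put rough numbers on the group-size point, here is a minimal sketch of how the sampling uncertainty of an observed frequency shrinks with group size. It assumes independent random draws and uses the usual normal approximation (which is itself questionable at n = 10). The true frequency of 0.3 is a made-up illustration value, not a figure from the studies discussed, and the function names are my own.

```python
import math


def frequency_standard_error(p, n):
    """Standard error of an observed frequency, for true frequency p
    and n independent draws (binomial model)."""
    return math.sqrt(p * (1.0 - p) / n)


def approx_95_margin(p, n):
    """Half-width of the approximate 95% interval (normal approximation)."""
    return 1.96 * frequency_standard_error(p, n)


if __name__ == "__main__":
    true_frequency = 0.3  # hypothetical value, chosen only for illustration
    for group_size in (10, 50, 10_000):
        margin = approx_95_margin(true_frequency, group_size)
        print(f"n={group_size:>6}: margin of error about {margin:.3f}")
```

The margin falls with the square root of the group size, so a group of 10 leaves the frequency very loosely pinned down compared with a sample of 10,000.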
Weak convergence is sufficient, and as I told you, the normality assumption does not need to hold because of the discrete nature of nucleotides. Do you think we need convergence in the almost sure sense? No, we don't even need an estimate of the power.
Random sampling is not considered that important, as it means less in genome studies. The human genome won't mutate within the same generation, and the mutation rate is extremely low for Y chromosomes. That's why we can even "infer" that humans came out of Africa: because of the consistent behaviour across samples within the same group.
Not necessarily. Your assumption at the moment is about the laws of the random variables, which you can model. Bootstrapping is enough in most cases.
The sample you presented is not very small in itself, but its splitting into very small groups (of fewer than 100 persons, and sometimes fewer than 10) sort of prevents any proper statistics from being done.
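For readers unfamiliar with bootstrapping, here is a minimal sketch of a percentile bootstrap interval for a frequency, using only the Python standard library. The data are invented for illustration (3 carriers out of 10, and 300 out of 1000), and the function name and parameters are my own assumptions, not anything taken from the papers under discussion.

```python
import random


def bootstrap_frequency_interval(observations, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 observations.

    Resamples the data with replacement, recomputes the frequency each
    time, and returns the empirical alpha/2 and 1 - alpha/2 quantiles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observations)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(observations) for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical groups: 1 marks a carrier of some marker, 0 a non-carrier.
small_group = [1] * 3 + [0] * 7
large_group = [1] * 300 + [0] * 700

print("n=10:  ", bootstrap_frequency_interval(small_group))
print("n=1000:", bootstrap_frequency_interval(large_group))
```

The bootstrap needs no normality assumption, but it can only resample what was observed, so the interval for the group of 10 comes out far wider than the one for the group of 1000.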
My impression is that such isolated cases are good for inducing general hypotheses, which is what the papers you quote probably do. But for them to become proof, or evidence, larger samples and statistical tests are needed.
In all fairness, there are a few theories on induction from small samples (you could search for Vapnik as a starter), but the samples involved are often much larger than a few units, and the methods are quite complex to design and use (this branch of statistics is in its infancy too).
Francois
You are making assumptions again. You seem to assume that I understand little about the theory, which might be true, as it has been a while since I last studied it, but in reality all this college stuff is useless, and statistics mostly remains an art rather than complete, beautiful mathematics.
That's why scientists call it "scientific evidence". But these are usually good enough estimates, which you might not be aware of.
Edited by color red, 23 October 2006 - 06:16 AM.











