Saturday, 24 March 2012

Common mistakes which Researchers make

Here are some common mistakes that researchers often make during a study.

(1) Simpson's Paradox: Forgetting to use a weighting variable when groups are combined. The weighting is irrelevant when each group is assessed on its own, but without it the direction of an effect can appear reversed once the groups are combined. Let's take a real-life example from a medical study comparing the success rates of two treatments for kidney stones.

Treatment A        Treatment B
78% (273/350)      83% (289/350)

It seems that treatment B is more effective. If we include the data about kidney stone size, we get a different answer.

                Treatment A        Treatment B
Small Stones    93% (81/87)        87% (234/270)
Large Stones    73% (192/263)      69% (55/80)
Both            78% (273/350)      83% (289/350)

The information about stone size (a confounding variable) has reversed our conclusion about the effectiveness of the treatments. Now treatment A seems to be more effective in both cases. The error committed here is forgetting to use the weighting variable (the number of cases in each group) in the combined assessment. Treatment B looks better in the combined group because small-stone patients, who recover more often than large-stone patients regardless of treatment, are also the ones more likely to receive treatment B. The standard way of dealing with such confounders is to "hold them fixed." Here, if having small stones is taken to be a cause of both recovery and treatment choice, the effect of the treatment needs to be evaluated separately for the two groups and then averaged accordingly. Assuming stone size is the only confounding factor, the Small Stones and Large Stones rows of the second table represent the efficacy of the treatments in the respective groups, the last row represents merely the evidential weight of each treatment in the absence of stone size information, and the paradox resolves.
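The arithmetic behind the reversal can be checked with a short sketch. The counts come from the tables above; the choice to weight each group by its share of all 700 patients (direct standardization) is an assumption made here for illustration, being one common way to "average accordingly".

```python
# Counts (successes, patients) taken from the tables above.
data = {
    "Small stones": {"A": (81, 87), "B": (234, 270)},
    "Large stones": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


# Success rate within each stone-size group.
for group, arms in data.items():
    for arm, (s, n) in arms.items():
        print(f"{group:12s} {arm}: {rate(s, n):.1%}")

# Pooled rate: add up successes and patients, ignoring stone size.
pooled = {}
for arm in ("A", "B"):
    s = sum(data[g][arm][0] for g in data)
    n = sum(data[g][arm][1] for g in data)
    pooled[arm] = rate(s, n)

# Adjusted rate (assumed weighting): weight each group's rate by that group's
# share of ALL patients, so both treatments face the same mix of stone sizes.
group_size = {g: sum(n for _, n in arms.values()) for g, arms in data.items()}
total = sum(group_size.values())
adjusted = {
    arm: sum(rate(*data[g][arm]) * group_size[g] / total for g in data)
    for arm in ("A", "B")
}

for arm in ("A", "B"):
    print(f"Treatment {arm}: pooled {pooled[arm]:.1%}, adjusted {adjusted[arm]:.1%}")
```

With the pooled figures treatment B comes out ahead, while the adjusted figures put treatment A ahead, in agreement with the group-wise rows.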

(2) Accepting the Null Hypothesis: It is tempting to say that the null hypothesis is accepted when the test statistic is not significant. In fact the null hypothesis can never be accepted; we only fail to reject it. Accepting it would amount to establishing that the effect in the population is exactly ZERO.

(3) What does it mean to reject the Null hypothesis: In an experimental setup we actually calculate the probability of obtaining our particular data given that the null hypothesis is true. We are not calculating the probability of the null being true given the data. For example, if we test a null hypothesis about the difference between two population means and reject it at p=.05, we are saying that if the null hypothesis were true, the probability of obtaining a difference between the means at least as great as the one we found is only 0.05. This is quite different from saying that the probability that the null is true is 0.05. We are dealing with a conditional probability here: the probability of the data given that H0 is true, P(D|H0), and not the probability that H0 is true given the data, P(H0|D).
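A small Bayes' rule sketch shows how far apart the two conditional probabilities can be. It conditions on "the test came out significant at level alpha" rather than on an exact p-value, and the prior probability of H0 and the power of the test are hypothetical inputs chosen only for illustration.

```python
def prob_null_given_significant(prior_h0, alpha, power):
    """P(H0 | significant result) by Bayes' rule.

    prior_h0: assumed prior probability that H0 is true (hypothetical input)
    alpha:    P(significant | H0), the Type I error rate
    power:    P(significant | H1), assumed power of the test
    """
    p_sig_and_h0 = alpha * prior_h0          # false positives
    p_sig_and_h1 = power * (1.0 - prior_h0)  # true positives
    return p_sig_and_h0 / (p_sig_and_h0 + p_sig_and_h1)


for prior in (0.5, 0.9):
    post = prob_null_given_significant(prior, alpha=0.05, power=0.8)
    print(f"prior P(H0)={prior:.1f}  ->  P(H0 | significant)={post:.3f}")
```

Under these assumed inputs, P(D|H0) stays fixed at 0.05 while P(H0|D) moves with the prior, so the two cannot be read as the same number.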

(4) Turning around to the other side in case of a 1-tail test: The decision about conducting a 1-tail or a 2-tail test has to be taken before the experiment is set up and the data are collected. We cannot plan to run a 1-tail test and then, if the data come out the other way, just change the test to a 2-tail test. If we start the experiment with the extreme 5% of the left-hand tail as the rejection region and then turn around and reject any outcome that happens to fall in the extreme 2.5% on either side, we are actually working at 7.5% (5% on the left side + 2.5% on the right side). This is one of the reasons 2-tail tests are often selected.
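The 7.5% figure can be reproduced with a minimal sketch, assuming the test statistic follows a standard normal distribution under the null hypothesis (the z-test setting is an assumption; the post does not name a specific test).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, assumed null distribution of the statistic

left_cut = z.inv_cdf(0.05)   # planned one-tailed rejection region: lowest 5%
two_lo = z.inv_cdf(0.025)    # two-tailed region, lower 2.5%
two_hi = z.inv_cdf(0.975)    # two-tailed region, upper 2.5%

# Reject if the statistic lands in the planned left tail OR, after switching,
# in either two-tailed region. The lower 2.5% already lies inside the lower 5%,
# so the left-hand rejection region is bounded by the larger of the two cutoffs.
p_left = z.cdf(max(left_cut, two_lo))
p_right = 1.0 - z.cdf(two_hi)
actual_alpha = p_left + p_right

print(f"left tail: {p_left:.3f}, right tail: {p_right:.3f}, total: {actual_alpha:.3f}")
```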

(5) Failure to acknowledge the "know your data" approach: Any analysis must be preceded by three preliminary analyses. First is the exploratory analysis, which reveals the trends and indicates the type of analysis to follow. Second is the missing value analysis, which finds out the process behind any missing values in the data set and then rectifies the data set by processes like deletion, imputation etc. Third is the outlier analysis, which identifies outliers and ascertains the reasons for them. Outliers are an essential component of the data and sometimes form THE data in the data set.

(5) Failure to apply Model Parsimony: An analysis is of little use if the model arrived at is too complex to be understood and similar or better results can be obtained from much simpler models.

(6) Failure to pool interaction effect variance if it is not found significant: If an interaction effect is not found significant, then instead of consuming degrees of freedom for this effect, its sum of squares and degrees of freedom can be pooled into the error term. The remaining main and interaction effects are then tested against the pooled error, which may make some effects significant that were non-significant earlier.

(7) Failure to adjust for co-variates: If the effects of co-variates are not adjusted for in the model, we may draw the erroneous conclusion that other effects (both main and interaction) are significant, when they came out significant solely because of variation in the co-variates.

(8) Two-group designs are faulty in most cases: Two-group designs (even the venerable pre-post design) are faulty in the sense that they do not tackle all the threats to internal validity, especially the threat posed by the interaction testing effect. Four-group designs like the Solomon design tackle these threats, but they are costly.

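(9) Assumptions consigned to oblivion: The results of a test are highly sensitive to the assumptions the test is built on (for example normality, homogeneity of variance, or independence of observations), so these must be checked before the test is applied.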

(10) Failure to use transformations before applying non-parametric tests: If the assumptions of a test are not satisfied, transformations should be applied first so that the assumptions can be satisfied. Only as a last alternative should non-parametric tests be used.

(11) Failure to use appropriate sample size: This has drastic implications for the final interpretation and analysis. With too small a sample a real effect may come out non-significant, and with a very large sample a trivially small effect may come out significant. An analysis done without considering sample size is a sheer waste!
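As a rough illustration of planning the sample size in advance, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2((z_{1-α/2} + z_{power}) / d)², where d is the standardized effect size. The function name `n_per_group` and the default values are assumptions for this example; an exact t-based calculation gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison
    of means (normal approximation, equal group sizes, equal variances)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for two-sided alpha
    z_power = z.inv_cdf(power)              # quantile matching the desired power
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need far larger samples to be detected with the same power.
for d in (0.8, 0.5, 0.2):
    print(f"effect size d={d}: about {n_per_group(d)} per group")
```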

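(12) Failure to use the appropriate correlation for the correlation matrix in analyses like Factor Analysis: This again has drastic implications for the final interpretation and analysis, as the correlation matrix will have totally different values (for example, when Pearson correlations are computed on dichotomous or ordinal items where tetrachoric or polychoric correlations are called for).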

(13) Failure to build powerful elements into the design (randomization, replication, blocking, orthogonality, factorials etc.): These are among the ways we can improve the validity of the experiment. The general rule is "Block what you can, randomize what you cannot."

(14) Failure to validate results (the objective is the population and not the sample): Usually the experiment is conducted, the data are collected and the results are interpreted for the sample, and we simply forget about the population, where our actual interest lies.

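(15) Failure to use WLS or MLE instead of OLS wherever needed (in cases where OLS doesn't give BLUE): Different estimation methods are suited to different situations, and the method must be chosen with care. For example, when the error variance is not constant, OLS is no longer the best linear unbiased estimator and WLS is the appropriate choice.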

(16) People simply take an arbitrary α (.05 or .01) and simply ignore β: α and β are related and can't be analysed independently. For a fixed sample size and effect size, lowering α raises β. Statistical power (1 − β) is an important parameter to be considered.
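The trade-off between α and β can be seen in a minimal sketch for a one-sided, one-sample z-test with known standard deviation. The test type, the effect size of 0.5 standard deviations and the sample size of 20 are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def beta_one_sided_z(alpha, effect_size, n):
    """Type II error rate of a one-sided, one-sample z-test.

    effect_size is the true mean shift in standard-deviation units.
    """
    critical = Z.inv_cdf(1.0 - alpha)  # rejection cutoff under H0
    # Under H1 the statistic is centred at effect_size * sqrt(n);
    # beta is the chance it still falls below the cutoff.
    return Z.cdf(critical - effect_size * sqrt(n))


for alpha in (0.10, 0.05, 0.01):
    beta = beta_one_sided_z(alpha, effect_size=0.5, n=20)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Tightening α from .05 to .01 with everything else held fixed increases β, which is why the two should be chosen together.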

(17) Attenuation of ρ in Concurrent validation: When the established test has low reliability, the validity coefficient of the new test, which is being validated against the established test, gets attenuated. An appropriate correction for attenuation needs to be applied in this case.
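One classical way to handle this is Spearman's correction for attenuation, which divides the observed correlation by the square root of the reliabilities. Whether this is the exact adjustment the post has in mind is an assumption; the function name and the numbers below are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_yy, r_xx=1.0):
    """Spearman's correction for attenuation.

    r_xy: observed correlation between the new test (x) and the criterion (y)
    r_yy: reliability of the criterion (the established test)
    r_xx: reliability of the new test; leave at 1.0 to correct for
          unreliability in the criterion only
    """
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical values: observed validity 0.42, criterion reliability 0.64.
observed = 0.42
corrected = correct_for_attenuation(observed, r_yy=0.64)
print(f"observed r = {observed:.3f}, corrected r = {corrected:.3f}")
```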

(18) Failure to acknowledge that Karl Pearson (KP) correlation is a simple correlation whereas Stepwise Regression uses part (semipartial) correlation: Using the wrong kind of correlation may flag the wrong predictors as significant.
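The gap between the two can be sketched with the standard two-predictor formula for the part (semipartial) correlation, sr₁ = (r_y1 − r_y2·r_12) / √(1 − r_12²). The correlation values used are hypothetical and only illustrate how a predictor with a sizeable simple correlation can contribute little once a correlated predictor is already in the model.

```python
from math import sqrt


def part_correlation(r_y1, r_y2, r_12):
    """Part (semipartial) correlation of predictor 1 with y,
    after removing predictor 2 from predictor 1 only."""
    return (r_y1 - r_y2 * r_12) / sqrt(1.0 - r_12 ** 2)


# Hypothetical correlations: x1 and x2 both relate to y and to each other.
r_y1, r_y2, r_12 = 0.5, 0.6, 0.7
sr1 = part_correlation(r_y1, r_y2, r_12)
print(f"simple r(y, x1) = {r_y1:.3f}, part r(y, x1 | x2) = {sr1:.3f}")
```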

(19) Two-group designs are faulty in most cases: The error due to the interaction testing effect can't be taken care of in two-group designs. Four-group, six-study designs are better but expensive.

(20) Failure to use IRT appropriately: Using an IRT model that does not fit the data and then calculating its parameters makes the whole exercise a waste.

(21) Failure to recognize that a Computer Adaptive Test is not the same as a Computer Administered Test / Computerized Test: The tailored item selection in an adaptive test can result in reduced standard errors and greater precision with only a handful of properly selected items.

 (22) Failure to incorporate Taguchi method designs (TQM) : The use of orthogonal arrays in fractional factorial designs for efficient handling of desired main and interaction effects in line with quadratic loss function is increasingly being encouraged.

(23) Failure to incorporate response surface method designs: These allow factor levels to vary continuously over the feasible region and allow optimization over irregular response surfaces, which is not possible in traditional factorial designs.
