Many clinical trials compare two or more treatment groups using a binary outcome measure. For example, the goal may be to determine whether the frequency of pain episodes is significantly reduced in the treatment group (arm A) relative to the control group (arm B). For ethical or regulatory reasons, group sequential designs are commonly employed, and the stopping boundaries for the interim analyses are then constructed, based on a binomial distribution, to assess the difference in response probabilities between the two groups. This is easily accomplished with any of the standard procedures, e.g., those discussed by Jennison and Turnbull (2000), or with one of the widely used software packages, such as East (2000). Several factors are often known to affect the primary outcome of interest, but their true distributions are not known in advance. In addition, these factors may induce heterogeneous treatment responses among individuals within a group, and their exact effect sizes may be unknown. To limit the influence of such factors on the comparison of the two arms, stratified randomization is used in the actual conduct of the trial, and a stratified analysis based on the odds ratio, proposed in Jennison and Turnbull (2000, pages 251-252) and consistent with the stratified design, is then undertaken. However, the stopping rules used for the interim analyses are those derived for a difference in response rates under an unstratified design. The purpose of this paper is to assess the robustness of this approach, that is, the performance of the odds ratio test when the underlying distribution and effect sizes of the factors that influence the outcome vary. The simulation studies indicate that, in general, the stratified approach consistently outperforms the unstratified approach as long as the difference between the two groups in the weighted average of the response probabilities across strata remains close to the hypothesized value, irrespective of differences in the (allocation) distributions and of heterogeneous response rates. However, if the response probabilities deviate from the hypothesized values so that the difference in the weighted average falls below the hypothesized value, the proposed study can be substantially underpowered.
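To make the comparison concrete, the following is a minimal simulation sketch, not the paper's actual simulation code. It contrasts an unstratified pooled difference-in-proportions Z statistic with a stratified statistic built from an inverse-variance-weighted combination of stratum-wise log odds ratios, which serves here only as a stand-in for the stratified odds-ratio test of Jennison and Turnbull (2000). The stratum weights, response probabilities, sample size, and the 1.96 critical value (a placeholder for a boundary derived from an unstratified group sequential design) are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_trial(n_per_arm, probs_A, probs_B, strata_weights):
    """Simulate one two-arm trial with binary outcomes and a prognostic factor.

    probs_A, probs_B: hypothetical per-stratum response probabilities.
    strata_weights:   probability of a subject falling in each stratum
                      (the 'allocation' distribution of the factor).
    Returns per-stratum success counts and sample sizes for each arm.
    """
    strata = rng.choice(len(strata_weights), size=2 * n_per_arm, p=strata_weights)
    arm = np.tile([0, 1], n_per_arm)          # simplified 1:1 allocation
    pA, pB = np.asarray(probs_A), np.asarray(probs_B)
    p = np.where(arm == 0, pA[strata], pB[strata])
    y = rng.random(2 * n_per_arm) < p
    counts = []
    for s in range(len(strata_weights)):
        in_s = strata == s
        counts.append({
            "xA": int(y[in_s & (arm == 0)].sum()), "nA": int((in_s & (arm == 0)).sum()),
            "xB": int(y[in_s & (arm == 1)].sum()), "nB": int((in_s & (arm == 1)).sum()),
        })
    return counts

def unstratified_z(counts):
    """Pooled difference-in-proportions Z statistic (ignores the strata)."""
    xA = sum(c["xA"] for c in counts); nA = sum(c["nA"] for c in counts)
    xB = sum(c["xB"] for c in counts); nB = sum(c["nB"] for c in counts)
    pA, pB = xA / nA, xB / nB
    pbar = (xA + xB) / (nA + nB)
    se = np.sqrt(pbar * (1 - pbar) * (1 / nA + 1 / nB))
    return (pA - pB) / se

def stratified_or_z(counts):
    """Inverse-variance-weighted combination of stratum-wise log odds ratios
    (a stand-in for the stratified odds-ratio statistic); the 0.5 continuity
    correction guards against empty cells."""
    num = den = 0.0
    for c in counts:
        a, b = c["xA"] + 0.5, c["nA"] - c["xA"] + 0.5
        cc, d = c["xB"] + 0.5, c["nB"] - c["xB"] + 0.5
        lor = np.log(a * d / (b * cc))
        var = 1 / a + 1 / b + 1 / cc + 1 / d
        num += lor / var
        den += 1 / var
    return num / np.sqrt(den)

# Illustrative scenario: three strata, treatment benefit in each stratum.
weights = [0.5, 0.3, 0.2]                     # hypothetical stratum distribution
pA = [0.55, 0.45, 0.30]                       # hypothetical arm A response rates
pB = [0.40, 0.30, 0.20]                       # hypothetical arm B response rates
n_sim, n_per_arm, crit = 2000, 150, 1.96      # crit is a placeholder boundary
rej_unstrat = rej_strat = 0
for _ in range(n_sim):
    counts = simulate_trial(n_per_arm, pA, pB, weights)
    rej_unstrat += abs(unstratified_z(counts)) > crit
    rej_strat += abs(stratified_or_z(counts)) > crit
print(f"empirical power: unstratified={rej_unstrat / n_sim:.3f}, "
      f"stratified={rej_strat / n_sim:.3f}")
```

Re-running the sketch with the stratum weights or per-stratum probabilities shifted so that the weighted average of the treatment difference falls below its hypothesized value illustrates the loss of power described above, under the stated simplifying assumptions.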