Pitfalls of Observational Studies

Clever Hans and the observer effect

After showing two groups of schoolteachers a videotape of an eight-year-old boy, psychologists John Santrock and Russel Tracy found that the teachers’ judgment of the child ultimately depended on whether they had been told the child came from a divorced home or an intact home. The child was rated as less well-adjusted if the teachers thought he came from a home where the parents were divorced.[1] This finding might seem inconsequential to the field of architecture, but for a profession that often relies on observational studies to evaluate a design’s effect on its users, I argue that Santrock and Tracy’s study is one of many that architects need to pay attention to.

Observational studies*, like post-occupancy surveys, are a common method architects use to evaluate a design’s effect on its users. Done well, observational studies can provide a wealth of valuable and reliable information. They do, however, have their pitfalls, most notably cognitive and selection biases. At the risk of limiting readership, I will illustrate these challenges by reviewing one specific observational study dealing with design. Though specific, the following example wrestles with the same difficulties as other observational studies in architecture.

In 2008 Professor Magda Mostafa published a study that examined the effects of two architectural interventions (spatial sequencing and acoustics) on a group of children with autism. The observational phase of the study involved one control group and one study group, and lasted one academic year.[2] For spatial sequencing, Mostafa partitioned the study group’s classroom into different learning areas: one-on-one learning spaces, a group learning space, and one “escape” space. The acoustical “intervention involved the acoustical modification of a speech and language therapy room.”[3] The control group’s classroom and speech therapy room remained unchanged. Those are the basics.[4]

As I have previously written, Mostafa’s study is an important one for those researching autism design.[5] The study does, however, have some peculiar results. For example, the control group did not improve at all; if anything, the children in the control group got worse. Are we meant to believe that the children attending this autism school had been regressing, or not improving at all, prior to the architectural interventions? Assuming Mostafa’s hypothesis is correct, shouldn’t the control group have improved at roughly the same rate as before the interventions, just not as much as the study group? Was the original/control environment so horrendous that the teachers were ineffective against it? It does not appear to be vastly different from many of the environments where well-tested behavioral/educational interventions, like Applied Behavioral Analysis (ABA), have been found to be effective. It seems more likely that the peculiar findings can be partly explained by the pitfalls that plague many observational studies in architecture. Here we will focus on two: non-blinded studies and selection bias.

Non-blinded Studies
A common drawback of most observational studies in architecture is that they are not blinded, and so it is with Mostafa’s study. The people who collected the data knew which children were in the control group and which were in the study group. This might not sound like a big deal, but we should not underestimate our ability to fool ourselves. For example, when doctors were asked to assess new multiple sclerosis treatments, their assessments ultimately depended on whether they were blinded. The doctors who knew which patients took which treatment found the treatments to be effective, while those kept in the dark found the new treatments to be no better than a placebo.[6] Some may doubt that expectations can make us see things that are not there, or miss things that are, but that is exactly what Dr. E. James Potchen found when he presented radiologists with chest x-rays. Many of the radiologists were recorded as seeing things that were not in the chest x-rays, while at the same time many missed things like a missing left clavicle.[7]

Even with quantifiable measures such as response time, attention span, and occurrences of self-stimulatory behavior, it is not unreasonable to expect that some of the children’s behaviors in Mostafa’s study were over- or under-interpreted by the non-blinded researchers. For example, a slight look away from a task by a child in the study group might be unconsciously ignored, but noticed as a break in attention span for a child in the control group. Ideally, the periodic evaluations of skill development should have been conducted in a neutral setting by blinded researchers.[8]

The observers, however, are not the only ones we need to worry about. The teachers also knew whether they were in the study or control group. Perhaps those in the study group worked more enthusiastically because they believed in Mostafa’s hypothesis, while those in the control group unconsciously worked less enthusiastically because they believed they were working in the suboptimal environment. Numerous studies have shown that people’s perceptions of their lifestyle, health, and environment can influence outcomes and behaviors irrespective of their actual lifestyle, health, and environment.[9] In the case of Mostafa’s study it would have been impossible to blind the teachers to the spatial intervention, but not to the acoustical one. For example, the control group’s speech therapy room could have been outfitted with placebo “acoustical” panels or materials that have no discernible effect on the original acoustics.

Of course, it is impractical to blind many, if not most, of the observational studies done in architecture. The problems of non-blinded studies, however, do not go away, and most likely affect the results. For this reason we must be extremely cautious and humble when accepting those results.

Selection Bias
Similar to post-occupancy surveys, observational studies can suffer from the pesky problem of selection bias. Mostafa’s study sampled a group of children from a school in Cairo, Egypt called ADVANCE The Egyptian Society for Developing Skills of Special Needs Children. The study looked at only two primary-level classrooms comprising twelve children: six in the control group and six in the study group. These two classes made up 25% of a student body that included six other classes. Professor Mostafa said she chose the two primary-level classes because younger children are more impressionable, so the impact of the architectural interventions would be more noticeable. The class assignments, however, were also based on skill and ability rather than age alone.

With such a small sample, it is possible by pure chance alone that the six children in the study group would perform better than the control group regardless of any architectural intervention. This could be true at both the individual and the group level. Different groups have different dynamics: some personalities clash while others complement each other, and some groups will fare better than others irrespective of an intervention. The study also treats each child as an independent data point when they are not. The children are not coins being flipped, where the previous flip has no sway over the current flip. By not accounting for the children’s influence on each other, the data might look more significant than they otherwise would.
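The small-sample worry can be made concrete with a toy simulation (this is not Mostafa’s data): draw every child’s score from the same distribution, so there is no real intervention effect, and count how often two groups of six still differ by a sizeable margin. The normally distributed scores and the half-standard-deviation threshold are illustrative assumptions, but the point holds for any choice: with six children per group, large chance gaps are routine.

```python
import random

random.seed(1)  # fix the random stream so the run is reproducible

def simulate(n_trials=10_000, group_size=6):
    """Return the fraction of trials in which the 'study' group's mean
    exceeds the 'control' group's mean by 0.5 standard deviations,
    even though both groups are drawn from the SAME distribution."""
    big_gaps = 0
    for _ in range(n_trials):
        study = [random.gauss(0, 1) for _ in range(group_size)]
        control = [random.gauss(0, 1) for _ in range(group_size)]
        gap = sum(study) / group_size - sum(control) / group_size
        if gap > 0.5:
            big_gaps += 1
    return big_gaps / n_trials

# Analytically the chance is about 0.19, i.e. roughly one run in five
# shows a half-standard-deviation "effect" with no true effect at all.
print(simulate())
```

And this sketch still assumes the twelve scores are independent; if classmates influence one another, as argued above, the effective sample is even smaller and chance gaps even more likely.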

Further confounding the sampling problem is our lack of knowledge about the children’s socioeconomic statuses. Did one classroom happen to get all the children of one socioeconomic status while the other got the complete opposite or a variety? The advantages or disadvantages of the children’s socioeconomic statuses could have easily influenced the results.

What about the teachers? Was there a difference in years of teaching experience between the two rooms? How about levels of education, or experience working with individuals with autism? For all we know, the differences between the two groups are the result of a sampling problem with the teachers. Given all the missing information about socioeconomic status and teacher experience, Mostafa’s data set cannot be compared and contrasted with future studies (meta-analysis). Mostafa does say the study was a first-stage exploratory study that needs to be tested with a larger and more diverse sample, but as we have seen, other variables need to be controlled for as well.

In Conclusion
Professor Mostafa’s study is a specific example, but its challenges are not unique. A good deal of the observational studies architects conduct struggle with the same issues, and many simply lack the rigor to substantiate their knowledge claims. It might be nearly impossible to control for many of these pernicious pitfalls, and I fully acknowledge that. The expectation is not that architects blind every observational study or rid every study of selection bias, but that they acknowledge the weaknesses and be that much more generous with the error bars they affix to their knowledge.


Christopher Henry has been researching, writing, and consulting on autism design since 2005. He has conducted post-occupancy evaluations of autism schools, homes, and clinics in Denmark, England, and the US. Christopher also spent nine months working in direct care at Bittersweet Farms, a residential and vocational facility for adults with autism. He currently runs Autism Design Consultants, where you can find more information about autism design.


*It should be noted that, by definition, an observational study does not impose interventions to test for causation. Researchers in an observational study simply observe and take measurements to determine association. This differs from a designed experiment, in which the researchers do impose interventions and then observe and take measurements. Technically, this makes Magda Mostafa’s study a designed experiment and not an observational study. This article, however, looks at the pitfalls of human observation and selection bias, as opposed to something like measuring the heat gain/loss of a building, blood pressure, heart rate, PSA levels, etc., which have their own inferential limitations. The article lumps designed and observational studies together based on how the data were collected, not on whether the study sought to determine association or causation.

[1] I first came across this study in: Harris, Judith Rich. The Nurture Assumption, Free Press; Rev Updated edition 2011, Kindle Location 6642-45.

The actual study can be found here: Santrock, John W. and Russel L. Tracy. “Effects of children’s family structure status on the development of stereotypes by teachers.” Journal of Educational Psychology, Vol 70(5), Oct. 1978, 754-757.

[2] The first semester was spent collecting baseline data of both the control and study group before Mostafa implemented two architectural interventions. The second semester was used to measure the impact of the architectural interventions. The study measured changes in attention span, response time, and behavior temperament.

[3] Mostafa, Magda. “An Architecture for Autism: Concepts of Design Intervention for the Autistic User.” International Journal of Architectural Research. Volume 2 Issue 1. 193. March 2008.

Mostafa chose the two interventions based on a survey she conducted. One intervention related to spatial sequencing and the other to acoustics.

[4] There is one methodological oversight that I do not address in this article because it is not specific to observational studies, but it is an important one. Professor Mostafa changed three variables: spatial, visual (a consequence of the spatial), and acoustical. That makes it impossible to determine which intervention had which effect. Perhaps one intervention was detrimental, but the other two were helpful, so there was still a net gain. Perhaps two had no effect and one had a positive effect. Without knowing which intervention has which effect, schools run the risk of spending money on interventions that have no benefits or, worse, on interventions that hinder the children’s progress.

[5] The significance of Mostafa’s study does not rest on its methodological rigor, but on her great efforts to drag autism design research out of the anecdotal quagmire it is currently in. See Henry, Christopher N., “Architecture for Autism: architects heading in the right direction,” ArchDaily.com.

[6]  Evans, Imogen, Hazel Thornton, Iain Chalmers, Paul Glasziou and Ben Goldacre. Testing Treatments: Better Research for Better Healthcare. Pinter & Martin; 2nd edition 2011 Kindle Location 1622-31.

[7] Groopman, Jerome. How Doctors Think. Mariner Books, Kindle Location 2550-2584. In this study, the radiologists did better (found the missing clavicle) when they were told the chest x-rays were part of a series to find cancer. So giving some suggestion is not terrible in all circumstances. Still, neutral/blinded evaluators could have been told the children had autism and instructed to record behaviors, attention span, and response time without being told which child had been in which environment.

An example of seeing things that are not there: “Ironically, Potchen pointed out, based on his studies of radiologists, ‘if you look at the film too long, you increase the risk of hurting the patient.’ After about thirty-eight seconds, he found, many radiologists begin to ‘see things that are not there.’ In essence, they generated false positives and began to designate normal structures as abnormal.”

[8] Some might argue that testing in a neutral setting is unfair, as the entire study revolves around the differences between the control and study environments. Mostafa’s sensory-sensitive environment, however, is not the end goal. The hope is that the sensory-sensitive environment will help the children gain skills that can be generalized to different settings. If her hypothesis is correct, then the students who developed skills in the sensory-sensitive environment should perform better than the control group in a neutral setting. See pages 193-194 of the study.

[9] Goldacre, Ben. “Think yourself thin…” The Guardian, August 23, 2008. http://www.badscience.net/2008/08/think-yourself-thin/

Cite: Henry, Christopher N. “Pitfalls of Observational Studies” 10 May 2012. ArchDaily. Accessed 22 Dec 2014. <http://www.archdaily.com/?p=233177>