The Blinding Problem: Placebos, Expectancy, and the Crisis of Evidence in Psychedelic Medicine

Physicians from various continents over time have sworn by the healing powers of crocodile dung, fox lung, the powder of precious stones, and moss scraped from the skulls of victims of violent death.

They had reason to prescribe them: these remedies sometimes worked.

“While many of these substances were pharmacologically inert, they sometimes played an important role as therapeutically beneficial explanatory fictions…”, David Jopling explains in Talking Cures and Placebo Effects (2008). “[B]ecause the physicians who dispensed the substances supplied patients with a rationale, conceptual scheme, or myth that offered an explanation for otherwise puzzling and frightening symptoms”.

Would I be displeased if my general practitioner (GP) prescribed crocodile dung for my eczema? Honestly, probably, yes. They could show me tables, charts, testimonials of success – perhaps even spin a compelling yarn about the magical resonance of crocodile flesh with my dermatitis.

If I care about what works, why would I be so reluctant?

We may ascribe these reported benefits to the so-called ‘placebo effect’. The word placebo – from the Latin “I shall please” – was coined to describe quack medicines reliant on belief rather than clear medical process. But the distinction between the placebo and the pharmakon – a word that means the ‘cure’, ‘poison’ and ‘scapegoat’ all at once – has become steadily blurred by gathering evidence of the psychological operations involved in healing.

The ‘placebo effect’ is any medical benefit ascribed to an inactive treatment. It is typically based on belief and hope, Pavlovian conditioning, or the activation of endogenous healing mechanisms such as endogenous opioid production. The last of these is concrete enough that it can be blocked by naloxone, a drug used to reverse opioid overdoses. The dopamine release triggered by placebo treatments for Parkinson’s disease has been measured in PET imaging of the basal ganglia. A survey of British GPs found that nearly every single respondent (ninety-eight per cent) had prescribed ‘placebo’ treatments at some point in their careers. A different poll suggested seventy-seven per cent use placebos on a weekly basis, including inessential physical exams and vitamin pills. It is common to hear calls to ‘enhance’ and ‘deploy’ the placebo effect for further healing. Doctors already wear a uniform of reassuring colours, deploy mythic ideas (the ‘chemical imbalance’ theory of depression, for instance), and decorate their offices with signs of authority, like certificates, even though their expertise may be exceeded by chatbots. Indeed, medical error is a leading cause of death.

At some point, though, we must work out a difference: what is the benefit of the active treatment?

If we can’t work it out, then the difference between evidence-based medicine and quackery isn’t clear. As the surgeon and sceptic Dr Ian Harris has written, the medic becomes a “magician”, where the failure to deliver benefit may be ascribed to insufficient ‘belief’. More than falsifiable medical theories, the sheer faith of vulnerable patients becomes the main mechanism of action. The real healing delivered by our essential medicines is needlessly blurred: one need not believe in a typhoid jab, for instance, for its power to save lives.

But is belief something we need when it comes to mental health?

The Blinding Problem

The randomised controlled trial (RCT) is traditionally how we’ve spelled out the difference. There are two arms to the investigation: one receives the active treatment, the other an inert pill (the so-called ‘sugar pill’ or ‘dummy pill’). Participants are randomly assigned, and neither investigator nor patient is made aware of who is in which arm, so the treatment effect is largely isolated from the expectation – and bias – of those involved.

It all makes sense on paper. One major problem, however, is ‘functional unblinding’: when the blind doesn’t hold because it is clear to one or both parties who has received the active treatment. One cause of unblinding is the distinctive subjective effects of the active treatment. A patient might suffer side effects like a dry mouth or sexual dysfunction from SSRIs, for instance. Realising that one is in the active arm, as we will see, may result in expectancy effects: the belief that one is doing better through receiving an exciting new drug. A review of nearly 2,500 RCTs for schizophrenia and affective disorders like depression found that only two and a half per cent evaluated blinding. In the subset that did check blinding, patients correctly guessed their treatment arm over half of the time. A review of over sixty RCTs for various psychological interventions found that none evaluated the quality of blinding.

It is easier to blind some treatments than others. Electroconvulsive therapy, for instance, involves sedation, seizure induction, and cognitive effects that are highly apparent for any recipient.

In the case of psychotherapy, one review found that trials “cannot be double-blinded”. One option for the control is a ‘sham’ therapy. The clinician might merely listen and give minimal advice, or offer generic forms of counselling without any specific basis for the active treatment. The clinician will invariably guess the treatment arm to which they’ve been assigned. Psychotherapies such as CBT and DBT are rooted in specific practices and lingos. DBT patients make use of “diary cards” to record which coping skills were used that week, for instance: if we used a related “sham” therapy, perhaps the triallist would need “sham diary cards”. This no longer constitutes an inert treatment, however, and instead resembles a crap version of DBT: or, put more formally, a “low dose of the active treatment”. In the absence of good sham therapies, eighty-seven per cent of trials make use of waiting lists for their comparators – but those on waiting lists are more likely to remain depressed than the average person, skewing results.

Psychedelic drugs are even more vulnerable to ‘functional unblinding’. An estimated ninety-five per cent of participants correctly guess their treatment arm, because the dramatic mental effects of LSD, DMT, and psilocybin are so apparent. In the Johns Hopkins 2006 study, which evaluated the potential for psilocybin to occasion mystical experiences, the figure was surprisingly low: a mere seventy-seven per cent, still higher than for other psychiatric medicines. Unblinding is not monitored as much as it should be. Across 112 psychedelic RCTs, only about thirty per cent evaluated blinding integrity at all, even though over half cited blinding as a limitation. The result is that all trials but one were rated at “high risk of bias”.
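What would it mean to evaluate blinding integrity in practice? One widely used approach is to ask participants at the end of a trial which arm they think they were in, then summarise the guesses with a blinding index. The sketch below uses the simplest form of the index proposed by Bang and colleagues; the reviews cited here do not say which measure they relied on, so this is an assumption, and the guess counts are invented purely for illustration.

```python
def bang_blinding_index(correct: int, incorrect: int, dont_know: int = 0) -> float:
    """Simplest form of Bang's blinding index for a single trial arm.

    -1 means everyone guessed the wrong arm, 0 means guesses look random
    (consistent with an intact blind), +1 means everyone guessed correctly.
    """
    total = correct + incorrect + dont_know
    if total == 0:
        raise ValueError("at least one guess is required")
    # Correct guesses count for, incorrect guesses against,
    # and "don't know" answers dilute the index towards zero.
    return (correct - incorrect) / total


# Hypothetical guess counts, not taken from any real trial.
active_arm = bang_blinding_index(correct=45, incorrect=3, dont_know=2)
placebo_arm = bang_blinding_index(correct=30, incorrect=15, dont_know=5)

print(f"active arm:  {active_arm:.2f}")
print(f"placebo arm: {placebo_arm:.2f}")
```

An index near zero in both arms suggests the blind held. A value close to one in the active arm, as in the invented figures above, is the signature of functional unblinding.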

This is not a new challenge. In 1955, researchers at Spring Grove State Hospital, a pioneering site that conducted psychedelic research well into the 1970s, concluded that placebo controls were simply “not useful”. The fact that “it was obvious who had received LSD” was circulated at the American Psychiatric Association meeting that year. The importance of rigorous methodology was increasingly recognised after the thalidomide scandal. Thalidomide was marketed in forty-six countries around the world as a morning sickness remedy for pregnant women, having never been formally approved by the Food and Drug Administration (FDA). The drug caused miscarriages and severe malformations in more than ten thousand children, becoming a landmark scandal. The 1962 Kefauver–Harris Amendments required substantial evidence of efficacy and safety from “adequate and well‑controlled investigations,” and RCTs swiftly became the “gold standard”. Legislators and regulators converged on a “magic bullet” model of drug action centred on specific biological sites.

What can psychedelic researchers do about the blinding problem? You could ignore it and simply compare the drugs’ effectiveness to the next best alternative, like SSRIs. But this sidesteps the issue: we still don’t know if the psychedelic drug is effective. SSRI effects are also blurred by expectancy.

You could use “active controls” to produce a subjective effect that could be confused for psilocybin. One common option is niacin, used in the Marsh Chapel Experiment in 1962 and up to the present day, which creates a “flushing” effect that may be confused for the body high of a psychedelic. This is not generally successful, and neither are mild psychoactives like low-dose benzodiazepines, stimulants, or sedating agents. Still, if the blind is broken and the active drug fails to defeat placebo, we may have a considerable signal that the drug effect is unexceptional: this was found in a head-to-head of intravenous ketamine and midazolam, in which more than ninety per cent guessed their assignment, but no difference occurred. Another approach is to use low doses of psychedelics. This most likely “totally suck[s] at blinding psychedelic macrodose trials”, Dr Balázs Szigeti of the University of California, San Francisco has said, because “very few will confuse either of these conditions with the psychedelic fairyland of 25mg psilocybin.” Nor does it test the drug against its absence: it only tests a high dose against a low one.

Researchers have made some innovative proposals. One idea is to capture and measure the expectancy itself as a variable using questionnaires, as well as standardising expectations in both arms through a common psychoeducational tool. To ensure fairness, triallists then distribute high-expectancy participants equally across both arms.
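Distributing high-expectancy participants equally is, in statistical terms, a form of stratified randomisation. A minimal sketch of the idea follows, assuming a single questionnaire score per participant and a cut-off chosen in advance. The function name, the 0 to 10 scale, the threshold, and the scores are all hypothetical; a real trial would use a validated instrument and a pre-registered allocation procedure.

```python
import random


def stratified_assign(scores, threshold, seed=0):
    """Assign participants to arms, balancing high and low expectancy.

    scores: dict mapping participant id -> expectancy questionnaire score.
    threshold: hypothetical cut-off separating 'high' from 'low' expectancy.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    strata = {"high": [], "low": []}
    for pid, score in scores.items():
        strata["high" if score >= threshold else "low"].append(pid)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within each stratum
        arms = ["active", "control"]
        rng.shuffle(arms)  # randomise which arm gets the spare place in odd-sized strata
        for i, pid in enumerate(members):
            assignment[pid] = arms[i % 2]  # alternate arms down the shuffled list
    return assignment


# Hypothetical expectancy scores on a 0-10 scale, invented for illustration.
scores = {"p1": 9, "p2": 8, "p3": 7, "p4": 6, "p5": 4, "p6": 3, "p7": 2, "p8": 1}
arms = stratified_assign(scores, threshold=5)

for arm in ("active", "control"):
    high = sum(1 for pid, a in arms.items() if a == arm and scores[pid] >= 5)
    print(arm, "high-expectancy participants:", high)
```

Because participants are shuffled and then alternated within each stratum, the four high-expectancy participants here always split two and two, whichever seed is used. Simple randomisation offers no such guarantee in a small trial.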

You could put patients to sleep. In the 2023 Stanford anaesthesia study, researchers administered ketamine or a saline placebo to patients under surgical anaesthesia, ensuring no one experienced the psychedelic trip or knew which drug they received. Ketamine performed no better than the placebo. Yet both groups improved dramatically. We might worry that the emphasis placed on controlling all ‘expectancy’ has fuelled the decline in therapy placements in psychedelic trials. COMPASS Pathways, for instance, has sought to “minimise preparation and support” to isolate the drug effect, following Lykos’ disastrous application in 2024. We also see the introduction of ‘trip-free’ psychedelics for which controls might be more feasible.

Unblinding and Bias

Are psychedelics too unique to meet the “gold standard” of the RCT? Or is this special pleading?

It is worth restating why ‘unblinding’ matters.

In a successful RCT, the risk of bias and expectancy is ideally managed on both sides. This includes the inflated hope in the active treatment, and the disappointment in the control: the so-called ‘nocebo’ effect. What we get is an approximate measure of the real value added of a medication, whose benefits must exceed its risks.

In the case of psychedelic drugs, patients face a complex menu of potential dangers. Cardiotoxicity, prolonged hallucinations, comedowns, mental dependence, fits and seizures, bad trips, sexual and financial exploitation: the prospective disasters might be uncommon, but they’re very real. Psychedelic drugs are recognised as being vastly more expensive than interventions like manualised CBT or SSRIs. Someone must foot the bill, whether it be desperate clients out-of-pocket, health insurers, or taxpayers funding a public healthcare system. It already looks like psychedelics do not defeat SSRIs, whose benefits compared to well-orchestrated controls are vanishingly thin. Esketamine failed to defeat placebo in five of its six efficacy studies before it was approved by the FDA, and it carries a particular risk of addiction.

Perhaps most importantly, unblinding expands the surface area for researcher bias: once a researcher knows a patient is under the active drug, various incentives are afoot. In Lykos Therapeutics' application for MDMA-assisted therapy for PTSD, over ninety per cent of those in the MDMA arm correctly identified their assignment. Ninety per cent of those in the active arm also no longer met the criteria for PTSD after three months.

A fact less emphasised is that seventy per cent in the control arm achieved the same result.

Is the twenty-percentage-point margin enough to justify the risks of MDMA? Many thought so: we saw rapturous predictions that “psychiatry may never be the same”. But the level of unblinding was severe enough for the FDA to reject the treatment in August 2024 – and the trial was not as rosy as it first appeared.
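It helps to spell out what that margin means. Taking the two remission figures quoted above at face value, the short sketch below computes the absolute difference between arms, the relative improvement, and the ‘number needed to treat’: how many patients must receive the drug for one additional person to benefit. It is an arithmetic illustration only. It ignores sample sizes and confidence intervals, and it cannot correct for the unblinding that may have inflated the gap in the first place.

```python
def absolute_risk_difference(active_rate: float, control_rate: float) -> float:
    """Difference in response rates between arms, as a proportion."""
    return active_rate - control_rate


def number_needed_to_treat(active_rate: float, control_rate: float) -> float:
    """Patients who must be treated for one extra responder."""
    diff = absolute_risk_difference(active_rate, control_rate)
    if diff <= 0:
        raise ValueError("active arm must outperform control for an NNT")
    return 1 / diff


# Remission rates as quoted in the text: ninety versus seventy per cent.
active, control = 0.90, 0.70

ard = absolute_risk_difference(active, control)
nnt = number_needed_to_treat(active, control)
relative_gain = active / control - 1

print(f"absolute difference: {ard * 100:.0f} percentage points")
print(f"relative improvement over control: {relative_gain * 100:.0f}%")
print(f"number needed to treat: {nnt:.1f}")
```

On these figures, five patients would need to take the drug for one of them to recover who would not have recovered in the control arm. The other four either recover anyway or do not recover at all, while still carrying the drug’s risks.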

Several serious adverse events, including suicidality and sexual abuse, were not properly recorded. Patients told The Wall Street Journal that they were pressured to report good outcomes because they were “making history”: an extreme instance of ‘social desirability bias’, a common problem in trials.

Most shockingly, Lykos’ founder, Dr Rick Doblin, claimed that psychedelic drugs could create a global utopia.

The problem of ideological bias is not unique to the psychedelic field. The creator of mindfulness-based stress reduction, Jon Kabat-Zinn, has claimed that the practice could bring an end to global conflict. One commentary called for “enhanced independence”, in which these often unstated commitments could be disclosed. Indeed, psychedelic investigators have long claimed that their field of study could save the human race. Dr Humphry Osmond, who coined the term “psychedelic”, thought them critical to the survival of the species. Betty Eisner, an early pioneer, was so convinced of her ability as a psychedelic guide that she became a guru and founded a cult. The late Professor Roland Griffiths, who gained a reputation as a “sober scientist” in the revived psychedelic field, claimed that psychedelics were essential tools for managing existential risks like climate change and nuclear war. “You know how important what we are doing is, don’t you”, Griffiths told the documentarian behind the Netflix production How To Change Your Mind. “It is a fight between good and evil, and we must fight to win”.

If researchers know who got the pill, they may try, consciously or unconsciously, to effect better outcomes. Many researchers have financial stakes, through board memberships of psychedelic companies or intellectual property holdings. Large surveys of researchers report that more than half admit to at least one questionable research practice, such as selectively reporting outcomes or massaging analyses to achieve ‘significant’ statistical results. Studies with positive findings are about three times more likely to be published than their null counterparts.

RCTs may amplify these risks because they are so expensive to run. Running them requires fundraising from pharmaceutical companies or public grantmakers. Industry-funded trials report fifty per cent higher efficacy than independent studies. Eighty per cent of clinical data from Chinese trials, delivered through the aegis of the ruling party, have proven fraudulent. Fewer than a fifth of psychotherapy trials even discussed or reported researcher allegiance, or the affiliation of a researcher with the field, and one in twenty-five actually measured it. One meta-analysis found that about half of CBT’s purported superiority in early studies could be predicted by the authors’ affiliation with CBT. A review found that the superiority of MBCT disappeared among papers unauthored by mindfulness advocates.

The Inflation of Expectancy

Basing a treatment on expectancy effects is dangerous. In psychedelic trials, for instance, when a participant realises they are under the active drug, they may apply the great hopes of “breakthrough”, “miracle”, “ten thousand hours of therapy” in one session: an ‘extra-pharmacological’ phenomenon known as “the Pollan effect”, after the journalist Michael Pollan’s bestseller How To Change Your Mind (2018). These ideas have a healing function. A defining feature of depressive illness is the foreclosure and darkening of all hopes. The psychedelic may amplify such hopes, creating a virtuous cycle, manifesting in real results that inspire more hopes.

The trouble is that the healing becomes a sort of Ponzi scheme, in which fresh generations of hope are required for each new tranche of patients. It becomes a hall of mirrors. The molecule itself – together with a favourable set and setting – is believed to be the therapeutic agent: whether a generator of ‘neuroplastic’ adaptation or an activator for the ‘inner healer’, or some other mechanism. This belief in the drug is the “Pollan effect”. But if we cannot demonstrate that psychedelic-assisted therapy is effective beyond one’s expectation that it will be effective, the patient, who is really healing herself through her own belief, is structurally misled, and resources are not allocated rationally. A dependency on headline media also means that outcomes will likely darken if a backlash occurs. There is evidence to suggest that bad trips became more common as the 1960s proceeded, for example, because of darkening media coverage. The “decline effect” is a common phenomenon in research. Early outcomes are uniquely promising, before a regression to the mean occurs. First-mover studies tend to be smaller, with consequently bigger skews, and are more laden with research bias.

But how do we distinguish the kind of ‘expectancy’ the RCT controls from the ‘set and setting’ we would optimise? Dr Ido Hartogsohn has named psychedelics ‘super-placebos’, and suggested that expectancy effects should be enhanced wherever ethically and practically possible. “Attempting to study psychedelics as though they function like serotonergic antidepressants misidentifies the object of study”, a 2025 paper in the British Journal of Psychiatry read, “and this in turn undermines the validity of trial outcomes.” At Johns Hopkins University, for instance, the pills are delivered in a ritual wooden chalice, encoding an idea that the experience will be sacred. Is this irresponsible inflation? Is it simply doing a good job?

One problem is the ‘nocebo’ response. Patients in the control arm may be disappointed and morose for not receiving the psilocybin. If they do receive it, what if the vaunted ‘mystical experience’ or ‘ego dissolution’ they heard about does not occur? One Johns Hopkins participant who had a history of suicidal thoughts killed himself after receiving a control dose. This tragic outcome was judged to be “unrelated” to the study. Dr Rick Strassman, whose DMT research in New Mexico in the 1990s revived psychedelic research, has suggested that the suicide resulted from a dramatic ‘disappointment effect’.

The hall of mirrors gets even weirder. The positive hopes of patients come from studies with “miraculous” outcomes. But these outcomes resulted from wide differentials compared to placebo, which looked worse-than-normal due to these disappointment effects. We want patients to be hopeful and open-minded about the treatment, but a principle of symmetry applies: if the disappointment effect is so severe for the control group, what does that say about the potential inflation effect for the active drug?

Are patients getting better through a delusional belief in the treatment itself? How does this affect their values, their identity, and their future reliance on the psychedelic drugs that purportedly saved them?

A 2013 study found that about a quarter of adults with untreated major depression remit within three months, and over half within a year. Could that have been them?

Through close examination of unblinding data, Dr Balázs Szigeti has argued that the ‘nocebo’ is a significantly bigger problem than the placebo. He advocates a Zelen design, in which participants do not know that an active condition exists. The patients are randomised into standard care or an experimental treatment before consent is sought. Those in the former may be unaware they are in a clinical trial at all. Those assigned the experimental treatment will then be told and asked whether they would like to switch. One benefit of the Zelen model is that the “disappointment effect” is reduced. But is failing to let patients know that they might be allocated to an active treatment – especially when they are in vulnerable states – unethical? Any damage to trust may ‘come out’ in the psychedelic state.

Alternatively, you could simply tell a patient they are in the control arm. Placebo effects do not require deception to operate. So-called ‘open-label placebo’ studies have shown that IBS, abdominal pain and depression can be treated with pills known to be inert, which may create less disappointment on the ‘big day’ of psychedelic treatment.

The RCT is not a perfect approach. The problems it presents are much the same as trying to frame the human mind in scientific terms at all. But the ‘blinding problem’ is still important for the elusive question of psychedelic research: what difference do the drugs actually make?