Department of Medicine
School of Medicine Queen's University

News, Innovations and Discoveries Blog

Data linkage and data sharing in clinical trials: Good in principle, complex in practice

Although clinical trials generate vast amounts of data, a large portion is never published or made available to other researchers. Data sharing could advance scientific discovery and improve clinical care by maximizing the knowledge gained from data collected in trials, stimulating new ideas for research, and avoiding unnecessarily duplicative trials. Institute of Medicine


Image from Chung et al

Dr. Annette Hay, Assistant Professor of Medicine (Hematology), recently gave a superb Medical Grand Rounds entitled From Data Fragmentation To Utopia (available to Queen’s faculty via MedTech). Dr. Hay is a member of the Canadian Clinical Trials Group (CCTG), based at Queen’s University. This remarkable entity was founded by Dr. Joe Pater and performs clinical trials in Canada and globally. Since its founding in 1980, the CCTG (annual budget $25-35 M) has performed 510 clinical trials, enrolling over 80,000 patients and improving the care of cancer patients internationally. The CCTG is a jewel in the Queen’s crown.


Dr. Hay’s lecture introduced the ideas of data linkage and mandatory data sharing. This inspired me to try to envision the brave new world of clinical trials in an era whose zeitgeist includes ambivalence about the collection and sharing of personal data and an erosion of trust in conventional knowledge creators and translators (Medical Journals, Big Pharma/Device, Physicians). Another shaper of the coming changes in clinical trials is their staggering expense and the potential for harm if results are not presented honestly. These realities have raised a societal call for cheaper data, more thorough use of all collected data, greater data transparency, and the opportunity to independently verify the collected data. As a result, clinical trialists may soon be adding two new phrases to their lexicon: data linkage and data sharing.

For those who are not clinical trialists, the two issues, data linkage and data sharing, may be opaque, so let me set the stage. First, these are separate issues but have in common a desire to get the most from large, expensive clinical trials. We do these large, multisite clinical trials to test new drugs, devices or strategies of care. This research (which may be funded from the public purse, industry, or both) usually leads to one or more publications and generates the data that provide the basis for submissions seeking drug and device approval from agencies such as Health Canada (Health Products and Food Branch) or, in the United States, the Food and Drug Administration (FDA).

However, the current way we do clinical trials has some shortcomings. Clinical trials have a limited duration of follow-up, and the benefits or harms of a new drug, device or strategy may only become apparent on a longer time scale, after the trial is closed. If we could link clinical trials data to health databases at the level of the individual, we could theoretically learn how patients do in the long term, after the trial has long been closed. This real world follow-up would be less controlled than the clinical trial environment but would be economical and offer a long-term perspective on benefit, risk, and cost. Achieving data linkage would require a relatively granular connection (at the level of the individual) between the clinical trials dataset and a provincial health registry, such as the Institute for Clinical Evaluative Sciences (ICES). This raises logistical challenges and requires many issues of patient privacy to be addressed.
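For readers who like to see the mechanics, here is a minimal sketch of what linkage "at the level of the individual" might look like. It is an illustration under stated assumptions, not a description of how CCTG or ICES actually link records: the field names (health_number, birthdate, event), the salted-hash key and the toy records are all hypothetical, and real linkage involves data-sharing agreements, ethics approval and often probabilistic matching.

```python
import hashlib


def link_key(health_number: str, birthdate: str, salt: str) -> str:
    """Build a pseudonymous key from identifiers (hypothetical field choice)."""
    normalized = f"{health_number.strip().upper()}|{birthdate.strip()}"
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def link_trial_to_registry(trial_rows, registry_rows, salt):
    """Exact (deterministic) matching of trial participants to registry records."""
    # Index registry records by their hashed key so each lookup is O(1).
    registry_index = {}
    for row in registry_rows:
        key = link_key(row["health_number"], row["birthdate"], salt)
        registry_index.setdefault(key, []).append(row)

    linked, unlinked = [], []
    for row in trial_rows:
        key = link_key(row["health_number"], row["birthdate"], salt)
        matches = registry_index.get(key, [])
        if matches:
            linked.append({
                "trial_id": row["trial_id"],
                "events": [m["event"] for m in matches],
            })
        else:
            unlinked.append(row["trial_id"])
    return linked, unlinked


# Toy, made-up records for illustration only.
trial = [
    {"trial_id": "T001", "health_number": "1234567890", "birthdate": "1950-02-14"},
    {"trial_id": "T002", "health_number": "9876543210", "birthdate": "1962-11-03"},
]
registry = [
    {"health_number": "1234567890 ", "birthdate": "1950-02-14", "event": "hospital admission 2012"},
    {"health_number": "1234567890", "birthdate": "1950-02-14", "event": "death 2015"},
]

linked, unlinked = link_trial_to_registry(trial, registry, salt="demo-salt")
```

In this sketch, participant T001 picks up two registry events and T002 remains unlinked. Hashing the identifiers with a shared secret salt is one way (among several) to keep raw identifiers out of the linked dataset, which speaks to the privacy concerns noted above.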

Dr. Hay’s lecture prompted me to reflect on both the potential benefits and the real world complexities of these new approaches to clinical trial data. Changing clinical trials will be complicated by human nature. On one hand, humans are social creatures. We value networking and sharing. We teach our children, “Play with others”, “Share your toys”. We indoctrinate medical students to value networking and sharing and admonish them to be good CanMEDS Communicators and Collaborators. Permitting data linkage, to save money and/or increase knowledge, or requiring access to clinical trials data, to allow verification of published findings or to perform secondary analyses, has some altruistic appeal. However, the proposition of linking a research subject’s clinical trial data to their personal health record, or the mandatory sharing of clinical trials data with external researchers, is complex. Patients value their privacy, and researchers are not selfless saints. Whether these are good ideas may depend not only on how and why linkage and sharing are done but also on one’s personal views on privacy and intellectual property, as will be discussed.

Dr. Hay shared an example of the potential value of data linkage, citing a trial conducted to determine whether patients with Hodgkin’s lymphoma should get chemotherapy alone or chemotherapy plus radiation therapy. At the 4-year mark, there was no difference between the strategies; however, by year 11 it became clear it was best to give chemotherapy alone. This changed practice, allowing patients to avoid unnecessary radiation and saving costs. Such long-term follow-up would be greatly facilitated if the patients’ health status could be followed using their clinical health record.


Dr. Hay and CCTG hope to eventually use personal identifiers (like birthdate, address or even social insurance number), collected as part of the clinical trial, to link the data for these same individuals in registry / administrative databases such as ICES, Statistics Canada and provincial cancer registries, as a means to extend and enrich trial outcome data. Data linkage is already well accepted in other jurisdictions. For example, in Glasgow, Scotland, data linkage was used to extend the follow-up of patients enrolled in a primary prevention trial of patients with high cholesterol levels. This West of Scotland Coronary Prevention Study (WOSCOPS) was a well-designed RCT that showed that pravastatin, employed as a tool for primary prevention of cardiovascular disease, saved lives. However, this RCT, with its 6-year follow-up, cost £20,000,000.


WOSCOPS trial Figure 2: Kaplan-Meier analysis of time to nonfatal MI or death by treatment group. Shepherd J, et al. N Engl J Med 1995; 333:1301-1308 (November 16, 1995). DOI: 10.1056/NEJM199511163332001


In contrast, a study using data linkage demonstrated the sustained beneficial effects of pravastatin at 16-year follow-up (below) at one-thousandth the cost, a mere £15,000! Scotland (Dr. Hay’s home country) has embraced the use of data linkage as national research policy and ensured that both the research projects and researchers are of the highest standard, with careful attention to peer review, ethics approval and data control.


Slide obtained from Dr. Hay and created by Colin McCowan, Professor of Health Informatics, University of Glasgow


Can the technique of data linkage be imported to Canada? In unpublished data, Dr. Hay showed that such linkage was feasible in >90% of cases. Would Canadian patients agree to this long-term surveillance? Dr. Hay surveyed 483 patients with various types of cancer. She presented a hypothetical scenario for the use of data linkage as part of clinical trial participation. The overwhelming opinion of patients was that they were “OK with data linkage” (see the response to a representative question below).

Would you be willing to allow the research team running the clinical trials confidential access to your health information contained in administrative databases?



Indeed, 94% of patients surveyed were open to the idea of getting long-term follow-up by data linkage with their personal health record (instead of a conventional visit to the research centre). Dr. Hay next plans to prospectively assess this new strategy in the CCTG Long-term Innovative Follow-up Extension (LIFE) study, to be conducted in collaboration with Statistics Canada.

Data Sharing

At first glance it would appear to be in the public benefit to share. Since the average clinical trial costs millions and much of the collected information never sees the light of day, data sharing might allow scientists to fully utilize the data. At its root, the research is derived from research subjects who are usually volunteers. Surely data sharing is consistent with the spirit of patients sharing their medical data and biological samples in hopes of helping humankind (and possibly themselves). In most trials there are data that go unpublished and which hypothetically could be used, either by the investigators who did the trial or by others unrelated to the clinical trial team. These additional analyses could be used to answer important new questions or add nuance to the initial publication. For example, a trial may focus on treating cancer but its data could be repurposed to examine potential cardiac adverse effects of the chemotherapeutic agent that was studied. If the data exist and are unpublished, it seems reasonable to allow a credible investigator to access the data and determine the agent’s cardiac effects. This could save time and money and potentially enhance patient safety. In principle, data sharing would also permit reanalysis of published data and, if done with integrity, this might provide independent validation (or refutation) of published results.

The aptly named BMJ editor, Dr. Godlee, recently made the case for data sharing as a response to a perception of poor reliability and bias in publications: “There will be commercial pressures, academic pressures, and to pretend otherwise is absurd. So we have to have many more mechanisms, much more skepticism, and much more willingness to challenge.” In this spirit, mandatory data sharing could be a useful antidote to data error or scientific fraud.




The Institute of Medicine has spoken on the subject of sharing data derived from clinical trials, noting it must be considered the norm (although acknowledging it must be properly regulated): “Stakeholders in clinical trials should foster a culture in which data sharing is the expected norm and should commit to responsible strategies aimed at maximizing the benefits, minimizing the risks, and overcoming the challenges of sharing data for all parties. Holders of clinical trial data should mitigate the risks and enhance the benefits of sharing sensitive clinical trial data by implementing operational strategies that include employing data use agreements, designating an independent review panel, including members of the lay public in governance, and making access to clinical trial data transparent.”



However, when one moves from the theoretical to the real world, data sharing becomes more complex. The New England Journal of Medicine (NEJM) editors, Drs. Longo and Drazen, wrote an editorial on data sharing, noting several concerns (which I have quoted below). NEJM Concerns:

1) The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters.

2) A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. They termed this new class of people, research parasites, which prompted an exchange in the Twittersphere (see below).

Despite Longo and Drazen’s concerns, their editorial was primarily focused on praise for an article in the NEJM which, in their view, utilized data sharing the “right way” (Dalerba P, Sahoo D, Paik S, et al. CDX2 as a prognostic biomarker in stage II and stage III colon cancer. N Engl J Med 2016; 374:211-22).


When considering mandatory data sharing we should note that there is nothing wrong with pride in discovery or proprietary interests. Moreover, not all research is publicly funded. Should company X provide company Y with access to its expensive and hard-won data (with the resulting adverse financial consequences)? This intellectual property concern also applies to individual researchers, who labour to “sell” their theories or to obtain credit for novel reagents and inventions (both for promotion and to obtain grant funding).


We would be naive to believe that there is no pettiness and bias in science. Every theory and most investigators have their detractors. Data sharing can either bring truth to light or allow people with their own strongly held views to rewrite history (with little effort or expense on their part). The British Medical Journal (BMJ) recently published a re-analysis of unpublished data from the Minnesota Coronary Experiment (MCE), a randomized controlled trial conducted in 1968-73. The re-analysis attempts to overthrow the dietary fat-coronary disease hypothesis championed by Dr. Ancel Keys.

Diet Heart hypothesis of Keys et al


The publication (by Ramsden et al) suggested that while a diet rich in unsaturated fat does lower cholesterol, it is not beneficial in reducing mortality.


BMJ 2016;353:i1246

The MCE was the largest (n=9570) dietary trial of cholesterol lowering and used the strategy of replacing saturated fat with vegetable oil rich in linoleic acid. Ramsden et al recovered raw MCE data, including previously unpublished records of serum cholesterol and autopsy reports, and resurrected an extensive collection of study documents, notably the 1981 master’s thesis of S K Broste. A reanalysis of the data, based on the unpublished thesis, concluded that replacement of saturated fat with linoleic acid effectively lowers serum cholesterol but does not translate into a lower risk of death from coronary heart disease. Their BMJ paper is massive, and it is unclear, 40 years later, why the thesis was originally unpublished. Ramsden’s reanalysis suggests the diet (below: corn oil diet, blue line) was associated with a higher risk of death than the high fat control diet.



The authors conclude that “Findings from the Minnesota Coronary Experiment add to growing evidence that incomplete publication has contributed to overestimation of the benefits of replacing saturated fat with vegetable oils rich in linoleic acid.” The authors (and the BMJ) have chosen to question the validity of conventional wisdom. That is fine; the truth should out. But what is the truth? There are many reasons why theses may remain unpublished, and it is challenging to ensure data integrity when resurrecting 40-year-old data from magnetic tapes. Anyone who runs a trial or a basic science lab has data that they choose not to publish for legitimate reasons: the quality of the data, confidence in the person who collected or entered the data, etc. If the goal of data sharing is public benefit, one might wonder if Ramsden’s study is helpful or harmful. In the era of Frantz (senior) and Keys, the only way to lower LDL was with diet. Their work led to the notion that lowering cholesterol would be beneficial in preventing atherosclerosis, and this is no longer a question. In the “statin era” we can be confident that lowering LDL cholesterol (with statins) does indeed reduce cardiovascular disease morbidity and all-cause mortality. We arrived at the use of statins and cholesterol reduction via Ancel Keys and the diet-cholesterol hypothesis. So did Frantz and Keys’s potentially incorrect conclusion lead to a happy landing, or were they right in their initial study? Conversely, would the current revised data and Ramsden’s paper have misdirected us? I’ll let you be the judge.




When it comes to data sharing (which we in basic science also are required to do) I like the approach proposed by NEJM’s Longo and Drazen. They suggest that data sharing done right involves 4 principles (see quotes from their editorial below):

1) Start with a novel idea, one that is not an obvious extension of the reported work.

2) Identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration.

3) Work together to test the new hypothesis.

4) Report the new findings with relevant co-authorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested.

I would add to their list the need to respect intellectual property and allow inventors and discoverers to pursue not only their line of investigation but also the secondary studies they envision, without having data and reagents appropriated by individuals who have not invested in the often-painful process of research and discovery. One would also insist that one’s data be used by people acting with unsullied motives who are themselves competent and who will respect the ethical and confidentiality constraints associated with the original data set.

Sharing and linking have a role, but research is hard work and researchers merit some protection of their ideas and discoveries and control over their data. Einstein reminds us that perhaps discovery and the joy that goes with it belong to those that actually participate in research and knowledge creation, not to the intelligent student who simply travels the well-trod path.

In the light of knowledge attained, the happy achievement seems almost a matter of course, and any intelligent student can grasp it without too much trouble. But the years of anxious searching in the dark, with their intense longing, their alternations of confidence and exhaustion, and the final emergence into the light: only those who have themselves experienced it can understand that.

… Albert Einstein




Dr. Archer, Dept. Head