Biomedical Engineering Reference
[Figure 16.3: flowchart with two panels, "Curation process for internal trials" and "Curation process for public data".]
Figure 16.3 Data acquisition, curation, and data-loading process for internal trial data
and public studies.
ics professionals to clean and add data to the system. Curators were responsible
for understanding each data set added to the system, researching and
reconciling inconsistencies in the data, and tagging documents with metadata
to facilitate search. Additionally, the product manager and curators were
responsible for validating the scientific utility of the data after it was loaded
into the system. ETL developers were responsible for loading the data sets and
metadata into the database, normalizing data, and making the data available
through the application (Fig. 16.3).
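
The division of labor described above (curators reconcile and tag, ETL developers normalize and load, and the loaded data is then validated) can be illustrated with a minimal Python sketch. It is not the tranSMART implementation: the `DataSet` class, the function names, the `sex` field, the synonym table, and the in-memory dictionary standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical container for one study's rows and search metadata."""
    name: str
    rows: list
    metadata: dict = field(default_factory=dict)


def curate(ds, tags):
    """Curator step: reconcile inconsistent values and attach search tags."""
    # Assumed reconciliation rule: unify spelling variants of the 'sex' field.
    synonyms = {"m": "male", "male": "male", "f": "female", "female": "female"}
    rows = []
    for row in ds.rows:
        raw = str(row.get("sex", "")).strip().lower()
        rows.append({**row, "sex": synonyms.get(raw, "unknown")})
    return DataSet(ds.name, rows, {**ds.metadata, **tags})


def load(ds, database):
    """ETL step: normalize field names and store rows plus metadata."""
    normalized = [{key.lower(): value for key, value in row.items()}
                  for row in ds.rows]
    database[ds.name] = {"rows": normalized, "metadata": ds.metadata}


def validate(name, database, required):
    """Post-load check: every stored row carries the required fields."""
    return all(all(f in row for f in required)
               for row in database[name]["rows"])


# Usage: one small data set passes through all three steps.
database = {}
raw = DataSet("TRIAL-001", [{"subject_id": "S1", "sex": "M"},
                            {"subject_id": "S2", "sex": "Female "}])
curated = curate(raw, {"indication": "asthma"})
load(curated, database)
is_valid = validate("TRIAL-001", database, ("subject_id", "sex"))
```

Keeping the three steps as separate functions mirrors the separate roles in the text: each step can be reviewed or rerun by the person responsible for it.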
16.7 tranSMART DESCRIPTION
All users of the system are authenticated using the same enterprise security
processes that the pharmaceutical companies of Johnson and Johnson require
for other internal systems. Because tranSMART contains sensitive data, we
developed a fine-grained security model for managing information in the
system. Each clinical trial data set has a specific owner who can control access
to both the information about the trial and the data in it.
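
As a rough illustration of owner-controlled access, the following Python sketch models a trial whose owner alone may grant or revoke viewing rights. It is only a sketch under stated assumptions: the `Trial` class, its method names, and the user names are hypothetical, and the actual tranSMART security model is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical trial record with a single owner and a set of grantees."""
    trial_id: str
    owner: str
    granted: set = field(default_factory=set)

    def grant(self, actor, user):
        """Only the owner may give another user access to this trial."""
        if actor != self.owner:
            raise PermissionError(f"{actor} does not own {self.trial_id}")
        self.granted.add(user)

    def revoke(self, actor, user):
        """Only the owner may withdraw access; unknown users are ignored."""
        if actor != self.owner:
            raise PermissionError(f"{actor} does not own {self.trial_id}")
        self.granted.discard(user)

    def can_view(self, user):
        """The owner and explicitly granted users may view the trial."""
        return user == self.owner or user in self.granted


# Usage: the owner grants access; a non-owner cannot.
trial = Trial("TRIAL-001", owner="alice")
trial.grant("alice", "bob")
try:
    trial.grant("bob", "carol")
    non_owner_blocked = False
except PermissionError:
    non_owner_blocked = True
```

Access is decided per trial rather than per user role, which is one simple way to read "fine-grained" in the passage above.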
The tranSMART system was built by maximizing the reuse of open-source and
open-data projects, including Lucene [16], i2b2 (http://www.i2b2.org) [17],
GenePattern [18], Gene Expression Omnibus (GEO) [19], MeSH [13], and
Entrez [20]. The graphical user interface enables data access