(Memorial Sloan Kettering Cancer Center Library)
The Library and Archives at Memorial Sloan Kettering Cancer Center is happy to announce a paid, remote/hybrid internship opportunity for summer 2023 in the field of Research Data Management, focusing on Data Discovery. The project (described below) will last 10 weeks, from June 12 to August 18, as part of the DigITs Summer Internship Program (DigITs is the division to which the Library and Archives reports).
Application deadline: May 31, 2023
Hourly wage range: $18-$25
Those interested in applying for the internship should send their resume and cover letter directly to Anthony Dellureficio, Associate Librarian for Research Data Management.
If you are interested in this opportunity, we encourage you to apply as soon as possible. We anticipate scheduling interviews immediately following the application deadline.
Research Data Management Project Details:
This project will center on the following questions:
- What differences, commonalities, and standards exist between metadata structures of various FAIR repositories?
- Can workflows be developed to enhance metadata sharing from repositories to institutional discovery platforms?
In March 2020, the Library launched the MSK Data Catalog, a searchable and browsable online collection of records describing the contents of datasets and providing access instructions for those wishing to explore the data for their own research. The catalog records consist of rich metadata conforming to schema.org standards and use controlled vocabularies such as the NLM Medical Subject Headings (MeSH) and MSK’s OncoTree cancer taxonomy. You can read more about the project here: https://datacatalog.mskcc.org/about.
The records in our catalog come primarily from public, FAIR-compliant repositories (such as cBioPortal, Gene Expression Omnibus, Dryad, Harvard Dataverse, Figshare, and Zenodo) and complement our institutional publications database, Synapse, a public-facing resource tracking the intellectual output of MSK researchers.
The intern on this project will:
- Prepare a repository-by-repository description of a search strategy for discovering deposits affiliated with MSK (this strategy will be shared publicly for adoption by other institutions),
- Document workflows and procedures for exporting metadata from these repositories,
- Document workflows and procedures for cleaning up exported metadata records and ingesting them into our data catalog (with annotations to aid replication of these procedures),
- Create new catalog records for MSK-generated datasets in the publicly accessible repositories mentioned above.