The MEDLINE co-occurrences file summarizes the MeSH Descriptors that occur together in MEDLINE citations from the MEDLINE/PubMed Baseline. The MEDLINE/PubMed Baseline is a snapshot created at the beginning of each new MeSH Indexing Year containing the MEDLINE, OLDMEDLINE, and PubMed-not-MEDLINE records.

The example shows the indexing from a sample MEDLINE citation (left) and the list of co-occurrences that would be generated from that indexing (right). We also track whether each of the MeSH Descriptors is considered a Major Topic (starred). In this example, Poisoning, Poisons, and Veratrum are considered Major Topics. A more complete example is available in the documentation.

Asterisks (stars) on MeSH Descriptors and Qualifiers (e.g., Veratrum/*metabolism) designate that they are the Major Topics of the article. Non-Major (non-asterisked) Descriptors and Qualifiers are usually additional topics substantively discussed within the article, terms added to qualify a Major Topic, or Check Tags (excerpt from Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®: A Tutorial with slight modification for this description). We specifically identify the co-occurrences where both MeSH Descriptors are marked as Major Topics for backward compatibility with the legacy MRCOC file. The legacy MRCOC file only tracked co-occurrences where both MeSH Descriptors were marked as Major Topics. We now track all MeSH Descriptor co-occurrences for completeness.
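As a rough illustration of the pairing described above, the following Python sketch generates every descriptor pair for a single citation and flags the pairs where both descriptors are Major Topics. The function name `cooccurrences`, the tuple layout, and the non-major "Humans" entry are assumptions made for this example. It also assumes pairs are unordered; the documentation describes how the actual files represent them.

```python
from itertools import combinations

def cooccurrences(indexing):
    """Return (descriptor1, descriptor2, both_major) for every descriptor pair.

    `indexing` is a list of (descriptor_name, is_major) tuples for ONE citation.
    A descriptor is treated as major if it, or one of its qualifiers, is starred
    (an assumption based on the Veratrum/*metabolism example).
    """
    pairs = []
    # Sort so each unordered pair is emitted once, in a stable order.
    for (d1, m1), (d2, m2) in combinations(sorted(indexing), 2):
        pairs.append((d1, d2, m1 and m2))
    return pairs

# Illustrative indexing: three Major Topics plus one hypothetical non-major term.
citation = [
    ("Poisoning", True),
    ("Poisons", True),
    ("Veratrum", True),
    ("Humans", False),
]

for d1, d2, both_major in cooccurrences(citation):
    print(f"{d1} | {d2} | both major: {both_major}")
```

Restricting the output to rows where `both_major` is true gives the subset that the legacy MRCOC file tracked.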

For each MEDLINE/PubMed Baseline, we have created two files: one with the complete details (detailed_CoOccurs_YYYY.txt) and one with a summarized version (summary_CoOccurs_YYYY.txt) of the identified co-occurrences. The summary file is the replacement for the legacy UMLS MRCOC file. The detailed file provides richer data when it is needed. For example, from the detailed file you can identify all of the PMIDs where the MeSH Descriptors Poisoning and Vanillic Acid co-occur and are both identified as Major Topics. Likewise, if you are trying to identify the earliest paper discussing both Poisoning and Vanillic Acid, you will find that information in the detailed file.


Documentation: 2013 MRCOC Documentation (PDF) (214 KB)

Baseline Year   File (gzipped text)             Compressed   Uncompressed
2016            summary_CoOccurs_2016.txt.gz    1.5 GB       14 GB
2016            detailed_CoOccurs_2016.txt.gz   19 GB        147 GB
2015            summary_CoOccurs_2015.txt.gz    1.4 GB       13 GB
2015            detailed_CoOccurs_2015.txt.gz   18 GB        138 GB
2014            summary_CoOccurs_2014.txt.gz    1.3 GB       13 GB
2014            detailed_CoOccurs_2014.txt.gz   17 GB        131 GB
2013            summary_CoOccurs_2013.txt.gz    1.3 GB       12 GB
2013            detailed_CoOccurs_2013.txt.gz   16 GB        125 GB

(The download link for detailed_CoOccurs_2016.txt.gz lists the file as 18 GB.)

summary_CoOccurs_YYYY.txt: [Large file]
The summary_CoOccurs_YYYY.txt is the replacement for the legacy UMLS MRCOC file. The co-occurrences are summarized by timeframe (MED - last five years of MEDLINE, MBD - previous five years of MEDLINE (years 6-10), and RST - the remaining years of MEDLINE) based on the Year from the Date Completed (the date indexing processing was completed). For each co-occurrence, we track the MeSH Descriptor Unique Identifier (DUI), UMLS Concept Unique Identifier (CUI), the overall frequency for the occurrence of the two MeSH Descriptors in the same MEDLINE citation, the frequency of when both MeSH Descriptors are starred (identified as Major Topics) in the same MEDLINE citation, Date Completed Year, timeframe, and several supplemental information frequencies detailed in the documentation. We specifically flag the co-occurrences where both MeSH Descriptors are marked as Major Topics (starred) for backward compatibility with the legacy MRCOC file.
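The timeframe bucketing described above can be sketched in a few lines of Python. This is only an illustration: the function name `timeframe` and the `reference_year` parameter are hypothetical, and the sketch assumes that "last five years" counts back from the reference year inclusively. The documentation is the authority on where the boundaries actually fall.

```python
def timeframe(completed_year, reference_year):
    """Map a Date Completed year to MED, MBD, or RST.

    Assumption: age 0-4 years -> MED, age 5-9 years -> MBD, older -> RST,
    where age = reference_year - completed_year.
    """
    age = reference_year - completed_year
    if age < 0:
        raise ValueError("completed_year is later than reference_year")
    if age < 5:
        return "MED"   # last five years of MEDLINE
    if age < 10:
        return "MBD"   # previous five years (years 6-10)
    return "RST"       # the remaining years of MEDLINE

for year in (2016, 2012, 2011, 2007, 2006):
    print(year, timeframe(year, reference_year=2016))
```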

detailed_CoOccurs_YYYY.txt: [Very large file]
The detailed descriptor co-occurrences file contains the complete information for each MeSH Descriptor co-occurrence and allows for identifying PMIDs for specific sets of co-occurrences. The file is sorted by DUI1, DUI2, Completed Year, and PMID, which clusters all of the DUI1/DUI2 co-occurrence combinations by the year completed for easier summarization. The file also identifies which MeSH Qualifiers are associated with which MeSH Descriptors in the co-occurrence, so it is possible to recreate the legacy MRCOC file LQ and LQB two-way view from this file if desired. The file contains multiple dates, allowing for the identification of the earliest occurrence of a co-occurrence. Please see the documentation for a more detailed explanation of this file and of the various dates that are tracked.
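Because the file is sorted by DUI1, DUI2, Completed Year, and PMID, it can be summarized in a single streaming pass without loading it into memory. The Python sketch below shows one way this might look. The field layout is an assumption: it treats each line as pipe-delimited with DUI1, DUI2, Completed Year, and PMID as the first four fields, which may not match the real file. The helper names and sample identifiers are hypothetical, and the documentation gives the actual layout. The sketch also uses Completed Year as its notion of "earliest", which is only one of the dates the file tracks.

```python
import gzip
from itertools import groupby

def parse(line, sep="|"):
    """Split one line into (dui1, dui2, completed_year, pmid).

    ASSUMED layout: pipe-delimited, with these four values in the first
    four fields. Adjust to match the documented format.
    """
    dui1, dui2, year, pmid = line.rstrip("\n").split(sep)[:4]
    return dui1, dui2, int(year), pmid

def earliest_per_pair(lines):
    """Yield (dui1, dui2, year, pmid) for the first row of each DUI1/DUI2 group.

    Relies on the input being sorted by DUI1, DUI2, year, PMID, so the first
    row in each group carries the earliest Completed Year.
    """
    rows = (parse(line) for line in lines if line.strip())
    for (dui1, dui2), group in groupby(rows, key=lambda r: (r[0], r[1])):
        first = next(group)
        yield dui1, dui2, first[2], first[3]

def read_gz(path):
    """Stream lines from a gzipped text file one at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from fh

# Hypothetical sample rows standing in for the real file.
sample = [
    "D000001|D000002|1990|111\n",
    "D000001|D000002|1995|222\n",
    "D000001|D000003|2001|333\n",
]
for row in earliest_per_pair(sample):
    print(row)
```

Given a downloaded file, `earliest_per_pair(read_gz(path))` would apply the same pass to the compressed data directly, avoiding the need to decompress it to disk first.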

Historically, the UMLS (Unified Medical Language System) MRCOC file tracked the co-occurrences of important concepts from three sources: MEDLINE, AI/RHEUM (The Artificial Intelligence Rheumatology Consultant System), and CCPSS (The Canonical Clinical Problem Statement System). The AI/RHEUM and CCPSS data are not available to update the information for use in the new MRCOC file. The existing AI/RHEUM and CCPSS records from the MRCOC file are available in the historical versions of the UMLS releases up through the 2013AA release, or as a static file representing the 2013AA version of the AI/RHEUM and CCPSS data from our archive.

