Available Downloads
The following files have been made available from the MEDLINE/PubMed Baseline Repository and include all of the generated files we create during our processing of each baseline. We have used GNU's gzip utility to compress the larger files and used the Unix tar command to compile the files into a single download. The compressed files can be expanded using either GNU's gunzip utility or a freely available product like WinZip. Either of these will be able to understand the tar files and separate out the files as appropriate.
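For readers without gunzip or WinZip at hand, the expansion step described above can also be sketched with Python's standard library. This is only an illustration: the function name `expand_download` and the destination directory are assumptions, not part of the Repository.

```python
from __future__ import annotations

import gzip
import shutil
import tarfile
from pathlib import Path


def expand_download(archive: Path, dest_dir: Path) -> list[Path]:
    """Expand a gzipped tar archive or a single .gz file into dest_dir.

    Hypothetical helper for illustration; it returns the extracted paths.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)

    if tarfile.is_tarfile(archive):
        # Mode "r:*" lets tarfile detect the gzip compression on its own.
        with tarfile.open(archive, mode="r:*") as tar:
            members = [m for m in tar.getmembers() if m.isfile()]
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(dest_dir, **extra)
        return [dest_dir / m.name for m in members]

    # Otherwise treat the download as one gzip-compressed file and
    # drop the trailing ".gz" to name the expanded copy.
    target = dest_dir / archive.with_suffix("").name
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return [target]
```

For example, `expand_download(Path("MH_freq_count.gz"), Path("baseline"))` would write `baseline/MH_freq_count`, assuming the file was saved under that name in the current directory.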

To download each file, simply move your cursor over the file icon for the file you wish to download, press the right mouse button and select the "Save Link Target as ..." option.

The frequency count files represent a complete count of the MEDLINE baseline for each category. For example, the MH_freq_count file represents a count for each unique MeSH Heading found in MEDLINE for the given baseline year. We include an overall count for each term and, where applicable (MH and SH terms only), a count of how often the term has been starred (marked as an IM, or Index Medicus, index term, which represents the most significant points of an article). We provide at least two versions of each frequency count category that differ only in sort order: MH_freq_count is ordered by the overall frequency count for each term, while MH_freq_alpha is ordered alphabetically by MeSH Heading. The MH_major_freq_count file is sorted by the count of how often each term has been starred. The README file associated with each baseline explains the format of each file in greater detail.
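The three sort orders can be illustrated with a small sketch. The record layout (`term`, `count`, `major_count`), the tie-breaking rules, the case-insensitive alphabetical order, and the sample numbers are all assumptions made for illustration; the actual file format is the one documented in each baseline's README.

```python
from typing import Iterable, List, NamedTuple


class TermCount(NamedTuple):
    """Hypothetical in-memory record; not the on-disk file layout."""
    term: str
    count: int        # overall frequency of the term
    major_count: int  # how often the term was starred (IM)


def by_count(rows: Iterable[TermCount]) -> List[TermCount]:
    # Ordering in the spirit of MH_freq_count: highest overall count first.
    return sorted(rows, key=lambda r: (-r.count, r.term))


def alphabetical(rows: Iterable[TermCount]) -> List[TermCount]:
    # Ordering in the spirit of MH_freq_alpha: A to Z by heading.
    return sorted(rows, key=lambda r: r.term.lower())


def by_major_count(rows: Iterable[TermCount]) -> List[TermCount]:
    # Ordering in the spirit of MH_major_freq_count: most-starred first.
    return sorted(rows, key=lambda r: (-r.major_count, r.term))


# Made-up numbers, purely to show the three orderings.
sample = [
    TermCount("Humans", 100, 5),
    TermCount("Animals", 50, 20),
    TermCount("Mice", 50, 1),
]
```

The same rows appear in every ordering; only the sort key changes, which is why the alpha and count versions of a category carry the same information.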

The raw data files are the files we generated for use in our MEDLINE Baseline Repository Query tool database. We felt that others might find these files useful, and providing them here helps eliminate duplication of effort. The README file associated with each baseline explains the format of each file in greater detail.

We also provide two files where we look at the MeSH Headings assigned to the completed citations during the given baseline or year. The "hist" file is a frequency count of MeSH Headings based on their assigned MeSH Treecodes. The "histST" file is a frequency count of MeSH Headings based on their UMLS Semantic Types and more specifically, their UMLS Semantic Groupings (groups of Semantic Types). We have also included several graphs to help illustrate this data.

The related MeSH Vocabulary data files are also included here to make sure that you have available all of the year specific data you might need for your research.

It's unclear whether the SemGroups.txt file is updated each year, as the Semantic Types change, or is static. This file groups the UMLS Semantic Types into (currently) 15 high-level categories. We are using this file to see if we can detect patterns in how the MeSH Headings are assigned in MEDLINE. The following papers provide much greater detail on the grouping of the Semantic Types:

McCray AT, Burgun A, Bodenreider O. Aggregating UMLS semantic types for reducing conceptual complexity. Medinfo. 2001;10(Pt 1):216-20.
Bodenreider O, McCray AT. Exploring semantic groups through visual approaches. Journal of Biomedical Informatics. 2003;36(6):414-432.

Both papers can be found at the Lister Hill National Center for Biomedical Communications web site (https://lhncbc.nlm.nih.gov/) under the "Publications/Tools" section.

The Unique Words from the baseline files are the latest addition to the Repository, starting with the 2009 MEDLINE Baseline. We use a very simplified idea of a word: we throw away anything that is all numbers, throw away anything with non-ASCII characters, and break at anything that is not alphanumeric. The "words" files contain single words and bigram words. The bigram words are built with a sliding window over the last "valid" word and the current word, so you get something like "last current", where we simply added a space. We also ignore a short list of stop words (313), so they are not included in the various lists. Each of the "words" files also contains a frequency count for each item. Also, please note that we only look at the Title and Abstract fields to generate our list of words; we have ignored the MeSH Heading fields.
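The word rules above can be approximated with a short sketch. It is not the Repository's actual code, and several details are assumptions the text does not settle: words are lowercased, stop words are skipped without resetting the bigram window, and the window does not carry over from one text field to the next. The names `valid_words` and `count_words` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Runs of letters and digits; anything else (including "_") is a break.
TOKEN_RE = re.compile(r"[^\W_]+")


def valid_words(text: str, stop_words: frozenset) -> Iterator[str]:
    """Yield the words of one text that survive the filtering rules."""
    for token in TOKEN_RE.findall(text):
        if not token.isascii():   # throw away anything with non-ASCII characters
            continue
        if token.isdigit():       # throw away anything that is all numbers
            continue
        word = token.lower()      # assumption: counts are case-insensitive
        if word in stop_words:    # stop words are left out of every list
            continue
        yield word


def count_words(
    texts: Iterable[str], stop_words: frozenset = frozenset()
) -> Tuple[Counter, Counter]:
    """Return (single-word counts, bigram counts) for title/abstract texts."""
    singles: Counter = Counter()
    bigrams: Counter = Counter()
    for text in texts:
        last = None  # assumption: the window restarts for each text
        for word in valid_words(text, stop_words):
            singles[word] += 1
            if last is not None:
                # "last current", joined by a single space
                bigrams[f"{last} {word}"] += 1
            last = word
    return singles, bigrams
```

Because the window tracks the last valid word, a discarded token between two valid words does not prevent them from forming a bigram in this sketch.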

One or more of the following tools may be needed to access the files located on this page after they have been downloaded. The need depends on your current computer resources.

Adobe Reader -- Adobe's free PDF reader
gunzip -- to uncompress files
WinZip -- to uncompress files


2002-2012 MBR Download Archive

Item                                  2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
Lexicon MEDLINE N-Gram Sets 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
N-Gram Sets (links)
(N-Grams sized 1 - 5)
  N-Gram Set N-Gram Set N-Gram Set N-Gram Set          
DTD (Document Type Definition) Files  2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
DTDs (gzipped tar)                    7 kb      7 kb      7 kb      7.1 kb    8.4 kb

Frequency Count Files                 2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
README                                2.4 kb    2.4 kb    2.4 kb    2.4 kb    2.4 kb
Summary                               228 bytes 228 bytes 228 bytes 228 bytes 228 bytes
Chemical_freq_alpha.gz                3.3 mb    3.5 mb    3.6 mb    3.7 mb    3.8 mb
Chemical_freq_count.gz                3.4 mb    3.6 mb    3.7 mb    3.8 mb    3.9 mb
MH_freq_alpha.gz                      472 kb    479 kb    485 kb    499 kb    510 kb
MH_freq_count.gz                      503 kb    511 kb    518 kb    532 kb    545 kb
MH_major_freq_count.gz                509 kb    516 kb    523 kb    537 kb    549 kb
MH_SH_freq_alpha.gz                   4.2 mb    4.3 mb    4.4 mb    4.4 mb    4.4 mb
MH_SH_freq_count.gz                   6.4 mb    6.5 mb    6.7 mb    6.7 mb    6.7 mb
MH_SH_major_freq_count.gz             6.4 mb    6.5 mb    6.6 mb    6.7 mb    6.7 mb
SH_freq_alpha.gz                      1.6 kb    1.6 kb    1.6 kb    1.6 kb    1.5 kb
SH_freq_count.gz                      1.6 kb    1.6 kb    1.6 kb    1.6 kb    1.5 kb
SH_major_freq_count.gz                1.6 kb    1.6 kb    1.6 kb    1.6 kb    1.5 kb

Raw Data Files                        2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
README                                3.4 kb    3.4 kb    3.4 kb    3.4 kb    3.4 kb
Summary                               199 bytes 199 bytes 199 bytes 199 bytes 199 bytes
Chemical_items.gz                     325 mb    339 mb    355 mb    376 mb    399 mb
ID_items.gz                           91 mb     95 mb     101 mb    107 mb    124 mb
Full_MH_SH_items.gz                   1.7 gb    1.8 gb    1.9 gb    2.0 gb    2.2 gb
MH_items.gz                           2.8 gb    2.9 gb    3.1 gb    3.3 gb    3.5 gb
MH_SH_items.gz                        942 mb    988 mb    1.1 gb    1.1 gb    1.2 gb
SH_items.u.gz                         357 mb    367 mb    392 mb    415 mb    448 mb

Histogram/Summary Files               2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
README                                2.9 kb    2.9 kb    2.9 kb    2.2 kb    2.2 kb
hist                                  4.7 kb    4.7 kb    4.8 kb    4.8 kb    4.8 kb
Graph of hist File (PDF)              120 kb    119 kb    119 kb
Graph of Combined hist Files (PDF)    125 kb    152 kb    125 kb
histST                                315 bytes 313 bytes 315 bytes 315 bytes 316 bytes
Graph of histST File (PDF)            59 kb     59 kb     60 kb
Graph of Combined histST Files (PDF)  81 kb     84 kb     84 kb
hist_Full                             40 kb     41 kb     42 kb     43 kb     44 kb
Graph of hist_Full File (PDF)         47 kb     48 kb     49 kb
histST_Full                           5.1 kb    5.2 kb    5.4 kb    5.5 kb    5.6 kb
Graph of histST_Full File (PDF)       60 kb     67 kb     66 kb

Related MeSH Files                    2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
README                                1.1 kb    1.1 kb    1.1 kb    1.1 kb    817 bytes
streeYYYY.bin                         858 kb    870 kb    881 kb    896 kb    919 kb

Access to All MeSH Files: NEW for 2017: The MeSH FTP download site: ftp://nlmpubs.nlm.nih.gov/online/mesh/ now includes separate directories for each release year of MeSH. In addition, MeSH created the folder "MESH_FILES" with the latest release files that are updated every morning Monday - Friday. The yearly release folders span from 2011 to the latest full release which occurs in November of the preceding year (for example, 2016 MeSH was released in November of 2015). A single directory is also included for earlier files from 1999-2010.

UMLS Semantic Groups File             2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
SemGroups.txt                         2011 SemGroups.txt file (5.8 kb)

Unique Words from Medline Baseline    2013      2014      2015      2016      2017      2018      2019      2020      2021      2022
readme                                774 bytes 1.4 kb    1.4 kb    1.4 kb    1.4 kb
singleWords                           12 mb     13 mb     14 mb     15 mb     16 mb
bigramWords                           198 mb    233 mb    245 mb    257 mb    283 mb
wrd_stop                              2009 Unique Baseline Words StopWords file (1.9 kb)

Last Modified: May 31, 2017   