The baselines are normally generated toward the middle of November each year and contain all completed citations in MEDLINE as of that date. The baselines represent MEDLINE after year-end processing has been completed, which means the records have been revised with the upcoming year's new MeSH vocabulary terms. We currently have the 2002 through current-year MEDLINE/PubMed Baselines available. Each baseline is named for the year-end processing it reflects. For example, the 2002 MEDLINE/PubMed Baseline contains all completed citations from the mid-1960s until the date the baseline was created in late November 2001, with year-end processing assigning the appropriate 2002 MeSH vocabulary terms; it is therefore the baseline for the 2002 year.
The baselines also contain citations that are not MEDLINE. All of the baselines we have stored (2002 on) contain "Out-of-scope" citations, which were renamed "PubMed-not-MEDLINE" starting with the 2004 MEDLINE/PubMed Baseline. The PubMed-not-MEDLINE status refers to citations in PubMed from journals included in MEDLINE that have undergone quality review but are not assigned MeSH headings because the cited item is out of scope for MEDLINE, either by topic or by date of publication. Citations with Out-of-scope or PubMed-not-MEDLINE status make up a small percentage of the citations in the baselines (for example, 0.51% or 75,271 records in the 2005 baseline and 1.8% or 323,919 records in the 2009 baseline).
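Each percentage quoted above is that status's record count divided by the baseline's total citation count from the table further down this page. A quick Python check of the arithmetic, using only numbers that appear on this page:

```python
def share_percent(subset_count: int, total_count: int) -> float:
    """Return subset_count as a percentage of total_count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * subset_count / total_count

# Out-of-scope / PubMed-not-MEDLINE counts vs. baseline totals (from this page).
print(f"2005: {share_percent(75_271, 14_792_864):.2f}%")   # prints 2005: 0.51%
print(f"2009: {share_percent(323_919, 17_764_826):.2f}%")  # prints 2009: 1.82%
```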
Starting with the 2005 MEDLINE/PubMed Baseline, OLDMEDLINE citations are also included in the baselines; they make up approximately 11% of the total number of baseline citations. OLDMEDLINE citations come from international biomedical journals covering the fields of medicine, preclinical sciences, and allied health sciences, and were originally printed in hardcopy indexes published before 1966. For additional information, please refer to https://www.nlm.nih.gov/databases/databases_oldmedline.html.
In the 2005 baseline, the subject indexing for OLDMEDLINE citations was stored solely in the "Other Term" (OT) tagged fields, not in the MeSH Terms (MH) tagged fields. As a result, searching the 2005 baseline with our MBR Query Tool via the MH field does not retrieve any OLDMEDLINE citations; the only way to include OLDMEDLINE records from the 2005 baseline is a timeframe query that specifies no field-specific search criteria. Beginning with the 2006 baseline, Other Terms began to be mapped to current MeSH Terms, so searching via the MH field may retrieve some OLDMEDLINE records, but not necessarily all of them.
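The MH/OT distinction is visible at the level of a single record. This sketch assumes that, in the baseline XML, MeSH terms (MH) appear as `MeshHeadingList/MeshHeading/DescriptorName` and Other Terms (OT) as `KeywordList/Keyword` inside `<MedlineCitation>`; verify this against the DTD for the year you are using. The sample record, its PMID, and its keyword are hypothetical. A record indexed the way the 2005 OLDMEDLINE citations were carries subject terms only under OT, so an MH-only search cannot match it.

```python
import xml.etree.ElementTree as ET

def subject_terms(citation: ET.Element) -> dict[str, list[str]]:
    """Collect MeSH descriptors (MH) and Other Terms (OT) from one <MedlineCitation>.

    Assumed mapping: MH -> MeshHeadingList/MeshHeading/DescriptorName,
                     OT -> KeywordList/Keyword.
    """
    mesh = [d.text or "" for d in citation.findall("./MeshHeadingList/MeshHeading/DescriptorName")]
    other = [k.text or "" for k in citation.findall("./KeywordList/Keyword")]
    return {"MH": mesh, "OT": other}

# Hypothetical OLDMEDLINE-style record: indexing present only as Other Terms.
sample = """
<MedlineCitation Status="OLDMEDLINE">
  <PMID>1</PMID>
  <KeywordList Owner="NLM"><Keyword>TUBERCULOSIS</Keyword></KeywordList>
</MedlineCitation>
""".strip()

terms = subject_terms(ET.fromstring(sample))
print(terms)  # {'MH': [], 'OT': ['TUBERCULOSIS']}
```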
Starting with the 2007 MEDLINE/PubMed Baseline, when all of a record's OLDMEDLINE terms have been converted to MeSH Headings, the citation status changes to MEDLINE. The citation status therefore cannot identify OLDMEDLINE records; instead, rely on the <CitationSubset>OM</CitationSubset> element to determine whether a citation is in the OLDMEDLINE subset.
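A minimal Python sketch of that check, using only the standard library's `xml.etree.ElementTree`. It assumes `<CitationSubset>` is a direct child of `<MedlineCitation>` and may repeat (verify against the DTD for your baseline year); the sample records and PMIDs are hypothetical. Note that both records report `Status="MEDLINE"`, so only the subset element tells them apart.

```python
import xml.etree.ElementTree as ET

def is_oldmedline(citation: ET.Element) -> bool:
    """True if a <MedlineCitation> lists OM among its <CitationSubset> elements.

    The Status attribute is deliberately ignored: from 2007 on, a fully
    converted OLDMEDLINE record carries Status="MEDLINE".
    """
    return any((s.text or "").strip() == "OM" for s in citation.findall("CitationSubset"))

# Hypothetical records: one converted OLDMEDLINE citation, one ordinary citation.
sample = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation Status="MEDLINE">
    <PMID>101</PMID><CitationSubset>OM</CitationSubset>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation Status="MEDLINE">
    <PMID>102</PMID><CitationSubset>IM</CitationSubset>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

root = ET.fromstring(sample)
for cit in root.iter("MedlineCitation"):
    print(cit.findtext("PMID"), cit.get("Status"), is_oldmedline(cit))
# 101 MEDLINE True
# 102 MEDLINE False
```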
Select a baseline year to see the directory listing for that MEDLINE/PubMed Baseline, from which you can download all of the files for that year. The DTDs link points to a gzipped tar file containing all of the DTD files required for that year's baseline files.
Baseline | Date Created | Number of Files | Number of Citations | DTD Files |
---|---|---|---|---|
2021 | December 14, 2020 | 1062 | 31,850,051 | DTDs |
2020 | November 19, 2019 - December 3, 2020 | 1015 | 30,420,660 | DTDs |
2019 | December 10 & 11, 2018 | 972 | 29,138,916 | DTDs |
2018 | November 27 & 28, 2017 | 928 | 27,837,540 | DTDs |
2017 | December 13, 2016 | 892 | 26,759,399 | DTDs |
2016 | November 20, 2015 | 812 | 24,358,442 | DTDs |
2015 | November 24, 2014 | 779 | 23,343,329 | DTDs |
2014 | November 21, 2013 | 746 | 22,376,811 | DTDs |
2013 | November 15 & 16, 2012 | 717 | 21,508,439 | DTDs |
2012 | November 18, 2011 | 684 | 20,494,848 | DTDs |
2011 | November 19, 2010 | 653 | 19,569,568 | DTDs |
2010 | November 20, 2009 | 617 | 18,502,916 | DTDs |
2009 | November 21 & 22, 2008 | 593 | 17,764,826 | DTDs |
2008 | November 16 & 17, 2007 | 563 | 16,880,015 | DTDs |
2007 | November 17 & 18, 2006 | 538 | 16,120,074 | DTDs |
2006 | November 18 & 19, 2005 | 516 | 15,433,668 | DTDs |
2005 | November 20, 2004 | 500 | 14,792,864 | DTDs |
2004 | November 14-18, 2003 | 417 | 12,421,396 | DTDs |
2003 | November 1-4, 2002 | 396 | 11,847,524 | DTDs |
2002 | Approx. November 21, 2001 | 379 | 11,299,108 | DTDs |
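To sanity-check a local copy of a baseline against the file and citation counts in the table above, you can count the downloaded files and the citation records inside them. This sketch assumes all files for one year were saved in a single directory with an `.xml.gz` extension. It counts `<MedlineCitation>` elements rather than a particular wrapper element, so it should not depend on which root element a given year's DTD uses (an assumption; check that year's DTD). It streams each file with `iterparse` and clears finished records to keep memory use low.

```python
import glob
import gzip
import os
import xml.etree.ElementTree as ET

def count_citations(path: str) -> int:
    """Count <MedlineCitation> elements in one gzipped baseline XML file."""
    n = 0
    with gzip.open(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "MedlineCitation":
                n += 1
            if elem.tag in ("MedlineCitation", "PubmedArticle"):
                elem.clear()  # drop the finished record's subtree
    return n

def summarize(directory: str, pattern: str = "*.xml.gz") -> tuple[int, int]:
    """Return (number of matching files, total citations across them)."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return len(paths), sum(count_citations(p) for p in paths)
```

For example, `summarize("/data/baseline-2021")` (a hypothetical path) would return a tuple to compare with the 2021 row. Any other `.xml.gz` files in the directory would inflate both counts.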
We generate a large number of data files during our normal processing of each set of baseline files. We make available the files we think others may be able to use, with the goal of reducing duplicated effort.
The MeSH FTP download site, ftp://nlmpubs.nlm.nih.gov/online/mesh/, now includes a separate directory for each MeSH release year. In addition, MeSH created the folder "MESH_FILES", which holds the latest release files and is updated every morning, Monday through Friday. The yearly release folders span from 2011 to the latest full release, which occurs in November of the preceding year (for example, 2016 MeSH was released in November 2015). A single directory also holds the earlier files from 1999 to 2010.
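Because the baseline for year N carries the year-N MeSH vocabulary, and full MeSH releases come out in November of the preceding year, the rule for finding the matching MeSH release can be written down directly. The helper below is hypothetical and encodes only the rules stated on this page; it returns a description rather than a directory name, since the folder names are not listed here. For the latest daily-updated files, the "MESH_FILES" folder applies regardless of year.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MeshRelease:
    mesh_year: int       # MeSH vocabulary year; matches the baseline year
    released_year: int   # full release comes out in November of the preceding year
    location: str        # which part of the MeSH FTP site holds the files

def mesh_release_for(baseline_year: int) -> MeshRelease:
    """Hypothetical helper: describe the MeSH release used by a given baseline."""
    if baseline_year >= 2011:
        location = "yearly release directory for that year"
    elif baseline_year >= 1999:
        location = "single directory of earlier files (1999-2010)"
    else:
        raise ValueError(f"no MeSH files before 1999 are described: {baseline_year}")
    return MeshRelease(baseline_year, baseline_year - 1, location)

print(mesh_release_for(2016))
# MeshRelease(mesh_year=2016, released_year=2015, location='yearly release directory for that year')
```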
Semantic Types and Groups: A parsable list of Semantic Types and their abbreviations from the UMLS and a parsable list of Semantic Groups and their mappings to the Semantic Types.
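Because both lists are meant to be parsable, joining them takes only a few lines. This sketch assumes pipe-delimited lines of the form `abbreviation|TUI|type name` for the Semantic Types list and `group abbreviation|group name|TUI|type name` for the Semantic Groups list; check the downloaded files before relying on that layout. The sample lines are illustrative, and the function names are hypothetical.

```python
def parse_semantic_types(lines):
    """Map TUI -> (abbreviation, full name). Assumed format: abbr|TUI|name."""
    types = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        abbr, tui, name = line.split("|", 2)
        types[tui] = (abbr, name)
    return types

def parse_semantic_groups(lines):
    """Map TUI -> group name. Assumed format: group_abbr|group_name|TUI|type_name."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _group_abbr, group_name, tui, _type_name = line.split("|", 3)
        groups[tui] = group_name
    return groups

types = parse_semantic_types([
    "aapp|T116|Amino Acid, Peptide, or Protein",
    "dsyn|T047|Disease or Syndrome",
])
groups = parse_semantic_groups([
    "CHEM|Chemicals & Drugs|T116|Amino Acid, Peptide, or Protein",
    "DISO|Disorders|T047|Disease or Syndrome",
])
for tui, (abbr, name) in sorted(types.items()):
    print(tui, abbr, name, "->", groups.get(tui))
# T047 dsyn Disease or Syndrome -> Disorders
# T116 aapp Amino Acid, Peptide, or Protein -> Chemicals & Drugs
```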
The MEDLINE N-gram Set from the SPECIALIST Lexicon: n-grams of size 1 to 5 identified from the title and abstract of each MEDLINE citation in the baseline.
Latest information on the SPECIALIST Lexicon MEDLINE N-Gram Set
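To make "n-grams of size 1 to 5" concrete, the sketch below counts every run of 1 to 5 consecutive words in a citation's title and abstract. The tokenization (lowercased runs of letters and digits) and the choice to count title and abstract separately, so that no n-gram spans the boundary between them, are assumptions for illustration, not the SPECIALIST Lexicon's own processing rules. The sample title and abstract are made up.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs

def ngrams(text: str, max_n: int = 5) -> Counter:
    """Count word n-grams of length 1..max_n in text."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

title = "Aspirin and heart disease."
abstract = "Aspirin reduces heart disease risk."
# Count each field separately so no n-gram crosses from title into abstract.
grams = ngrams(title) + ngrams(abstract)
print(grams["heart disease"])  # 2
```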
Last Modified: March 15, 2021