High Quality Data to the Rescue

Rapid access to high-quality COVID-19 and related contextual data is critical for policymakers, researchers, the media, and the public at large to understand and respond effectively to emerging evidence. Such data is, however, not easy to find and access, and is rarely in a pristine or ready-to-use state. More importantly, it is rarely published in machine-friendly formats or "as a service", which locks it away from information systems, data scientists, developers, and machine learning environments. These issues lead to significant data wrangling, unreliable modeling, and misinterpretation, while dramatically slowing our capacity to react.

Given the critical need for timely access to high-quality COVID-19 data, we strongly believe our Rich Data Services (RDS) platform, combined with our expertise in data and metadata management, can make a significant difference in addressing this global crisis, both in the short and long term.

Through the delivery of free high quality COVID data as a service, this project aims at:

  • Offering data scientists, developers, and other users a flexible platform for the discovery, extraction, and online or offline analysis of data
  • Granting computer applications and web sites immediate access to data for rapid retrieval, analysis, visualization, machine learning, or other purposes
  • Addressing quality issues by delivering curated data in open formats, ready for immediate use, so that users can focus on research rather than on wrangling and other time-consuming tasks
  • Facilitating harmonization and data linking through the use of standard classifications and common variable naming conventions
  • Promoting the use of open data, domain standards, and related best practices
  • Making curated data and tools developed for this project available in a public repository

Can't Wait to Get Started?

What Data Can I Find Here?

This project is in its early stages, and we are initially focusing on popular or high-quality pilot datasets. We currently provide access to the following:

  • Case counts based on the Johns Hopkins daily reports, organized by country, U.S. state, and U.S. county
  • U.S. state-level case and testing data from the COVID Tracking Project
  • Datasets collected directly from selected U.S. state agencies (New York, Ohio, Tennessee)
  • Case data from the Government of Canada and Statistics Canada
  • Canadian Public Use Microdata Files for the recently released Impacts of COVID-19 survey, the Labour Force Survey (Jan 2018 - Apr 2020), and other relevant datasets

See our public COVID-19 GitHub project for more information on these resources, along with the source data and tools we use to curate and publish them in our catalog. We have identified several other potential sources and expect our collection to grow rapidly. We welcome your suggestions.

Note that the nature of the data drives how the RDS API and applications should be used. Make sure you understand the data before jumping into the applications or analysis. Traditional survey data is easy to browse or tabulate, but may require statistical weights to produce valid estimates. Time series data, the form taken by many of the COVID-19 datasets, must always be analyzed along its time dimension. Be a responsible user!
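To illustrate both cautions, here is a minimal Python sketch. The records, field names, and values are invented for illustration and do not come from any dataset in our catalog. The first part contrasts an unweighted and a weighted share on survey-style records; the second shows why cumulative time series counts should be read for a given date rather than summed across dates.

```python
# Hypothetical survey records: each respondent carries a statistical weight,
# i.e. the number of people in the population that the respondent represents.
records = [
    {"worked_from_home": True, "weight": 1500.0},
    {"worked_from_home": False, "weight": 500.0},
    {"worked_from_home": False, "weight": 500.0},
    {"worked_from_home": True, "weight": 1500.0},
]


def unweighted_share(rows, key):
    """Share of respondents for whom `key` is true (ignores weights)."""
    return sum(1 for r in rows if r[key]) / len(rows)


def weighted_share(rows, key, weight="weight"):
    """Share of the represented population for whom `key` is true."""
    total = sum(r[weight] for r in rows)
    return sum(r[weight] for r in rows if r[key]) / total


# Hypothetical time series of cumulative counts by region and date.
series = [
    {"date": "2020-04-01", "region": "A", "cumulative_cases": 10},
    {"date": "2020-04-02", "region": "A", "cumulative_cases": 15},
    {"date": "2020-04-01", "region": "B", "cumulative_cases": 4},
    {"date": "2020-04-02", "region": "B", "cumulative_cases": 6},
]


def total_on(rows, date):
    """Total across regions for one date; summing over all dates would double count."""
    return sum(r["cumulative_cases"] for r in rows if r["date"] == date)
```

With these invented values, the unweighted share is 0.5 while the weighted share is 0.75, and the total for 2020-04-02 is 21, whereas naively summing every row would give 35.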

For Users & Researchers

COVID-19 data can be accessed through the following web applications, built for your convenience on top of the RDS API, and offering a browser-based user interface:

  • The RDS Explorer, which allows you to browse record level data, create subsets, and download in various open formats for offline analysis.
  • The RDS Tabulation Engine, which allows you to quickly create analytical tables or data extracts for visualizations.

For Developers & Data Scientists

If you are an application or web developer, a scientist interested in data as a service, or a machine learning practitioner, RDS has you covered. You no longer need to spend hours wrangling data or coping with proprietary systems.

RDS is a REST-based Application Programming Interface (API) that can be used to concurrently query data and metadata from your favorite package or environment, to support web portals, visualizations, applications, analysis, or download.
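As a rough idea of what calling a REST data service from code can look like, the Python sketch below assembles a query URL. Everything specific in it is an assumption made for illustration: the base URL is a placeholder, and the path layout and the `cols`, `where`, and `limit` parameter names are hypothetical rather than taken from the RDS API documentation, which remains the authority on the actual endpoints.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL (assumption); replace with the real service address.
BASE_URL = "https://example.org/rds/api"


def build_select_url(catalog, product, columns, where=None, limit=100):
    """Build a record-level query URL.

    The path layout and parameter names are hypothetical and only illustrate
    the general shape of a REST query against a data product.
    """
    path = f"{BASE_URL}/query/{quote(catalog)}/{quote(product)}/select"
    params = {"cols": ",".join(columns), "limit": limit}
    if where:
        params["where"] = where
    # urlencode percent-encodes values so the URL stays valid.
    return f"{path}?{urlencode(params)}"


# Hypothetical catalog, data product, and column names.
url = build_select_url("covid19", "some_product", ["date", "cases"], where="country=CA")
```

The resulting URL could then be fetched with any HTTP client and the response loaded into your analytical tool of choice.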

For Data Publishers

RDS is, among other things, a modern data publication solution. For this particular project, we act as the data broker by collecting, curating, and publishing the data ourselves. Contact us if you have data that you feel would be a good addition to our COVID-19 project data collection.

Our RDS platform is also available to early adopters. Visit our Rich Data Services website for more information.

If you need assistance getting started or have any questions, consult our knowledge base or simply open a free helpdesk request.

What's Next?

RDS Explorer and the RDS Tabulation Engine are just two of the many potential uses of the RDS API. The power of the platform lies in building your own applications or visualizations, accessing the data and metadata from your favorite analytical tools, integrating them into your web portal, or enabling machine learning. So unleash your creativity, skills, and passion, and use RDS in your own way.

Our project objectives are to help deepen the understanding of COVID-19, support better research, and foster evidence-based decision making. We also hope to encourage the adoption of best practices for the publication of data. Should you need assistance, consult our knowledge base or open a helpdesk request.

We're looking forward to hearing your stories and seeing your work.

Can You Help?

Our COVID-19 project is currently supported primarily through internal resources, which can be challenging for a small company whose day-to-day business needs to continue. We welcome any kind of support you may be able to provide to help us focus on this COVID-19 project. This includes financial support, in-kind contributions, and domain expertise. At this time, we are particularly looking for:

  • Funding opportunities, and assistance preparing or submitting proposals
  • Sponsorship or donations
  • Amazon Web Services credits to operate the infrastructure (or engineering support)
  • Suggestions or feedback on how we can improve the data offering, data quality, and our platform
  • Spreading the word about the project

About Us

Metadata Technology North America (MTNA) is a Knoxville, TN based small business that provides unique technological solutions and expertise around the management of statistical data, with a focus on leveraging modern information technology, global metadata standards, and data management best practices. For over two decades, it has acted as an innovation enabler for a multitude of governmental agencies and research institutions. Central to MTNA's strategy is ensuring that data is surrounded by all the necessary knowledge (or "metadata") to support its effective use, enable machine automation, maximize quality, and facilitate discovery and access for sound research and decision-making. Visit http://www.mtna.us for more information.

Frequently Asked Questions

  • How are datasets selected for this project?
    At this point, selection is mainly based on our team finding data of interest and on our ability to ingest it rapidly into the catalog. Our current focus is on the United States and Canada, and we are also looking into Europe and other highly affected countries. Africa is also high on our list of priorities. There are many data sources out there, and we welcome suggestions.
  • How are RDS COVID data and metadata curated?

    We typically perform the following tasks when assessing and converting data for publication in the RDS COVID-19 Catalog:

    • Restructure the data as needed to facilitate analysis with RDS or statistical and analytical packages
    • Convert names into standard codes. For example, we typically change countries, subdivisions, and other geospatial entities into ISO, ANSI, or FIPS codes.
    • Convert dates to standard ISO formats, and extend the dataset with additional time variables (particularly for time series)
    • Capture core metadata (data dictionary), such as variable names, labels, descriptions, classifications
    • Load the data into a data warehouse

    Once a workflow is well-defined, we then automate the process to ensure data refreshes with the source.
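The sketch below shows, in Python, what the code and date conversion steps listed above might look like for a single record. It is a simplified illustration under stated assumptions, not our actual pipeline: the input field names, the source date format, the three-entry lookup table, and the sample row are all hypothetical.

```python
from datetime import datetime

# Tiny illustrative lookup (assumption); a real workflow would rely on a
# complete ISO 3166-1 reference table.
COUNTRY_TO_ISO2 = {"Canada": "CA", "United States": "US", "France": "FR"}


def curate_row(row, date_format="%m/%d/%Y"):
    """Standardize one source record.

    Replaces the country name with its ISO 3166-1 alpha-2 code, rewrites the
    date in ISO 8601 form, and derives extra time variables for time series.
    """
    parsed = datetime.strptime(row["date"], date_format).date()
    iso_year, iso_week, _ = parsed.isocalendar()
    return {
        "country_code": COUNTRY_TO_ISO2[row["country"].strip()],
        "date": parsed.isoformat(),
        "year": parsed.year,
        "month": parsed.month,
        "iso_week": f"{iso_year}-W{iso_week:02d}",
        "cases": int(row["cases"]),
    }


# Made-up source row with a padded name, a U.S.-style date, and a string count.
curated = curate_row({"country": "Canada ", "date": "4/15/2020", "cases": "1234"})
```

For the made-up row above, the curated record carries the country code "CA", the date "2020-04-15", and the ISO week "2020-W16".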

  • What standards is this project using?

    On the metadata side, our management practices and platform are informed by internationally accepted standards for the management of official statistics and scientific data, specifically the Generic Statistical Information Model (GSIM) and the Data Documentation Initiative (DDI). These have been endorsed by the High Level Group for the Modernization of Official Statistics, the Research Data Alliance, and numerous data archives, research groups, and organizations around the world. We aim to follow the FAIR principles as closely as possible.

    For the data itself, we rely on the following coding standards and classifications:

    • ISO 3166 for countries and their subdivisions
    • ISO 8601 for dates and other temporal variables
    • For the United States, FIPS 5-2 and FIPS 6-4 (while technically obsolete, these remain widely used) as well as other coding schemes used by the U.S. Census Bureau
    • Statistics Canada classifications
    • Other international and national classifications maintained or endorsed by the United Nations Classification Division.
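As a small example of why such codes need careful handling, U.S. county FIPS codes are five characters long: a two-digit state code followed by a three-digit county code, with leading zeros that are easily lost when a code is read as a number. The Python sketch below uses hypothetical helper names and shows one way to build and split such codes.

```python
def county_fips(state_code, county_code):
    """Combine state and county codes into a zero-padded 5-character code."""
    state = str(state_code).zfill(2)
    county = str(county_code).zfill(3)
    if not (state.isdigit() and county.isdigit()):
        raise ValueError("FIPS codes must be numeric")
    if len(state) != 2 or len(county) != 3:
        raise ValueError("expected a 2-digit state and a 3-digit county code")
    return state + county


def split_county_fips(code):
    """Split a county code into its (state, county) parts, restoring lost zeros."""
    padded = str(code).zfill(5)
    if len(padded) != 5 or not padded.isdigit():
        raise ValueError("expected a numeric code of at most 5 digits")
    return padded[:2], padded[2:]


autauga = county_fips(1, 1)    # Autauga County, Alabama
knox = county_fips(47, 93)     # Knox County, Tennessee
```

Storing these codes as text rather than integers keeps "01001" from silently becoming 1001.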
