Email LinkedIn

MN3441 Technology for Managerial Data Analysis

Relational Databases (video time: 46 minutes)

video duration 4:23

A database is an organized collection of data. There are several types of databases, which are designed for different purposes. In this lesson, we’ll cover relational databases, which connect data records through tables made of records (rows) and fields (columns).

Motivation

Datasets may be quite large, but the part of interest for a particular application may only be a small subset of the overall data. For example, the FY2018 DON XML data from the Federal Data Procurement System is around 3 GB. However, records in that dataset contain as many as 200 fields, including dates, amounts, parties, descriptions, etc. If you were only interested in tallying the financial obligations, you might only need a small number of those fields to do so. Databases have the capability to separate elements of the data, store them efficiently and allow fields of interest to be queried.

Database Software

Popular relational database technologies include:

We’ll use Microsoft Access in this lesson, not because it’s the best, but because it’s a desktop application with a graphical user-interface that may be helpful for beginners. Typically, when organizations store data, their databases are stored on servers to which individual users or applications connect, using software tools or programming languages. In Access, we can both build the database and query the data, so we’ll use it as a self-contained environment.

...