Relational Databases (video time: 46 minutes)
A database is an organized collection of data. There are several types of databases, which are designed for different purposes. In this lesson, we’ll cover relational databases, which connect data records through tables made of records (rows) and fields (columns).
Motivation
Datasets may be quite large, but the part of interest for a particular application may only be a small subset of the overall data. For example, the FY2018 DON XML data from the Federal Data Procurement System is around 3 GB. However, records in that dataset contain as many as 200 fields, including dates, amounts, parties, descriptions, etc. If you were only interested in tallying the financial obligations, you might only need a small number of those fields to do so. Databases have the capability to separate elements of the data, store them efficiently and allow fields of interest to be queried.
Database Software
Popular relational database technologies include:
- Microsoft Access
- Microsoft SQL Server
- Oracle
- IBM DB2
- MySQL (open source)
- PostgreSQL (open source)
- SQLite (open source)
We’ll use Microsoft Access in this lesson, not because it’s the best, but because it’s a desktop application with a graphical user-interface that may be helpful for beginners. Typically, when organizations store data, their databases are stored on servers to which individual users or applications connect, using software tools or programming languages. In Access, we can both build the database and query the data, so we’ll use it as a self-contained environment.
MN3441 Technology for Managerial Data Analysis