CS3315, Machine Learning and Big Data (3-1)

Syllabus for Fall 2019

 

Catalog description

 

A survey of methods for processing large amounts of data and classifying and analyzing them using machine-learning methods. Big-data topics examine the obstacles to processing including managerial obstacles, problems of data consistency, problems of data accuracy, data-reduction methods, and big-data distributed processing methods. Topics on machine learning include concept learning, decision trees, Bayesian models, linear models, neural networks, case-based reasoning, genetic algorithms, sequence learning, and assessment techniques. Students will do projects with software tools on military data. Prerequisites: CS3310, Artificial Intelligence, and one college-level course in programming.

 

Instructor

 

Prof. Neil Rowe, ncrowe@nps.edu, (831) 656-2462, GE-328, faculty.nps.edu/ncrowe. No official office hours but he is usually there 0900-1700 when not teaching.

 

Grading

 

Three homework assignments, a test, and a term project. They are weighted approximately equally. Averages on the homework and test are usually around 70%, so the questions are difficult. However, grades are assigned relative to the rest of the class, so you are not penalized when a question is difficult for everyone. Overall, the median grade in this course is between an A- and a B+. You will only get below a B if you do not appear to be doing the required work.

 

Homework should be submitted as a paper (hard) copy of a single document with your name at the top of the document, all material for each problem together (no appendices), and without the text of the problems (just your answers). Generally homework questions are written so quoting or paraphrasing something written somewhere else will not answer the question. Homework submitted after the due date but before solutions are made available is subject to a 15% penalty; homework will not be accepted after solutions are made available.

 

Homework 1 can be done in two-person teams without communicating with any other members of the class. Homeworks 2 and 3 must be done individually without communicating with anyone except the instructor. You can look up or discuss general knowledge of the subject, but you cannot discuss the homework questions with anyone besides the instructor. Spreadsheet programs like Excel are not allowed for the assignments; you can use a calculator, but all the steps of your calculations need to be written out on the homework.

 

Homework and test averages run around 70%, so expect them to be reasonably difficult and to require some time to do. It is important to pace yourself on the homework and start working on it as soon as it is handed out. If you have trouble figuring out what to do for a homework question, discuss it with the instructor well before it is due. Do not expect to be able to complete an assignment by working on it only the night before it is due.

 

The term project can be done individually, in a two-person team, or in a three-person team. More details are given below.

 

Laboratory

 

The lab hours will be used for demonstrations as well as for making up lectures missed due to instructor travel.

 

Textbook and notes

 

The required textbook is I. Witten, E. Frank, M. Pal, and C. Hall, Data Mining: Practical Machine Learning Tools and Techniques, fourth edition, Morgan Kaufmann, 2016. The third edition is sufficient, but the fourth edition is better. You should be able to get free access to this through Safari Books On Line. Go to http://techbus.safaribooksonline.com/?uicode=dodnavy and register for an account. If you get the "no accounts available" message, send a request email to dod@safaribooksonline.com.

 

We will use a Sakai site for course information, slide revisions, and additional readings. We will also email you copies of the slides before the first class; please report if you do not receive them.

 

This course is focused on algorithms, not training to use software. There are many tools for machine learning and they are mostly easy to use, so we expect you can figure out how to use them on your own as needed for the project.

 

Lectures

 

You may not use digital devices during lectures.

 

Schedule

 

By 10/7: Read chapter 3 of the Schonberger book (posted on the Sakai site)

By 10/14: Read Chapter 1 and sections 5.1, 5.2, 5.3, and 5.8 in both editions of the textbook; also skim chapter 9 of the Grus book to be familiar with what is there when doing your term project

By 10/21: Read Chapters 2 and 3, and sections 4.4 and 4.5 in both editions

On 10/17: No class

On 10/18: Homework 1 due by 1500

By 10/28: Read section 7.2 in the 3rd edition or 8.2 in the 4th edition; read 4.1 and 6.2 in both editions

11/4-11/6: No class

By 11/5: Read sections 4.3 and 6.1 in both editions

On 11/11: Homework 2 due by 1500

By 11/12: Read section 4.2 in both editions

By 11/19: Read section 4.6 in both editions

By 11/27: Read sections 4.7 and 4.8 in both editions, and 7.1 in the 3rd edition or 8.1 in the 4th edition

By 12/3: Read chapters 1, 2, 7, and 8 of the Smith book (posted on the Sakai site)

On 12/5: Homework 3 due by 1500

On 12/13: Test in class (last day of class)

Term project writeup due electronically to ncrowe@nps.edu on 12/20 by 1500 PST

 

Outline of the course

 

Part 1: Introduction to big data

Part 2: Data setup

Part 3: Big-data processing

Part 4: Introduction to machine learning

Part 5: Logical learning

Part 6: Decision graphs

Part 7: Bayesian models

Part 8: Linear models

Part 9: Neural networks

Part 10: Distributional models

Part 11: Case-based reasoning

Part 12: Sequence learning

Part 13: Miscellaneous learning methods

Part 14: Further directions and conclusions

 

Guidelines for the term project

 

General guidelines:

 

 

For the writeup: