CS3315, Machine Learning and Big Data (3-1)
Syllabus for Fall 2019
Catalog description
A survey of methods for processing large amounts of data and for classifying and analyzing it using machine-learning methods. Big-data topics examine the obstacles to processing, including managerial obstacles, problems of data consistency, problems of data accuracy, data-reduction methods, and big-data distributed processing methods. Topics on machine learning include concept learning, decision trees, Bayesian models, linear models, neural networks, case-based reasoning, genetic algorithms, sequence learning, and assessment techniques. Students will do projects with software tools on military data. Prerequisites: CS3310, Artificial Intelligence, and one college-level course in programming.
Instructor
Prof. Neil Rowe, ncrowe@nps.edu, (831) 656-2462, GE-328, faculty.nps.edu/ncrowe. No official office hours, but he is usually there 0900-1700 when not teaching.
Grading
Three homework assignments, a test, and a term project. They are weighted approximately equally. Averages on the homework and test are usually around 70%, which reflects that the questions are difficult. However, grades are assigned relative to the rest of the class, so you are not penalized when a question is difficult for everyone. Overall, the median grade in this course is between an A- and a B+. You will get below a B only if you do not appear to be doing the required work.
Homework should be submitted as a paper (hard) copy of a single document with your name at the top of the document, all material for each problem together (no appendices), and without the text of the problems (just your answers). Generally, homework questions are written so that quoting or paraphrasing something written elsewhere will not answer the question. Homework submitted after the due date but before solutions are made available is subject to a 15% penalty; homework will not be accepted after solutions are made available.
Homework 1 can be done in two-person teams without communicating with any other members of the class. Homeworks 2 and 3 must be done individually without communicating with anyone except the instructor. You can look up or discuss general knowledge of the subject, but you cannot discuss the homework questions with anyone besides the instructor. Spreadsheet programs like Excel are not allowed for the assignments; you can use a calculator, but all the steps of your calculations need to be written out on the homework.
Homework and test averages run around 70%, so expect them to be reasonably difficult and to require some time to do. It is important to pace yourself and to start working on each homework as soon as it is handed out. If you have trouble figuring out what to do for a homework question, discuss it with the instructor well before it is due. Do not expect to be able to complete an assignment by working on it only the night before it is due.
The term project can be done individually, in a two-person team, or a three-person team. More details are given below.
Laboratory
The lab hours will be used for demonstrations as well as for making up lectures missed due to instructor travel.
Textbook and notes
The required textbook is I. Witten, E. Frank, M. Pal, and C. Hall, Data Mining: Practical Machine Learning Tools and Techniques, fourth edition, Morgan Kaufmann, 2016. The third edition is sufficient, but the fourth edition is better. You should be able to get free access to this through Safari Books Online. Go to http://techbus.safaribooksonline.com/?uicode=dodnavy and register for an account. If you get the "no accounts available" message, send a request email to dod@safaribooksonline.com.
We will use a Sakai site for course information, slide revisions, and additional readings. We will also email you copies of the slides before the first class; please report if you do not receive them.
This course is focused on algorithms, not training to use software. There are many tools for machine learning and they are mostly easy to use, so we expect you can figure out how to use them on your own as needed for the project.
Lectures
You may not use digital devices during lectures.
Schedule
By 10/7: Read chapter 3 of the Schonberger book (posted on the Sakai site)
By 10/14: Read Chapter 1 and sections 5.1, 5.2, 5.3, and 5.8 in both editions of the textbook; also skim chapter 9 of the Grus book to be familiar with what is there when doing your term project
By 10/21: Read Chapters 2 and 3, and sections 4.4 and 4.5 in both editions
On 10/17: No class
On 10/18: Homework 1 due by 1500
By 10/28: Read section 7.2 in the 3rd edition or 8.2 in the 4th edition; read sections 4.1 and 6.2 in both editions
11/4-11/6: No class
By 11/5: Read sections 4.3 and 6.1 in both editions
On 11/11: Homework 2 due by 1500
By 11/12: Read section 4.2 in both editions
By 11/19: Read section 4.6 in both editions
By 11/27: Read sections 4.7 and 4.8 in both editions, and 7.1 in the 3rd edition or 8.1 in the 4th edition
By 12/3: Read chapters 1, 2, 7, and 8 of the Smith book (posted on the Sakai site)
On 12/5: Homework 3 due by 1500
On 12/13: Test in class (last day of class)
Term project writeup due electronically to ncrowe@nps.edu on 12/20 by 1500 PST
Outline of the course
Part 1: Introduction to big data
Part 2: Data setup
Part 3: Big-data processing
Part 4: Introduction to machine learning
Part 5: Logical learning
Part 6: Decision graphs
Part 7: Bayesian models
Part 8: Linear models
Part 9: Neural networks
Part 10: Distributional models
Part 11: Case-based reasoning
Part 12: Sequence learning
Part 13: Miscellaneous learning methods
Part 14: Further directions and conclusions
Guidelines for the term project
General guidelines:
For the writeup: