Dear Student,

This course is scheduled to be retired on Aug. 30, 2024. You may continue to work on this course until then. We are not replacing this course at this time.  Please browse this subject to find other comparable courses.

 

NOTICE: This is an older course that uses Adobe Connect and/or Vimeo recordings. We are currently working to replace them with new Zoom recordings.  Please don't hesitate to email us at homeschoolconnections@gmail.com with any questions.

How to get the most out of Data Science, Part One with Domenico Ruggiero:

  • Ensure all of the software needed for the course is installed and running as expected:
      • Anaconda with Python 3 including Jupyter Notebook
  • Closely follow along with the lectures.  Do as I do to learn the techniques and commands needed to perform the analysis.
  • Practice doing more data analysis on your own.  Experiment with variations of the work we do in the lectures.
  • Submit the Jupyter Notebooks.  Parents can get a copy of the teacher's Jupyter Notebook to compare against their student's Jupyter Notebook.  It need not be 100% identical (and can include more if they experimented with additional analysis), but the "essence" of the analysis should be there in the form of reaching particular conclusions (a chart, specific numerical results, etc.).
  • Familiarize yourself with the extra resources provided with the lectures.  It's impossible to cover every single aspect of the data analysis modules in a course like this.  Familiarization doesn't need to be memorization, but browse through the resources so that you are aware of what is possible.  Some quiz questions will require you to use these resources to find the answer.
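
One way to confirm that the software is installed and running as expected is to run a short check inside a Jupyter Notebook cell. The sketch below is only an illustration, written for Python 3: the function name check_modules and the list COURSE_MODULES are hypothetical, and the module names are assumed from the course outline ("sklearn" is the import name of scikit-learn, and "notebook" is the import name of Jupyter Notebook).

```python
import importlib
import sys

# Assumed list of modules, based on the course outline; adjust as needed.
COURSE_MODULES = ["pandas", "numpy", "sklearn", "notebook"]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Any failure to import counts as "not running as expected".
            results[name] = False
    return results


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, ok in check_modules(COURSE_MODULES).items():
        print(name, "OK" if ok else "MISSING")
```

If any module is reported as MISSING, it can usually be added through the Anaconda Navigator or the Anaconda package manager before the first lecture.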

 

PLEASE NOTE: Data Science, Part One can be taken without Data Science, Part Two.  However, it is highly recommended that Data Science, Part One be completed prior to Data Science, Part Two.

 

Total Classes: 15

Duration:  90 minutes

Prerequisite:

  • An understanding of algebra is recommended, since the course relies on polynomial equations, algebraic reasoning, and problem-solving.
  • An understanding of matrix mathematics and statistics is helpful but NOT required – they will be discussed in the lectures.
  • Previous computer programming experience -- Python programming preferred but other programming languages are acceptable.  Computer Programming 101 (available as a recorded course through Unlimited Access) and/or Introduction to Computer Science (also available as a recorded course through Unlimited Access) would provide sufficient prerequisite experience.  Much of the analysis will take place using Python-based computer programs.
  • General familiarity with computers, including the ability to open applications, use menu-driven commands, and type using the keyboard, so that the emphasis of the lessons is on specific programming assignments and related data-science topics.

Suggested Grade Level: 9th to 12th grade.

Suggested Credit: One full semester of Computer Science or Math

 

Instructor: Domenico Ruggiero, MS-EM

 

Course Description: This valuable course is the first in a two-semester exploration of many topics associated with data science.  In many industries – agriculture, medical fields, cyber-security, manufacturing, and more – and in organizations ranging from the small family business to big-data corporations like Google, data is available almost everywhere.  Data science helps people find valuable insights in otherwise chaotic and disconnected pieces of information.  Just some of the ways it does so are finding correlations within the data, visualizing the data in a variety of charts and plots, identifying data points that appear to be outliers from the larger dataset and/or from its trends, and predicting future outcomes based upon variable inputs.

Because data science can be applied to so many working environments, the study of it is no longer limited to those who are interested in a career in Information Technology (IT).  Data science is becoming one of the fastest-growing professional careers available because of its ability to find a “home” in so many industries.

 

Course Outline:  Topics subject to minor changes.  Topics will be interspersed throughout lectures and will span multiple weeks.

  • Data Science
      • What is it?
      • Who uses it?
      • Workflows and methodologies used by data scientists
  • Python programming for data science
      • The development environment (Anaconda, Jupyter Notebooks, and Spyder)
      • Review of Python programming fundamentals and Python data types (variables, lists, dictionaries, etc.)
      • Python functions and some of the Python modules we will be using (Pandas, NumPy, scikit-learn, and more)
  • Data Analysis
      • Exploring data sets of various types (sales data, website visitor logs, user profile data, etc.)
      • Cleaning "dirty" datasets
      • Review of (or introduction to) statistical math methods
      • Data visualization in Python and spreadsheet applications
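
To give a flavor of the "cleaning dirty datasets" and statistics topics above, here is a minimal sketch. It is not taken from the lectures: the sales records are made up for illustration, the function name clean_records is hypothetical, and it uses only Python's built-in lists, dictionaries, and statistics module rather than Pandas so that it runs without any extra installation.

```python
import statistics

# Hypothetical "dirty" sales records: stray whitespace, inconsistent
# capitalization, a missing amount, and a non-numeric amount.
raw_sales = [
    {"region": " north ", "amount": "120.50"},
    {"region": "South", "amount": "98"},
    {"region": "NORTH", "amount": ""},
    {"region": "south", "amount": "n/a"},
    {"region": "East", "amount": "143.25"},
]


def clean_records(records):
    """Normalize region names and drop rows whose amount is not a number."""
    cleaned = []
    for row in records:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with missing or unparseable amounts
        region = row["region"].strip().title()
        cleaned.append({"region": region, "amount": amount})
    return cleaned


sales = clean_records(raw_sales)
amounts = [row["amount"] for row in sales]

print("Rows kept:", len(sales), "of", len(raw_sales))
print("Mean amount:", statistics.mean(amounts))
print("Median amount:", statistics.median(amounts))
```

Dropping bad rows is only one possible choice; depending on the analysis, a missing value might instead be filled in or flagged for review.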

 

Course Materials:  All course materials are to be provided by the professor.  Software to be installed -- Anaconda (https://www.anaconda.com) with Python 2.7 version (NOT Anaconda with Python 3.x version) which is available for Windows, Mac, and Linux operating systems.  Within Anaconda, ensure that the Jupyter Notebook and Spyder add-in applications are installed. The open source Anaconda Distribution is the easiest way to do Python data science and machine learning.

 

Homework:  Computer-generated quizzes, at-home analytical exercises, and exploration of methodologies applied towards items of personal interest.  Spreadsheet applications like Microsoft Excel and/or OpenOffice (https://www.openoffice.org) may also be utilized. Students can expect 2-6 hours of study outside of class, depending upon their proficiency with programming in Python and their previous familiarity with algebra, matrix mathematics, and statistics.  If some of the math is new, then naturally time will need to be spent learning it before it can be effectively programmed.