Module 5: The Data Science Life Cycle

Common Questions

What kind of data is web scraping used for?

Common use cases for web scraping include, but are not limited to, price monitoring, price intelligence (e.g., collecting pricing information from competing sites), market research, and even some academic research in history and literature.
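
As a rough illustration of the price-monitoring use case, the sketch below pulls product names and prices out of a small HTML snippet using only Python's standard library. The HTML, the `name` and `price` class names, and the `PriceParser` and `scrape_prices` helpers are all made up for this example; a real scraper would first download the page and would need to match that site's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical product listing. A real scraper would download this HTML
# from a site instead of hard-coding it.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">$19.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">$5.50</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect text from <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # "name", "price", or None
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field == "name":
            self.names.append(data.strip())
        elif self.current_field == "price":
            # Drop the currency symbol and store the price as a number.
            self.prices.append(float(data.strip().lstrip("$")))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


def scrape_prices(html):
    """Return a {product name: price} dictionary for the given HTML."""
    parser = PriceParser()
    parser.feed(html)
    return dict(zip(parser.names, parser.prices))


prices = scrape_prices(SAMPLE_HTML)
print(prices)
```

Running the same function on pages from several competing sites, on a schedule, is essentially what price monitoring amounts to.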

What do “CSV” and “JSON” mean? Why do we need those data file types?

“CSV” stands for “Comma-Separated Values”, and “JSON” stands for “JavaScript Object Notation”. A CSV file holds plain text as a series of values (cells) separated by commas in a series of lines (rows). It is a very common format for large datasets and can be read and written easily with built-in functions. JSON is a language-independent, human-readable data format that is valued for its simplicity and is most commonly used in web-based applications. We have different data file formats because different data call for different ways of processing, partitioning, compressing, and so on, and the right format also depends on the type of data.
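
To make the difference concrete, here is a minimal sketch that writes the same records as CSV and as JSON with Python's built-in `csv` and `json` modules, then reads them back. The records and field names are invented sample data, and in-memory strings stand in for files on disk.

```python
import csv
import io
import json

# Invented sample records used only for this illustration.
rows = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 45},
]

# CSV: one header line, then one comma-separated line per row.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# Reading CSV back: every value arrives as a string, so "age" is "36".
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: keeps structure and basic types, so "age" is still the number 36.
json_text = json.dumps(rows)
json_rows = json.loads(json_text)

print(csv_text)
print(csv_rows)
print(json_text)
print(json_rows)
```

The CSV round trip loses type information while the JSON round trip keeps it, which is one practical reason to pick a format based on the data at hand.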

Are there any conventions when doing EDA?

EDA methods are usually classified along two dimensions. Each method is either non-graphical (calculating summary statistics) or graphical (summarizing the data in a diagram or picture). Each method is also either univariate (looking at one variable only) or multivariate (looking at multiple variables, though most often bivariate, meaning only two variables). A general process for doing EDA is: 1) distinguish attributes, 2) univariate analysis, followed by bivariate or multivariate analysis, 3) detect interactions/relationships among attributes, 4) detect missing values, 5) detect outliers, 6) feature engineering.
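
The non-graphical side of this process can be sketched with Python's standard library alone. The example below assumes a tiny invented dataset of study hours and test scores, where `None` marks a missing value; the helper names are hypothetical, and the 1.5 × IQR rule for outliers is one common convention rather than the only option. It covers univariate summaries, missing values, outliers, and one bivariate relationship, and leaves out plotting and feature engineering.

```python
import math
import statistics

# Invented dataset for illustration; None marks a missing value.
records = [
    {"hours": 1.0, "score": 52.0},
    {"hours": 2.0, "score": 58.0},
    {"hours": 3.0, "score": None},
    {"hours": 4.0, "score": 71.0},
    {"hours": 5.0, "score": 75.0},
    {"hours": 40.0, "score": 77.0},
]


def column(name):
    """Return all values of one attribute, including missing ones."""
    return [record[name] for record in records]


def missing_count(values):
    """Count the missing (None) entries in a column."""
    return sum(value is None for value in values)


def summarize(values):
    """Univariate, non-graphical summary of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the first or third quartile."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Bivariate check: Pearson correlation over rows with both values present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean(x for x, _ in pairs)
    mean_y = statistics.mean(y for _, y in pairs)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return covariance / (spread_x * spread_y)


hours, scores = column("hours"), column("score")
print("hours summary:", summarize(hours))
print("missing scores:", missing_count(scores))
print("hours outliers:", iqr_outliers(hours))
print("hours vs. score correlation:", pearson(hours, scores))
```

In practice the same checks are usually run with pandas, which offers `describe()`, `isna()`, and `corr()` for this purpose.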
