The Intermediate Guide to 180 Days Data Science Learning Plan
So, you have made your decision: you want to pursue a career as a data scientist. Great choice! But where do you start? There is an overwhelming amount of unstructured material out there, and sifting through it can feel like a never-ending loop. How do you pick where to begin? When should you move on to a new topic? Have you missed anything?
What should you focus on? Knowing how to use machine learning algorithms is not enough; among other things, the job is a mix of data engineering and machine learning. There’s a lot to digest.
Don’t worry, we’ve got you covered! We’re excited to present the ideal starting point for all aspiring data scientists: a learning plan for beginning your data science journey in just 180 days. This learning plan walks you through a systematic learning route so you can stay focused on the important parts of your data science journey and avoid distractions.
Start today and give yourself the gift of focused attention in your data science career!
OVERVIEW:
- 1. Your search for how to become a data scientist in 180 days ends here
- 2. A day-by-day plan you can follow to achieve that goal
- 3. How to create a data science portfolio project
- 4. A machine learning project structure that avoids confusion
GOAL:
- 1. How much time should you devote?
- 2. What are the things you need to do?
- 3. What are the exact steps you need to follow?
PLAN PREREQUISITES:
- 1. Laptop (Minimum 4 GB RAM)
- 2. Data Science Mentor
- 3. Quality Study Material (Internet & Resources)
PREPARATION TIME:
- 1. (10 – 15) hrs/week
- – Weekdays (1 – 2 hrs)
- – Weekends (6 – 7 hrs)
First Month:
- 1. Day 1 – 15: Learn Python for Data Science
- 2. Day 16 – 30: Master Statistics for Data Science
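To give a taste of what this first month builds toward, here is a minimal sketch of basic Python applied to descriptive statistics and a simple hypothesis test. It assumes NumPy and SciPy are installed, and the sample numbers are made up for illustration:
# Descriptive statistics and a two-sample t-test in Python.
# Assumes NumPy and SciPy are installed; the sample data below is made up.
import numpy as np
from scipy import stats

group_a = np.array([72, 85, 90, 66, 78, 81, 94, 70])  # e.g. scores of study group A
group_b = np.array([68, 74, 79, 61, 73, 77, 80, 65])  # e.g. scores of study group B

print("Group A mean:", np.mean(group_a), "std:", np.std(group_a, ddof=1))
print("Group B mean:", np.mean(group_b), "std:", np.std(group_b, ddof=1))

# Is the difference between the two means statistically significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")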
Second Month:
- 1. Day 31 – 45: Explore Python packages (NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn)
- 2. Day 46 – 60: Implement EDA on real-world datasets (a short EDA sketch follows)
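Here is a minimal EDA sketch using these packages. The file name (titanic.csv) and the column names (Age, Survived) are placeholders for whichever real-world dataset you choose; it assumes pandas, Matplotlib, and seaborn are installed:
# Minimal EDA sketch with pandas, Matplotlib, and seaborn.
# "titanic.csv", "Age", and "Survived" are placeholders for your own dataset.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("titanic.csv")

print(df.shape)           # number of rows and columns
df.info()                 # column types and non-null counts
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # missing values per column

sns.histplot(data=df, x="Age", bins=30)   # distribution of a numeric column
plt.show()

sns.countplot(data=df, x="Survived")      # class balance of the target
plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True)  # correlations between numeric columns
plt.show()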
Third Month:
- 1. Day 61 – 75: Focus on ML Algorithms (Linear Regression, Logistic Regression, Decision Tree, Random Forest)
- 2. Day 76 – 90: Implement ML prediction algorithms on real-world datasets (see the sketch below)
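For example, here is a minimal sketch of training and comparing two of the listed algorithms with scikit-learn; the built-in breast-cancer dataset stands in for a real-world dataset:
# Minimal sketch: train/test split and two supervised models with scikit-learn.
# The built-in breast-cancer dataset is a stand-in for your own data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression (a linear baseline)
log_reg = LogisticRegression(max_iter=5000)
log_reg.fit(X_train, y_train)
print("Logistic Regression accuracy:", accuracy_score(y_test, log_reg.predict(X_test)))

# Random Forest (a tree-based ensemble)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print("Random Forest accuracy:", accuracy_score(y_test, rf.predict(X_test)))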
Fourth Month:
- 1. Day 91 – 105: Focus on Unsupervised ML Algorithms (Clustering, Principal Component Analysis)
- 2. Day 106 – 120: Learn Apriori Algorithm, Recommender System, Anomaly Detection
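As a small illustration of the clustering and PCA part of this month, here is a minimal sketch using scikit-learn’s built-in Iris dataset as a stand-in for a real dataset:
# Minimal sketch: PCA for dimensionality reduction plus K-Means clustering.
# The built-in Iris dataset is a stand-in for your own data.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# Standardize features so no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Reduce 4 features to 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Cluster the reduced data into 3 groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)
print("Cluster sizes:", [list(labels).count(c) for c in set(labels)])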
Fifth Month:
- 1. Day 121 – 135: Ensemble Learning, Stacking, Optimization Techniques, Model Deployment
- 2. Day 136 – 150: Implement a deployable ML algorithm on real problems and datasets
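To make the stacking part of this month concrete, here is a minimal sketch with scikit-learn’s StackingClassifier; the built-in breast-cancer dataset again stands in for your own data:
# Minimal sketch: stacking several base models under a meta-learner.
# The built-in dataset is a placeholder for your own data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base models whose predictions become features for the final estimator
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
]

# Logistic Regression acts as the meta-learner on top of the base models
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=5000))
stack.fit(X_train, y_train)
print("Stacked model accuracy:", stack.score(X_test, y_test))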
Sixth Month:
- 1. Day 151 – 180
- – Build GitHub Profile
- – Build Strong Portfolio
- – Write well-researched articles on Medium, LinkedIn, and blogs
After completing the 180 days, you will naturally be curious about how to create the best data science portfolio projects. Here is a step-by-step approach:
Step 01: Select the ML area you want to work in
Step 02: Find the business problem to solve
Step 03: Find a dataset
Step 04: Create an educational Jupyter notebook to present your project, plus some Python files that act as the libraries you’ll call from the notebook
Step 05: Create the structure of the notebook using 6 sections: introduction, data prep, modeling, evaluation, deployment, and conclusion
Step 06: Explore the data, perform the necessary data transformations, and explain your data preparation process in the notebook
Step 07: Operationalize the model on your local server, create a simple API, and show the response of a query (a minimal API sketch follows these steps)
Step 08: Create a final section with a summary, a discussion of future work, and references
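For Step 07, the local API could be as simple as the sketch below. It assumes Flask and joblib are installed; model.pkl and the expected "features" list are hypothetical placeholders for whatever model you trained:
# Minimal sketch of a local prediction API (Step 07).
# Assumes Flask and joblib are installed; "model.pkl" and the "features"
# payload layout are hypothetical placeholders for your trained model.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")  # model saved earlier with joblib.dump

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
From another terminal you could then query it with, for example, curl -X POST -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}' http://127.0.0.1:5000/predict and paste the JSON response into the notebook.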
MACHINE LEARNING PROJECT STRUCTURE:
Having a well-organized, general machine learning project structure makes a project easy to understand and modify. Moreover, the same structure can be reused across multiple projects, which avoids confusion.
├── Machine Learning Project Structure <- Project Main Directory
| ├── api <- Scripts that serialize API calls and serve as endpoints exposing the project’s functions.
│ ├── data <- Data in different formats
| | ├── external <- data from a third-party source
| | ├── interim <- Intermediate data that has been transformed
| | ├── processed <- The final, canonical data sets for modeling
| | ├── raw <- The original, immutable data dump
| ├── evaluation
| | ├── evaluate_model_01.py <- Different metrics used to evaluate the model
| | ├── evaluate_model_02.py <- Different metrics used to evaluate the model
│ ├── examples
| | ├── feature_01.md <- A doc and an example showing how to use this part of the project (functions, usage, etc.)
| | ├── feature_02.md <- A doc and an example showing how to use this part of the project (functions, usage, etc.)
│ ├── notebooks <- All the IPython notebooks used for EDA, visualization, and proof of concept (POC).
│ ├── src
| | ├── dataset
| | | ├── download_dataset.py <- Scripts to download the dataset or access it from the data folder
| | ├── model
| | | ├── train_model.py <- Scripts to train the model
| | | ├── test_model.py <- scripts to test the model
| | | ├── predict_model.py <- Scripts to generate predictions from the trained model
| | ├── network
| | | ├── approach_01.py <- Neural network schema
| | ├── weights
| | | ├── utils.py <- Utilities to save and load model weights in this folder
| | ├── visualization
| | | ├── visualization_model.py <- Scripts to visualize the model
| | ├── utils.py <- different utils functions
| | ├── project.py <- project pipeline
│ ├── project_cli <- Scripts that provide a command-line interface for training, testing, and other features.
| | ├── train_cli.py
| | ├── test_cli.py
│ ├── task <- Batch scripts for downloading files from the web, auto-testing, and linting the project.
| | ├── download.sh
| | ├── lint.sh
| | ├── test_api.sh
│ ├── training <- Contains experiment preparation, scripts to auto-run experiments, and metadata updates.
| | ├── experiment
| | | ├── utils.py
| | ├── prepare_experiment.py
| | ├── run_experiment.py
| | ├── update_metadata.py
| ├── sqs
| | ├── SQSSender.py <- Sends messages to Amazon SQS
| ├── aws
| | ├── download_files.py <- Uploads and downloads files from an Amazon S3 bucket
│ ├── config.ini <- Contains configuration information of the project
│ ├── .pre-commit-config.yaml <- Pre-commit hooks that catch simple issues before code review
│ ├── .gitignore <- tells Git which files to ignore when committing your project to the GitHub repository
│ ├── .env <- used to hide confidential data like AWS Secret Key, AWS Access Key, S3 Bucket Name, etc...
│ ├── Dockerfile <- This helps in dockerizing the whole system
│ ├── requirements.txt <- Lists all the packages/modules used while building the project
│ ├── application.py <- Python module with the handler method that AWS Lambda runs when the function is invoked
│ ├── README.md <- The top-level README for developers using this project
Note: The data folder and the .env file won’t appear on GitHub; they stay in your local folder because they are listed in the ignore list (.gitignore file). If you want to check them in as well, remove (or comment out) the corresponding entries in .gitignore and push the data folder to GitHub.
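To show how the configuration pieces of this structure might be read at runtime, here is a minimal sketch. It assumes the python-dotenv package is installed; the section names, keys, and environment-variable names below are made up for illustration:
# Minimal sketch: reading config.ini and .env at runtime.
# Assumes python-dotenv is installed; the section, key, and environment-variable
# names below are made up for illustration.
import os
from configparser import ConfigParser
from dotenv import load_dotenv

# Non-secret settings live in config.ini (checked into Git)
config = ConfigParser()
config.read("config.ini")
model_dir = config.get("paths", "model_dir", fallback="src/model")

# Secrets live in .env (ignored by Git, loaded into environment variables)
load_dotenv()
aws_access_key = os.getenv("AWS_ACCESS_KEY")
s3_bucket = os.getenv("S3_BUCKET_NAME")

print("Model directory:", model_dir)
print("S3 bucket configured:", s3_bucket is not None)
Keeping non-secret settings in config.ini and secrets in .env means the repository stays shareable without leaking credentials.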
☺ Thanks for your time ☺
What do you think of this “The Intermediate Guide to 180 Days Data Science Learning Plan”? Let us know by leaving a comment below. (Appreciation, suggestions, and questions are all welcome.)