Course length: 5 days

Course Overview

Data science is transforming industries, and CompTIA DataX proves you’re ready to lead. With over 1.05 million U.S. job postings requiring data science skills and 35% projected job growth over the next decade, the demand for skilled professionals is only accelerating.

The CompTIA DataX course examines complex real-world tasks, from optimizing machine learning models to deploying data pipelines. With labs, live exercises, and hands-on projects, you’ll gain the knowledge to solve meaningful business problems through data.

Course Objectives

  • Learn advanced data science skills for deploying real-world solutions.
  • Apply mathematical and statistical techniques, including linear algebra and calculus, in business contexts.
  • Navigate the data science lifecycle, from collection and transformation to communication and deployment.
  • Build and refine predictive models using machine learning and deep learning techniques.
  • Apply CI/CD, DevOps, and MLOps for enterprise-grade data processing workflows.

Course Prerequisites

5+ years of experience in data science, computer science, or a related field. Strong foundational knowledge in statistics, mathematics, and machine learning.

Course Outline

Illustrating the Data Science Lifecycle

  • CRISP-DM and other common lifecycle frameworks
  • Folder structures, APIs, and code quality
  • Intro to R/Python syntax
  • Live Lab: Exploring the DataX Environment

Analyzing Business Problems

  • Identifying business needs and solutions
  • Cost-benefit analysis and model selection
  • Privacy, masking, and ethical considerations
  • Lab: Predictive Cost Modeling

Collecting Data

  • Structured vs unstructured data
  • Synthetic data, lineage, and ingestion
  • Pipelines, storage, and error handling
  • Lab: Data Ingestion Optimization

Cleaning and Preparing Data

  • Wrangling, transformation, and feature engineering
  • Data processing infrastructure and scaling
  • Lab: EDA for Anomaly Detection

Describing Data Features

  • Time series, lag, seasonality, and granularity
  • Matrix/vectorization and multivariate issues
  • Lab: Feature Interpretation

Exploring Data

  • EDA tasks, visualization, and statistical analysis
  • Regression tests and probability distributions

Utilizing Unsupervised Learning

  • Clustering, dimensionality reduction, and heuristics
  • Lab: Cluster Analysis for User Behavior

Navigating Model Selection

  • Research reviews, constraints, and mathematical and statistical techniques
  • Apply linear algebra and calculus in modeling
  • Time series forecasting and survival analysis
  • Lab: Longitudinal Prediction

Employing Machine Learning Methods

  • Supervised and unsupervised learning, and activation functions
  • Drift monitoring and model tuning
  • Lab: Logistic Regression, Decision Trees, Random Forest

Experimenting with Deep Learning

  • Neural networks, layers, and activation functions
  • Embeddings, OCR, and image classification
  • Lab: Deep Learning Image Processing

Evaluating and Refining Data Models

  • Optimization, hyperparameter tuning, and benchmarking
  • Bandits, resource allocation, and prediction accuracy
  • Lab: Model Optimization

Communicating for Business Impact

  • Storytelling, stakeholder alignment, and data compliance
  • Lab: Reporting for Decision Makers

Deploying Data Models

  • CI/CD, virtualization, containerization, and modeling
  • Infrastructure-as-Code and hybrid/edge deployments
  • Lab: Deploy ML Pipelines in AWS

Discovering Specialized Applications

  • Specialized applications: NLP, computer vision, and graph analysis
  • Event detection, signal processing, and edge AI