
Python Programming for Actuaries and Data Scientists

Register Today!





    Become proficient in Python in this 9-week course, which develops practical programming skills with a focus on actuarial data. Students will learn to perform actuarial tasks including model validation, research, and risk analysis. The course also introduces fundamental concepts of data mining and machine learning for modeling large amounts of data.

    Robert Roper graduated with a Master’s degree in Economics from California State University, Long Beach. His passion for computers has led to a thriving freelance career as an applied programming consultant, working on tasks including data processing, regression analysis, forecasting, numerical algorithm implementation, and research report design.

    Python Programming for Actuaries and Data Scientists Offers The Following:

    • Weekly 2-hour lectures to build proficiency in Python programming
    • Suggested weekly assignments to reinforce skills learned in class
    • Weekly online office hours where students can get one-on-one help from Robert Roper with any areas of confusion
    • Mini-sessions: private meetings that students can schedule with Robert Roper to review areas of confusion
    • An invitation to join the Introduction to Python for Actuaries and Data Scientists Discussion Group, open only to program members
    • Q&A email correspondence throughout the program: students can email questions to Robert Roper and receive a quick reply
    • All weekly lectures recorded and made available to students
    • Upon completing the program, students will receive a certificate recognized by employers to further validate Python skills

    Schedule

    Weekly Live Classes & Subjects

    June 6th 
    10:00 AM – 12:00 PM EDT

    Lecture 1
    1. Getting started with Jupyter Notebooks and OOP

    Jupyter is ideal for exploratory, iterative work where you want to mix code, output, and narrative. OOP helps when modeling real-world entities with shared behavior and state.

    • Notebook cells, kernels, and markdown
    • Classes, objects, attributes, and methods
    • Inheritance and encapsulation basics
    • Why OOP matters for data pipelines
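
    As a rough illustration of classes, attributes, methods, and inheritance, here is a minimal sketch. The `Policy` and `AutoPolicy` classes and their fields are hypothetical names chosen for this example, not course material.

```python
class Policy:
    """A hypothetical insurance policy with shared state and behavior."""

    def __init__(self, policy_id, annual_premium):
        self.policy_id = policy_id              # public attribute
        self._annual_premium = annual_premium   # leading underscore: internal by convention

    def monthly_premium(self):
        """Method: derive a value from the object's state."""
        return self._annual_premium / 12


class AutoPolicy(Policy):
    """Inherits from Policy and adds a flat monthly surcharge (illustrative only)."""

    def __init__(self, policy_id, annual_premium, surcharge):
        super().__init__(policy_id, annual_premium)
        self.surcharge = surcharge

    def monthly_premium(self):
        # Override the parent method while reusing its result.
        return super().monthly_premium() + self.surcharge


policy = AutoPolicy("A-001", 1200.0, 5.0)
print(policy.monthly_premium())  # 105.0
```

    Code that works with a `Policy` can accept an `AutoPolicy` unchanged, which is one reason shared base classes can be convenient in data pipelines.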

    2. Data in Python

    The building blocks for ingesting, storing, and manipulating data before analysis.

    • Primitive types: int, float, str, bool
    • Collections: lists, tuples, dicts, sets
    • NumPy arrays vs. Pandas DataFrames
    • Reading data from CSV, JSON, and APIs
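
    A small sketch of reading CSV and JSON with the standard library may help make the list above concrete. The in-memory strings and field names below are made up for illustration; in practice the data would come from files or an API response.

```python
import csv
import io
import json

# Hypothetical in-memory data standing in for files on disk.
csv_text = "claim_id,amount\n1,250.0\n2,980.5\n"
json_text = '{"region": "West", "claims": [1, 2]}'

# csv.DictReader yields one dict per row; every value arrives as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
amounts = [float(row["amount"]) for row in rows]  # convert str -> float

# json.loads maps JSON objects to dicts and JSON arrays to lists.
record = json.loads(json_text)

print(amounts)
print(record["claims"])
```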

    3. Fundamental data mining concepts and terminology

    The shared vocabulary for the whole course.

    • Data mining vs. machine learning vs. statistics
    • KDD pipeline (Knowledge Discovery in Databases)
    • Supervised vs. unsupervised learning
    • Features, labels, instances, and dimensionality

    June 13th
    10:00 AM – 12:00 PM EDT

    Lecture 2
    1. Quality and structure of data

    Garbage in, garbage out. Assess your data before trusting it.

    • Structured vs. unstructured vs. semi-structured data
    • Common quality issues: missing values, noise, duplicates, inconsistencies
    • Data types: nominal, ordinal, interval, ratio
    • Metadata and data provenance

    2. Data preprocessing (cleaning)

    Almost always necessary before analysis or modeling; raw data is rarely ready to use.

    • Handling missing values (imputation, deletion)
    • Normalization and standardization
    • Encoding categorical variables
    • Outlier detection and treatment
    • Train/test split considerations
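
    The imputation and scaling steps above can be sketched in plain Python. This is a minimal example on a made-up column; mean imputation is only one of several possible strategies.

```python
import statistics

raw = [10.0, None, 14.0, 12.0]  # hypothetical column with one missing value

# Mean imputation: replace missing entries with the mean of the observed values.
observed = [x for x in raw if x is not None]
mean_observed = statistics.fmean(observed)
imputed = [mean_observed if x is None else x for x in raw]

# Min-max normalization rescales values to the [0, 1] interval.
lo, hi = min(imputed), max(imputed)
normalized = [(x - lo) / (hi - lo) for x in imputed]

# Standardization (z-scores): subtract the mean, divide by the standard deviation.
mu = statistics.fmean(imputed)
sigma = statistics.pstdev(imputed)
standardized = [(x - mu) / sigma for x in imputed]

print(imputed)
print(normalized)
```

    In a train/test workflow, the mean, minimum, maximum, and standard deviation would normally be computed on the training portion only and then applied to the test portion, so that no information leaks from the test set.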

    3. Measures of similarity/dissimilarity

    Core to clustering, classification (e.g., KNN), and recommendation systems.

    • Euclidean, Manhattan, and Minkowski distance
    • Cosine similarity
    • Hamming distance for categorical data
    • Distance matrices
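
    The measures listed above are short enough to write out directly. The following is a minimal sketch with hypothetical helper names; libraries such as SciPy provide tested versions.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


u, v = (0.0, 0.0), (3.0, 4.0)
print(minkowski(u, v, 1))   # Manhattan
print(minkowski(u, v, 2))   # Euclidean
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))
print(hamming(("A", "B", "C"), ("A", "X", "C")))
```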

    June 20th
    10:00 AM – 12:00 PM EDT

    Lecture 3
    1. Creating plots/graphs in Python

    Whether you’re communicating findings or exploring distributions, visualization is a first step in nearly every project.

    • Matplotlib basics: figures, axes, subplots
    • Seaborn for statistical plots
    • Common chart types: line, bar, scatter, histogram, box plot
    • Formatting, labels, legends, and saving figures
    • Time-series line plots and rolling averages

    2. Visualizing spatial and spatio-temporal data

    When your data has a geographic component that standard charts don’t capture well.

    • Heatmaps for geographic patterns
    • Choropleth and scatter maps (e.g., with Folium or Plotly)
    • Handling datetime indexing in Pandas
    • Animation of spatial plots for spatio-temporal data

    June 27th
    10:00 AM – 12:00 PM EDT

    Lecture 4
    1. Exploratory data analysis (EDA)

    Before building any model, understand the shape, distribution, and relationships in your data.

    • Summary statistics (mean, median, std, quartiles)
    • Distribution analysis and skewness
    • Pairplots and correlation heatmaps
    • Identifying data imbalances

    2. Correlation analysis

    Understand linear or monotonic relationships between variables before modeling.

    • Pearson vs. Spearman vs. Kendall correlation
    • Correlation matrices and heatmaps
    • Correlation vs. causation (critical distinction)
    • Multicollinearity concerns for regression
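
    To see how Pearson and Spearman can disagree, consider a relationship that is monotonic but not linear. The sketch below implements both by hand; the `ranks` helper assumes there are no tied values, which a full implementation would need to handle.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 to n; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear

print(pearson(x, y))   # below 1: the relationship is not a straight line
print(spearman(x, y))  # 1 up to rounding: the ordering is perfectly preserved
```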

    3. Pattern recognition

    Identify recurring structures or rules in data without necessarily building a full model.

    • Frequency patterns and histograms
    • Association rules (Apriori concept)
    • Seasonal and trend decomposition
    • Visual vs. algorithmic pattern detection

    4. Anomaly detection

    Useful in any domain where rare events matter: fraud detection, network intrusion, equipment failure, and quality control.

    • Statistical methods: Z-score, IQR-based detection
    • Isolation Forest and Local Outlier Factor (briefly)
    • Global vs. local anomalies
    • The rarity problem and false positive tradeoffs
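
    The two statistical methods above can be sketched with the `statistics` module. The data and the z-score cutoff of 2.5 are assumptions made for this example; in a sample this small, a single extreme value inflates the standard deviation so much that the common cutoff of 3 could never be reached.

```python
import statistics

values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]  # made-up data with one extreme value

# Z-score rule: flag points far from the mean in standard-deviation units.
mean = statistics.fmean(values)
stdev = statistics.stdev(values)
z_outliers = [v for v in values if abs(v - mean) / stdev > 2.5]

# IQR rule: flag points more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
iqr_outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(z_outliers)
print(iqr_outliers)
```

    The IQR rule depends only on the quartiles, so it is less affected by the extreme value it is trying to detect.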

    July 4th
    10:00 AM – 12:00 PM EDT

    Lecture 5
    1. Regression: OLS, Ridge, and Lasso

    Predicting a continuous output variable from one or more input features.

    • Simple vs. multiple linear regression
    • OLS assumptions (linearity, homoscedasticity, independence)
    • Ridge (L2) and Lasso (L1) regularization
    • Feature selection via Lasso

    2. Goodness-of-fit, bootstrapping, and aggregation

    After fitting a model, you need to know how well it generalizes, not just how well it fits training data.

    • R², adjusted R², RMSE, MAE
    • Overfitting vs. underfitting
    • Bootstrapping for confidence intervals
    • Ensemble aggregation concepts (bagging preview)
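
    The fit metrics and a percentile bootstrap can be sketched as follows. The function names, the sample data, and the choice of 2,000 resamples are assumptions for this example, and the percentile interval is only one of several bootstrap interval methods.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))

sample = [12, 15, 9, 20, 14, 11, 17, 13]  # made-up observations
lower, upper = bootstrap_mean_ci(sample)
print(lower, upper)
```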

    July 11th
    10:00 AM – 12:00 PM EDT

    Lecture 6
    1. Classification: Logistic regression, KNN, and decision trees

    Predicting categorical outcomes (binary or multiclass).

    • Logistic regression: sigmoid function and decision boundary
    • KNN: choosing k, distance sensitivity, curse of dimensionality
    • Decision trees: splitting criteria (Gini, entropy), depth control
    • Strengths and weaknesses of each
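
    Two of the ideas above fit in a few lines: the sigmoid function used by logistic regression, and a majority-vote KNN prediction. This is a minimal sketch with made-up points and labels; it uses Euclidean distance and does no feature scaling.

```python
import math
from collections import Counter


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the most common label among the k nearest training points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(sigmoid(0))
print(knn_predict(train_points, train_labels, (0.5, 0.5)))
print(knn_predict(train_points, train_labels, (5.5, 5.5)))
```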

    2. Accuracy, precision, recall, and F1-score

    Metrics for evaluating classifiers, especially important when classes are imbalanced.

    • Confusion matrix walkthrough
    • Precision vs. recall tradeoff
    • F1-score as harmonic mean
    • When to prioritize precision vs. recall (e.g., medical screening vs. spam filtering)
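
    All four metrics follow from the confusion-matrix counts. The sketch below uses made-up labels for a binary problem and assumes at least one predicted positive and one actual positive, so that no denominator is zero.

```python
y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # made-up predictions

# Confusion-matrix counts for the positive class (label 1).
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(tp, fp, fn, tn)
print(accuracy, precision, recall, f1)
```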

    3. Cross-validation

    Reliable estimation of model performance that isn’t sensitive to a single train/test split.

    • k-fold cross-validation
    • Stratified k-fold for imbalanced classes
    • Leave-one-out cross-validation
    • Cross-validation vs. a simple holdout set
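
    The index bookkeeping behind k-fold cross-validation can be sketched as a small generator. The `k_fold_indices` name is hypothetical, and the folds here are contiguous and unshuffled; shuffling and stratification are left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(test_idx)
```

    Each observation lands in the test set exactly once, so averaging the k scores uses all of the data for evaluation without ever testing on a row the model was trained on.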

    July 18th
    10:00 AM – 12:00 PM EDT

    Lecture 7
    1. Clustering: Centroids and K-means

    When you want to find natural groupings in unlabeled data. Useful for customer segmentation, document grouping, and image compression.

    • Unsupervised vs. supervised distinction
    • K-means algorithm step by step (init, assign, update)
    • Choosing k: elbow method
    • Limitations: assumes spherical clusters, sensitive to initialization
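
    The init, assign, and update steps can be sketched in plain Python. This is a minimal version that picks initial centroids at random from the data and stops when the centroids no longer move; the points and function names are made up for this example.

```python
import math
import random


def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(points, k, n_iter=50, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # init: pick k data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                            # assign: nearest centroid
            clusters[nearest_index(p, centroids)].append(p)
        new_centroids = [                           # update: mean of each cluster
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]            # keep an empty cluster's centroid
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:              # converged
            break
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```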

    2. Silhouette coefficient and Jaccard index

    Evaluating cluster quality: the silhouette coefficient needs no ground-truth labels, while the Jaccard index compares one cluster assignment against another (or against reference labels).

    • Silhouette score: cohesion vs. separation, range and interpretation
    • Jaccard index: comparing two cluster assignments or sets
    • When to use each metric
    • Cluster validity in practice
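
    Both metrics are simple to compute on small inputs. The sketch below works on one-dimensional points for brevity and assumes at least two clusters, each with at least two points; `jaccard` assumes the two sets are not both empty.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def silhouette(points, labels):
    """Mean silhouette over 1-D points (each cluster needs >= 2 points)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster (cohesion).
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(same) / len(same)
        # b: smallest mean distance to the points of any other cluster (separation).
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


print(jaccard({1, 2, 3}, {2, 3, 4}))
print(silhouette([1.0, 2.0, 10.0, 11.0], [0, 0, 1, 1]))  # close to 1: tight, well separated
print(silhouette([1.0, 2.0, 10.0, 11.0], [0, 1, 0, 1]))  # negative: poor assignment
```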

    July 25th
    10:00 AM – 12:00 PM EDT

    Lecture 8
    1. Stochastic processes and simulation

    For modeling systems that evolve randomly over time: finance, queuing, epidemiology, games.

    • Random variables and probability distributions
    • Markov chains and memoryless property
    • Monte Carlo simulation
    • Random walks and their relevance to financial modeling
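
    A simple symmetric random walk shows how Monte Carlo simulation estimates quantities by repeated sampling. The step count, number of walks, and seed below are arbitrary choices for this sketch. For 100 independent steps of +1 or -1, theory gives a mean final position of 0 and a variance of 100, which the simulated values should approximate.

```python
import random
import statistics


def random_walk_final(n_steps, rng):
    """Final position of a walk that moves +1 or -1 with equal probability."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))


rng = random.Random(42)  # fixed seed so the run is repeatable
finals = [random_walk_final(100, rng) for _ in range(5000)]

mc_mean = statistics.fmean(finals)
mc_variance = statistics.pvariance(finals)
share_above_10 = sum(f > 10 for f in finals) / len(finals)  # estimated probability

print(mc_mean, mc_variance, share_above_10)
```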

    2. Time-series forecasting

    When data has a temporal ordering and you need to predict future values.

    • Trend, seasonality, and residuals (decomposition)
    • Autocorrelation and stationarity
    • ARIMA models (conceptually)
    • Train/test splitting for time series (no shuffling)
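
    Two small helpers illustrate the points about smoothing and about splitting without shuffling. The function names and the 25% test fraction are assumptions for this sketch.

```python
def rolling_mean(series, window):
    """Trailing moving average; the first window - 1 points have no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def chronological_split(series, test_fraction=0.25):
    """Keep time order: train on the earlier part, test on the later part."""
    cut = int(len(series) * (1 - test_fraction))
    return series[:cut], series[cut:]


series = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up observations in time order
smoothed = rolling_mean(series, window=3)
train, test = chronological_split(series)

print(smoothed)
print(train, test)
```

    Shuffling before the split would let the model train on observations that come after the ones it is tested on, which tends to overstate forecast accuracy.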

    August 1st
    10:00 AM – 12:00 PM EDT

    Lecture 9

    1. Review of course material and Q&A

    2. Capstone project: Write a detailed report on financial data with annotated Python code

    Join Python Programming for Actuaries and Data Scientists Today!

    REGISTER