Beginner • 3 sessions • 6 hours • Python

Week 1: Python Fundamentals for Data Science

Learning Objectives

By the end of this week you will be able to:

  • Install Python and set up a data science development environment
  • Understand Python syntax and write your first programs
  • Work with all four basic data types: int, float, str, bool
  • Use comparison, arithmetic, and logical operators correctly
  • Control program flow with if/elif/else, for loops, and while loops
  • Define functions with parameters, return values, and default arguments
  • Push your code to a GitHub repository using Git

Why Python for Data Science?

Python is the dominant programming language in data science, topping the IEEE Spectrum programming language rankings for 8 consecutive years. Three reasons make it the standard:

  • Readable syntax that closely mirrors mathematical notation, reducing the gap between statistical theory and implementation
  • A mature, extensively tested ecosystem of scientific libraries: NumPy for numerical computing, pandas for data manipulation, scikit-learn for machine learning, matplotlib and seaborn for visualisation
  • Strong community support and adoption across industry and academia

Major employers including Google, Meta, Netflix, and the NHS (UK) use Python widely for data science. Learning Python opens doors to data analyst, data scientist, and ML engineer roles in both the Nigerian and UK job markets.

No prior programming experience is required for Week 1. Every concept is explained from first principles.

Installing Python: Step-by-Step

Method 1: Anaconda Distribution (Recommended for Beginners)

Anaconda bundles Python 3.11 with 250+ data science packages in one installer, including Jupyter Notebook, NumPy, pandas, matplotlib, and scikit-learn. This is the fastest way to get a complete data science environment.

  1. Go to anaconda.com/download
  2. Select your operating system: Windows, macOS, or Linux
  3. Download the 64-bit graphical installer (approximately 900 MB)
  4. Run the installer. On Windows: accept the license, choose Just Me, and leave the install path as default. Do NOT check Add Anaconda to PATH (Anaconda recommends against this)
  5. Open Anaconda Navigator from your Start menu or Applications folder
  6. Click Launch under Jupyter Notebook. Your browser will open at http://localhost:8888

Method 2: Direct Python + pip (Experienced Users)

  1. Download Python 3.11+ from python.org/downloads
  2. On Windows: check Add Python to PATH during installation
  3. Open a terminal and run: pip install numpy pandas matplotlib seaborn scikit-learn jupyter
  4. Launch Jupyter: jupyter notebook

Jupyter Notebook is the standard interactive environment for data science. You write code in cells, run them with Shift+Enter, and see results immediately below the cell.
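
One way to confirm that the installation worked is to run a short check in your first notebook cell. The sketch below is only a suggestion, not part of the official Anaconda or Jupyter setup: it prints the Python version and reports which of the packages named above can be found, without stopping if one is missing. Note that scikit-learn is imported under the name sklearn.

```python
import sys
from importlib.util import find_spec

# Packages from the install steps; scikit-learn's import name is "sklearn"
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]

print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")

# find_spec returns None when a package cannot be found
status = {name: find_spec(name) is not None for name in PACKAGES}

for name, installed in status.items():
    print(f"{name:12} {'found' if installed else 'MISSING'}")
```

If a package is reported as MISSING, re-run the installer (Method 1) or the pip install command (Method 2) before continuing.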

Session 1: Variables and Data Types

1.1 Your First Python Program

Open Jupyter Notebook, click New → Python 3, and type the following into the first cell. Press Shift+Enter to run it.

Your first Python code
print("Hello, Data Science!")
print("My name is Adeleke")
print(2 + 3)

1.2 Variables and Assignment

A variable is a named container that stores a value. You create a variable by typing a name, then the equals sign (=), then the value. Python reads this right to left: evaluate the right side, store it in the name on the left.

Variable naming rules in Python:

  • Must start with a letter or underscore (not a number)
  • Can contain letters, numbers, and underscores
  • Case-sensitive: age and Age are different variables
  • Cannot be a Python keyword like for, if, while
  • Convention: use lowercase_with_underscores for variable names (called snake_case)

Variables and assignment
# Good variable names
student_name = "Amara Okafor"
age = 28
gpa = 3.85
is_enrolled = True

# Variables can be updated
age = 29          # overwrite the value
age = age + 1     # compute then store (now age is 30)

print(student_name)   # Amara Okafor
print(age)            # 30
print(gpa)            # 3.85
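
You can also ask Python itself whether a name follows the naming rules. This is a minimal sketch using two standard-library tools, str.isidentifier() and keyword.iskeyword(); the sample names are made up for illustration.

```python
import keyword

# isidentifier() checks the character rules only
print("student_name".isidentifier())   # True
print("_total".isidentifier())         # True
print("2nd_score".isidentifier())      # False: starts with a digit

# Keywords pass the character rules but are still reserved
print("for".isidentifier())            # True
print(keyword.iskeyword("for"))        # True: cannot be used as a name

# Names are case-sensitive: these are two separate variables
age = 28
Age = 40
print(age, Age)                        # 28 40
```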

1.3 The Four Basic Data Types

Every piece of data in Python has a type. The four fundamental types are:

| Type | Description | Example | Use in Data Science |
| --- | --- | --- | --- |
| int | Whole numbers, no decimal | 42, -7, 0, 1000 | Counts, rankings, row numbers |
| float | Numbers with decimal point | 3.14, -0.5, 2.0 | Measurements, probabilities, statistics |
| str | Text, enclosed in quotes | "Lagos", 'Nigeria' | Column names, categories, labels |
| bool | True or False only | True, False | Flags, conditions, binary targets |

The four basic data types
# int: whole numbers
n_patients = 500
year = 2024
rank = 1

# float: decimal numbers (note: 1/3 in Python is 0.333...)
accuracy = 0.947
temperature = 36.8
pi = 3.14159

# str: text in single or double quotes (both work; pick one style and stick to it)
country = "Nigeria"
city = 'Ibadan'
sentence = "There are 36 states in Nigeria."

# bool: only True or False (capital T and F, exactly)
is_diabetic = True
passed_test = False

# Use type() to check what type a variable is
print(type(n_patients))   # <class 'int'>
print(type(accuracy))     # <class 'float'>
print(type(country))      # <class 'str'>
print(type(is_diabetic))  # <class 'bool'>

1.4 Type Conversion

You can convert between types using built-in conversion functions. This is called explicit type conversion or casting.

Type conversion
# Convert string to number
age_str = "28"
age_int = int(age_str)     # "28" → 28
print(age_int + 2)          # 30 (arithmetic works now)

# Convert number to string (for printing/concatenation)
score = 94
message = "Your score is " + str(score) + "%"
print(message)

# Convert to float
items = 7
per_unit = float(items) / 3    # 7 / 3 = 2.333...
print(round(per_unit, 2))      # 2.33

# Boolean conversion: 0, 0.0, the empty string "" and None are False;
# non-zero numbers and non-empty strings are True
print(bool(0))      # False
print(bool(1))      # True
print(bool(""))     # False
print(bool("text")) # True
print(bool(3.14))   # True

A common beginner mistake: "5" + 3 raises a TypeError because you cannot add a string to an integer. Always convert types explicitly before arithmetic.
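
The failure described above can be reproduced safely. This sketch uses try/except, which is not covered this week, purely so the cell keeps running after the error; the variable names are illustrative.

```python
quantity_str = "5"   # e.g. a value read from a CSV file or a form

try:
    total = quantity_str + 3          # str + int is not allowed
except TypeError as error:
    print(f"TypeError: {error}")

# Fix 1: convert the string to a number before doing arithmetic
total = int(quantity_str) + 3
print(total)                          # 8

# Fix 2: convert the number to a string if you want to join text
label = quantity_str + str(3)
print(label)                          # 53
```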

Exercise: Practice: Patient Record Variables

Create variables for a hospital patient record. Include: patient ID (integer), full name (string), temperature in Celsius (float), whether the patient is allergic to penicillin (boolean), and the patient's age (integer). Print all five variables. Then convert the temperature to Fahrenheit using the formula: F = (C × 9/5) + 32. Print the result.

Session 2: Operators and Control Flow

2.1 Arithmetic Operators

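| Operator | Name | Example | Result |
| --- | --- | --- | --- |
| + | Addition | 10 + 3 | 13 |
| - | Subtraction | 10 - 3 | 7 |
| * | Multiplication | 10 * 3 | 30 |
| / | Division (always returns a float) | 10 / 3 | 3.333... |
| // | Floor division (rounds down) | 10 // 3 | 3 |
| % | Modulo (remainder) | 10 % 3 | 1 |
| ** | Exponentiation | 2 ** 10 | 1024 |

Arithmetic operators in practice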
# Real-world arithmetic
monthly_salary = 250000          # NGN
tax_rate = 0.075                 # 7.5%
annual_salary = monthly_salary * 12
tax_payable = annual_salary * tax_rate
net_annual = annual_salary - tax_payable

print(f"Annual salary: NGN {annual_salary:,}")
print(f"Tax payable:   NGN {tax_payable:,.2f}")   # floats: show 2 decimal places
print(f"Net annual:    NGN {net_annual:,.2f}")

# Statistical formula: z-score = (x - mean) / std
x = 72
mean = 65
std = 8
z_score = (x - mean) / std
print(f"z-score: {z_score:.3f}")  # 0.875

# Modulo: check if a number is even or odd
number = 47
remainder = number % 2
print(f"{number} is {'even' if remainder == 0 else 'odd'}")
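
One detail worth checking for yourself: // rounds down toward negative infinity rather than simply dropping the decimal part. The difference only shows up with negative numbers. A short sketch, with values chosen purely for illustration:

```python
print(7 // 2)        # 3
print(-7 // 2)       # -4 (rounds down, not toward zero)
print(int(-7 / 2))   # -3 (int() truncates toward zero)

# % is consistent with //: (a // b) * b + (a % b) == a
a, b = -7, 2
print(a % b)                          # 1
print((a // b) * b + (a % b) == a)    # True

# divmod() returns the quotient and the remainder together
quotient, remainder = divmod(47, 5)
print(quotient, remainder)            # 9 2
```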

2.2 Comparison and Logical Operators

Comparison and logical operators
# Comparison operators - always return True or False
score = 74

print(score >= 70)    # True  (greater than or equal to)
print(score == 74)    # True  (equality - note DOUBLE equals)
print(score != 100)   # True  (not equal to)
print(score < 50)     # False (less than)

# Logical operators combine boolean expressions
has_experience = True
has_degree = True
years = 5

# and: both must be True
is_eligible = has_degree and years >= 3
print(f"Eligible: {is_eligible}")   # True

# or: at least one must be True
can_apply = has_degree or has_experience
print(f"Can apply: {can_apply}")    # True

# not: flips True to False and vice versa
is_junior = not (years >= 5)
print(f"Junior role: {is_junior}")  # False

2.3 If, Elif, Else Statements

Conditional statements let your program make decisions. Where many other languages use curly braces, Python uses indentation to define code blocks. The indentation itself is mandatory; four spaces per level is the standard convention.
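
A minimal sketch of how indentation decides which lines belong to a block. The temperature value and the 37.5 threshold are hypothetical, chosen only for the example.

```python
temperature = 38.2   # hypothetical reading in Celsius

if temperature > 37.5:
    # Both indented lines belong to the if block
    status = "fever"
    advice = "see a clinician"
else:
    # Both indented lines belong to the else block
    status = "normal"
    advice = "no action needed"

# This line is not indented, so it runs whichever branch was taken
print(f"Status: {status} ({advice})")   # Status: fever (see a clinician)
```

Removing the indentation from any line inside a block either changes which block it belongs to or raises an IndentationError.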

If/elif/else - loan application decision
credit_score = 720
income = 480000   # NGN annual

# Basic if/elif/else
if credit_score >= 750 and income >= 500000:
    loan_tier = "Prime - Up to NGN 5,000,000"
    interest_rate = 12.0
elif credit_score >= 650 and income >= 300000:
    loan_tier = "Standard - Up to NGN 2,000,000"
    interest_rate = 18.5
elif credit_score >= 550:
    loan_tier = "Sub-prime - Up to NGN 500,000"
    interest_rate = 24.0
else:
    loan_tier = "Declined"
    interest_rate = None

print(f"Loan tier: {loan_tier}")
if interest_rate is not None:   # explicit check: a rate of 0.0 would count as False
    print(f"Interest rate: {interest_rate}%")

2.4 For Loops

A for loop repeats a block of code for each item in a sequence. The variable after for takes each value in turn.

For loops
# Loop over a list
regions = ["South-West", "North-Central", "South-East", "North-West"]

for region in regions:
    print(f"Processing data for: {region}")

# Loop over a range of numbers
# range(start, stop, step) - stop is EXCLUDED
total = 0
for i in range(1, 11):     # 1, 2, 3, ... 10
    total = total + i
print(f"Sum of 1 to 10: {total}")   # 55

# enumerate() gives index AND value
subjects = ["Statistics", "Machine Learning", "Python", "SQL"]
for index, subject in enumerate(subjects, start=1):
    print(f"  Module {index}: {subject}")

# Nested loops
for week in range(1, 4):
    for session in range(1, 4):
        print(f"Week {week}, Session {session}")

2.5 While Loops

While loops
# While loop - repeat until condition is False
balance = 10000
monthly_rate = 0.015  # 1.5% monthly interest
month = 0

while balance < 20000:     # repeat until the balance has doubled
    balance = balance * (1 + monthly_rate)
    month += 1             # same as: month = month + 1

print(f"Balance doubled after {month} months")
print(f"Final balance: NGN {balance:,.2f}")

# break - exit the loop immediately
print("\nSearching for a prime number above 100:")
n = 101
while True:        # infinite loop - must break manually
    is_prime = all(n % i != 0 for i in range(2, int(n**0.5)+1))
    if is_prime:
        print(f"Found: {n}")
        break
    n += 1

Exercise: Practice: Grade Classifier

Write a program that takes a student's exam score and prints the appropriate grade and classification:

  • 70-100: A - Distinction
  • 60-69: B - Merit
  • 50-59: C - Pass
  • 40-49: D - Borderline Pass
  • Below 40: F - Fail

Then extend it: write a for loop that classifies the following 5 scores and counts how many students passed (score >= 50): [72, 48, 88, 51, 35].

Session 3: Functions and Git Basics

3.1 Defining and Calling Functions

A function is a named, reusable block of code. Functions are the fundamental building block of clean, maintainable code. Rather than copying the same logic five times in your notebook, you write it once as a function and call it.

The anatomy of a function definition:

Defining a function
# def keyword, function name, parentheses, colon
def calculate_bmi(weight_kg, height_m):
    """
    Calculate BMI and return the value with WHO classification.
    
    Args:
        weight_kg (float): Body weight in kilograms
        height_m  (float): Height in metres
    
    Returns:
        dict: BMI value and WHO category string
    """
    bmi = weight_kg / (height_m ** 2)
    
    if bmi < 18.5:
        category = "Underweight"
    elif bmi < 25.0:
        category = "Normal weight"
    elif bmi < 30.0:
        category = "Overweight"
    else:
        category = "Obese"
    
    return {"bmi": round(bmi, 1), "category": category}

# Call the function
result = calculate_bmi(70, 1.75)
print(result)                          # {'bmi': 22.9, 'category': 'Normal weight'}
print(f"BMI: {result['bmi']}")        # BMI: 22.9
print(f"Class: {result['category']}") # Class: Normal weight

3.2 Parameters, Arguments and Return Values

Parameters, defaults, and return values
# Default parameters - used when the caller does not provide a value
def greet_student(name, course="Data Science"):
    return f"Welcome, {name}! You are enrolled in {course}."

print(greet_student("Amara"))                         # uses default
print(greet_student("Emeka", "Machine Learning"))      # overrides default

# Multiple return values (Python returns a tuple)
def summary_stats(numbers):
    """Return mean, minimum, and maximum of a list."""
    mean = sum(numbers) / len(numbers)
    return mean, min(numbers), max(numbers)

scores = [72, 88, 65, 91, 78, 55]
avg, lowest, highest = summary_stats(scores)   # unpack the tuple
print(f"Mean: {avg:.1f}  Min: {lowest}  Max: {highest}")

# Keyword arguments - pass by name in any order
def create_report(title, author, year=2024, pages=0):
    return f"{title} by {author} ({year}), {pages} pages"

print(create_report(author="Kmex", title="Data Science Guide", pages=120))

3.3 Variable Scope

Variable scope
# Local variables: only exist inside the function
def my_function():
    local_var = "I only exist inside the function"
    print(local_var)

my_function()
# print(local_var)  # NameError - local_var does not exist here

# Global variables: defined outside all functions
EXCHANGE_RATE = 1450   # NGN per USD (conventionally UPPERCASE for constants)

def convert_to_ngn(usd_amount):
    """Use the global exchange rate to convert USD to NGN."""
    return usd_amount * EXCHANGE_RATE   # reads global variable

print(convert_to_ngn(500))   # 725000
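
One more scope rule is worth seeing in action: assigning to a name inside a function creates a new local variable, even when a global with the same name exists. The sketch below uses hypothetical names to illustrate this. Passing values in as parameters and returning results is usually cleaner than changing globals.

```python
rate = 1450   # global variable


def try_to_change_rate():
    """Assign to rate inside the function; this creates a local variable."""
    rate = 1600          # local: the global rate is untouched
    return rate


def convert(usd_amount, rate_ngn_per_usd):
    """Convert USD to NGN using a rate passed in as a parameter."""
    return usd_amount * rate_ngn_per_usd


print(try_to_change_rate())   # 1600
print(rate)                   # 1450 (unchanged)
print(convert(500, rate))     # 725000
```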

3.4 PEP 8: Python Style Guide

PEP 8 is Python's official style guide. Employers expect PEP 8-compliant code in data science roles. The key rules:

  • Indentation: 4 spaces (not tabs)
  • Line length: 79 characters maximum for code
  • Variable names: lowercase_with_underscores (snake_case)
  • Constant names: UPPERCASE_WITH_UNDERSCORES
  • Add a docstring to every function (triple-quoted string immediately after def)
  • Two blank lines between top-level functions
  • One blank line between methods inside a class
  • Spaces around operators: x = y + 2 not x=y+2
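
To make these rules concrete, here is a small sketch that follows them. The constant, the function names, and the data are hypothetical, and the 20% threshold is an arbitrary choice for the example.

```python
MAX_MISSING_RATIO = 0.2   # constant: UPPERCASE_WITH_UNDERSCORES


def missing_ratio(values):
    """Return the fraction of entries in values that are None."""
    n_missing = sum(1 for value in values if value is None)
    return n_missing / len(values)


def is_usable(values, threshold=MAX_MISSING_RATIO):
    """Return True if the missing ratio does not exceed threshold."""
    return missing_ratio(values) <= threshold


readings = [36.8, None, 37.1, 36.5, 38.0]
print(missing_ratio(readings))   # 0.2
print(is_usable(readings))       # True
```

Note the two blank lines between the top-level functions, the snake_case names, the docstrings, and the spaces around operators such as / and <=.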

3.5 Introduction to Git and GitHub

Git is a version control system that tracks changes to your code over time. Put every data science project under version control from day one. GitHub is the online platform where you store and share your Git repositories. Employers hiring for data science roles in the UK and Nigeria often review candidates' GitHub profiles.

Setting Up Git (run in your terminal/Anaconda Prompt)

Git setup and first commit
# Install Git from git-scm.com if not already installed
# Configure your identity (one-time setup)
git config --global user.name "Your Name"
git config --global user.email "your@email.com"

# Create a new repository for your data science work
mkdir data-science-course
cd data-science-course
git init                         # initialise an empty repository

# Create your first file
echo "# Data Science Course" > README.md

# Stage and commit
git add README.md                # stage the file
git add .                        # OR stage all changed files
git commit -m "Initial commit: add README"   # save a snapshot

# Connect to GitHub
# (First create a new repo on github.com, then:)
git branch -M main               # name the branch "main" (older Git setups default to "master")
git remote add origin https://github.com/yourusername/data-science-course.git
git push -u origin main          # upload to GitHub

Daily Git Workflow

Daily git workflow
# Check what files have changed
git status

# See what changed inside files
git diff

# Stage specific files
git add analysis.py
git add notebooks/week1.ipynb

# Commit with a descriptive message (always describe WHAT and WHY)
git commit -m "Add BMI calculator function with WHO classification"

# Push to GitHub
git push

Write commit messages in the imperative mood, describing what the commit does: "Add BMI calculator" not "Added BMI calculator" or "Adding BMI calculator".

Week 1 Assignments

Submit both notebooks to your GitHub repository before Session 1 of Week 2. Instructor feedback will be provided within 48 hours.

Exercise: Assignment 1: BMI Calculator with Input Validation

Build a function called calculate_bmi(weight_kg, height_m) that:

  1. Validates inputs: weight must be between 20 and 300 kg; height must be between 0.5 and 2.5 m. Return an error message string for invalid inputs.
  2. Computes the BMI using the formula: weight / height squared
  3. Returns a dictionary with three keys: bmi (rounded to 1 decimal), category (WHO classification), and health_risk (Low, Moderate, High, or Very High)
  4. Test with 5 different inputs including at least 2 edge cases
  5. Print the results in a formatted table

Push the Jupyter notebook to your GitHub repository with a clear README.

Exercise: Assignment 2: Multiplication Table Generator

Write a function called print_times_table(n) that prints a formatted multiplication table for any number n from 1 to 12. Then write a second function celsius_to_fahrenheit_table(start, end, step) that prints a temperature conversion table from start to end degrees Celsius in increments of step. Test it with start=0, end=100, step=10.

Exercise: Bonus Challenge: FizzBuzz

Write a function fizzbuzz(n) that loops from 1 to n and prints: "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for multiples of both 3 and 5, and the number itself for all other cases. FizzBuzz is a classic screening question in junior technical interviews, including for data science roles in the UK and Nigeria.