By the end of this week you will be able to:
Python is the dominant programming language in data science, topping the IEEE Spectrum programming language rankings for 8 consecutive years. Three reasons make it the standard:
Major employers including Google, Meta, Netflix, and the NHS (UK) use Python as their primary data science language. Learning Python opens doors to data analyst, data scientist, and ML engineer roles in both the Nigerian and UK job markets.
No prior programming experience is required for Week 1. Every concept is explained from first principles.
Anaconda bundles Python 3.11 with 250+ data science packages in one installer, including Jupyter Notebook, NumPy, pandas, matplotlib, and scikit-learn. This is the fastest way to get a complete data science environment.
pip install numpy pandas matplotlib seaborn scikit-learn jupyter
jupyter notebook
Jupyter Notebook is the standard interactive environment for data science. You write code in cells, run them with Shift+Enter, and see results immediately below the cell.
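Once the installation finishes, one way to confirm that the packages can be imported is a short check like the sketch below. The helper name check_packages is hypothetical and chosen only for illustration; note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names for the packages installed above.
# scikit-learn is imported as "sklearn", not "scikit-learn".
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with pip before continuing.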
Open Jupyter Notebook, click New → Python 3, and type the following into the first cell. Press Shift+Enter to run it.
print("Hello, Data Science!")
print("My name is Adeleke")
print(2 + 3)
A variable is a named container that stores a value. You create a variable by typing a name, then the equals sign (=), then the value. Python reads this right to left: evaluate the right side, store it in the name on the left.
Variable naming rules in Python:
- Names are case-sensitive: age and Age are different variables
- Reserved keywords such as for, if, while cannot be used as names
# Good variable names
student_name = "Amara Okafor"
age = 28
gpa = 3.85
is_enrolled = True
# Variables can be updated
age = 29  # overwrite the value
age = age + 1  # compute then store (now age is 30)
print(student_name)  # Amara Okafor
print(age)  # 30
print(gpa)  # 3.85
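The naming rules can also be checked programmatically. The sketch below uses the standard library's keyword module together with str.isidentifier(); the helper name is_valid_variable_name is hypothetical and exists only for this illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_variable_name("student_name"))  # True
print(is_valid_variable_name("2nd_score"))     # False: starts with a digit
print(is_valid_variable_name("for"))           # False: reserved keyword
print(is_valid_variable_name("Age"))           # True: a different name from "age"
```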
Every piece of data in Python has a type. The four fundamental types are:
| Type | Description | Example | Use in Data Science |
|---|---|---|---|
| int | Whole numbers, no decimal | 42, -7, 0, 1000 | Counts, rankings, row numbers |
| float | Numbers with decimal point | 3.14, -0.5, 2.0 | Measurements, probabilities, statistics |
| str | Text, enclosed in quotes | "Lagos", 'Nigeria' | Column names, categories, labels |
| bool | True or False only | True, False | Flags, conditions, binary targets |
# int: whole numbers
n_patients = 500
year = 2024
rank = 1
# float: decimal numbers (note: 1/3 in Python is 0.333...)
accuracy = 0.947
temperature = 36.8
pi = 3.14159
# str: text enclosed in single or double quotes (consistent choice)
country = "Nigeria"
city = 'Ibadan'
sentence = "There are 36 states in Nigeria."
# bool: only True or False (capital T and F, exactly)
is_diabetic = True
passed_test = False
# Use type() to check what type a variable is
print(type(n_patients))  # <class 'int'>
print(type(accuracy))  # <class 'float'>
print(type(country))  # <class 'str'>
print(type(is_diabetic))  # <class 'bool'>
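One practical consequence of the float type is worth seeing early: floats are stored in binary, so some decimal values are only approximated. The sketch below is an illustration that goes slightly beyond the table above; it shows why exact equality checks on floats can surprise you and how math.isclose() from the standard library helps.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: 0.1 and 0.2 are stored approximately
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
print(round(total, 2) == 0.3)    # True: rounding removes the tiny error
```

When comparing measurements, probabilities, or model scores, comparing within a tolerance is usually safer than using ==.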
You can convert between types using built-in conversion functions. This is called explicit type conversion or casting.
# Convert string to number
age_str = "28"
age_int = int(age_str) # "28" → 28
print(age_int + 2) # 30 (arithmetic works now)
# Convert number to string (for printing/concatenation)
score = 94
message = "Your score is " + str(score) + "%"
print(message)
# Convert to float
items = 7
per_unit = float(items) / 3 # 7 / 3 = 2.333...
print(round(per_unit, 2)) # 2.33
# Boolean conversion: 0, 0.0, empty strings, empty containers, and None are False; other values are True
print(bool(0)) # False
print(bool(1)) # True
print(bool("")) # False
print(bool("text")) # True
print(bool(3.14)) # True
A common beginner mistake: "5" + 3 raises a TypeError because you cannot add a string to an integer. Always convert types explicitly before arithmetic.
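The mistake described above, and two ways to resolve it, can be sketched as follows. The try/except syntax is not covered in this section and is used here only to show the error without stopping the program; the helper to_number is a hypothetical name for illustration.

```python
# Mixing str and int in arithmetic raises TypeError
try:
    "5" + 3
except TypeError as err:
    print(f"TypeError: {err}")

# Convert explicitly, depending on what you mean
print(int("5") + 3)   # 8   (numeric addition)
print("5" + str(3))   # 53  (string concatenation)


def to_number(text):
    """Convert text to a float, or return None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


print(to_number("36.8"))  # 36.8
print(to_number("n/a"))   # None
```

Returning None for unparseable text is one possible design; real datasets often contain entries like "n/a" that need this kind of handling before arithmetic.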
Create variables for a hospital patient record. Include: patient ID (integer), full name (string), temperature in Celsius (float), whether the patient is allergic to penicillin (boolean), and the patient's age (integer). Print all five variables. Then convert the temperature to Fahrenheit using the formula: F = (C × 9/5) + 32. Print the result.
| Operator | Name | Example | Result |
|---|---|---|---|
| + | Addition | 10 + 3 | 13 |
| - | Subtraction | 10 - 3 | 7 |
| * | Multiplication | 10 * 3 | 30 |
| / | Division (always float) | 10 / 3 | 3.333... |
| // | Floor division (rounds down) | 10 // 3 | 3 |
| % | Modulo (remainder) | 10 % 3 | 1 |
| ** | Exponentiation | 2 ** 10 | 1024 |
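For positive numbers, floor division looks the same as chopping off the decimal part, but the two differ for negative numbers: // always rounds down, toward negative infinity. A short sketch using only built-ins and the standard math module:

```python
import math

print(7 // 2)               # 3
print(-7 // 2)              # -4: floor division rounds toward negative infinity
print(math.trunc(-7 / 2))   # -3: truncation rounds toward zero
print(-7 % 2)               # 1: the remainder takes the sign of the divisor

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(47, 5)
print(quotient, remainder)  # 9 2
```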
# Real-world arithmetic
monthly_salary = 250000 # NGN
tax_rate = 0.075 # 7.5%
annual_salary = monthly_salary * 12
tax_payable = annual_salary * tax_rate
net_annual = annual_salary - tax_payable
print(f"Annual salary: NGN {annual_salary:,}")
print(f"Tax payable: NGN {tax_payable:,}")
print(f"Net annual: NGN {net_annual:,}")
# Statistical formula: z-score = (x - mean) / std
x = 72
mean = 65
std = 8
z_score = (x - mean) / std
print(f"z-score: {z_score:.3f}") # 0.875
# Modulo: check if a number is even or odd
number = 47
remainder = number % 2
print(f"{number} is {'even' if remainder == 0 else 'odd'}")
# Comparison operators - always return True or False
score = 74
print(score >= 70) # True (greater than or equal to)
print(score == 74) # True (equality - note DOUBLE equals)
print(score != 100) # True (not equal to)
print(score < 50) # False (less than)
# Logical operators combine boolean expressions
has_experience = True
has_degree = True
years = 5
# and: both must be True
is_eligible = has_degree and years >= 3
print(f"Eligible: {is_eligible}") # True
# or: at least one must be True
can_apply = has_degree or has_experience
print(f"Can apply: {can_apply}") # True
# not: flips True to False and vice versa
is_junior = not (years >= 5)
print(f"Junior role: {is_junior}") # False
Conditional statements let your program make decisions. Python uses indentation (4 spaces by convention) to define code blocks, instead of the curly braces used in many other languages. This indentation is mandatory, not optional.
credit_score = 720
income = 480000 # NGN annual
# Basic if/elif/else
if credit_score >= 750 and income >= 500000:
    loan_tier = "Prime - Up to NGN 5,000,000"
    interest_rate = 12.0
elif credit_score >= 650 and income >= 300000:
    loan_tier = "Standard - Up to NGN 2,000,000"
    interest_rate = 18.5
elif credit_score >= 550:
    loan_tier = "Sub-prime - Up to NGN 500,000"
    interest_rate = 24.0
else:
    loan_tier = "Declined"
    interest_rate = None
print(f"Loan tier: {loan_tier}")
# Compare with None explicitly so that a rate of 0.0 would still be printed
if interest_rate is not None:
    print(f"Interest rate: {interest_rate}%")
A for loop repeats a block of code for each item in a sequence. The variable after for takes each value in turn.
# Loop over a list
regions = ["South-West", "North-Central", "South-East", "North-West"]
for region in regions:
    print(f"Processing data for: {region}")
# Loop over a range of numbers
# range(start, stop, step) - stop is EXCLUDED
total = 0
for i in range(1, 11):  # 1, 2, 3, ... 10
    total = total + i
print(f"Sum of 1 to 10: {total}")  # 55
# enumerate() gives index AND value
subjects = ["Statistics", "Machine Learning", "Python", "SQL"]
for index, subject in enumerate(subjects, start=1):
    print(f" Module {index}: {subject}")
# Nested loops
for week in range(1, 4):
    for session in range(1, 4):
        print(f"Week {week}, Session {session}")
# While loop - repeat until condition is False
balance = 10000
monthly_rate = 0.015 # 1.5% monthly interest
month = 0
while balance < 20000:  # keep going until the balance has doubled
    balance = balance * (1 + monthly_rate)
    month += 1  # same as: month = month + 1
print(f"Balance doubled after {month} months")
print(f"Final balance: NGN {balance:,.2f}")
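The month count from the loop above can be cross-checked against the closed-form compound-interest relationship (1 + r)^n >= 2, which gives n = ceil(log 2 / log(1 + r)). The sketch below is an illustration only; the function name months_to_reach is hypothetical.

```python
import math


def months_to_reach(growth_ratio, monthly_rate):
    """Smallest whole number of months for a balance to grow by growth_ratio."""
    return math.ceil(math.log(growth_ratio) / math.log(1 + monthly_rate))


print(months_to_reach(2, 0.015))  # 47

# Same answer from the loop-based approach, for comparison
balance, month = 10000, 0
while balance < 20000:
    balance = balance * (1 + 0.015)
    month += 1
print(month)  # 47
```

The loop is easier to adapt (for example, to add monthly deposits), while the formula is a quick sanity check on the result.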
# break - exit the loop immediately
print("\nSearching for a prime number above 100:")
n = 101
while True: # infinite loop - must break manually
    is_prime = all(n % i != 0 for i in range(2, int(n**0.5) + 1))
    if is_prime:
        print(f"Found: {n}")
        break
    n += 1
Write a program that takes a student's exam score and prints the appropriate grade and classification:
Then extend it: write a for loop that classifies the following 5 scores and counts how many students passed (score >= 50): [72, 48, 88, 51, 35].
A function is a named, reusable block of code. Functions are the fundamental building block of clean, maintainable code. Rather than copying the same logic five times in your notebook, you write it once as a function and call it.
The anatomy of a function definition:
# def keyword, function name, parentheses, colon
def calculate_bmi(weight_kg, height_m):
    """
    Calculate BMI and return the value with WHO classification.
    Args:
        weight_kg (float): Body weight in kilograms
        height_m (float): Height in metres
    Returns:
        dict: BMI value and WHO category string
    """
    bmi = weight_kg / (height_m ** 2)
    if bmi < 18.5:
        category = "Underweight"
    elif bmi < 25.0:
        category = "Normal weight"
    elif bmi < 30.0:
        category = "Overweight"
    else:
        category = "Obese"
    return {"bmi": round(bmi, 1), "category": category}
# Call the function
result = calculate_bmi(70, 1.75)
print(result)  # {'bmi': 22.9, 'category': 'Normal weight'}
print(f"BMI: {result['bmi']}")  # BMI: 22.9
print(f"Class: {result['category']}")  # Class: Normal weight
# Default parameters - used when the caller does not provide a value
def greet_student(name, course="Data Science"):
    return f"Welcome, {name}! You are enrolled in {course}."
print(greet_student("Amara")) # uses default
print(greet_student("Emeka", "Machine Learning")) # overrides default
# Multiple return values (Python returns a tuple)
def summary_stats(numbers):
    """Return mean, minimum, and maximum of a list."""
    mean = sum(numbers) / len(numbers)
    return mean, min(numbers), max(numbers)
scores = [72, 88, 65, 91, 78, 55]
avg, lowest, highest = summary_stats(scores) # unpack the tuple
print(f"Mean: {avg:.1f} Min: {lowest} Max: {highest}")
# Keyword arguments - pass by name in any order
def create_report(title, author, year=2024, pages=0):
    return f"{title} by {author} ({year}), {pages} pages"
print(create_report(author="Kmex", title="Data Science Guide", pages=120))
# Local variables: only exist inside the function
def my_function():
    local_var = "I only exist inside the function"
    print(local_var)
my_function()
# print(local_var) # NameError - local_var does not exist here
# Global variables: defined outside all functions
EXCHANGE_RATE = 1450 # NGN per USD (conventionally UPPERCASE for constants)
def convert_to_ngn(usd_amount):
    """Use the global exchange rate to convert USD to NGN."""
    return usd_amount * EXCHANGE_RATE  # reads global variable
print(convert_to_ngn(500))  # 725000
PEP 8 is Python's official style guide. Employers expect PEP 8-compliant code in data science roles. The key rules:
- Spaces around operators: x = y + 2, not x=y+2
Git is a version control system that tracks changes to your code over time. Every data science project must be version-controlled from day one. GitHub is the online platform where you store and share your Git repositories. UK and Nigerian data science employers check GitHub profiles during hiring.
# Install Git from git-scm.com if not already installed
# Configure your identity (one-time setup)
git config --global user.name "Your Name"
git config --global user.email "your@email.com"
# Create a new repository for your data science work
mkdir data-science-course
cd data-science-course
git init  # initialise an empty repository
# Create your first file
echo "# Data Science Course" > README.md
# Stage and commit
git add README.md  # stage the file
git add .  # OR stage all changed files
git commit -m "Initial commit: add README"  # save a snapshot
# Connect to GitHub
# (First create a new repo on github.com, then:)
git remote add origin https://github.com/yourusername/data-science-course.git
git branch -M main  # make sure the branch is named main (some Git setups default to master)
git push -u origin main  # upload to GitHub
# Check what files have changed
git status
# See what changed inside files
git diff
# Stage specific files
git add analysis.py
git add notebooks/week1.ipynb
# Commit with a descriptive message (always describe WHAT and WHY)
git commit -m "Add BMI calculator function with WHO classification"
# Push to GitHub
git push
Write commit messages in the imperative mood, describing what the commit does: "Add BMI calculator" not "Added BMI calculator" or "Adding BMI calculator".
Submit both notebooks to your GitHub repository before Session 1 of Week 2. Instructor feedback will be provided within 48 hours.
Build a function called calculate_bmi(weight_kg, height_m) that:
Push the Jupyter notebook to your GitHub repository with a clear README.
Write a function called print_times_table(n) that prints a formatted multiplication table for any number n from 1 to 12. Then write a second function celsius_to_fahrenheit_table(start, end, step) that prints a temperature conversion table from start to end degrees Celsius in increments of step. Test it with start=0, end=100, step=10.
Write a function fizzbuzz(n) that loops from 1 to n and prints: "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for multiples of both 3 and 5, and the number itself for all other cases. This is a classic screening question in junior technical interviews, including data science interviews in the UK and Nigeria.