BioNT Intermediate course
Python for Bioinformatics
Course objectives
Overall Course Objective
By the end of this course, students will be able to use the NumPy and Pandas libraries to manipulate, analyze, and process complex numerical and tabular data in Python, demonstrating proficiency in advanced array operations, data structures, and data manipulation techniques. Students will also apply these skills to real-world bioinformatics problems, gaining practical experience in handling and analyzing genomics data.
Specific Learning Objectives
1. After completing the NumPy section, students will be able to:
Explain the purpose and advantages of using NumPy in scientific computing and data analysis
Create, manipulate, and understand the structure of NumPy arrays
Apply various indexing and slicing techniques to access and modify array elements efficiently
Perform sorting and advanced indexing operations on NumPy arrays
Utilize NumPy’s built-in functions and attributes for basic array operations
Implement vectorized operations to optimize code performance
Apply broadcasting techniques to perform operations between arrays of different shapes
Use NumPy’s splitting functions to divide arrays into multiple sub-arrays
Apply NumPy techniques to analyze gene expression data in a bioinformatics context
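As a rough illustration of several of the NumPy objectives above (vectorized operations, broadcasting, splitting, and sorting with advanced indexing), here is a minimal sketch. The expression matrix is synthetic and the control/treated grouping is a hypothetical layout chosen for the example; neither comes from the course datasets.

```python
import numpy as np

# Hypothetical expression matrix: 4 genes (rows) x 6 samples (columns).
# The values are synthetic and generated only for illustration.
rng = np.random.default_rng(seed=0)
expression = rng.integers(low=1, high=1000, size=(4, 6)).astype(float)

# Vectorized operation: log2-transform every value without a Python loop.
log_expr = np.log2(expression + 1)

# Broadcasting: subtract each gene's mean (shape (4, 1)) from its row (shape (4, 6)).
gene_means = log_expr.mean(axis=1, keepdims=True)
centered = log_expr - gene_means

# Splitting: divide the samples into two equal groups of three columns each.
# Assumption: the first three columns are controls, the last three are treated.
control, treated = np.split(centered, 2, axis=1)

# Sorting and advanced indexing: rank genes by the difference in group means,
# largest first, then reorder the rows of the matrix with the index array.
diff = treated.mean(axis=1) - control.mean(axis=1)
order = np.argsort(diff)[::-1]
ranked = centered[order]
```

The `keepdims=True` argument keeps the per-gene means as a column vector, which is what lets the subtraction broadcast across samples.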
2. After completing the Pandas section, students will be able to:
Explain the relationship between Pandas and NumPy, and when to use each library
Create and manipulate Pandas Series and DataFrames effectively
Apply various data manipulation techniques using DataFrame functions
Manage DataFrame indexes, including setting, resetting, and using multi-level indexes
Perform advanced slicing and filtering operations on DataFrames
Handle and manipulate string data within DataFrames
Identify and address missing data in Pandas objects
Create DataFrames from various data sources and reshape them as needed
Combine multiple DataFrames using merging and concatenation techniques
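To give a sense of what a few of the Pandas objectives above look like in practice (string handling, missing data, merging, and setting an index), here is a small sketch. All table contents, column names, and sample identifiers are hypothetical and made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample metadata with messy strings and one missing age.
samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "condition": [" control", "treated ", "Treated"],
    "age": [34, np.nan, 51],
})

# String handling: strip surrounding whitespace and normalize case.
samples["condition"] = samples["condition"].str.strip().str.lower()

# Missing data: count the gaps, then fill them with the column median.
n_missing = samples["age"].isna().sum()
samples["age"] = samples["age"].fillna(samples["age"].median())

# Hypothetical per-sample read counts from a second source.
counts = pd.DataFrame({
    "sample_id": ["S1", "S2", "S4"],
    "reads": [1200, 950, 1430],
})

# Merging: an inner join keeps only samples present in both tables.
# The shared key then becomes the index of the combined table.
merged = samples.merge(counts, on="sample_id", how="inner").set_index("sample_id")
```

Filling with the median is only one possible choice here; depending on the analysis, dropping incomplete rows with `dropna()` may be more appropriate.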
3. After completing the hands-on sessions, students will be able to:
Integrate theoretical knowledge of NumPy and Pandas with practical applications in genomic data analysis
Apply NumPy techniques to perform bioinformatics analysis on real-world datasets (e.g., gene expression data)
Use Pandas to examine and analyze real-world datasets (e.g., samples used for the gene expression study)
Demonstrate proficiency in handling and manipulating bioinformatics datasets using Pandas (e.g., GTF files)
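As a sketch of the kind of GTF handling mentioned above, the example below loads a tiny inline GTF snippet into a DataFrame and extracts gene identifiers. The snippet is invented for illustration and is not taken from the course test data. The column names follow the usual nine-column GTF layout, and the regular expression assumes attributes are written as `gene_id "..."`.

```python
import csv
import io

import pandas as pd

# Hypothetical three-record GTF snippet (tab-separated, with one header comment).
gtf_text = (
    "#!genome-build example\n"
    'chr1\texample\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n'
    'chr1\texample\texon\t100\t300\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n'
    'chr2\texample\tgene\t50\t450\t.\t-\t.\tgene_id "G2"; gene_name "XYZ";\n'
)

# GTF files have no header row, so the nine column names are supplied here.
columns = ["seqname", "source", "feature", "start", "end",
           "score", "strand", "frame", "attribute"]

gtf = pd.read_csv(
    io.StringIO(gtf_text),   # a file path could be passed here instead
    sep="\t",
    comment="#",             # skip header/comment lines
    header=None,
    names=columns,
    quoting=csv.QUOTE_NONE,  # keep the double quotes inside the attribute column
)

# Filtering: keep only gene-level records.
genes = gtf[gtf["feature"] == "gene"].copy()

# String handling: pull gene_id out of the free-text attribute column.
genes["gene_id"] = genes["attribute"].str.extract(r'gene_id "([^"]+)"', expand=False)

# GTF coordinates are 1-based and inclusive, hence the + 1.
genes["length"] = genes["end"] - genes["start"] + 1
```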
Test datasets
Download the test datasets for the hands-on sessions:
Test_data.zip