BioNT Intermediate course

Course objectives

Overall Course Objective

By the end of this course, students will be able to use the NumPy and Pandas libraries to manipulate, analyze, and process numerical and tabular data in Python, including advanced array operations, data structures, and data manipulation techniques. Students will also apply these skills to real-world bioinformatics problems and gain practical experience handling and analyzing genomics data.

Specific Learning Objectives

1. After completing the NumPy section, students will be able to:

  • Explain the purpose and advantages of using NumPy in scientific computing and data analysis

  • Create, manipulate, and understand the structure of NumPy arrays

  • Apply various indexing and slicing techniques to access and modify array elements efficiently

  • Perform sorting and advanced indexing operations on NumPy arrays

  • Utilize NumPy’s built-in functions and attributes for basic array operations

  • Implement vectorized operations to optimize code performance

  • Apply broadcasting techniques to perform operations between arrays of different shapes

  • Use NumPy’s splitting functions to divide arrays into multiple sub-arrays

  • Apply NumPy techniques to analyze gene expression data in a bioinformatics context

2. After completing the Pandas section, students will be able to:

  • Explain the relationship between Pandas and NumPy, and when to use each library

  • Create and manipulate Pandas Series and DataFrames effectively

  • Apply various data manipulation techniques using DataFrame functions

  • Manage DataFrame indexes, including setting, resetting, and using multi-level indexes

  • Perform advanced slicing and filtering operations on DataFrames

  • Handle and manipulate string data within DataFrames

  • Identify and address missing data in Pandas objects

  • Create DataFrames from various data sources and reshape them as needed

  • Combine multiple DataFrames using merging and concatenation techniques

3. After completing the hands-on sessions, students will be able to:

  • Integrate theoretical knowledge of NumPy and Pandas with practical applications in genomic data analysis

    • Apply NumPy techniques to perform bioinformatics analysis on real-world datasets (e.g., gene expression data)

    • Use Pandas to examine and analyze real-world datasets (e.g., the samples used in the gene expression study)

    • Demonstrate proficiency in handling and manipulating bioinformatics datasets using Pandas (e.g., GTF files)
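
As a rough illustration of the NumPy objectives listed above (vectorized operations, broadcasting, boolean indexing, sorting, and splitting), the sketch below works on a tiny gene expression matrix. The matrix values, the variable names, and the threshold of 10 are invented for illustration and are not taken from the course datasets.

```python
import numpy as np

# Hypothetical toy data: rows are genes, columns are samples (raw counts).
counts = np.array([
    [10, 20, 30, 40],
    [0, 5, 10, 5],
    [100, 200, 300, 400],
], dtype=float)

# Vectorized operation: log2(count + 1) for every element, no Python loop.
log_counts = np.log2(counts + 1)

# Broadcasting: library_sizes has shape (4,), counts has shape (3, 4),
# so each column is divided by its own sample total.
library_sizes = counts.sum(axis=0)
cpm = counts / library_sizes * 1e6

# Boolean indexing: keep genes whose mean count exceeds an assumed threshold.
gene_means = counts.mean(axis=1)
expressed = counts[gene_means > 10]

# Sorting: gene row indices ordered from highest to lowest mean count.
order = np.argsort(gene_means)[::-1]

# Splitting: divide the samples into two equal groups along the column axis.
first_half, second_half = np.split(counts, 2, axis=1)
```

The counts-per-million scaling is only one possible normalization; it is used here because it shows broadcasting between a 2-D array and a 1-D array in a single line.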

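The Pandas objectives above (missing data, merging, string handling, and GTF files) might look like the following in practice. This is a minimal sketch: the sample table, the expression table, and the two GTF-style lines are made up, and the column names given to the GTF table are an assumption based on the usual nine-column GTF layout.

```python
import csv
import io

import pandas as pd

# Hypothetical sample metadata; None marks a missing value.
samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "condition": ["control", "treated", None],
})

# Identify missing data, then fill it with an explicit label.
n_missing = int(samples["condition"].isna().sum())
samples["condition"] = samples["condition"].fillna("unknown")

# Hypothetical expression values in long format.
expression = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "gene_id": ["geneA", "geneA", "geneA"],
    "count": [12, 48, 30],
})

# Merging: attach the sample metadata to each expression row.
merged = expression.merge(samples, on="sample_id", how="left")

# Two made-up GTF-style records: nine tab-separated columns, no header row.
gtf_text = (
    'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "geneA"; gene_name "A";\n'
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_name "A";\n'
)
gtf_columns = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]

# QUOTE_NONE keeps the double quotes inside the attribute column untouched.
gtf = pd.read_csv(io.StringIO(gtf_text), sep="\t", header=None,
                  names=gtf_columns, comment="#", quoting=csv.QUOTE_NONE)

# String handling: pull the gene_id value out of the attribute text.
gtf["gene_id"] = gtf["attribute"].str.extract(r'gene_id "([^"]+)"', expand=False)

# Filtering: keep only the rows that describe whole genes.
genes = gtf[gtf["feature"] == "gene"]
```

Reading from `io.StringIO` keeps the sketch self-contained; with a real annotation file, the same `read_csv` call would take a file path instead.
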
Test datasets