Special Topics in Data Science

Algorithms in Computational Biology

Spring 2022

CS 4364/CS 5364 (CRNs 27123/28040)
Instructor: Dan DeBlasio
Time: TR 16:30-17:50
Email: dfdeblasio _at_ utep.edu
Location: Chemistry and Computer Science Building 1.0202
Office: CCSB 3.1008 and Teams (Chat.deblasiolab.org)
Office Hours: 2-3pm Monday and Tuesday, or by appointment (calendly.deblasiolab.org)
Syllabus updated: February 08 2022
Source code for the syllabus and all homework assignments can be found on GitHub and is licensed under Creative Commons (CC-BY-SA-4.0).

Throughout the semester, we will examine common algorithmic solutions to domain-specific data science problems and how to distill computational problems from questions asked in other domains (e.g., computational biology). While the specific applications we will use as examples are in biology, the approaches discussed are applicable to many interdisciplinary fields. That said, this course is self-contained and no previous knowledge of biology is needed. The solution techniques include dynamic programming, computational optimization/integer linear programming, and machine learning, to name a few.

Some of the specific biological problems to be discussed are:

While these are some of the topics we will discuss, they are in no way exhaustive of the field; as with previous versions of the course, I am open to suggestions of topics of interest to the students enrolled. The only prerequisite is CS 3 (CS 2302). Only the minimum amount of biology needed to understand the underlying question and the extraction of the computational problem will be included; the course is self-contained, and no biology background is expected ahead of time.

We will use "Algorithms in Bioinformatics" by Wing-Kin Sung[1] as our primary text, but this will be supplemented with other literature sources that will be provided.

[1] CRC Press, ISBN 9781420070330(Hardcover)/9780367659318(Paperback)

Date Slides Homework Other
18 January 2022 (W1T) Introduction Slides
Algorithm Refresher
20 January 2022 (W1R) Linear Programming Welcome Survey
25 January 2022 (W2T) Biology Primer Homework 1
(due 31 January 2022)
Example Solution
27 January 2022 (W2R) Sequence Similarity
1 February 2022 (W3T) Sequence Similarity (continued) Homework 2
(due 14 9 February 2022)
3 February 2022 (W3R) weather day
open help session
8 February 2022 (W4T) Sequence Similarity (continued) updated syllabus to include
late homework policy
10 February 2022 (W4R) Parametric Alignment Gusfield Chapter 13 posted on Teams
15 February 2022 (W5T) Suffix Trees in class: activity
No office hours 21 Feb 2022
17 February 2022 (W5R) Suffix Trees Homework 3
(due 23 February 2022)
Replacement Office Hours:
Monday 21 Feb @ 3pm
22 February 2022 (W6T) Suffix Trees
BWT/FM
24 February 2022 (W6R) BWT/FM
LCS
Homework 4
(due 7 March 2022)
Midterm Exam:
10 March 2022
1 March 2022 (W7T) Multiple Sequence Alignment
3 March 2022 (W7R) Multiple Sequence Alignment Spring 2021 Midterm
(with solution)
8 March 2022 (W8T) Review
10 March 2022 (W8R) Midterm Exam (Solutions) No Office Hours
21 March 2022
(Monday after SB)
15&17 March 2022: Spring Break
22 March 2022 (W9T) Project Description
24 March 2022 (W9R) Genome Alignment Homework 5
(due 30 March 2022)
29 March 2022 (W10T) Database Search
Hashing
31 March 2022 (W10R) Database Search
Hashing
Homework 6
(due 6 13 April 2022)
5 April 2022 (W11T) Hashing
7 April 2022 (W11R) Phylogenetics
12 April 2022 (W12T) Phylogenetics Homework 7
(due 18 April 2022)
14 April 2022 (W12R) Read Alignment
19 April 2022 (W13T) RNA-Seq Assembly
21 April 2022 (W13R) de novo Assembly Homework 8
(due 27 April 2022)
26 April 2022 (W14T) Biological Networks Homework 9
(due 9 May 2022)
28 April 2022 (W14R) Alignment-Free Genomics
3 May 2022 (W15T) Final Review updated HW9 to fix typo: running time is O(m^2n) not O(mn^2)
5 May 2022 (W15R) Term Paper Presentations

Useful External Links

Slides:

Homework