Bioinformatics Fall 2020
Location
- LECTURE: ONLINE, Tuesdays, 5:35pm – 6:55pm
- JOURNAL CLUB (grad section ONLY): ONLINE, time TBA
- LAB: ONLINE, Thursdays, 5:35pm - 8:35pm
Lecturer
Dr. Yana Bromberg
Lipman Hall 218
yanab@sebs.rutgers.edu
Lab Instructor
Zishuo Zeng
Lipman Hall 222
zz109@scarletmail.rutgers.edu
Office Hours: Held remotely, by appointment
Required Text: There is NO REQUIRED TEXT for the lab or lecture
Suggested textbooks are: Bioinformatics Algorithms: An Active Learning Approach, 2nd Ed. Vol. 1 and 2, by Phillip Compeau and Pavel Pevzner. Publisher: Active Learning Publishers; 2nd edition (2015); ISBN-13: 978-0990374619 and 978-0990374626
Suggested online resource: http://rosalind.info
Introduction
Bioinformatics aims to build computational models of biological systems. More specifically, bioinformatics involves creating algorithms, databases, systems, and web applications to solve problems in molecular biology. Here, all computational advances are “fair game”. Bioinformatics tools use artificial intelligence, rely on “cloud” computing, and borrow concepts from signal processing and circuit theory. All of these developments are necessary to deal with the inordinate amounts of data being produced by modern high-throughput experimental techniques.
Due to the drop in sequencing costs, we are awash in DNA, RNA, and protein sequences. Massive genomics and metagenomics efforts are opening new horizons in variation analysis. Structural genomics efforts have produced a crystal structure representative of almost every protein family. Microarray technologies allow simultaneous studies of the expression of thousands of genes on a single chip. The improvements keep on coming: more information, higher resolution. Yet the unintended result of improved experimental techniques is a flood of data that we have yet to make sense of. What does our genome encode? What about the soil metagenome? Can we decipher the mechanisms of disease? How are we different from other organisms? How are we different from each other? Bioinformatics attempts to answer all these questions in detail and, where possible, to assess the statistical significance of the answers.
What this course IS
This course is designed to introduce experimental (wet lab) biologists to bioinformatics concepts, principles, and techniques within the framework of basic command-line and web-based databases/tools.
Students taking the course are expected to know how to work in a command-line environment and to have a basic understanding of programming/scripting.
The course includes a brief introduction to working with UNIX/LINUX systems, writing Python scripts, and automating/using existing applications for the analysis of large datasets. All work will be done in a live development environment; i.e., students will have access to the same high-performance computational resources used by labs on campus. By the end of this course, students will possess a bioinformatics skill set, including an informed vocabulary and knowledge of basic script development, sufficient for productive collaboration within a multidisciplinary research team.
What this course IS NOT
This is NOT an applied methods course; rather, this class is aimed at understanding the underlying algorithms. We will NOT attempt to list all available tools for every project or teach you how to use them. Method selection, along with the corresponding cutoffs, thresholds, and settings, is specific to each research project. If you keep up with the class material, you will understand the methods' underpinnings and be able to optimize your project choices on your own.
This course is also NOT intended for computer science students looking to develop algorithmic expertise in bioinformatics.
Course Objectives
- Introduce students to the current bioinformatics algorithms/concepts and their implementations.
- Introduce students to the basics of working in a Linux environment, submitting jobs in a SLURM environment (for parallel computing), and Python scripting.
- Teach students to cast a molecular biology problem as a bioinformatics problem, and provide them with the skills necessary to independently select relevant tools, optimize their settings, and build pipelines to solve that problem.
- Prepare students for more advanced bioinformatics courses.
- Teach students a bioinformatics skill set, including an informed vocabulary and knowledge of basic script development, sufficient for productive collaboration within a multidisciplinary research team.
Lectures and slides
Lectures will be taught as a combination of PowerPoint presentations and discussion. Slides will be posted before class, but will contain only an outline of the lecture. They are intended to help you reconstruct the classwork, not to substitute for listening and asking questions. This semester (Fall 2020), video recordings of the lectures will also be made available after class. However, class participation, i.e., asking questions during lecture time, is expected and will be included in the grade (see below).
Grading
Coursework will be weighted as follows:
Component | Weight |
---|---|
Class Participation | 10% |
Lab Homework/Quizzes | 30% |
Midterm | 20% |
Final | 40% |
Journal Club (graduate students only) | Pass/Fail |
Lecture
Lectures will be given in sections. That is, I will present a portion of the lecture, and we can then discuss any questions that arise. Class/discussion participation is necessary for understanding the material. Ten percent of your final grade will depend on your asking questions and participating in class discussions. The class participation grade has nothing to do with being correct; it will only reflect your willingness to work toward an understanding.
Furthermore, since there is no required textbook for this course, participating in the lectures is absolutely necessary for understanding the material. You are responsible for all material covered in lecture and discussion, regardless of whether it appears in the slides. The semester consists of 13 interdependent lectures. Missing a lecture will likely mean that you do not understand the lectures that follow, so please keep on schedule.
We will not be taking attendance, but being consistently late to or absent from lecture will reduce your class participation grade.
Lab
(E-)Attendance in lab IS REQUIRED. Missing one lab without a valid (WRITTEN and/or DOCUMENTED) explanation means you miss one quiz (no make-ups will be given). Missing more than one without an explanation will result in a FAILING GRADE for the class. Logging in late is also not acceptable, as important information and quizzes will typically be given at the beginning of class without a make-up option.
Homework / Quizzes
Completed homework assignments are due at the beginning of lab or lecture one week from the date they are assigned, unless otherwise specified. Late submissions will NOT be accepted. Assignments containing scripts (written code) must run properly in the standard development environment. Missing, empty, or “fake” submissions (i.e., scripts that clearly cannot be expected to complete the assignment) will receive a grade of 0%. Properly documented/commented scripts that produce errors/warnings and/or fail to provide the correct, formatted output will receive no more than 50% of the grade. That said, your programs will not be expected to handle user-input errors (unless otherwise specified) and will not be tested for them.
Quizzes will be given at the discretion of the lab instructor. Quizzes may be announced, but do not have to be. Quizzes may be written, coding, or both. They may cover lab and/or lecture material, but they will always relate to current topics. We are not looking to “burn” students with Linux questions in week 10, though you should get perfect scores if such a quiz were given. A quiz may be given at any time during any class period: immediately before or after a lecture, during a class, etc. There will be no make-up quizzes.
Midterm/Final
The midterm will have an exam portion (taken “in-class”; closed notes, no phones, no communication, etc.) and a take-home (off-line) programming assignment.
It will be based on material covered in lecture AND lab. You will have one week to complete the take-home project. Your TA will be available during lab time (Oct 29th, after lecture) to discuss assignment problems (NOT to help you solve them). **Late projects will NOT be accepted.**
The final will have both an algorithmic and a writing component. The algorithmic portion of the final will be a multi-tool workflow/pipeline exercise (very flexible in implementation, but necessarily well explained and documented), drawing on all techniques learned throughout the course. The writing portion of the final will include designing, running, and writing up a computational analysis of some biological data, using techniques learned in class. The results will need to be described in scientific article format, i.e., introduction and background, results, materials and methods, and discussion. You will have at least three weeks to complete the take-home portion. Late reports will NOT be accepted.
Journal Club (Graduate Students Only)
Graduate students in the class will be required to attend journal club meetings. The number of sessions will be adjusted depending on the number of people in the class.
Over the course of the semester, you will be required to read and present at least one bioinformatics paper of your choice. You should choose the paper at least a week in advance of your presentation. You will be expected to explain and then defend or debunk the authors’ choice of bioinformatics techniques.
The presentations will not be graded, but students who do not present will receive a FAILING grade in the course.
If you are not presenting in a particular week, you are expected to read the paper and contribute to the discussion. This contribution will affect class participation grades.
Undergraduates are encouraged to attend the journal club, read the papers, and potentially present. Note, however, that this will NOT count as extra credit.
Academic honesty
Academic honesty is an absolute requirement for students taking Bioinformatics. Dishonesty, in any form, will NOT be tolerated. This includes cheating on homework, quizzes, and projects, as well as any form of plagiarism. Please note that working together on homework assignments and submitting identical work are NOT THE SAME; the same goes for searching the web for solutions to problems, text for the written components of your projects, and/or ready-made code. VERY IMPORTANT: We read (and write!!) papers and Wikipedia entries too and know where certain texts come from. It’s easy to tell when you’ve copied a sentence or two. It is even easier to tell if script code was copied, so please keep this in mind. ALL CHEATING WILL BE REPORTED. Students contemplating cheating should consider the severe repercussions of getting caught. The Rutgers University Academic Integrity Policy can be found at: http://academicintegrity.rutgers.edu/integrity.shtml.
Group work policy
In order to facilitate learning, students are encouraged to discuss homework problems amongst themselves. Copying a solution is not, however, the same as “discussing”. According to one colleague, Dr. Iddo Friedberg, a good rule of thumb is the “cup of coffee” rule. After discussing a problem, you should not take away any written record or notes of the discussion. Go have a cup of coffee, and read the front page of the newspaper. If you can still re-create the problem solution afterward from memory, then you have learned something, and are not simply copying.
GROUP WORK ON MIDTERM AND FINAL PROJECTS IS NOT ALLOWED.
CLASSROOM RULES OF CONDUCT
- The class sessions will be on Zoom. Turn OFF your mic when you enter the room.
- Please be on time.
- If your video is off (I would rather it were on in general, but it is your call), please turn it ON when asking questions (don’t forget to turn your mic ON too).
- Lab time is to be spent on lab work. Lab time is not free time. If you finish early, you should start on the associated homework assignment.
YOUR IDEAS, EVALUATIONS, ETC.
In general, your ideas, comments, suggestions, questions, grade challenges, etc. are welcome. Your discretion in these matters is expected, however. No part of your grade will be based on anything other than your coursework and class participation.
SUGGESTIONS FOR SUCCESS
Make sure you stay on top of your homework assignments. Waiting until the last minute to complete an assignment will not work in this course.
Tentative schedule (subject to change)
Lecture | content | Lab | content |
---|---|---|---|
1-Sep | Intro to Bioinformatics | 3-Sep | Intro to Linux (install RStudio, Weka) |
8-Sep | NO class, Monday schedule | 10-Sep | Intro to Python |
15-Sep | Gene Finding | 17-Sep | Python II |
22-Sep | Pairwise sequence alignment, deriving BLOSUM | 24-Sep | Alignments (SW/NW) |
29-Sep | BLAST, affine gap costs | 1-Oct | BLAST |
6-Oct | MSAs and domain families | 8-Oct | Pfam, MAFFT |
13-Oct | Sequence signatures and motifs | 15-Oct | InterPro, Python III; midterm programming portion assigned |
20-Oct | Synchronous Midterm | 22-Oct | Structural Bioinformatics lecture; midterm programming portion collected |
27-Oct | Structural Bioinformatics (cont’d) and Phylogenomics | 29-Oct | PyMol (with emphasis on ligand binding and motifs) |
3-Nov | Phylogenomics (cont’d) | 5-Nov | Tree-Building |
10-Nov | Metagenomics | 12-Nov | MG-RAST, Mi-faser |
17-Nov | Gene expression analysis | 19-Nov | Thanksgiving Break, no class |
24-Nov | Variation and molecular level natural selection | 26-Nov | VCF and annotations |
1-Dec | Disease gene prioritization | 3-Dec | Weka |
8-Dec | Personalized medicine; written portion of final assigned | 10-Dec | Machine learning |
18-Dec | Synchronous Final, 12-3pm; written portion of final due | | |

HAVE A GREAT VACATION!