Bioinformatics Fall 2020


  • LECTURE: ONLINE, Tuesdays, 5:35pm – 6:55pm
  • JOURNAL CLUB (grad section ONLY): ONLINE, time TBA
  • LAB: ONLINE, Thursdays, 5:35pm – 8:35pm


Dr. Yana Bromberg
Lipman Hall 218

Lab Instructor

Zishuo Zeng
Lipman Hall 222

Office Hours: All done remotely and by appointment
Required Text: There is NO REQUIRED TEXT for the lab or lecture
Suggested textbooks are: Bioinformatics Algorithms: An Active Learning Approach, 2nd Ed., Vol. 1 and 2, by Phillip Compeau and Pavel Pevzner. Publisher: Active Learning Publishers; 2nd edition (2015); ISBN-13: 978-0990374619 and 978-0990374626
Suggested Online resource at:


Bioinformatics aims to build computational models of biological systems. More specifically, bioinformatics involves creating algorithms, databases, systems, and web applications to solve problems in molecular biology. Here, all computational advances are “fair game”. Bioinformatics tools use artificial intelligence, rely on “cloud” computing, and borrow concepts from signal processing and circuit theory. All of these developments are necessary to deal with the inordinate amounts of data being produced by modern high-throughput experimental techniques.

Due to the drop in sequencing costs, we are awash in DNA, RNA, and protein sequences. Massive genomics and metagenomics efforts are opening new horizons in variation analysis. Structural genomics efforts have produced a crystal structure representative of almost every protein family. Microarray technologies allow simultaneous studies of the expression of thousands of genes on a single chip. The improvements keep on coming: more information, higher resolution. Yet the unintended result of improved experimental techniques is a flood of data that we have yet to make sense of. What does our genome encode? What about the soil metagenome? Can we decipher the mechanisms of disease? How are we different from other organisms? How are we different from each other? Bioinformatics attempts to answer all these questions in detail and, where possible, to attach statistical significance to the answers.

What this course IS

This course is designed to introduce experimental (wet lab) biologists to bioinformatics concepts, principles, and techniques within the framework of basic command-line and web-based databases/tools.

Students who take the course are expected to know how to work in a command-line environment and to have a basic understanding of programming/scripting.

The course includes a brief introduction to working with UNIX/LINUX systems, writing Python scripts, and automating/using existing applications for the analysis of large datasets. All work will be done in a live development environment; i.e., students will have access to the same high-performance computational resources used by labs on campus. By the end of this course, students will possess a bioinformatics skill set, including an informed vocabulary and knowledge of basic script development, sufficient for productive collaboration within a multidisciplinary research team.

What this course IS NOT

This is NOT an applied methods course; rather, this class is aimed at understanding the underlying algorithms. We will NOT attempt to list all available tools for every project or teach you how to use them. Method selection, along with the corresponding cutoffs, thresholds, and settings, is specific to each and every research project. If you keep up with the class material, you will understand the method underpinnings and be able to optimize your project choices on your own.

This course is also NOT intended for computer science students looking to develop algorithmic expertise in bioinformatics.

Course Objectives

  1. Introduce students to the current bioinformatics algorithms/concepts and their implementations.
  2. Introduce students to the basics of working in a Linux environment, SLURM environment job submissions (for parallel computing), and Python scripting.
  3. Teach students to cast a molecular biology problem as a bioinformatics problem, and provide them with the skills necessary to independently select relevant tools, optimize their settings, and build pipelines to solve that problem.
  4. Prepare students for more advanced bioinformatics courses.
  5. Teach students a sufficient bioinformatics skill set, including an informed vocabulary and knowledge of basic script development, for productive collaboration within a multi-disciplined research team.

Lectures and slides

Lectures will be taught as a combination of PowerPoint presentations and discussions. Slides will be posted before class, but will contain only an outline of the class lecture. They are intended to help you reconstruct the classwork, not to substitute for listening and asking questions. This semester (Fall 2020), video recordings of the lecture will also be made available after class. However, class participation, i.e., asking questions during lecture time, is expected and will count toward the grade (see below).


Coursework will be weighted as follows:

Class Participation                      10%
Lab Homework/Quizzes                     30%
Journal Club (graduate students only)    Pass/Fail


Lectures will be given in sections. That is, I will present a portion of the lecture and we can then discuss any questions that arise. Class/discussion participation is necessary for understanding the material. Ten percent of your final grade will depend on you asking questions and participating in class discussions. The class participation grade has nothing to do with being correct; it will only reflect your willingness to work toward an understanding.

Furthermore, since there is no textbook for this course, participating in the lectures is absolutely necessary for understanding of the material. You are responsible for all material covered in the lecture and discussion regardless of whether it is present in the slides. The entire semester consists of 13 inter-dependent lectures. Missing a lecture likely means that you will not understand the following lectures – please keep on schedule.

We will not be taking attendance, but consistently arriving late to or missing lecture will reduce your class participation grade.


(E-)Attendance in lab IS REQUIRED. Missing one lab without a valid (WRITTEN and/or DOCUMENTED) explanation means you miss one quiz (no make-ups will be given). Missing more than one lab without an explanation will result in a FAILING GRADE for the class. Logging in late is also not acceptable, as important information and quizzes will typically be given at the beginning of class without a make-up option.

Homework / Quizzes

Completed homework assignments are due at the beginning of lab or lecture one week from the date they are assigned, unless otherwise specified. Late submissions will NOT be accepted. Assignments containing scripts (written code) must run properly in the standard development environment. Missing submissions, empty submissions, and “fake” submissions (i.e., scripts that are clearly not expected to do the assignment) will receive a grade of 0%. Properly documented/commented scripts that produce errors/warnings and/or fail to provide the correct, formatted output will receive no more than 50% of the grade. That said, your programs will not be expected to handle user-input errors (unless otherwise specified) and will not be tested for such.
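For illustration only, here is a minimal sketch of what a "properly documented/commented" script with formatted output might look like. The task (counting nucleotides), the function names, and the tab-separated output format are all hypothetical and are not taken from any actual assignment; each assignment's own instructions take precedence.

```python
"""Count nucleotides in a DNA sequence (hypothetical example task).

Illustrates a module docstring, function docstrings, and a fixed,
predictable output format. Not an actual course assignment.
"""


def count_bases(sequence):
    """Return a dict mapping each of A, C, G, T to its count in `sequence`.

    The sequence is upper-cased first; any other characters are ignored.
    """
    sequence = sequence.upper()
    return {base: sequence.count(base) for base in "ACGT"}


def format_counts(counts):
    """Format counts as one tab-separated 'base<TAB>count' line per base."""
    return "\n".join(f"{base}\t{counts[base]}" for base in "ACGT")


# Demo on a made-up sequence; no user-input validation is attempted here.
print(format_counts(count_bases("ACGTTGCA")))
```

Separating the computation (`count_bases`) from the output formatting (`format_counts`) makes it easier to check that a script produces exactly the requested output.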

Quizzes will be given at the discretion of the lab instructor. Quizzes may be announced, but do not have to be. Quizzes may be written, coding, or both. They may cover lab and/or lecture material, but they will always relate to current topics. We are not looking to “burn” students with Linux questions in week 10, though you should get a perfect score if such a quiz were given. A quiz may be given at any time during any class period: immediately before or after a lecture, during a class, etc. There will be no make-up quizzes.


The midterm will have an exam portion (taken “in-class”; closed notes, no phones, no communication, etc.) and an off-line programming assignment.

It will be based on material covered in lecture AND lab. You will have one week to complete the take-home project. Your TA will be available during lab-time (Oct 29th, after lecture) to discuss assignment problems (NOT to help you solve them). Late projects will NOT be accepted.

The final will have both an algorithmic and a writing component. The algorithmic portion of the final will be a multi-tool workflow/pipeline exercise (very flexible in implementation, but necessarily well explained and documented), focusing on all techniques learned throughout class. The writing portion of the final will include designing, running, and writing up a computational analysis of some biological data, using techniques learned in class. The results will need to be described in scientific article format; i.e. introduction and background, results, materials and methods, and discussion. You will have at least three weeks to complete the take-home portion. Late reports will NOT be accepted.

Journal Club (Graduate Students Only)

Graduate students in the class will be required to attend journal club meetings. The number of sessions will be adjusted depending on the number of people in the class.

Over the course of the semester you will be required to read and present at least one bioinformatics paper of your choice. You should choose the paper at least a week in advance of your presentation. You will be expected to explain and then defend or debunk the authors’ choice of bioinformatics techniques.

The presentations will not be graded, but if you do not present you will be assigned a FAILING grade in the course.

If you are not presenting on a particular week, you are expected to read the paper and contribute to the discussion. This contribution will affect class participation grades.

Undergraduates are encouraged to attend the journal club, read the papers, and potentially present. Note, however, that this will NOT count as extra credit.

Academic honesty

Academic honesty is an absolute requirement for students taking Bioinformatics. Dishonesty, in any form, will NOT be tolerated. This includes cheating on homework, quizzes, and projects, as well as any form of plagiarism. Please note that working together on homework assignments and submitting identical work are NOT THE SAME; the same goes for searching the web for solutions to problems, text for the written components of your project, and/or ready-made code. VERY IMPORTANT: We read (and write!!) papers and Wikipedia entries too and know where certain texts come from. It’s easy to tell when you’ve copied a sentence or two. It is even easier to tell if script code was copied, so please keep this in mind. ALL CHEATING WILL BE REPORTED. Students contemplating cheating should consider the severe repercussions of getting caught. The Rutgers University Academic Integrity Policy can be found at:

Group work policy

In order to facilitate learning, students are encouraged to discuss homework problems amongst themselves. Copying a solution is not, however, the same as “discussing”. According to one colleague, Dr. Iddo Friedberg, a good rule of thumb is the “cup of coffee” rule. After discussing a problem, you should not take away any written record or notes of the discussion. Go have a cup of coffee, and read the front page of the newspaper. If you can still re-create the problem solution afterward from memory, then you have learned something, and are not simply copying.



  1. The class sessions will be on Zoom. Turn OFF your mic when you enter the room.
  2. Please be on time.
  3. If your video is off (I would rather it was on in general, but it is your call), please turn it ON when asking questions (don’t forget to turn the mic ON too).
  4. Lab time is to be spent on lab work. Lab time is not free time. If you finish early, you should start on the associated homework assignment.


In general, your ideas, comments, suggestions, questions, grade challenges, etc. are welcome. Your discretion in these matters is expected, however. No part of your grade will be based on anything other than your coursework and class participation.


Make sure you stay on top of your homework assignments. Waiting until the last minute to complete an assignment will not work in this course.

Tentative schedule (subject to change)

Date    Lecture (Tuesdays)                                    Date    Lab (Thursdays)
1-Sep   Intro to Bioinformatics                               3-Sep   Intro to Linux (install R-studio, Weka)
8-Sep   NO class, Monday schedule                             10-Sep  Intro to Python
15-Sep  Gene Finding                                          17-Sep  Python II
22-Sep  Pairwise sequence alignment, deriving BLOSUM          24-Sep  Alignments (SW/NW)
29-Sep  BLAST, affine gap costs                               1-Oct   BLAST
6-Oct   MSAs and domain families                              8-Oct   Pfam, MAFFT
13-Oct  Sequence signatures and motifs                        15-Oct  InterPro, Python III
                                                                      Midterm programming portion assigned
20-Oct  Synchronous Midterm                                   22-Oct  Structural Bioinformatics Lecture
                                                                      Programming portion collected
27-Oct  Structural Bioinformatics (cont’d) and Phylogenomics  29-Oct  PyMol (with emphasis on ligand binding and motifs)
3-Nov   Phylogenomics (cont’d)                                5-Nov   Tree-Building
10-Nov  Metagenomics                                          12-Nov  MG-RAST, Mi-faser
17-Nov  Gene expression analysis                              19-Nov  Thanksgiving Break, no class
24-Nov  Variation and molecular-level natural selection       26-Nov  VCF and annotations
1-Dec   Disease gene prioritization                           3-Dec   Weka
8-Dec   Personalized medicine                                 10-Dec  Machine learning
        Written portion of final assigned
18-Dec  Synchronous Final, 12-3pm; written portion of final due