The SEQrets of sequencing.

Hej again! Igor here, back for another talk about the exciting world of sequencing. In my science exhibition, I showed you around the equipment that we use for sequencing and gave you a tiny sneak peek into how the technique works.

In this lecture we will dive a little deeper into this rich world of sequencing by giving you a general outline of how these techniques work and what you can do with them.

First off, let us consider what exactly DNA sequencing is. In a nutshell, DNA sequencing is a technique in which you “read” the DNA code, that is, the order of the bases A, C, G and T along a strand. The reason you might want to do this differs from experiment to experiment. Whether you want to reconstruct the genome of a cell or organism, see how gene expression changes when you add a drug or stimulus, or simply check the sequence of the DNA strand you are working with, all these questions can be answered with sequencing.

Now reading this DNA code is a tricky matter, but can be done in multiple ways, some more applicable than others in certain contexts, each having their own advantages and disadvantages. I will not go into too much detail with all sequencing methods and will just briefly describe three of the more commonly used techniques.

Types of sequencing
Sanger Sequencing
When considering sequencing, the first method that is usually explained is Sanger sequencing. This technique was created by one of the pioneers in the sequencing field: Frederick Sanger.
In a nutshell, the technique relies on terminating your sequence during elongation. Originally, this was done by running an extension reaction with a DNA primer, DNA polymerase and a mixture of all four deoxynucleotide triphosphates (dNTPs). On its own, this would result in a full-length copy of your sequence of interest, like in a PCR reaction. The trick is that you also add a small amount of a modified version of one of the bases, a dideoxynucleotide triphosphate (ddNTP). As these modified bases lack a 3’ OH group, no further base can be attached to them, so they terminate the reaction. In essence this is a chance process: at each position, part of the strands will incorporate a regular base and continue the extension reaction, and part will incorporate the modified one, causing the reaction to stop for that specific strand at that specific base.
You run this reaction separately for each of the four bases (i.e. with ddATP, ddGTP, ddCTP or ddTTP) and load the four reactions side by side on a gel. You will see a band at each length at which the reaction was terminated. As you know which base caused the termination in each lane, you can quite literally read off the base sequence from your gel, from the shortest fragment to the longest. This was the original approach. Currently you can also use fluorescent ddNTPs, each base carrying its own colour, which makes the readout easier. The advantages of this technique are that it is relatively cheap and that the data quality is usually good. However, it is hard to adapt to high-throughput experiments, as you can only read a single sequence per reaction. [1]
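
To make the reading-off step concrete, here is a minimal Python sketch of the original four-lane approach. It is a toy model and not a description of a real protocol: the function name, the termination probability and the number of copies per lane are illustrative assumptions, and the template is written 3’ to 5’ so that the newly made strand can be read from left to right.

```python
import random


def sanger_read(template_3_to_5, copies_per_lane=2000, p_terminate=0.1, seed=0):
    """Toy four-lane Sanger run; returns the synthesized strand as read from the gel.

    All parameter values are illustrative assumptions, not real reaction conditions.
    """
    rng = random.Random(seed)
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # The strand the polymerase builds, base by base, along the template.
    new_strand = "".join(complement[base] for base in template_3_to_5)

    bands = []  # (fragment length, lane) pairs, i.e. the bands on the gel
    for dd_base in "ACGT":  # one reaction (lane) per ddNTP
        lengths = set()
        for _ in range(copies_per_lane):
            for position, base in enumerate(new_strand, start=1):
                # A ddNTP can only be incorporated where its base is required,
                # and only by chance; otherwise a regular dNTP is used.
                if base == dd_base and rng.random() < p_terminate:
                    lengths.add(position)
                    break  # this copy is terminated here
        bands.extend((length, dd_base) for length in lengths)

    # Reading the gel: the shortest fragment ran furthest, so sort by length.
    return "".join(lane for _, lane in sorted(bands))


if __name__ == "__main__":
    print(sanger_read("TACGGATCCA"))
```

With enough copies per lane, every position ends up with a band in exactly one lane, which is why the sequence can be read by simply walking up the gel.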

Graphical representation of Sanger sequencing output, from the radioactive sequencing ladder on the left to the fluorescent labels more commonly used today on the right. Attribution: Abizar at English Wikipedia, CC BY-SA 3.0, via Wikimedia Commons

Illumina sequencing
Another, more high-throughput method for sequencing is to use one of the various Illumina sequencers. This form of sequencing focusses on sequencing many short sequences in parallel, and these machines are currently used extensively for high-throughput sequencing projects. In principle, an Illumina machine first allows your adapter-containing sequences to bind to the chip by DNA hybridization. Each bound sequence is then amplified using bridge PCR. This differs from regular PCR in one important way: the primers are not floating around in solution but are instead attached to the chip. As a result, copies of the same sequence only form nearby, creating a little polymerase colony (polony) on the chip, which Illumina itself calls a cluster. Once your polonies are big enough, you flush in fluorescently labelled nucleotides that are reversibly blocked, so that only one base is added per cycle. As a single polony is made up of identical sequences, all of its strands incorporate the same fluorescent base at every step. The polony therefore lights up in the colour of the base it has just incorporated, which you can detect with a camera. After imaging, the label and the block are removed and the next cycle starts.
This approach allows many sequences to be identified at once, making it suitable for high-throughput sequencing. Another advantage is that Illumina is widely used and quite standardized, so many commercial kits exist for different applications in various fields. However, the technique does come with a notable disadvantage: it can only handle short reads, not more than 600 bp on certain machines. In most cases you can still rather easily reconstruct your genome or sequence of interest using bioinformatic tools. However, reconstructing genomes with many repetitive regions, or reliably detecting small and rare variations, becomes very difficult. [2][3][4][5]
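
The camera step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Illumina software actually calls bases: the function names, the four-channel model and the signal and noise values are all made up for the example. Each polony is imaged once per cycle, and the base is called from the brightest colour channel.

```python
import random

BASES = "ACGT"


def image_polony(incorporated, n_copies=1000, noise=50.0, rng=None):
    """Simulate one polony: per cycle, a dict of signal per colour channel.

    `incorporated` is the sequence of bases the polony adds, one per cycle.
    Signal strength and noise level are arbitrary, hypothetical numbers.
    """
    rng = rng or random.Random(0)
    cycles = []
    for base in incorporated:
        # Background noise in every channel...
        signal = {b: abs(rng.gauss(0.0, noise)) for b in BASES}
        # ...plus all identical copies lighting up in the same colour.
        signal[base] += n_copies
        cycles.append(signal)
    return cycles


def call_bases(cycles):
    """Call one base per cycle by picking the brightest channel."""
    return "".join(max(signal, key=signal.get) for signal in cycles)


if __name__ == "__main__":
    rng = random.Random(1)
    polonies = {"polony_1": "ACGTTGCA", "polony_2": "GGATCCAT"}
    for name, bases in polonies.items():
        print(name, call_bases(image_polony(bases, rng=rng)))
```

Because every polony is imaged in the same picture, adding more polonies adds more reads without adding more cycles, which is where the high throughput comes from.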

For more information on Illumina sequencing, check out their video on the explanation of the technique: Illumina Sequencing by Synthesis

Oxford Nanopore sequencing
For longer reads, a few options exist, although not many. One notable type is Nanopore sequencing. It works in a different way and does not depend on optics. A voltage is applied across a nanoscopic pore, the DNA strand is passed through it, and the change in the ionic current flowing through the pore is measured. As the bases sitting in the pore affect this current, you can work out which sequence is passing through at that moment. Because the strand is read as it threads through the pore, rather than being built up cycle by cycle, there is no real built-in maximum for the size of the reads, making the read length virtually limitless. However, the downside is that the number of reads and the sequence accuracy are somewhat low compared to more established high-throughput techniques. [6][7][8]

For an overview on how Oxford Nanopore Sequencing works, check out their video explanation on the technology: How nanopore sequencing works

Library preparation
In the sequencing lingo, the prepared sample that you want to sequence is referred to as the “library”. Now the way you prepare your library depends on your sample, the question you want to answer and of course the sequencing method that you will use. Ultimately, this is the most defining step of the whole sequencing process, mainly as this is the only part that is not fully standardized and automated. Now as all these library preparation methods can span a lecture on their own, for the sake of simplicity I will only give a quick overview of Illumina-related library preparations.

For Illumina libraries, most preparations boil down to “get your DNA down to the correct size and add Illumina adapters to it”. If your sample of interest is DNA, you can basically just do those steps. In many experiments, however, it is not DNA that is of interest but mRNA. Luckily, the procedure is almost identical; the only difference is that you first create a complementary DNA (cDNA) copy of your mRNA molecules using reverse transcription.
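
As a small illustration of what reverse transcription does at the sequence level, the Python sketch below turns an mRNA sequence into its first-strand cDNA. The function name is an assumption made for this example, and the sketch ignores everything about the actual enzyme and primers; it only shows the base pairing.

```python
def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    The cDNA is complementary to the mRNA, so it is the reverse complement,
    written with DNA bases (T instead of U).
    """
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCC"))  # prints GGCCAT
```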

Once you have your little collection of DNAs, you need to get it to the desired size. Remember that Illumina only supports relatively short reads. This means we first break our sequences down to sizes that suit our machine and protocol, which can be done in many ways depending on the protocol. Then, when all is said and done (and broken down), we add adapters to the mix so that they get attached to the ends of your sequences of interest. You do this because the adapters allow your sequences to attach to the sequencing chip and make it possible to perform all the sequencing reactions. Again, this step can be done in many ways, but the most common ones use ligation or PCR.
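
The two steps described here, breaking the DNA down and adding adapters, can be mimicked with a toy Python sketch. Everything in it is an assumption for illustration: the adapter sequences are made-up placeholders and not real Illumina adapters, the size range is far smaller than in a real library, and real fragmentation is of course not done by slicing strings.

```python
import random

# Hypothetical placeholder adapters, NOT real Illumina adapter sequences.
LEFT_ADAPTER = "ACACAC"
RIGHT_ADAPTER = "GTGTGT"


def fragment(sequence, min_len=8, max_len=12, seed=0):
    """Cut a sequence into consecutive pieces of random length."""
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(sequence):
        size = rng.randint(min_len, max_len)
        pieces.append(sequence[start:start + size])
        start += size
    return pieces


def build_library(sequence, min_len=8, max_len=12, seed=0):
    """Fragment, drop pieces that are too short, and add adapters to both ends."""
    pieces = fragment(sequence, min_len, max_len, seed)
    sized = [piece for piece in pieces if len(piece) >= min_len]
    return [LEFT_ADAPTER + piece + RIGHT_ADAPTER for piece in sized]


if __name__ == "__main__":
    for molecule in build_library("ACGT" * 25):
        print(molecule)
```

Every molecule in the resulting list has the same two known ends around an unknown insert, which is what lets a single chip and a single set of primers handle all of them.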

Once that is complete and your unreacted adapters are washed away, your library preparation is finished. It is very important to properly check whether your library is fine and how much of it you have, because Illumina sequencing is very sensitive to the amount you load. As the readout is based on microscopy, the machine needs to distinguish polonies from each other. This means that if you load too little sample, you will get few reads, but you will still get them. If you overdo it though, you overload your chip: the polonies overlap, the machine cannot distinguish anything anymore, and you will get no reads whatsoever, or the machine will simply stop the run. So, keep that in mind when sequencing yourself!

This brings us to how you check your library. Usually you do a combination of two things. First you check the quality of your library, for example using a Bioanalyzer, which will show you the length distribution of your library. The goal is to have a single peak at the correct length (meaning all sequences ended up at the right size). Secondly you check the quantity, for which you can use quantitative techniques like Qubit or qPCR. [9]
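
The two checks combine nicely: with the mean fragment length from the quality check and the mass concentration from the quantity check, you can estimate the molar concentration of your library. The Python sketch below uses the commonly used approximation of 660 g/mol per base pair of double-stranded DNA; the function name and the example numbers are assumptions for illustration, so always follow the formula in your own kit's guide.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM).

    nM = (ng/µL) / (660 g/mol per bp * mean fragment length in bp) * 1e6
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul / (g_per_mol_per_bp * mean_fragment_bp) * 1e6


if __name__ == "__main__":
    # Hypothetical example: 2 ng/µL measured, 400 bp mean fragment length.
    print(round(library_molarity_nm(2.0, 400), 2), "nM")
```

Note that the same mass of DNA contains more molecules when the fragments are shorter, which is why the length check matters for the quantity as well.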

Once you have checked your library and you are satisfied with it, you can start sequencing!

The actual sequencing
Now the actual sequencing is very simple and very straightforward, almost anticlimactically so. It really boils down to three steps:

  1. Dilute library and pipette it in the correct well.
  2. Put cartridges in the machine.
  3. Press the big “Start” button.

Now obviously you also need to set up your run and make sure you have selected the correct sizes and read lengths in your setup. But once you have done that, you are done, and it will be the machine’s turn to deliver.
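
For the dilution in step 1, the usual C1·V1 = C2·V2 relation is all you need. Here is a minimal Python sketch; the function name and the example concentrations are assumptions, and the correct loading concentration depends on your machine and kit, so take the target value from the relevant guide.

```python
def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library volume, diluent volume) in µL using C1*V1 = C2*V2."""
    if stock_nm <= 0 or target_nm <= 0 or final_volume_ul <= 0:
        raise ValueError("all inputs must be positive")
    if target_nm > stock_nm:
        raise ValueError("cannot dilute to a higher concentration than the stock")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


if __name__ == "__main__":
    # Hypothetical example: 10 nM library diluted to 4 nM in 20 µL.
    library_ul, diluent_ul = dilution_volumes(10.0, 4.0, 20.0)
    print(f"{library_ul:.1f} µL library + {diluent_ul:.1f} µL diluent")
```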

Now optional things you can do include swinging by the machine from time to time to check up on it, worrying if you set up the run correctly, checking if you calculated the dilution correctly and, in some cases, praying for a good run.

Results!
Now after a long night of potential stress you check up on the machine and it could say “run completed without errors”. Once that little sliver of hope has been given, you can get your data off the machine and move on to the next big thing: analysis. Now, analysis is a topic of its own, which again depends on what exactly you are looking at. So, for now, I will leave it at that.

I hope you enjoyed my little lecture on sequencing, and I hope you learned something from it! Stay tuned for more lectures from my network fellows, and I will see you in my next blog!

References and further reading:

  1. Sanger F; Nicklen S; Coulson AR (1977). “DNA sequencing with chain-terminating inhibitors”. Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7.
  2. Bentley DR; Balasubramanian S; Swerdlow HP; et al. (2008). “Accurate whole human genome sequencing using reversible terminator chemistry”. Nature 456 (7218): 53–59.
  3. Illumina Next Generation Sequencing (https://www.illumina.com/science/technology/next-generation-sequencing.html)
  4. Illumina sequencing introduction (https://www.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf)
  5. Illumina maximum read length (https://emea.support.illumina.com/bulletins/2020/04/maximum-read-length-for-illumina-sequencing-platforms.html)
  6. Oxford Nanopore how it works (https://nanoporetech.com/how-it-works)
  7. Oxford Nanopore, types of nanopores (https://nanoporetech.com/how-it-works/types-of-nanopores)
  8. Oxford Nanopore sequencing workflow (https://nanoporetech.com/how-it-works/nanopore-sequencing-workflow)
  9. Illumina TruSeq Targeted RNA Expression Reference Guide (https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqtargetedrna/truseq-targeted-rna-expression-reference-guide-15034665-01.pdf)
