How to analyze CRISPR-Cas9 sample using CrispRVariants

Hi sir,
Winter is comming. I am thinking about setting kotatsu.

Today, I introduce how to analyze CRISPR-Cas9 sample using CrispRVariants. which is a accurate NGS analysis tool. This is specialized for analyzing genome editing data.

Citation info: Lindsay, Helen, et al. “CrispRVariants charts the mutation spectrum of genome engineering experiments.” Nature biotechnology 34.7 (2016): 701..

In previous posts, I introduced how to map NGS data using BWA-MEM. I will use this bam file to introduce it. If you have not seen it, you can check the post before.

I will show an alignment map of the region indicated as the following “focusing_range” in the wildtype amplicon sequence.

Whole sequence of amplicon
Focusing range

OK, I will put a simple example…

STEP1: Searching the region of “focusing_range” using Blat

1: Create visual environment

1
conda create --name blat_env --channel bioconda blat;

2: Activate visual environment

1
source activate blat_env;

3: Make this fasta file as focusing_range.fa

1
2
>focusing_range
gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc

The above sequence is saved as focusing_range.fa into /Volumes/databank1/ngs/.

4: Searching the region of “focusing_range” using Blat

1
blat /Volumes/databank1/ngs/hg38.fa /Volumes/databank1/ngs/focusing_range.fa -minMatch=0 -minScore=80 /Volumes/databank1/ngs/focusing_range.fa.psl;

The parameter of -minScore is to be the sequence length of focusing_range.fa

When the below sequence is shown, the process is end. focusing_range.fa.psl is automatically Saved in /Volumes/databank1/ngs/.

1
2
Loaded 3088269832 letters in 24 sequences
Searched 80 bases in 1 sequences

5: Make tsv file to open the result of blat (focusing_range.fa.psl)

1
cp /Volumes/databank1/ngs/focusing_range.fa.psl /Volumes/databank1/ngs/focusing_range.fa.psl.tsv;

Open /Volumes/databank1/ngs/focusing_range.fa.psl.tsv using Excel or the other spreadsheet.
You have to memorilize the value of highlighted cells (strand : - / T name : chr12 / T start : 56638284 / T end : 56638364)

Result of Blat

STEP2: Install CrispRVariants using RStudio.

CrispRVariants sometimes fails to be installed now (at 2019/11/07). Especially, Anaconda version is unstable. I hope that it will be improved.
Anyway, to install CrispRVariants, you can do the following procedure in video. It is important to type no when the RStudio asked you Do you want to install from sources the packages which need compilation? (Yes/no/cancel).

STEP3: Analyzing CRISPR-Cas9 sample using CrispRVariants

Open the RStudio.app, and type the following script.

6: Install and load packages

1
2
3
4
5
6
7
8
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")

BiocManager::install("CrispRVariants")
BiocManager::install("GenomicRanges")

library("CrispRVariants")
library("GenomicRanges")

7: Name the analysis

1
2
# Name of this analysis (Free)
dataname.char <- "my NGS data"

8: Enter the file path of the bam file generated using BWA-MEM

1
2
# Place of your bam file
bam.file.path <- "/Volumes/databank1/ngs/out.extendedFrags.fastq.bam"

9: Enter the sequence of focusing_range

1
2
# Nuclease target site (This is protospacer sequence in CRISPR-Cas9.)
focusing.sequence.char <- "gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc"

10: Enter the position of cut point in focusing_range

1
2
# Cut point in the focusing range (5'->3')
target.location <- 40

11: Enter the range where you consider a mismatch base as Single Nucleotide Variants

1
2
3
# How long region (upstream (5'side from the cut site) / downstream (3'side from the cut site)) do you consider as Single Nucleotide Variants?
upstream.snv.num <- 17
downstream.snv.num <- 6

12: Enter the flag whether you can treat chimera sequence.

1
2
3
# Flag to determine how chimeric reads are treated. One of "ignore", "exclude",and "merge".
# I reccomend to use "ignore" in first time.
treat.chimeras <- "ignore"

The function recognizing a chimeric sequence is outstanding advantage of CrispRVarinats.
However, I recommend to ignore that in the first time because recognizing a chimeric sequence cause the loss of detected reads.

13: Enter the information of focusing_range.fa.psl

1
2
3
4
blat.strand <- "-"
blat.T.name <- "chr12"
blat.T.start <- 56638284
blat.T.end <- 56638364

14: Create the GRange object of focusing_range sequence

1
focusing.grange = GRanges(blat.T.name, blat.strand, ranges = IRanges(blat.T.start + 1, blat.T.end))

There is a difference between GRange and range of the Blat result. I adjusted it by adding + 1 to blat.T.start.

15: Create crispr set object

1
2
3
4
5
6
7
crispr.set <- readsToTarget(bam.file.path
, focusing.grange, reference = focusing.sequence.char
, target.loc = target.location
, names = dataname.char
, upstream.snv = upstream.snv.num
, downstream.snv = downstream.snv.num
, chimeras = treat.chimeras)

output

1
2
3
4
5
6
7
8
9
10
Read 158666 alignments, excluded 0

147559 of 158666 nonchimeric reads span the target range

narrowing alignments


Initialising CrisprRun my NGS data

Initialising CrisprSet chr12:56638285-56638364 with 1 samples

16: Create alignment map

1
plotVariants(crispr.set)

Alignment map

17: Estimate the editing efficiency

1
mutationEfficiency(crispr.set)

output

1
2
my NGS data     Average      Median     Overall       StDev   ReadCount 
61.37 61.37 61.37 61.37 NA 147559.00

The editing efficiency is 61.37%. SNV is not counted into the efficiency.
In this sample, there is only one sample type. So, The “Average” is same as “my NGS data”.

18: Show the consensus sequences of variant alleles

1
consensusSeqs(crispr.set)

output

1
2
3
4
5
6
7
8
9
10
11
12
13
  A DNAStringSet instance of length 629
width seq names
[1] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC no variant
[2] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCCCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-12G
[3] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGAGGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-5C
[4] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:1T
[5] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAAGGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-9C
... ... ...
[625] 123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCT...CGATGAATGCTCTTCAGCCCGCTTATCAGCTTTTGCCAC -1:43I
[626] 136 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACACAT...CGATGAATGCTCTTCTGCCAGCTTATCAGCTTTTGCCAC 2:1D,6:57I
[627] 123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCG...CGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC 3:43I
[628] 175 GGGGCAAGGAGAGAGACAGTACAGCAGACAAAGACCCCT...CAGCTTATCAGCTTTAGCCAGATTATCATCTTTTGCCAC -30:18I,1:77I
[629] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAGACCCCTCAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC 1:1I,11:1D

You can save the full sequence infromation using the following command

1
write.csv(as.data.frame(consensusSeqs(crispr.set)), file.path("/Volumes/databank1/ngs/", "consensus.seqs.csv"))

19: Show the count data of variant alleles

1
variantCounts(crispr.set)

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
                           my NGS data
no variant 55017
SNV:-12G 136
SNV:-5C 104
SNV:1T 86
SNV:-9C 69
SNV:-17G 65
...
1:1I 44667
2:77I 26772
-1:1D 1491
-10:13D 1417
1:7D 892
...
-1:43I 1
2:1D,6:57I 1
3:43I 1
-30:18I,1:77I 1
1:1I,11:1D 1

You can save the full variants infromation using the following command

1
write.csv(as.data.frame(variantCounts(crispr.set)), file.path("/Volumes/databank1/ngs/", "variantcounts.csv"))

20: Create barplot of variants classification by size

1
barplotAlleleFreqs(variantCounts(crispr.set))

Alignment map

This is a simple example. You can change the procedure.

Environment

Hardware

  • MacBook Air (2017 model)
  • Processor 1.8 GHz Intel Core i5
  • Memory 8GB
  • 500 GB external hard drive

Software

  • macOS Catalina
  • GNU bash version : 3.2.57(1)-release (x86_64-apple-darwin18)

Anaconda

  • conda version : 4.7.11
  • conda-build version : 3.17.6
  • python version : 2.7.15.final.0
  • Standalone BLAT v. 36

RStudio

  • R version 3.6.1 (2019-07-05)
  • Platform: x86_64-apple-darwin15.6.0 (64-bit)
  • RStudio Version 1.2.5019

Thank you so much…