2019-11-07

How to analyze CRISPR-Cas9 sample using CrispRVariants

Hi sir,
Winter is comming. I am thinking about setting kotatsu.

Today, I introduce how to analyze CRISPR-Cas9 sample using CrispRVariants. which is a accurate NGS analysis tool. This is specialized for analyzing genome editing data.

Citation info: Lindsay, Helen, et al. “CrispRVariants charts the mutation spectrum of genome engineering experiments.” Nature biotechnology 34.7 (2016): 701..

In previous posts, I introduced how to map NGS data using BWA-MEM. I will use this bam file to introduce it. If you have not seen it, you can check the post before.

I will show an alignment map of the region indicated as the following “focusing_range” in the wildtype amplicon sequence.

Whole sequence of amplicon
Focusing range

OK, I will put a simple example…

STEP1: Searching the region of “focusing_range” using Blat

1: Create visual environment

1	conda create --name blat_env --channel bioconda blat;

2: Activate visual environment

1	source activate blat_env;

3: Make this fasta file as focusing_range.fa

1 2	>focusing_range gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc

The above sequence is saved as focusing_range.fa into /Volumes/databank1/ngs/.

4: Searching the region of “focusing_range” using Blat

1	blat /Volumes/databank1/ngs/hg38.fa /Volumes/databank1/ngs/focusing_range.fa -minMatch=0 -minScore=80 /Volumes/databank1/ngs/focusing_range.fa.psl;

The parameter of -minScore is to be the sequence length of focusing_range.fa

When the below sequence is shown, the process is end. focusing_range.fa.psl is automatically Saved in /Volumes/databank1/ngs/.

1 2	Loaded 3088269832 letters in 24 sequences Searched 80 bases in 1 sequences

5: Make tsv file to open the result of blat (focusing_range.fa.psl)

1	cp /Volumes/databank1/ngs/focusing_range.fa.psl /Volumes/databank1/ngs/focusing_range.fa.psl.tsv;

Open /Volumes/databank1/ngs/focusing_range.fa.psl.tsv using Excel or the other spreadsheet.
You have to memorilize the value of highlighted cells (strand : - / T name : chr12 / T start : 56638284 / T end : 56638364)

Result of Blat

STEP2: Install CrispRVariants using RStudio.

CrispRVariants sometimes fails to be installed now (at 2019/11/07). Especially, Anaconda version is unstable. I hope that it will be improved.
Anyway, to install CrispRVariants, you can do the following procedure in video. It is important to type no when the RStudio asked you Do you want to install from sources the packages which need compilation? (Yes/no/cancel).

STEP3: Analyzing CRISPR-Cas9 sample using CrispRVariants

Open the RStudio.app, and type the following script.

6: Install and load packages

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("CrispRVariants")
BiocManager::install("GenomicRanges")

library("CrispRVariants")
library("GenomicRanges")

7: Name the analysis

1 2	# Name of this analysis (Free) dataname.char <- "my NGS data"

8: Enter the file path of the bam file generated using BWA-MEM

1 2	# Place of your bam file bam.file.path <- "/Volumes/databank1/ngs/out.extendedFrags.fastq.bam"

9: Enter the sequence of focusing_range

1 2	# Nuclease target site (This is protospacer sequence in CRISPR-Cas9.) focusing.sequence.char <- "gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc"

10: Enter the position of cut point in focusing_range

1 2	# Cut point in the focusing range (5'->3') target.location <- 40

11: Enter the range where you consider a mismatch base as Single Nucleotide Variants

1
2
3

# How long region (upstream (5'side from the cut site) / downstream (3'side from the cut site)) do you consider as Single Nucleotide Variants?
upstream.snv.num <- 17
downstream.snv.num <- 6

12: Enter the flag whether you can treat chimera sequence.

1
2
3

# Flag to determine how chimeric reads are treated. One of "ignore", "exclude",and "merge".
# I reccomend to use "ignore" in first time.
treat.chimeras <- "ignore"

The function recognizing a chimeric sequence is outstanding advantage of CrispRVarinats.
However, I recommend to ignore that in the first time because recognizing a chimeric sequence cause the loss of detected reads.

13: Enter the information of focusing_range.fa.psl

blat.strand <- "-"
blat.T.name <- "chr12"
blat.T.start <- 56638284
blat.T.end <- 56638364

14: Create the GRange object of focusing_range sequence

1	focusing.grange = GRanges(blat.T.name, blat.strand, ranges = IRanges(blat.T.start + 1, blat.T.end))

There is a difference between GRange and range of the Blat result. I adjusted it by adding + 1 to blat.T.start.

15: Create crispr set object

crispr.set <- readsToTarget(bam.file.path
, focusing.grange, reference = focusing.sequence.char
, target.loc = target.location
, names = dataname.char
, upstream.snv = upstream.snv.num
, downstream.snv = downstream.snv.num 
, chimeras = treat.chimeras)

output

Read 158666 alignments, excluded 0

147559 of 158666 nonchimeric reads span the target range

narrowing alignments


Initialising CrisprRun my NGS data

Initialising CrisprSet chr12:56638285-56638364 with 1 samples

16: Create alignment map

1	plotVariants(crispr.set)

Alignment map

17: Estimate the editing efficiency

1	mutationEfficiency(crispr.set)

output

1 2	my NGS data Average Median Overall StDev ReadCount 61.37 61.37 61.37 61.37 NA 147559.00

The editing efficiency is 61.37%. SNV is not counted into the efficiency.
In this sample, there is only one sample type. So, The “Average” is same as “my NGS data”.

18: Show the consensus sequences of variant alleles

1	consensusSeqs(crispr.set)

output

  A DNAStringSet instance of length 629
      width seq                                                                               names               
  [1]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC  no variant
  [2]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCCCTTCAGCCAGCTTATCAGCTTTTGCCAC  SNV:-12G
  [3]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGAGGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC  SNV:-5C
  [4]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC  SNV:1T
  [5]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAAGGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC  SNV:-9C
  ...   ... ...
[625]   123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCT...CGATGAATGCTCTTCAGCCCGCTTATCAGCTTTTGCCAC -1:43I
[626]   136 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACACAT...CGATGAATGCTCTTCTGCCAGCTTATCAGCTTTTGCCAC 2:1D,6:57I
[627]   123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCG...CGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC 3:43I
[628]   175 GGGGCAAGGAGAGAGACAGTACAGCAGACAAAGACCCCT...CAGCTTATCAGCTTTAGCCAGATTATCATCTTTTGCCAC -30:18I,1:77I
[629]    80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAGACCCCTCAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC  1:1I,11:1D

You can save the full sequence infromation using the following command

1	write.csv(as.data.frame(consensusSeqs(crispr.set)), file.path("/Volumes/databank1/ngs/", "consensus.seqs.csv"))

19: Show the count data of variant alleles

1	variantCounts(crispr.set)

output

                           my NGS data
no variant                       55017
SNV:-12G                           136
SNV:-5C                            104
SNV:1T                              86
SNV:-9C                             69
SNV:-17G                            65
...
1:1I                             44667
2:77I                            26772
-1:1D                             1491
-10:13D                           1417
1:7D                               892
...
-1:43I                               1
2:1D,6:57I                           1
3:43I                                1
-30:18I,1:77I                        1
1:1I,11:1D                           1

You can save the full variants infromation using the following command

1	write.csv(as.data.frame(variantCounts(crispr.set)), file.path("/Volumes/databank1/ngs/", "variantcounts.csv"))

20: Create barplot of variants classification by size

1	barplotAlleleFreqs(variantCounts(crispr.set))

Alignment map

This is a simple example. You can change the procedure.

Environment

Hardware

MacBook Air (2017 model)
Processor 1.8 GHz Intel Core i5
Memory 8GB
500 GB external hard drive

Software

macOS Catalina
GNU bash version : 3.2.57(1)-release (x86_64-apple-darwin18)

Anaconda

conda version : 4.7.11
conda-build version : 3.17.6
python version : 2.7.15.final.0
Standalone BLAT v. 36

RStudio

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
RStudio Version 1.2.5019

Thank you so much…

n|nja

Bioinfomatics & Genome editing

How to analyze CRISPR-Cas9 sample using CrispRVariants

STEP1: Searching the region of “focusing_range” using Blat

1: Create visual environment

2: Activate visual environment

3: Make this fasta file as focusing_range.fa

4: Searching the region of “focusing_range” using Blat

5: Make tsv file to open the result of blat (focusing_range.fa.psl)

STEP2: Install CrispRVariants using RStudio.

STEP3: Analyzing CRISPR-Cas9 sample using CrispRVariants

6: Install and load packages

7: Name the analysis

8: Enter the file path of the bam file generated using BWA-MEM

9: Enter the sequence of focusing_range

10: Enter the position of cut point in focusing_range

11: Enter the range where you consider a mismatch base as Single Nucleotide Variants

12: Enter the flag whether you can treat chimera sequence.

13: Enter the information of focusing_range.fa.psl

14: Create the GRange object of focusing_range sequence

15: Create crispr set object

16: Create alignment map

17: Estimate the editing efficiency

18: Show the consensus sequences of variant alleles

19: Show the count data of variant alleles

20: Create barplot of variants classification by size

Environment