How to analyze a CRISPR-Cas9 sample using CRISPResso in an Anaconda environment

Hi sir,
The cool weather in Japan makes me feel like eating satsumas. What is the traditional winter fruit where you live?

Anyway, today I will introduce how to analyze a CRISPR-Cas9 sample using CRISPResso, a user-friendly NGS analysis tool that is specialized for analyzing genome editing data generated with CRISPR.

Citation info: Pinello, Luca, et al. “Analyzing CRISPR genome-editing experiments with CRISPResso.” Nature biotechnology 34.7 (2016): 695.

I have to note that this is an old version of CRISPResso. The authors do not recommend using it. However, it has been used in several papers over a few years, so it can still be used to reproduce published data and to perform comparisons.

I use the published raw NGS amplicon data (DRR147084_1.fastq.gz / DRR147084_2.fastq.gz), which was downloaded in step 4 of the post. If you want to use the data, please check it.
This example NGS data (DRR147084) is amplicon sequencing data generated with MiSeq. The sample is an MMEJ-assisted knock-in of a 77 bp fragment. It contains wildtype amplicon reads, indel amplicon reads, unexpected knock-in reads (possibly generated by NHEJ), and expected knock-in amplicon reads generated by MMEJ.
Here are the sequence maps.

Wildtype amplicon sequence
Zoom image of Wildtype amplicon sequence
Expected knock-in amplicon sequence

I will use this data for the example.

STEP1: Installing and running CRISPResso

1: Create virtual environment

conda create --name crispresso_env --channel bioconda crispresso;

2: Activate virtual environment

source activate crispresso_env;
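
Before running the analysis, it may help to confirm that the activated environment really exposes the CRISPResso executable. The sketch below is only an illustration: `first_on_path` is a hypothetical helper name, and checking both `crispresso` and `CRISPResso` is my assumption, because the spelling that works can depend on the package and on whether the file system is case-sensitive. Also, on recent conda versions, `conda activate crispresso_env` does the same job as `source activate crispresso_env`.

```shell
# Hypothetical helper: print the first of the given command names that is
# found on PATH, or return 1 if none of them is available.
first_on_path() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Assumption: the executable is called either "crispresso" or "CRISPResso".
if exe=$(first_on_path crispresso CRISPResso); then
    echo "Found executable: $exe"
else
    echo "CRISPResso is not on PATH. Is crispresso_env activated?"
fi
```

If the second message appears, activating the environment again in the same terminal is the first thing I would try.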

3: Run CRISPResso

I applied the parameters used in the original paper to analyze the data with CRISPResso.

crispresso \
-r1 /Volumes/databank1/ngs/DRR147084_1.fastq.gz \
-r2 /Volumes/databank1/ngs/DRR147084_2.fastq.gz \
-a ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-g tgaagagcattcatcgtgag \
-e ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAAgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-d ggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAA \
-o /Volumes/databank1/ngs \
--name crispresso_example1 \
--hide_mutations_outside_window_NHEJ \
--hdr_perfect_alignment_threshold 99.5;

-o indicates the OUTPUT_FOLDER (“/Volumes/databank1/ngs”). You can choose it freely.
The other parameters are explained on the authors’ GitHub page.
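
Because the -a, -g, -e, and -d values are long and easy to mistype, a quick sanity check before running can save time. The following is a minimal sketch, not part of CRISPResso: `contains_ci` is a hypothetical helper, and `amplicon_window` is only a short fragment copied from the -a sequence around the guide, so in practice you would pass the full sequences.

```shell
# Hypothetical helper: succeed if $2 occurs in $1, ignoring upper/lower case.
contains_ci() {
    haystack=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    needle=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$haystack" in
        *"$needle"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Values copied from the command above.
guide="tgaagagcattcatcgtgag"
donor="ggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAA"
# Short fragment of the -a sequence around the guide (for illustration only).
amplicon_window="gctggctgaagagcattcatcgtgaggggtctttgtcc"

if contains_ci "$amplicon_window" "$guide"; then
    echo "guide found in amplicon"
else
    echo "guide NOT found in amplicon"
fi

# The knock-in fragment is described as 77 bp, so the -d value should match.
echo "donor length: ${#donor}"
```

The same helper can be used to confirm that the -d sequence occurs inside the full -e sequence.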

The result files are saved in the OUTPUT_FOLDER/“crispresso_example1” folder.

The full explanation of the output is given in the authors’ paper.
Here I pick out some of the results.

2.Unmodified_NHEJ_HDR_pie_chart.pdf
My understanding is that “Unmodified”, “NHEJ”, “Mixed HDR-NHEJ”, and “HDR” correspond to the wildtype amplicon sequence, the indel amplicon sequence, the unexpected knock-in sequence (possibly generated by NHEJ), and the expected knock-in amplicon sequence generated by MMEJ, respectively.
The mutation rate is 10.8 + 37.1 + 14.3 = 62.2%.
For your information, the values from Cas-Analyzer and CrispRVariants were 61.0% and 61.4%, respectively, according to the following posts.
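
The arithmetic behind the comparison can be reproduced in the terminal. This is just a small sketch using `awk` as a calculator. The three percentages are the ones quoted above, and the variable names are my own.

```shell
# Sum of the three modified classes quoted above (values in percent).
modified=$(awk 'BEGIN { printf "%.1f", 10.8 + 37.1 + 14.3 }')
echo "CRISPResso mutation rate: ${modified}%"

# Difference from the values reported for the other two tools.
diff_cas_analyzer=$(awk -v m="$modified" 'BEGIN { printf "%.1f", m - 61.0 }')
diff_crisprvariants=$(awk -v m="$modified" 'BEGIN { printf "%.1f", m - 61.4 }')
echo "vs Cas-Analyzer:   +${diff_cas_analyzer} points"
echo "vs CrispRVariants: +${diff_crisprvariants} points"
```

So the three tools agree to within about one percentage point on this sample.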

9.Alleles_around_cut_site_for_<guide>.pdf

Quantification_of_editing_frequency.txt

You can view the list of reads in Excel.

Alleles_frequency_table.txt
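
If you prefer not to open Excel, the same table can be peeked at from the terminal. The sketch below assumes that the file is tab-separated text with a single header line. `preview_table` and `count_rows` are hypothetical helper names, and the path in `table` is a placeholder that you should point at your own result folder.

```shell
# Hypothetical helper: print the header line plus the first N data rows.
preview_table() {
    file=$1
    rows=${2:-5}
    head -n "$((rows + 1))" "$file"
}

# Hypothetical helper: count data rows (all lines except the header).
count_rows() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Placeholder path: adjust it to where your result files were written.
table="Alleles_frequency_table.txt"
if [ -f "$table" ]; then
    preview_table "$table" 5
    echo "data rows: $(count_rows "$table")"
else
    echo "File not found: $table"
fi
```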

(Supplementary): Run CRISPResso with TruSeq3-PE.fa

CRISPResso can use Trimmomatic for adapter trimming and quality filtering.
As I mentioned in a past post, TruSeq3-PE.fa is the adapter file for the MiSeq machine that sequenced the example data. Trimmomatic provides this adapter sequence file.

I ran the previous CRISPResso command again with TruSeq3-PE.fa to get the results in a more correct way.

crispresso \
-r1 /Volumes/databank1/ngs/DRR147084_1.fastq.gz \
-r2 /Volumes/databank1/ngs/DRR147084_2.fastq.gz \
-a ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-g tgaagagcattcatcgtgag \
-e ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAAgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-d ggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAA \
-o /Volumes/databank1/ngs \
--name crispresso_example2 \
--hide_mutations_outside_window_NHEJ \
--hdr_perfect_alignment_threshold 99.5 \
--trim_sequences \
--trimmomatic_options_string " ILLUMINACLIP:/Volumes/databank1/ngs/trimmomatic/adapters/TruSeq3-PE.fa:0:90:10:0:true MINLEN:40";

Remember to append the clipping settings to the end of the adapter sequence file path. The command above uses “:0:90:10:0:true MINLEN:40”. Values such as “:2:30:10 MINLEN:36” also appear in Trimmomatic examples. The meaning of each value is explained in the Trimmomatic manual. Once you understand it, you can adjust the values.
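
To make the colon-separated values easier to read, the ILLUMINACLIP step can be split into named fields. This is only an illustrative sketch: the field names follow my reading of the Trimmomatic manual (fastaWithAdaptersEtc, seed mismatches, palindrome clip threshold, simple clip threshold, minAdapterLength, keepBothReads), and the variable names are my own. It does not run Trimmomatic.

```shell
# The ILLUMINACLIP step exactly as used in the command above.
clip="ILLUMINACLIP:/Volumes/databank1/ngs/trimmomatic/adapters/TruSeq3-PE.fa:0:90:10:0:true"

# Split on ":" into named fields (this works because the path has no colon).
IFS=: read -r step fasta seed_mismatches palindrome_threshold simple_threshold min_adapter_length keep_both_reads <<EOF
$clip
EOF

echo "step:                     $step"
echo "adapter file:             $fasta"
echo "seed mismatches:          $seed_mismatches"
echo "palindrome clip threshold: $palindrome_threshold"
echo "simple clip threshold:    $simple_threshold"
echo "min adapter length:       $min_adapter_length"
echo "keep both reads:          $keep_both_reads"
```

MINLEN:40 is a separate step after the space, so it is not part of this split.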

Only 5 reads were dropped.
Trimmomatic log in CRISPResso_RUNNING_LOG.txt

2.Unmodified_NHEJ_HDR_pie_chart.png

The classification result was almost the same as the result without adapter trimming. In some published reports, trimming was not performed in CRISPResso. The impact may not be large (I do NOT mean that trimming adapters is unnecessary…).

Thank you so much…