How to predict microhomology using Microhomology-Predictor

Writer: Kazuki Nakamae

Hi sir,
Happy New Year!

Today, I introduce Microhomology-Predictor, which predicts the microhomology sites used for MMEJ and estimates MMEJ-based deletion patterns and the frequency of frame-shifts.

Citation info: Bae, Sangsu, et al. “Microhomology-based choice of Cas9 nuclease target sites.” Nature methods 11.7 (2014): 705.
Note: I am NOT the author, just the user.

I uploaded the video. You can check it.

Microhomology-Predictor recognizes the center of the entered sequence as the cut site of the RGEN (e.g. CRISPR-Cas9). In this example, the PAM-proximal region is the right half-site.

The result contains the Microhomology Score and the Out-of-frame Score. If these scores are high, the frequencies of MMEJ and frame-shifts are expected to be high.
If you want to get a frame-shift mutation, a target with an Out-of-frame Score >66 is preferable.
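To make the two scores more concrete, here is a minimal Python sketch of the idea. It is NOT the code of Microhomology-Predictor. It assumes, loosely following my understanding of Bae et al. (2014), that each microhomology pair flanking the center of the sequence is scored by its base composition (G/C counted twice) and penalized by deletion length, and that the Out-of-frame Score is the share of the total score coming from deletions whose length is not a multiple of 3. The function names, the weight of 20, and the toy sequence are my own assumptions.

```python
import math

def microhomology_patterns(seq, min_len=2, length_weight=20.0):
    """Enumerate microhomology-mediated deletions around the center of seq.

    Simplified sketch: the weights are assumptions, not the tool's exact values.
    """
    seq = seq.upper()
    cut = len(seq) // 2                 # the center is treated as the cut site
    left, right = seq[:cut], seq[cut:]
    best = {}                           # deletion product -> best-scoring pattern
    for k in range(min_len, min(len(left), len(right)) + 1):
        for i in range(len(left) - k + 1):
            mh = left[i:i + k]
            for j in range(len(right) - k + 1):
                if right[j:j + k] != mh:
                    continue
                del_len = cut + j - i                 # bases removed by MMEJ
                product = seq[:i] + seq[cut + j:]     # one microhomology copy remains
                gc = mh.count("G") + mh.count("C")
                score = 100.0 * ((k - gc) + 2 * gc) * math.exp(-del_len / length_weight)
                # Nested microhomologies give the same product; keep the best one.
                if product not in best or score > best[product]["score"]:
                    best[product] = {"mh": mh, "del_len": del_len, "score": score}
    return sorted(best.values(), key=lambda p: -p["score"])

def out_of_frame_score(patterns):
    """Percentage of the total score that comes from frame-shifting deletions."""
    total = sum(p["score"] for p in patterns)
    if total == 0:
        return 0.0
    shifted = sum(p["score"] for p in patterns if p["del_len"] % 3 != 0)
    return 100.0 * shifted / total

# Toy 12-nt sequence (hypothetical); the cut site is between positions 6 and 7.
patterns = microhomology_patterns("TTGCAAAAGCTT")
for p in patterns:
    print(p["mh"], p["del_len"], round(p["score"], 1))
print("Out-of-frame score:", round(out_of_frame_score(patterns), 1))
```

For a real target, please rely on the values reported by Microhomology-Predictor itself; this sketch only shows why long, GC-rich microhomologies close to the cut site dominate the prediction.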

When you click the target, the predicted patterns, including the microhomology sequence and deletion length, will be shown.

Thank you so much…

How to design knock-in donor using GTagHD

Writer: Kazuki Nakamae

Hi sir,
How are you?

Today, I introduce GTagHD, which helps design knock-ins using short homology arms. This software can design knock-in donors even if you do not have sequence information for the genomic site of interest.

Citation info: Wierson, Wesley A., et al. “GeneWeld: a method for efficient targeted integration directed by short homology.” bioRxiv (2018): 431627.
Note: I am NOT the author, just the user.

I uploaded the video. You can check it.

When you use it, the UCSC genome browser is helpful because it can provide potential sgRNAs and the RefSeq Gene ID of the target gene. In this example, I design the knock-in donor sequence for VEGFA.

Thank you so much…

How to find sgRNA target sequences using CRISPOR

Writer: Kazuki Nakamae

Hi sir,
What are you planning in the winter vacation?

Today, I introduce CRISPOR, which integrates on-target and off-target scoring algorithms. CRISPOR can also provide protocols, primer sequences, information on off-target sites, and CRISPResso analysis commands that match the sgRNA you chose.

Citation info: Haeussler, Maximilian, et al. “Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR.” Genome biology 17.1 (2016): 148.

I uploaded the video. You can check it.

I usually enter the sequence name because an error occurs unless I enter it.
In this video, I used the VEGFA sequence of hg19. If you want to know how to get the information, you can check the previous article.

Thank you so much…

How to obtain gene information of hg19 from NCBI

Hi sir,
I am Kazuki Nakamae from Japan.

In genome analysis, software sometimes requires gene information of hg19. What is hg19? You can see the explanation in the link.

I introduce how to obtain gene information of hg19 using the NCBI database.
Please watch the following video.

In this video, I obtained the Human VEGFA information of hg19. I chose “GRCh37.p13 Primary Assembly” at 1:00 of the video. It is important because GRCh37.p13 Primary Assembly is almost equal to hg19. You can see the link.

Thank you so much…

How to find sgRNA target sequences using CRISPRdirect

Hi sir,
How is the weekend?

Today, I introduce CRISPRdirect, which is a very fast sgRNA search tool. CRISPRdirect now supports many (>200) species. If you design sgRNAs targeting the genome of a non-model organism, this tool would be helpful.

Citation info: Naito, Yuki, et al. “CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites.” Bioinformatics 31.7 (2014): 1120-1123.

An introduction video of CRISPRdirect has been uploaded by the author group on YouTube. Please check it.
“CRISPRdirect: designing CRISPR/Cas guide RNA sequence” on YouTube

You can also try GGGenome. Like CRISPRdirect, it is a very fast DNA search tool that covers >200 species. It shows the specificity of the target sequence, including sequences with mismatches. In my opinion, this is useful for designing primers.

The introduction video by the author group
“GGGenome: a fast and simple DNA sequence search engine” on YouTube

Thank you so much…

How to find sgRNA target sequences using CHOPCHOP

Hi sir,
I am fine.

Today, I introduce CHOPCHOP, which is a web-based search tool for CRISPR–Cas single guide RNA (sgRNA) targets.

Citation info: Labun, Kornel, et al. “CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing.” Nucleic acids research (2019).

Thanks to many updates and the authors’ efforts, CHOPCHOP accommodates various sgRNA designs such as knock-out, knock-down, and knock-in…
In this post, I show a simple example. Please watch the following video.

I strongly recommend choosing a specific target where MM0 (MM = MisMatch) is 0. If possible, also choose a target with MM1=0, MM2=0, and MM3=0.
CHOPCHOP can offer a primer set for further analysis by deep sequencing or a T7E1 assay, and it can predict the indel frequency based on Shen et al. 2018. It would be helpful for your experiment.
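The MM0 to MM3 recommendation above can be written as a small filter. This is only a sketch in Python: the rows below are hypothetical, and the column names MM0..MM3 simply follow the labels used in this post (they are an assumption about how you export or copy the CHOPCHOP table).

```python
MISMATCH_COLUMNS = ("MM0", "MM1", "MM2", "MM3")

# Hypothetical candidates copied from a result table.
candidates = [
    {"target": "guide_A", "MM0": 0, "MM1": 0, "MM2": 0, "MM3": 0},
    {"target": "guide_B", "MM0": 0, "MM1": 0, "MM2": 1, "MM3": 4},
    {"target": "guide_C", "MM0": 1, "MM1": 0, "MM2": 0, "MM3": 2},
]

def off_target_profile(row):
    """Off-target counts ordered from 0 to 3 mismatches."""
    return tuple(int(row[column]) for column in MISMATCH_COLUMNS)

def rank_candidates(rows):
    """Drop targets with MM0 > 0, then prefer fewer close off-targets."""
    usable = [row for row in rows if int(row["MM0"]) == 0]
    return sorted(usable, key=off_target_profile)

for row in rank_candidates(candidates):
    print(row["target"], off_target_profile(row))
```

Sorting by the tuple puts a target with MM1=0, MM2=0, and MM3=0 first, and it penalizes off-targets with fewer mismatches more strongly than those with more mismatches.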

Thank you so much…

How to analyze CRISPR-Cas9 sample using CRISPResso in Anaconda environment

Hi sir,
I feel like eating satsuma due to Japanese cool climate. What is traditional winter fruit in your living place?

Anyway, today, I introduce how to analyze a CRISPR-Cas9 sample using CRISPResso, which is a user-friendly NGS analysis tool. It is specialized for analyzing genome editing data generated with CRISPR.

Citation info: Pinello, Luca, et al. “Analyzing CRISPR genome-editing experiments with CRISPResso.” Nature biotechnology 34.7 (2016): 695.

I have to note that this is an old version of CRISPResso. The author does not recommend using it. However, it has been used in several papers over the past few years, so it can still be used to reproduce published data and to make comparisons.

I use the published raw NGS amplicon data (DRR147084_1.fastq.gz / DRR147084_2.fastq.gz), which was downloaded in step 4 of the post. If you want to use the data, please check it.
This example NGS data (DRR147084) is amplicon sequencing data generated with MiSeq. The sample is an MMEJ-assisted knock-in of a 77 bp fragment, and it contains wildtype amplicon reads, indel amplicon reads, unexpected knock-in reads (possibly generated by NHEJ), and expected knock-in amplicon reads generated by MMEJ.
Here are the sequence maps.

Wildtype amplicon sequence
Zoom image of Wildtype amplicon sequence
Expected knock-in amplicon sequence

I will show the example using it.

STEP1: Analyzing the sample using CRISPResso

1: Create virtual environment

conda create --name crispresso_env --channel bioconda crispresso;

2: Activate virtual environment

source activate crispresso_env;

3: Run CRISPResso

I applied the parameters used in the original paper to analyze the data with CRISPResso.

crispresso \
-r1 /Volumes/databank1/ngs/DRR147084_1.fastq.gz \
-r2 /Volumes/databank1/ngs/DRR147084_2.fastq.gz \
-a ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-g tgaagagcattcatcgtgag \
-e ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAAgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-d ggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAA \
-o /Volumes/databank1/ngs \
--name crispresso_example1 \
--hide_mutations_outside_window_NHEJ \
--hdr_perfect_alignment_threshold 99.5;

-o indicates OUTPUT_FOLDER (“/Volumes/databank1/ngs”). You can choose it freely.
The other parameters are explained on the author’s GitHub.

The result files are saved in the OUTPUT_FOLDER/“crispresso_example1” folder.

The full explanation of the output is given in the author’s paper.
Here are some of the results.

2.Unmodified_NHEJ_HDR_pie_chart.pdf
My understanding is that “Unmodified”, “NHEJ”, “Mixed HDR-NHEJ”, and “HDR” correspond to the wildtype amplicon sequence, the indel amplicon sequence, the unexpected knock-in sequence (possibly generated by NHEJ), and the expected knock-in amplicon sequence generated by MMEJ, respectively.
The mutation rate is 10.8 + 37.1 + 14.3 = 62.2%.
For your information, the values from Cas-Analyzer and CrispRVariants were 61.0% and 61.4%, respectively, according to the following posts.
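As a quick check of the arithmetic above, here is a small Python sketch that sums the three modified fractions from the pie chart and compares the result with the values of the other two tools. The numbers are the ones quoted in this post; the variable names are my own.

```python
# Percentages of the modified classes read from the pie chart (values from this post).
modified_fractions = [10.8, 37.1, 14.3]
crispresso_rate = round(sum(modified_fractions), 1)

# Mutation rates (%) of the same sample reported by each tool.
rates = {
    "CRISPResso": crispresso_rate,
    "Cas-Analyzer": 61.0,
    "CrispRVariants": 61.4,
}

# Largest disagreement between the tools, in percentage points.
spread = round(max(rates.values()) - min(rates.values()), 1)

for tool, rate in rates.items():
    print(tool, rate)
print("Spread between tools:", spread)
```

The three tools agree within about one percentage point on this sample.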

9.Alleles_around_cut_site_for_<guide>.pdf

Quantification_of_editing_frequency.txt

You can see the list of reads using Excel.

Alleles_frequency_table.txt
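If you do not want to use Excel, the table can also be inspected with a few lines of Python. This is a minimal sketch that assumes the file is tab-separated with a header row. It does not assume any particular column names, and the path below is a hypothetical location.

```python
import csv
import os

def read_table_head(path, n=5):
    """Return the column names and the first n rows of a tab-separated table."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = []
        for row in reader:
            rows.append(row)
            if len(rows) >= n:
                break
        return reader.fieldnames, rows

# Hypothetical location; adjust it to your own OUTPUT_FOLDER.
example_path = "Alleles_frequency_table.txt"
if os.path.exists(example_path):
    columns, rows = read_table_head(example_path)
    print(columns)
    for row in rows:
        print(row)
```

Reading only the first rows is enough to check which columns are available before doing any further filtering.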

(Supplementary): Run CRISPResso with TruSeq3-PE.fa

CRISPResso includes Trimmomatic for adapter trimming and quality filtering.
As I mentioned in the past post, TruSeq3-PE.fa is the adapter file for the MiSeq machine that sequenced the example data. Trimmomatic provides the adapter sequence data.

I ran the previous CRISPResso command with TruSeq3-PE.fa to get the results in a more correct way.

crispresso \
-r1 /Volumes/databank1/ngs/DRR147084_1.fastq.gz \
-r2 /Volumes/databank1/ngs/DRR147084_2.fastq.gz \
-a ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-g tgaagagcattcatcgtgag \
-e ttacatgatgcagaaagttgatatccctccgcttcttactctttttttttttctcccccatcatacaggtgaatatgaccatctcccagaacaggccttctatatggtgggacccattgaagaagctgtggcaaaagctgataagctggctgaagagcattcatcgtggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAAgaggggtctttgtcctctgtactgtctctctccttgcccctaacccaaaaagcttcatttttctgtgtaggctgcacaagagccttgattgaagatatattctttctgaacagtatttaaggtttccaataaaatgtacacccctcagaatttgtctgattctcttggtt \
-d ggGGATCCGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGTAA \
-o /Volumes/databank1/ngs \
--name crispresso_example2 \
--hide_mutations_outside_window_NHEJ \
--hdr_perfect_alignment_threshold 99.5 \
--trim_sequences \
--trimmomatic_options_string " ILLUMINACLIP:/Volumes/databank1/ngs/trimmomatic/adapters/TruSeq3-PE.fa:0:90:10:0:true MINLEN:40";

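You have to remember to append the clipping parameters and the minimum read length (“:0:90:10:0:true MINLEN:40” in the command above) to the end of the adapter sequence file path. For reference, the example in the Trimmomatic manual uses “:2:30:10” and “MINLEN:36”. The explanation of these values is shown in the Trimmomatic manual. If you understand what it says, you can adjust the values.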
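The colon-separated values after the adapter file path are easier to read with names attached. The following Python sketch splits an ILLUMINACLIP step into the fields described in the Trimmomatic manual (seed mismatches, palindrome clip threshold, simple clip threshold, and the optional minimum adapter length and keepBothReads). The function name is my own, and it assumes that the file path itself contains no colon.

```python
ILLUMINACLIP_FIELDS = [
    "seed_mismatches",
    "palindrome_clip_threshold",
    "simple_clip_threshold",
    "min_adapter_length",   # optional in Trimmomatic
    "keep_both_reads",      # optional in Trimmomatic
]

def parse_illuminaclip(step):
    """Split an ILLUMINACLIP step into named fields (values are kept as strings)."""
    fields = step.strip().split(":")
    if fields[0] != "ILLUMINACLIP" or len(fields) < 5:
        raise ValueError("not a complete ILLUMINACLIP step: " + step)
    parsed = {"adapter_fasta": fields[1]}
    parsed.update(zip(ILLUMINACLIP_FIELDS, fields[2:]))
    return parsed

step = "ILLUMINACLIP:/Volumes/databank1/ngs/trimmomatic/adapters/TruSeq3-PE.fa:0:90:10:0:true"
for name, value in parse_illuminaclip(step).items():
    print(name, "=", value)
```

MINLEN is a separate Trimmomatic step, so it is not part of the ILLUMINACLIP string itself.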

Only 5 reads were dropped.
Trimmomatic log in CRISPResso_RUNNING_LOG.txt

2.Unmodified_NHEJ_HDR_pie_chart.png

The classification result was almost the same as the result without adapter trimming. In some published reports, trimming was not performed in CRISPResso. However, the impact might not be large (I do NOT mean that it is not necessary to trim adapters…).

Thank you so much…

How to analyze CRISPR-Cas9 NGS sample using Cas-Analyzer

Hi sir,
I have a headache. I took Pocari Sweat and the medicine. I wish to be better…

Anyway, today, I introduce Cas-Analyzer, which is a web-based NGS analysis tool. It is specialized for analyzing knock-out/knock-in data generated with CRISPR.

Citation info: Park J. et al. Cas-Analyzer: an online tool for assessing genome editing results using NGS data. Bioinformatics 33, 286-288 (2017).

I use the published filtered NGS amplicon data (qfltr_DRR147084_1.fastq / qfltr_DRR147084_2.fastq), which was generated in step 6 of the post.

The data is a knock-in sample of the ATP5B locus in a human cell according to the paper. You can see the sequence information in the “Methods” block.

Here is an analysis example. Please watch the following video.

The donor sequence is an optional parameter. If you analyze knock-out data, it is not required.

As I showed in the video, the indel frequency is 61.0%. In the past post, CrispRVariants showed that the mutation efficiency of the same sample was 61.37%.
I feel that the algorithm of Cas-Analyzer is quite simple, but it is effective.

Thank you so much…

How to analyze CRISPR-Cas9 sample using CrispRVariants

Hi sir,
Winter is coming. I am thinking about setting up the kotatsu.

Today, I introduce how to analyze a CRISPR-Cas9 sample using CrispRVariants, which is an accurate NGS analysis tool. It is specialized for analyzing genome editing data.

Citation info: Lindsay, Helen, et al. “CrispRVariants charts the mutation spectrum of genome engineering experiments.” Nature biotechnology 34.7 (2016): 701.

In previous posts, I introduced how to map NGS data using BWA-MEM. I will use this bam file to introduce it. If you have not seen it, you can check the post before.

I will show an alignment map of the region indicated as the following “focusing_range” in the wildtype amplicon sequence.

Whole sequence of amplicon
Focusing range

OK, I will put a simple example…

STEP1: Searching the region of “focusing_range” using Blat

1: Create virtual environment

conda create --name blat_env --channel bioconda blat;

2: Activate virtual environment

source activate blat_env;

3: Create the following FASTA file as focusing_range.fa

>focusing_range
gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc

The above sequence is saved as focusing_range.fa into /Volumes/databank1/ngs/.

4: Searching the region of “focusing_range” using Blat

blat /Volumes/databank1/ngs/hg38.fa /Volumes/databank1/ngs/focusing_range.fa -minMatch=0 -minScore=80 /Volumes/databank1/ngs/focusing_range.fa.psl;

Set -minScore to the sequence length of focusing_range.fa (80 in this example).

When the output below is shown, the process has finished. focusing_range.fa.psl is automatically saved in /Volumes/databank1/ngs/.

Loaded 3088269832 letters in 24 sequences
Searched 80 bases in 1 sequences

5: Make a tsv copy to open the result of Blat (focusing_range.fa.psl)

cp /Volumes/databank1/ngs/focusing_range.fa.psl /Volumes/databank1/ngs/focusing_range.fa.psl.tsv;

Open /Volumes/databank1/ngs/focusing_range.fa.psl.tsv using Excel or another spreadsheet application.
Note down the values of the highlighted cells (strand : - / T name : chr12 / T start : 56638284 / T end : 56638364).

Result of Blat
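Instead of reading the cells in a spreadsheet, the same four values can be pulled out with Python. This is a minimal sketch that assumes the standard 21-column PSL layout. The example line is reconstructed for illustration: only strand, T name, T start, and T end are taken from this post, and the remaining fields are placeholders.

```python
PSL_COLUMNS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]

def parse_psl_line(line):
    """Turn one PSL data line into a dict keyed by column name."""
    return dict(zip(PSL_COLUMNS, line.rstrip("\n").split("\t")))

def read_psl(path):
    """Read all hits from a PSL file, skipping the header lines."""
    hits = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Data lines have 21 fields and start with the number of matches.
            if len(fields) == len(PSL_COLUMNS) and fields[0].isdigit():
                hits.append(parse_psl_line(line))
    return hits

def psl_to_one_based(hit):
    """PSL tStart is 0-based, so add 1 to get a 1-based closed range."""
    return hit["tName"], int(hit["tStart"]) + 1, int(hit["tEnd"]), hit["strand"]

# Reconstructed example line (placeholder values except strand/tName/tStart/tEnd).
example_line = "\t".join([
    "80", "0", "0", "0", "0", "0", "0", "0", "-", "focusing_range", "80", "0", "80",
    "chr12", "133275309", "56638284", "56638364", "1", "80,", "0,", "56638284,",
])
print(psl_to_one_based(parse_psl_line(example_line)))
```

The 1-based range produced here is the one that the GRanges object needs later in this post.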

STEP2: Install CrispRVariants using RStudio.

CrispRVariants sometimes fails to install at the moment (as of 2019/11/07). The Anaconda version is especially unstable. I hope that it will be improved.
Anyway, to install CrispRVariants, you can follow the procedure in the video. It is important to type no when RStudio asks “Do you want to install from sources the packages which need compilation? (Yes/no/cancel)”.

STEP3: Analyzing CRISPR-Cas9 sample using CrispRVariants

Open the RStudio.app, and type the following script.

6: Install and load packages

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CrispRVariants")
BiocManager::install("GenomicRanges")

library("CrispRVariants")
library("GenomicRanges")

7: Name the analysis

# Name of this analysis (free text)
dataname.char <- "my NGS data"

8: Enter the file path of the bam file generated using BWA-MEM

# Place of your bam file
bam.file.path <- "/Volumes/databank1/ngs/out.extendedFrags.fastq.bam"

9: Enter the sequence of focusing_range

# Reference sequence of the focusing range (the region around the nuclease target site; it includes the CRISPR-Cas9 protospacer)
focusing.sequence.char <- "gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc"

10: Enter the position of cut point in focusing_range

# Cut point in the focusing range (5'->3')
target.location <- 40

11: Enter the range where you consider a mismatch base as Single Nucleotide Variants

# How many bases upstream (5' side of the cut site) and downstream (3' side of the cut site) are checked for Single Nucleotide Variants?
upstream.snv.num <- 17
downstream.snv.num <- 6
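The values 40, 17, and 6 can be derived from the guide sequence instead of counting by hand. The following Python sketch is my own illustration, not part of CrispRVariants. It assumes SpCas9 (a cut 3 bp upstream of a 3-bp PAM), a protospacer on the same strand as the focusing range, and that the SNV window is meant to cover the protospacer bases upstream of the cut plus the remaining protospacer bases and the PAM.

```python
def cas9_cut_parameters(region, protospacer, pam_length=3, cut_offset=3):
    """Return (target_loc, upstream_snv, downstream_snv) for a Cas9 target.

    Assumes the protospacer lies on the given strand with the PAM on its 3' side.
    """
    region = region.lower()
    protospacer = protospacer.lower()
    start = region.find(protospacer)            # 0-based start of the protospacer
    if start < 0:
        raise ValueError("protospacer not found in the region")
    guide_end = start + len(protospacer)        # 1-based position of its last base
    target_loc = guide_end - cut_offset         # base immediately 5' of the cut
    upstream_snv = target_loc - start           # protospacer bases 5' of the cut
    downstream_snv = cut_offset + pam_length    # remaining protospacer bases + PAM
    return target_loc, upstream_snv, downstream_snv

focusing_range = (
    "gtggcaaaagctgataagctggctgaagagcattcatcgtgaggggtctttgtcctctgtactgtctctctccttgcccc"
)
guide = "tgaagagcattcatcgtgag"
print(cas9_cut_parameters(focusing_range, guide))
```

With the sequences of this post, the sketch returns the same three numbers that are entered in steps 10 and 11.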

12: Set the flag that determines how chimeric sequences are treated

# Flag to determine how chimeric reads are treated. One of "ignore", "exclude", and "merge".
# I recommend using "ignore" the first time.
treat.chimeras <- "ignore"

The ability to recognize chimeric sequences is an outstanding advantage of CrispRVariants.
However, I recommend ignoring chimeras the first time because recognizing chimeric sequences causes a loss of detected reads.

13: Enter the information of focusing_range.fa.psl

blat.strand <- "-"
blat.T.name <- "chr12"
blat.T.start <- 56638284
blat.T.end <- 56638364

14: Create the GRanges object of the focusing_range sequence

focusing.grange <- GRanges(seqnames = blat.T.name, ranges = IRanges(blat.T.start + 1, blat.T.end), strand = blat.strand)

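There is a difference between the coordinates of GRanges and those of the Blat result: Blat reports T start as a 0-based position, whereas GRanges uses 1-based positions. I adjusted it by adding 1 to blat.T.start.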

15: Create crispr set object

crispr.set <- readsToTarget(bam.file.path
, focusing.grange, reference = focusing.sequence.char
, target.loc = target.location
, names = dataname.char
, upstream.snv = upstream.snv.num
, downstream.snv = downstream.snv.num
, chimeras = treat.chimeras)

output

Read 158666 alignments, excluded 0

147559 of 158666 nonchimeric reads span the target range

narrowing alignments


Initialising CrisprRun my NGS data

Initialising CrisprSet chr12:56638285-56638364 with 1 samples

16: Create alignment map

plotVariants(crispr.set)

Alignment map

17: Estimate the editing efficiency

mutationEfficiency(crispr.set)

output

my NGS data     Average      Median     Overall       StDev   ReadCount 
61.37 61.37 61.37 61.37 NA 147559.00

The editing efficiency is 61.37%. SNVs are not counted toward the efficiency.
This data set contains only one sample, so “Average” is the same as “my NGS data”.
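To show what “SNVs are not counted” means, here is a small Python sketch of the calculation as I understand it: reads labeled “no variant” or “SNV:...” are treated as unedited, and every other read is treated as edited. This is my reading of the output, not the internal code of CrispRVariants, and the counts below are toy numbers rather than the real data.

```python
def mutation_efficiency(counts):
    """Percentage of reads whose label is neither 'no variant' nor an SNV."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unedited = sum(
        n for label, n in counts.items()
        if label == "no variant" or label.startswith("SNV")
    )
    return 100.0 * (total - unedited) / total

# Toy counts (hypothetical), using the same label style as the post.
toy_counts = {"no variant": 50, "SNV:-12G": 10, "1:1I": 30, "-1:1D": 10}
print(mutation_efficiency(toy_counts))
```

In the toy data, 40 of 100 reads carry an insertion or deletion, so the SNV reads do not raise the efficiency.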

18: Show the consensus sequences of variant alleles

consensusSeqs(crispr.set)

output

  A DNAStringSet instance of length 629
width seq names
[1] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC no variant
[2] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAATGCCCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-12G
[3] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGAGGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-5C
[4] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:1T
[5] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCTCACGATGAAGGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC SNV:-9C
... ... ...
[625] 123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCT...CGATGAATGCTCTTCAGCCCGCTTATCAGCTTTTGCCAC -1:43I
[626] 136 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACACAT...CGATGAATGCTCTTCTGCCAGCTTATCAGCTTTTGCCAC 2:1D,6:57I
[627] 123 GGGGCAAGGAGAGAGACAGTACAGAGGACAAAGACCCCG...CGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC 3:43I
[628] 175 GGGGCAAGGAGAGAGACAGTACAGCAGACAAAGACCCCT...CAGCTTATCAGCTTTAGCCAGATTATCATCTTTTGCCAC -30:18I,1:77I
[629] 80 GGGGCAAGGAGAGAGACAGTACAGAGGACAAGACCCCTCAACGATGAATGCTCTTCAGCCAGCTTATCAGCTTTTGCCAC 1:1I,11:1D

You can save the full sequence information using the following command.

write.csv(as.data.frame(consensusSeqs(crispr.set)), file.path("/Volumes/databank1/ngs/", "consensus.seqs.csv"))

19: Show the count data of variant alleles

variantCounts(crispr.set)

output

                           my NGS data
no variant 55017
SNV:-12G 136
SNV:-5C 104
SNV:1T 86
SNV:-9C 69
SNV:-17G 65
...
1:1I 44667
2:77I 26772
-1:1D 1491
-10:13D 1417
1:7D 892
...
-1:43I 1
2:1D,6:57I 1
3:43I 1
-30:18I,1:77I 1
1:1I,11:1D 1
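The variant labels in this output follow a “position:sizeType” pattern (for example, 2:77I is a 77-bp insertion and -10:13D is a 13-bp deletion). The following Python sketch parses such labels and checks whether the net length change shifts the reading frame. It is my own illustration based only on the labels shown above. Labels that contain anything other than insertions and deletions (such as “no variant” or SNV labels) are returned as an empty list for simplicity.

```python
import re

INDEL_PATTERN = re.compile(r"^(-?\d+):(\d+)([ID])$")

def parse_variant_label(label):
    """Parse 'pos:sizeI' / 'pos:sizeD' parts into (position, size, type) tuples."""
    events = []
    for part in label.split(","):
        match = INDEL_PATTERN.match(part.strip())
        if match is None:
            return []   # not a pure insertion/deletion label
        events.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return events

def net_length_change(label):
    """Inserted bases minus deleted bases for one label."""
    return sum(size if kind == "I" else -size
               for _, size, kind in parse_variant_label(label))

def is_frameshift(label):
    """True when the net length change is not a multiple of 3."""
    return net_length_change(label) % 3 != 0

for label in ["1:1I", "2:77I", "-10:13D", "2:1D,6:57I", "1:1I,11:1D"]:
    print(label, parse_variant_label(label), net_length_change(label), is_frameshift(label))
```

A label such as 1:1I,11:1D has a net change of 0, so this simple check does not call it a frame-shift even though the sequence between the two events is altered.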

You can save the full variant information using the following command.

write.csv(as.data.frame(variantCounts(crispr.set)), file.path("/Volumes/databank1/ngs/", "variantcounts.csv"))

20: Create barplot of variants classification by size

barplotAlleleFreqs(variantCounts(crispr.set))

Barplot of variants classified by size

This is a simple example. You can change the procedure.

Environment

Hardware

  • MacBook Air (2017 model)
  • Processor 1.8 GHz Intel Core i5
  • Memory 8GB
  • 500 GB external hard drive

Software

  • macOS Catalina
  • GNU bash version : 3.2.57(1)-release (x86_64-apple-darwin18)

Anaconda

  • conda version : 4.7.11
  • conda-build version : 3.17.6
  • python version : 2.7.15.final.0
  • Standalone BLAT v. 36

RStudio

  • R version 3.6.1 (2019-07-05)
  • Platform: x86_64-apple-darwin15.6.0 (64-bit)
  • RStudio Version 1.2.5019

Thank you so much…

How to trim adapters using Trimmomatic

Hi sir,
How are you?

Today, I introduce how to trim adapters using Trimmomatic.

Citation info: Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.

Honestly, I cannot force you to do it because we can map NGS data without trimming adapters.

But this article and the discussion on Biostars are interesting. Let’s take a look.

Anyway, I will put a simple example of trimming adapters using Trimmomatic and the published amplicon sequencing data.

STEP1: Creating a virtual environment

conda create --name trimmomatic_env;
source activate trimmomatic_env;

conda config --add channels bioconda;
conda install --channel bioconda sra-tools trimmomatic;
conda config --remove channels bioconda;

STEP2: Downloading the example data.

*In this example, I use published amplicon sequence data. If you have your FASTQ files, skip this section.

fastq-dump DRR147084 --split-files --gzip --outdir /Volumes/databank1/ngs; # Download from SRR(DRR)

STEP3: Preparing the adapter sequences

You need the FASTA sequences of the adapters to use Trimmomatic. Please check the authors’ page.
According to that page, TruSeq3-PE.fa is the adapter file for the MiSeq machine that sequenced the example data.

Download the Trimmomatic data from GitHub.

git clone https://github.com/timflutre/trimmomatic /Volumes/databank1/ngs/trimmomatic;

“TruSeq3-PE.fa” exists in /Volumes/databank1/ngs/trimmomatic/adapters/.

STEP4: Executing Trimmomatic to trim the adapters

trimmomatic PE -phred33 \
-trimlog log.txt /Volumes/databank1/ngs/DRR147084_1.fastq.gz /Volumes/databank1/ngs/DRR147084_2.fastq.gz \
/Volumes/databank1/ngs/paired_DRR147084_1.fq /Volumes/databank1/ngs/unpaired_DRR147084_1.fq \
/Volumes/databank1/ngs/paired_DRR147084_2.fq /Volumes/databank1/ngs/unpaired_DRR147084_2.fq \
ILLUMINACLIP:/Volumes/databank1/ngs/trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 \
LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:36;

“paired_DRR147084_1.fq”, “paired_DRR147084_2.fq” are the trimmed FASTQ files.
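To check how many reads survived the trimming, you can count the records in the input and output FASTQ files. This is a minimal Python sketch that assumes the usual 4 lines per FASTQ record. The function name is my own, and the paths are the ones used in this post.

```python
import gzip
import os

def count_fastq_reads(path):
    """Count reads in a FASTQ file (plain or .gz), assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4

# Paths used in this post; they are skipped when the files are not present.
for path in [
    "/Volumes/databank1/ngs/DRR147084_1.fastq.gz",
    "/Volumes/databank1/ngs/paired_DRR147084_1.fq",
    "/Volumes/databank1/ngs/unpaired_DRR147084_1.fq",
]:
    if os.path.exists(path):
        print(path, count_fastq_reads(path))
```

The difference between the input count and the paired output count is the number of reads that lost their mate or were dropped by the filters.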

Thank you so much…