Configuration
This page covers the container file structure, config.yaml settings, input file formats, and custom primer/index setup.
Container File Structure
The Docker and Singularity images create the following directory layout:
Within the main 16s-demux directory:
- config/: configuration files, fastq file list, samplesheet, and index files
- workflow/rules/: analysis code
- workflow/out/: output directory for real analyses
- workflow/test_out/: output directory for test runs
- fastq_data/: test sequencing data
The Snakefile and Slurm submission scripts are in the top-level 16s-demux directory.
config.yaml Reference
The config/config.yaml file controls all pipeline settings. The key fields are:
| Field | Description | Default |
|---|---|---|
| samplesheet | Path to the samplesheet file | samplesheet.txt |
| indices | Path to the index file for demultiplexing | indexfordemux.txt |
| fastqlist | Path to the fastq file list | fastq.txt |
| lenR1index | Length of the read 1 index (longest phase) | 7 |
| lenR2index | Length of the read 2 index (longest phase) | 7 |
| lenR1primer | Length of read 1 primer (gene-specific + spacer) | 23 |
| lenR2primer | Length of read 2 primer (gene-specific + spacer) | 24 |
All paths are relative to the config/ directory unless absolute paths are provided.
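As a rough illustration of the path rule above, the following Python sketch resolves a configured value against the config/ directory unless it is already absolute. The function name `resolve_config_path` is hypothetical and the pipeline's own resolution logic may differ.

```python
from pathlib import PurePosixPath


def resolve_config_path(value: str, config_dir: str = "config") -> str:
    """Return `value` unchanged if absolute, else placed under `config_dir`.

    Hypothetical helper: it only mirrors the documented rule and is not
    taken from the pipeline code.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(config_dir) / path)


# A default (relative) entry and an absolute override.
print(resolve_config_path("samplesheet.txt"))
print(resolve_config_path("/data/run1/samplesheet.txt"))
```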
Input Files
Three inputs are required: the sequencing fastq data, a fastq file list, and a samplesheet.
Note
Most common pipeline errors arise from input file formatting issues. The pipeline validates inputs automatically — check workflow/out/inputCheck_log.txt for any warnings.
Fastq Data
Naming rules:
- Sample names must not contain spaces, underscores, or periods (hyphens are fine)
- Rename files before demultiplexing if they violate these rules
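A quick pre-check of candidate names can catch violations before a run. The sketch below is one possible check written for this page; `is_valid_sample_name` is a hypothetical helper, not part of the pipeline, and it tests only the three rules listed above.

```python
import re

# Characters the naming rules forbid: whitespace, underscores, periods.
_FORBIDDEN = re.compile(r"[\s_.]")


def is_valid_sample_name(name: str) -> bool:
    """True if `name` is non-empty and free of spaces, underscores, and periods."""
    return bool(name) and _FORBIDDEN.search(name) is None


for candidate in ["P5-A01-plateP-wellA1", "sample_1", "sample.1", "my sample"]:
    status = "ok" if is_valid_sample_name(candidate) else "rename needed"
    print(f"{candidate}: {status}")
```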
Format:
- Gzipped fastq files (.fastq.gz)
Location:
Fastq files can be located anywhere accessible to the container. Their paths are defined by combining the fastqdir variable in config.yaml with the paths in the fastq file list.
| fastqdir (config.yaml) | Fastq file list entry |
|---|---|
| "" | /home/users/TEST/sequencing/exp01/fastqs/TEST_R1_001.fastq.gz |
| /home/users/TEST/sequencing/exp01/fastqs/ | TEST_R1_001.fastq.gz |
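The two rows above point at the same file. A minimal sketch of the combination, assuming the pipeline simply prefixes each list entry with fastqdir (the actual joining logic is not shown on this page, and `full_fastq_path` is a hypothetical name):

```python
def full_fastq_path(fastqdir: str, entry: str) -> str:
    """Prefix a fastq file list entry with fastqdir (assumed plain concatenation)."""
    return fastqdir + entry


# Row 1: empty fastqdir, absolute path in the file list.
row1 = full_fastq_path(
    "", "/home/users/TEST/sequencing/exp01/fastqs/TEST_R1_001.fastq.gz"
)
# Row 2: directory in fastqdir, bare file name in the file list.
row2 = full_fastq_path(
    "/home/users/TEST/sequencing/exp01/fastqs/", "TEST_R1_001.fastq.gz"
)
print(row1 == row2)
```

Under this assumption the trailing slash on fastqdir is required, as in the second row.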
Tip
Periods in sample names are accepted for demultiplexing but will cause issues with downstream tools like DADA2. Avoid them if you plan to use DADA2.
Fastq File List
Location: Place in the config/ directory and update the fastqlist field in config.yaml (default name: fastq.txt).
Format: Tab-delimited text file with three columns:
| Column | Description |
|---|---|
| read1 | Path to read 1 fastq.gz file |
| read2 | Path to read 2 fastq.gz file |
| file | Shortened file identifier |
The file column should use the format {RunName}-{round2plate}-{well}, where:
- RunName: any identifier without underscores, spaces, or periods
- round2plate: plate identifier for round 2 barcodes
- well: well number
Example:
| read1 | read2 | file |
|---|---|---|
| ../fastq_data/test_inputs/KKRP-001_S441_R1_001.fastq.gz | ../fastq_data/test_inputs/KKRP-001_S441_R2_001.fastq.gz | 15mc-003-P08B01-A01 |
| ../fastq_data/test_inputs/KKRP-002_S442_R1_001.fastq.gz | ../fastq_data/test_inputs/KKRP-002_S442_R2_001.fastq.gz | 15mc-003-P08B01-A02 |
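To make the identifier format concrete, here is a small sketch that splits a file value into its three parts. It assumes the last two hyphen-separated fields are round2plate and well, so that a RunName such as 15mc-003 may itself contain hyphens. That reading is inferred from the example, and `parse_file_id` is a hypothetical helper.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    run_name: str
    round2plate: str
    well: str


def parse_file_id(file_id: str) -> FileId:
    """Split '{RunName}-{round2plate}-{well}' from the right."""
    # Splitting from the right keeps hyphens inside RunName intact.
    parts = file_id.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not in RunName-round2plate-well form: {file_id!r}")
    return FileId(*parts)


print(parse_file_id("15mc-003-P08B01-A01"))
```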
Important
The file field in the fastq file list must match the first part of the filename field in the samplesheet.
Samplesheet
Location: Place in the config/ directory and update the samplesheet field in config.yaml (default name: samplesheet.txt).
Format: Tab-delimited file (.tsv or .txt) with a header row. Three columns are required:
| Column | Description |
|---|---|
| filename | Format: {RunName}-{round2plate}-{well}-L{round1index} |
| sample | Final sample name after demultiplexing (no underscores or periods) |
| group | Group identifier for output organization (no spaces or slashes) |
Additional columns can be included freely — only filename, sample, and group are used by the pipeline.
The {RunName}-{round2plate}-{well} portion of filename must match the corresponding file entries in the fastq file list. The round1index value should match the phase entry in the indexfordemux.txt table.
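The matching rule can be checked before a run. The sketch below assumes the round1index suffix is a numeric phase written as -L followed by digits, as in the examples on this page. Both function names are hypothetical, and the pipeline's own input check may work differently.

```python
import re

# '{file}-L{round1index}', with round1index assumed to be digits.
_FILENAME = re.compile(r"^(?P<file>.+)-L(?P<phase>\d+)$")


def split_filename(filename: str) -> tuple[str, int]:
    """Return (file identifier, phase) from a samplesheet filename."""
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return match.group("file"), int(match.group("phase"))


def unmatched_filenames(samplesheet_filenames, fastq_file_ids):
    """List samplesheet filenames whose prefix is absent from the fastq file list."""
    known = set(fastq_file_ids)
    return [f for f in samplesheet_filenames if split_filename(f)[0] not in known]


file_ids = ["15mc-003-P08B01-A01", "15mc-003-P08B01-A02"]
filenames = ["15mc-003-P08B01-A01-L4", "15mc-003-P08B01-A03-L4"]
print(unmatched_filenames(filenames, file_ids))
```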
Group behavior: Samples in different groups are output into separate subdirectories within trimmed/. If you don't need grouping, use a single group name for all samples or leave the column blank (keep the group header).
Example (key columns shown):
| filename | sample | group |
|---|---|---|
| 15mc-003-P08B01-A01-L4 | P5-A01-plateP-wellA1 | group1 |
| 15mc-003-P08B01-A02-L4 | P5-A02-plateP-wellA2 | group1 |
| 15mc-003-P08B01-A01-L5 | P6-A01-plateW-wellA5 | group2 |
| 15mc-003-P08B01-A02-L5 | P6-A02-plateW-wellA6 | group2 |
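As an illustration of how a samplesheet like the one above could be read, the sketch below parses tab-delimited text, checks the three required columns, and collects sample names by group. `samples_by_group` is a hypothetical helper using only the Python standard library. It does not reproduce the pipeline's own parsing.

```python
import csv
import io
from collections import defaultdict

REQUIRED = ("filename", "sample", "group")

EXAMPLE = (
    "filename\tsample\tgroup\n"
    "15mc-003-P08B01-A01-L4\tP5-A01-plateP-wellA1\tgroup1\n"
    "15mc-003-P08B01-A02-L4\tP5-A02-plateP-wellA2\tgroup1\n"
    "15mc-003-P08B01-A01-L5\tP6-A01-plateW-wellA5\tgroup2\n"
    "15mc-003-P08B01-A02-L5\tP6-A02-plateW-wellA6\tgroup2\n"
)


def samples_by_group(tsv_text: str) -> dict[str, list[str]]:
    """Map each group to its sample names; extra columns are ignored."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    groups = defaultdict(list)
    for row in reader:
        groups[row["group"]].append(row["sample"])
    return dict(groups)


for group, samples in samples_by_group(EXAMPLE).items():
    print(group, samples)
```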
Index Files
The indexfordemux.txt file contains the inline indexes used for demultiplexing. The default file uses the 16S V4 index set.
Pre-built Index Sets
Index files for 13 regions are provided in the other_index_for_demux/ directory:
| Region | File |
|---|---|
| V1 - V2 | V1-V2_index_for_demux.txt |
| V1 - V3 | V1-V3_index_for_demux.txt |
| V2 - V3 | V2-V3_index_for_demux.txt |
| V3 | V3_index_for_demux.txt |
| V3 - V4 | V3-V4_index_for_demux.txt |
| V4 | V4_index_for_demux.txt (default) |
| V4 - V5 | V4-V5_index_for_demux.txt |
| V5 | V5_index_for_demux.txt |
| V5 - V7 | V5-V7_index_for_demux.txt |
| V6 | V6_index_for_demux.txt |
| V6 - V7 | V6-V7_index_for_demux.txt |
| V6 - V8 | V6-V8_index_for_demux.txt |
| V7 - V9 | V7-V9_index_for_demux.txt |
To use a different region, copy the appropriate file into config/ and update the indices field in config.yaml.
Tip
For all pre-built index sets, the default config.yaml length parameters are correct — no changes needed for lenR1index, lenR2index, lenR1primer, or lenR2primer.
Custom Index Files
If you are using custom primers, you need to create a custom indexfordemux.txt file and may need to update the length parameters in config.yaml.
An Excel template (other_index_for_demux.xlsx) is provided in the other_index_for_demux/ directory to help derive custom index files.
Understanding the Read Structure
The final read structure after library prep looks like this (lengths not to scale):
The round 1 indexes use variable-length "phases" (0–7). Each phase has a different number of index bases, with the remaining positions filled by spacer and gene-specific primer sequence:
| Phase | Variable FP | Variable RP |
|---|---|---|
| 0 |  | ATGGACT |
| 1 | T | GCTAGC |
| 2 | GG | TGACT |
| 3 | ACT | CGGT |
| 4 | TAAC | GTA |
| 5 | CAGTC | AA |
| 6 | ATCGAT | C |
| 7 | GCAAGTC |  |
During demultiplexing, reads are treated as having 7 base pair indexes on both ends. Any positions not filled by the actual index contain spacer or gene-specific primer sequence. In the default V4 index set:
- Underlined = actual index bases
- lowercase = spacer bases
- BOLD UPPERCASE = gene-specific primer region
| Phase | read1index | read2index | bc |
|---|---|---|---|
| 0 | cagt**AGA** | <u>ATGGACT</u> | CAGTAGAATGGACT |
| 1 | <u>T</u>cagt**AG** | <u>GCTAGC</u>a | TCAGTAGGCTAGCA |
| 2 | <u>GG</u>cagt**A** | <u>TGACT</u>at | GGCAGTATGACTAT |
| 3 | <u>ACT</u>cagt | <u>CGGT</u>atc | ACTCAGTCGGTATC |
| 4 | <u>TAAC</u>cag | <u>GTA</u>atcc | TAACCAGGTAATCC |
| 5 | <u>CAGTC</u>ca | <u>AA</u>atcc**T** | CAGTCCAAAATCCT |
| 6 | <u>ATCGAT</u>c | <u>C</u>atcc**TA** | ATCGATCCATCCTA |
| 7 | <u>GCAAGTC</u> | atcc**TAC** | GCAAGTCATCCTAC |
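The table above follows a regular pattern: each 7-base window is the phase's index bases followed by as much spacer and gene-specific sequence as fits, and bc is the two windows concatenated in uppercase. The sketch below reproduces that pattern. The spacers cagt and atcc are read off the lowercase bases in the table, and the function names are hypothetical. This is not the procedure encoded in the Excel template.

```python
# Index bases per phase (0-7), taken from the phase table.
FP_INDEX = ["", "T", "GG", "ACT", "TAAC", "CAGTC", "ATCGAT", "GCAAGTC"]
RP_INDEX = ["ATGGACT", "GCTAGC", "TGACT", "CGGT", "GTA", "AA", "C", ""]


def padded_index(index: str, spacer: str, gene_start: str, width: int = 7) -> str:
    """Index bases, then spacer (lowercase) and gene bases up to `width`."""
    filler = spacer.lower() + gene_start.upper()
    need = width - len(index)
    if need > len(filler):
        raise ValueError("spacer + gene_start too short to fill the window")
    return index.upper() + filler[:need]


def build_index_table(r1_spacer, r1_gene, r2_spacer, r2_gene, width=7):
    """Return rows of (phase, read1index, read2index, bc)."""
    rows = []
    for phase, (fp, rp) in enumerate(zip(FP_INDEX, RP_INDEX)):
        r1 = padded_index(fp, r1_spacer, r1_gene, width)
        r2 = padded_index(rp, r2_spacer, r2_gene, width)
        rows.append((phase, r1, r2, (r1 + r2).upper()))
    return rows


# Default V4 values as they appear in the table.
for row in build_index_table("cagt", "AGA", "atcc", "TAC"):
    print(*row, sep="\t")
```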
Changing Only the Gene-Specific Region
If you use the same primer design and index scheme but target a different gene, only the gene-specific regions within the indexes need to change.
Example: For the default V4 primers, the gene starts with AGA... and ends with GTA on the forward strand, giving regions of homology AGA and TAC (both 5' to 3').
For a gene reading ATG ... CGT, the regions of homology become ATG and ACG (both 5' to 3'). The updated index table would be:
| Phase | read1index | read2index | bc |
|---|---|---|---|
| 0 | cagt**ATG** | <u>ATGGACT</u> | CAGTATGATGGACT |
| 1 | <u>T</u>cagt**AT** | <u>GCTAGC</u>a | TCAGTATGCTAGCA |
| 2 | <u>GG</u>cagt**A** | <u>TGACT</u>at | GGCAGTATGACTAT |
| 3 | <u>ACT</u>cagt | <u>CGGT</u>atc | ACTCAGTCGGTATC |
| 4 | <u>TAAC</u>cag | <u>GTA</u>atcc | TAACCAGGTAATCC |
| 5 | <u>CAGTC</u>ca | <u>AA</u>atcc**A** | CAGTCCAAAATCCA |
| 6 | <u>ATCGAT</u>c | <u>C</u>atcc**AC** | ATCGATCCATCCAC |
| 7 | <u>GCAAGTC</u> | atcc**ACG** | GCAAGTCATCCACG |
Only the gene-specific regions (bold) have changed — index and spacer sequences remain identical.
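The two regions of homology in the example can be derived mechanically: the read 1 region is the start of the gene as written, and the read 2 region is the reverse complement of its end. A minimal sketch, with hypothetical function names:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def homology_regions(gene_start: str, gene_end: str) -> tuple[str, str]:
    """Return (read 1 region, read 2 region), both written 5' to 3'."""
    return gene_start.upper(), reverse_complement(gene_end)


print(homology_regions("AGA", "GTA"))  # default V4 example
print(homology_regions("ATG", "CGT"))  # custom gene example
```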
Mixed Base Characters
Avoid mixed base characters such as W or N in the first three positions of your gene-specific primer. If such bases are present, you must include extra entries in the indexfordemux.txt table — one for each possible base (e.g., for W, one entry with A and one with T). Each affected sample should appear twice in the samplesheet, and the output files will need to be merged downstream.
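To see how many extra entries a mixed base requires, the sketch below expands a short sequence containing IUPAC ambiguity codes into every concrete sequence it can represent. The IUPAC mapping is standard. `expand_mixed_bases` is a hypothetical helper, and the three-base inputs are illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_mixed_bases(seq: str) -> list[str]:
    """All unambiguous sequences matching `seq`, in a stable order."""
    choices = (IUPAC[base] for base in seq.upper())
    return ["".join(combo) for combo in product(*choices)]


# A W in the second position needs two index entries, an N would need four.
print(expand_mixed_bases("AWA"))
print(len(expand_mixed_bases("NGA")))
```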
Updating Length Parameters
If your custom primers change the index or primer lengths, update config.yaml:
| Parameter | Description | Default |
|---|---|---|
| lenR1index | Longest read 1 index length | 7 |
| lenR2index | Longest read 2 index length | 7 |
| lenR1primer | Gene-specific primer + spacer length (read 1) | 23 |
| lenR2primer | Gene-specific primer + spacer length (read 2) | 24 |
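One way to work out the four values for a custom design is to compute them from the sequences themselves. The sketch below follows the descriptions in the table. `length_parameters` is a hypothetical helper, and the N-filled primers are placeholders whose lengths were chosen only so that, with the 4-base spacers seen in the default V4 index table, the result matches the defaults. They are not the real primer sequences.

```python
def length_parameters(r1_indexes, r2_indexes,
                      r1_spacer, r1_gene_primer,
                      r2_spacer, r2_gene_primer):
    """Compute the four config.yaml length values from primer components."""
    return {
        "lenR1index": max(len(i) for i in r1_indexes),
        "lenR2index": max(len(i) for i in r2_indexes),
        "lenR1primer": len(r1_spacer) + len(r1_gene_primer),
        "lenR2primer": len(r2_spacer) + len(r2_gene_primer),
    }


# Index bases per phase from the phase table; primers are placeholders.
params = length_parameters(
    r1_indexes=["", "T", "GG", "ACT", "TAAC", "CAGTC", "ATCGAT", "GCAAGTC"],
    r2_indexes=["ATGGACT", "GCTAGC", "TGACT", "CGGT", "GTA", "AA", "C", ""],
    r1_spacer="cagt", r1_gene_primer="N" * 19,  # placeholder, not a real primer
    r2_spacer="atcc", r2_gene_primer="N" * 20,  # placeholder, not a real primer
)
for name, value in params.items():
    print(f"{name}: {value}")
```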
Note
For all pre-built 16S index sets (V1–V2 through V7–V9), the default length values are correct and do not need to be changed.


