10  Week 3: Exercises

10.1 In-class

  1. Pull the samtools Docker container:
module load Apptainer/1.1.6
apptainer pull docker://biocontainers/samtools:v1.9-4-deb_cv1
  1. Open a shell in the Docker container while in the bash_for_bio/ directory:
apptainer shell --shell /bin/bash docker://biocontainers/samtools:v1.9-4-deb_cv1

Try accessing files:

ls -lh .

Listing the current directory works, but other directories, such as /fh/fast/, are not visible inside the container. Try listing the /fh/fast/ directory:

ls -l /fh/fast/

That didn’t work, so exit the shell:

exit
  1. Bind your directory so that the Docker container can see it:
apptainer shell --bind /fh/fast:/fast docker://biocontainers/samtools:v1.9-4-deb_cv1

Try accessing files at the /fast bind path (the bind mounts /fh/fast at /fast inside the container, so use the absolute path):

ls -l /fast/
exit
  1. Try running the command using apptainer exec:
apptainer exec --bind /home/tladera2/bash_for_bio:/bash_for_bio \
 docker://biocontainers/samtools:v1.9-4-deb_cv1 samtools \
 view -c /bash_for_bio/data/MOLM13_combined_final.sam
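
A bind of the form HOST_DIR:CONTAINER_DIR means that a file under HOST_DIR is reached inside the container by swapping that prefix for CONTAINER_DIR. The following is a minimal sketch of that translation in plain shell, useful for checking which path to pass to a command run inside the container. The helper name container_path is hypothetical (it is not part of Apptainer), and the sketch assumes the two-part bind form only, without trailing options such as :ro.

```shell
#!/bin/bash
# container_path is a hypothetical helper for illustration only.
# Usage: container_path HOST_DIR:CONTAINER_DIR HOST_PATH
# Prints the path at which HOST_PATH is visible inside the container.
container_path() {
  local bind_spec="$1" host_path="$2"
  local host_dir="${bind_spec%%:*}"      # text before the first colon
  local container_dir="${bind_spec#*:}"  # text after the first colon
  host_dir="${host_dir%/}"               # drop one trailing slash, if any
  container_dir="${container_dir%/}"
  case "$host_path" in
    "$host_dir")
      printf '%s\n' "$container_dir" ;;
    "$host_dir"/*)
      # Swap the host prefix for the container prefix.
      printf '%s\n' "${container_dir}${host_path#"$host_dir"}" ;;
    *)
      printf 'not under bind: %s\n' "$host_path" >&2
      return 1 ;;
  esac
}

container_path /home/tladera2/bash_for_bio:/bash_for_bio \
  /home/tladera2/bash_for_bio/data/MOLM13_combined_final.sam
```

With the bind used in step 4, this prints /bash_for_bio/data/MOLM13_combined_final.sam, which is the path given to samtools view in that command.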

10.2 Homework

  1. Adapt the for loop in this script to use apptainer exec. You can use an ubuntu container for this (docker://ubuntu:resolute-20260413). Pull the container before you run the code.
#!/bin/bash
for file in ./data/*.fastq
do
  wc "$file"
done
  1. Modify run_bwa.sh in week3/ to use apptainer for bwa (Use docker://biocontainers/bwa:v0.7.17_cv1).

Hint 1: You will need to load Apptainer in the script, and use apptainer exec to run bwa mem.

Hint 2: To make things easier, pull the bwa container first.

Hint 3: Mount the index file’s directory (/shared/biodata/reference/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/) using --bind. You’ll also have to change the ref_fasta_local variable so that it points to genome.fa at the bind path inside the container.

#!/bin/bash
module load BWA/0.7.17-GCCcore-11.2.0
input_fastq=${1}
# strip path and suffix
base_file_name="${input_fastq%.fastq}"
base_file_name=${base_file_name##*/}
echo "running $input_fastq"
sample_name="SM:${base_file_name}"
read_group_id="ID:${base_file_name}"
platform_info="PL:Illumina"
ref_fasta_local="/shared/biodata/reference/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa"

bwa mem \
      -p -v 3 -M \
      -R "@RG\t${read_group_id}\t${sample_name}\t${platform_info}" \
      "${ref_fasta_local}" "${input_fastq}" > \
      "${base_file_name}.sam"

module purge

Run the run_bwa.sh script on one of the files to ensure that it works.

Try using week3/run_sbatch.sh on the files in the data/ directory. Were there any modifications you needed to make to run_sbatch.sh?
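
Before submitting many files, it can help to check the filename handling in run_bwa.sh on its own, without running bwa at all. The sketch below repeats the script's two parameter expansions in a hypothetical helper named fastq_base_name and prints the output file name and read group string that would result. The input path data/sample_1.fastq is made up for illustration.

```shell
#!/bin/bash
# fastq_base_name is a hypothetical helper, not part of run_bwa.sh;
# it repeats the two parameter expansions that the script uses.
fastq_base_name() {
  local name="${1%.fastq}"   # drop the .fastq suffix
  name="${name##*/}"         # drop everything up to the last slash
  printf '%s\n' "$name"
}

# data/sample_1.fastq is a made-up path for illustration.
base_file_name="$(fastq_base_name data/sample_1.fastq)"

# Name of the SAM file the script would write.
printf '%s\n' "${base_file_name}.sam"

# Read group string as passed to bwa mem -R; the \t sequences stay
# literal here and are turned into tabs by bwa itself.
printf '%s\n' "@RG\tID:${base_file_name}\tSM:${base_file_name}\tPL:Illumina"
```

For the made-up path this prints sample_1.sam followed by @RG\tID:sample_1\tSM:sample_1\tPL:Illumina. Note that the output file name has no directory part, so the .sam file is written to whatever directory the script is run from.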