4 Week 2 Exercises

4.1 In-class exercises

We can run the script on a file (assuming we are in bash_for_bio/):

./scripts/week2/sam_count.sh ./data/my_file.bam

Try out the above script on ./data/CALU1_combined_final.sam.

Make sure you are in the top folder (bash_for_bio) when you do it.
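
The relative paths above (./scripts/... and ./data/...) only resolve when your working directory is bash_for_bio. Here is a minimal sketch of a check you could run first. The function name in_project_root is made up for illustration and is not part of the course scripts.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the directory's name is bash_for_bio.
# With no argument it checks the current working directory.
in_project_root() {
    local dir="${1:-$PWD}"
    [ "$(basename "$dir")" = "bash_for_bio" ]
}

if in_project_root; then
    echo "OK: in bash_for_bio, so ./data/ and ./scripts/ will resolve"
else
    echo "Not in bash_for_bio (currently in $PWD); cd there first"
fi
```

Running pwd by hand gives you the same information; the function is just a way to make the check repeatable.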

Run run_bwa.sh using:

./scripts/week2/run_bwa.sh ./data/CALU1_combined_final.fasta

Run process_data.R:

module load fhR
Rscript process_data.R input_file="LUSC_clinical.csv"
module purge

Run process_data.py:

module load fhPython
python3 process_file.py lusc_file.csv
module purge

4.2 Homework Exercises

All exercises are required for the badge. Where possible, please paste your code in the grey boxes and the output below that. (If the output is long, the first few lines are fine.) Make sure to answer the questions.

Copy the script below into a file called samtools_count.sh. What does the script do?

#!/bin/bash
module load SAMtools
samtools view -c "$1"

How would we modify the above script to redirect the output to ${1}.counts.txt?
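
As a hint for the redirection pattern, here is a rough sketch that uses wc -l as a stand-in for samtools view -c, so it runs without loading SAMtools. The function name count_to_file and the file demo_reads.txt are assumptions made for the demo.

```shell
#!/bin/bash
# Sketch only: `wc -l` stands in for `samtools view -c`.
count_to_file() {
    local input="$1"
    # `>` sends stdout to a file named after the input, overwriting it if present.
    # Quoting "${input}" keeps paths that contain spaces in one piece.
    wc -l < "$input" > "${input}.counts.txt"
}

# Demo on a throwaway three-line file.
printf 'read1\nread2\nread3\n' > demo_reads.txt
count_to_file demo_reads.txt
cat demo_reads.txt.counts.txt
```

The braces in ${1} (or ${input} here) mark where the variable name ends, which matters when text such as .counts.txt follows it directly.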

Make sure you can get scripts/week2/run_bwa.sh to run on rhino. Run it on data/MOLM13_combined_final.fastq.

When it’s successful, run head on the MOLM13_combined_final.fastq output, and paste your command and the output below.

Modify scripts/week2/run_bwa.sh to:
1. Take an additional argument, a folder path
2. Save the SAM file to this folder path
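
One possible way to approach the two steps above is to treat the folder as a second positional argument ($2) and build the output path from it. The sketch below does not call bwa, and it does not assume anything about how run_bwa.sh names its output; the helper names build_output_path and check_args, and the file sample.fastq, are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the path of a SAM file inside an output folder.
#   $1 = input file (for example a FASTQ), $2 = output folder
build_output_path() {
    local input_file="$1"
    local out_dir="$2"
    local base
    base="$(basename "$input_file")"   # strip the directory part
    base="${base%.*}"                  # strip the last extension
    echo "${out_dir%/}/${base}.sam"    # tolerate a trailing slash on the folder
}

# Hypothetical helper: fail with a usage message unless exactly two arguments are given.
check_args() {
    [ "$#" -eq 2 ] || { echo "usage: script.sh <input_file> <output_folder>" >&2; return 1; }
}

check_args ./data/sample.fastq results/ && build_output_path ./data/sample.fastq results/
```

In a real script you would probably also run mkdir -p on the folder before writing to it, so the script does not fail when the folder is missing.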

Hint 1: I recommend that you copy run_bwa.sh into a new script and work from there. Put your code into the codeblock below:

Write an example command that runs your new version of the script below, then run it:

For question 4, pick one language to answer.

4R. (R) Modify the R script below and save it to a file called scripts/week2/r_csv_script.R. It should also take an argument called $FILEPATH and pass it to read_csv():

library(tidyverse)
csv_file <- read_csv("myfile.csv")
summary(csv_file)

How would you run this on the command line?

module load fhR
-----
module purge

How would you redirect the output of your script to a file?

4Py. (py) Modify the Python script below to be runnable and save it to scripts/week2/py_csv_script.py. Your new version should also take the first positional argument (a file path) and process the file:

import pandas as pd
csv_file = pd.read_csv("my_file.csv")
csv_file.describe()

How would you run this on the command line?

module load fhPython
-----
module purge

How would you redirect the output of your script to a file?
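
Whichever language you pick, it helps to remember that a script writes to two streams: standard output (results) and standard error (messages, such as the startup messages R prints when loading a package). The sketch below uses a made-up shell function, noisy_script, as a stand-in for an Rscript or python3 call, so the file names and the function are assumptions for illustration only.

```shell
#!/bin/bash
# Stand-in for an Rscript or python3 call: one line to stdout, one to stderr.
noisy_script() {
    echo "summary table"
    echo "startup message" >&2
}

noisy_script > summary.txt 2> messages.log   # stdout and stderr into separate files
noisy_script > combined.txt 2>&1             # both streams into one file
noisy_script >> summary.txt 2> /dev/null     # append stdout, discard stderr
```

Note that > overwrites the target file while >> appends to it, so summary.txt ends up with two lines here.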