Working with Files
Overview
Teaching: 0 min
Exercises: 0 min
Questions
Key question
Objectives
Use the wild card shortcut
Use the history command to view recently used commands
View, copy, move, and create files
Our data set: FASTQ files
We did an experiment and want to look at sequencing results. We want to be able to look at these files and do some things with them.
Wild cards
Navigate to the ~/dc_sample_data/untrimmed_fastq directory. This
directory contains our FASTQ files.
The * character is a shortcut for “everything”. Thus, if
you enter ls *, you will see all of the contents of a given
directory.
Now try this command:
ls *fastq
This lists every file whose name ends with fastq.
This command:
ls /usr/bin/*.sh
lists every file in /usr/bin that ends in the characters .sh. And this command:
ls *977.fastq
lists only the file that ends with 977.fastq.
So how does this actually work? Well…when the shell (bash) sees a
word that contains the * character, it automatically looks for filenames
that match the given pattern.
We can use the command echo to see wildcards as they are interpreted by the shell.
echo *.fastq
SRR097977.fastq SRR098026.fastq
The * is expanded to include any file that ends with .fastq.
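To make the expansion step concrete, here is a minimal sketch that runs in a throwaway directory. The file names are empty stand-ins created only for the demonstration, and the variable names (demo_dir, expanded, literal, unmatched) are made up for this example. It assumes bash in its default configuration, where a pattern that matches nothing is passed through unchanged.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Empty stand-in files (hypothetical, for illustration only).
touch SRR097977.fastq SRR098026.fastq notes.txt

# Unquoted: the shell replaces the pattern with matching names before echo runs.
expanded=$(echo *.fastq)
echo "$expanded"

# Quoted: the pattern reaches echo unchanged, so no file names appear.
literal=$(echo '*.fastq')
echo "$literal"

# No match: with default settings the pattern is left as typed.
unmatched=$(echo *.bam)
echo "$unmatched"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```

The point to take away is that echo, ls, and every other program never see the * themselves. They receive the list of names that the shell has already filled in.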
Exercise
Do each of the following tasks from your current directory using a single ls command.
- List all of the files in /bin that start with the letter ‘c’
- List all of the files in /bin that contain the letter ‘a’
- List all of the files in /bin that end with the letter ‘o’
BONUS: List all of the files in ‘/bin’ that contain the letter ‘a’ or the letter ‘c’
HINT: This requires a Unix wildcard that we haven’t talked about yet. Try searching the internet for information about Unix wildcards to find what you need to solve the bonus problem.
Command History
You can easily access previous commands. Hit the up arrow. Hit it again. You can step backwards through your command history. The down arrow takes you forward in the command history.
^-C will cancel the command you are writing, and give you a fresh prompt.
^-R will do a reverse-search through your command history. This
is very useful.
You can also review your recent commands with the history command. Just enter:
history
to see a numbered list of recent commands, including the history
command you just issued. You can reuse one of these commands directly by
referring to the number of that command.
If your history looked like this:
259 ls *
260 ls /usr/bin/*.sh
261 ls *R1*fastq
then you could repeat command #260 by simply entering:
!260
(that’s an exclamation mark). You will be glad you learned this when you try to re-run very complicated commands.
Exercise
Find the line number in your history for the command that listed all the files in /bin.
Examining Files
We now know how to switch directories, run programs, and look at the contents of directories, but how do we look at the contents of files?
The easiest way to examine a file is to just print out all of the
contents using the program cat.
Enter the following command:
cat SRR098026.fastq
This prints out all the contents of the SRR098026.fastq file to the screen.
Exercises
- Print out the contents of the ~/dc_sample_data/untrimmed_fastq/SRR097977.fastq file. What does this file contain?
- From your home directory, without changing directories, use one short command to print the contents of all of the files in the /home/dcuser/dc_sample_data/untrimmed_fastq directory.
cd ~/dc_sample_data/untrimmed_fastq
cat is a terrific program, but when the file is really big, it can
be annoying to use because the whole file scrolls past at once. The
program less is useful in this case.
Enter the following command:
less SRR098026.fastq
less opens the file as read only, and lets you navigate through it. The navigation commands
are identical to those of the man program.
Some navigation commands in less
| key | action |
|---|---|
| “space” | to go forward |
| “b” | to go backwards |
| “g” | to go to the beginning |
| “G” | to go to the end |
| “q” | to quit |
less also gives you a way of searching through files. Just hit the
“/” key to begin a search. Enter the word you would like
to search for and press “enter”. The screen will jump to the next location where
that word is found. Try searching the dictionary.txt file for the
word “cat”.
Shortcut: If you hit “/” then “enter”, less will repeat
the previous search. less searches from the current location and
works its way forward. Note, if you are at the end of the file and search
for the word “cat”, less will not find it. You need to go to the
beginning of the file and search.
For instance, let’s search the file we have open for the sequence GTGCGGGCAATTAACAGGGGTTCAC.
You can see that we go right to that sequence and can see
what it looks like.
Remember, the man program actually uses less internally and
therefore uses the same commands, so you can search documentation
using “/” as well!
There’s another way that we can look at files, and in this case, just look at part of them. This can be particularly useful if we just want to see the beginning or end of the file, or see how it’s formatted.
The commands are head and tail and they let you look at
the beginning and end of a file, respectively.
head SRR098026.fastq
tail SRR098026.fastq
The -n option to either of these commands can be used to print the
first or last n lines of a file. To print the first/last line of the
file use:
head -n 1 SRR098026.fastq
tail -n 1 SRR098026.fastq
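Because FASTQ stores each read as a block of four lines (an identifier line starting with @, the sequence, a + separator line, and the quality string), asking for four lines is a handy way to see one complete record. The sketch below assumes a tiny invented file, example.fastq, with two made-up reads. It is not taken from the lesson data, and the variable names are hypothetical.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Two invented reads (hypothetical identifiers and sequences), four lines each.
cat > example.fastq <<'EOF'
@read1
GATTACA
+
IIIIIII
@read2
TTGCAAC
+
#######
EOF

# The first four lines make up the first record.
first_record=$(head -n 4 example.fastq)
echo "$first_record"

# The last four lines make up the last record.
last_record=$(tail -n 4 example.fastq)
echo "$last_record"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```

On the real files, the same idea would be head -n 4 SRR098026.fastq.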
Creating, moving, copying, and removing
Now we can move around in the file structure, look at files, search files, and redirect output. But what if we want to do everyday things like copying files, moving them around, or getting rid of them? Sure, we could do most of these things without the command line, but what fun would that be?! Besides, it’s often faster to do it at the command line, or you’ll be on a remote server like Amazon where you won’t have another option.
Copying
Our raw data in this case is fastq files. We don’t want to change the original files, so let’s make a copy to work with.
Let’s copy the file using the cp command. The cp
command makes a second, independent copy of the file, which serves as our backup.
Navigate to the data directory and enter:
cp SRR098026.fastq SRR098026-copy.fastq
ls -F
SRR097977.fastq SRR098026-copy.fastq SRR098026.fastq
Now SRR098026-copy.fastq has been created as a copy of SRR098026.fastq
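If you want to convince yourself that the copy really is a separate, identical file, one option is the cmp command, which compares two files byte by byte. This is a small sketch in a throwaway directory. The file content is invented, and the variable names are hypothetical.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Invented stand-in for the real FASTQ file.
printf '@read1\nGATTACA\n+\nIIIIIII\n' > SRR098026.fastq

cp SRR098026.fastq SRR098026-copy.fastq

# cmp -s prints nothing and exits with status 0 when the files are identical.
if cmp -s SRR098026.fastq SRR098026-copy.fastq; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "$copy_status"

# The copy is independent: appending to it leaves the original unchanged.
echo "extra line" >> SRR098026-copy.fastq
original_lines=$(wc -l < SRR098026.fastq | tr -d ' ')
echo "$original_lines"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```

Note that cp overwrites an existing destination file without asking. The -i option makes cp prompt before overwriting.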
Let’s make a backup directory where we can put this file.
Creating Directories
The mkdir command is used to make a directory. Just enter mkdir
followed by a space, then the directory name.
mkdir backup
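Running mkdir on a name that already exists is an error. The -p option is a common way around that: it accepts an existing directory and also creates any missing parent directories. The sketch below uses a throwaway directory, and the nested layout backup/2017/raw is a hypothetical example, not part of the lesson data.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

mkdir backup

# A second plain mkdir on the same name fails because the directory exists.
if mkdir backup 2>/dev/null; then
    second_try="succeeded"
else
    second_try="failed"
fi
echo "$second_try"

# With -p, an existing directory is fine and missing parents are created.
mkdir -p backup
mkdir -p backup/2017/raw
if [ -d backup/2017/raw ]; then
    nested="created"
else
    nested="missing"
fi
echo "$nested"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```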
Moving / Renaming
We can now move our backed up file into this directory. We can
move files around using the command mv.
Enter this command:
mv *-copy.fastq backup
ls -al backup
total 52
drwxrwxr-x 2 dcuser dcuser 4096 Jul 30 15:31 .
drwxr-xr-x 3 dcuser dcuser 4096 Jul 30 15:31 ..
-rw-r--r-- 1 dcuser dcuser 43421 Jul 30 15:28 SRR098026-copy.fastq
The mv command is also how you rename files. Since this file is so
important, let’s rename it!
Type:
cd backup
mv SRR098026-copy.fastq SRR098026-copy.fastq_DO_NOT_TOUCH!
ls
SRR098026-copy.fastq_DO_NOT_TOUCH!
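The two uses of mv differ only in what the last argument is. If it is an existing directory, the file moves into it and keeps its name. Otherwise the last argument is taken as the new name. Here is a minimal sketch in a throwaway directory. The new name SRR098026-backup.fastq and the variable names are hypothetical.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Empty stand-in file and a destination directory.
touch SRR098026-copy.fastq
mkdir backup

# Last argument is an existing directory: the file moves there, name unchanged.
mv SRR098026-copy.fastq backup
after_move=$(ls backup)
echo "$after_move"

# Last argument is a new file name: the file is renamed.
mv backup/SRR098026-copy.fastq backup/SRR098026-backup.fastq
after_rename=$(ls backup)
echo "$after_rename"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```

Like cp, mv replaces an existing destination file without warning unless you add the -i option. File names containing ! can also collide with the history shortcut described earlier, so wrapping such a name in single quotes is a safe habit.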
Removing
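Finally, we decided this was silly and want to start over. Because we are still inside the backup directory, first move back up one level with cd .. so that the path backup/SRR* resolves.
Type:
cd ..
rm backup/SRR*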
The rm command permanently removes the file. Be careful with this command. It doesn’t
just nicely put the files in the Trash. They’re really gone.
By default, rm will NOT delete directories. You can tell rm to
delete a directory using the -r option. Let’s delete that new directory
we just made.
Enter the following command:
rm -r backup
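Since rm cannot be undone, it is worth previewing what a wildcard pattern matches before handing it to rm. The sketch below does this with echo in a throwaway directory, and also shows rm refusing a directory when -r is missing. The file is an empty stand-in, and the variable names are hypothetical.

```shell
# Work in a throwaway directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

mkdir backup
touch backup/SRR098026-copy.fastq

# Without -r, rm refuses to remove a directory and leaves it in place.
if rm backup 2>/dev/null; then
    plain_rm="removed"
else
    plain_rm="refused"
fi
echo "$plain_rm"

# Preview what the pattern expands to before deleting anything.
matches=$(echo backup/SRR*)
echo "$matches"

# -r removes the directory together with everything inside it.
rm -r backup
if [ -e backup ]; then
    after_rm_r="still there"
else
    after_rm_r="gone"
fi
echo "$after_rm_r"

# Clean up the throwaway directory.
cd /
rm -r "$demo_dir"
```

For extra caution, rm -i asks for confirmation before each removal.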
Exercise
Do the following:
- Create a backup of your SRR097977.fastq file in the directory containing the original file.
- Move the backup copy to the backup directory.
- Rename the backup copy of your file.
Key Points
First key point.