In the course of doing various projects at work I’ve occasionally needed to do some tasks more advanced than just moving files in the Linux environment. Some, for example, end up needing to run reports, or script something to run automatically. Obviously any time you have a repettitive task you would be best off to automate it to the best of your ability.
This tutorial / explanation is a bit long, so I cut it onto a separate page…
This afternoon I was pulling running some reports for work. In order to do this I have a number of scripts which set up the report files and save the reports to a particular directory. Let me just point something out, we use Oracle, so that means PL/SQL, so my SQL comments may be specific to that environment. When using the PL/SQL shell from the command line there are a number of limitations you won’t find in Aqua Data Studio, but also some parts that give you more flexibility. For reporting I need to end up with a line with no whitespace at the end, but I also don’t want a record to be on more than one line.
The end result is that each report shows this basic structure, I’m going to comment in SQL style comments, “–”:
set lin 500 –I know that my results won’t be more than 500 characters long. “set lin NUMBER” sets the line size that PL/SQL will operate on.
set head off –This turns the column headers off, otherwise they’d appear every 60 some odd rows, which is a problem for people processing the reports in excel.
spool filename.txt –this starts collecting the output of the query I run and saving it to the specified file
select field1||’,'||field2||’,'||field3 from viewortablename where whereclause
spool off –turns off the spooling.
The reason we have the fields separated by ||’,'|| is so that the fields are all separated by a comma. This makes it possible for excel or other spreadsheet program to import the files. In reality, we tend to use ||’”|”‘|| because the “|” (pipe) delimiter is more unique, and will provide more consistent results. The extra quote function to go around the fields so a row out end up looking like:
“result field one”|”result field two”|”result field three”
This leaves us with another problem. The line length is 500, but that string above only contains 58 characters, including whitespace. One of the pitfalls of using “spool” is that it takes the line length and writes each line to that length. So for the example above we end up with 442 empty spaces on the line before the line breaks and continues on.
In the past I used a rather inefficient method to fix this. I would open all of the files at once in VI and do a find and replace, then write and move on to the next file. For this particular find and replace I would enter command mode in VI and do this:
:%s/ //g
In case you are wondering how this breaks down, the colon (:) starts the command portion, then you can type a command. The percent sign tells it to look at all the lines in the file, without it you could put a particular line number or without a line number it would only look at the line you are working with. The lower case “s” starts a search and replace function. Between the first two slashes is the string to search for, the second area is the replace string. In this case I want to replace two spaces with no spaces, so it looks like / //. The basic syntax is /searchstring/replacestring/. The g at the end tells VI to do the search and replace globally on the line. Otherwise it would replace a single instance on each line….not very effective when you would need to do it another 221 times for the above example.
With reports there is a scope of about 100+ files that needed this hand holding. I definitely wasn’t going about it in the most efficient way possible, as even with the fastest copy and pasting on the planet, it still took 5-10 minutes to get through all the files. Thus I thought there must be a way I can script something to automate this and make my life a lot easier and cause my Ctrl, C and V keys a little less pain from constant depression.
There are a million different ways to skin a cat when it comes to scripting. What I am about to show you could have been done in Perl, Awk, Bash (what I used), or any one of a thousand different ways within each of these languages. Bash is native to linux and is very very simple when it comes down to it, but also very very powerful. One of the nice things about Bash is that you can use linux commands that you could use from the command line.
I decided that, for my purposes, I wanted to take all the files in a directory, remove the extra whitespace, and deposit them in another directory. I also wanted to be able to specify the directory on the command line, because this is something that I can use over and over again and not just with reporting. Originally I thought that I might be able to use the “tr” command, which translates. However that command did not do what I needed it two, its purpose is to translate from one string to another, not find a string and replace it with nothing.
Instead I used the sed command, which is a stream editor. It allowed me to do everything I wanted and also supports regular expressions, which means I could do more complex things without much modification if I needed in the future. Lets take a look at the entire script, then I’ll break that down.
#!/bin/bash
# strip_spaces.sh
# strips extra whitespace
#do not include / on the end of the path
# syntax: strip_spaces.sh sourcedirectorypath destinationdirectorypathsrcpath=$1 #source path for files
destpath=$2 #destination path for filesfor file in $(ls *$srcpath* ); #loop through all files in the srcpath
do
src=$srcpath/$file #define the full path for the source file
dest=$destpath/$file #define the full path for the destination filecat $src | sed -e ’s/ //g’ > $dest #cat the source file into sed, replace the spaces with nothing and write the destination file
done #obviously done, this is the end of the loop.
The first line tells the OS where the bash interpreter is located. Its always a pretty good bet its going to be at /bin/bash. You’d have a line like this if you were righting this in PERL or AWK as well. Then I have a couple of comments….comments are ALWAYS important, even in little files like this, so are descriptive variable names. I also listen in the top comments the syntax for how the script was to be used, so if I ever forget, or someone else wants to use it, its is clear the order and number of arguments needed.
Next I assign my two arguments to variables so I can reference them in the code and not need to keep straight which is which.
Next I want to loop through every file in the source directory. Click here for a listing and explanation of shell scripting control structures like for, while and until type controls. In this case
for file in $(ls *$srcpath*);
do
……
done
We are saying l take every line in “ls *$srcpath*” and loop through it. In the loop that variable will be called “$file.” (In bash you don’t use the $ until after the variable has been defined. To be honest I’m not sure why I need to have the asterisks around $srcpath here…I found the example online and cannibalized/modified it for my own purposes.
The next set of statements defines the full path for the source and the destination files. I didn’t have to do this, but for readability I like to use as few variables as possible where I’m doing the meat of my processing…makes debugging easier. So we define the source and destination files for this particular time through the loop.
The last statement of consequence is:
cat $src | sed -e ’s/ //g’ > $dest
The first part of this line, “cat $src” displays the source file to the screen. Except, when you send it into something else, which we do using the pipe. So we take the output of “cat” and send it into the “sed” command by way of the pipe.
The second part of thhe line, “sed -e ’s/ //g’ should look vaguely familiar from before, or at least the “s/ //g” should. This is just a regular expression, which sed can use. The -e tells sed that I will be sending in an expression, which is started and ended by the single quotations.
The final part of the statement is the “> $dest”. The > says to overwrite $dest with whatever is coming out of the command to the left. In contrast, if we wanted to append to the end of the file we would have used >> instead. Thus the new file is written in the destination directory with instances with two spaces back to back. We might end up with one extra space on the end of the file, if there are an odd number of spaces, but that is an acceptable thing, at least for my purposes. Different regular expressions could handle it differently and leave you with none, but I’m not so great with RegEx…so I try not to extend too far beyond my means when I need to get something done quickly.
Finally, before you run the file you’ll need to make it executable. Do this by changing the permissions, “chmod u+x filename.sh”. This will make the file executable for you only. Also, always make a backup of something before you experiment on it with a script, it will save you loads of time later.
Anyway, I hope you find this at least marginally helpful and informative so that you’ve got a few more tools in your arsenal when the need arises.
0 responses so far ↓
There are no comments yet...Kick things off by filling out the form below.
Leave a Comment