Friday, March 15, 2013

Listing files containing all the words

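To find files that contain all the words in a set, you can use the awk script below. Here I am searching for files containing three words (word1, word2 and word3). The script outputs the name of every file that contains all three words. Note that each word is matched as a regular expression anywhere in a line, so a word that appears inside a longer word also counts.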

 find . -type f -exec awk 'BEGIN{word1=0;word2=0;word3=0}/word1/{word1++}/word2/{word2++}/word3/{word3++}END{if(word1>0 && word2>0 && word3>0){print FILENAME}}' {} \;
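For comparison, here is a minimal Python sketch of the same idea. It is not the script from the reference; the function name files_containing_all is made up for illustration, and it uses plain substring matching rather than awk's regular-expression matching.

```python
import os


def files_containing_all(root, words):
    """Yield paths under root whose contents include every word in words."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets us scan files that are not valid text
                with open(path, "r", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            # Plain substring test, so "word1" also matches "word10"
            if all(word in text for word in words):
                yield path
```

Calling list(files_containing_all(".", ["word1", "word2", "word3"])) from the directory of interest should list the same files as the find command, as long as the words contain no regular-expression metacharacters.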

Reference:
1. http://www.linuxquestions.org/questions/linux-newbie-8/grep-an-entire-file-but-must-contain-multiple-words-705681/

Wednesday, March 13, 2013

Automatic/periodic FTP download using cron jobs

I came across a situation where I had to download files from an FTP server every week. Initially I was doing it manually, but due to human errors I missed some data. I then realized it must be possible to automate the download.

I first wrote a shell script to perform the FTP download [2]. The script looks like:

#!/bin/bash
HOST='ftp.server.com'   # change the host name or IP address accordingly
USER='username'         # change the username
PASSWD='password'       # change the password
# -i: no prompt for each file in mget, -n: no auto-login, -v: verbose output
ftp -inv "$HOST" <<EOF
quote USER $USER
quote PASS $PASSWD
bin
cd /move/to/remote/directory
lcd "/local/directory/"
mget filename*
cd /move/to/remote/directory2
lcd "/local/directory2/"
mget filename*
bye
EOF
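The same download can be sketched in Python with the standard ftplib module. This is only an illustration under some assumptions: the helper names matching_names and download_matching are hypothetical, the host, credentials and directories are the placeholders from the shell script, and the server is assumed to return bare file names from NLST.

```python
import fnmatch
import os
from ftplib import FTP


def matching_names(names, pattern):
    """Return the names that match a shell-style pattern such as 'filename*'."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def download_matching(ftp, remote_dir, local_dir, pattern):
    """Download every file in remote_dir that matches pattern into local_dir."""
    ftp.cwd(remote_dir)
    for name in matching_names(ftp.nlst(), pattern):
        with open(os.path.join(local_dir, name), "wb") as out:
            # Binary transfer, the equivalent of 'bin' followed by 'mget'
            ftp.retrbinary("RETR " + name, out.write)


def main():
    # Placeholder host, credentials and paths: change them as in the shell script
    with FTP("ftp.server.com") as ftp:
        ftp.login("username", "password")
        download_matching(ftp, "/move/to/remote/directory", "/local/directory/", "filename*")
        download_matching(ftp, "/move/to/remote/directory2", "/local/directory2/", "filename*")
```

Calling main() performs the transfer. Keeping the pattern matching in its own function makes it easy to check which files would be fetched without contacting the server.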

Following [1], I set up a cron job with the command:
crontab -e

The job entry format is pretty self-explanatory in the reference [1], and there are some commonly used job examples too.
I had to launch a job at the beginning of every week, so my entry in the file looks like:
0 10 * * 1 ~/ftp_download_script.sh
This line states that the script should be launched at 10 am every Monday. The script must be executable (chmod +x) for cron to run it this way.
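To make the layout of the entry explicit, here is a small Python sketch that splits it into its five schedule fields and the command. The function name split_cron_entry is made up for illustration, and the sketch only labels the fields; it does not validate or evaluate them.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_entry(entry):
    """Split a crontab line into its five schedule fields and the command."""
    parts = entry.split(None, 5)  # at most five splits: the command keeps its spaces
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


schedule, command = split_cron_entry("0 10 * * 1 ~/ftp_download_script.sh")
print(schedule)
print(command)
```

Read field by field, the entry means minute 0, hour 10, any day of the month, any month, and day of the week 1, which is Monday.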

References:
1. http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
2. https://blogs.oracle.com/SanthoshK/entry/automate_ftp_download_using_sh

Sunday, March 10, 2013

Python - Value Error Unsupported Format Character

I had a simple piece of Python code working with URLs that raised an error of the form: "ValueError: unsupported format character 'p' (0x70) at index 72".

The code looks like:
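num = 1
abc = "http://<site>?value=[abc%20pqr],value2=%d"
URL = abc % num

The error occurs because Python treats every % sign in the string as the start of a format specifier: here %20p is read as a field width of 20 followed by the conversion character 'p', which does not exist. To keep a literal % sign in the string, escape it with another % sign.
So the new string looks like:
abc = "http://<site>?value=[abc%%20pqr],value2=%d"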
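The following sketch reproduces the error and shows the escaped string next to two alternatives that avoid the % operator. The URL http://example.com/ and the value "abc pqr" are placeholders chosen for illustration, not the original site or data.

```python
from urllib.parse import quote

num = 1

# Reproduce the problem: %20p is parsed as width 20 plus the unknown type 'p'
try:
    "http://example.com/?value=[abc%20pqr],value2=%d" % num
except ValueError as err:
    print("ValueError:", err)

# Option 1: double every literal % so that only %d is treated as a placeholder
escaped = "http://example.com/?value=[abc%%20pqr],value2=%d" % num

# Option 2: str.format uses {} placeholders, so % needs no escaping
formatted = "http://example.com/?value=[abc%20pqr],value2={}".format(num)

# Option 3: let quote() produce the percent-encoding instead of typing it
built = "http://example.com/?value=[{}],value2={}".format(quote("abc pqr"), num)

print(escaped)
```

All three options build the same URL. The second and third may be easier to maintain when the URL already contains many percent-encoded characters.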

References:
1. http://yuji.wordpress.com/2009/01/09/python-valueerror-unsupported-format-character-percent-sign-python-format-string/