Category Archives: hadoop

S3 rename in batch

# rename.sh
# This example moves all files in folder1 up to the root directory; you can modify the bucket name and the regex to rename the files.
for f in $(aws s3 ls --recursive s3://bucket1/folder1/ | awk -F' ' '{print $4}'); do …
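A fuller version of the loop might look like the sketch below. It assumes a configured AWS CLI and that each file is moved with `aws s3 mv`. The helper names `keys_from_ls` and `new_key`, plus the `DRY_RUN` switch, are hypothetical additions for illustration. Two details differ from the one-liner above: `sed` strips the date, time and size columns instead of using `awk '{print $4}'`, so keys containing spaces stay intact, and by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of rename.sh: move every object under s3://bucket1/folder1/
# up to the bucket root. Assumes a configured AWS CLI and `aws s3 mv` as the
# loop body; BUCKET, PREFIX, DRY_RUN and the helper names are illustrative.
set -euo pipefail

BUCKET="bucket1"
PREFIX="folder1/"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the mv commands, 0 = actually move

# `aws s3 ls --recursive` prints "date time size key"; strip the first three
# columns with sed so keys containing spaces survive intact.
keys_from_ls() {
  sed -E 's/^[^ ]+ +[^ ]+ +[0-9]+ //'
}

# Destination key: drop the leading prefix. Edit this to apply your own rename rule.
new_key() {
  printf '%s\n' "${1#"$PREFIX"}"
}

rename_all() {
  aws s3 ls --recursive "s3://$BUCKET/$PREFIX" | keys_from_ls |
    while IFS= read -r key; do
      dest="$(new_key "$key")"
      [ -n "$dest" ] || continue   # skip a bare "folder1/" placeholder object, if any
      if [ "$DRY_RUN" = "1" ]; then
        echo aws s3 mv "s3://$BUCKET/$key" "s3://$BUCKET/$dest"
      else
        aws s3 mv "s3://$BUCKET/$key" "s3://$BUCKET/$dest"
      fi
    done
}

# Only touch S3 when explicitly asked: ./rename.sh --run
if [ "${1:-}" = "--run" ]; then
  rename_all
fi
```

Run `./rename.sh --run` first to review the printed commands, then `DRY_RUN=0 ./rename.sh --run` to perform the moves.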

Posted in hadoop, linux

Cloudera hadoop cluster setup on Rackspace

1. Start the master server (Ubuntu in this example) and add a username/group.
2. Install Java 1.6 (Sun JDK).
3. Install CDH3 (namenode, secondarynamenode, jobtracker, datanode, tasktracker, pig, …).
4. Configure CDH3:
   sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
   sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf …
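Configuring a cluster in step 4 typically also means editing the XML files in the new conf directory. The following is a minimal sketch, not the post's actual configuration. It assumes a single master host named `master`, with port 8020 for the namenode and 8021 for the jobtracker. It uses the Hadoop 0.20 property names `fs.default.name` and `mapred.job.tracker`. The function name `write_cluster_conf` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch for step 4: write a minimal core-site.xml and
# mapred-site.xml into a conf directory such as /etc/hadoop-0.20/conf.cluster.
# The hostname and ports 8020/8021 are assumptions; adjust for your cluster.
set -euo pipefail

write_cluster_conf() {
  local conf_dir="$1"   # e.g. /etc/hadoop-0.20/conf.cluster
  local master="$2"     # hostname running the namenode and jobtracker

  # core-site.xml: tells every node where the HDFS namenode lives
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master:8020</value>
  </property>
</configuration>
EOF

  # mapred-site.xml: tells tasktrackers and clients where the jobtracker lives
  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$master:8021</value>
  </property>
</configuration>
EOF
}

# Usage: sudo ./write_conf.sh /etc/hadoop-0.20/conf.cluster master
if [ "$#" -eq 2 ]; then
  write_cluster_conf "$1" "$2"
fi
```

The same two files would need to be present on every node so that datanodes and tasktrackers can reach the master.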

Posted in hadoop