Tag Archives: pig

Pig DBStorage into MySQL on EMR

Install the MySQL JDBC driver (Connector/J):
sudo apt-get install libmysql-java

Pig script:
register /usr/share/java/mysql.jar
STORE results INTO 'test' using org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host_ip/database_name', 'username', 'password', 'INSERT INTO test (a,b,c,d) VALUES(?,?,?,?)');

MySQL: in /etc/mysql/my.cnf, change bind-address to 0.0.0.0 so the server accepts remote connections (e.g. from the EMR nodes), then restart it:
bind-address = 0.0.0.0
sudo /etc/init.d/mysql restart

mysql -u root
INSERT INTO user …
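To illustrate the last DBStorage argument: as far as I understand DBStorage, it runs that INSERT as a prepared statement and binds each tuple's fields to the `?` placeholders by position, so `results` should have exactly four fields, in the order a, b, c, d. This sketch uses Python's built-in sqlite3 in-memory database purely as a stand-in for MySQL (it is not DBStorage and not the MySQL driver), and the rows are hypothetical.

```python
import sqlite3

# Stand-in only: an in-memory SQLite table, not MySQL and not Pig's DBStorage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a TEXT, b INTEGER, c INTEGER, d REAL)")

# Same statement shape as the one passed to DBStorage: one '?' per field.
insert_sql = "INSERT INTO test (a,b,c,d) VALUES(?,?,?,?)"

# Hypothetical tuples standing in for the Pig relation `results`.
# Binding is positional, so field order must match the column list (a,b,c,d).
results = [("x", 1, 10, 0.5), ("y", 2, 20, 1.5)]
conn.executemany(insert_sql, results)
conn.commit()

rows = conn.execute("SELECT a, b, c, d FROM test ORDER BY a").fetchall()
print(rows)

# A tuple with the wrong number of fields is rejected rather than stored.
rejected = False
try:
    conn.execute(insert_sql, ("z", 3, 30))
except sqlite3.ProgrammingError:
    rejected = True
print("3-field tuple rejected:", rejected)
conn.close()
```

In the Pig job the equivalent check is making sure the relation passed to STORE has the same number and order of fields as the INSERT's column list.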

Posted in pig

Pig DISTINCT on a large bag

A DISTINCT on a very large bag is very likely to cause an OOM (out-of-memory) error. An easy trick is to divide and conquer: instead of GROUP ALL followed by a DISTINCT inside the group, do
subgroup = group data by SUBSTRING(field_to_be_distinct, 0, n); -- use n to control the number and size of subgroups
subgroup_cnt = …
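Why splitting by prefix is safe can be shown with a small simulation. This is a minimal Python sketch, not Pig, and it assumes the field is a string and the goal is a distinct count (the name `subgroup_cnt` suggests counting, but the rest of the script is elided). The function `distinct_count_by_prefix` and the sample values are hypothetical.

```python
from collections import defaultdict


def distinct_count_by_prefix(values, n):
    """Count distinct strings by first splitting them into subgroups keyed on
    their first n characters, loosely mirroring
    GROUP data BY SUBSTRING(field_to_be_distinct, 0, n) in the Pig snippet.

    Returns (total distinct count, size of the largest subgroup).
    Note: Python slicing tolerates strings shorter than n; this sketch does not
    model what Pig's SUBSTRING does in that case.
    """
    subgroups = defaultdict(list)
    for v in values:
        subgroups[v[:n]].append(v)

    # DISTINCT is applied per subgroup, so only one (smaller) subgroup
    # has to be deduplicated at a time.
    per_group_counts = [len(set(members)) for members in subgroups.values()]

    # Equal values always share the same prefix, so each distinct value lands
    # in exactly one subgroup and the per-subgroup counts can simply be summed.
    total = sum(per_group_counts)
    largest = max((len(members) for members in subgroups.values()), default=0)
    return total, largest


# Hypothetical sample data standing in for field_to_be_distinct.
values = ["apple", "apricot", "banana", "apple", "blueberry", "cherry", "banana"]

total_1, largest_1 = distinct_count_by_prefix(values, 1)
total_3, largest_3 = distinct_count_by_prefix(values, 3)
print(total_1, largest_1)  # 5 3
print(total_3, largest_3)  # 5 2
```

Raising n from 1 to 3 leaves the distinct count unchanged but shrinks the largest subgroup from 3 values to 2. In Pig terms, the largest subgroup is roughly the biggest bag a single reduce task has to deduplicate, which is the memory pressure the trick is meant to relieve.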

Posted in pig