Monthly Archives: August 2012
Pig distinct on large bag
A distinct on a large bag will very likely cause an OOM. An easy trick is to divide and conquer: instead of grouping all and then running distinct inside that single group, do
subgroup = group data by SUBSTRING(field_to_be_distinct, 0, n); -- use n to control the number and size of subgroups
subgroup_cnt = …
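The excerpt is cut off before the counting step, so the following is only a sketch of the idea in Python, assuming the goal is a distinct count. The function name `distinct_count_by_prefix` and the sample data are made up for illustration. Every value falls into exactly one prefix bucket, so the per-bucket distinct counts can simply be summed.

```python
from collections import defaultdict


def distinct_count_by_prefix(values, n):
    """Count distinct strings by bucketing them on their first n characters.

    Hypothetical helper: it mirrors grouping by SUBSTRING(field, 0, n)
    and running the distinct inside each subgroup.
    """
    buckets = defaultdict(set)
    for value in values:
        # Equal values share a prefix, so duplicates always meet in one bucket.
        buckets[value[:n]].add(value)
    # Buckets are disjoint, so the total is the sum of the per-bucket counts.
    return sum(len(bucket) for bucket in buckets.values())


data = ["apple", "apricot", "apple", "banana", "blueberry", "banana", "cherry"]
print(distinct_count_by_prefix(data, 1))  # same result as len(set(data))
```

A larger n gives more, smaller buckets. In this in-memory sketch that changes nothing about memory use, it only shows that the split does not change the result; in Pig the point is that no single bag has to hold all the values.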
sort by tab delimited column
sort -t$'\t' -k17n,17 xxx.txt