Author Archives: Xiaomeng (Shawn) Wan
TensorFlow load images for training
High level (with an Estimator and input_fn) and low level (with feed_dict):

import re
from glob import glob

import tensorflow as tf

def input_fn():
    image_list = []
    label_list = []
    # File names are expected to end in _<label>.png
    for f_name in glob('/Users/shawn/Documents/*.png'):
        image_list.append(f_name)
        label = int(re.match(r'.*_(\d+)\.png', f_name).group(1))
        label_list.append(label)
    imagest = tf.convert_to_tensor(image_list, dtype=tf.string)
    labelst = tf.convert_to_tensor(label_list, dtype=tf.int32)
    input_queue = tf.train.slice_input_producer([imagest, …
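The label-parsing half of input_fn does not depend on TensorFlow, so it can be checked on its own. Below is a minimal plain-Python sketch. It assumes each file is named `<something>_<label>.png`, as the regex in the post implies. `list_images_with_labels` is a hypothetical helper name. Its two lists are what would be passed to tf.convert_to_tensor.

```python
import re
from glob import glob
from pathlib import Path

# Assumed naming convention (from the post's regex): <anything>_<label>.png
LABEL_RE = re.compile(r".*_(\d+)\.png")

def list_images_with_labels(folder):
    """Return (paths, labels) for every PNG in folder whose name ends in _<int>.png."""
    paths, labels = [], []
    for f_name in sorted(glob(str(Path(folder) / "*.png"))):
        # Match on the file name only, so underscores in directory names are ignored
        m = LABEL_RE.fullmatch(Path(f_name).name)
        if m is None:
            # Skip files that do not follow the convention instead of crashing
            continue
        paths.append(f_name)
        labels.append(int(m.group(1)))
    return paths, labels
```

Skipping non-matching files is a design choice. The original one-liner would raise an AttributeError on the first file without a numeric suffix.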
bash: /usr/bin/[ls,find,mv]: Argument list too long
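Loop over the matches so that each mv call receives a single file instead of the whole expanded list (quoting "$f" also keeps names with spaces intact):
for f in folder*/file*; do mv -- "$f" .; done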
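The error is E2BIG from execve. The combined size of the arguments and environment handed to a new process exceeded the system limit (ARG_MAX). Expanding the pattern inside the shell or a script avoids it, because no single command receives the full list. Below is a minimal Python sketch of the same move. It assumes the `folder*/file*` layout from the command above, and `move_up` is a hypothetical name.

```python
import glob
import os
import shutil

def move_up(root, dir_pattern="folder*", file_pattern="file*"):
    """Move every root/<dir_pattern>/<file_pattern> into root; return the moved names."""
    moved = []
    # The match list lives in Python memory, not on a command line, so ARG_MAX does not apply
    for src in glob.glob(os.path.join(root, dir_pattern, file_pattern)):
        if not os.path.isfile(src):
            continue
        dst = os.path.join(root, os.path.basename(src))
        # Like plain `mv`, an existing file with the same name may be overwritten
        shutil.move(src, dst)
        moved.append(os.path.basename(src))
    return sorted(moved)
```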
Posted in linux
Spark pipeline get best model
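import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// data: a DataFrame with "features" and "label" columns
val lr = new LinearRegression()
val pipeline = new Pipeline().setStages(Array(lr))
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.0, 0.5, 1.0))
  .build()
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new RegressionEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)
val cvModel = cv.fit(data)
// bestModel is typed as a generic Model; cast it back to the pipeline to reach its stages
val model = cvModel.bestModel.asInstanceOf[PipelineModel]
val lrModel = model.stages(0).asInstanceOf[LinearRegressionModel]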
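CrossValidator works in four steps for each parameter combination. It fits on k-1 folds, scores on the held-out fold, and averages the fold scores. It then refits the winning combination on the full data, and that refit model is what bestModel returns.

Below is a minimal NumPy sketch of that loop for the regParam grid above. It assumes plain ridge regression with an intercept, scored by RMSE (RegressionEvaluator's default metric). It does not reproduce Spark exactly, since Spark's LinearRegression standardizes features by default. `fit_ridge` and `cross_validate` are hypothetical names.

```python
import numpy as np

def fit_ridge(X, y, reg):
    """Ridge with intercept: minimise (1/2n)||y - Xw - b||^2 + (reg/2)||w||^2."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, d = X.shape
    # Normal equations on centred data: (Xc^T Xc + n*reg*I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + n * reg * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def rmse(X, y, w, b):
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

def cross_validate(X, y, reg_grid, num_folds=2, seed=0):
    """Return (best_reg, model refit on all data, {reg: mean fold RMSE})."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), num_folds)
    avg = {}
    for reg in reg_grid:
        scores = []
        for k in range(num_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(num_folds) if j != k])
            w, b = fit_ridge(X[train], y[train], reg)
            scores.append(rmse(X[test], y[test], w, b))
        avg[reg] = float(np.mean(scores))
    best = min(avg, key=avg.get)  # RMSE: smaller is better
    return best, fit_ridge(X, y, best), avg
```

The final refit on all rows mirrors why bestModel can be used directly for prediction without retraining.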
Posted in spark
Spark dataframe stats mean
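import org.apache.spark.sql.Row

// describe() returns one row per statistic (count, mean, stddev, min, max), with values as strings;
// this assumes every described column is numeric
val means = df.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r) }
  .filter(_._1 == "mean")
  .map(_._2)
  .first()
  .toSeq
  .drop(1)                          // drop the "summary" column itself
  .map(x => x.toString().toDouble)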
Posted in spark
configure https for single instance elastic beanstalk running tomcat
Add the configuration files to the src/main/ebextensions folder as shown in the following doc: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/https-singleinstance-tomcat.html
Then add the following plugin to your pom file to ensure the extension folder ends up in the root directory of the WAR file:
<plugin>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <webResources>
      <resource> …
Posted in aws, elastic beanstalk, ssl, tomcat
s3 rename in batch
# rename.sh: this example moves all files in folder1 up to the root directory; modify the bucket name and regex to rename the files
for f in $(aws s3 ls --recursive s3://bucket1/folder1/ | awk -F' ' '{print $4}'); do …
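S3 has no rename operation, so each "move" in a script like this is a copy to the new key followed by a delete of the old one. Listing keys through the API also avoids the awk column split, which breaks on keys containing spaces.

Below is a minimal Python sketch assuming a boto3-style S3 client. `rename_objects` is a hypothetical helper. A single copy_object request handles objects up to 5 GB; larger objects need a multipart copy, which this sketch does not do.

```python
import re

def rename_objects(client, bucket, prefix, pattern, repl):
    """Rename every key under prefix to re.sub(pattern, repl, key) via copy + delete.

    `client` is assumed to behave like boto3.client("s3").
    """
    # Collect all keys first so that objects written during the loop are not re-listed
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    renamed = []
    for old_key in keys:
        new_key = re.sub(pattern, repl, old_key)
        # Skip unchanged keys and folder placeholders that would map to an empty key
        if not new_key or new_key == old_key:
            continue
        client.copy_object(Bucket=bucket, Key=new_key,
                           CopySource={"Bucket": bucket, "Key": old_key})
        client.delete_object(Bucket=bucket, Key=old_key)
        renamed.append((old_key, new_key))
    return renamed
```

To move everything in folder1 up to the bucket root, the call would be `rename_objects(boto3.client("s3"), "bucket1", "folder1/", r"^folder1/", "")`.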
Posted in hadoop, linux
pig DBStorage into mysql on EMR
sudo apt-get install libmysql-java
Pig script:
register /usr/share/java/mysql.jar
STORE results INTO 'test' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host_ip/database_name', 'username', 'password', 'INSERT INTO test (a,b,c,d) VALUES(?,?,?,?)');
MySQL: in /etc/mysql/my.cnf, change bind-address to 0.0.0.0 so the server listens on all interfaces and accepts remote connections:
bind-address = 0.0.0.0
sudo /etc/init.d/mysql restart
mysql -u root
INSERT INTO user …
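The INSERT statement given to DBStorage uses `?` placeholders. Each field of a Pig tuple is bound to them in order, so values are never spliced into the SQL text. The same binding can be tried locally with Python's built-in sqlite3 module, swapped in here for MySQL/JDBC purely for illustration. The `test` table and its columns a, b, c, d come from the post. The column types, the sample rows and the helper name `store_tuples` are hypothetical.

```python
import sqlite3

def store_tuples(conn, rows):
    """Insert each 4-field tuple into test(a, b, c, d) with a parameterized INSERT."""
    # Same statement shape as the DBStorage example; fields bind to ? in order
    conn.executemany("INSERT INTO test (a, b, c, d) VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a TEXT, b INTEGER, c INTEGER, d REAL)")
store_tuples(conn, [("x", 1, 2, 0.5), ("y", 3, 4, 1.5)])
```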
python histogram with arbitrary sized bins
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('data.csv')
# Quantiles must run from 0 to 1 so that no values fall outside the bins;
# np.arange(0, 1, 0.25) stops at 0.75 and would drop the top quarter of the data.
# Any increasing array of quantiles works, e.g. [0, .1, .25, .7, .99, 1]
factors, edges = pd.qcut(data.iloc[:, 3], np.linspace(0, 1, 5), retbins=True)
plt.hist(data.iloc[:, 3], edges)
plt.show()
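The bin edges are simply quantiles of the data, so they can be computed and checked without pandas or a plot. Below is a minimal NumPy sketch. It assumes the quantiles are given as an increasing sequence, and `quantile_histogram` is a hypothetical helper.

```python
import numpy as np

def quantile_histogram(values, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Bin values at the given quantiles; return (counts, edges).

    If quantiles do not start at 0 and end at 1, values below the first edge
    or above the last edge are silently left out of every bin.
    """
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, quantiles)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges
```

The bins have unequal widths, so raw bar heights can mislead. Calling `plt.hist(values, bins=edges, density=True)` divides each count by the total and by its bin width, so each bar's area, not its height, reflects how much data it holds.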
linux remove tab, space, return and newline
tr -d '\040\011\012\015'
(octal codes for space, tab, newline and carriage return)
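The same deletion in Python is a sketch using `str.maketrans` with three arguments. Every character in the third argument is mapped to None, so `translate` drops it, just as `tr -d` does. `strip_ws` is a hypothetical name.

```python
# Octal escapes matching the tr command: 040 space, 011 tab, 012 newline (LF), 015 carriage return
DELETE_WS = str.maketrans("", "", "\040\011\012\015")

def strip_ws(text):
    """Delete spaces, tabs, newlines and carriage returns from text."""
    return text.translate(DELETE_WS)
```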
mongodb query array nth element
{'pageviews.0.page_type': 'home'} checks whether the 'page_type' of the first element in the pageviews array is 'home' (array positions are zero-based).