Xiaomeng (Shawn) Wan

Tensorflow load images for training

Posted on September 18, 2017 by Xiaomeng (Shawn) Wan

Two ways to load images for training in TensorFlow 1.x: high level (an Estimator with an input_fn) first, then low level (queue runners with feed_dict):

import re
from glob import glob

import tensorflow as tf  # TensorFlow 1.x queue-based input API

def input_fn():
    # Collect file paths; the label is the number before ".png" (e.g. foo_3.png -> 3)
    image_list = []
    label_list = []
    for f_name in glob('/Users/shawn/Documents/*.png'):
        image_list.append(f_name)
        label = int(re.match(r'.*_(\d+)\.png', f_name).group(1))
        label_list.append(label)
    imagest = tf.convert_to_tensor(image_list, dtype=tf.string)
    labelst = tf.convert_to_tensor(label_list, dtype=tf.int32)

    # Queue that yields one (filename, label) pair at a time, shuffled, for one epoch
    input_queue = tf.train.slice_input_producer([imagest, labelst],
                                                num_epochs=1,
                                                shuffle=True)

    # Read and decode a single image, then convert it to grayscale and resize to 80x60
    filenamesq = tf.convert_to_tensor(input_queue[0], dtype=tf.string)
    file_content = tf.read_file(filenamesq)
    images = tf.image.decode_png(file_content, channels=3)
    images = tf.cast(images, tf.float32)
    images = tf.image.rgb_to_grayscale(images)
    resized_images = tf.image.resize_images(images, [80, 60])

    # Batch single examples; 'files' carries the file name of each example
    dataset_dict = dict(images=resized_images, labels=input_queue[1], files=filenamesq)
    batch_dict = tf.train.batch(dataset_dict, 100,
                                num_threads=1, capacity=100 * 2,
                                enqueue_many=False, shapes=None, dynamic_pad=False,
                                allow_smaller_final_batch=False,
                                shared_name=None, name=None)

    batch_labels = batch_dict.pop('labels')
    batch_images = batch_dict.pop('images')
    return batch_images, batch_labels

def main(unused_argv):
    # classifier (an Estimator) and logging_hook are assumed to be defined elsewhere
    classifier.fit(
        input_fn=input_fn,
        steps=100,
        monitors=[logging_hook])

# Low level (with feed_dict)
image_paths = []
labels = []
for f_name in glob('/Users/shawn/Documents/*.png'):
    image_paths.append(f_name)
    label = int(re.match(r'.*_(\d+)\.png', f_name).group(1))
    labels.append(label)

image_paths_tf = tf.convert_to_tensor(image_paths, dtype=tf.string, name="image_paths_tf")
labels_tf = tf.convert_to_tensor(labels, dtype=tf.int32, name="labels_tf")

image_path_tf, label_tf = tf.train.slice_input_producer([image_paths_tf, labels_tf], shuffle=False)

image_buffer_tf = tf.read_file(image_path_tf, name="image_buffer")
image_tf = tf.image.decode_png(image_buffer_tf, channels=3, name="image")  # the files are PNGs
image_tf = preprocess_image_tensor(image_tf)  # same processing as in input_fn above (cast, grayscale, resize)

# creating a batch of images and labels
batch_size = 100
num_threads = 4
images_batch_tf, labels_batch_tf = tf.train.batch([image_tf, label_tf], batch_size=batch_size,
                                                  num_threads=num_threads)
# define the placeholders X and Y, loss, train_step and init here

with tf.Session() as sess:
    sess.run(init)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(20):
        images, labels = sess.run([images_batch_tf, labels_batch_tf])
        _, loss_val = sess.run([train_step, loss], feed_dict={X: images, Y: labels})

    coord.request_stop() 
    coord.join(threads)
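
The label parsing used in both snippets can be checked on its own, without TensorFlow. A minimal sketch, assuming file names end in _<digits>.png; the helper name label_from_filename and the example paths are made up for illustration:

```python
import re

# Hypothetical helper: same regex idea as in the snippets above, with the dot
# escaped and the pattern anchored so only a trailing "_<digits>.png" matches.
LABEL_RE = re.compile(r'.*_(\d+)\.png$')

def label_from_filename(f_name):
    """Return the integer label encoded in f_name, or None if there is none."""
    m = LABEL_RE.match(f_name)
    return int(m.group(1)) if m else None

# Made-up example paths, not files from the post
parsed = [label_from_filename(p) for p in
          ['/tmp/cat_3.png', '/tmp/dog_12.png', '/tmp/readme.txt']]
print(parsed)
```

Returning None instead of raising makes it easy to skip files that don't follow the naming scheme before they reach the input queue.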

Posted in tensorflow | Tagged images, load, tensorflow, training

bash: /usr/bin/[ls,find,mv]: Argument list too long

Posted on March 29, 2017 by Xiaomeng (Shawn) Wan

for f in folder*/file*; do mv -- "$f" .; done
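
The loop works because the limit applies to the argument list handed to a single new process, while the loop starts one mv per file. The same idea as a rough Python sketch: expand the pattern inside the script and move the files one at a time. The function name move_matches and the temporary demo directory are made up for illustration:

```python
import glob
import os
import shutil
import tempfile

def move_matches(pattern, dest):
    """Move every path matching pattern into dest, one call per file."""
    moved = []
    for path in glob.glob(pattern):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(dest, name))
        moved.append(name)
    return sorted(moved)

# Demo in a throwaway directory: folder1/file_folder1.txt, folder2/file_folder2.txt
root = tempfile.mkdtemp()
for d in ('folder1', 'folder2'):
    os.makedirs(os.path.join(root, d))
    open(os.path.join(root, d, 'file_' + d + '.txt'), 'w').close()

moved = move_matches(os.path.join(root, 'folder*', 'file*'), root)
print(moved)
```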

Posted in linux

Spark pipeline get best model

Posted on January 10, 2017 by Xiaomeng (Shawn) Wan

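import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LinearRegression()
val pipeline = new Pipeline().setStages(Array(lr))
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0, 0.5, 1.0)).build()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new RegressionEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)

// data: the training DataFrame (default columns "features" and "label")
val cvModel = cv.fit(data)

// The best model is the fitted pipeline; its first stage is the fitted linear regression
val model = cvModel.bestModel.asInstanceOf[PipelineModel]
val lrModel = model.stages(0).asInstanceOf[LinearRegressionModel]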

Posted in spark

Spark dataframe stats mean

Posted on January 10, 2017 by Xiaomeng (Shawn) Wan

df.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r) }
  .filter(_._1 == "mean")
  .map(_._2)
  .first()
  .toSeq.drop(1)
  .map(x => x.toString.toDouble)

Posted in spark

configure https for single instance elastic beanstalk running tomcat

Posted on December 18, 2016 by Xiaomeng (Shawn) Wan

1. Add configuration files to the src/main/ebextensions folder as shown in the following doc: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/https-singleinstance-tomcat.html

2. Add the following plugin to your pom file to ensure the extension folder ends up in the root directory of the war file:

<plugin>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <webResources>
      <resource>
        <directory>src/main/ebextensions</directory>
        <targetPath>.ebextensions</targetPath>
        <filtering>true</filtering>
      </resource>
    </webResources>
  </configuration>
</plugin>

3. Add an A record in Route 53 to map your domain to the Elastic Beanstalk target xxx.us-west-2.elasticbeanstalk.com

4. (Optional) SSH (eb ssh) into the EC2 instance to make sure the configuration, key and crt files were created. In my case /etc/httpd/conf.d/ssl.conf wasn't created for some reason, so I had to add it manually and then restart Apache.

Posted in aws, elastic beanstalk, ssl, tomcat

s3 rename in batch

Posted on November 18, 2016 by Xiaomeng (Shawn) Wan

# rename.sh: this example moves all files in folder1 up to the bucket's root directory; modify the bucket name and the pattern to rename the files differently
# ${f##*/} strips everything up to and including the last "/" of the key

for f in $(aws s3 ls --recursive s3://bucket1/folder1/ | awk -F' ' '{print $4}');

do aws s3 mv "s3://bucket1/$f" "s3://bucket1/${f##*/}"

done
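
Before running the moves it can help to preview which destination key each object would get. A minimal dry-run sketch that only computes the (source, destination) pairs and makes no AWS calls; the function name flattened_key and the sample keys are made up for illustration:

```python
import posixpath

def flattened_key(key):
    """Return the key with every leading 'folder/' prefix removed."""
    return posixpath.basename(key)

# Made-up keys, as they would appear in the 4th column of `aws s3 ls --recursive`
keys = ['folder1/a.csv', 'folder1/sub/b.csv']
moves = [(k, flattened_key(k)) for k in keys]
for src, dst in moves:
    print('s3://bucket1/' + src, '->', 's3://bucket1/' + dst)
```

Two keys with the same file name in different subfolders would map to the same destination, so checking the preview for duplicates first avoids overwriting objects.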

Posted in hadoop, linux

pig DBStorage into mysql on EMR

Posted on October 21, 2014 by Xiaomeng (Shawn) Wan

sudo apt-get install libmysql-java

Pig script:

STORE results INTO 'test' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host_ip/database_name', 'username', 'password', 'INSERT INTO test (a,b,c,d) VALUES (?,?,?,?)');

MySQL:

/etc/mysql/my.cnf (change bind-address to 0.0.0.0)

bind-address = 0.0.0.0

sudo /etc/init.d/mysql restart

mysql -u root

INSERT INTO mysql.user (Host,User,Password) VALUES('%','username',PASSWORD('password'));

GRANT ALL PRIVILEGES ON database_name.* TO 'username'@'%' IDENTIFIED BY 'password';

Posted in pig | Tagged DBStorage, EMR, mysql, pig

python histogram with arbitrary sized bins

Posted on March 11, 2014 by Xiaomeng (Shawn) Wan

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('data.csv')

# np.arange(0, 1, 0.25) gives the quantiles [0, .25, .5, .75]; replace it with
# any array of quantiles, e.g. [0, .1, .25, .7, .99]
factors, edges = pd.qcut(data.iloc[:, 3], np.arange(0, 1, 0.25), retbins=True)

plt.hist(data.iloc[:,3],edges)

plt.show()

Posted in python | Tagged bin, histogram, python

linux remove tab, space, return and newline

Posted on December 2, 2013 by Xiaomeng (Shawn) Wan

tr -d '\040\011\012\015'
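
The four octal codes are space (040), tab (011), newline (012) and carriage return (015). For comparison, a small Python sketch that deletes the same four characters from a string; the function name strip_whitespace_chars is made up for illustration:

```python
# Delete space (octal 040), tab (011), newline (012) and carriage return (015),
# the same four characters listed in the tr command.
DELETE_TABLE = str.maketrans('', '', '\040\011\012\015')

def strip_whitespace_chars(text):
    """Return text with the four characters above removed."""
    return text.translate(DELETE_TABLE)

cleaned = strip_whitespace_chars('a b\tc\r\nd')
print(cleaned)
```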

Posted in Uncategorized | Tagged linux

mongodb query array nth element

Posted on October 24, 2013 by Xiaomeng (Shawn) Wan

The query condition 'pageviews.0.page_type': 'home' checks whether the page_type of the first element (index 0) of the pageviews array is 'home'. Replace 0 with n-1 to test the nth element.
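
To make the dot notation concrete, here is a small pure-Python sketch that resolves a dotted path such as pageviews.0.page_type against a document held as a dict. It only illustrates the positional lookup and is not how MongoDB itself evaluates queries; resolve_path and the sample document are made up for illustration:

```python
def resolve_path(doc, path):
    """Follow a dotted path; numeric parts index into lists."""
    current = doc
    for part in path.split('.'):
        if isinstance(current, list):
            # A numeric part selects the element at that position
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        else:
            return None
    return current

# Made-up sample document
doc = {'pageviews': [{'page_type': 'home'}, {'page_type': 'search'}]}
query = {'pageviews.0.page_type': 'home'}
matches = all(resolve_path(doc, k) == v for k, v in query.items())
print(matches)
```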

Posted in Uncategorized | Tagged mongodb
