Stepping

Posts

Showing posts from April 16, 2017

Something about Interview of Data Scientist

April 22, 2017

Trend of general data scientist interviews
- Earlier: more like software engineer interviews; heavy on coding, light on problem solving and probability
- Currently: more balanced between coding and ML/statistics
  - coding still dispensable
  - 0-1 coding problems in phone interviews
  - 1-2 coding problems in onsite interviews
- New requirements
  - deep understanding of algorithms and metrics
  - context/domain knowledge for problems

Example: why does a decision tree overfit?
- minimum split, reduce, nonlinear, bagging
- greedy method, criterion; one data point, generalize; all the data down to one data point, high variance; the more parameters, the less data; number of levels, minimum node size
- random forest

Performance metrics
- imbalanced data: down-sampling to 1/10; down-sampling, ten times the number; AUC unchanged
- imbalanced data: up-sampling, down-sampling, SMOTE, sampling package
- overfitting: why method 1 is better than method 2, and why the parameters are tuned this way
- limited area collection

Prepare
- Bayes rule
- debug
- bootstrap
- A/B test
- p-value (phone interview)
- industrial blogs and papers ...
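The note that AUC stays unchanged under 1/10 down-sampling can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: the scores are synthetic (drawn from two Gaussians, not from any real model), the scorer is fixed rather than retrained, and the names `auc`, `pos`, `neg`, and `neg_down` are made up for this example. AUC depends on how positives rank against negatives, not on the class ratio, so dropping 90% of the negatives at random should move it only by sampling noise.

```python
import random

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic scores (assumption): positives score higher on average.
pos = [rng.gauss(1.0, 1.0) for _ in range(200)]
neg = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # 10x as many negatives

# Down-sample the majority class to 1/10 of its size.
neg_down = rng.sample(neg, len(neg) // 10)

auc_full = auc(pos, neg)
auc_down = auc(pos, neg_down)
print("AUC on imbalanced data:   %.3f" % auc_full)
print("AUC after down-sampling:  %.3f" % auc_down)
```

The two numbers should be close but not identical, since the down-sampled set is a random subset. This says nothing about a model retrained on the down-sampled data, which is a separate question.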


Functional Programming Day 1

April 22, 2017

Functional programs
- Methods to construct functional programs
- Ways to reason about functional programs

Functional programming is a paradigm
- Migration path from a more concise Java-like language to full functional programming
- A paradigm describes distinct concepts / thought patterns in some scientific discipline

Main programming paradigms
- Imperative: Java, C (current)
- Functional
- Logic
- OOP is orthogonal to these 3 paradigms and can be combined with any of the 3

Imperative programming
1. Modify mutable variables
2. Using assignments
3. Control structures, e.g. if-then-else, loops, break, continue, return

Von Neumann (VN) computer model: processor, memory, bus
- The bus reads both instructions and data
- Width of the bus of one machine: 32/64 bits nowadays

Strong correspondence between the memory cells in the VN machine and mutable variables in the programming language
- Mutable variables -> memory cells
- Variable dereferences -> load instructions
- Variable rules...
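To make the contrast concrete, here is a minimal sketch of the same computation written both ways. It is in Python rather than Scala only to match the other code in these notes; the function names are made up for this example. The imperative version repeatedly assigns to a mutable variable (the "memory cell" view), while the functional version is a single expression with no reassignment.

```python
from functools import reduce

def sum_of_squares_imperative(numbers):
    """Imperative style: a mutable accumulator updated by assignment in a loop."""
    total = 0
    for n in numbers:
        total = total + n * n  # overwrite the variable on every step
    return total

def sum_of_squares_functional(numbers):
    """Functional style: fold the list into one value, no variable is reassigned."""
    return reduce(lambda acc, n: acc + n * n, numbers, 0)

values = [1, 2, 3, 4]
print(sum_of_squares_imperative(values))
print(sum_of_squares_functional(values))
```

Both functions return the same result; the difference is only in how the computation is expressed.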


Scala Materials

April 22, 2017

https://www.coursera.org/learn/progfun1/supplement/LogEn/eclipse-tutorial

Quick References
- Scala Standard Library API
- Scala School!: A Scala tutorial by Twitter
- A Tour of Scala: Tutorial introducing the main concepts of Scala
- Scala Overview on StackOverflow: A list of useful questions sorted by topic

Week 1
- Martin's talk at OSCON 2011: Working Hard to Keep it Simple (slides)
- Books:
  - Structure and Interpretation of Computer Programs. Harold Abelson and Gerald J. Sussman. 2nd edition. MIT Press 1996. [Full text available online].
  - !!! Programming in Scala. Martin Odersky, Lex Spoon and Bill Venners. 3rd edition. Artima 2016. http://www.artima.com/shop/programming_in_scala_3ed
  - !!! Programming in Scala. Martin Odersky, Lex Spoon and Bill Venners. 2nd edition. Artima 2010. [Full text of 1st edition available online]. Artima has graciously provided a 25% discount on the 2nd edition of Programming in Scala to all participants of this course. To...


How to do addition for sparse vectors

April 22, 2017

How to add sparse vectors
http://stackoverflow.com/questions/32981875/how-to-add-two-sparse-vectors-in-spark-using-python

Week 4 NoSQL
Building large scalable web applications
platform: do analysis

Programming note

    >>> list(zip([1, 2], [0, 3]))
    [(1, 0), (2, 3)]
    >>> dict(zip([1, 2], [0, 3]))
    {1: 0, 2: 3}

Something like this should work:

    from pyspark.mllib.linalg import Vectors, SparseVector, DenseVector
    import numpy as np

    def add(v1, v2):
        """Add two sparse vectors
        >>> v1 = Vectors.sparse(3, {0: 1.0, 2: 1.0})
        >>> v2 = Vectors.sparse(3, {1: 1.0})
        >>> add(v1, v2)
        SparseVector(3, {0: 1.0, 1: 1.0, 2: 1.0})
        """
        assert isinstance(v1, SparseVector) and isinstance(v2, SparseVector)
        assert v1.size == v2.size...
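The excerpt above is cut off, so the following is not a continuation of that function. It is a separate, minimal sketch of the same idea using plain dictionaries (index -> value) instead of pyspark's `SparseVector`; the name `add_sparse` and the dict representation are assumptions made for this example.

```python
def add_sparse(size, a, b):
    """Add two sparse vectors of length `size`, each given as {index: value}.

    Only nonzero entries are stored; entries that cancel to zero are dropped.
    """
    for index in list(a) + list(b):
        if not 0 <= index < size:
            raise IndexError("index %d out of range for size %d" % (index, size))

    result = dict(a)  # copy so the inputs are left untouched
    for index, value in b.items():
        total = result.get(index, 0.0) + value
        if total == 0.0:
            result.pop(index, None)  # keep the result sparse
        else:
            result[index] = total
    return result

# Same example values as in the docstring above.
v1 = {0: 1.0, 2: 1.0}
v2 = {1: 1.0}
total = add_sparse(3, v1, v2)
print(total)
```

The work is proportional to the number of stored entries rather than to the vector length, which is the point of keeping the vectors sparse.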


Scala environment setup on EC2, Mac, and for Eclipse

April 21, 2017

Today I want to take some notes to track my study process: I am going to set up a Scala environment on my EC2 AMI and on my Mac separately. Where the steps are the same, I won't repeat them. Now let's do it:

A. Install sbt

1. JDK
   on my Mac: Java 1.8 (recommended)
   on my EC2:

2. Install sbt: http://www.scala-sbt.org/release/docs/zh-cn/Hello.html

    curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo
    sudo mv bintray-sbt-rpm.repo /etc/yum.repos.d/
    sudo yum install sbt

3. Make a directory containing the source code

    mkdir hello
    cd hello
    vi hw.scala    (paste the following)

    object Hi { def main(args: Array[String]) = println("Hi!") }

   or

    $ mkdir hello
    $ cd hello
    $ echo 'object Hi { def main(args: Array[String]) = println("Hi!") }' > hw.scala

4. Run this Scala code

    sbt

sbt works entirely by convention. It automatically finds the following:
- source files in the project root directory
- source files in src/main/scala or src/main/java
- src/test/scala or src/test...
