Posts

Showing posts from April 16, 2017

Something About Data Scientist Interviews

Trend of general data scientist interviews

Earlier: more like software engineer interviews, heavy on coding, light on problem solving and probability.

Currently: more balance between coding and ML/statistics.
  - coding is still tested: 0-1 coding problems in phone interviews, 1-2 coding problems in onsite interviews

New requirements:
  - deep understanding of algorithms and metrics
  - context/domain knowledge for problems

Examples:
  - Why does a decision tree overfit? Minimum split, reduce; nonlinear; bagging; greedy method, criterion; a single data point vs. generalizing to all the data; with a single data point the variance is large; the more parameters, the less data; depth, minimum node size
  - random forest
  - performance metrics
  - imbalanced data: down-sampling to 1/10, ten times the count, AUC unchanged
  - data imbalance: up-sampling, down-sampling, SMOTE, sampling packages
  - overfitting: why method 1 is better than method 2, and why the parameters are tuned this way
  - limited area collection prepare
  - Bayes rule, debug
  - bootstrap
  - A/B test, p-value (phone interview)
  - industrial blogs and papers ...
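The note that AUC stays unchanged under down-sampling can be checked with a small simulation. This is a minimal sketch, not from the original notes: the Gaussian scores, the sample sizes, and the helper name `auc` are all assumptions made for illustration, and the scores come from a fixed hypothetical scorer (no model is retrained on the down-sampled data).

```python
import random

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = 0
    n_pos = 0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            rank_sum += rank
            n_pos += 1
    n_neg = len(scores) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = random.Random(0)

# Hypothetical scorer output: positives score higher on average.
pos = [rng.gauss(1.0, 1.0) for _ in range(2000)]
neg = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # 10x as many negatives

# Down-sample the majority class to 1/10 of its size.
neg_down = rng.sample(neg, len(neg) // 10)

auc_full = auc(pos + neg, [1] * len(pos) + [0] * len(neg))
auc_down = auc(pos + neg_down, [1] * len(pos) + [0] * len(neg_down))

print(f"AUC on full data:         {auc_full:.3f}")
print(f"AUC after down-sampling:  {auc_down:.3f}")
```

AUC depends only on how positives rank relative to negatives, so randomly dropping negatives changes it only by sampling noise. Metrics such as precision, by contrast, do shift when the class ratio changes.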

Functional Programming Day 1

Topics:
  - Functional programs
  - Methods to construct functional programs
  - Ways to reason about functional programs

Functional programming is a paradigm. Migration path: from a more concise Java-like language to fully functional programming.

A paradigm describes distinct concepts / thought patterns in some scientific discipline.

Main programming paradigms:
  - Imperative (Java, C; the current mainstream)
  - Functional
  - Logic

OOP is orthogonal to these 3 paradigms and can be combined with them.

Imperative programming:
  1. Modify mutable variables
  2. Using assignments
  3. Control structures, e.g. if-then-else, loops, break, continue, return

Von Neumann (VN) computer model:
  - Processor
  - Memory
  - Bus: reads both instructions and data
  - Width of the bus: one machine word, 32/64 bits nowadays

There is a strong correspondence between the memory cells of a VN machine and the mutable variables of a programming language:
  - Mutable variables -> memory cells
  - Variable dereferences -> load instructions
  - Variable rules...
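To make the contrast concrete, here is a small sketch of the same computation written in both styles. It is in Python rather than Scala, to match the other code in these notes, and the function names are made up for illustration.

```python
from functools import reduce

def sum_squares_imperative(xs):
    """Imperative style: a mutable variable updated by assignment in a loop."""
    total = 0                  # mutable variable (a memory cell)
    for x in xs:               # control structure
        total = total + x * x  # assignment (a store)
    return total

def sum_squares_functional(xs):
    """Functional style: no reassignment, just a fold over the input."""
    return reduce(lambda acc, x: acc + x * x, xs, 0)

print(sum_squares_imperative([1, 2, 3]))  # 14
print(sum_squares_functional([1, 2, 3]))  # 14
```

The imperative version mirrors the load/store picture above: `total` is read and overwritten on each iteration. The functional version describes the result as a composition of function applications, with no variable changing its value.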

Scala Materials

https://www.coursera.org/learn/progfun1/supplement/LogEn/eclipse-tutorial

Quick References
  - Scala Standard Library API
  - Scala School!: A Scala tutorial by Twitter
  - A Tour of Scala: Tutorial introducing the main concepts of Scala
  - Scala Overview on StackOverflow: A list of useful questions sorted by topic

Week 1
  - Martin's talk at OSCON 2011: Working Hard to Keep it Simple (slides)

Books:
  - Structure and Interpretation of Computer Programs. Harold Abelson and Gerald J. Sussman. 2nd edition. MIT Press 1996. [Full text available online]
  - !!! Programming in Scala. Martin Odersky, Lex Spoon and Bill Venners. 3rd edition. Artima 2016. http://www.artima.com/shop/programming_in_scala_3ed
  - !!! Programming in Scala. Martin Odersky, Lex Spoon and Bill Venners. 2nd edition. Artima 2010. [Full text of 1st edition available online]

Artima has graciously provided a 25% discount on the 2nd edition of Programming in Scala to all participants of this course. To...

How to do addition for sparse vectors

How to add sparse vectors
http://stackoverflow.com/questions/32981875/how-to-add-two-sparse-vectors-in-spark-using-python

Week 4
NoSQL
Building large scalable web applications
platform: do analysis

Programming note

    >>> zip([1, 2], [0, 3])
    [(1, 0), (2, 3)]
    >>> dict(zip([1, 2], [0, 3]))
    {1: 0, 2: 3}

Something like this should work:

    from pyspark.mllib.linalg import Vectors, SparseVector, DenseVector
    import numpy as np

    def add(v1, v2):
        """Add two sparse vectors
        >>> v1 = Vectors.sparse(3, {0: 1.0, 2: 1.0})
        >>> v2 = Vectors.sparse(3, {1: 1.0})
        >>> add(v1, v2)
        SparseVector(3, {0: 1.0, 1: 1.0, 2: 1.0})
        """
        assert isinstance(v1, SparseVector) and isinstance(v2, SparseVector)
        assert v1.size == v2.size...
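Since the excerpt above is cut off, here is a library-free sketch of the underlying idea. It is not the Spark code from the linked answer: it represents each sparse vector as a plain `{index: value}` dict, and the function name `add_sparse` and the choice to drop explicit zeros are assumptions made for illustration.

```python
def add_sparse(size, a, b):
    """Add two sparse vectors of the same logical size.

    Each vector is a dict mapping index -> nonzero value.
    Returns a new dict; the inputs are not modified.
    """
    for idx in list(a) + list(b):
        if not 0 <= idx < size:
            raise IndexError(f"index {idx} out of range for size {size}")

    result = dict(a)  # start from a copy of the first vector
    for idx, val in b.items():
        result[idx] = result.get(idx, 0.0) + val

    # Keep the result sparse: drop entries that cancelled to zero.
    return {idx: val for idx, val in result.items() if val != 0.0}

v1 = {0: 1.0, 2: 1.0}
v2 = {1: 1.0}
print(add_sparse(3, v1, v2))  # {0: 1.0, 2: 1.0, 1: 1.0}
```

Only the stored entries are visited, so the cost grows with the number of nonzeros rather than with the vector size. The same merged dict could then be passed to a sparse-vector constructor such as the `Vectors.sparse(size, dict)` call shown in the docstring above.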

Scala environment setup on EC2, Mac, and for Eclipse

Today I want to take some notes to track my study process: I am setting up a Scala environment on my EC2 AMI and on my Mac separately. Where the steps are the same, I won't repeat them. Now let's do it:

A. Install sbt

1. JDK
   on my Mac: java 1.8 (recommended)
   on my EC2:

2. Install sbt: http://www.scala-sbt.org/release/docs/zh-cn/Hello.html

       curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo
       sudo mv bintray-sbt-rpm.repo /etc/yum.repos.d/
       sudo yum install sbt

3. Make a directory containing the source code

       mkdir hello
       cd hello
       vi hw.scala

   and paste:

       object Hi {
         def main(args: Array[String]) = println("Hi!")
       }

   or

       $ mkdir hello
       $ cd hello
       $ echo 'object Hi { def main(args: Array[String]) = println("Hi!") }' > hw.scala

4. Run this Scala code

       sbt

   sbt works entirely by convention. It automatically finds the following:
     - source files in the project root directory
     - source files in src/main/scala or src/main/java
     - src/test/scala or src/test...