Something about Data Scientist Interviews

trends in general data scientist interviews


earlier:

more like software engineer interviews
heavy on coding
light on problem solving and probability

currently:

more balance between coding and ML/statistics
coding is still indispensable
0-1 coding problems in phone interviews
1-2 coding problems in onsite interviews

new requirements:
deep understanding of algorithms and metrics
context/domain knowledge for problems


example:


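why does a decision tree overfit?
remedies: a minimum split size, reducing nonlinearity, bagging
the greedy method keeps splitting on the criterion, possibly down to a single data point, and then generalizes from it to all data
a leaf with a single data point has high variance; the more parameters, the less data supports each one
controls: number of levels (depth), minimum node size, random forest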
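A minimal sketch of the greedy split search described above, assuming binary labels, a single numeric feature, and Gini impurity as the criterion (the notes do not name one). The hypothetical `min_leaf` parameter plays the role of the minimum node size: any split that would leave fewer points on either side is skipped.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs: List[float], ys: List[int], min_leaf: int = 1) -> Optional[Tuple[float, float]]:
    """Greedy search for the threshold with the lowest weighted Gini impurity.

    Splits that leave fewer than `min_leaf` points on either side are skipped,
    which is how a minimum node size keeps the tree from isolating single points.
    Returns (threshold, weighted impurity), or None if no split is allowed.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    m = max(1, min_leaf)
    best: Optional[Tuple[float, float]] = None
    for i in range(m, n - m + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 0, 0, 1]  # one lone positive at x = 7
print(best_split(xs, ys, min_leaf=1))  # free to isolate the single point
print(best_split(xs, ys, min_leaf=2))  # must keep at least 2 points per leaf
```

With `min_leaf=1` the search isolates the lone positive at x = 7 and reports zero impurity, which is exactly the kind of fit to one data point that fails to generalize; with `min_leaf=2` it has to settle for a less "pure" split.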
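performance metrics
imbalanced data
down-sampling, e.g. 1:10: down-sample the majority class to ten times the number of minority samples
AUC stays the same (in expectation), because AUC only depends on how positives rank against negatives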
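To see why AUC is unaffected, here is a small simulation sketch using only the standard library. The scores are synthetic (Gaussian, an assumption for illustration), `auc` computes AUC as the fraction of correctly ordered positive/negative pairs, and the hypothetical `downsample_negatives` keeps every positive plus ten randomly chosen negatives per positive. Uniform down-sampling does not change the score distribution of the negatives, so the two AUC values should agree up to sampling noise.

```python
import random
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def downsample_negatives(scores: List[float], labels: List[int],
                         ratio: int = 10, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Keep all positives and a uniform random subset of `ratio` negatives per positive."""
    rng = random.Random(seed)
    pos = [(s, y) for s, y in zip(scores, labels) if y == 1]
    neg = [(s, y) for s, y in zip(scores, labels) if y == 0]
    k = min(len(neg), ratio * len(pos))
    kept = pos + rng.sample(neg, k)
    return [s for s, _ in kept], [y for _, y in kept]


# Synthetic, imbalanced data: 100 positives, 5000 negatives (illustrative assumption).
rng = random.Random(42)
scores = [rng.gauss(1.0, 1.0) for _ in range(100)] + [rng.gauss(0.0, 1.0) for _ in range(5000)]
labels = [1] * 100 + [0] * 5000

full_auc = auc(scores, labels)
ds_scores, ds_labels = downsample_negatives(scores, labels, ratio=10)
ds_auc = auc(ds_scores, ds_labels)
print(f"full AUC: {full_auc:.3f}, down-sampled AUC: {ds_auc:.3f}")
```

Metrics that depend on the class ratio, such as precision or the raw predicted probabilities, do shift after down-sampling; AUC is the one that stays put.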

imbalanced data: up-sampling, down-sampling
SMOTE sampling package
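SMOTE up-samples the minority class by creating synthetic points instead of duplicating existing ones. The sketch below shows only the core interpolation step in plain Python, on a tiny hypothetical 2-D minority set; in practice you would reach for a library implementation such as the `SMOTE` class in imbalanced-learn rather than this function.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority points
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform position in [0, 1) along base -> nb
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=20, k=2)
print(new_points[:3])
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies.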

overfitting

why is method 1 or method 2 better? why tune the parameters this way?
prepare a limited collection of areas:
Bayes' rule
debugging, bootstrap
A/B test p-values (phone interview)
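The question types just listed each reduce to a few lines of arithmetic. A sketch with hypothetical numbers: Bayes' rule for a diagnostic-test style question, a percentile bootstrap confidence interval for a mean, and the p-value of a pooled two-proportion z-test, which is one common (not the only) way to analyze an A/B test on conversion rates. The function names are illustrative.

```python
import math
import random
from typing import List, Tuple


def bayes_posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' rule."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


def bootstrap_ci(data: List[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # resample with replacement, record each resample's mean
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (assumes 0 < pooled rate < 1)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


print(bayes_posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05))
print(bootstrap_ci([2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]))
print(two_proportion_p_value(200, 1000, 250, 1000))
```

With a 1% prior, a test that is 99% sensitive with a 5% false positive rate gives a posterior of only 1/6, the classic trap in this kind of question; a 20% vs 25% conversion rate on 1,000 users per arm gives p of about 0.007.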
industry blogs and papers
work on public competition data
with guidance from profs 
focus on 2-3 areas


suggestions

regression: likelihood
p-values come up often
statistical tests: F-test for variable selection
shared interview experiences show the range of questions is limited
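As a sketch of the F-test for variable selection, assuming ordinary least squares with one candidate predictor: compare the intercept-only model with y = a + b*x through their residual sums of squares. Under Gaussian noise these least-squares fits are also the maximum-likelihood fits, which is the usual link between regression and likelihood. The helper names and data are hypothetical; to turn F into a p-value you would use the F distribution with (1, n - 2) degrees of freedom, for example `scipy.stats.f.sf(F, 1, n - 2)`.

```python
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def f_statistic(x: List[float], y: List[float]) -> float:
    """F-test of the intercept-only model (1 parameter) against y = a + b*x (2 parameters).

    F = ((RSS_reduced - RSS_full) / (p_full - p_reduced)) / (RSS_full / (n - p_full))
    """
    n = len(x)
    a, b = fit_simple_ols(x, y)
    my = sum(y) / n
    rss_full = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    rss_reduced = sum((yi - my) ** 2 for yi in y)
    return ((rss_reduced - rss_full) / (2 - 1)) / (rss_full / (n - 2))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # roughly y = 2x
print(f_statistic(x, y))  # large F: keeping x clearly reduces the residual error
```

A large F says the extra parameter buys far more reduction in RSS than noise alone would explain; an F near zero says the variable can be dropped.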

LeetCode medium, high-frequency problems

How about AI?

do not claim DL or AI on your resume unless you really understand them and have hands-on experience
if you are really interested, build up DL and AI experience and knowledge

explain RNNs: why do they work for certain problems?
CNNs
what is backpropagation, and why is it necessary?
why do gradients vanish, why is that bad, and how do you fix it?
what is ReLU, and what are its advantages?
how do you tune a deep neural network?
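A toy illustration of backpropagation and vanishing gradients, assuming a "network" that is just a chain of scalar units h_{i+1} = f(w_i * h_i). The backward pass applies the chain rule from the output back to the input, and a finite-difference check confirms it. Comparing sigmoid with ReLU shows why gradients vanish: each sigmoid layer multiplies the gradient by at most 0.25 (times its weight), while ReLU contributes a factor of exactly 1 for positive inputs. Real networks use vectors and matrices, but the mechanism is the same.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def activate(z: float, act: str) -> float:
    return sigmoid(z) if act == "sigmoid" else max(0.0, z)


def local_grad(z: float, act: str) -> float:
    """Derivative of the activation at pre-activation z."""
    if act == "sigmoid":
        s = sigmoid(z)
        return s * (1.0 - s)  # never larger than 0.25
    return 1.0 if z > 0 else 0.0  # ReLU: 1 for positive inputs, 0 otherwise


def forward_backward(x: float, weights: List[float], act: str) -> Tuple[float, float]:
    """Return (output, d output / d x) for the chain h_{i+1} = act(w_i * h_i)."""
    h = x
    pre_acts = []
    for w in weights:  # forward pass, caching pre-activations for the backward pass
        z = w * h
        pre_acts.append(z)
        h = activate(z, act)
    grad = 1.0
    for w, z in zip(reversed(weights), reversed(pre_acts)):  # backward pass (chain rule)
        grad *= local_grad(z, act) * w
    return h, grad


def numeric_grad(x: float, weights: List[float], act: str, eps: float = 1e-5) -> float:
    """Central finite difference, used only to check the backward pass."""
    up, _ = forward_backward(x + eps, weights, act)
    down, _ = forward_backward(x - eps, weights, act)
    return (up - down) / (2 * eps)


# Check backprop against finite differences on a short chain.
w_short = [0.5, -1.2, 2.0]
_, g = forward_backward(0.3, w_short, "sigmoid")
print(g, numeric_grad(0.3, w_short, "sigmoid"))

# Vanishing gradient: 20 layers with unit weights.
deep = [1.0] * 20
_, g_sigmoid = forward_backward(0.5, deep, "sigmoid")
_, g_relu = forward_backward(0.5, deep, "relu")
print(f"sigmoid: {g_sigmoid:.2e}, relu: {g_relu}")
```

The flip side of ReLU is that a unit whose input is not positive passes no gradient at all (a "dead" unit), which is one reason initialization and learning rate matter when tuning.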



applications: language understanding, translation, recommendation
CNNs for images: how each layer is built, step by step
how to tune the parameters
initialization
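When walking through a CNN layer by layer, two pieces of arithmetic come up constantly: the spatial output size of each convolution or pooling layer and its parameter count. Initialization questions usually come down to the variance of the initial weights. A sketch of both using the standard formulas (He initialization, often paired with ReLU, and Xavier/Glorot initialization); the function names are illustrative, and the 224 x 224 input with a 7 x 7, stride-2 first layer is a hypothetical example.

```python
import math
import random
from typing import List


def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_param_count(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights of a square kernel plus one bias per output channel."""
    return out_channels * (in_channels * kernel * kernel + 1)


def he_normal(fan_in: int, size: int, seed: int = 0) -> List[float]:
    """He initialization: samples from N(0, 2 / fan_in)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(size)]


def glorot_uniform_limit(fan_in: int, fan_out: int) -> float:
    """Xavier/Glorot uniform: weights ~ U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# For a conv layer, fan_in = in_channels * kernel * kernel.
size = conv_output_size(224, kernel=7, stride=2, padding=3)
params = conv_param_count(in_channels=3, out_channels=64, kernel=7)
weights = he_normal(fan_in=3 * 7 * 7, size=64 * 3 * 7 * 7)
print(size, params, len(weights))
```

Scaling the initial variance by fan-in keeps activations from shrinking or exploding layer after layer, which ties initialization back to the vanishing-gradient question.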



summary 

plan 
know yourself and set goals
lay out:
1. coding, SQL
2. probability
3. ML
4. project experience

go deep on ML and projects
optimize the ROI of interview preparation


areas: coding, probability, ML algorithms, case study
for each, weigh your level of mastery, your background, and the time you invest against the return

dig deep into algorithms and compare them
curiosity, critical thinking
ROI: return on investment
e.g. why is boosting good, and why is it better than RF (random forest)?
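A minimal sketch of the mechanism behind that comparison: gradient boosting for squared loss on 1-D data, where each new regression stump is fit to the residuals of the current ensemble and added with a small learning rate. This sequential error correction mainly reduces bias, whereas a random forest trains deep trees independently on bootstrap samples and averages them, which mainly reduces variance. Whether boosting actually beats a random forest depends on the data and on tuning; the toy data and the parameters `n_rounds` and `lr` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(xs: List[float], r: List[float]) -> Stump:
    """Least-squares regression stump on 1-D inputs (needs at least two distinct xs)."""
    pairs = sorted(zip(xs, r))
    best: Optional[Tuple[float, float, float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def predict_stump(s: Stump, x: float) -> float:
    t, lm, rm = s
    return lm if x <= t else rm


def gradient_boost(xs: List[float], ys: List[float], n_rounds: int = 50, lr: float = 0.1):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    history: List[float] = []  # training MSE after each round
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of (y - f)^2 / 2
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * predict_stump(s, x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history


xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
base, stumps, history = gradient_boost(xs, ys)
print(f"MSE after round 1: {history[0]:.4f}, after round {len(history)}: {history[-1]:.4f}")
```

Each individual stump is a very weak, high-bias model; the training error falls only because later stumps keep correcting what earlier ones got wrong.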









