Python English - Issue 15


Five Interview Questions to Predict a Good Data Scientist

Those of us in the profession are constantly reminded of the drastic shortage of data scientists. It’s going to get worse before it gets better, since the demand for technologies like machine learning, AI, and deep learning is on such an upward trajectory. Because of this shortage, a lot of people sense high-paying employment opportunities and are making the transition from other professions. The resulting problem for employers is clear: you’re not always getting the best candidates for your open positions.

What to do? Many firms craft employment ads that are seemingly designed to scare off candidates. Not everyone can fill the role of a data science “unicorn,” which calls for a Ph.D. in computer science and applied statistics along with years of domain-specific experience. Of course, there are brave souls who apply for these jobs without the requisite knowledge and experience, so you need to effectively filter out the imposters.

I came up with the short list below for hiring managers filling data science positions (read: not data engineers), to help weed out the folks who are stretching reality with respect to their abilities. It’s true that many tech firms include grueling coding tests in their interviews, but these questions are more nuanced, focusing on foundational knowledge, down-in-the-trenches experience, and data science common sense. The idea is to see whether candidates know the basics, can create a viable strategy, and can solve a problem in practice.

  • What is the significance of the normal distribution to data science? This question is designed to reveal whether the candidate understands one of the most basic elements of data science. It would be great if the response involved a discussion of the Central Limit Theorem, but maybe that’s too much to ask for. And maybe expecting the mathematical formula for the Gaussian probability density function is an overreach. But beyond a mention of the “bell curve,” it would be nice to hear something along these lines: its mean, median, and mode are all the same; the entire distribution can be specified using just two parameters, the mean and the variance; or a description of its importance to linear regression (the workhorse of data science).

  • Tell me about your passion for data science. Do you attend local meetups, participate in data challenges like Kaggle, use data for the common good (as in public data hacking), speak at conferences, write books or articles, and so on? The point of this question is to determine whether the candidate feels that data science is their true calling. Do they think and dream about data? Do they see a problem and instantly look for a solution involving patterns in data? What books are in their library? A related question is how large a role the mathematical foundation of data science plays in how they think about the subject. A data scientist who understands the math behind the algorithms will typically perform much better.

  • Describe the last time you experienced frustration in a data science project you were working on. How did you overcome it? Not all data science projects go smoothly, and many roadblocks can come up along the way. This question probes the depth of the candidate’s real experience and how they handled the inevitable problems. People with scant knowledge and experience will easily be exposed here.

  • Think back to a past data science project you worked on. If the powers that be asked you to change one of your data sources, and thus use different predictors, how would you alter your solution? This question relates to the role the candidate played previously and how well they adapted to changing requirements such as the introduction of new data sets. Lower-level data scientists are often simply handed a data set with a list of predictors to use, without having any input on their suitability. Heavier contributors, on the other hand, will be involved in data set selection, feature engineering, and statistical analysis. You probably want a more well-rounded candidate for your team.

  • Research has stated that 2.3 billion people have been affected by floods in the last two decades. Describe how you’d approach a data science project to predict floods over the next 100–500 years. These predictions can be used to build dams at the right locations to minimize loss. This kind of question, or one better aligned with your specific industry, calls for consideration of the “data science process,” including problem formulation, data acquisition, data wrangling, exploratory data analysis, feature engineering, modeling the data (building, fitting, and validating a model), and data storytelling with the results. The candidate needs to be intimately familiar with a data scientist’s workflow.

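For the normal-distribution question above, the points an interviewer might listen for can be illustrated with a few lines of Python. This is only a minimal sketch, not part of the original article: the helper names `normal_pdf` and `sample_means`, the choice of an exponential source distribution, and the sample sizes are all assumptions made for illustration. It uses only the standard library.

```python
import math
import random
import statistics


def normal_pdf(x, mean=0.0, variance=1.0):
    """Gaussian density; mean and variance fully specify the distribution."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))


def sample_means(n_samples, sample_size, rng):
    """Means of repeated samples from a skewed Exp(1) distribution (illustrative choice)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50
means = sample_means(n_samples=2000, sample_size=sample_size, rng=rng)

# Central Limit Theorem: although Exp(1) is skewed, the sample means cluster
# around the population mean (1.0) with spread close to sigma / sqrt(n).
clt_mean = statistics.fmean(means)
clt_stdev = statistics.stdev(means)
expected_stdev = 1.0 / math.sqrt(sample_size)

# In a symmetric bell shape the mean and median nearly coincide.
clt_median = statistics.median(means)

print(f"mean of sample means:  {clt_mean:.3f}")
print(f"median of sample means: {clt_median:.3f}")
print(f"stdev of sample means: {clt_stdev:.3f} (sigma/sqrt(n) = {expected_stdev:.3f})")
print(f"density at the mean:   {normal_pdf(0.0):.4f}")
```

A candidate who can explain why the printed mean and median are close, and why the spread shrinks as `sample_size` grows, is covering the Central Limit Theorem and the two-parameter description mentioned in the question.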

If you’re looking for a good data scientist rather than someone who just claims the title, the questions above are surprisingly effective at quickly telling the two apart. The good thing about these questions is that you can fine-tune the acceptable answers to fit your industry or even your company.

https://medium.com/predict/five-interview-questions-to-predict-a-good-data-scientist-40d310cdcd68



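Python English

The “Python English” series has published 15 issues so far. Studying English helps broaden your horizons and is of real benefit in learning Python well. I welcome better suggestions for the series and hope to keep learning Python together with you. Please leave a comment below this post with ideas for improvement. Thank you!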


Python数据之道

Making Data More Valuable


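© Copyright belongs to the author: this is an original work by 51CTO blog author mb5fe18e7c44408. If you repost it, please credit the source; otherwise legal liability will be pursued.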
