One explanation of Data Science. Source: https://s3.amazonaws.com/aws.drewconway.com/viz/venn_diagram/data_science.html
Explaining data science:
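Math & Statistics Knowledge | Once you have acquired and cleaned the data, the next step is to extract insight from it. That means applying appropriate mathematical and statistical methods, which requires at least a baseline familiarity with these tools. |
Hacking Skills | Data is a commodity traded electronically, so taking part in this "market" requires hacking skills. This has nothing to do with 'black hat' activity: data hackers must be able to manipulate text files at the command line, think algorithmically, and keep learning new tools. |
Substantive Expertise | Science is about discovering and accumulating knowledge, which requires motivating questions and hypotheses that can be brought to data and tested with statistical methods. In short: questions first, then data. |
Machine Learning | Data combined with math is machine learning. That is wonderful if machine learning is what interests you, but it is not the same thing as doing data science. |
Traditional Research | Traditional researchers mostly combine substantive expertise with math and statistics knowledge. Doctoral-level researchers, for example, spend most of their time acquiring expertise in these areas and very little time learning about technology. |
Danger Zone | People who "know enough to be dangerous" fall into the Danger Zone, the most problematic part of the diagram. Whether through ignorance or malice, this combination of skills lets them produce what looks like a legitimate analysis without understanding how they got there or what they have created. |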
Math & Statistics Knowledge: Once you have acquired and cleaned the data, the next step is to actually extract insight from it. You need to apply appropriate math and statistics methods, which requires at least a baseline familiarity with these tools.
Hacking Skills: Data is a commodity traded electronically; therefore, in order to be in this market you need to speak hacker. Far from ‘black hat’ activities, data hackers must be able to manipulate text files at the command line, think algorithmically, and be interested in learning new tools.
Substantive Expertise: Science is about discovery and building knowledge, which requires some motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods. Questions first, then data.
Machine Learning: Data plus math is machine learning, which is fantastic if that is what you are interested in, but not if you are doing data science.
Traditional Research: Substantive expertise plus math and statistics knowledge is where most traditional researchers fall. Doctoral-level researchers spend most of their time acquiring expertise in these areas, but very little time learning about technology.
Danger Zone: This is where I place people who ‘know enough to be dangerous,’ and it is the most problematic area of the diagram. It is from this part of the diagram that the phrase ‘lies, damned lies, and statistics’ emanates, because either through ignorance or malice this overlap of skills gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there or what they have created.
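
The overlaps described above can be read as plain set logic. The following is a minimal sketch, not part of the original diagram page: the names `HACKING`, `MATH`, `EXPERTISE`, `REGIONS`, and `venn_region` are hypothetical, and the mapping assumes the usual reading of the diagram, in which Machine Learning is hacking skills plus math and statistics, Traditional Research is math and statistics plus substantive expertise, the Danger Zone is hacking skills plus substantive expertise, and Data Science is the overlap of all three.

```python
# Hypothetical labels for the three circles of the Venn diagram.
HACKING = "Hacking Skills"
MATH = "Math & Statistics Knowledge"
EXPERTISE = "Substantive Expertise"

ALL_SKILLS = frozenset({HACKING, MATH, EXPERTISE})

# Assumed mapping from each overlap of circles to its region name.
REGIONS = {
    frozenset({HACKING, MATH}): "Machine Learning",
    frozenset({MATH, EXPERTISE}): "Traditional Research",
    frozenset({HACKING, EXPERTISE}): "Danger Zone",
    ALL_SKILLS: "Data Science",
}


def venn_region(skills):
    """Return the diagram region for a collection of skill labels."""
    key = frozenset(skills)
    unknown = key - ALL_SKILLS
    if unknown:
        raise ValueError(f"unknown skills: {sorted(unknown)}")
    if not key:
        return "outside the diagram"
    if len(key) == 1:
        # A single skill sits in its own circle, outside every overlap.
        return next(iter(key))
    return REGIONS[key]


if __name__ == "__main__":
    print(venn_region({HACKING, EXPERTISE}))
    print(venn_region({HACKING, MATH, EXPERTISE}))
```

Using `frozenset` keys makes the lookup independent of the order in which skills are listed, which matches the idea that each region is defined only by which circles overlap.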