Author: Vincent Granville
What do experienced data scientists know that beginner data scientists don’t know? Here is a quick overview.
- Automating tasks. Writing code that writes code.
- Outsourcing tasks to junior members or to consultants.
- Managing people, hiring the right people, managing managers who report to you.
- Training colleagues who might not be tech-savvy. Being an adviser to senior managers.
- Identifying the right tools and assessing the pros and cons of vendor software and platforms for a specific large-scale project (the construction of a huge taxonomy, for example).
- Identifying the right algorithms and statistical techniques for a specific project. Blending these techniques as needed for optimal performance.
- Not trusting data; identifying useful external or internal data sources, and blending various data sources while removing redundancies and fixing other data issues.
- Identifying the best features, perhaps using ratios, or transforming and combining raw features to turn them into better predictors. This usually requires a good understanding of the business you are in.
- Understanding executive talk, and translating executive requests, questions, concerns, or ideas into successful data science implementations.
- Measuring the ROI that you bring to your company; being able to convince executives of your added value (or providing sound explanations if ROI is negative or not perceived as positive, and offering a corrective path).
- Interacting successfully with managers / colleagues / executives / clients of all kinds. Mostly a communication issue.
- Having a clear understanding of what will create value for your company or client, and being able to deliver that value consistently and in a timely fashion, despite internal politics and setbacks.
- Being able to assess how long a complex project will last and what the hurdles and rewards will be, and staying on track regarding deliverables and deadlines.
- Being able to suggest and create new projects, convince stakeholders, and manage these projects from start to finish.
- Being able to jump-start your own company and manage it successfully.
- Making recommendations about data science implementations, helping with maintenance, and making sure that a project that started well does not falter over time.
- Knowing what you don’t know, and managing to outsource the work or learn new things (and being able to identify and prioritize the knowledge that you need to acquire or outsource).
- Understanding the business you are in, understanding the vision that executives have in mind, even if not clearly stated by your manager.
- Testing, testing, testing. Why does fake news persist on Facebook despite all the efforts and the many millions of dollars spent fighting this plague? Lack of testing, and/or not being able to figure out what fake news looks like (lack of business acumen), may be the cause. Scammers change their tactics all the time; your algorithms should identify and handle new trends, rather than just detecting 20 known types of fake news and missing the new types that appear after your solution is implemented. Work with business analysts and IP admins to constantly refine your algorithms. Use robust algorithms and robust features. Measure your success rate correctly. Do meaningful cross-validation: if you train your algorithm on 15 cases of fraud (the training set), will it be able to identify 5 other cases of fraud that are not in the training set but show up in the control set? Avoid over-fitting like the plague.
- Being able to NOT deliver a perfect solution based on a perfect model, requiring three months of work, when an approximate solution can be obtained in one day. After all, data is messy, so perfect models don’t really exist. You are just eroding ROI if you want absolute perfection.
- Trusting your intuition and gut feelings, but only up to a point. Some questions can be answered without writing a single line of code or running any test. Sometimes, simple simulations can go a long way. The same goes for post-mortem analyses (you can call it analytics forensics).
- Spending time documenting everything, prioritizing, discussing with stakeholders, and planning. At least one hour a day, usually much more.
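
The cross-validation question raised in the testing item above can be made concrete with a small sketch. This is only an illustration in plain Python, not a method taken from the article: the synthetic data, the `stratified_folds` and `fit_threshold` helpers, the single-score threshold "model", and the choice of 20 fraud cases split into 4 folds (so that each round trains on 15 fraud cases and holds out 5) are all assumptions made for the example.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_threshold(xs, ys):
    """Toy 'model' (hypothetical): the cut-off on one score that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def cross_validated_recall(xs, ys, k=4, seed=0):
    """For each fold, return the share of held-out fraud cases that the model flags."""
    recalls = []
    for held_out in stratified_folds(ys, k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        fraud = [i for i in held_out if ys[i] == 1]
        if fraud:  # skip folds that contain no fraud case at all
            recalls.append(sum(xs[i] >= t for i in fraud) / len(fraud))
    return recalls

# Synthetic data (an assumption for illustration): 20 fraud cases, 180 legitimate ones,
# with fraud cases tending to have a higher score.
rng = random.Random(42)
ys = [1] * 20 + [0] * 180
xs = [rng.gauss(2.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in ys]

recalls = cross_validated_recall(xs, ys, k=4)
print("recall on unseen fraud, per fold:", recalls)
print("mean recall:", sum(recalls) / len(recalls))
```

Reporting recall on held-out fraud cases, rather than overall accuracy, is a deliberate choice here: in this toy data set only 10% of cases are fraud, so a model that flags nothing would already be 90% accurate while catching no fraud at all.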
For related articles from the same author, visit www.VincentGranville.com.