08.07.2022 14:00

Top 10 Mistakes You Should Avoid as a Data Science Beginner



Data science is booming. Students all over the world enroll in online courses and even master's programs in data science.

Data science is a highly competitive field, especially if you want to land a dream job at one of the top tech companies. Being well prepared is what makes you competitive.

There are countless MOOCs, master's programs, boot camps, blogs, and data science academies, so as a beginner you may feel confused. Which course should you take? What should you study? Which topics should you focus on? Which programming language and tools should you learn?

Every data scientist is unique and follows their own learning path. Without knowing you, it is impossible to determine the best way for you to approach your data science career.

Still, beginners make the same mistakes over and over again, and you can learn to avoid them. This is a story from Alex Noah.

My 20+ years of data science experience, including leading teams of up to 150 people and lecturing part-time at one of the top global universities, have led me to summarize the key mistakes you can avoid to reach your dreams faster.

The mistakes are listed roughly in the order you are likely to make them as you learn as a beginner data scientist.

#1 Investing too much time in assessing all the different types and options of courses available before you finally start — or eventually never start

I understand that you feel overwhelmed by the number of courses. You want to avoid making any mistakes, get the most out of your money, and invest in the strategy that will bring you the greatest success.

There is no instant success in technical or scientific fields. You will have to work hard for it.

Today, all established institutes, academies, and platforms offer excellent courses. Don't overthink or over-analyze them. Choose one course, finish it, and then choose another.

What matters is to start and to do, in data science as in life. You don't know what your journey will look like, or how it would have turned out had you chosen another course. Nobody can tell you that.

Learning is not linear but circular. After finishing one data science course, you may well want to take another on the same topic.

Even after all my years of experience, I still take data science, machine learning, and AI courses. In every course, even one that is "simple" and aimed at beginners, I discover a new angle on the topic. Understanding a topic from all perspectives is key, and it is what makes data scientists highly respected.

#2 You want to learn too many methods and tools at once instead of learning and understanding the methods one by one

Many data scientists believe that listing as many methods as possible on their CV will help them get a job faster. The opposite is true. If you have only been working in data science for six months and your CV lists dozens of methods, it is obvious to the reader that these are buzzwords with no substance.

Entire books are written about regression models. There are over 50 types of regression, each with its own preconditions. It is perfectly fine to list only "regression" on your CV: regression models remain the most important models in applications and serve as the foundation of data science.

It is important to understand the method, the assumptions, the parameters, and the pitfalls.
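To make "assumptions and pitfalls" concrete, here is a minimal sketch in plain Python (no library assumed) of simple linear regression fitted by ordinary least squares, followed by a residual check. The data are made up for illustration. Fitting a straight line to curved data still "works" and returns coefficients, but the residuals follow a U-shaped pattern, which signals that the linearity assumption is violated.

```python
def fit_ols(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values; if the model fits, they show no pattern."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Illustrative, made-up data.
xs = [0, 1, 2, 3, 4]
linear_ys = [1 + 2 * x for x in xs]  # truly linear relationship
curved_ys = [x ** 2 for x in xs]     # quadratic relationship, violates linearity

print(fit_ols(xs, linear_ys))                             # (2.0, 1.0)
print(residuals(xs, curved_ys, *fit_ols(xs, curved_ys)))  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The model happily returns a slope of 4.0 for the quadratic data. Only the residuals, positive at both ends and negative in the middle, reveal that a straight line is the wrong model.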

An experienced recruiter, or the screening algorithm behind the process, can judge the depth of your understanding from your CV and from the way you describe your knowledge of regression.

It is better to be able to apply a few methods in depth than to know many superficially.

#3 You code everything from scratch because you think this helps you program better and faster

When they start coding, people think they have to quickly code and re-program as many algorithms as possible. Here, too, you should focus on understanding the basics rather than on quantity.

You must first understand the mathematical prerequisites for coding: mathematical induction, linear algebra, geometry, discrete mathematics, statistics, probability theory, graph theory, Boolean algebra, and calculus. This foundation is the strength of good programmers, yet it is often overlooked by data scientists.

Coding more did not make me better or faster. I learned programming by studying the mathematical foundations, reading other people's code, and running and testing it with different data and problems.

While coding is important, it is even more important to understand the architecture of code, and that understanding comes from reading other people's code.

Code is increasingly becoming a commodity; there are even no-code tools. What will set people apart is not whether they can program, but whether they understand the architecture.

Here is an example. I assume you are familiar with TensorFlow. But do you know exactly what it does? Why is it called "TensorFlow"? What is a tensor? It is not just the mathematical calculation of a tensor product; what does a tensor signify geometrically?
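One way to start answering the geometric question: the tensor product (outer product) of two vectors u and v is a two-index array T[i][j] = u[i] * v[j]. Read as a linear map, T sends any vector w to (v · w) * u, that is, onto the direction of u, scaled by how far w points along v. The sketch below uses plain Python lists rather than TensorFlow's API (in TensorFlow, a tensor is in practice a multi-dimensional array like T), and the vectors are arbitrary illustrative values.

```python
def tensor_product(u, v):
    """Outer (tensor) product of two vectors: T[i][j] = u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def apply_map(matrix, w):
    """Apply a 2-index array to a vector as a linear map (matrix-vector product)."""
    return [sum(row[j] * w[j] for j in range(len(w))) for row in matrix]


def dot(a, b):
    """Dot product: how far one vector points along another."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary illustrative vectors.
u = [1.0, 2.0]
v = [3.0, 4.0, 5.0]
w = [1.0, 0.0, 1.0]

T = tensor_product(u, v)
print(T)                           # [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]

# Geometric reading: T maps w onto the direction of u, scaled by v . w.
print(apply_map(T, w))             # [8.0, 16.0]
print([dot(v, w) * ui for ui in u])  # [8.0, 16.0]
```

The two printed results agree, which is the point: the numbers in T are not just a multiplication table, they encode a specific geometric transformation.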

#4 You learn the theory and think you know everything, but you lack practical experience

Data science is all about trial and error. You will only gain a deeper understanding of data science by gathering as much hands-on experience as you can, making mistakes and fixing them.

Understanding the theory and the basics is, of course, essential.

But in practice, things rarely work exactly the way you learned they should.

Start working with practical examples right from the beginning. You will often not feel ready for practical work because you lack programming experience or knowledge.

Start anyway, even if you don't feel ready. You don't need to spend a whole day or a week on it; a small project of 1-2 hours is enough.

You can start with a visual tool such as RapidMiner or KNIME, or take someone else's code and run it. For example, take a simple sentiment analysis and apply it to tweets or product descriptions. Then modify the code to build other examples or to compare the results.
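As an illustration of how small such a first project can be, here is a minimal sketch of a lexicon-based sentiment scorer in plain Python. This is a deliberately simple, swapped-in technique, not a trained model. The word lists `POSITIVE` and `NEGATIVE` and the example texts are hypothetical, made up for illustration.

```python
import re

# Hypothetical toy lexicons for illustration only; a real project would use a
# curated sentiment lexicon or a trained model.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "broken"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up example texts standing in for tweets or product descriptions.
texts = [
    "I love this phone, the battery is great!",
    "Terrible service, I hate waiting.",
    "The parcel arrived on Tuesday.",
]
for text in texts:
    print(label(text), "|", text)
```

From here, the "modify and compare" step can be as simple as extending the word lists, handling negations such as "not good", or running a library model on the same texts and comparing its labels with these.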

As a child, you learned to speak with single words, then with expressions of two or three words. Gradually, you gained a feel for the language. The same applies to data science practice.

Pro tip: Learning is circular. Keep your work, so you can come back later to improve it, move it to GitHub, or add visualizations with Tableau.

#5 You think that certifications are a competitive advantage to get a data science job

Many people will tell you not to bother with certifications, but they are fine. They can be a motivator and show your willingness to learn. I still enjoy earning certificates myself, so go ahead.

However, certifications are not a distinguishing factor in the market: many people hold the same ones. To gain a competitive edge, you need to go beyond them.

A student asked me to help him find an internship in finance. He wanted to apply what he had learned and get to know the culture of a data science team. He was placed with a bank and wrote his semester thesis there. Completing a semester thesis, an internship, and regular studies at the same time is hard, but the experience will help him and give him an advantage in the job market.

#6 You worry about the opinion of other people instead of building your own opinion based on facts

Many data scientists are concerned about what other data scientists think. The more arguments they hear, the more confused they become. Confusion is a necessary step toward clarity, but it shouldn't be a permanent state.

Every data scientist is an individual, with their own learning experience, career path, and opinions. I know from experience that if there are two data scientists in a room, there will be at least four opinions.

While it is fine to use opinions as an inspiration or guide in your search for information, they should not be considered the actual information.

Find hard facts. Draw your logical conclusions and validate them. This skill is essential to your success in data science.

#7 Not caring about business and domain knowledge

Many data scientists believe they can apply their methods to any industry and any problem. In my more than 20 years in this field, I have seen that this does not work.

Too often, I have seen data scientists present their findings to businesspeople only to hear: "Oh, we already know this. What we need to know is why it happens and how we can solve it." Or worse: "This is absurd, because that is not how our business operates."

Domain knowledge is more important than all the fanciest methods. Data scientists solve business problems, not technical ones, and you bring value by solving a business problem. Your solution is only as valuable as its benefit to the company, and you can only deliver that benefit if you know the business.

I have worked in many industries. Before getting involved with a new business, I read extensively about it:

  • Wikipedia was my first stop; there I learned a lot about the industry and the history of its companies.
  • I searched for the investor relations information and annual reports of the top 10 companies within a particular industry.
  • I read every news article I could find about the industry and these companies from the past few years.
  • I reached out to my LinkedIn contacts who are in this industry.

Only then did I begin to interact with the company.

Half of your learning should go into building business and industry knowledge.

#8 You are not studying and learning on a consistent and ongoing basis

It's easy to get distracted and give up on a topic you don't understand. Data science is not a sprint; it's a marathon. Establish a consistent study schedule, just like marathon training: you train daily, but in small doses.

Learning is circular, as I have said before. Having studied a topic once does not make you an expert in it.

Here is an example. I learned many limit theorems in my mathematical finance lectures. I passed the exam with flying colors and was confident that I understood them. Seven years later, I had to review code for the valuation of complex financial products, and only at that moment did I realize that I had not truly understood them until then.

Block a few hours for learning every day, or at least every week, whether you aspire to become a senior data scientist or already are one.

Your learning should cover new data science topics; previously learned topics from a different perspective (e.g., through another course or book); technology trends; industry knowledge; data visualization; data storytelling; and data applications.

This adds layers to your understanding and will allow you to give convincing answers in a job interview, because you can show the whole picture from multiple perspectives.

#9 No storytelling with the data

Data science jobs require you to communicate your findings to non-technical people, including businesspeople. The business funds your job; without its support, neither your job nor the data science team would exist.

Your job is to add value to the company. You don't have to use fancy methods just for their own sake.

One of my friends leads data science at a global bank. Their team sends data science candidates a dataset two weeks before the interview and asks for a 20-minute presentation, with no further instructions. They want to hear the story. They don't care about the method used, unless a candidate talks complete nonsense about it.

First, they want to know how the problem is framed and why it is important to solve it. Second, what exactly needs to be solved. And lastly, how it is solved, explained briefly in a business context. This is the most important thing we do every day. A candidate does not need to be flawless here, but must show a good understanding of what our job requires.

Learn data storytelling; there are even free courses on it. You can also learn how to visualize data in a business context.

#10 Learning on your own without interactions with the data science community

Many people believe they can learn data science by themselves. They see other data scientists as rivals, which makes sharing knowledge difficult.

Learning and reading only what you select yourself leads to a biased view, because there are many other perspectives and viewpoints on any topic or method. Data scientists need to be able to discuss topics openly and gain experience in argumentation.

After a few questions, an experienced recruiter will know whether you are a one-person show or a team player. Being able to work with others is a win-win: it benefits the company and increases your market value.

It is crucial to build a network. You can do this by attending boot camps, hackathons, or meetups.

Now you know, at least in theory, what to avoid.

These mistakes can become a major problem in your data science career.

You will still make these mistakes, I'm sure. It is human nature to believe "I'm different," even when the data contradicts this belief. But being aware of these mistakes lets you recognize them and adjust your path more quickly, so that you become a sought-after data scientist.

Thank you!
See you!