How To Enter Data Science? Don’t Start With Machine Learning!

Nursinem Handan ŞAHAN
Graduated as an honor student from the Geomatics Engineering department of Yıldız Technical University in 2018. Studied at Warsaw University of Technology through the Erasmus+ program during the undergraduate degree. Currently continuing studies in the Geographic Information Technologies department at Istanbul Technical University.

The first thing most people think about when they hear the term “data science” is usually “machine learning”. Obviously, to be a “complete” data scientist, you’ll have to eventually learn about machine learning concepts. But you’d be surprised at how far you can get without it.

So why shouldn’t you start with machine learning?

1. Machine learning is only one part of a data scientist’s job (and a very small part too).

Machine learning is (a part of) data science but data science isn’t necessarily machine learning, similar to how a square is a rectangle but a rectangle isn’t necessarily a square.

In reality, machine learning modeling makes up only around 5–10% of a data scientist’s job; most of the time is spent elsewhere. By focusing on machine learning first, you’ll be putting in a lot of time and energy and getting little in return.

2. Fully understanding machine learning requires preliminary knowledge of several other subjects.

At its core, machine learning is built on statistics, mathematics, and probability. In the same way that you first learn English grammar, figurative language, and so forth before you can write a good essay, you need these building blocks firmly in place before you can learn machine learning.

To give some examples:

  • Linear regression, the first “machine learning algorithm” that most bootcamps teach, is really a statistical method.
  • Principal Component Analysis is only possible with the ideas of matrices and eigenvectors (linear algebra).
  • Naive Bayes is a machine learning model that is completely based on Bayes’ theorem (probability).
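
To make two of these examples concrete, here is a minimal sketch in plain Python. It fits a line using nothing but means, a covariance, and a variance, and it applies Bayes’ theorem to a single piece of evidence. The function names (`fit_line`, `posterior`) and all numbers are made up for illustration; this is not how any particular library implements these methods.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line for one predictor: slope = cov(x, y) / var(x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Unnormalised covariance and variance; the 1/n factors cancel in the ratio.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return slope, intercept


def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    # P(B) by the law of total probability over A and not-A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# Made-up rates: 1% of emails are spam, a word appears in 90% of spam
# and in 5% of non-spam. How likely is an email with that word to be spam?
print(posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

A Naive Bayes classifier repeats the second calculation for many pieces of evidence at once, under the assumption that they are independent given the class.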

And so, I’ll conclude with two points. One, learning the fundamentals will make learning more advanced topics easier. Two, by learning the fundamentals, you will already have learned several machine learning concepts.

3. Machine learning is not the answer to every data scientist’s problem.

Many data scientists struggle with this. Similar to my initial point, most data scientists think that “data science” and “machine learning” go hand in hand. And so, when faced with a problem, the very first solution that they consider is a machine learning model.

But not every “data science” problem requires a machine learning model.

In some cases, a simple analysis with Excel or Pandas is more than enough to solve the problem at hand.
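
As a rough illustration of what such a “simple analysis” might look like, the sketch below answers a typical business question (which region brings in the most revenue?) with a single pandas group-by. The table, its column names, and its values are invented for this example.

```python
import pandas as pd

# Invented sales records; in practice these would come from a file or database.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "revenue": [120.0, 80.0, 100.0, 60.0, 90.0],
})

# Total revenue per region, largest first.
revenue_by_region = (
    sales.groupby("region")["revenue"].sum().sort_values(ascending=False)
)
top_region = revenue_by_region.index[0]

print(revenue_by_region)
print("Top region:", top_region)
```

No model is trained here: a sum and a sort are enough to answer the question.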

In other cases, the problem will be completely unrelated to machine learning. You may be required to clean and manipulate data using scripts, build data pipelines, or create interactive dashboards, none of which require machine learning.


What should you do instead?

If you would like some tangible next steps to start with instead, here are a few:

  1. Start with statistics: Of all the building blocks, statistics is the most important. And if you dread statistics, data science probably isn’t for you.
  2. Learn Python and SQL: If you’re more of an R person, go for it. The better you are at Python and SQL, the easier your life will be when it comes to data collection, manipulation, and implementation. Also get familiar with Python libraries like pandas, NumPy, and scikit-learn.
  3. Learn linear algebra fundamentals: Linear algebra becomes extremely important when you work with anything related to matrices. This is common in recommendation systems and deep learning applications. If these sound like things that you’ll want to learn about in the future, don’t skip this step.
  4. Learn data manipulation: This makes up at least 50% of a data scientist’s job. More specifically, learn more about feature engineering, exploratory data analysis, and data preparation.
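
To give a feel for the last step, here is a small sketch of data preparation, a simple engineered feature, and a first exploratory summary in pandas. The table, its column names, and the choice of filling missing values with the median are all assumptions made for illustration; the right cleaning steps depend on the data at hand.

```python
import pandas as pd

# Invented raw data with a duplicated row and missing values.
raw = pd.DataFrame({
    "customer": ["a", "b", "b", "c"],
    "signup": ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-15"],
    "spend": [100.0, None, None, 50.0],
})

# Data preparation: drop exact duplicate rows, then fill missing spend
# with the median of the known values (one possible strategy among many).
clean = raw.drop_duplicates().copy()
clean["spend"] = clean["spend"].fillna(clean["spend"].median())

# Feature engineering: parse the date strings and derive the signup month.
clean["signup"] = pd.to_datetime(clean["signup"])
clean["signup_month"] = clean["signup"].dt.month

# Exploratory data analysis: summary statistics of the numeric column.
print(clean["spend"].describe())
```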

 Source: Want to Be a Data Scientist? Don’t Start With Machine Learning.
