After 5 years of machine learning research, I discovered these 7 truths

Author | George

Translator | Zhang Jianxin

Planning | Ling Min

After 3 years of working on automated machine learning at Mindsdb, I quit my job; at the very least, I won’t be working in any machine learning related role any time soon. I’ve been doing machine learning research for 5 years, and only now have I finally figured out a lot of things I didn’t know before. I may even be aware of some things that others don’t know.

This article summarizes what I have learned from working on machine learning. Please don’t take it as an “expert summary”; think of it as an outsider’s work of art: an unusually in-depth study of the zeitgeist, without the benefit of being part of the “community”.

1 What is the role of machine learning research?

I was first exposed to machine learning research 5 years ago. To this day, its role remains a mystery to me.

Most scientific subfields (the real ones) can claim to be a dual process of theory building and data collection. More theoretical fields, such as those grouped under the labels mathematics and computer science, have progressed almost entirely at the conceptual level. Their concepts are so fundamental that they seem unlikely to be superseded; Euclid’s Elements is one example.

But machine learning is at a strange crossroads. Even if we grant that it has the same theoretical rigor as physics, it still lacks timeless experimental observations. This is not for lack of virtue among its practitioners; the object of study is simply a moving target rather than a concrete reality.

At most, one can make timeless claims like these:

Given <so-and-so hardware>, <so-and-so accuracy> can be obtained on <so-and-so subset> of ImageNet.

Given a dataset, and running cross-validation (CV) over the entire dataset, we can obtain state-of-the-art values of x/y/z for some accuracy function.

Given an environment that a programmable agent can interact with, we can reach points x/y/z on the loop/penalty/time/observation and reward/knowledge/understanding matrix.
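To make the second kind of claim concrete, here is a minimal sketch of k-fold cross-validation over an entire dataset. The majority-class model, the toy (feature, label) format, and all function names are my own illustrative assumptions, not anything from a specific library.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature, label): a toy format assumed here


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size (needs k <= n)."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_class_model(train: Sequence[Example]) -> Callable[[float], int]:
    """Hypothetical baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda _feature: most_common


def cross_validated_accuracy(
    data: Sequence[Example],
    k: int,
    fit: Callable[[Sequence[Example]], Callable[[float], int]] = majority_class_model,
) -> float:
    """Mean held-out accuracy; every example is used for testing exactly once."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        model = fit(train)
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy data: eight positive examples followed by two negative ones.
toy_data = [(0.0, 1)] * 8 + [(1.0, 0)] * 2
toy_score = cross_validated_accuracy(toy_data, k=5)
```

The resulting number is exactly the kind of easily verified but theoretically uninteresting claim described above: it holds for this dataset, this model, and this split, and says little beyond that.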

But these are not the kinds of claims that theory revolves around, and the easily verifiable, timeless results of machine learning are theoretically uninteresting. At best, they set a lower bound on how well digital hardware can perform on specific tasks.

Nor is machine learning backed by strong theoretical guarantees. There are some “small” guarantees that help with broader experiments, such as proofs of differentiability over a certain range. But guarantees that hold only under idealized conditions can, at best, point to potential experimental paths.

One might argue that machine learning is a very murky field, that most of its data is fake, and that most papers are published to win the wider academic citation game.

In reality, however, machine learning may be the only field in academia that works correctly. Typically, machine learning papers are accompanied by code, data, and methods rigorous enough for others to replicate. The claims made in a paper are usually easy to verify and reproduce using the tools the authors provide. There are exceptions, but they are relatively few.

What’s more, unlike in mathematics and computer science, replication in machine learning is not reserved for a dozen or so experts with plenty of free time. If you want to validate a recent NLP paper, CS101-level knowledge is enough. In fields like mathematics this is not possible, because the validity of a modern “theorem” (sometimes written up in a thick book) depends on the consensus of a handful of experts, not on an automated theorem prover.

In fact, much of the actual work in machine learning is done by outsiders with little or no relevant academic background. Other fields are the opposite: credentialism is the absolute authority, and people are used to ranking by seniority.

2 What can software do that hardware can’t?

Broadly speaking, machine learning research includes work built on linear algebra (LA) libraries or clustering tools, which always seems prone to repeating the same mistakes. The major modern techniques in machine learning could be described as slight conceptual refactorings of ideas from 20, 30, or even 50 years ago.

So ask: how much of the progress in machine learning since the 1970s comes from software, and how much from hardware? Unfortunately, surprisingly few papers address questions like this. It is simply taken as common knowledge that, on any language task, a huge decision tree trained with a T5-equivalent compute budget will not match T5, and tuning such an algorithm at that scale is not a simple task.

The question itself is moot, since some research is always needed to develop the software that best fits evolving hardware. A better question might be: if the amount of machine learning research were reduced by a factor of 1000, would performance, or the breadth of tasks that can be handled, be affected at all?

My intuition tells me that most future advancements will be the result of hardware, and for those of us who prefer to focus on software, here’s what to do:

  1. Figure out what advances in hardware will make possible in 2-4 years, seek funding, and build a company around these sudden opportunities;

  2. Try to find a paradigm shift: fix the bottlenecks that waste 99% of our resources but are so ingrained in our thinking that we can’t even see them. Figure out that 99% of the available computing power is /actually/ not being used and can be, with this one cool trick. Come up with a simple abstraction that, on its own, performs close to SOTA on any available task.

3 Automated Machine Learning

Most of my work over the past few years has revolved around automated machine learning, so I am biased toward seeing it as an important part of machine learning. In fact, most people working in machine learning, whether in academia or industry, seem to be doing work that sits on the edge of being automated.

The typical academic paper can be summed up in the following steps:

  • Pick an architecture.

  • Make some small modifications to the hyperparameters.

  • Run benchmarks on several datasets.

  • Prove some theoretical guarantees that generally do not hold in any real-world training scenario and could just as well be shown empirically (e.g. differentiability, or uniform convergence when the data follows some ideal distribution).

  • Add enough filler.

The work of a data scientist and a machine learning engineer can also be boiled down to the following steps:

  • Try a few easy-to-use models that, set up correctly, take fewer than 100 lines of code to work.

  • Tune the hyperparameters until accuracy on the test data is good enough that, even if it is a bit worse in reality, the model is still worth deploying to production.

  • Wrap it in some kind of API for backend use.

  • If needed, write a cron job to retrain it every now and then.

  • Write a long PPT and present it to 5 people (with job titles starting with P or C) so they can comfortably sign off on the deployment.
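The first three steps of that job description fit in a few dozen lines. The following is a minimal sketch, assuming a toy one-feature dataset, a single threshold hyperparameter, and a plain function standing in for the API; every name in it is hypothetical. Unlike the list above, it tunes on held-out validation rows rather than on the test data.

```python
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[float, int]  # (feature, label): a toy format assumed here
Model = Callable[[float], int]


def accuracy(model: Model, rows: Sequence[Row]) -> float:
    """Fraction of rows the model labels correctly, between 0 and 1."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def threshold_model(threshold: float) -> Model:
    """Hypothetical 'easy-to-use model': predict 1 when feature >= threshold."""
    return lambda x: int(x >= threshold)


def tune_threshold(rows: Sequence[Row], candidates: Sequence[float]) -> float:
    """Grid-search the single hyperparameter; ties go to the earliest candidate."""
    return max(candidates, key=lambda t: accuracy(threshold_model(t), rows))


def make_endpoint(model: Model) -> Callable[[Dict[str, float]], Dict[str, int]]:
    """Stand-in for 'wrap it in some kind of API': dict in, dict out."""
    def handle(request: Dict[str, float]) -> Dict[str, int]:
        return {"prediction": model(request["feature"])}
    return handle


validation_rows = [(0.1, 0), (0.2, 0), (0.6, 1), (0.9, 1)]
best = tune_threshold(validation_rows, candidates=[0.0, 0.5, 1.0])
endpoint = make_endpoint(threshold_model(best))
```

Calling `endpoint({"feature": 0.7})` returns a small dictionary with the prediction. The cron job and the slide deck are left out, which is rather the point.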

Broadly speaking, all of this seems very easy to automate. Yet these people’s jobs seem to have little to do with steps 1-4; maybe the vague theory and the slides are the point.

Maybe this is an oversimplification. But to put it another way: if automated machine learning is really so good, why aren’t more Kaggle leaderboards dominated by people using these packages?

4 Benchmarks and Competition in Machine Learning

Interest in benchmarks and competitions appears low relative to the number of papers published. Well over 100 machine learning papers appear on arXiv per day, while far fewer show up on the Papers with Code leaderboards.

I think most researchers would justify this by saying that they are not trying to “compete” with their techniques on anything, or to improve some accuracy score, but to provide interesting theoretical direction for designing and thinking about models.

That’s fine in itself. But as far as I know, no groundbreaking technique has ever rested entirely on mathematical guarantees and then taken years to mature. Usually, if something “works” and gets widespread adoption, it’s because it improves results immediately. The breakthroughs that take years or even decades to pay off are architectural ideas with wide-ranging impact, and such breakthroughs are very rare.

The reality is that people propose “generic” techniques in papers, such as optimizers or boosting methods, hand-crank a few formulas, and then benchmark them on fewer than a dozen combinations of dataset and dependent component (e.g. the architecture an optimizer optimizes, or the estimator a boosting algorithm uses), which makes for small to trivial benchmarks.

This is not a criticism aimed only at third-rate papers. Off the top of my head I can name techniques like LightGBM, Modified ADAM and Lookahead. For me and many others they are game-changers that have proven their value on many real-world problems, yet they were originally proposed in papers with little to no experimentation.

I think the current problem boils down to three things:

  1. Lack of a “generic” benchmark suite. The OpenML automated machine learning benchmark is the closest thing to a general-purpose benchmark, but its focus is very narrow, limited to testing end-to-end automated machine learning scenarios. An ideal general-purpose benchmark would have a many-to-many mapping of architectures to datasets and allow individual components to be swapped out, so that a new technique can be evaluated as part of a larger whole. I did fantasize about building the Mindsdb benchmark suite along these lines, but I doubt anyone would really want such a solution, since there is no incentive structure for it.

  2. Lack of competitions. Sites like Kaggle and its dozen or so industry-specific clones use formats that demand a lot from participants, and the competitions hand out stingy rewards.

  3. Possibly underlying both problems above, the most “valuable” problems in machine learning are difficult to benchmark or hold competitions on at all. Tasks such as translation, text embedding generation, and autonomous driving are, to varying degrees, hard to judge objectively with any metric.
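The many-to-many mapping from point 1 is easy to sketch, even if the hard part is the incentive structure and not the code. Below is a minimal illustration, assuming toy models and datasets; `run_benchmark` and everything it is given are hypothetical names, not part of OpenML or any Mindsdb suite.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[float, int]]  # rows of (feature, label): a toy format
Model = Callable[[float], int]


def accuracy(model: Model, dataset: Dataset) -> float:
    """Fraction of rows the model labels correctly, between 0 and 1."""
    return sum(model(x) == y for x, y in dataset) / len(dataset)


def run_benchmark(
    models: Dict[str, Model], datasets: Dict[str, Dataset]
) -> Dict[Tuple[str, str], float]:
    """Score every model on every dataset: a many-to-many grid of results."""
    return {
        (model_name, data_name): accuracy(models[model_name], datasets[data_name])
        for model_name, data_name in product(models, datasets)
    }


# Swapping a component means replacing one dictionary entry and re-running.
models = {
    "always_one": lambda x: 1,
    "threshold": lambda x: int(x >= 0.5),
}
datasets = {
    "a": [(0.2, 0), (0.8, 1)],
    "b": [(0.9, 1), (0.7, 1)],
}
results = run_benchmark(models, datasets)
```

Each new technique adds one row to the grid and each new dataset adds one column, so a newly proposed component gets judged against everything already there instead of against a hand-picked handful.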

This goes back to the earlier idea: if you do technical development work, you’re better off focusing on paradigm shifts or productization, unless you’re explicitly being paid to do something else.

5 Has machine learning reached “state-of-the-art” in a particular domain?

Another interesting question: is machine learning “state of the art” for a particular class of problems? I mean the kind of problem that fits in a .csv file and is evaluated with an accuracy function ranging from 0 to 1, where solutions vary widely in speed, mathematical guarantees, and “interpretability”.

Nonetheless, we are currently unable to answer this question with certainty.

What I can say with near certainty is that almost no one, from academic researchers to industry researchers to the average data scientist at a mid-sized company, is at all interested in getting state-of-the-art results.

I’ve been fortunate (or unfortunate) enough to talk with dozens of organizations about their machine learning practices, and my impression is that most organizations and projects that “want to use machine learning” haven’t even reached the “data-driven” stage. They want to start from conclusions and conjure predictions out of thin air. The way they judge whether an algorithm is good enough for production is staggering.

About 30 years ago, a doctor published a paper in which he reinvented sixth-grade math in an attempt to figure out how to evaluate his diabetic patients. This happened in the age of the pervasive personal computer, when one would assume that life-or-death decisions requiring standardized computation were made by a machine, not by someone who hadn’t even heard of calculus. Worse yet, this was someone who actually published a paper, which put him in the top 0.1% of the field at the time; God only knows what everyone else was doing.

I have a feeling that whatever broader phenomenon this anecdote points to, it remains the root cause of machine learning’s lack of impact in other fields. Improving accuracy on some problem by a rounding error, theoretically guaranteeing that an algorithm is within 0.3% of a perfect solution, or reducing parameter counts for easier interpretation is unlikely to help.

In my opinion, most of the problems with applying classical machine learning are people problems, and not much research addresses them.

6 Machine learning is more like an alien brain

On the other hand, machine learning is increasingly being applied to “non-classical” problems, such as language or driving. In this area, the distinction between supervised and unsupervised learning seems to disappear, and trying to interpret the algorithms as simple mathematics, rather than as generative systems self-selecting under constraints, becomes as silly as trying to do the same with the brain.

At a macro level, influence in a particular direction produces highly specialized methods and algorithms that are passed around among researchers (as files or services) as a basis for building higher-level functions. Thirty years from now, machine learning may look more like a giant, incredibly complex alien brain that controls most of society than like linear algebra.

Skeptics argue that GPT-{x} can be written from scratch in one or two thousand lines of code, and that most of the work goes into parallelization and experiment code, squeezing out a few percentage points with refined techniques. Furthermore, they argue, the inability to objectively evaluate complex tasks will almost certainly lead to a collapse sooner or later.

I don’t think the situations of machine learning in general and autonomous driving are similar; we have spent a long time without getting much autonomous driving. To me, multiple regression still seems to be a better starting point, and a better foundation, for explaining machine learning than predictive coding or game theory. But I am definitely more impressed than before with what machine learning can achieve, while remaining skeptical that progress will continue at a similar pace.

7 Three Development Directions of Machine Learning

Currently, under the machine learning branch, I see 3 interesting directions, and these directions are splitting further.

The first direction is the “classical” machine learning approach. These methods now have enough computing power to handle most high-dimensional problems. The central task here is to provide more theoretical guarantees, to produce the kind of “interpretability” needed to underpin “causal” models, and to steer the zeitgeist away from straight lines, p-values, and Platonic shapes.

The second direction is industrial application. I think this has more to do with typical “automation” work, i.e. data wrangling, understanding domain logic, and political maneuvering. It’s just that the new wave of automation is, as ever, supported by more advanced tools.

The last direction is “gilded” research, carried out by a few idealists and many students trying to get onto a career track with a dissertation. This is where the most interesting discoveries are, hidden in piles of unworkable or low-impact noise. I’m not sure what is going on behind closed doors, because I lack the capacity to filter out the noise. But on the surface, more abstract ideas are moving into areas previously thought of as the domain of reinforcement learning.

Combine any two of these three directions and you’ll get something interesting. For example, AlphaFold combines recent advances in Transformers with scientific domain expertise to replace “hand-made” models of protein folding; Tesla Autopilot is SOTA vision, RL and transfer learning, united with lobbyists, lawyers and marketers, to automate away a double-digit percentage of jobs; some people studying the replication crisis seem to sit between the first two, trying to avoid human error in analyzing data and reviewing evidence by systematizing the process, although I think such research is premature.

While this is an imperfect classification, it helps me understand the field as a whole. I think it’s also a good paradigm for thinking about what problems should be solved, who to work with, and what kind of context is needed.

The text and pictures in this article are from InfoQ

