Machine learning challenges: what to know before getting started
By Sherry Tiao – The rewards of machine learning can be compelling, and they may make you want to get started right now. Before you start your own project, however, you’ll want to consider the challenges that machine learning brings.
This article isn’t meant to scare you away; rather, it’s meant to ensure you’re prepared and have thought carefully about what you’ll need before you get started.
We spoke with Brian MacDonald, data scientist on Oracle’s Information Management Platform team, about the pitfalls he’s seen and what companies can do to avoid them.
These machine learning challenges include:
- Addressing the skills gap
- Knowing how to manage your data
- Operationalizing the data
1. Address the Machine Learning Skills Gap
The biggest difficulty, of course, is the skills gap that comes with using machine learning in a big data environment. There’s a certain community of people who think that big data makes life beautiful and it will be easy to get started.
The hardest part is going to be finding the right people. Demand for people who are skilled in machine learning is high, and the pool to choose from is small. But as we described in our article about machine learning success, having executive support is key to this. If you have executive support, you’re also going to have the funding to find and recruit those valuable people.
Here’s something to think about. If you’re in a situation where you’re very sensitive to cost because skilled data scientists are expensive, then you probably don’t have a big enough business problem to make machine learning worth doing.
Let’s say a skilled data scientist costs your company $300,000 to $400,000 a year (including all benefits and incentives). If that person can’t help you solve a problem that’s worth at least $1 million a year, then you probably don’t need that person. Right?
On the other hand, if you truly believe this person (or team of people) can help you solve a problem in the tens of millions, then what are you waiting for?
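To make that comparison concrete, here is a minimal sketch of the arithmetic. The dollar figures are the illustrative ones used above; the function names are hypothetical, and the $1 million floor is simply the rule of thumb from this article, not a formal benchmark.

```python
def value_to_cost_ratio(annual_problem_value: float, annual_cost: float) -> float:
    """How many dollars of business value the problem represents per dollar of hiring cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return annual_problem_value / annual_cost


def clears_rule_of_thumb(annual_problem_value: float,
                         minimum_value: float = 1_000_000) -> bool:
    """Apply the article's rule of thumb: the problem should be worth at least $1M a year."""
    return annual_problem_value >= minimum_value


if __name__ == "__main__":
    # Illustrative figures only: the upper end of the cost range from the text,
    # compared against a small problem and a problem in the tens of millions.
    cost = 400_000
    for value in (500_000, 1_000_000, 20_000_000):
        ratio = value_to_cost_ratio(value, cost)
        verdict = "worth pursuing" if clears_rule_of_thumb(value) else "probably not worth it"
        print(f"problem worth ${value:,}: {ratio:.2f}x the hiring cost, {verdict}")
```

At the top of the cost range, a $1 million problem returns only 2.5 times the hiring cost, which is why a problem in the tens of millions makes the decision so much easier.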
It is difficult to find people. But if it’s truly important to your company, you can find them.
Here’s another issue to think about: the tools and software. While there are of course tools that will help, you’ll rarely be able to find the exact, perfect machine learning tools you need that are ready to go for you, right out of the box. You’ll have to think about the tooling you’re going to use.
Python, R, SQL, TensorFlow? And if you use those, how will they work with your data lake? And how will you handle the setup and configuration that can create challenges? Think through the details before you get started and ensure you have enough funding.
2. Know How to Manage Your Big Data
Machine learning is a messy process. And just having a big data platform doesn’t automatically mean it will be easier. In fact, it might make it messier, because you’ll have more data. That data enables you to do more, but it also means more data prep that has to be done.
You’ll have to think holistically about how you’re going to approach the problem. Here are some questions to think about:
- Where is your data coming from?
- How are you going to approach the problem?
- How are you hoping to handle your data preparation?
- And once that’s done, how will you build your models and operationalize everything?
If you don’t already have a good BI practice or an analytics practice and if you’re not using data in all the ways you can think of already—well, jumping over to machine learning is really going to be a challenge. Already having data-driven decision making is absolutely critical. If you don’t have that, we recommend having that in place before you get started with machine learning.
If you do decide to start, here are some other considerations. Think about them carefully before you get started:
Rapid Change
In the machine learning world, innovation is coming quickly, which means rapid change. What’s good today may not be so good tomorrow, and you can’t always rely on the software because it’s a more volatile space. You’re more likely to run into version mismatches and conflicts between the tools you use.
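One modest way to catch version drift early is to compare what is installed against the versions your project expects. The sketch below uses Python's standard `importlib.metadata` module; the function name, the package list, and the pinned version numbers are hypothetical placeholders, so substitute whatever your own project depends on.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict


def find_version_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Return {package: description} for every package that differs from its pin."""
    problems: Dict[str, str] = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (expected {expected})"
            continue
        if installed != expected:
            problems[name] = f"installed {installed}, expected {expected}"
    return problems


if __name__ == "__main__":
    # Hypothetical pins for illustration; these are not recommended versions.
    expected_versions = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for package, problem in find_version_mismatches(expected_versions).items():
        print(f"{package}: {problem}")
```

Running a check like this at the start of a job won't prevent conflicts, but it can surface them before they show up as confusing errors deep inside a pipeline.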
The Sheer Volume of Data
With machine learning, you’ll have to deal with data: lots and lots of different kinds of data. Deciding whether to use all of it or to sample, and which processes to run it through, can be a challenge, especially when you’re getting deeper into your data and dealing with data movement.
Ensure you’re up to facing that challenge and that you have your plan in place.
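If you do decide to sample rather than use everything, one common option is reservoir sampling, which draws a fixed-size uniform random sample in a single pass without loading the full dataset into memory. This is a minimal sketch of that technique and not something prescribed by the interview; the function name and parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to k rows uniformly at random from a stream of unknown length."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Pick a slot in [0, i]; the row is kept only if the slot falls in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # A generator stands in for a large data source that does not fit in memory.
    stream = (f"record-{n}" for n in range(1_000_000))
    print(reservoir_sample(stream, k=5, seed=42))
```

Whether a sample is good enough depends on the problem: rare events such as fraud or equipment failure can vanish from a small uniform sample, so that decision belongs in your plan too.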
3. Operationalize Your Big Data
What’s the biggest issue most data scientists face? It’s operationalizing the data.
Let’s say you’ve built a model and it can predict factors that lead to churn. How do you get that model out to the people who can affect those numbers? How can you get it to the CRM or mobile app?
If you have a model that predicts equipment failure, how can you get it to the operator in time to prevent that failure? There are many challenges with taking a model and making it actionable. And it’s probably the biggest technical challenge that exists for data scientists these days.
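As a rough illustration of what "actionable" can mean, the sketch below turns model scores into a short, prioritized list that could be handed to a CRM or an operator. Everything here is an assumption made for illustration: `toy_churn_model` is a placeholder for a trained model, and the field names and the threshold are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def flag_for_follow_up(
    records: Iterable[Dict],
    predict_risk: Callable[[Dict], float],
    threshold: float = 0.5,
) -> List[Dict]:
    """Score each record and return those at or above the threshold, highest risk first."""
    flagged = []
    for record in records:
        risk = predict_risk(record)
        if risk >= threshold:
            flagged.append({"id": record["id"], "risk": risk})
    flagged.sort(key=lambda item: item["risk"], reverse=True)
    return flagged


def toy_churn_model(record: Dict) -> float:
    """Placeholder for a real model: risk grows with missed payments, capped at 1.0."""
    return min(1.0, record["missed_payments"] / 4)


if __name__ == "__main__":
    customers = [
        {"id": "C-001", "missed_payments": 0},
        {"id": "C-002", "missed_payments": 2},
        {"id": "C-003", "missed_payments": 4},
    ]
    for item in flag_for_follow_up(customers, toy_churn_model):
        print(f"follow up with {item['id']} (risk {item['risk']:.2f})")
```

The scoring step is the easy part. The harder work is wiring a list like this into the systems people already use, and getting it there while there is still time to act.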
You can build the most beautiful models in the world. But will your C-suite truly care if it’s not actually making an impact on the company’s bottom line? You might think your part of the bargain is just to make the data available. But it’s not. You have to make sure your data is actually going to be used. Gaining executive support is hugely helpful for this.
So machine learning isn’t easy. But it can accomplish big things. To inspire you and remind you of what’s possible, we’re sharing a real-life customer example and their machine learning project.
Real-Life Machine Learning and Big Data Example
This company is one of the largest providers of wireless voice and data communications services in the United States.
Business Challenges:
- Credit Risk: Their equipment leasing and loan program, run through their financing arm, has to write off large amounts of bad debt every year. They wanted to reduce bad loans and defaults, which would add millions to their bottom line every year. In addition, the ability to influence pending collections would dramatically help with cash flow.
- Customer Experience and Personalization: Customer churn costs this company millions a year. Early identification and targeting of both potential churn and new high value customers through personalization and segmentation can dramatically increase the number of net new subscribers and reduce churn.
- Operational Effectiveness: This company sought enhanced targeted marketing and campaign effectiveness through network optimization and data monetization.
Technology Challenges:
- This telecom company wanted to detect fraudulent activity much earlier and integrate data from multiple structured and unstructured sources to improve customer scoring. This would enable the company to provide customized offers and reduce risk.
- They also wanted the ability to store and analyze large volumes of customer data to help the business develop a better ability to segment customers and predict their behavior for personalized offers.
- They sought to optimize pricing through new advanced what-if analysis.
To accomplish this, the company purchased a wide variety of Oracle big data products, including Oracle GoldenGate for Big Data, which is part of Oracle Data Integration Platform Cloud.
Addressing the skills gap, managing the data, and operationalizing it are challenges that need to be dealt with, but they can be handled successfully. And the results can be incredible. For more information, read our tips on success with machine learning.
This article was first published on blogs.oracle.com.