Yhat Whitepaper: Data Science in Practice

This blog post is an excerpt from our most popular whitepaper, which covers how data science is applied in the real world. You can also download the full whitepaper PDF if you’d like.

What It’s About

In this whitepaper we introduce five common applications of data science that build upon that definition and goal. We dispel the impression that data science is some kind of obscure black magic and give you concrete examples of how it is applied in practice. You’ll learn how real companies use data science to improve their products and day-to-day operations. Last but not least, we describe the data science life cycle and explain Yhat’s role in getting models into production.

Application 1: Recommender Systems

Recommender systems, also known as recommendation engines, are one of the best-known applications of data science. They are a subclass of information filtering systems: systems that cut through the noise of all available options and present users with only the subset they are likely to find appealing. The data being filtered can range from products on an e-commerce site to the dating matches that appear as you search for ‘the one.’

Recommender systems offer a more intelligent approach to information filtering than a simple search algorithm because they introduce users to items they might not otherwise have discovered. Recommender systems generally take either a collaborative or a content-based approach to filtering. Collaborative filtering considers a user’s previous behavior as well as the behavior of similar users. Content-based filtering makes recommendations based on discrete attributes or assigned characteristics of the items themselves.
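To make the collaborative approach concrete, here is a minimal sketch of user-based collaborative filtering: it measures how similar users are from the items they have both rated, then predicts a user’s interest in unseen items as a similarity-weighted average of other users’ ratings. The users, items, and ratings are invented for illustration, and cosine similarity is just one common choice of similarity measure; real systems work with far larger, sparser data.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, made up for illustration.
RATINGS = {
    "ana": {"thermostat": 5, "led_bulbs": 4, "solar_panel": 1},
    "ben": {"thermostat": 4, "led_bulbs": 5, "smart_plug": 5},
    "chloe": {"solar_panel": 5, "heat_pump": 3, "thermostat": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by the similarity-weighted average
    of other users' ratings (user-based collaborative filtering)."""
    seen = ratings[user]
    weighted_sum, weight_total = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item in seen:
                continue
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


print(recommend("ana", RATINGS))  # ['smart_plug', 'heat_pump']
```

Because all raw ratings here are positive, even users with opposite tastes (ana and chloe) get a positive similarity; mean-centering each user’s ratings before comparing them is a common refinement.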

Data scientists at energy software company Tendril opted for a hybrid approach that combines collaborative and content-based filtering. Tendril provides analytics and consumer solutions to energy suppliers, including insight into which energy products consumers are most likely to consider. “We use Support Vector Regression models to predict household energy consumption to provide our clients with in-depth, personalized information about their customers,” explains Mark Gately, Data Analytics Manager at Tendril. “This detailed information is also used in recommendation models, which help match eligible customers with new or existing energy products.”
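The whitepaper does not describe how Tendril blends its two approaches, so the following is only a generic sketch of one simple hybrid: a weighted average of a collaborative score and a content-based score. The item attributes, the collaborative scores, and the weight `alpha` are all hypothetical, and Jaccard overlap of attribute sets stands in for whatever content similarity a production system would use; the Support Vector Regression forecasts mentioned in the quote are not modeled here.

```python
# Hypothetical item attributes used for the content-based half of the blend.
ITEM_ATTRIBUTES = {
    "thermostat": {"heating", "smart_home", "efficiency"},
    "smart_plug": {"smart_home", "efficiency"},
    "heat_pump": {"heating", "efficiency", "installation"},
    "solar_panel": {"generation", "installation"},
}


def jaccard(a, b):
    """Share of attributes two items have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_score(item, liked_items, attributes):
    """Average attribute overlap between a candidate and the items a user liked."""
    if not liked_items:
        return 0.0
    total = sum(jaccard(attributes[item], attributes[liked]) for liked in liked_items)
    return total / len(liked_items)


def hybrid_rank(candidates, liked_items, collab_scores, attributes, alpha=0.5):
    """Blend alpha * collaborative score with (1 - alpha) * content score.
    Both scores are assumed to lie in [0, 1] so the weights are comparable."""
    blended = {
        item: alpha * collab_scores.get(item, 0.0)
        + (1 - alpha) * content_score(item, liked_items, attributes)
        for item in candidates
    }
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical collaborative scores, e.g. a collaborative model's output rescaled to [0, 1].
COLLAB = {"smart_plug": 0.2, "heat_pump": 0.6, "solar_panel": 0.4}
CANDIDATES = ["smart_plug", "heat_pump", "solar_panel"]

print(hybrid_rank(CANDIDATES, ["thermostat"], COLLAB, ITEM_ATTRIBUTES))
# ['heat_pump', 'smart_plug', 'solar_panel']
```

Setting `alpha` to 1.0 or 0.0 recovers a purely collaborative or purely content-based ranking; the content half also helps with brand-new items that nobody has rated yet.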

Application 2: Credit Scoring

If you have ever applied for a credit card or a loan, you’re probably already familiar with the concept of credit scoring. What you may be less aware of is the set of decision management rules that, behind the scenes, evaluate how likely an applicant is to repay their debts.
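As an illustration of what such decision rules can look like, here is a minimal sketch of a traditional points-based scorecard: each applicant attribute falls into a band worth a fixed number of points, the points are summed, and a cutoff rule turns the total into a decision. The attributes, bands, points, and cutoff are invented for illustration and do not reflect FICO’s or any lender’s actual model.

```python
INF = float("inf")

# Hypothetical points table: (lower bound inclusive, upper bound exclusive, points).
SCORECARD = {
    "years_at_address": [(0, 2, 10), (2, 5, 20), (5, INF, 35)],
    "debt_to_income": [(0.0, 0.2, 40), (0.2, 0.4, 25), (0.4, INF, 5)],
    "late_payments_12m": [(0, 1, 30), (1, 3, 15), (3, INF, 0)],
}
APPROVAL_CUTOFF = 70  # hypothetical cutoff set by lending policy


def score_applicant(applicant):
    """Sum the points for the band each attribute value falls into."""
    total = 0
    for attribute, bands in SCORECARD.items():
        value = applicant[attribute]
        for low, high, points in bands:
            if low <= value < high:
                total += points
                break
    return total


def decide(applicant):
    """Apply the cutoff rule and return (decision, score)."""
    score = score_applicant(applicant)
    return ("approve" if score >= APPROVAL_CUTOFF else "decline"), score


print(decide({"years_at_address": 3, "debt_to_income": 0.35, "late_payments_12m": 0}))
# ('approve', 75)
print(decide({"years_at_address": 1, "debt_to_income": 0.5, "late_payments_12m": 4}))
# ('decline', 15)
```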

The first general-purpose credit scoring algorithm, now known as the FICO score, was introduced in 1989. The FICO score is still one of the most widely used models in the United States today, though over the past few years peer-to-peer and direct lending organizations have focused on developing new techniques. These machine learning models and algorithms capture factors and relationships that traditional loan scorecards could not, such as how applicants manage their monthly cash flow or whether friends and community members would endorse them.
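The whitepaper does not say which models these lenders use. As one standard example of a learned (rather than hand-built) credit model, here is a minimal sketch of logistic regression trained with plain batch gradient descent, using hypothetical features that echo the newer signals mentioned above: cash-flow volatility, the share of months with a positive balance, and a count of endorsements. The training data is invented; in practice you would typically reach for a library such as scikit-learn’s `LogisticRegression`.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression weights with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss per example: (predicted probability - label) * feature.
            error = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def repayment_probability(w, b, x):
    """Model's estimated probability that an applicant with features x repays."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical features: [cash_flow_volatility, share_of_months_positive, endorsements]
X_TRAIN = [
    [0.10, 0.90, 2], [0.20, 0.80, 3], [0.30, 0.70, 1], [0.15, 0.95, 0],
    [0.80, 0.30, 0], [0.70, 0.40, 1], [0.90, 0.20, 0], [0.60, 0.50, 0],
]
Y_TRAIN = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = repaid, 0 = defaulted (invented labels)

w, b = train_logistic(X_TRAIN, Y_TRAIN)
print(round(repayment_probability(w, b, [0.10, 0.90, 2]), 3))
print(round(repayment_probability(w, b, [0.90, 0.20, 0]), 3))
```

Unlike the fixed points of a hand-built scorecard, the weights here are learned from repayment outcomes, and the output is a probability a lender can threshold or price against.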

One such company is Ferratum Bank, a pioneer in financial technology and mobile consumer lending since 2005. “We developed complex statistical and machine learning models to enable smarter lending decisions,” explains Scott Donnelly, Director of Business Lending at Ferratum Bank. “By getting creative with our approach and adopting innovative technologies, we’ve been able to reinvent how both consumers and businesses obtain loans. This has allowed us to reach prospective customers that in the past may have been overlooked by traditional banking institutions.”

Application 3: Dynamic Pricing

You walk out of the store, arms full of groceries, only to realize that a torrential downpour began while you were browsing the produce aisle. You struggle to retrieve your phone, open your favorite ride-hailing app, and are dismayed to find… a 2.1x surge!? Welcome to your first lesson in dynamic pricing.

Businesses use dynamic pricing algorithms to model rates as a function of supply, demand, competitor pricing, and exogenous factors (e.g., weather or time of day). Many fields, from airline travel to sports ticketing, employ dynamic pricing to maximize expected revenue. The nuts and bolts of dynamic pricing strategies vary widely, though generalized linear models and classification trees are popular techniques for estimating the “right” price for a book, a flight, or a cab, somewhere between the lowest and highest amounts consumers are willing to pay.
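As a minimal sketch of the GLM route, assume demand for rides has already been fitted as a Poisson GLM with a log link, with price, a rain indicator, and a price-by-rain interaction as covariates. The coefficients below are hypothetical, and the sketch models demand only; supply and competitor prices would enter as further covariates or as constraints. The code then searches a price grid for the price that maximizes expected revenue, price × expected demand.

```python
from math import exp

# Hypothetical coefficients of an already-fitted Poisson GLM with a log link:
#   E[rides] = exp(B0 + B1*price + B2*raining + B3*price*raining)
# B1 < 0: demand falls as price rises.
# B3 > 0: in the rain, riders are less price-sensitive.
B0, B1, B2, B3 = 4.0, -0.20, 0.5, 0.08


def expected_demand(price, raining):
    r = 1.0 if raining else 0.0
    return exp(B0 + B1 * price + B2 * r + B3 * price * r)


def expected_revenue(price, raining):
    return price * expected_demand(price, raining)


def best_price(raining, low=1.0, high=30.0, step=0.01):
    """Grid-search the price that maximizes expected revenue."""
    n_steps = int(round((high - low) / step))
    prices = [low + i * step for i in range(n_steps + 1)]
    return max(prices, key=lambda p: expected_revenue(p, raining))


dry, wet = best_price(False), best_price(True)
print(round(dry, 2), round(wet, 2), round(wet / dry, 2))
```

For this model the optimum also has a closed form: setting the derivative of price × exp(… + β·price) to zero gives p* = −1/β with β = B1 + B3·raining, i.e. 5.00 when dry and about 8.33 in the rain, which the grid search recovers. Note that B2 only scales demand up and does not move the optimal price; it is the interaction term B3, reduced price sensitivity in the rain, that produces the “surge.”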

That’s it for today’s preview of the whitepaper. If you liked what you read and want to learn more, download the full whitepaper PDF.