How to Choose a Business Intelligence Solution For Your Business

Choosing the right business intelligence (BI) solution for your company can be overwhelming. There are niche solutions, general solutions and everything in between. Some BI providers specialize in certain industries, business sizes or even job roles.

To muddle things further, many vendors offer solutions that are of outstanding quality, which makes objectively choosing the ‘best’ solution just a bit of a challenge. In this blog we will explore five key criteria you should consider when deciding on a new BI solution for your business.


1. Which challenges are you looking to solve?

The first step to ensure your BI project will succeed is to determine which pain points you would like to “relieve.” This step demands ongoing communication with the members of your business who will be using the software. Discovering which pain points you’d like to solve is often a process: through discussion and collaboration about what is possible, new and valuable ideas can emerge.

There is nothing worse – for client and vendor, both – than being promised the world, only to discover upon implementation that the BI solution cannot do what your team had hoped it would. Having a clear idea of the challenges you would like to solve within your business allows you to ultimately request from the vendor a proof of concept (PoC) that relates directly to your business and its growth.

2. Who will use the solution?

It is essential to identify those employees who will use the solution. Consider asking questions like:

        • What job roles do they fill?
        • What’s the extent of their technological skills?
        • Do they typically work from the office – or do they travel?
        • What type of device do they typically work from: laptop, smart phone, or tablet?

Your answers to these questions will help determine which capabilities you need from a BI solution. Are you looking for a solution specifically for your sales department? Or would you like a solution that can also benefit other teams (e.g., executive or inventory teams)?

Assessing the technological skillset of your intended BI users, as well as the amount of time they have to sit down and learn about the new solution, will further influence which BI solution you may consider. Some solutions, while simple for your IT team, may appear complicated to non-technical employees. In that case, choosing a solution that intuitively analyzes data and which does not require extensive training may be the way to go.

You may also want to think about how your intended users prefer to work. If they often travel, choosing a solution that can only be used at the office may not make much sense. Similarly, if you anticipate your warehouse manager tracking inventory with the BI solution, it might be worth considering a solution that can be used on a tablet, bringing BI to the warehouse floor and removing the need to return to a desk for every new query.

3. Does the vendor understand your industry?

One of the most overlooked criteria when choosing a BI solution is whether the vendor understands your industry. This does not mean that the vendor must have designed a niche product focused exclusively on your industry, but rather that the vendor can prove it has successfully worked with businesses similar to yours.

Obviously, the better the vendor understands your business needs, the better its BI solution can analyze what you need it to analyze. For example, a solution focused on SaaS businesses is unlikely to understand the needs of a food and beverage wholesaler. However, a vendor that has experience with your industry will likely “have the goods” and may even suggest metrics or means to use the solution that you had not considered.

A good place to start investigating a vendor’s familiarity with your type of business is to read the case studies or success stories each BI software vendor places on its website. Such case studies will tell you which industries the vendor has worked with in the past, how the solution has helped the relevant businesses, and often include direct quotes from the customers. Checking case studies is a good way to quickly tell whether a solution would be relevant to your company.

4. What do current business intelligence users have to say?

Lastly, you want to know how current users of the BI solution rate their experience. While case studies provide valuable information about how a vendor has worked with its customers, user reviews provide a hands-on and current evaluation of the customer’s experience with the solution.

You can look for reviews on sites like Better Buys, as well as third-party reviews like those on BARC. When you read them, note whether the customer found the solution easy to use, valuable to his or her job role, and whether the vendor delivered what it promised.

5. Choose the right BI solution for your business

After you have mapped out your pain points, considered who will use the solution, and found a vendor that knows your industry and a solution with excellent reviews, it’s time to request that PoC from the vendor. Once you’ve engaged relevant stakeholders (e.g., project managers, executives and end users) to get their input on what they need to see in a BI solution, ask the shortlisted vendors to show which can best deliver a practical solution that satisfies all parties.

If one vendor gains consensus from the team, your decision has been made. Welcome to the brave new world of digital BI.

Once you have made your choice, implemented the solution, and started to investigate trends in your business data, you can start making data-driven decisions and reap benefits like increased sales, lower operating costs and improved workforce efficiency.

What’s Next for Business Intelligence 2019?

What’s Next for Business Intelligence Software? 2019 Predictions

According to the Business Application Research Center’s (BARC) 2018 Business Intelligence Survey, the BI market is awash in a sea of IT meta-trends. These meta-trends lately have included data digitization and its security/privacy, solution agility and cloud deployment (SaaS), mobile platforms and artificial intelligence.


Taking the initiative, we asked several BI software experts to predict what awaits the market in 2019 and beyond. Their replies corroborate many of BARC’s findings, but also uncover some outlier trends.

This is what they said:

In 2019, BI goes AI

The confluence of factors pointing to the emergence of AI in business is undeniable. Machine learning is changing the world, and business finally has the right tools to amass the training data that algorithms need. 2019 will see headlines that show a broadening of the niche ML solutions we’ve seen in business to date. Automated expertise won’t just be the realm of boutique airline pricing or insurance risk prediction. Instead, digital agents will begin to collaborate with business analysts to automate decisions like where and how to invest in innovation or new markets, marketing spend, or competitive forecasts.

Of course, the effect of this new collaboration (which Gartner calls “Augmented Analytics”) will not be shorter workdays and more time on the beach. Instead, competition will drive businesses to use AI agents as a means to increase the pace of decision-making and feedback cycles. In other words, AI will boost the efficiency of operations by letting businesses optimize resource allocation more frequently. This will be the start of a cycle that will disrupt many industries, just as many technological advances have in the past. Smart companies will start small and get quick wins in order to learn how AI works before scaling it up.

2019 BI Predictions by Better Buys

Mike Finley
Chief Data Scientist


AI will make data accessible (Finally!)

In 2019, I predict we’ll see artificial intelligence start to move from science project to production throughout the analytics ecosystem and really start to deliver on the promise of democratizing access to insights. AI will fundamentally change who has access to data, broadening the proverbial aperture and allowing anyone to interact, analyze, and utilize data.

While I predict this will alleviate much of the access issue in analytics, it’s not a silver bullet that will automatically turn organizations into data-driven machines. All this access means enterprises need to teach their teams what data is available, what it actually means, and how to effectively use it. Next year, enterprises must not only grapple with implementing AI, but simultaneously instill a culture of data literacy throughout the organization. It’s not going to be an easy transition, but for those that can do it successfully, the opportunities are endless.

Doug Bordonaro
Chief Data Evangelist


Insights embedded everywhere

“Software,” it is said, “is eating the world,” but analytics are eating software. And AI is eating analytics.

In other words, as software automates our existence, it creates a level of data size and complexity that only analytics can solve. As more people demand data-driven insights faster, those analytics will be streamlined, and eventually augmented by machine learning and AI.

We see 2019 as the year of “insights embedded everywhere” – analytics embedded not into graphs and dashboards, but into the very fabric of the workplace: products, processes, and places.

The objective is for business decision-makers to get the data they need, when they need it, where they need it – and to eventually eliminate dashboards and even standalone analytics themselves.

Amir Orad


ETL capability will be further integrated into BI platforms

With the growing demand for real-world machine learning, all modern data analytic applications will start to offer serious end-user driven data preparation and ETL capabilities alongside classic analytic functionality, because it’s the only way to make machine learning work properly.

Today, it is common for ETL tools used to Extract, Transform, and Load data into the target database to be separate from BI platforms. This limits the effectiveness of machine learning algorithms because they are applied to data that has already been prepared and aggregated. By incorporating ETL capabilities into the analytics platform, it becomes possible to apply machine learning to the raw data, leading to more relevant correlations and insights.

Avi Perez
Pyramid Analytics

7 Key Business Intelligence Trends for 2019

“You don’t tug on Superman’s cape. You don’t spit into the wind. You don’t pull the mask off that ol’ Lone Ranger,” and you don’t run your business without analytics.

As 2018 wanes, the company that dismisses business intelligence does so at its own peril. Over the past half decade, as BI solutions have evolved, more and more businesses have been run at the speed of analytics.

If it’s not already, yours ought to be.

Mining raw, collected data no longer suffices. Today’s analytics comes to life not with standard graph and pie charts, but with dynamic and real-time visualizations from which sound strategy can be developed – or adjusted. Business intelligence software delivers value by generating real-time analytics that delineates trends, from which proactive and objective decisions can be made.

The payoff? This deeper insight impacts and improves personnel, product, and user experience. Your company runs better. Staff is content. Customers are happy. Product is sold. Revenue climbs. Profits soar.

So, it’s no surprise when looking back to Better Buys’ 2016 BI Trends Infographic that our new and updated 2018 BI Trends Infographic continues to reflect the momentum, popularity and utility of Business Intelligence software.


How to Write a Great Data Science Resume

Writing a resume for data science job applications isn’t a fun task, but it’s a necessary evil. The majority of companies require a resume in order to apply to any of their open jobs, and a resume is often the first layer of the process in getting past the “Gatekeeper”: the recruiter or hiring manager.

A resume (or résumé, or CV) is, by definition, a brief record of your personal, educational, and professional qualifications and experience. Writing a short summary of your own experiences sounds like a simple task, but many struggle with it. Here are some tips on how to write a clear and concise resume that will catch the attention of a recruiter or hiring manager.

Keep it brief
The first thing you should strive for in writing a resume is to keep it short. A good resume should only be one page long, unless you have 15+ years of relevant experience for the job you’re applying to. Even then, there are recruiters out there who will toss any resume longer than one page. Recruiters and hiring managers receive a lot of resumes every day, and they often have about 30 seconds to look over someone’s resume and make a decision. You want to boil your experience down to the most important details and make it easily scannable.

Customize it for the specific data science job
While you certainly can create one resume and send it to every job you apply for, it’s also a good idea to add customized tweaks to your resume for each application. Although it requires more work up front, adding small details here and there in line with the job description is likely to impress the hiring manager or recruiter.

This doesn’t necessarily mean you need to do a wholesale rewrite and redesign every time you apply for a job! But at a minimum, you should look at the job posting, and if you notice important keywords and skills mentioned there that you have experience with, you should add them to your resume. You may also want to take a look at the company’s website to get an idea of their preferred style and tone, and adjust the writing and aesthetics of your resume accordingly.

(Obvious, but still worth pointing out: don’t list any skills or experience that you don’t actually have. It’s fine to reframe your real skills and experiences to suit the context of a job posting, but it’s not okay to exaggerate or make things up.)

Choose a template
While every resume will always include information like past work experience, skills, and contact information, you should have a resume that is unique to you. That starts with the visual look of the resume, and there are many different ways to accomplish a unique look. You can create your own resume from scratch, but it may be easier to start with creative resume templates from free sites like Creddle, Canva, VisualCV, CVMKR, SlashCV, or even a Google Doc resume template.

Keep in mind that the type of resume template you choose is also important. If you’re applying to companies with a more traditional feel (the Dells, HPs, and IBMs of the world), aim for a more classic, subdued style of resume.

If you’re aiming for a company with more of a startup vibe (Google, Facebook, Pinterest, etc.), you can choose a template or create a resume with a little more flair, perhaps with some graphics and distinctive coloring.

You can also choose between a column-style resume (usually better for people who are struggling to fit everything on one page) and a block-style resume where everything is stacked in one column. Either way, though, keep it simple. Again, a hiring manager may only be taking 30 seconds to scan this document and make a decision, so when in doubt, keep things short and sweet. Don’t be afraid of having some white space in your resume design.

Contact information
Once you have either chosen a resume template or decided to create one from scratch, take a moment to check the contact information section. Your name, headline, and contact information should live at the top of the page. Some templates place the contact information toward the bottom of the page, so you may want to rearrange the order manually if that’s the case. If a recruiter or hiring manager decides to contact you based on your resume, you don’t want them to have to search through the entire resume for that information.

Key things to remember about your contact information and what you choose to put there:

You don’t need to include your full physical address; all you need is the city and state that you live in. It may be best to leave your location off completely if you’re applying for jobs in other cities (as long as you’re willing to relocate).

Always make sure you have a good, working phone number and a professional-looking email address listed. A good email address is some combination of your first and last name. You don’t want to use a personal-looking email address on a resume.

You should include your LinkedIn profile link, but you don’t want to just copy and paste the full profile URL, as it can look unwieldy. You can create a shorter, more customized profile URL on LinkedIn. This URL should be some version or iteration of your name, or you can simply use a URL-shortening service.

You probably also want to add a GitHub link or personal portfolio link to your contact information. You’re applying for data science jobs, so most employers are going to want to take a look at your portfolio to see what kinds of things you’re working on (and that you’re working regularly).

Make sure your headline (typically found beneath your name) reflects the job you want to get rather than the job you currently have. If you’re trying to become a data scientist, your headline should say data scientist even if you’re currently working as a cook.


Data science projects and publications
Immediately following your name, headline, and contact information should be your Projects/Publications section. In any resume, especially in the technology industry, the main thing you want to highlight is what you have created. For data scientists, this might include data analysis projects, machine learning projects, and even published scientific articles or coding tutorials.

Keep in mind that hiring companies want to see what you can actually do with the skills that you have listed. This is the section where you can show that off. You can certainly include personal projects, but you should pick ones with some relevance or connection to the job you’re applying for.

You want at least one project or publication on your resume, but if you have the space for more, add as many as you can neatly fit. If you need help creating projects for your resume and portfolio, we have a full series of blog posts to guide you through building great data science projects.

When you describe each project, be as specific as possible about the skills, tools, and technologies you used, how you went about creating the project, and what your individual contribution was if you’re highlighting group projects. Specify the coding language, any libraries you used, etc.

Don’t worry if it feels like you’re repeating the same skills you plan to list in your skills section. In fact, the more times you can add those key tools, technologies, and skills in your resume, the better. Recruiters and hiring managers often use simple keyword searches to scan resumes, and you want your relevant skills highlighted in as many spots as possible when they search your resume.

At the same time, though, remember that a data scientist’s job isn’t just to crunch numbers; it’s to analyze data and then communicate those findings in a way that solves business problems. Data science recruiters are looking for people who have the technical skills they need, but also people who are effective communicators and who understand the big picture. They want data scientists who can effectively tell stories with data.

One way you can demonstrate these traits is by highlighting collaborative projects (which proves you can work and communicate with a team) and by framing your accomplishments in the context of business metrics (which proves you understand how your analyses apply to the larger business problems you’re trying to solve). Write your projects section and your work experience section with these ideas in mind.

Another good way to stand out from the crowd in this section is with any mention of working with unstructured data, i.e., any data you’ve worked with that required you to make spreadsheets or data tables yourself because it didn’t come to you in table format. Examples of this could be working with videos, posts, blogs, customer reviews, and audio, among others. Experience working with unstructured data is impressive to hiring managers and recruiters, as it shows you’re capable of doing unique work with messy data, not just crunching numbers in pristine datasets.

Here’s a sample of what this section of your resume might look like:


Work experience
Next comes your work experience. You can label this section “Experience” or “Professional Experience”. Your most recent work experience should be listed on top, with the preceding job below that, and so on in reverse chronological order.

How far back you go in terms of experience depends on a few things. Typically you wouldn’t want to go back further than five years. However, if you have relevant work experience that goes back further than that, you may want to include that experience as well.

Keep in mind that while you don’t have to list all of your experience, you do want to make sure that whatever you do list looks seamless. Gaps of longer than six months in your work experience section are a major red flag for recruiters and hiring managers. If you have such a gap, you most definitely want to explain it on your resume. For example, if you took two years off work to raise children between 2015 and 2017, you still want to add those dates on your resume and state that you were a stay-at-home parent during that period.

When writing this section, each entry should include your job title, the company, the period of time you held the position, and what you accomplished in that role. Keep the formatting uniform across your resume, but particularly in this section: if you use filled-in bullets for your description of one job, make sure you use the same exact bullet format for all the other job descriptions, too. The same goes for the way you list dates on your resume; if you’re spelling out the full month for each work experience date, then make sure you do this in every place on the resume where a date is included.

If you have work experience relevant to the job you’re applying for (i.e., previous work that’s relevant to data science, analytics, etc.), make sure your description consists mostly of accomplishments rather than duties. Employers want to see what you actually did, not just what you were supposed to do. And remember, framing your data science accomplishments in the context of business metrics is a good way to demonstrate that you understand the big picture and know how to translate your analysis results into real business outcomes.

If your work experience isn’t relevant to the job you’re applying for, then you just need to include the company name, your job title, and the dates worked. You don’t need to take up space with all the details of an irrelevant job.

Here’s an example of what you might include for a relevant job:


Education
Although it’s nice to have a degree, you probably don’t want to highlight that first on your resume unless you’re a graduating student looking for your first job in a relevant field. Many resume templates list education first, but if you have work experience and/or relevant projects to showcase, you’ll probably want to show those off first and place education closer to the bottom.

The only things you should be listing in the Education section are post-secondary degrees (i.e., college, community college, and graduate degrees). If you went to college but didn’t receive a degree, it’s best not to list that school. But if your degree isn’t relevant to the job you’re applying for, you should still list it. Some positions simply require a degree in any field, so you want to make sure you’re in the running for these positions.

If the graduation date for your degree is 15+ years back, use your discretion about whether or not you want to include a date. Unfortunately, some companies see a graduation date starting with 19XX as a red flag.

If you don’t have a degree, don’t sweat it; just leave the Education section completely off your resume. What you don’t want to do is add your high school information under your Education section, as this is another red flag for recruiters and hiring managers.

Finally, don’t list “micro-degrees”, online training certificates, or other professional training here. We’ll include that information elsewhere on your resume.


Skills, Certificates, and Extras
If you’re trying to find your first job in data science, it can be tough to demonstrate that you have the relevant skills and knowledge on your resume. But there are a couple of different ways you can show off relevant skills in addition to listing your data science projects and publications:

  • Including the relevant skills you’ve learned in a Skills section
  • Adding an “Extras” section with relevant activities and training

The skills section isn’t optional; for technical positions, it’s a necessity. Recruiters and hiring managers will most likely do a keyword search as a first step in looking at your resume, so you want to make sure key terms like “Python” or “Machine learning” are highlighted. Only list technical skills or tools here; you don’t need to list soft skills like leadership, communication, etc.

Some resume templates may ask you to “rank” yourself for each skill, but it’s better if you don’t list a ranking on your resume. You don’t ever want to overpromise or sell yourself short. A recruiter or hiring manager will look at your skills by assuming that the skills you list first are your strongest, and the skills you list last are your weakest. For that reason, list your strongest and most relevant skills first, and leave skills where you’re less comfortable or that are less likely to be relevant to the position for later in your list.

If you’ve done all of the above and still have space to fill in your resume, another way to show that you are continuing to learn and grow in your desired field is by having an “Extras” section. This section can be labeled Awards, or Certifications, or Training, or anything else that looks appropriate and professional. In the data science realm, you might want to list any good Kaggle competition results you’ve had, any online certificates you’ve earned (this is where you list your data science certificates and/or progress), meetups or events you’ve participated in that were relevant and meaningful, and anything else that demonstrates you’re actively involved in learning and doing data science.

Data science and machine learning hackathons are a big plus on your resume, as they show that you have a healthy competitive spirit and can enhance your skills and knowledge in your field while creating actual content and projects. These can also be included in the Extras section. (Check out sites like Machinehack and Hackerearth if you’re interested in taking part in hackathons but haven’t joined any yet.)

Here’s a sample of what the skills and extras sections might look like:


Finishing touches
Once you’re finished adding all of the relevant content to your resume, the last major thing to do is a spelling and grammar check. A huge red flag for recruiters and hiring managers is having grammatical or spelling errors on your resume.

These can be hard to catch yourself, so have a trusted friend (or a few) review your resume and give you feedback. They may catch small errors that you missed! A finished resume should look something like this:


Not sure you have the skills to get hired yet? Sign up for a free account at Dataquest and start learning data science right away.

Looking for more personalized help getting hired? Our Premium Subscribers can book one-on-one meetings with our data scientists for resume reviews, interview tips, and other career advice. Subscribe now!

Math in Data Science

Math is like an octopus: it has tentacles that can reach out and touch just about every subject. And while some subjects only get a light brush, others get wrapped up like a clam in the tentacles’ vice-like grip. Data science falls into the latter category. If you want to do data science, you’re going to have to deal with math.

If you’ve completed a math degree or some other degree that provides an emphasis on quantitative skills, you’re probably wondering if everything you learned to get your degree was necessary. I know I did. And if you don’t have that background, you’re probably wondering: how much math is really needed to do data science? In this post, we’re going to explore what it means to do data science and talk about just how much math you need to know to get started.

Let’s start with what “data science” actually means. You probably could ask a dozen people and get a dozen different answers! Here at Dataquest, we define data science as the discipline of using data and advanced statistics to make predictions. It’s a professional discipline that’s focused on creating understanding from sometimes-messy and disparate data (although precisely what a data scientist is tackling will vary by employer).

Statistics is the only mathematical discipline we mentioned in that definition, but data science also regularly involves other fields within math. Learning statistics is a great start, but data science also uses algorithms to make predictions. These algorithms are called machine learning algorithms and there are literally hundreds of them. Covering how much math is needed for every type of algorithm in depth is not within the scope of this post, but I will discuss how much math you need to know for each of the following commonly used algorithms:

  • Naive Bayes
  • Linear Regression
  • Logistic Regression
  • Neural Networks
  • K-Means clustering
  • Decision Trees

Now let’s take a look at how much math is actually required for each of these!

Naive Bayes

What they are: Naïve Bayes classifiers are a family of algorithms based on the common principle that, given the class, the value of a specific feature is independent of the value of any other feature. They allow us to predict the probability of an event happening based on conditions we know about the events in question. The name comes from Bayes’ theorem, which can be written mathematically as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) is not equal to 0.

That looks complicated, but we can break it down into pretty manageable pieces:

  • P(A|B) is a conditional probability. Specifically, the likelihood of event A occurring given that B is true.
  • P(B|A) is also a conditional probability. Specifically, the likelihood of event B occurring given that A is true.
  • P(A) and P(B) are the probabilities of observing A and B independently of each other.
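The pieces above can be plugged into a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and the spam-filter numbers are hypothetical, chosen for illustration rather than taken from any real data set.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) from Bayes' theorem; P(B) must not be 0."""
    if p_b == 0:
        raise ValueError("P(B) must not be 0")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, for illustration only:
# A = "the email is spam", B = "the email contains the word 'offer'".
p_spam = 0.2
p_offer_given_spam = 0.5
p_offer_given_not_spam = 0.05

# P(B) via the law of total probability.
p_offer = p_offer_given_spam * p_spam + p_offer_given_not_spam * (1 - p_spam)

p_spam_given_offer = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(round(p_spam_given_offer, 3))
```

With these made-up inputs, seeing the word raises the probability of spam from the prior of 0.2 to roughly 0.71, which is the kind of update a Naïve Bayes classifier performs for every feature.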

Math we need: If you want to scratch the surface of Naïve Bayes classifier algorithms and all the uses for Bayes’ theorem, a course in probability would be sufficient. To get an introduction to probability, you can check out our course on probability and statistics.

Linear Regression

What it is: Linear regression is the most basic type of regression. It allows us to understand the relationships between two continuous variables. In the case of simple linear regression, this means taking a set of data points and plotting a trend line that can be used to make predictions about the future.

Linear regression is an example of parametric machine learning. In parametric machine learning, the result of the training process is a mathematical function that best approximates the patterns the algorithm found in the training set. That mathematical function can then be used to make predictions about expected future results. In machine learning, mathematical functions are referred to as models.

In the case of linear regression, the model can be expressed as:

$$y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n$$

where a_1, a_2, …, a_n represent the parameter values specific to the data set, x_1, x_2, …, x_n represent the feature columns we choose to use in our model, and y represents the target column.

The goal of linear regression is to find the optimal parameter values that best describe the relationship between the feature columns and the target column. In other words: to find the line of best fit for the data so that a trend line can be extrapolated to predict future results.

To find the optimal parameters for a linear regression model, we want to minimize the model’s residual sum of squares. The residual is often referred to as the error, and it describes the difference between the predicted values and the true values.

The formula for the residual sum of squares can be expressed as:

$$RSS = \sum_{i} (y_i - \hat{y}_i)^2$$

(where ŷ_i are the predicted values for the target column, y_i are the true values, and the sum runs over every row of the data set.)
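To see the residual sum of squares in action, here is a minimal NumPy sketch. The toy data is invented for illustration, and the column of ones (which lets the fitted line have an intercept) is an assumption on my part rather than something taken from the model formula above.

```python
import numpy as np

# Hypothetical toy data: one feature column x and one target column y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix: a column of ones (intercept term) next to the feature column.
X = np.column_stack([np.ones_like(x), x])

# np.linalg.lstsq returns the parameters that minimize the residual sum of squares.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = params

# Predicted values and the residual sum of squares for the fitted line.
y_hat = X @ params
rss = float(np.sum((y - y_hat) ** 2))

print("intercept:", intercept, "slope:", slope, "RSS:", rss)
```

Any other choice of intercept and slope would give a larger RSS on this data, which is exactly what "line of best fit" means here.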

Math we need: If you want to scratch the surface, a course in elementary statistics would be fine. If you want a deep conceptual understanding, you’ll probably want to know how the formula for residual sum of squares is derived, which you can learn in most courses on advanced statistics.

Logistic Regression

What it is: Logistic regression focuses on estimating the probability of an event occurring in cases where the dependent variable is binary (i.e., only two values, 0 and 1, represent outcomes).

Like linear regression, logistic regression is an example of parametric machine learning. Thus, the result of the training process for these machine learning algorithms is a mathematical function that best approximates the patterns in the training set. But where a linear regression model outputs a real number, a logistic regression model outputs a probability value.

Just as a linear regression algorithm produces a model that is a linear function, a logistic regression algorithm produces a model that is a logistic function. You might also hear it referred to as a sigmoid function, which squashes all values to produce a probability result between 0 and 1.

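The sigmoid function can be represented as follows:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

So why does the sigmoid function always return a value between 0 and 1? Remember from algebra that raising any number to a negative exponent is the same as raising that number’s reciprocal to the corresponding positive exponent. That means e^{-t} is always a positive number, so the denominator 1 + e^{-t} is always greater than 1 and the fraction always lands strictly between 0 and 1.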
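A quick way to convince yourself is to evaluate the sigmoid function at a few inputs. The sketch below uses only Python's standard `math` module; the sample inputs are arbitrary.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic (sigmoid) function: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


# Arbitrary sample inputs: large negative values approach 0,
# large positive values approach 1, and 0 maps to exactly 0.5.
for t in (-5, -1, 0, 1, 5):
    print(t, round(sigmoid(t), 4))
```

Note that this simple version can overflow for extremely negative inputs (for example t = -1000), so it is meant as an illustration rather than a numerically robust implementation.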

Math we need: We’ve discussed exponents and probability here, and you’ll want to have a solid understanding of both Algebra and probability to get a working knowledge of what is happening in logistic algorithms. If you want to get a deep conceptual understanding, I would recommend learning about Probability Theory as well as Discrete Mathematics or Real Analysis.

Neural Networks

What they are: Neural networks are machine learning models that are very loosely inspired by the structure of neurons in the human brain. These models are built by using a series of activation units, known as neurons, to make predictions of some outcome. Neurons take some input, apply a transformation function, and return an output.

Neural networks excel at capturing nonlinear relationships in data and aid us in tasks such as audio and image processing. While there are many different kinds of neural networks (feed forward neural networks, recurrent neural networks, etc.), they all rely on that fundamental concept of transforming an input to generate an output.


When looking at a diagram of a neural network of any kind, we’ll notice lines going everywhere, connecting every circle to another circle. In mathematics, this is what is known as a graph, a data structure that consists of nodes (represented as circles) that are connected by edges (represented as lines).

Keep in mind, the graph we’re referring to here is not the same as the graph of a linear model or some other equation. If you’re familiar with the Traveling Salesman Problem, you’re probably familiar with the concepts of graphs.

At its core, a neural network is a system that takes in some data, performs some linear algebra and then outputs some answers. Linear algebra is the key to understanding what is going on behind the scenes in neural networks.

Linear Algebra is the branch of mathematics concerning linear equations such as y=mx+b and their representations through matrices and vector spaces. Because Linear Algebra concerns the representation of linear equations through matrices, a matrix is the fundamental idea that you’ll need to know to even begin understanding the core part of neural networks.

A matrix is a rectangular array consisting of numbers, symbols, or expressions, arranged in rows and columns. Matrices are described in the following fashion: rows by columns. For example, the following matrix:

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

is called a 3 by 3 matrix because it has three rows and three columns.

When dealing with neural networks, each feature is represented as an input neuron. Each numerical value of the feature column is multiplied by a weight from the weight vector that feeds your output neuron. Mathematically, the process is written like this:

$$z = X a^T + b$$

where z is the result handed to the activation function, X is an m×n matrix, m is the number of input neurons, and n is the number of neurons in the next layer. Our weights vector is denoted as a, and a^T is the transpose of a. Our bias unit is represented as b. Bias units are units that influence the output of neural networks by shifting the sigmoid function to the left or right to give better predictions on some data set.

Transpose is a Linear Algebra term and all it means is that the rows become columns and the columns become rows. We need to take the transpose of a because the number of columns of the first matrix must equal the number of rows in the second matrix when multiplying matrices. For example, if we have a 3×3 matrix and a weights vector that is a 1×3 vector, we can’t multiply outright because three does not equal one. However, if we take the transpose of the 1×3 vector, we get a 3×1 vector and we can successfully multiply our matrix with the vector.
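To make the shape rule concrete, here is a small NumPy sketch. The matrix entries, the weights, and the bias value are made up for illustration; only the shapes (a 3×3 matrix and a 1×3 weights vector) come from the example above.

```python
import numpy as np

X = np.arange(1, 10).reshape(3, 3)   # a 3x3 matrix (hypothetical feature values)
a = np.array([[0.5, -1.0, 2.0]])     # a 1x3 weights vector (hypothetical weights)
b = 0.1                              # a hypothetical bias value

# Multiplying a 3x3 matrix by a 1x3 vector fails: 3 columns vs. 1 row.
try:
    X @ a
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# After transposing, a.T is 3x1, so the product is defined and has shape 3x1.
z = X @ a.T + b

print(shapes_compatible)  # False
print(a.T.shape, z.shape)
```

The same check works for any sizes: the product is only defined when the inner dimensions match.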

After all of the feature columns and weights are multiplied, an activation function is called that determines whether the neuron is activated. There are three commonly used activation functions: the ReLU function, the sigmoid function, and the hyperbolic tangent function. We’re already familiar with the sigmoid function. The ReLU function is a neat function that takes an input x and outputs the same number if it is greater than 0; otherwise, it outputs 0. The hyperbolic tangent function has the same S shape as the sigmoid function, except that it constrains any value to between -1 and 1.
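The three activation functions described above are short enough to write out directly. This is a minimal NumPy sketch; the sample inputs in `z` are arbitrary.

```python
import numpy as np


def relu(x):
    """ReLU: keep positive values, replace everything else with 0."""
    return np.maximum(0, x)


def sigmoid(x):
    """Sigmoid: squash any real value into the interval (0, 1)."""
    return 1 / (1 + np.exp(-x))


def tanh(x):
    """Hyperbolic tangent: squash any real value into the interval (-1, 1)."""
    return np.tanh(x)


z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary sample inputs
print(relu(z))
print(sigmoid(z))
print(tanh(z))
```

The similarity between the last two is not a coincidence: tanh(x) equals 2·sigmoid(2x) − 1, so it is a rescaled and shifted sigmoid.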

Math we need: We’ve discussed quite a bit here in terms of concepts! If you want to have a basic understanding of the math presented here, a discrete mathematics course and a course in Linear Algebra are good starting points. For a deep conceptual understanding, I would recommend courses in Graph Theory, Matrix Theory, Multivariate Calculus, and Real Analysis. If you’re interested in learning linear algebra fundamentals, you can get started with our Linear Algebra for Machine Learning course.

K-Means Clustering

What it is: The k-means clustering algorithm is a type of unsupervised machine learning, which is used to categorize unlabeled data, i.e., data without defined categories or groups. The algorithm works by finding groups within the data, with the number of groups represented by the variable k. It then iterates through the data to assign each data point to one of k groups based on the features provided.

K-means clustering relies on the notion of distance throughout the algorithm to “assign” data points to a cluster. If you’re not familiar with the notion of distance, it refers to the amount of space between two given items. Within mathematics, any function that describes the distance between any two elements of a set is called a distance function or metric. Two commonly used metrics are the Euclidean metric and the taxicab metric.

The Euclidean metric is defined as follows:

$$d\big((x_1, y_1), (x_2, y_2)\big) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

where (x1,y1) and (x2,y2) are coordinate points on a Cartesian plane.

While the Euclidean metric is sufficient in many situations, there are some cases where it does not work well. Suppose you’re on a walk in a huge city; it does not make sense to say “I’m 6.5 units away from my destination” if there’s a gigantic building blocking your path.

To combat this, we can use the taxicab metric. The taxicab metric is as follows:

$$d\big((x_1, y_1), (x_2, y_2)\big) = |x_1 - x_2| + |y_1 - y_2|$$

where (x1,y1) and (x2,y2) are coordinate points on a Cartesian plane.
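Both metrics, and the way k-means uses a metric to assign a point to a cluster, can be sketched in plain Python. The helper name `nearest_centroid` and the sample points are hypothetical; a full k-means implementation would also recompute the cluster centers after each round of assignments.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points on a Cartesian plane."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def taxicab(p, q):
    """Distance when you can only move along the grid, like a taxi in a city."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest_centroid(point, centroids, metric=euclidean):
    """Return the index of the centroid closest to the point under the metric."""
    return min(range(len(centroids)), key=lambda i: metric(point, centroids[i]))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(taxicab((0, 0), (3, 4)))    # 7

# Hypothetical cluster centers and a data point to assign.
centroids = [(0, 0), (5, 5)]
print(nearest_centroid((1, 3), centroids))           # 0
print(nearest_centroid((1, 3), centroids, taxicab))  # 0
```

Swapping the `metric` argument is all it takes to change how "closeness" is measured, which is why the choice of metric matters for the clusters you end up with.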

Math we need: This one’s a bit less involved; really you just need to know how to add and subtract, and understand the basics of algebra so you can grasp the distance formulas. But to get a firm understanding of the basic kinds of geometries each of those metrics exists in, I would recommend a Geometry class that covers both Euclidean and Non-Euclidean geometries. For an in-depth understanding of both metrics and what it means to be a metric space, I would read up on Mathematical Analysis and take a Real Analysis class.

Decision Trees

What it is: A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable - and each branch is the outcome of that test.

Decision trees rely on a theory called information theory to determine how they’re constructed. In information theory, the more one knows about a topic, the less new information one can know. One of the key measures in information theory is known as entropy. Entropy is a measure which quantifies the amount of uncertainty for a given variable.

Entropy can be written like this:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)$$

In the above equation, P(x_i) is the probability of the i-th value of the feature occurring in the data set. It should be noted that any base b can be used for the logarithm; however, common values are 2, e (approximately 2.718), and 10.

You might have noted the fancy symbol that looks like an “S” (Σ). That is the summation symbol, and it means to add up the expression written after it, once for each value of the index. How many terms you add is dictated by the lower and upper limits of the summation.

After computing entropy, we can start constructing our decision tree by using information gain, which tells us which split will reduce entropy the most.

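The formula for information gain is written as:

$$IG(T, A) = H(T) - \sum_{v \in \text{values}(A)} \frac{|T_v|}{|T|}\, H(T_v)$$

where H is the entropy, T is the data set, A is the column we split on, and T_v is the subset of rows where A takes the value v.

Information gain measures how many “bits” of information someone can gain. In the case of decision trees, we can calculate the information gain on each of our columns in the data set in order to find what column will give us maximum information gain and then split on that column.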
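Here is a minimal sketch of that procedure in plain Python. The tiny weather-style data set and the column names (`outlook`, `windy`, `play`) are invented for illustration, and the functions are a simplified version of what a real decision tree library does when it picks a split.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Entropy of a list of labels, using the given logarithm base."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def information_gain(rows, column, target):
    """Entropy of the target minus the weighted entropy after splitting on column."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical data set: does someone play outside, given the weather?
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
]

gains = {col: information_gain(rows, col, "play") for col in ("outlook", "windy")}
best_column = max(gains, key=gains.get)
print(gains, best_column)
```

In this made-up data, `outlook` separates the two outcomes perfectly while `windy` tells us nothing, so the tree would split on `outlook` first.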

Math we need: Basic Algebra and probability are all you really need to scratch the surface of decision trees. If you want a deep conceptual understanding of probability and the logarithm, I would recommend courses in Probability Theory and Algebra.

Final Thoughts

If you’re still in school, I highly recommend taking some pure and applied mathematics courses. They can definitely feel daunting at times, but you can take solace in the fact that you will be better equipped when you encounter these algorithms and know how to best apply them.

If you’re not currently in school, I recommend hopping over to your closest book store and reading up on the topics highlighted in this post. I would strongly recommend picking up books that cover Probability, Statistics, and Linear Algebra in depth to truly get a feel for what is going on behind the scenes of both the machine learning algorithms covered in this post and the ones that aren’t.

Math is everywhere in data science. And while some data science algorithms feel like magic at times, we can understand the ins and outs of many algorithms without needing much more than algebra and elementary probability and statistics.

Don’t want to learn any math? Technically, you can rely on machine learning libraries such as scikit-learn to do all of this for you. But it’s very helpful for a data scientist to have a solid understanding of the math and statistics behind those algorithms so they can choose the best algorithm for their problems and datasets and thus make more accurate predictions.

So embrace the pain, and dive into the math! It’s not as tough as you think, and we’ve even got courses on a few of these topics to get you started:

  • Probability and Statistics
  • Linear Algebra for Machine Learning
How Will Artificial Intelligence Change Healthcare?

Editor’s note: This piece was written in collaboration with the MAA Center, an online resource for those who have been exposed to asbestos and those looking to learn more about it. Lauren Eaton is a Communications Specialist at MAA. We hope this piece will give you an idea of how data is becoming a part of every field.

From the smartphones in our back pockets to advances within the healthcare industry, technology’s impact on our lives is plain. The potential uses of technology appear endless, particularly for healthcare professionals. One area demonstrating these ever-growing possibilities is artificial intelligence (AI). AI has the potential to drastically change the healthcare industry, from simplifying administrative tasks to more involved medical uses like improving early diagnosis.

Data processing and optimizing administration

Whether it’s general information or specialized test results, healthcare providers need to process and organize large amounts of data on a daily basis. According to a 2017 study by researchers from the University of Wisconsin and the American Medical Association, “Primary care physicians spend more than half of their workday, nearly six hours, interacting with the EHR (electronic health record) during and after clinic hours.”

AI has the potential to drastically reduce this time by making EHRs and other documents and processes more efficient. For example, AI could replace outdated office fixtures, like fax machines, and other long-standing technologies that providers have considered integral to the administrative process. Similarly, AI “virtual assistants” could even be used to process routine requests or prioritize a doctor’s to-do list.

Ideally, these changes will improve workflow, increase information accessibility, and give time back to providers. But how might these technological changes and “virtual assistants” help medical professionals in the examination room? And how will patients benefit from this new assistance?

From image analysis to aiding diagnosis

AI has the power to influence how providers and patients interact with healthcare. Previously, AI has effectively performed image analysis for specializations like radiology, but recently the benefits of using AI to aid clinical judgement and diagnosis have begun to emerge.

One healthcare organization in Iowa has even relied on “the first autonomous AI diagnostic system” to detect diabetic retinopathy, a diabetic complication affecting the eyes. Though applications of AI aiding diagnosis are still rare, this specific use could expedite patient care and increase the amount of time medical professionals have with patients.

What AI may mean for early diagnosis

While it’s unlikely that AI will ever replace human physicians entirely, it’s possible for AI to affect patient survival rates because of how the technology could influence the process of diagnosis. Researchers have already suggested that AI could have a major impact on patient prognosis, especially for people suffering from frequently misdiagnosed conditions, like heart disease and certain types of cancer.

For example, an early cancer diagnosis can be invaluable because early detection can drastically impact the care available for patients diagnosed with rarer types of cancer. This is especially true for cancers with long latency periods, like mesothelioma. Mesothelioma patients usually begin showing symptoms anywhere from 10 to 50 years after their initial exposure to cancer-causing substances like asbestos. This is especially dangerous because early symptoms are often mistaken for more common diseases, like the flu, pneumonia, or lung cancer.

So what will AI mean for healthcare?

Though automation and optimization are helpful, AI is unique because it’s designed to mimic the human brain instead of relying on pre-built algorithms. This is one of the biggest reasons AI could have a large impact on the healthcare industry. Its ability to let machines “learn” could make it possible for technology to contribute in distinctly different ways from previous uses in the healthcare industry.

While AI can certainly make administrative tasks easier, the role it could play in aiding clinical diagnosis has the potential to bring valuable insights to providers and patients. Whether it’s an earlier or more accurate diagnosis, AI could help ensure patients are treated more effectively and efficiently.