BigML News, Issue #59, December 2018
BigML - Machine learning Made Easy

Hi <<Full Name>>,

The new BigML release is here! Join us on Thursday, December 20, 2018, at 10:00 AM PST (Portland, Oregon. GMT -08:00) / 07:00 PM CET (Valencia, Spain. GMT +01:00) for a FREE live webinar to discover the latest addition to the BigML platform. We will be showcasing Principal Component Analysis (PCA), a key unsupervised Machine Learning technique used to transform a given dataset in order to yield uncorrelated features and reduce dimensionality. PCA is most commonly applied in fields with high dimensional data including bioinformatics, quantitative finance, and signal processing, among others.

BigML Principal Component Analysis (PCA)

Principal Component Analysis (PCA), available on the BigML Dashboard, API and WhizzML for automation as of December 20, 2018, is a statistical technique that transforms a dataset defined by possibly correlated variables (whose noise negatively affects the performance of your model) into a set of uncorrelated variables, called principal components. This technique is used as the first step in dimensionality reduction, especially for those datasets with a large number of variables, which helps improve the performance of supervised models due to noise reduction. As such, PCA can be used in any industry vertical as a preprocessing technique in the data preparation phase of your Machine Learning projects. 

BigML PCA Unique Implementation
BigML PCA is distinct from other implementations of the PCA algorithm, our Machine Learning platform lets you transform many different data types in an automatic fashion that does not require you to configure it manually. That is, BigML’s unique approach can handle numeric and non-numeric data types, including text, categorical, items fields, as well as combinations of different data types. To do so, BigML PCA incorporates multiple factor analysis techniques, specifically, Multiple Correspondence Analysis (MCA) if the input contains only categorical data, and Factorial Analysis of Mixed Data (FAMD) if the input contains both numeric and categorical fields. 
BigML PCA: Dimensionality Reduction
When we work with high dimensional datasets, we often have the challenge of extracting the discriminative information in the data while removing those fields that only add noise and make it difficult for the algorithm to achieve the expected performance. PCA is ideal for these events. While a PCA transformation maintains the dimensions of the original dataset, it is typically applied with the goal of dimensionality reduction. Reducing the dimensions of the feature space is one method to help reduce supervised model overfitting, as there are fewer relationships between variables to consider. The principal components yielded by a PCA transformation are ordered by the amount of variance each explains in the original dataset. Plots of the cumulative variance explained, also known as scree plots, are one way to interpret appropriate thresholds for how many of the new features can be eliminated from a dataset while preserving most of the original information.

To learn more about PCA, please join our free, live webinar on Thursday, December 20, 2018, at 10:00 AM PT.  Register today as space is limited! Stay tuned for our next 6 blog posts that will gradually present PCA and how to benefit from it using the BigML platform. 

BigML Inc

Our mailing address is:
BigML, Inc
2851 NW 9th Street
Suite D, Conifer Plaza Building
Corvallis, Oregon 97330

Add us to your address book

Copyright © 2018 BigML, Inc, All rights reserved.