What is this all about?
Predictron Labs provides an easy-to-use predictive analytics platform for its users.
OK, but what is this for, then?
Predictive analytics is about predicting future events based on patterns found in historical data. It is widely used in business applications such as credit scoring, rate-making, fraud detection and campaign optimization, among others.
What do I need to use it?
First of all, you need to register for an account. Once you have received the confirmation letter about your registration, you can log in to the PredictronLabs Observatory and create your first predictive analytics project.
Besides an account, you also need something (most probably an event) that you want to predict before it happens. If it has some business value, even better. Based on the event you want to predict, you need to create a dataset to train a model. To use your model, it is also good to have a deployment dataset; a sketch of both follows at the end of this answer.
If you have no dataset right now but are still interested, get a sample dataset from the download section.
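For illustration only, here is a rough sketch in Python with pandas of the two kinds of dataset: a training set that contains the known outcome (the target) and a deployment set with the same predictors but no outcome yet. The column names and values are made up and are not the actual Predictron Labs upload format.

```python
# Illustration only: a toy training dataset (with the known outcome, the
# "target") and a deployment dataset (same predictors, outcome still unknown).
# Column names and values are made up; see the documentation for the actual
# upload format.
import pandas as pd

# Historical customers: we already know who churned, so "churned" is the target.
training = pd.DataFrame({
    "customer_id":   [1, 2, 3, 4],
    "monthly_spend": [42.0, 15.5, 80.0, 12.0],
    "support_calls": [0, 3, 1, 5],
    "churned":       [0, 1, 0, 1],   # target variable: the event we want to predict
})

# Current customers: same predictor columns, no target yet; the model predicts it.
deployment = pd.DataFrame({
    "customer_id":   [5, 6],
    "monthly_spend": [60.0, 9.9],
    "support_calls": [2, 4],
})
```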
Why is this service better than buying a predictive analytics solution?
Buying a predictive analytics solution from a well-established vendor can be extremely expensive, and you need a data miner, a data scientist or someone with statistical knowledge to use such software to develop models. Moreover, if you want to deploy the model you trained, you most probably need to purchase additional software components from your selected vendor, or develop an in-house solution that lets you deploy the models regularly without human interaction.
Once you are able to deploy your models regularly, you will need to monitor their performance and check whether their predictions are still valid. This again requires additional packages from your chosen vendor, or yet another in-house development effort.
The Predictron Labs platform offers all of these services in one easy-to-use framework.
Can I do predictive analytics without statistical knowledge?
Yes, you can. All you need is to know what you want to predict before it happens (this is the target variable) and to have data about other events that might have an impact on it. If you can create a dataset that fits our requirements, you can start developing models. Once you have a model, you can review its performance through the web UI. If you are happy with it, there is nothing left to do but use it.
So, do I need to hire a data miner or a data scientist?
Not necessarily. Data miners and data scientists are usually clever people, and clever people are good to have in any organization. However, using our service you do not depend on their unique skill set: you can unlock the power of predictive analytics with the help of a data engineer who can produce the data needed for training, evaluation and model deployment. Reading our online documentation will give you the basic concepts without worrying about statistics or the details of machine learning algorithms. If you are still unsure, we also provide training and consultation whenever you need it; just let us know.
I'm a data miner and it seems to me you want to steal my job!
Not really. You are probably the one in your organization who understands your data best, and you probably spend a fair amount of time producing data for model training, evaluation and deployment. Moreover, you are the only one who knows what all of this means and why it is important. So no, we are not stealing your job; we just want to make it a bit easier. Using Predictron Labs, you still need to understand the business concept, design experiments, and make sure the proper data is ready. Our framework will use this data to produce models and will give you a good interface to monitor, deploy and evaluate them. You will have more time to understand the business need and take care of other details, instead of imputing, transforming variables and checking the performance of different algorithms with different settings through dozens of iterations. Our framework does all of these activities for you.
I'm a data scientist and it seems my kids are going to starve!
Hopefully not; besides, we are recruiting ;). Using the Predictron Labs service, you do not need to develop sophisticated applications to train models, and you do not need to run dozens of iterations of model training with different settings and variable transformations, because the framework does this for you. You are also not required to turn your code into a low-latency, robust scoring engine, because we have already done that for you. All you have to do is create the best possible data for a given predictive modeling problem, monitor the results produced by the framework, and draw conclusions.
I'm a data engineer; does using this software make me a data scientist?
Not literally, but you will be able to accomplish the same type of job a data scientist would, without coding or statistical knowledge. Understanding and producing the data and taking care of proper deployment in your organization might also become your responsibility, so in some manner you become someone who does the same work as a data scientist, a data miner, or their line manager.
Automatic model training, huh?
Yes, the framework trains models automatically based on the uploaded dataset. The dataset you use for training needs to meet certain requirements that inform us about the roles of some variables. All the rest, such as imputation, data transformation and choosing the best-performing algorithm, is carried out by our platform; the sketch below illustrates the idea. In some cases we train several models in iterations to provide you with a model of superior performance. We are pretty sure that the performance of our models, relative to the time spent training them, is the best around.
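To give a feel for what "choosing the best-performing algorithm" means, here is a minimal, purely conceptual sketch in Python with scikit-learn. The framework's actual search is internal and more elaborate; the data and candidate algorithms below are only placeholders.

```python
# Conceptual sketch only: try a few candidate algorithms on the same training
# data and keep the best performer. The framework's real search is internal
# and more elaborate; data and algorithms here are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
# Cross-validated accuracy for each candidate; pick the winner.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```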
Are my data secure?
Yes, we guard the privacy of your data to the best of our ability and work hard to protect your information from unauthorized access. Your data is stored with world-class cloud computing providers in highly secure data centers that utilize state-of-the-art electronic surveillance and multi-factor access control systems. We also work with external partners specialized in cloud provisioning, and we have strict policies and technical access controls that prohibit employee access except for the reasons stated in our Terms of Service and Privacy Policy.
That said, we fully understand that information security is key in many organizations, so we are happy to provide our service in your corporate or private cloud if you choose our private installation plan.
Is it possible to train better models than the framework does?
Theoretically, yes. The performance tuning of predictive models is an endless challenge, and there are even companies specialized in organizing predictive modeling competitions. At the end of the day, though, the ultimate goal of a model, and of the data science work itself, is to be put into production. A model could have higher performance, but at the price of being too complicated, and it might turn out to be impossible, or at least very difficult, to deploy in a real environment. As John Foreman pointed out:
"What’s better : A simple model that’s used, updated, and kept running ? Or a complex model that works when you babysit it but the moment you move on to another problem no one knows what the hell it’s doing"
Even at the end of a classic competition, it happened that the best-performing model was not the one deployed: "Additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring the complex models into a production environment."
So, in summary, performance is important, but deployability is also key, and we are pretty good at both, if not the best.
Can I export the models from the framework?
Unfortunately, not yet. Exporting models to PMML is on our to-do list, but we have not implemented it yet. Send us a development request to increase its priority.
That said, if you really need it, we can of course create an export of your model for you as a small script in R or Python that also performs the necessary data transformations; a sketch of what such a script could look like follows.
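As a purely hypothetical illustration, such a hand-made export could look roughly like the sketch below, assuming the exported model were a logistic regression described by an intercept and per-variable coefficients plus a simple training-time transformation. All variable names and numbers are invented for the example.

```python
# Hypothetical sketch of a hand-made model export, assuming the model were a
# logistic regression given by an intercept and per-variable coefficients.
# All names and numbers are invented for the example.
import math

COEFFICIENTS = {"monthly_spend": -0.021, "support_calls": 0.63}  # illustrative values
INTERCEPT = -1.2

def transform(record):
    """Re-apply the training-time data transformation (here: default missing values to 0)."""
    return {name: float(record.get(name) or 0.0) for name in COEFFICIENTS}

def score(record):
    """Return the predicted probability of the target event for one record."""
    x = transform(record)
    z = INTERCEPT + sum(coef * x[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))

print(score({"monthly_spend": 12.0, "support_calls": 5}))
```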
Can I select which machine learning algorithm to use or tune its parameters?
Not yet. Right now the framework is optimized to manage a large number of models over their entire lifetime, and the UI is not designed for developing models individually. That said, if you are keen on tuning individual models, send us a development request, and if there is enough demand we will add this feature to the UI.
What machine learning algorithms are trained in the framework?
It depends on the type of problem, but our favorites are logistic regression, linear regression, SVMs, generalized linear models, decision trees and random forests.
Is your service ready to do segmentation?
Not yet, but this is at the very top of our to-do list. Please send us a development request to get involved in the closed alpha test of this feature.
Can I do forecasting using this service?
Unfortunately, not yet; however, we are keen to develop this feature. Please send us a development request to let us know how much demand there is for it.
Can I do market basket or affinity analysis using this service?
Not yet, but if you really need it we can develop something for you in a consultation project ;). To be honest, though, you can do this on your own in pure SQL (see the sketch below), so you do not really need a separate tool for it.
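Here is a minimal sketch of the pure-SQL approach, wrapped in Python with SQLite only to make it runnable on its own; the table and column names are made up for the example. It simply counts how often two products appear in the same basket.

```python
# Sketch of the pure-SQL idea, wrapped in Python/SQLite only to make it
# runnable: count how often two products appear in the same basket.
# Table and column names are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE basket_items (basket_id INTEGER, product TEXT);
    INSERT INTO basket_items VALUES
        (1, 'bread'), (1, 'butter'), (1, 'milk'),
        (2, 'bread'), (2, 'butter'),
        (3, 'milk');
""")

# Self-join the baskets to count product pairs bought together.
pairs = conn.execute("""
    SELECT a.product AS left_item, b.product AS right_item, COUNT(*) AS together
    FROM basket_items a
    JOIN basket_items b
      ON a.basket_id = b.basket_id AND a.product < b.product
    GROUP BY a.product, b.product
    ORDER BY together DESC;
""").fetchall()

for left_item, right_item, together in pairs:
    print(left_item, right_item, together)
```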
Is your engine suitable for real-time scoring?
Yes, we can do that. Our service can run at very low latency, and you get the prediction back in the response to the JSON you send to the API, as in the sketch below. If you need real-time scoring, please get in touch with us so that we can make sure the agreed SLA is met.
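As an illustration only, real-time scoring over HTTP could look something like the sketch below. The endpoint URL, authentication header and field names are assumptions made for the example, not the documented Predictron Labs API.

```python
# Illustration only: posting a JSON record and reading the prediction from the
# JSON response. The URL, header and field names are assumptions for this
# sketch, not the documented Predictron Labs API.
import json
import urllib.request

payload = {"monthly_spend": 12.0, "support_calls": 5}
request = urllib.request.Request(
    "https://api.example.com/score",                       # hypothetical scoring endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",            # hypothetical authentication
    },
)
with urllib.request.urlopen(request) as response:
    prediction = json.load(response)                       # the prediction is in the JSON response
print(prediction)
```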
How does it scale?
It scales extremely well, thanks to the highly modular architecture of our service and the virtually limitless capacity of cloud-based resources. If you still have some doubts, drop us an email and test it yourself.
Can I stream my data?
Unfortunately, not yet; however, it is also on our to-do list. Please send us a development request to let us know you need this.
Could overfitting be a problem?
Of course not. Predictron Labs is a professional provider of machine learning models, and in the background it splits your dataset into several parts to achieve the best performance through bootstrapping and to avoid overfitting; the sketch below illustrates the basic idea of holding data out.
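To illustrate the underlying idea (holding data out of training and comparing performance on seen versus unseen rows), here is a small, generic sketch in Python with scikit-learn. It shows the concept of detecting overfitting, not the framework's internal procedure.

```python
# Generic illustration of the idea: hold part of the data out of training and
# compare performance on seen vs. unseen rows. A large gap is the classic sign
# of overfitting. This shows the concept, not the framework's internal procedure.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree memorizes the training data easily.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:  ", model.score(X_train, y_train))  # close to 1.0
print("holdout accuracy:", model.score(X_test, y_test))    # noticeably lower
```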
Do I need to impute my data?
In general, no. Missing data might convey information that is useful for predicting the target variable, as the sketch below illustrates. The framework will take care of missing data, and you will be notified when it causes trouble.
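A tiny, generic illustration of why missing values can carry signal: besides filling them in, one can keep an "is missing" flag as an extra predictor. The framework handles this for you; the sketch below only shows the concept, with made-up data.

```python
# Tiny illustration of why missingness can carry signal: keep an "is missing"
# flag as an extra predictor instead of only filling the gap in. The framework
# handles this for you; this is just the concept.
import pandas as pd

df = pd.DataFrame({"income": [52000.0, None, 31000.0, None]})
df["income_missing"] = df["income"].isna().astype(int)      # missingness as a feature
df["income"] = df["income"].fillna(df["income"].median())   # simple imputation for the rest
print(df)
```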