Our data model is a standard database table or spreadsheet in comma-separated values (CSV) format: a single table with no preset limit on the number of rows or columns.
"ID","systolic blood pressure","bad cholesterol","family history of heart disease","comments","risk"
Joe,170,150,1,"very overweight","high"
Emily,120,110,0,,"low"
Brenda,200,200,0,"healthy","high"
Robert,100,90,0,,"low"
"systolic blood pressure","bad cholesterol","family history of heart disease","comments","class"
170,150,1,"taking statins",2
120,110,0,,1
"ID X2","systolic blood pressure","bad cholesterol","family history of heart disease","comments","life expectancy years"
18979,170,150,1,"very overweight",70
94321,120,110,0,,85
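Before uploading, it can help to confirm that a file really is a single rectangular table like the examples above. The following is a minimal sketch using Python's standard csv module. The function name check_table is hypothetical and is not part of the service. It assumes the first line is the header row and that empty fields (as in the "comments" column) are allowed.

```python
import csv
import io


def check_table(csv_text):
    """Return (header, rows) after confirming every row matches the header width.

    Hypothetical helper for illustration; it is not part of the service.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: the first line names the columns
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore completely blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows


SAMPLE = '''"ID","systolic blood pressure","bad cholesterol","family history of heart disease","comments","risk"
Joe,170,150,1,"very overweight","high"
Emily,120,110,0,,"low"
'''

header, rows = check_table(SAMPLE)
print(len(header), "columns,", len(rows), "records")
```

A row with too few or too many fields raises a ValueError that names the offending line, which is usually the quickest way to find a stray comma or an unbalanced quote.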
Prediction models typically expect records that summarize many previous transactions, such as a customer's total yearly purchases. For rare-event prediction, the best results are often achieved by combining all of the rare events with an equal-size sample of the alternative events.
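The two preparation steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the (customer, year, amount) transaction layout, and the use of a seeded random sample are all hypothetical choices, not something the service prescribes.

```python
import random
from collections import defaultdict


def summarize_purchases(transactions):
    """Collapse (customer, year, amount) transactions into one total per customer and year."""
    totals = defaultdict(float)
    for customer, year, amount in transactions:
        totals[(customer, year)] += amount
    return dict(totals)


def balanced_sample(records, is_rare, seed=0):
    """Keep every rare record plus an equal-size random sample of the other records."""
    rare = [r for r in records if is_rare(r)]
    other = [r for r in records if not is_rare(r)]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    # If there are fewer alternative records than rare ones, keep them all.
    picked = rng.sample(other, min(len(rare), len(other)))
    return rare + picked


# Hypothetical data: three purchases by two customers in one year.
totals = summarize_purchases([("ann", 2020, 10.0), ("ann", 2020, 5.0), ("bob", 2020, 1.0)])

# Hypothetical data: treat multiples of 10 as the rare event among 0..99.
sample = balanced_sample(list(range(100)), lambda r: r % 10 == 0)
```

Keeping every rare event preserves the scarce cases, and drawing the alternatives at random avoids favoring any particular part of the original file.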
You will be emailed an extensive set of results to be displayed in a browser. Many different predictive models are invoked, and their results are presented in a manner intended to be transparent and understandable to the non-expert. You control the output and quality of results by incrementally revising the input data. Do not underestimate the importance of this task. You are the expert in that endeavor, and you can improve results by reacting to the results of previous experiments.
Currently, our limit on zip file transfer is 58 MB, which is in the vicinity of half a terabyte of uncompressed data. This is quite large for structured data and prediction models. Small files with fewer than 5,000 records will be processed using a 50% train/test split.
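As a rough pre-upload check, the sketch below zips a CSV in memory and compares the result with the transfer limit. It also shows what a 50% train/test split looks like. Several details are assumptions: "58 MB" is read here as 58 × 1024 × 1024 bytes, the function names are hypothetical, and the shuffled split is only one plausible way to divide the records. How the service itself splits them is not stated.

```python
import io
import random
import zipfile

ZIP_LIMIT_BYTES = 58 * 1024 * 1024  # assumption: "58 MB" interpreted as 58 MiB
SMALL_FILE_RECORDS = 5000           # threshold for the 50% split, from the text


def zipped_size(csv_text, name="data.csv"):
    """Return the size in bytes of a zip archive holding csv_text as one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, csv_text)
    return len(buf.getvalue())


def fits_limit(csv_text):
    """True if the zipped CSV is within the assumed transfer limit."""
    return zipped_size(csv_text) <= ZIP_LIMIT_BYTES


def split_half(rows, seed=0):
    """Shuffle a copy of rows and cut it in half (assumed form of a 50% split)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


text = "a,b\n" + "1,2\n" * 1000
print(zipped_size(text), "bytes zipped;", "OK" if fits_limit(text) else "too large")

rows = list(range(10))
if len(rows) < SMALL_FILE_RECORDS:
    train, test = split_half(rows)
```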