Why would a statistical model overfit on a huge dataset?
For my current project, I may need to build a model to predict the behavior of a certain group of people. The training data set contains only 6 variables (id is for identification purposes only): id, age, income, gender, job category, monthly spend, in which monthly...
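For concreteness, here is a minimal sketch of the premise behind the question: a very large sample does not by itself prevent overfitting if the model is flexible enough to memorize noise. This assumes Python with scikit-learn, and the data are synthetic stand-ins for the 6-variable set, not the actual training data:

```python
# A minimal sketch (assuming scikit-learn is available): even with a
# large n, a model that can memorize noise will overfit.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 100_000                      # a "huge" dataset
X = rng.normal(size=(n, 5))      # 5 predictors, stand-ins for age/income/etc.
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n)  # weak signal, heavy noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree grows until it essentially memorizes the training noise.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
lin = LinearRegression().fit(X_tr, y_tr)

for name, m in [("tree", tree), ("linear", lin)]:
    print(name,
          "train MSE:", round(mean_squared_error(y_tr, m.predict(X_tr)), 2),
          "test MSE:", round(mean_squared_error(y_te, m.predict(X_te)), 2))
# Typical output: the tree's train MSE is near 0 while its test MSE is well
# above the noise variance; the linear model's train and test MSE nearly match.
```

The point of the sketch is that overfitting is governed by model flexibility relative to the signal in the data, not by sample size alone: more rows shrink the gap for a fixed-capacity model, but a model whose capacity grows with the data (like a fully grown tree) can still fit noise.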
Tags: modeling, large-data, overfitting