2020-08-17 Paperspace benchmarks
Using the XGBoost benchmark
python tests/benchmark/benchmark_tree.py --tree_method=gpu_hist
python tests/benchmark/benchmark_tree.py --tree_method=hist
my own CPU (Ryzen 3900X): 15.4 sec
my own GPU (GT1030): 872.6 sec. Clearly I saved money on this one.
free-P5000 CPU: 48.12 sec (2.4 GHz CPU) ($0.78/hr)
free-P5000 GPU: 8.47 sec ($0.78/hr)
V100 CPU: 58 sec (2.2 GHz x4 CPU) ($2.40/hr)
V100 GPU: 4.432 sec, not bad! ($2.40/hr)
I think I will be happy with my own CPU for prototyping, and then the free P5000 instance for running the experiments. I can also structure my code so that it can be run on multiple
GPU+ instances ($0.45/hr each). The V100 will come in handy for my upcoming FDTD code.
Also, note that:
- GPU+ is not available in Europe;
- P5000 is available in Europe. That's OK.
$ gradient machines availability --region AMS1 --machineType V100
# Machine available: False
$ gradient machines availability --region AMS1 --machineType GPU+
# Machine available: False
$ gradient machines availability --region AMS1 --machineType P5000
# Machine available: True
2020-08-17 I get a top 3% score in a Kaggle competition
- Interpreting the source data correctly as ratio, interval, ordinal, and nominal
- Adding engineered feature columns to indicate that certain data is unavailable or an outlier
- Transforming the ratio and interval data to linearize the effects (log1p, sqrt, quantile transform)
- Distributed hyperparameter grid search
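A hedged sketch of the feature-engineering steps listed above (missing/outlier indicator columns, log1p and quantile transforms) using scikit-learn; the column names, the z-score outlier threshold, and the helper name `engineer` are illustrative assumptions, not the actual competition code:

```python
# Sketch: add indicator columns and linearizing transforms for ratio columns.
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

def engineer(df: pd.DataFrame, ratio_cols, outlier_z: float = 3.0) -> pd.DataFrame:
    out = df.copy()
    for col in ratio_cols:
        # Flag rows where the value is unavailable.
        out[f"{col}_missing"] = df[col].isna().astype(int)
        # Flag rows more than outlier_z standard deviations from the mean.
        z = (df[col] - df[col].mean()) / df[col].std()
        out[f"{col}_outlier"] = (z.abs() > outlier_z).astype(int)
        # log1p to linearize a right-skewed distribution.
        out[f"{col}_log1p"] = np.log1p(df[col].fillna(0))
    # Quantile transform to an approximately normal distribution.
    qt = QuantileTransformer(output_distribution="normal",
                             n_quantiles=min(1000, len(df)))
    out[[f"{c}_q" for c in ratio_cols]] = qt.fit_transform(df[ratio_cols].fillna(0))
    return out
```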
The final model is an XGBoost custom forest ensemble. It probably overfits quite a bit, but I get a satisfactory result anyway.
The source notebooks (it's a pipeline) are available on request, to demonstrate that I am the genuine author of the solution.
2020-08-08 Complete a Kaggle tutorial
I’d say it was extremely basic, but still, for the purposes of demonstration and job hunting: