There is something beautifully simple about using Ludwig in Kaggle. It is almost like calling a “make_me_a_model(data)” function.
Almost. I decided to give it a try. It was my first time using Ludwig, so I decided to make it a little bit hardcore. It is advertised as a “magical” tool that trains deep learning models without the need to write code, so let’s use it exactly like that: I am going to provide a minimal configuration and see what happens.
First of all, we must install the Ludwig library. It turns out that the version available on PyPI was throwing errors in a Kaggle kernel, so I installed the most recent code from the GitHub repository:
!pip install https://github.com/uber/ludwig/archive/master.zip
After that, it was easy. I loaded the training dataset, defined the features, and ran Ludwig. Note that I used the “PUBG Finish Placement Prediction” dataset.
import pandas as pd
from ludwig import LudwigModel
data = pd.read_csv('../input/train_V2.csv')
# Minimal model definition: list the input columns and the target.
model_definition = {
    'input_features': [
        {'name': 'assists', 'type': 'numerical'},
        {'name': 'boosts', 'type': 'numerical'},
        # ... (the remaining input features are elided here)
        {'name': 'matchType', 'type': 'category'}
    ],
    'output_features': [{'name': 'winPlacePerc', 'type': 'numerical'}]
}
# Ludwig takes care of preprocessing, building the model and training it.
model = LudwigModel(model_definition)
model.train(data)
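Typing every feature by hand gets tedious with a wide dataset. Below is a minimal sketch of generating the definition from the DataFrame's column types instead. The `build_model_definition` helper is hypothetical (it is not part of Ludwig), and the excluded ID columns are my assumption about which PUBG columns should not be features; the result will not necessarily match the elided feature list above. It uses the same `'numerical'`/`'category'` type names as the configuration above.

```python
import pandas as pd

# Hypothetical default: ID-like columns to keep out of the input features.
DEFAULT_EXCLUDE = ('Id', 'groupId', 'matchId')


def build_model_definition(df, target, exclude=DEFAULT_EXCLUDE):
    """Build a Ludwig-style model definition from DataFrame dtypes.

    Numeric columns become 'numerical' input features, every other column
    becomes 'category', and the target becomes a 'numerical' output feature.
    """
    input_features = []
    for column in df.columns:
        # Skip the target and any excluded (ID-like) columns.
        if column == target or column in exclude:
            continue
        if pd.api.types.is_numeric_dtype(df[column]):
            feature_type = 'numerical'
        else:
            feature_type = 'category'
        input_features.append({'name': column, 'type': feature_type})
    return {
        'input_features': input_features,
        # Assumes a regression target, as winPlacePerc is.
        'output_features': [{'name': target, 'type': 'numerical'}],
    }


# Usage on a tiny made-up frame with PUBG-like column names.
sample = pd.DataFrame({
    'Id': ['a', 'b'],
    'assists': [0, 1],
    'boosts': [2, 0],
    'matchType': ['solo', 'squad'],
    'winPlacePerc': [0.5, 0.9],
})
definition = build_model_definition(sample, target='winPlacePerc')
print(definition['input_features'])
```

On the real training data, `build_model_definition(data, target='winPlacePerc')` would return a dictionary that could be passed to `LudwigModel` in place of the hand-written one, with every non-excluded column included as a feature.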
After that, I loaded the test dataset, ran the “predict” function and saved the results in an output file.
import pandas as pd
data = pd.read_csv('../input/test_V2.csv')
ids = data['Id']
# Ludwig returns a DataFrame; the predicted values are stored in the
# '<output feature name>_predictions' column.
predictions = model.predict(data)
model.close()
output = pd.DataFrame({
    'Id': ids,
    'winPlacePerc': predictions['winPlacePerc_predictions'].values
})
output.to_csv('submission.csv', index=False)
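The submission step is also a good place for a few cheap sanity checks. Here is a minimal sketch using a hypothetical `make_submission` helper (not part of Ludwig, and not something I ran in this kernel): it checks that the IDs and predictions have the same length, rejects missing values, and clips predictions to the range [0, 1], because `winPlacePerc` is a percentile while a numerical regression output is unbounded.

```python
import pandas as pd


def make_submission(ids, predicted, target='winPlacePerc'):
    """Pair test-set IDs with predictions, validate them, and clip to [0, 1]."""
    # Reset indexes so the two columns are paired by position.
    ids = pd.Series(ids).reset_index(drop=True)
    predicted = pd.Series(predicted, dtype='float64').reset_index(drop=True)
    if len(ids) != len(predicted):
        raise ValueError(f'{len(ids)} IDs but {len(predicted)} predictions')
    if predicted.isna().any():
        raise ValueError('predictions contain missing values')
    # The target is a percentile, so values outside [0, 1] cannot be right.
    return pd.DataFrame({'Id': ids, target: predicted.clip(0.0, 1.0)})


submission = make_submission(['a', 'b', 'c'], [1.07, 0.42, -0.03])
# submission['winPlacePerc'] is now [1.0, 0.42, 0.0]
```

In the kernel, it would be called as `make_submission(ids, predictions['winPlacePerc_predictions'])`, assuming Ludwig's `<feature>_predictions` column naming.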
It is still running, so I have no idea what the score will be. Maybe it will run out of memory or hit the kernel's processing-time limit. I don’t know yet.
If it works well without any human-driven preprocessing, I will start to worry a little bit. I think it is going to fail miserably.
How are such tools going to change data science and machine learning engineering? I think the only change we are going to see is less time spent writing boring code. After all, implementing a preprocessing pipeline is fun the first time you do it. Maybe the second or even the fifth time is fun too. At some point, it gets boring.
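To make “boring code” concrete, here is a minimal sketch of the kind of hand-written preprocessing that a declarative tool like Ludwig spares us from. The `preprocess` function is hypothetical and is not how Ludwig works internally: it fills missing numeric values with the column mean, standardizes numeric columns with a z-score, and one-hot encodes categorical columns with pandas.

```python
import pandas as pd


def preprocess(df, numeric_columns, categorical_columns):
    """Standardize numeric columns and one-hot encode categorical ones."""
    out = pd.DataFrame(index=df.index)
    for column in numeric_columns:
        values = df[column].astype('float64')
        values = values.fillna(values.mean())
        std = values.std()
        # Constant (or single-row) columns carry no signal; avoid dividing by zero.
        out[column] = (values - values.mean()) / std if std > 0 else 0.0
    # One indicator column per category value, e.g. 'matchType_solo'.
    dummies = pd.get_dummies(df[categorical_columns].astype(str),
                             prefix=categorical_columns)
    return pd.concat([out, dummies.astype('float64')], axis=1)


frame = pd.DataFrame({'kills': [0, 2, 4],
                      'matchType': ['solo', 'squad', 'solo']})
features = preprocess(frame, ['kills'], ['matchType'])
# Columns: kills, matchType_solo, matchType_squad
```

A real pipeline would also have to store the means and standard deviations computed on the training set and reuse them on the test set, and that bookkeeping is exactly the part that stops being fun after a few repetitions.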
Perhaps we should start reading more business books, because it looks like simple machine learning can be successfully automated. What is left for us? In my opinion, we can now focus on the creative part of data science: finding new data, adding features to the dataset, looking for new business problems, and applying ML to solve them.