An AI Scientist, Meta's Sapiens, and Grok 2 – Live and Learn #49
Welcome to this edition of Live and Learn. This time the focus is on the newly released AI Scientist by Sakana AI, an even better 3D generation model, and Grok 2 by xAI. As always, I hope you enjoy this edition of Live and Learn!
✨ Quote ✨
I am, day in and day out, a skeptic of all claims. Whenever someone tells me “[new technology] is going to change the world,” my general response is indifference.
– Nicholas Carlini - (source)
🖇️ Links 🖇️
Sapiens by Meta. Meta released a family of models that produce segmentations, depth maps, normal maps, and pose estimates for general video input of humans doing all sorts of things. Sapiens will be their foundation model for human-centric AR/VR vision tasks in the future. In true Meta fashion, the workflows and code are all publicly available on GitHub, with the models available on Hugging Face too.
AI Scientist by Sakana AI. Sakana AI is working hard to build an AI model that can do autonomous scientific research and produce fully-fledged research papers as output. They just released their newest contribution to this area, which they call "The AI Scientist". Essentially, they built a model that can do autonomous machine-learning research. In the words of the report: "The AI Scientist can perform idea generation, literature search, experiment planning, experiment iterations, figure generation, manuscript writing, and reviewing to produce insightful papers." This on its own would be scary. What is even more ridiculous, though, is that the AI tries to modify its own source code and re-execute its own script to better achieve its research goals. To me, this sounds like the beginning of the end. I feel like this is a meaningful turning point in the grander scheme of things: we are showing that AI models can do complete machine-learning research and modify their own source code to try to improve themselves. Right now the attempted modifications of the execution scripts are not that clever, but the fact that the AI tries to do this at all is a very big deal. If you read only one thing linked from this newsletter, it should be this report in its entirety (or even the 185-page PDF research paper).
Meshy 4. Text-to-3D models keep getting better and better, and the resolution and amount of detail are insane compared to what came before. Generated 3D models are quickly approaching a fidelity comparable to the work of professional 3D modelers, and eventually creating assets for games or VFX in a specific style will be as readily available as image and audio generation are now.
Grok 2 by xAI. Grok 2 launched recently, and it is a powerful, state-of-the-art LLM. The main difference between it and other models like Llama 3.1 or GPT-4o is that it has few to no guardrails and comes with Flux image generation capabilities that are on par with those of tools like Midjourney. Unfortunately, this also makes it excellent for producing deepfakes.
Building Genie Technical Report by Cosine AI. Genie is an LLM specifically trained to be good at coding tasks by using a different training pipeline with examples that are closer to the work that real-world software developers do. Tailoring the training data like this makes Genie much better at doing useful software engineering tasks than other models available right now. The technical report dives a little deeper into the details of how they achieved this and the performance of the model across various benchmarks.
🌌 Traveling 🌌
Portugal is still very nice, and I am spending time in a cozy little place in the forests in the north of the country. I am sleeping in a tent in these lush forests, with days and nights of pleasant climate and lots of wonderful people around. Life is good and happy, the days brimming with excitement.
🎶 Song 🎶
Learning by Jordan Rakei
That's all for this time. I hope you found this newsletter useful, beautiful, or even both!
Have ideas for improving it? As always, please let me know.
Cheers,
– Rico