340 - Comparing Top Large Language Models for Python Code Generation

Published: 31 July 2024
on channel: DigitalSreeni

A video walkthrough of an experiment testing various large language models for generating #Python code for scientific image analysis.


The task: segment nuclei in a multichannel #microscopy image (proprietary format), measure the mean intensities of the other channels within the segmented nuclei regions, calculate ratios, and report the results in a CSV file.
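For readers who want to picture the task, here is a minimal sketch of what such a pipeline can look like, assuming a pretrained StarDist 2D model and a multichannel image already loaded as a (channels, height, width) NumPy array; the channel indices and the ratio column are illustrative, not the exact specification used in the video:

import pandas as pd
from csbdeep.utils import normalize
from skimage.measure import regionprops_table
from stardist.models import StarDist2D

def measure_nuclei(image, nuclei_channel=0):
    """Segment nuclei in one channel, then report the mean intensity of
    every other channel inside each segmented nucleus."""
    # Pretrained 2D fluorescence model shipped with StarDist
    model = StarDist2D.from_pretrained("2D_versatile_fluo")

    # StarDist expects normalized input -- this is the "small tweak"
    # mentioned in the findings below
    labels, _ = model.predict_instances(normalize(image[nuclei_channel], 1, 99.8))

    records = None
    for ch in range(image.shape[0]):
        if ch == nuclei_channel:
            continue
        props = regionprops_table(labels, intensity_image=image[ch],
                                  properties=("label", "mean_intensity"))
        if records is None:
            records = {"label": props["label"]}
        records[f"ch{ch}_mean"] = props["mean_intensity"]

    df = pd.DataFrame(records)
    # Illustrative ratio between two of the measured channels (assumes >= 3 channels)
    df["ch1_over_ch2"] = df["ch1_mean"] / df["ch2_mean"]
    return df

# img = ...  # load the proprietary file with e.g. aicsimageio or bioformats
# measure_nuclei(img).to_csv("nuclei_intensities.csv", index=False)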

All well-hyped models are tested:
Claude 3.5 Sonnet
ChatGPT-4o
Meta AI Llama 3.1 405-B
Google Gemini 1.5 Flash
Microsoft Copilot (in God's name, how do I find the model name?)

All generated code can be found here: https://github.com/bnsreenu/python_fo...

A summary of my findings:
When I didn't specify how to segment, all needed a bit of hand-holding. Claude did the best here.
When I asked for StarDist segmentation, ChatGPT-4o nailed it on the first try, with minimal code.
Claude and Meta AI weren't far behind; they just needed a small tweak (normalizing pixel values).
Gemini and Copilot... well, I'm super disappointed with their performance. I didn't manage to get their code to run at all, even after many prompts.

With StarDist-based segmentation, the code generated by ChatGPT, Claude, and Meta AI produced statistically identical results for the intensity measurements.
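As a rough illustration, that "statistically identical" observation can be sanity-checked by pairing up the per-nucleus measurements from two of the generated scripts. The file names below are hypothetical, and a paired t-test is just one reasonable choice, not necessarily the comparison used in the video:

import pandas as pd
from scipy.stats import ttest_rel

# Hypothetical CSV outputs from two of the generated scripts,
# assumed to list the same nuclei in the same order
a = pd.read_csv("chatgpt_nuclei_intensities.csv")
b = pd.read_csv("claude_nuclei_intensities.csv")

# Paired t-test on one mean-intensity column
stat, p = ttest_rel(a["ch1_mean"], b["ch1_mean"])
print(f"t = {stat:.3f}, p = {p:.3f}")  # a large p-value means no detectable difference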

While AI is making rapid progress, the need for detailed prompting to obtain reliable results underscores the continued importance of domain expertise and basic coding skills.
