Discover how Galileo and NVIDIA NeMo microservices created a powerful data flywheel that enabled Cisco to deploy reliable AI agents with 10x lower latency and 40% higher accuracy for business-critical operations.
Are you finding it challenging to monitor and optimize your large language models effectively? As AI applications become more complex and integral to business operations, understanding LLM observability is crucial: it can help you enhance the performance and reliability of your AI applications, especially after deployment.
Discover how AI is creating information symmetry in enterprises by breaking down silos, democratizing data access, and transforming organizational decision-making.
Learn how to create your own intelligent, multi-API AI agent using the Agent Connect Protocol (ACP). This tutorial walks you through building a weather-predicting, fashion-recommending, mood-setting assistant with real-time evaluation powered by Galileo.
Discover what Agentic AI can actually deliver for your business. Learn to identify valuable AI solutions, avoid market saturation pitfalls, and implement AI that solves real problems instead of chasing trends.
Learn how Specification-First AI Development aligns technical capabilities with business goals. Reduce errors and improve outcomes with detailed specifications.
Discover strategies to adapt Test-Driven Development for AI, ensuring reliability in non-deterministic outputs. Tackle AI testing challenges with innovative methods.
Explore 9 essential strategies to maintain stability in dynamic multi-agent systems. Discover adaptive architectures, robust communication, and monitoring approaches.
Explore the differences between collaborative and competitive multi-agent systems in AI. Discover interaction paradigms for agent dynamics in diverse scenarios.
Master AI Governance with these 7 strategies. Learn how to minimize risks, accelerate innovation, and ensure compliance with industry standards.
Discover best practices for evaluating AI agents, from accuracy metrics and action evaluations to real-time monitoring, to ensure reliable AI-driven automation.
Discover the importance of AI observability for reliable systems. Learn how monitoring, debugging, and transparency enhance AI performance and trust.
Learn how to model and mitigate systemic risk in multi-agent AI systems. Discover failure cascade simulations, layered threat analysis, and real-time monitoring strategies with Galileo.
Uncover the essentials of CI for AI. Learn how to automate, adapt, and optimize workflows, ensuring robust AI models & seamless development.
Discover how to measure communication efficiency in multi-agent AI systems. Learn practical strategies, metrics, and optimizations to boost performance.
Join Galileo and Cisco to explore the infrastructure needed to build reliable, interoperable multi-agent systems, including an open, standardized framework for agent-to-agent collaboration.
Explore strategies to identify and counteract coordinated threats in multi-agent AI. Protect against exploitation through communication and trust safeguards.
Learn strategies to detect and prevent malicious behavior in multi-agent systems. Explore security challenges, detection frameworks, and prevention strategies.
Explore centralized vs. distributed AI strategies in multi-agent systems. Learn the impact on decision-making, scalability, and fault tolerance.
Learn effective strategies to prevent data corruption in multi-agent AI workflows. Enhance reliability and secure sensitive data across complex AI systems.
Explore top strategies for Large Language Model (LLM) summarization. Learn to implement tech solutions and optimize document processing efficiency.
Explore innovative cross-validation techniques for enhancing LLM performance. Boost generalization & reliability with tailored validation strategies.
Explore how Large Language Model Reasoning Graphs enhance recommender systems, addressing traditional challenges with interpretable, context-aware recommendations.
Discover MoverScore, a semantic evaluation metric that outperforms BLEU and ROUGE in capturing text nuances, offering a refined approach to AI-generated content quality assessment.
Explore a detailed step-by-step guide on effectively evaluating AI systems to boost their potential. Understand risks, optimize performance, and ensure compliance.
Explore strategies for building trust and transparency in enterprise AI, including in regulated industries. Build safer AI.
Unlock AI potential with our step-by-step guide on LLM evaluation. Learn practical strategies to assess large language models, ensure business success, and minimize implementation risks.
Explore the differences in real-time and batch monitoring for LLMs. Learn which approach suits your needs for optimal performance and data insight.
Explore LLM benchmarks categories for evaluating AI. Learn about frameworks, metrics, industry use cases, and the future of language model assessment in this comprehensive guide.
Delve into the Character Error Rate metric, a pivotal tool for evaluating AI precision at the character level. Explore its applications, calculations, and impact.
Learn how to evaluate agent contributions in dynamic multi-agent workflows. Unlock insights with effective metrics and optimize collaboration efficiently.
Explore key benchmarks for evaluating multi-agent AI. Discover their strengths, weaknesses, and how to choose the best one for your needs.
Explore the role of Semantic Textual Similarity (STS) metric in AI, from concept to real-world applications and challenges.
Learn the fundamentals of AI agent architecture, including key components, challenges, and best practices for building functional, secure, and scalable AI systems. Discover how to optimize your AI with expert insights and tools.
Understanding LLM Performance Metrics informs model selection and guides optimization strategies to meet specific application needs, which is particularly important as organizations adopt generative AI.
Discover how self-evaluation, chain of thought, error detection, and self-reflection in AI agents enhance performance, reduce errors, and improve reliability.
Ensuring that Large Language Models (LLMs) perform well in production is crucial for successful AI deployments. Effective LLM Model Monitoring helps prevent errors, security risks, and performance issues that could hinder AI initiatives.
Discover how Agentic RAG systems integrate retrieval and generation in AI, enhancing decision-making and precision. Explore its impact across industries.
Discover BERTScore’s transformative role in AI, offering nuanced and context-aware evaluation for NLP tasks, surpassing traditional metrics.
Explore the G-Eval metric, a pivotal tool for evaluating AI creativity and coherence, enhancing real-world model performance beyond basic accuracy.
Learn how Retrieval Augmented Fine-Tuning (RAFT) revolutionizes domain-specific RAG tasks. Boost fine-tuning accuracy and performance significantly.
Elevate factual QA with robust monitoring and guardrails. Discover how Galileo ensures truthfulness and reliability in enterprise AI systems.
Discover how Cohen's Kappa metric enhances AI evaluation by measuring inter-rater agreement, ensuring data quality, and improving model reliability.
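To make the agreement arithmetic concrete before diving into the full article, here is a minimal Python sketch of Cohen's Kappa (the two annotator label lists are invented for illustration): kappa compares the observed agreement between raters with the agreement expected by chance.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the chance agreement implied by each rater's label marginals."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Example: two annotators labeling five model outputs as "pass" / "fail".
a = ["pass", "pass", "fail", "pass", "fail"]
b = ["pass", "fail", "fail", "pass", "fail"]
print(round(cohens_kappa(a, b), 3))  # 0.615
```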
Explore the Mean Average Precision (MAP) metric for AI model evaluation. Learn its significance in ranking tasks, practical applications, and optimization strategies.
Learn about the next evolution of automated AI evaluations: Evaluation Agents.
Explore dynamic environment performance testing for AI agents. Learn methodologies ensuring adaptability in real-world scenarios to boost system reliability.
Explore single-agent vs multi-agent AI systems. Understand their benefits, challenges, and real-world applications for enterprises.
Discover how RAG architecture revolutionizes AI with real-time data access. Enhance AI interactions and decision-making with our comprehensive component analysis.
Discover how AI and modern programming languages like Rust and Golang transform legacy applications, reduce technical debt, and drive innovation in today's competitive tech landscape.
Dive into the groundbreaking Llama 3 models. Discover advanced NLP, efficiency, and multilingual capabilities for AI engineers and data scientists.
Unlock the secrets of effective AI agent evaluation with our comprehensive guide. Discover key methods, overcome challenges, and implement best practices for success.
Explore the pros and cons of combining qualitative and quantitative methods to enhance LLM evaluation, ensuring comprehensive assessment and growth.
Explore effective methods to evaluate AI agents across domains, ensuring proficiency, consistency, and ethical compliance with Galileo's insights and tools.
Discover insights on AUC-ROC metrics in model evaluation. Learn calculation techniques to enhance your machine learning models.
Understand the importance of the Mean Reciprocal Rank (MRR) metric in AI systems for delivering accurate, relevant results. Enhance reliability with this comprehensive guide.
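As a quick illustration of the formula the MRR guide builds on, here is a minimal, self-contained Python sketch (the relevance flags are invented for the example): MRR averages the reciprocal rank of the first relevant result across queries.

```python
def mean_reciprocal_rank(ranked_relevance: list[list[bool]]) -> float:
    """Compute MRR given, per query, a ranked list of relevance flags.
    For each query take 1 / rank of the first relevant result
    (0 if nothing relevant was retrieved), then average over queries."""
    reciprocal_ranks = []
    for flags in ranked_relevance:
        rr = 0.0
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0

# Example: first query's first relevant hit is at rank 2, second query's at rank 1.
print(mean_reciprocal_rank([[False, True, False], [True, False, False]]))  # 0.75
```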
Master the skills needed to build AI agents, from advanced programming to ethical handling of data. Elevate your AI projects with technical and strategic excellence.
Learn how to implement AI in your business with strategies for cost management, workforce readiness, and system integration to drive growth and efficiency.
Discover how the F1 Score provides a comprehensive evaluation of speech-to-text models beyond basic accuracy. Learn why precision and recall matter in assessing transcription performance for real-world applications.
Explore functional correctness in AI - its significance, enterprise implementation strategies, and how innovative evaluation methods enhance reliability.
Explore human-centered AI evaluation strategies that combine human judgment with automated metrics. Learn how to ensure accuracy, cultural sensitivity, and ethical AI practices using advanced tools like Galileo and inclusive evaluation frameworks.
Explore the ROUGE Metric, a key tool in assessing AI-generated summaries against human judgment. Learn its variants and integration methods.
Uncover how the Word Error Rate metric revolutionizes AI performance in speech and language processing. Essential insights for developers and tech enthusiasts.
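For readers who want the arithmetic behind the metric, a small illustrative Python sketch follows (not any particular library's implementation): WER divides the word-level edit distance (substitutions, deletions, insertions) by the length of the reference transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard word-level Levenshtein dynamic program."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ~0.167 (one deletion)
```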
Dive into essential data processing strategies for RAG systems. Ensure accuracy, optimize performance and explore cutting-edge techniques for enhanced retrieval.
Explore how Prompt Perplexity measures AI reliability. Learn to ensure consistent, accurate outputs & enhance model performance with Galileo's innovative metric.
Explore advanced RAG performance optimization strategies for AI engineers. Enhance retrieval processes and resource efficiency in AI systems today!
Enhance AI efficiency with top strategies for mastering multimodal models, integrating diverse data types, and ensuring secure deployments.
AGNTCY brings together industry leaders to create open standards for multi-agentic systems. We're addressing the lack of standardization, trust, and infrastructure to build a future where AI agents can seamlessly discover, compose, deploy, and evaluate each other's capabilities at scale.
Explore the cost of training LLM models, essential elements that affect expenses, and effective strategies to manage AI investments efficiently.
Explore ethical challenges in RAG systems: bias, transparency, privacy, misinformation, and accountability. Learn strategies to ensure fair applications.
Explore how the Galileo Correctness Metric enhances AI accuracy by assessing factual reliability. Boost model accuracy & align with real-world standards.
Explore the critical performance metrics and evaluation frameworks that define success in multi-agent AI systems. Learn about accuracy, fairness, and more.
Explore how to create, optimize, and evaluate agent systems for data review.
Learn to monitor and mitigate threats in multi-agent decision-making systems to enhance security and efficiency in AI-driven industries.
Discover Galileo's tool for measuring AI adherence to instructions, ensuring model performance aligns with user needs, business objectives, and safety.
Learn how AI agentic systems enhance automation through autonomous decision-making. Explore key evaluation strategies, task completion metrics, error management, and Galileo’s approach to AI performance monitoring.
Master AI model evaluation with accuracy metrics. Learn precision, recall, F1, AUC-ROC, and more for balanced & imbalanced datasets.
Learn how the BLEU Metric improves machine translation accuracy and AI model evaluations through its precise assessment criteria. Enhance multilingual projects.
This article delves into agentic AI, its frameworks, operations, and practical applications, addressing user needs from foundational understanding to advanced insights.
Explore the importance, calculation, and application of PR curves in machine learning.
Explore the intricacies of AI agentic workflows, including definitions, applications, and implementation strategies, to empower users in optimizing autonomous systems.
This article explores key strategies for evaluating Multimodal AI, covering methods to assess performance across text, images, and audio. Learn how to improve accuracy, detect errors, and ensure reliable AI systems with effective evaluation techniques.
This article discusses the biggest challenges in building and using Multimodal Large Language Models (MLLMs), such as hallucinations, evaluating performance, data integration, and real-time monitoring. It covers best practices for improving accuracy, reducing errors, and making MLLMs more reliable. If you're working with multimodal AI, this guide will help you tackle these challenges with practical solutions.
We built this leaderboard to answer one simple question: "How do AI agents perform in real-world agentic scenarios?"
Discover how Google's Gemini models unlock the power of multimodal AI—combining text, images, audio, and video—to create smarter, more intuitive applications across industries.
Continuous Learning with Human Feedback combines the scalability of automated LLM-as-a-Judge evaluations with the precision of human insights. It's a breakthrough workflow that enables automated prompt-tuning of evaluation metrics, yielding up to 30% improvements in accuracy with as few as 2-5 labeled records.
Unlock the power of AI with our comprehensive guide to Retrieval-Augmented Generation. Discover advanced metrics, best practices, and expert insights to enhance your AI applications.
Learn more about essential AI security strategies for GenAI systems. We outline the best practices to safeguard your AI applications from threats and vulnerabilities.
Explore how MMLU evaluates AI across 57 subjects, from STEM to humanities. Learn about testing methodologies, performance standards, and optimization.
Discover the essential AI safety metrics to secure your applications. Learn how Galileo can help you evaluate, monitor, and protect your AI systems for reliable performance.
A step-by-step guide for evaluating smart agents
Explore how AI is reshaping developer collaboration by enhancing psychological safety, boosting inclusion, and empowering diverse teams through transparent, open-source solutions.
Learn more about fluency metrics for LLM RAG systems. We cover ROUGE, BLEU, and more to help you better optimize your AI's language generation performance.
Discover how to optimize LLM parameters for better AI performance. Our guide covers key metrics, evaluation techniques, and tips for fine-tuning your models effectively.
Everything developers need to build, ship, and scale best-in-class AI agents.
Learn how to improve AI agent performance through structured evaluations, including how to evaluate tool selection, common pitfalls, and how to optimize agentic decision-making.
Learn how to implement comprehensive AI risk management in your company. Frameworks, tools, and strategies for operational excellence.
Explore the key limitations of open source LLMs, from performance gaps to evaluation challenges. Discover critical insights for AI developers and decision-makers.
Join Conor Bronsdon as he chats with Galileo co-founders Yash Sheth (COO) and Atindriyo Sanyal (CTO) about major trends to look for this year. These include AI finding its product "tool stack" fit, generation latency decreasing, AI agents, their potential to revolutionize code generation and other industries, and the crucial role of robust evaluation tools in ensuring the responsible and effective deployment of these agents.
Unlock the power of BLANC Metric for AI document summarization. Learn how to evaluate and improve your AI's performance with this cutting-edge technique.
Effective human assistance in AI agents
"This is the time. This is the time to start building... I can't say that often enough. This is the time." - Bob van Luijt Join Bob van Luijt, CEO and co-founder of Weaviate as he sits down with our host Conor Bronson for the Season 2 premiere of Chain of Thought. Together, they explore the ever-evolving world of AI infrastructure and the evolution of Retrieval-Augmented Generation (RAG) architecture.
Unlock the key to AI agent testing with our guide. Discover metrics, best practices, and innovative techniques to evaluate your AI agents.
Discover how to evaluate AI agents in real-world scenarios through benchmarks. Our guide explores key benchmark types, performance metrics, and insights for optimizing AI agents.
Whether you’re diving into the world of autonomous agents for the first time or just need a quick refresher, this blog breaks down the different levels of AI agents, their use cases, and the workflow running under the hood.
Discover how AI assistants function as "async junior digital employees," taking on specific tasks and contributing to the organizational structure
Top research benchmarks for evaluating agent performance in planning, tool calling, and persuasion.
Explore the challenges and opportunities of deploying GenAI at enterprise scale in a conversation that's a wake-up call for any business leader looking to harness the power of AI.
Learn to bridge the gap between AI capabilities and business outcomes
Learn the key concepts behind multimodal AI evaluation, why multimodality is more challenging than text-based evaluations, and what to consider in your evaluation framework.
As AI agents and multimodal models become more prevalent, understanding how to evaluate GenAI is no longer optional – it's essential. Generative AI introduces new complexities in assessment compared to traditional software, and this week on Chain of Thought we’re joined by Chip Huyen (Storyteller, Tép Studio), Vivienne Zhang (Senior Product Manager, Generative AI Software, Nvidia) for a discussion on AI evaluation best practices
Discover how ROUGE evaluates AI text summarization. Learn to optimize your AI models with this key metric for better performance.
Discover why explainability matters in AI and how to achieve it. Unlock transparency, build trust, and create reliable AI solutions with practical insights.
Fluency in AI: Mastering Generative Systems
A comprehensive guide to metrics for GenAI chatbot agents
The “ROI of AI” has been marketed as a panacea, a near-magical solution to all business problems. Following that promise, many companies have invested heavily in AI over the past year and are now asking themselves, “What is the return on my AI investment?” This week on Chain of Thought, Galileo’s CEO, Vikram Chatterji joins Conor Bronsdon to discuss AI's value proposition, from the initial hype to the current search for tangible returns, offering insights into how businesses can identify the right AI use cases to maximize their investment.
Discover strategies for engineering leaders to successfully navigate AI challenges, balance business pressures, and implement effective AI adoption frameworks.
Will 2025 be the year open-source LLMs catch up with their closed-source rivals? Will an established set of best practices for evaluating AI emerge? This week on Chain of Thought, we break out the crystal ball and give our biggest AI predictions for 2025
In the field of artificial intelligence, selecting the right model architecture is crucial for your project's success. For AI developers and CTOs comparing architectures, knowing the differences between Retrieval-Augmented Generation (RAG) and traditional Large Language Models (LLMs) helps in building effective AI applications. Many organizations, from healthcare to finance, rely on real-time, accurate data for decision-making. RAG offers a solution for these use cases by integrating external knowledge during inference, providing access to current data that traditional LLMs lack due to static training.
Do you want to enhance your AI models with Retrieval-Augmented Generation (RAG)? This article discusses the top tools that data scientists, AI engineers, and developers use to build efficient, accurate, and context-aware RAG systems.
Managing Large Language Models (LLMs) effectively requires good monitoring to ensure they are reliable and perform well. This guide compares how Datadog LLM Monitoring and Galileo's specialized LLM monitoring solutions can help you manage and improve your AI applications.
Are you deciding between using large language models (LLMs) and traditional NLP models for your next AI project? This article explores LLM vs. NLP Models, helping you understand the key differences and make an informed choice that suits your needs.
Choosing the right speech-to-text tool is crucial for enhancing communication, accessibility, and efficiency across various industries. However, with the rapid advancements in real-time speech-to-text technology, it can be challenging to determine which solution best suits your needs. This guide will help you understand these tools, the key features to look for, and how to select the one that aligns with your specific requirements and workflows.
Speech-to-Text for Enterprises plays a crucial role in helping organizations improve productivity and gain insights through accurate and scalable transcription systems.
Optimizing RAG Performance is essential for AI engineers to enhance efficiency and accuracy in their Retrieval-Augmented Generation systems. Slow responses and irrelevant outputs can hinder user experience and application success. This guide offers best practices and strategies to improve the speed and accuracy of your RAG system.
AI agents have quickly emerged as the next ‘hot thing’ in AI, but what constitutes an AI agent and do they live up to the hype?
Identify issues quickly and improve agent performance with powerful metrics
From ChatGPT's search engine to Google's AI-powered code generation, artificial intelligence is transforming how we build and deploy technology. In this inaugural episode of Chain of Thought, the co-founders of Galileo explore the state of AI, from open-source models to establishing trust in enterprise applications. Plus, tune in for a segment on the impact of the Presidential election on AI regulation. The episode culminates with an interview of May Habib, CEO of Writer, who shares practical insights on implementing generative AI at scale.
Join us at AWS re:Invent to see the latest in AI evaluation intelligence and learn from leading GenAI experts!
Evaluating large language models (LLMs) has become a critical task for data scientists and AI professionals. Understanding effective evaluation metrics and frameworks is key to ensuring the reliability and accuracy of these models in real-world applications.
LLMs are being deployed in critical applications across various industries. In these real-world applications, ensuring the reliability and performance of AI models is paramount, as errors or unexpected behaviors can lead to significant consequences.
Evaluating the critical thinking capabilities of Large Language Models (LLMs) is important for developers and data scientists who want to build reliable AI systems. Knowing which benchmarks assess these abilities helps engineers integrate AI into their applications. In this article, we'll explore the top benchmarks for evaluating LLMs' critical thinking skills and compare tools like Galileo, Patronus, and Langsmith.
AI models now influence critical decisions and daily life, so ensuring their accuracy and reliability is essential. Explore AI model validation to master techniques that keep your models effective and trustworthy, using tools like Galileo for the best results.
In the field of artificial intelligence, understanding the differences between LLM Monitoring vs. Observability is important for data scientists, AI practitioners, and enterprise teams who want to improve the performance, reliability, and safety of their generative AI systems.
As full-stack engineers exploring AI, understanding how to evaluate Large Language Models (LLMs) is essential for developing accurate and reliable AI applications. In this article, we'll discuss building an effective LLM evaluation framework from scratch, exploring methods to assess and enhance your models by leveraging insights on LLM applications, comparing different evaluation tools, and showing how Galileo provides a complete solution.
Unlock the potential of LLM Judges with fundamental techniques
Master the art of building your AI evaluators using LLMs
Galileo’s native integrations with Databricks make it simple for enterprises to use Databricks models for LLM evaluation and programmatically build training and evaluation datasets for model improvement.
Understand the tradeoffs between LLMs and humans for generative AI evaluation
Galileo secures $45M in Series B funding to boost its Evaluation Intelligence Platform, driving AI accuracy and trust for enterprise teams. With backing from leading investors, Galileo is transforming how companies like HP and Twilio evaluate AI performance.
Industry report on how generative AI is transforming the world.
Explore insights from industry leaders on the evolving GenAI stack at Galileo's GenAI Productionize conference. Learn how enterprises are adopting LLMOps, optimizing costs, fine-tuning models, and improving data quality to harness the power of generative AI. Discover key trends and strategies for integrating GenAI into your organization.
Win prizes while driving the future roadmap of GenAI Studio. Sign up now!
Learn how Clearwater Analytics, the leading SaaS-based investment accounting and analytics solution, built and deployed a customer-facing, multi-agent system using fine-tuned SLMs.
Understand the most common issues with AI agents in production.
Learn to create and filter synthetic data with ChainPoll for building evaluation and training datasets
Select the best framework for building intelligent AI Agents
See how easy it is to leverage Galileo's platform alongside the IBM watsonx SDK to measure RAG performance, detect hallucinations, and quickly iterate through numerous prompts and LLMs.
Learn the intricacies of evaluating LLMs for RAG: datasets, metrics, and benchmarks
While many teams have been building LLM applications for over a year now, there is still much to learn about RAG and all types of hallucinations. Check out our roundup of the top generative AI and LLM articles for August 2024.
Learn how to build scalable, reliable, highly personalized agentic solutions, including best practices for bringing agentic solutions to production.
Top Open And Closed Source LLMs For Short, Medium and Long Context RAG
The LLM Hallucination Index ranks 22 of the leading models based on their performance in real-world scenarios. We hope this index helps AI builders make informed decisions about which LLM is best suited for their particular use case and need.
Galileo and HP partner to enable faster and safer deployment of AI-powered applications.
An exploration of the types of hallucinations in multimodal models and ways to mitigate them.
Learn to do robust evaluation and beat the current SoTA approaches
Research-backed evaluation foundation models for enterprise scale
Low latency, low cost, high accuracy GenAI evaluation is finally here. No more ask-GPT and painstaking vibe checks.
Evaluations are critical for enterprise GenAI development and deployment. Despite this, many teams still rely on 'vibe checks' and manual human evaluation. To productionize trustworthy AI, teams need to rethink how they evaluate their solutions.
Join us at Databricks Data+AI Summit to see the latest innovations at the convergence of data and AI and learn from leading GenAI experts!
We’re excited to announce Galileo Protect – an advanced GenAI firewall that intercepts hallucinations, prompt attacks, security threats, and more in real-time! Register for our upcoming webinar to see Protect live in action.
We're thrilled to unveil Galileo Protect, an advanced GenAI firewall solution that intercepts hallucinations, prompt attacks, security threats, and more in real-time.
The AI landscape is exploding in size, with some early winners emerging, but RAG reigns supreme for enterprise LLM systems. Check out our roundup of the top generative AI and LLM articles for May 2024.
It’s time to put the science back in data science! Craig Wiley, Sr Dir of AI at Databricks, joined us at GenAI Productionize 2024 to share practical tips and frameworks for evaluating and improving generative AI. Read key takeaways from his session.
Llama 3 insights from the leaderboards and experts
At GenAI Productionize 2024, expert practitioners shared their own experiences and mistakes to offer tools and techniques for deploying GenAI at enterprise scale. Read key takeaways from the session on how to productionize generative AI.
2024 has been a landmark year for generative AI, with enterprises going from experimental proofs of concept to production use cases. At GenAI Productionize 2024, our enterprise executive panel shared lessons learned along their AI adoption journeys.
Learn to set up a robust observability solution for RAG in production
Smaller LLMs can be better (if they have a good education), but if you’re trying to build AGI you better go big on infrastructure! Check out our roundup of the top generative AI and LLM articles for April 2024.
A technique to reduce hallucinations drastically in RAG with self-reflection and fine-tuning
Join Ya Xu, Head of Data and AI at LinkedIn, to learn the technologies, frameworks, and organizational strategies she uses to scale GenAI at LinkedIn.
Master the art of selecting a vector database based on various factors
Choosing the best reranking model for your RAG-based QA system can be tricky. This blog post simplifies RAG reranking model selection, helping you pick the right one to optimize your system's performance.
Stay ahead of the AI curve! Our February roundup covers: Air Canada's AI woes, RAG failures, climate tech & AI, fine-tuning LLMs, and synthetic data generation. Don't miss out!
Unsure of which embedding model to choose for your Retrieval-Augmented Generation (RAG) system? This blog post dives into the various options available, helping you select the best fit for your specific needs and maximize RAG performance.
Learn advanced chunking techniques tailored for Large Language Model (LLM) applications with our guide on Mastering RAG. Elevate your projects by mastering efficient chunking methods to enhance information processing and generation capabilities.
Unlock the potential of RAG analysis with 4 essential metrics to enhance performance and decision-making. Learn how to master RAG methodology for greater effectiveness in project management and strategic planning.
Introducing a powerful set of workflows and research-backed evaluation metrics to evaluate and optimize RAG systems.
February's AI roundup: Pinterest's ML evolution, NeurIPS 2023 insights, understanding LLM self-attention, cost-effective multi-model alternatives, essential LLM courses, and a safety-focused open dataset catalog. Stay informed in the world of Gen AI!
Watch our webinar with Pinecone on optimizing RAG & chain-based GenAI! Learn strategies to combat hallucinations, leverage vector databases, and enhance RAG analytics for efficient debugging.
Explore the nuances of crafting an Enterprise RAG System in our blog, "Mastering RAG: Architecting Success." We break down key components to provide users with a solid starting point, fostering clarity and understanding among RAG builders.
Galileo on Google Cloud accelerates evaluating and observing generative AI applications.
Dive into our blog for advanced strategies like ThoT, CoN, and CoVe to minimize hallucinations in RAG applications. Explore emotional prompts and ExpertPrompting to enhance LLM performance. Stay ahead in the dynamic RAG landscape with reliable insights for precise language models. Read now for a deep dive into refining LLMs.
Prepare for the impact of the EU AI Act with our actionable guide. Explore risk categories, conformity assessments, and consequences of non-compliance. Learn practical steps and leverage Galileo's tools for AI compliance. Ensure your systems align with regulatory standards.
Learn how to master RAG. Delve deep into 8 scenarios that are essential to test before going to production.
The Hallucination Index provides a comprehensive evaluation of 11 leading LLMs' propensity to hallucinate during common generative AI tasks.
Galileo's key takeaways from the 2023 OpenAI Dev Day, covering new product releases, upgrades, pricing changes, and more!
Explore the transformative impact of President Biden's Executive Order on AI, focusing on safety, privacy, and innovation. Discover key takeaways, including the need for robust Red-teaming processes, transparent safety test sharing, and privacy-preserving techniques.
ChainPoll: A High Efficacy Method for LLM Hallucination Detection. ChainPoll leverages Chaining and Polling or Ensembling to help teams better detect LLM hallucinations. Read more at rungalileo.io/blog/chainpoll.
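As a rough, hypothetical illustration of the polling-and-ensembling idea only (not Galileo's actual implementation; see the linked post for that), the sketch below asks an LLM judge for a chain-of-thought verdict several times and averages the votes into a hallucination score. The `ask_llm` helper is a made-up stand-in for whatever completion API you use.

```python
import re

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM completion call (e.g., an OpenAI or
    self-hosted endpoint). Replace with a real client in practice."""
    raise NotImplementedError

def chainpoll_style_score(question: str, context: str, answer: str, n_polls: int = 5) -> float:
    """Rough sketch of a ChainPoll-style ensemble: poll a chain-of-thought judge
    n times and return the fraction of polls that flag the answer as hallucinated."""
    judge_prompt = (
        "Think step by step about whether the answer is supported by the context.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer: {answer}\n"
        "Finish with a final line 'VERDICT: yes' if the answer is hallucinated, "
        "or 'VERDICT: no' if it is supported."
    )
    votes = []
    for _ in range(n_polls):
        reply = ask_llm(judge_prompt)
        match = re.search(r"VERDICT:\s*(yes|no)", reply, re.IGNORECASE)
        votes.append(1 if match and match.group(1).lower() == "yes" else 0)
    return sum(votes) / len(votes)  # 0.0 = likely grounded, 1.0 = likely hallucinated
```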
Join in on this workshop where we will showcase some powerful metrics to evaluate the quality of the inputs (data quality, RAG context quality, etc) and outputs (hallucinations) with a focus on both RAG and fine-tuning use cases.
Galileo x Zilliz: The Power of Vector Embeddings
A comprehensive guide to retrieval-augmented generation (RAG), fine-tuning, and their combined strategies in Large Language Models (LLMs).
Webinar - Announcing Galileo LLM Studio: A Smarter Way to Build LLM Applications
Learn how to identify and detect LLM hallucinations
LLM Studio helps you develop and evaluate LLM apps in hours instead of days.
Learn about different types of LLM evaluation metrics needed for generative applications
A survey of hallucination detection techniques
The creation of human-like text with Natural Language Generation (NLG) has improved recently because of advancements in Transformer-based language models. This has made the text produced by NLG helpful for creating summaries, generating dialogue, or transforming data into text. However, there is a problem: these deep learning systems sometimes make up or "hallucinate" text that was not intended, which can lead to worse performance and disappoint users in real-world situations.
Galileo LLM Studio enables Pinecone users to identify and visualize the right context to add, powered by evaluation metrics such as the hallucination score, so you can power your LLM apps with the right context while engineering your prompts or running your LLMs in production
The Data Error Potential (DEP) is a 0 to 1 score that provides a tool to very quickly sort and bubble up data that is most difficult and worthwhile to explore when digging into your model’s errors. Since DEP is task agnostic, it provides a strong metric to guide exploration of model failure modes.
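As a usage illustration only (the samples and scores below are invented, not Galileo output), sorting by a DEP-like score is how you would bubble the hardest, most worthwhile-to-inspect records to the top:

```python
# Illustrative only: pretend each training sample already carries a DEP score in [0, 1].
samples = [
    {"text": "order #123 never arrived", "label": "shipping", "dep": 0.91},
    {"text": "how do I reset my password", "label": "account", "dep": 0.12},
    {"text": "the app crashes on login", "label": "billing", "dep": 0.87},  # likely mislabeled
    {"text": "cancel my subscription", "label": "billing", "dep": 0.33},
]

# Bubble up the most error-prone, worthwhile-to-inspect samples first.
for sample in sorted(samples, key=lambda s: s["dep"], reverse=True)[:2]:
    print(f"DEP={sample['dep']:.2f}  label={sample['label']!r}  text={sample['text']!r}")
```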
Galileo integrates deeply with Label Studio to help data scientists debug and fix their training data 10x faster.
Using Galileo you can surface labeling errors and model errors on the most popular dataset in computer vision. Explore the various error types and simple visualization tools to find troublesome data points.
Unpack the findings of our State of Machine Learning Data Quality Report. We have surveyed 500 experienced data professionals to learn what types of data they work with, what data errors they encounter, and what technologies they use.
Learn how to instantly resolve data errors using Galileo. Galileo Machine Learning Data Quality Intelligence enables ML practitioners to do exactly that.
HuggingFace has proved to be one of the leading hubs for NLP-based models and datasets powering so many applications today. But in the case of NER, as with any other NLP task, the quality of your data can impact how well (or poorly) your models perform during training and post-production.
One neglected aspect of building high-quality models is that it depends on one crucial ingredient: high-quality data. Obtaining good quality data is the most significant impediment to seamless ML adoption across the enterprise.
Putting a high-quality Machine Learning (ML) model into production can take weeks, months, or even quarters. Learn how ML teams are now working to solve these bottlenecks.
When working on machine learning (ML) projects, the challenges are usually centered around datasets in both the pre-training and post-training phases, with their own respective issues that need to be addressed. Learn about different ML data blind spots and how to address these issues.
At Galileo, we had a simple goal: enable machine learning engineers to easily and quickly surface critical issues in their datasets. This data-centric approach to model development made sense to us but came with a unique challenge that other model-centric ML tools did not face: data is big. Really big. While other tracking tools were logging metadata such as hyperparameters, weights, and biases, we were logging embedding and probability vectors, sometimes in the near thousand-dimension space, and sometimes many per input.
Machine Learning is advancing quickly but what is changing? Learn what the state of ML is today, what being data-centric means, and what the future of ML is turning into.
In this article, Galileo founding engineer Nikita Demir discusses common data errors that NLP teams run into, and how Galileo helps fix these errors in minutes, with a few lines of code.
Build better models, faster, with better data. We will dive into what ML data intelligence is and its 5 principles you can use today.
Data is critical for ML. But it wasn't always this way. Learn how focusing on ML data quality became the central focus for the best ML teams today.
We used Galileo on the popular MIT dataset with an NER task to find data errors fast, fix them, and get meaningful gains within minutes, and we made the fixed dataset available for use.
In this post, we discuss the Named Entity Recognition (NER) task, why it is an important component of various NLP pipelines, and why it is particularly challenging to improve NER models.
We used Galileo on the popular Newsgroups dataset to find data errors fast, fix them, and get meaningful gains within minutes, and we made the fixed dataset available publicly for use.
“The data I work with is always clean, error-free, with no hidden biases,” said no one who has ever worked on training and productionizing ML models. Learn what ML data intelligence is and how Galileo can help with your unstructured data.