Hi everyone. Thanks for joining us. My name is Hannah Storey, and I'm here with Chuck Tischner, one of our enterprise solutions architects, who's going to talk through some AI model tuning tips with us today. We'll share our contact information at the end of the presentation, so if you have any questions you can reach out to us; we look forward to chatting with you. Chuck, over to you.

Thanks, Hannah. So today I'm going to show you how to tune your Fabric semantic models to make them more efficient, more accurate, and less expensive when they're being consumed from a Fabric data agent. There are some really cool tuning steps you can apply that will make your agent much better and more accurate. The best part is that it's actually less expensive, because the agent uses fewer tokens.

What we see right now is a lakehouse. It's a very simple sample lakehouse: it has one fact, which is sales, and the sales are filtered by customer, date, product type, and region. So it's a really simple model, just simulated data for demonstration purposes. What we're going to do is take this into a semantic model. We're going to tune one semantic model with the tuning tips I'm going to talk you through, leave the other semantic model untuned, and then see what kind of results we get from the agents built on each. It's standard stuff that you see here. If you click on the fact table, you'll see the standard information associated with facts: product, customer, date, quantity, unit price, sales amount. The column names are the typical kind you'll see out in the wild, in real production systems. They're not spelled out nicely; they're names like PRDKEY and CUSTKEY, when it would be better to have Product, Customer, and so on.
But this is the type of data you're going to see in the real world, and it's why you have to tune your models to make them more usable, on a semantic model basis, for your data agents. The tuning really makes a significant difference. This is simulated real-world data, and I'm going to show you the kind of results you get from your agents when you tune and when you don't.

Here's the first agent I'm going to show you, which is the untuned data agent. Just select this one here: Untuned Sales Agent. This is one where I created a semantic model directly from the lakehouse and didn't apply any tuning to it. The first thing we see is that when we ask, "Hey, what were the total sales in February by region?", we get the exact same number for every region. That's totally incorrect. As it turns out, 760,241 is actually the total of all sales across all regions. So right out of the gate, it's getting the wrong answer. This is probably because it can't interpret region correctly, and maybe can't interpret date correctly, because we didn't do any tuning.

If you go over to the tuned sales agent and run the same query, "What were the total sales for February by region?", you get the correct results, and each one is different. We can see that the North region had the most. Then if we come over here and ask, "How many orders did we have in February?", we get the correct amount, which is 400. If we go back to the untuned sales agent, it says 500 orders; that's actually the total number of orders, so that's an incorrect value too. Next we ask, "What region has the highest average order value?" That's an even more complex question, and our tuned model does a great job: it says the East region has the highest average order value. It just answers the question right off.
And then you can ask additional follow-up questions. Over on the untuned agent, it just completely gets it wrong. This is absolutely the wrong answer: it says the South, East, North, and West regions all have the same average order value, which is incorrect. Another question here: "What are the top three products by total sales in January?" It said these three items were all tied, which is incorrect; it says they all have the same sales figure, which is actually the total across all of our sales, so it's the wrong number again. Once more, it gets the wrong answer because it thinks every product has the maximum number of sales. If we come back to our tuned agent, it gets the correct three products for January, and it provides the correct numbers next to them; we see actual, distinct numbers coming out.

So I think you're getting a feel for the untuned system. The agent just doesn't know enough, and because it doesn't know enough about the model or the information in it, it returns incorrect results. How do we tune that? What do we do to get from this untuned sales agent that gets everything wrong to the tuned sales agent that gets everything right? The great thing is that it's really easy. There are just a few rules you have to remember in order to tune your semantic model. Once you've tuned your semantic model, your data agent will return correct results, and it will return them much more quickly; once again, it will use fewer tokens, so it will be faster and cheaper.

Very importantly: always go through a semantic model. Unless you have a very limited use case, you should use a semantic model within the context of your system rather than connecting directly to your lakehouse. Fabric does allow you to define your data agents directly against your lakehouse, but I would typically recommend that folks use a semantic model. You can actually use both if you want to.
If you still want to use the lakehouse and have it be part of the data you put into your data agent, you can do that as well.

Another very important point: keep your model very simple. The more dimensions and facts you have as part of your data agent, the more chances for ambiguity and the more chances for confusion the agent will have. If you keep a very tight model, with a limited number of facts and all the dimension tables connected directly to the facts, not through multiple hierarchical relationships, you're going to get good results and good performance. Try to avoid many-to-many relationships if you can, or any other kind of complex path from your filtering tables into your fact tables.

Then the first thing you want to do is define all your relationships explicitly. I'll give you an example. If we come over to our Fabric workspace and jump into our tuned semantic model, we'll see that we have our relationships defined: one, two, three, four relationships, all set up. Once again, they're all direct to the fact, and they're all one-to-many relationships. Very simple. Make sure you have those all defined.

Also make sure you have your date table identified: mark your date dimension as the date table. That's important, because then the system knows to use it for all date filtering and any date-related queries. In our untuned data model, which is over... okay, it's not visible right now, but those relationships are not defined. It has the exact same tables, but there are no relationships between them.

The next tip is to make sure your columns have clean, clearly understandable, descriptive names. That's very important for the model, and it matters most for the columns you're actually going to be using. Columns that you aren't going to use, like surrogate keys and foreign keys, you can just hide.
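To make that concrete, here's a rough sketch of what the explicit relationships and the date-table marking might look like in TMDL, the Tabular Model Definition Language you can view in Fabric's semantic model editor. The table and column names here (Sales, Customer, DimDate, CustomerKey, and so on) are illustrative stand-ins, not the exact names from the demo model, and the property layout is an approximation of TMDL syntax.

```tmdl
// Illustrative TMDL sketch; names are assumptions, not the demo's.
// One-to-many relationships straight from each dimension to the fact.
// fromColumn is the many (fact) side, toColumn is the one (dimension) side.
relationship CustomerToSales
	fromColumn: Sales.CustomerKey
	toColumn: Customer.CustomerKey

relationship DateToSales
	fromColumn: Sales.DateKey
	toColumn: DimDate.DateKey

// Marking the date table: the table is categorized as Time,
// and its date column is flagged as the key.
table DimDate
	dataCategory: Time

	column Date
		dataType: dateTime
		isKey
```

The point of the sketch is the shape: every dimension filters the fact through one direct, one-to-many hop, with no many-to-many or multi-hop paths for the agent to get lost in.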
And that's another tip: hide the columns that aren't going to be used by the agent. The agent is going to be using DAX to access these things, so you don't want it trying to do an explicit join on key columns. Hide all of those columns that won't be used, and give the rest of the columns really nice, descriptive names.

Here we've got Customer Name, and I gave it a description, which I'm highlighting here: "Name of the customer associated with the transaction." Customer Segment has "Customer classification: C = Consumer, CORP = Corporate, SMB = Small Business," so I even tell it what codes are in there. That's a lot of really good information you can add to the semantic model for the data agent to consume and understand the model better. Over here we've got Order ID, Quantity, Sales Amount, Unit Price. Those weren't the original names, but we renamed them to give them nice names. We even gave Unit Price a tip: "Price per single unit of product before quantity is applied; use with Quantity to calculate Sales Amount." Those are nice instructions for the agent to use. This is very helpful; it's the difference between successful, performant, cost-effective operation of your agent and an agent that doesn't know what's going on and gives you bad results.

What's the next tip? We talked about the column names and the column descriptions, so let me show you very quickly how you actually add a column description. Just click on the particular column; I'm coming over here to the Quantity column. You can come over here and add your information: the description for Quantity, formatting information, and you can even put it in a display folder. In some cases it makes sense to use display folders so that different types of calculations are organized into different folders.
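In TMDL terms, those renamed columns and their descriptions show up as properties on each column; descriptions are the triple-slash lines above each object. Here's a sketch, with the description text paraphrased from the demo; treat the exact table layout and property names as an approximation rather than the demo model's actual definition.

```tmdl
// Illustrative sketch; /// lines become the object's description.
table Customer
	/// Name of the customer associated with the transaction.
	column 'Customer Name'
		dataType: string

	/// Customer classification: C = Consumer, CORP = Corporate,
	/// SMB = Small Business.
	column 'Customer Segment'
		dataType: string

table Sales
	/// Price per single unit of product before quantity is applied;
	/// use with Quantity to calculate Sales Amount.
	column 'Unit Price'
		dataType: decimal
		formatString: $#,0.00
```

Those description strings are exactly the kind of plain-language hints the data agent reads when deciding which column answers a question.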
All of that information is available to you right within the semantic model editor, and you can just come over here and change it. I'm in viewing mode right now; I can switch over to editing mode and then edit to my heart's content, change the name, and all of that information will be reflected and available to your data agent.

Going back to the tips: we talked about adding a data category and a display folder. These are just different ways to indicate what type of information is in a column. For the data category: is this a region? Is this a date? Is this a numeric calculation associated with finance? You can provide that type of information. Display folders then let you organize the information functionally, in terms of the domain functionality within your organization. These are important additional hints that your data agent will use when returning results.

We already talked about hiding superfluous columns. Coming back to the model, we saw that a lot of the columns that were surrogate keys or foreign keys were hidden. You can just click on a column and hide it here with "Hide in report view." These are things you'd typically do for your Power BI reports anyway, and it's a good idea because it helps get rid of any ambiguity the agent might have about how to derive results from this model. It will also drive the agent more toward using the measures defined over here, as opposed to trying to do its own joins and its own more complex operations.

Which brings us to the next tip: you should define measures, and you should give all of your measures descriptions. A best practice is to put all your measures in a dedicated measure table. That's not strictly required, but it does make it easier for the AI to find the measures and understand where they are.
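Putting the hiding and categorizing tips together, the fact and dimension columns might be shaped something like this in TMDL. Again, the column names, folder name, and category choices are illustrative assumptions, not the demo model's actual values.

```tmdl
// Illustrative sketch: keys hidden, visible columns categorized.
table Sales
	// Surrogate/foreign key: hide it so the agent leans on
	// relationships and measures instead of attempting joins.
	column CustomerKey
		dataType: int64
		isHidden

	// Visible business column, grouped into a functional folder.
	column 'Sales Amount'
		dataType: decimal
		formatString: $#,0.00
		displayFolder: Revenue

table Region
	// dataCategory tells consumers this is geographic data.
	column Country
		dataType: string
		dataCategory: Country
```

The pattern to take away: everything the agent should not reason over gets `isHidden`, and everything it should reason over carries a category or folder that says what kind of data it is.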
So we have a measures table defined in our model; we're calling it Key Metrics. In it, we have the different calculations that we think users, and the agents acting for them, are typically going to ask about. This is a sales model, so folks are always going to be asking about total sales: "Hey, what were my total sales last month? What were my total sales last month compared to last year?" They're also typically going to ask about orders, and about sales over time: how were my sales this year versus last? Average Order Value is a more complex calculation; I wanted to put that in there to show that you can add more complex measures if you think your users are going to ask those questions, or if you're finding that users ask them a lot and the agent isn't returning a consistent value. You can put that logic in here as a measure. That's a way to tune your model over time: if you find that folks are asking a lot of questions about certain things but getting inconsistent results because the queries are too complex or too ambiguous, you can add measures to help. The Fabric data agent will then go to the measure to get that information instead of trying to figure it out itself. You can also include descriptive information in your prompt to steer it toward that measure in order to get the correct results.

When you look at all the tips we have here, each one is a small change. They take about five minutes each to apply to your model. It's not a lot of work, but the results are very significant when you combine them all. Once again, you get higher performance, so your agent returns results more quickly. You get more accurate results, with less ambiguity and fewer hallucinations, because your agent is able to find the right answer easily with all this descriptive information.
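As a sketch, a Key Metrics measure table along the lines of the one in the demo could be defined like this in TMDL, with DAX expressions for each measure. The DAX shown here is my reconstruction of the kinds of measures described, not the demo's actual code, and it assumes Sales columns named Sales Amount and Order ID.

```tmdl
// Illustrative sketch of a dedicated measure table (names assumed).
table 'Key Metrics'
	/// Total sales amount across all orders.
	measure 'Total Sales' = SUM(Sales[Sales Amount])
		formatString: $#,0

	/// Number of distinct orders placed.
	measure 'Order Count' = DISTINCTCOUNT(Sales[Order ID])

	/// Average revenue per order; answers questions like
	/// "which region has the highest average order value?".
	measure 'Average Order Value' = DIVIDE([Total Sales], [Order Count])
		formatString: $#,0.00
```

Note how Average Order Value is built from the other two measures with DIVIDE, which handles a zero order count safely; encoding that ratio once here is exactly what keeps the agent from improvising its own, inconsistent version of the calculation.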
The best thing about it: not only is it faster and more accurate, it's also less expensive, because the agent uses fewer tokens to get the result. Combining all of these together really makes a significant difference, so do follow these tuning tips when you're putting together your semantic models, especially if they're going to be consumed by Fabric data agents. If you need any additional information, please feel free to contact Hannah or myself. We're happy to help you tune your models and make them as efficient as they can be when used with Power BI or Fabric data agents. Thanks for everything, and we'll talk to you soon.