AI Engineering · May 2026 · 11 min read

How We Integrate Azure AI Foundry into a .NET Application

A practical, production-grade guide to integrating Azure AI Foundry (Azure OpenAI) into .NET with Terraform, managed identity, structured JSON output, retry strategy, and graceful degradation.

Drop Table Team

This deep dive documents the exact integration pattern we use in production to connect Azure AI Foundry (Azure OpenAI) to a .NET application. It covers infrastructure, authentication, runtime configuration, API calling patterns, prompt and JSON handling, retries, and fault tolerance.

What Is Azure AI Foundry and Azure OpenAI?

Azure AI Foundry is Microsoft's managed platform for building and operating AI-enabled applications. For GPT model inference, the key resource is an Azure OpenAI account on which you create a named model deployment and call it over HTTPS.

Because this runs inside your Azure tenancy, you keep control over identity, access, network posture, and compliance boundaries while still using frontier model capabilities.

Infrastructure Pattern: Two Cognitive Resources

We provision two Azure Cognitive Services accounts with Terraform:

  • An OpenAI account used for chat completions and model inference
  • An AIServices account used for broader platform features in Foundry workflows

resource "azurerm_cognitive_account" "openai" {
  name                  = "aoai-myapp-prod"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  kind                  = "OpenAI"
  sku_name              = "S0"
  custom_subdomain_name = "aoai-myapp-prod"
}

resource "azurerm_cognitive_account" "ai_services" {
  name                = "ai-myapp-prod"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  kind                = "AIServices"
  sku_name            = "S0"
}

The OpenAI account must include a custom subdomain so the endpoint takes the form https://<subdomain>.openai.azure.com. This endpoint form is required for Entra ID bearer-token authentication.

Important

Use an OpenAI cognitive account for inference traffic. Use AIServices for broader Foundry capabilities. Keep these concerns separate so ownership and RBAC stay clear.

Authentication Pattern: Managed Identity, No API Keys

We avoid static secrets entirely. A user-assigned managed identity is attached to the workload and granted Cognitive Services OpenAI User on the OpenAI account scope.

resource "azurerm_user_assigned_identity" "web" {
  name                = "id-web-myapp-prod"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}

resource "azurerm_role_assignment" "web_foundry_user" {
  scope                = azurerm_cognitive_account.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_user_assigned_identity.web.principal_id
}

In the container runtime we pass the managed identity client ID through AZURE_CLIENT_ID:

env {
  name  = "AZURE_CLIENT_ID"
  value = azurerm_user_assigned_identity.web.client_id
}

The identity client ID is injected as AZURE_CLIENT_ID, and DefaultAzureCredential resolves it at runtime:

private DefaultAzureCredential BuildCredential()
{
    var managedIdentityClientId = Environment.GetEnvironmentVariable("AZURE_CLIENT_ID");

    return new DefaultAzureCredential(new DefaultAzureCredentialOptions
    {
        ManagedIdentityClientId = managedIdentityClientId
    });
}

Locally, the same code path falls back to developer identity sources (Azure CLI / IDE sign-in), so there is no environment-specific authentication branch in application code.

Configuration Pattern: Environment Variables to Options

The endpoint, deployment name, and API version are configured as environment variables in the runtime host and bound to a typed options class.

env {
  name  = "MyApplication__FoundryEndpoint"
  value = "https://${azurerm_cognitive_account.openai.custom_subdomain_name}.openai.azure.com/"
}
env {
  name  = "MyApplication__ModelDeploymentName"
  value = var.foundry_model_deployment_name
}
env {
  name  = "MyApplication__ApiVersion"
  value = var.foundry_api_version
}

The options set includes the endpoint URL, deployment name, API version, retry settings, and maximum prompt length. Binding these values through a typed options class keeps the integration explicit and environment-portable.

public sealed class MyApplicationOptions
{
    public const string SectionName = "MyApplication";

    public string FoundryEndpoint { get; set; } = string.Empty;
    public string ModelDeploymentName { get; set; } = string.Empty;
    public string ApiVersion { get; set; } = "2024-10-21";
    public int FoundryRetryMaxAttempts { get; set; } = 4;
    public int FoundryRetryBaseDelaySeconds { get; set; } = 2;
    public int FoundryRetryMaxDelaySeconds { get; set; } = 20;
    public int MaxPromptCharacters { get; set; } = 80000;
}
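
Registration is one options binding at startup. A minimal sketch, assuming a standard ASP.NET Core host builder (the endpoint validation rule is our own addition):

// Program.cs: bind the "MyApplication" section to the typed options class.
// Environment variables such as MyApplication__FoundryEndpoint map to section
// keys through the standard double-underscore separator.
builder.Services
    .AddOptions<MyApplicationOptions>()
    .BindConfiguration(MyApplicationOptions.SectionName)
    .Validate(o => Uri.TryCreate(o.FoundryEndpoint, UriKind.Absolute, out _),
        "FoundryEndpoint must be an absolute URI.")
    .ValidateOnStart();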

API Pattern: Direct HttpClient to Azure OpenAI REST

We call the REST endpoint directly using HttpClient. This keeps request and response contracts explicit and easy to inspect.

Why we did not use the SDK here

This is an intentional design choice, not an omission. We needed preview API features that were available on the REST surface before they were available in the .NET SDK. Using HttpClient let us adopt those capabilities immediately while keeping authentication and retry behavior under our control.

If your required features are fully supported in your target SDK version, using the SDK is a strong option for long-term ergonomics. In our case, direct REST calls were the most practical way to move quickly without blocking on client-library release cadence.
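
For comparison, the SDK route is compact. A minimal sketch, assuming the Azure.AI.OpenAI 2.x package (endpoint and deployment name are illustrative), not the code we run in production:

using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;

// The SDK handles token acquisition, retries, and response parsing internally.
var azureClient = new AzureOpenAIClient(
    new Uri("https://aoai-myapp-prod.openai.azure.com/"),
    new DefaultAzureCredential());

ChatClient chatClient = azureClient.GetChatClient("my-deployment-name");

ChatCompletion completion = await chatClient.CompleteChatAsync(
    new SystemChatMessage("You extract information from text. Return strict JSON."),
    new UserChatMessage("Text: ..."));

Console.WriteLine(completion.Content[0].Text);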

The direct REST flow has four steps:

  1. Acquire a bearer token for https://cognitiveservices.azure.com/.default
  2. Build the deployment URL from endpoint + deployment name + API version
  3. Send the chat completions payload as JSON
  4. Read choices[0].message.content from the response

private async Task<string> ExecuteFoundryChatRequestAsync(object payload, CancellationToken cancellationToken)
{
  // Acquire an Entra ID bearer token for the Cognitive Services scope.
  var token = await BuildCredential().GetTokenAsync(
    new TokenRequestContext(["https://cognitiveservices.azure.com/.default"]),
    cancellationToken);

  var client = _httpClientFactory.CreateClient();
  client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", token.Token);

  // Guard against misconfiguration: bearer tokens must never travel over plain HTTP.
  var endpoint = new Uri(_options.FoundryEndpoint.Trim());
  if (!string.Equals(endpoint.Scheme, "https", StringComparison.OrdinalIgnoreCase))
    throw new InvalidOperationException("FoundryEndpoint must use HTTPS.");

  // {endpoint}/openai/deployments/{deployment}/chat/completions?api-version={version}
  var requestUri =
    $"{endpoint.AbsoluteUri.TrimEnd('/')}" +
    $"/openai/deployments/{Uri.EscapeDataString(_options.ModelDeploymentName)}" +
    $"/chat/completions?api-version={Uri.EscapeDataString(_options.ApiVersion)}";

  using var response = await client.PostAsJsonAsync(requestUri, payload, cancellationToken);
  var rawBody = await response.Content.ReadAsStringAsync(cancellationToken);

  response.EnsureSuccessStatusCode();

  // The completion text lives at choices[0].message.content.
  using var responseJson = JsonDocument.Parse(rawBody);
  return responseJson.RootElement
    .GetProperty("choices")[0]
    .GetProperty("message")
    .GetProperty("content")
    .GetString()!;
}

At runtime we enforce HTTPS on the configured endpoint and read the completion from the choices[0].message.content path, which keeps the read model consistent across extraction scenarios.
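
Putting the pieces together, the call site composes prompt limiting, payload construction, the REST call, and parsing. A rough sketch of that shape; BuildExtractionPayload and ParseExtractionResult are illustrative stand-ins for the payload and parsing code shown in the next sections:

private async Task<AiResult> ExtractDataAsync(string text, CancellationToken cancellationToken)
{
    // Cap input length before prompt assembly (see Prompt Safety below).
    var limitedText = LimitTextForPrompt(text);

    // Build the strict-JSON chat payload (see Structured Output below).
    var payload = BuildExtractionPayload(limitedText);

    // Authenticated, retried REST call returning choices[0].message.content.
    var content = await ExecuteFoundryChatRequestAsync(payload, cancellationToken);

    // Defensive deserialization into the domain result.
    return ParseExtractionResult(content);
}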

Structured Output: Force Strict JSON

For deterministic downstream parsing, prompts demand strict JSON and include schema expectations in the system message. We also set response_format to json_object.

var payload = new
{
  messages = new object[]
  {
    new
    {
      role = "system",
      content = "You extract information from text. Return strict JSON with exactly: {\"name\":\"string\",\"tags\":[\"tag1\"],\"score\":1}."
    },
    new { role = "user", content = $"Text:\n{inputText}" }
  },
  response_format = new { type = "json_object" },
  max_completion_tokens = 2000
};

On deserialization, we parse defensively with per-property fallbacks to avoid brittle failures when optional fields are missing.

var name = root.TryGetProperty("name", out var el) && el.ValueKind == JsonValueKind.String
  ? el.GetString() ?? fallback
  : fallback;
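
Expanded into a self-contained helper, the defensive pattern looks like this sketch (illustrative names; the tuple shape mirrors the schema in the system prompt above):

private static (string Name, List<string> Tags, int Score) ParseExtractionFields(
    string json, string fallbackName = "unknown")
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;

    // Each read is type-checked with a fallback so missing or mistyped
    // fields degrade gracefully instead of throwing.
    var name = root.TryGetProperty("name", out var nameEl)
        && nameEl.ValueKind == JsonValueKind.String
            ? nameEl.GetString() ?? fallbackName
            : fallbackName;

    var tags = new List<string>();
    if (root.TryGetProperty("tags", out var tagsEl) && tagsEl.ValueKind == JsonValueKind.Array)
        foreach (var tag in tagsEl.EnumerateArray())
            if (tag.ValueKind == JsonValueKind.String)
                tags.Add(tag.GetString()!);

    var score = root.TryGetProperty("score", out var scoreEl)
        && scoreEl.ValueKind == JsonValueKind.Number
        && scoreEl.TryGetInt32(out var parsedScore)
            ? parsedScore
            : 0;

    return (name, tags, score);
}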

Resilience: Retry Transient Failures with Backoff

We retry 429, 502, 503, and 504 responses with exponential backoff and jitter, while honoring Retry-After when present.

for (var attempt = 1; attempt <= maxAttempts; attempt++)
{
  using var response = await client.PostAsJsonAsync(requestUri, payload, cancellationToken);

  if (response.IsSuccessStatusCode)
  {
    // parse and return
  }

  var isRetryable = response.StatusCode is
    HttpStatusCode.TooManyRequests or
    HttpStatusCode.BadGateway or
    HttpStatusCode.ServiceUnavailable or
    HttpStatusCode.GatewayTimeout;

  if (isRetryable && attempt < maxAttempts)
  {
    var delay = ComputeRetryDelay(response, attempt);
    await Task.Delay(delay, cancellationToken);
    continue;
  }

  throw new InvalidOperationException($"Foundry request failed: {response.StatusCode}");
}

Our defaults (4 max attempts, 2-second base, 20-second cap) strike a practical balance between user latency and transient fault recovery.

private TimeSpan ComputeRetryDelay(HttpResponseMessage response, int attempt)
{
  if (response.Headers.RetryAfter?.Delta is TimeSpan delta)
    return Cap(delta);

  if (response.Headers.RetryAfter?.Date is DateTimeOffset date)
    return Cap(date - DateTimeOffset.UtcNow);

  var seconds = _options.FoundryRetryBaseDelaySeconds * Math.Pow(2, attempt - 1)
                + Random.Shared.NextDouble() * 0.75;

  return Cap(TimeSpan.FromSeconds(seconds));

  TimeSpan Cap(TimeSpan t) =>
    t > TimeSpan.FromSeconds(_options.FoundryRetryMaxDelaySeconds)
      ? TimeSpan.FromSeconds(_options.FoundryRetryMaxDelaySeconds)
      : t < TimeSpan.Zero ? TimeSpan.Zero : t;
}

Prompt Safety: Cap Input Length

Long documents are truncated before prompt assembly to prevent context overflow and runaway token cost. Truncation is logged for observability.

private string LimitTextForPrompt(string text)
{
  var normalized = text.Trim();
  var max = Math.Max(_options.MaxPromptCharacters, 4000);

  if (normalized.Length <= max)
    return normalized;

  _logger.LogInformation(
    "Truncating input for Foundry prompt from {Original} to {Max} characters.",
    normalized.Length,
    max);

  return normalized[..max];
}

Fault Tolerance: Graceful Degradation

AI processing is treated as best-effort, not a hard dependency. If extraction fails, we persist the record with a manual-review status instead of blocking the business workflow.

try
{
  var result = await ExtractDataAsync(text, cancellationToken);
  return result with { ProcessingStatus = "Completed" };
}
catch (Exception ex)
{
  _logger.LogWarning(ex, "AI processing failed; record stored for manual review.");

  return new AiResult
  {
    ProcessingStatus = "NeedsReview",
    ProcessingError = "AI processing failed. Record stored and requires manual review."
  };
}

Production Checklist

Concern             Implementation
Provisioning        OpenAI cognitive account with custom subdomain endpoint
Authentication      User-assigned managed identity + OpenAI RBAC role
Local development   DefaultAzureCredential fallback to developer sign-in
Configuration       Environment variables mapped to typed options
API client          Direct HttpClient with bearer token auth
Output contract     response_format JSON object + schema in prompt
Retry behavior      Exponential backoff, jitter, Retry-After support
Prompt safety       Configurable max prompt characters with logging
Failure mode        Graceful degradation to NeedsReview state

If you want to replicate this architecture, start with identity and endpoint correctness first, then add structured output and resilience controls. Those three layers deliver most of the reliability gains in real production workloads.
