🧠 LLMs As Sensors

Why OrKa 0.9.10 Wraps GenAI Inside Deterministic Systems

I will start bluntly.

I like generative AI. I use it every day. I build around it. But I do not trust it to own the outcome of a system.

For me, GenAI is a fantastic tool for two things:

  • Generating content
  • Analyzing context

That is already huge. But it is still just one tool in a bigger machine.

What worries me is how often I see people trying to bend the model into being the whole product.

“Just send a giant prompt, get an answer, ship it.”

It works for demos. It does not scale to real systems that need reliability, reproducibility, or any kind of serious accountability.

This article is about that gap.

  • Why LLMs should be treated as probabilistic sensors, not entire applications
  • Why their outputs must be wrapped into real objects and fed into deterministic algorithms
  • And how this philosophy is shaping the current work I am doing with OrKa v0.9.10, including a routing fix that forces me to hold myself to the same standard I am describing here

I am not trying to hype anything. I am trying to describe how I think modern AI should be wired if we want it to behave like infrastructure instead of roulette.

The uncomfortable truth: LLMs are not your system

Let me restate the rough idea that kicked this off:

AI, especially GenAI, is a great tool for content generation and context analysis. But it is still just a tool.

We need to stop treating it as the whole solution and instead force it to generate outcomes that can feed a bigger system, so those outcomes can be used for deterministic execution of algorithms.

That is the core.

LLMs are:

  • Stochastic
  • Non deterministic
  • Sensitive to prompt phrasing, context ordering, temperature, and even invisible whitespace
  • Very good at pattern matching, fuzzy reasoning, and “filling in the missing piece”

They are not:

  • Reliable finite state machines
  • Formal decision trees
  • Deterministic planners
  • Systems you can audit in a classical sense

And that is fine, as long as you do not pretend otherwise.

Where LLMs shine is exactly where classic systems struggle:

  • Quick approximate reasoning
  • Extracting structure from messy input
  • Mapping unstructured signals into higher level descriptions
  • Acting almost like a “universal fuzzy detector” for patterns

So the question is not

“How do I make the LLM do everything?”

The question is

“How do I use the LLM where it shines, then hand off to deterministic code as soon as possible?”

Think of LLMs as sensors, not brains

The metaphor that keeps coming back in my head is this:

An LLM is a sensor that reads the world of language and returns a noisy, high level interpretation.

Just like:

  • A microphone turns air vibration into a waveform
  • A camera turns photons into pixels
  • An accelerometer turns motion into axes of numbers

An LLM turns sequences of tokens into:

  • Labels
  • Spans of text
  • Explanations
  • Rankings
  • Summaries
  • Structured JSON

The trick is to treat that output as measurement, not as law.

For example:

  • “This voice sounds like a 35 to 45 year old male, 70 percent confidence.”
  • “This message is probably a support ticket about billing.”
  • “This paragraph expresses frustration, particularly toward a teammate.”

Those measurements are incredibly powerful. Before LLMs, many of these tasks required:

  • Custom signal processing
  • Domain specific feature extraction
  • Custom models for each upstream task
  • A lot of time and data

Now you can prototype them in hours.

But once you have that measurement, you should wrap it:

{
  "age_estimate": 38,
  "age_range": [35, 45],
  "confidence": 0.73,
  "source": "audio_segment_023.wav",
  "model": "my_local_model_1.5b"
}

That object is no longer just “LLM output”. It is:

  • A typed entity in your system
  • Something you can log, replay, test, and validate
  • A first class citizen in your deterministic logic

Then the decisions are made by normal code:

if person.age_estimate >= 18:
    enable_feature("adult_profile", person.id)
else:
    enable_feature("underage_profile", person.id)

The “smart” part is upstream. The accountable part is downstream.
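
To make “typed entity” concrete, here is a minimal sketch in Python. The class name, fields, and confidence threshold are illustrative assumptions, not part of any library; the point is that the decision logic only ever sees the validated object, never raw model text.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgeEstimate:
    age_estimate: int
    age_range: tuple[int, int]
    confidence: float
    source: str
    model: str

def decide_profile(measurement: AgeEstimate, min_confidence: float = 0.6) -> str:
    # Deterministic, testable logic: low-confidence readings go to review
    # instead of being silently trusted.
    if measurement.confidence < min_confidence:
        return "needs_human_review"
    return "adult_profile" if measurement.age_estimate >= 18 else "underage_profile"

reading = AgeEstimate(38, (35, 45), 0.73, "audio_segment_023.wav", "my_local_model_1.5b")
assert decide_profile(reading) == "adult_profile"

Because the object is frozen and typed, the same reading always produces the same decision, and you can log or replay it without ever touching the model again.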

A concrete example: detecting aging from audio

Take an example like “detect aging from audio over time”. I like this example a lot because it is exactly the kind of thing that smells “AI-ish” but should be designed as a system, not as a prompt.

A naive approach looks like this:

  1. Send raw audio (or its transcription) to an LLM with a prompt like
    “Analyze this audio and tell me how old the speaker is and how it is changing over months.”
  2. Get back some English explanation.
  3. Show it in a UI. Call it a feature.

That is fragile and impossible to test properly.

A more system-level design:

  1. Signal layer

    • Extract features from the audio over time.
    • Maybe you use some classic DSP, maybe you use a small embedding model.
    • Build a timeline of short samples.
  2. LLM as a sensor

    • For each window, the LLM gets a compressed description of the signal, or even just some textual metadata if you have it.
    • It outputs something compact and structured:
   {
     "timestamp": 1733332500,
     "age_estimate": 39,
     "confidence": 0.68,
     "voice_stability": "slightly_decreasing"
   }
  3. Deterministic aging detector (a short sketch follows below)

    • A standard algorithm (not an LLM) runs on top of these structured records.
    • It can be a simple function, or a time series model, but the key is:
      • The transitions are explicit
      • The thresholds are configurable
      • The logic is not hidden in a prompt
  4. System outcome

    • The system might decide:
      • “We do not detect significant aging over the last 12 months.”
      • Or “We detect a consistent pattern of degradation, trigger an alert.”

You can test this.

You can replay the same input data and verify you get the same decision. You can experiment with different threshold values. You can swap out the LLM with a smaller local model that returns a similar JSON structure.

The LLM is a pluggable sensor. The system is the deterministic pipeline that consumes its readings.
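
For step 3 above, the deterministic aging detector really can be boring code. Here is a minimal sketch with made-up thresholds (the function, field names, and numbers are assumptions for illustration, not a recommended algorithm):

from statistics import mean

def detect_aging(records: list[dict],
                 min_confidence: float = 0.5,
                 min_drift_years: float = 3.0) -> str:
    # Keep only readings the sensor itself was reasonably confident about.
    usable = [r for r in records if r["confidence"] >= min_confidence]
    if len(usable) < 4:
        return "insufficient_data"

    # Compare the average estimate of the first and last quarter of the timeline.
    quarter = max(1, len(usable) // 4)
    early = mean(r["age_estimate"] for r in usable[:quarter])
    late = mean(r["age_estimate"] for r in usable[-quarter:])

    return "aging_detected" if (late - early) >= min_drift_years else "no_significant_aging"

The transitions are explicit, the thresholds are configuration, and nothing here depends on how the readings were produced.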

Why wrapping model output into objects matters

This is the part that seems small but changes everything.

If you let your LLM return “whatever it wants, as long as the text looks good”, your system will always be at the mercy of prompt drift.

If you force your LLM to return objects, and you treat those objects as a contract, you get:

  • A clear boundary between probabilistic and deterministic behavior
  • The ability to version that schema
  • Explicit error handling when the object is malformed or incomplete
  • Real regression tests

Typical pattern:

  1. Prompt the LLM to output strict JSON with an explicit schema.
  2. Validate that JSON in your code.
  3. Log the raw model output and the parsed object.
  4. Use only the parsed object downstream.

In pseudocode:

import json

raw = call_llm(prompt, input_context)        # probabilistic: the sensor reading
parsed = json.loads(raw)                     # fails loudly if the output is not JSON

validate_schema(parsed, AgeEstimateSchema)   # raises if invalid

decision = age_classifier(parsed)            # deterministic: plain code over the object
persist_decision(decision)

If validate_schema fails, that is not “mysterious AI behavior”. It is a normal bug you can see in a log and fix by adjusting the prompt or model.
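
For completeness, validate_schema does not need to be sophisticated. A hand-rolled version can look like the sketch below (the schema contents are an assumption; in practice you might reach for jsonschema or pydantic, but the principle of failing loudly at the boundary is the same):

AgeEstimateSchema = {"age_estimate": int, "confidence": float}

def validate_schema(obj: dict, schema: dict) -> None:
    # Fail loudly so malformed model output shows up as a normal, debuggable error.
    for field, expected_type in schema.items():
        if field not in obj:
            raise ValueError(f"Missing field: {field}")
        if not isinstance(obj[field], expected_type):
            raise TypeError(f"Field {field} should be {expected_type.__name__}, "
                            f"got {type(obj[field]).__name__}")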

And now we can talk about orchestration.

OrKa: building a deterministic spine around probabilistic agents

OrKa exists because I wanted a way to:

  • Compose multiple “sensors” and agents
  • Route between them based on their outputs
  • Keep the execution trace fully visible and replayable
  • Avoid hardcoding everything in application code over and over

In OrKa, I do not think of “a big model that knows everything”.

I think in terms of:

  • Agents that do one thing
  • Service nodes that mutate state or call external systems
  • Routers that decide which agent comes next, based on structured outputs

Everything is described in YAML, so the cognition graph is explicit.

A very simplified OrKa-style flow where an LLM decides which branch to take might look like this:

orchestrator:
  id: audio_aging_flow
  strategy: sequential
  queue: redis

agents:
  - id: audio_to_features
    type: service
    kind: audio_feature_extractor
    next: llm_age_sensor

  - id: llm_age_sensor
    type: llm
    model: local_llm_1
    prompt: |
      You are an age estimation sensor.
      Given these features, output strict JSON:
      {"age_estimate": int, "confidence": float}
    next: age_route

  - id: age_route
    type: router
    routing_key: age_estimate
    routes:
      - condition: "value < 18"
        next: underage_handler
      - condition: "value >= 18"
        next: adult_handler

  - id: underage_handler
    type: service
    kind: profile_flagger

  - id: adult_handler
    type: service
    kind: profile_flagger

The LLM here is just one node (llm_age_sensor). Its output becomes a field (age_estimate) that the router uses in a deterministic way.
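
To show how mechanical that routing step is, here is a sketch of the kind of evaluation a router performs. This is not OrKa's actual implementation, just the shape of the idea, assuming conditions are simple comparisons over a single value:

import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge, "==": operator.eq}

def resolve_route(value, routes):
    # routes mirror the YAML above: [{"condition": "value >= 18", "next": "adult_handler"}, ...]
    for route in routes:
        _, op, threshold = route["condition"].split()
        if OPS[op](value, float(threshold)):
            return route["next"]
    raise ValueError(f"No route matched value {value!r}")

routes = [
    {"condition": "value < 18", "next": "underage_handler"},
    {"condition": "value >= 18", "next": "adult_handler"},
]
assert resolve_route(39, routes) == "adult_handler"

There is no sampling and no temperature in this step.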

If you replay the same input, the router will make the same decision for the same parsed values.

That guarantee is not automatic. It depends on the correctness of routing behavior, which brings me to the latest OrKa release, described at the end of this article.

Why this matters beyond OrKa

You do not have to care about OrKa to care about this pattern.

If you are building any system around generative models, ask yourself a few questions:

  1. Where does the probabilistic behavior end?

    Is there a clear boundary where the LLM output is turned into a typed object and validated? Or does the “magic” just flow deep into your code base?

  2. Who owns the final decision?

    Does the model decide what happens, or does deterministic code decide based on model measurements?

  3. Can you replay a run?

    If a user reports something weird, can you reconstruct the full chain: input → model output → routing → system decision?

  4. What happens if you swap models?

    If you change from a proprietary model to a local one, do you only change the sensor, or do you need to rewrite half the app?

  5. What is the unit of testability?

    Can you test downstream logic with synthetic objects, without involving the LLM at all?
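
On that last question, this is what the unit of testability can look like in practice: a plain pytest-style test over a synthetic object, with no model call anywhere. The age_classifier here is a stand-in for your own downstream logic, not a real library function:

def age_classifier(measurement: dict) -> str:
    return "adult_profile" if measurement["age_estimate"] >= 18 else "underage_profile"

def test_age_classifier_boundary():
    # Synthetic measurements: no LLM, no network, no flakiness.
    assert age_classifier({"age_estimate": 18, "confidence": 0.9}) == "adult_profile"
    assert age_classifier({"age_estimate": 17, "confidence": 0.9}) == "underage_profile"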

My bias is clear:

I want LLMs to be pluggable, swappable, measurable, and constrained.

I want the core of the system to feel boring in a good way.

That is what OrKa is trying to encode at the framework level:

model calls as agents, routing as explicit configuration, memory and traces as first class concepts, all tied together in a way that can be inspected, not guessed.

A small mental shift that changes system design

If I had to compress this article into one mental shift, it would be this:

Stop asking “What can the LLM do?”

Start asking “What kind of object do I need so that my system can behave deterministically, and how can I use an LLM to produce that object?”

Examples:

  • Instead of “write me a reply email”, think “I need an EmailReplyPlan with fields: tone, key_points, call_to_action, and I will let deterministic templates render the final email.”

  • Instead of “decide what to do next for this customer”, think “I need a NextAction object with action_type, priority, and reason, and my orchestration layer will decide which internal systems to call.”

  • Instead of “summarize this call for the CRM”, think “I need a CallSummary object with sentiment, topics, promises_made, follow_up_tasks, and my CRM logic will handle storage and workflows.”

In all of these, the LLM is powerful, but the real system lives around it.
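
As a sketch of the first example, an EmailReplyPlan could be as small as this (names and fields are illustrative; the model fills the plan, deterministic code renders the email):

from dataclasses import dataclass

@dataclass
class EmailReplyPlan:
    tone: str                  # e.g. "neutral" or "apologetic"
    key_points: list[str]
    call_to_action: str

def render_reply(plan: EmailReplyPlan) -> str:
    # Deterministic rendering: the same plan always produces the same email.
    greeting = "Hi," if plan.tone == "neutral" else "Hi, and sorry for the trouble,"
    body = "\n".join(f"- {point}" for point in plan.key_points)
    return f"{greeting}\n\n{body}\n\n{plan.call_to_action}\n"

plan = EmailReplyPlan(tone="apologetic",
                      key_points=["We found the billing error", "A refund is on its way"],
                      call_to_action="Reply to this email if anything still looks off.")
print(render_reply(plan))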

You can inspect those objects. You can aggregate them. You can feed them into analytics and classic algorithms. You can design them once and evolve them over time.

And if you embrace orchestration tools, you can also define how these objects move, which nodes can create or transform them, and under what conditions routing happens.

Closing thoughts

So, to tie the threads:

  • GenAI is great at generating content and reading context. That is not a small thing. It is a massive shift in what we can build in reasonable time.
  • But models are not the system. They are components in the system. Treat them like sensors that emit measurements.
  • Wrap model outputs into strict, typed objects. Validate them. Version them. Use them as the raw material for deterministic logic, not as the final answer.
  • Orchestrate flows so that routing is explicit, traceable, and reproducible. If the routing itself is fuzzy, you just moved the black box one step further.
  • In OrKa v0.9.10, tightening routing behavior was not a cosmetic refactor. It was necessary to keep this philosophy consistent in the framework I am building. If I want OrKa to be a cognitive execution layer, it needs to behave like infrastructure, not like another probabilistic blob around the model.

If you are curious about OrKa, you can read more and follow the roadmap at orkacore.com. I am not claiming it is the answer. It is simply my current attempt to encode this belief in code:

LLMs should feed deterministic systems, not replace them.

If that idea resonates with you, then we are probably trying to solve similar problems, just with different tools.

OrKa v0.9.10: fixing routing is not a cosmetic change

I just cut a release of OrKa v0.9.10, focused on a fix in routing behavior.

I will not pretend this is some huge “launch” moment. It is a pretty boring fix if you look only at the diff. But for the philosophy in this article, it is critical.

What was wrong?

In some edge cases, the router:

  • Would evaluate conditions on slightly stale context, or
  • Could pick a next node that was not the one you would expect from the latest structured output, especially after more complex flows with forks and joins

This is exactly the type of thing that breaks the “LLM as sensor, system as deterministic spine” model.

When your router does not behave deterministically, you get:

  • Non reproducible traces
  • Confusing logs
  • Surprises during replay
  • The feeling that the orchestrator itself is “magical” instead of mechanical

That is the opposite of what OrKa is supposed to be.

So in v0.9.10 I focused on:

  • Making sure routing decisions always use the last committed output of the relevant agent
  • Making context selection explicit, not implicit
  • Tightening the mapping between routing_key and the object field it reads
  • Hardening the trace so that, for a given input plus memory state, the same routing path is taken every time
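
To illustrate the first two points, here is the shape of the idea as a tiny sketch. This is not OrKa source code; it only shows why routing over an explicit, committed snapshot is immune to later mutations of the live context:

from copy import deepcopy

def route_from_snapshot(snapshot: dict, routing_key: str) -> str:
    # The snapshot is frozen at commit time, so routing cannot see later changes.
    value = snapshot[routing_key]
    return "adult_handler" if value >= 18 else "underage_handler"

live_context = {"age_estimate": 39, "confidence": 0.68}
committed = deepcopy(live_context)     # commit point: this is what routing will see

live_context["age_estimate"] = 15      # a later, "stale context" style mutation...
assert route_from_snapshot(committed, "age_estimate") == "adult_handler"  # ...cannot change the decision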

In more human words:

If your LLM says:

{ "route": "adult_handler" }

then OrKa should take that path, and you should be able to see exactly why in the trace.

No surprises. No “the orchestrator is a bit mysterious too”.

Only the LLM is allowed to be fuzzy. The rest must behave like infrastructure.
