Oscillation Detection

The Problem: "Thrashing" and the Cloud Bill of Doom

In a declarative auto-scaling system, you might tell the system: "If CPU is above 50%, add a server. If CPU is below 50%, remove a server."

What happens in reality?

  1. CPU hits 51%. The system adds a server.
  2. Because there is a new server, the average CPU drops to 49%.
  3. The system immediately deletes the server.
  4. The CPU spikes back to 51%.
  5. The system adds a server...

This is an Oscillation Loop. Left alone, the system will boot and destroy cloud servers in an endless cycle, hammering the cloud provider's API and potentially racking up a $100,000 AWS bill over the weekend.
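
To see the loop concretely, here is a minimal Python sketch of the naive 50% rule. It rests on hypothetical simplifications: total load stays constant, load spreads evenly so the average is `total_load / servers`, new servers boot instantly, and the rule runs once per tick. `simulate_naive` and `total_load` are illustrative names, not Ved APIs.

```python
def simulate_naive(total_load: float, servers: int, ticks: int) -> list:
    """Apply the naive 50% rule once per tick and record the fleet size."""
    history = []
    for _ in range(ticks):
        cpu_average = total_load / servers    # assumes load spreads evenly
        if cpu_average > 50:
            servers += 1                      # "If CPU is above 50%, add a server."
        elif cpu_average < 50 and servers > 1:
            servers -= 1                      # "If CPU is below 50%, remove a server."
        history.append(servers)
    return history


# 1275 units of work: 25 servers -> 51% each, 26 servers -> about 49% each.
print(simulate_naive(total_load=1275, servers=25, ticks=6))  # [26, 25, 26, 25, 26, 25]
```

The fleet never settles: every tick undoes the previous one.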

The Code: Hysteresis in Ved

Ved handles oscillation through two layers: Logical Hysteresis (Deadbands) written by the developer, and Algorithmic Backpressure enforced by the runtime.

Here is how a developer writes safe, oscillation-proof logic using a "Deadband" (a safe resting range):

domain AutoScaler {
  state {
    cpu_average: int
    active_servers: int
  }

  // The "Deadband": The system rests anywhere between 40% and 60%
  goal LoadBalanced {
    predicate cpu_average >= 40 && cpu_average <= 60
  }

  transition ScaleUp {
    step {
      // Only scale up if it crosses the UPPER threshold
      if cpu_average > 60 {
        emit Cloud.AddServer()
        active_servers += 1
      }
    }
  }

  transition ScaleDown {
    step {
      // Only scale down if it drops below the LOWER threshold
      if cpu_average < 40 && active_servers > 1 {
        emit Cloud.RemoveServer()
        active_servers -= 1
      }
    }
  }
}
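
For comparison, the two transitions can be mirrored in Python. This is a hypothetical translation, not Ved's execution model: it assumes each tick evaluates both transitions once against the same snapshot (their conditions are mutually exclusive, so order does not matter), and that `cpu_average` equals `total_load / servers` with load spread evenly. `autoscaler_tick` and `simulate` are illustrative names.

```python
def autoscaler_tick(cpu_average: float, active_servers: int) -> int:
    """One tick of the AutoScaler domain; returns the new server count."""
    if cpu_average > 60:                          # ScaleUp: crossed the UPPER threshold
        return active_servers + 1
    if cpu_average < 40 and active_servers > 1:   # ScaleDown: crossed the LOWER threshold
        return active_servers - 1
    return active_servers                         # inside the 40-60 deadband: rest


def simulate(total_load: float, servers: int, ticks: int) -> list:
    """Run the domain for a number of ticks, recording the fleet size after each."""
    history = []
    for _ in range(ticks):
        servers = autoscaler_tick(total_load / servers, servers)
        history.append(servers)
    return history


print(simulate(total_load=1275, servers=25, ticks=6))  # 51% each: [25, 25, 25, 25, 25, 25]
print(simulate(total_load=1800, servers=25, ticks=8))  # 72% spike: [26, 27, 28, 29, 30, 30, 30, 30]
```

A reading of 51% now lands inside the deadband, so nothing happens; a genuine spike still scales out one server per tick until the average is back inside the band.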

How it Executes (The Runtime Safety Net)

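If the developer writes the code correctly (as above), the system safely rests in the 40-60% zone, provided that adding or removing a single server cannot push the average across the opposite threshold. That proviso matters for very small fleets: assuming load spreads evenly, dropping from 2 servers to 1 doubles the per-server load, so a 39% average becomes 78% and immediately triggers ScaleUp again.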

But what if a junior engineer makes a mistake and writes a flawed goal like predicate cpu_average == 50? In practice, a CPU average is almost never exactly 50%, so the goal is never satisfied and the system starts thrashing.

Here is how the Ved Runtime's Algorithmic Oscillation Detection kicks in to save the day:

  1. The Thrash: The flawed code runs. ScaleUp, ScaleDown, ScaleUp, ScaleDown fire in rapid succession across 10 ticks.
  2. The Detection: The Scheduler monitors the Domain's memory graph. It notices that the active_servers variable keeps alternating between 4 and 5 without the Domain ever reaching Quiescence (sleep).
  3. The Backpressure: Instead of letting the loop run forever, the Ved runtime intervenes by applying Algorithmic Backpressure: it throttles the Domain, drastically reducing its execution priority.
  4. The Alert: The runtime emits a structured warning to the telemetry system: V-WRN-042: State Oscillation Detected in AutoScaler. The Domain is put into a "cooldown" state that prevents further API abuse until a human fixes the logic.
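
The runtime's actual detection algorithm is not described here, so the following is only a minimal Python sketch of the idea under simple assumptions. A hypothetical `OscillationDetector` sees the watched variable once per tick, plus a flag saying whether the Domain reached Quiescence, and it flags oscillation when the value has flipped between exactly two states on every one of the last 10 non-quiescent ticks. The factor of 8 is an arbitrary placeholder for "drastically reducing its execution priority".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OscillationDetector:
    """Hypothetical sketch: watches one state variable of one Domain."""
    domain: str
    variable: str
    window: int = 10                  # ticks to inspect before deciding
    priority: float = 1.0             # 1.0 = normal scheduling priority
    cooldown: bool = False            # once set, only a human clears it
    history: deque = field(default_factory=deque)

    def observe(self, value: int, quiescent: bool) -> Optional[str]:
        """Record one tick; return a warning string if oscillation is detected."""
        if quiescent:
            # The Domain went to sleep, so it converged: forget the old history.
            self.history.clear()
            return None
        self.history.append(value)
        if len(self.history) > self.window:
            self.history.popleft()    # keep only the last `window` ticks
        if self.cooldown or len(self.history) < self.window:
            return None
        values = list(self.history)
        flips_every_tick = all(a != b for a, b in zip(values, values[1:]))
        if flips_every_tick and len(set(values)) == 2:
            self.priority /= 8        # backpressure: drastically lower the priority
            self.cooldown = True      # mark the Domain as held in cooldown
            return (f"V-WRN-042: State Oscillation Detected in {self.domain} "
                    f"({self.variable} alternating between {sorted(set(values))})")
        return None


# The flawed AutoScaler flips active_servers between 4 and 5 on every tick.
detector = OscillationDetector(domain="AutoScaler", variable="active_servers")
for tick in range(10):
    warning = detector.observe(4 if tick % 2 == 0 else 5, quiescent=False)
    if warning:
        print(f"tick {tick}: {warning}")
print(detector.cooldown, detector.priority)
```

Requiring a strict two-value flip on every tick keeps false positives low (a fleet that is steadily growing is not flagged), at the cost of missing messier oscillation patterns.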

Behavior

  • State never stabilizes
  • Runtime detects oscillation

Outcome

  • Warning generated
  • System flagged unstable

Why This Matters

1. The Power of the "Deadband"

In imperative scripts, engineers often try to fix thrashing by adding arbitrary sleep(300) commands (waiting 5 minutes between scaling events). This is a sloppy hack that makes the system sluggish to respond to real emergencies. Ved encourages the use of Deadbands (a range of acceptable states). By making the goal a range (40 to 60), the system still reacts immediately to a spike to 70%, but ignores a flutter between 49% and 51%. It models physics rather than just code.
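
The difference can be sketched in Python. This is a toy comparison with hypothetical function names. It assumes one CPU reading per minute (so sleep(300) becomes a 5-tick pause) and treats the trace as fixed input, ignoring how each action would change later readings, to isolate how each policy reacts.

```python
def threshold_with_sleep(readings, cooldown_ticks):
    """Naive 50% rule plus a fixed pause after every action (the sleep(300) hack)."""
    actions, next_eval = [], 0
    for tick, cpu in enumerate(readings):
        if tick < next_eval:
            continue                          # still "sleeping": the reading is ignored
        if cpu > 50:
            actions.append((tick, "add"))
        elif cpu < 50:
            actions.append((tick, "remove"))
        else:
            continue                          # exactly 50%: no action, no pause
        next_eval = tick + cooldown_ticks     # pause before the next decision
    return actions


def deadband(readings, low=40, high=60):
    """Act only when a reading leaves the [low, high] band; no pause needed."""
    actions = []
    for tick, cpu in enumerate(readings):
        if cpu > high:
            actions.append((tick, "add"))
        elif cpu < low:
            actions.append((tick, "remove"))
    return actions


trace = [49, 51, 49, 51, 49, 51, 70, 55]   # one reading per minute; real spike at minute 6
print(threshold_with_sleep(trace, cooldown_ticks=5))  # [(0, 'remove'), (5, 'add')]
print(deadband(trace))                                 # [(6, 'add')]
```

The fixed pause spends both of its actions on noise and is asleep when the real spike arrives at minute 6; the deadband ignores the flutter and reacts on the same tick as the spike.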

2. The Runtime as a "Parent"

In standard languages (like Go or Python), the compiler/runtime assumes the developer is a genius. If the developer writes an infinite loop, the runtime happily executes it until the server bursts into flames. Ved acts like a responsible parent. It watches the execution loop, realizes the developer wrote mathematically unstable logic, and steps in to apply the brakes. It protects the infrastructure from the code.

3. Mathematical Proofs over Testing

Catching an oscillation bug in a staging environment is notoriously difficult because you have to reproduce the exact traffic conditions that cause the flutter. Often, these bugs only show up in production. Because Ved's linter (ved lint) and runtime are built on formal state-machine logic, they can analyze the code structurally, before it ever runs. The linter can look at your Transitions and say: "Warning: Transition A and Transition B are perfectly opposed, and your Goal boundary is too narrow. This has a 99% probability of oscillation."
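
The internals of ved lint are not documented in this section, so the following is only a hypothetical Python sketch of the kind of structural check described, applied to the AutoScaler thresholds. It assumes constant total load spread evenly across servers and one-server scaling steps. Under those assumptions any cycle must turn around at its largest fleet size, so checking each adjacent pair (n, n + 1) is enough: the fleet flips forever when a load T satisfies both T / n > high (ScaleUp fires at n) and T / (n + 1) < low (ScaleDown fires at n + 1). Instead of a probability, this sketch reports exact load intervals; `find_two_cycles` is an illustrative name.

```python
from typing import List, Tuple


def find_two_cycles(low: float, high: float, min_servers: int = 1,
                    max_servers: int = 10) -> List[Tuple[int, float, float]]:
    """Return (n, load_lo, load_hi) for every fleet size n where any constant
    total load strictly between load_lo and load_hi (in "percent of one server")
    makes the fleet flip between n and n + 1 servers forever."""
    cycles = []
    for n in range(min_servers, max_servers):
        # ScaleUp fires at n servers when T / n > high, i.e. T > high * n.
        # ScaleDown fires at n + 1 servers when T / (n + 1) < low, i.e.
        # T < low * (n + 1); its floor guard (n + 1 > min_servers) always holds here.
        load_lo, load_hi = high * n, low * (n + 1)
        if load_lo < load_hi:
            cycles.append((n, load_lo, load_hi))
    return cycles


print(find_two_cycles(low=40, high=60))                 # [(1, 60, 80)]
print(find_two_cycles(low=50, high=50, max_servers=4))  # [(1, 50, 100), (2, 100, 150), (3, 150, 200)]
```

Under these assumptions, the 40-60 deadband is safe for every pair except 1 and 2 servers: with a total load between 60% and 80% of one server, removing the second server doubles the per-server load past 60%. The naive 50/50 rule is unstable at every fleet size.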


Summary

Ved can detect:

  • Non-converging system behavior