Rawer

Under Construction

there are many to-dos

#

Hoard

hyperdata-whiteboard

I got distracted yesterday while sorting out the #:packer #:transmission (for concat'ing repos for LLMs), then in the evening, when I got back to my desk, I was too tired to do anything, so I gave up. Returning to it now, I need a small repo to run it against. As it happens, this morning in bed, a passing thought led me to start another little whiteboard project¹ (with Claude). I'll make that a new repo.

Online, I've created https://github.com/danja/hoard

Locally:

cd ~/github-danny/hyperdata/packages
git submodule add https://github.com/danja/hoard.git hoard
cd hoard
npm init -y

Ok, so now I can try :

cd ~/github-danny/transmissions
./trans packer ../hyperdata/packages/hoard

Ok, it fails as expected...

My head's a little fuzzy and Claudio's pestering. Dogwalk time.

Footnotes

  1. I've been around whiteboards a few times before, the first time of note being an SVG Wiki Whiteboard in 2003.

#

Light

Because it is.

Bear of Very Little Brain

Chatted with ClaudioB earlier. He's planning to set up a machine at home to serve online. I started having a go a few weeks back but got distracted. Setting up #:semem in #:tbox and making it live is my main target today, but as that'll involve poking around on my remote server, I might as well have another look then.

Packer

Before any of that, I need to finish a little #:transmission I started last session. repomix has broken on me. As best I can determine it's down to the Node.js version; it looks like repomix uses some experimental JSON bits that have been pulled. If that is the case, I could use that thing (?) to choose a version that does work. Faff. I've been meaning to make my own on #:transmissions for a while; this tipped me over. Claude gave me some starter code, I just need to plumb it in and tweak it to sanity, as per Claude code.
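
The core job is simple enough. Here's a minimal sketch of the annotated-concat idea in plain Node.js - nothing like the actual Transmissions plumbing, and the skip list and output format are my own assumptions:

// Walk a repo and concat every file, each prefixed with its relative path.
// A toy sketch of the packer idea, not the Transmissions implementation.
import { readdir, readFile, writeFile } from 'node:fs/promises'
import path from 'node:path'

const SKIP = new Set(['node_modules', '.git'])

async function* walk(dir) {
    for (const entry of await readdir(dir, { withFileTypes: true })) {
        const full = path.join(dir, entry.name)
        if (entry.isDirectory()) {
            if (!SKIP.has(entry.name)) yield* walk(full)
        } else {
            yield full
        }
    }
}

async function pack(repoDir, outFile) {
    let out = ''
    for await (const file of walk(repoDir)) {
        out += `\n=== ${path.relative(repoDir, file)} ===\n`
        out += await readFile(file, 'utf8')
    }
    await writeFile(outFile, out)
}

await pack(process.argv[2] ?? './', 'repopack.txt')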

transmissions$ ./trans packer ./
...
[Error: ENOENT: no such file or directory, open '/home/danny/github-danny/transmissions/manifest.ttl']

Bum! I didn't get around to sorting out the command-line options.

Good-o! I did start sorting out the command-line options.

Yeah, it looks reasonably well separated. The CLI entry point ./trans calls src/api/cli/run.js which uses yargs - tee hee, they say it best :

Yargs be a node.js library fer hearties tryin' ter parse optstrings.

src/api/cli/run.js then calls src/api/common/CommandUtils.js. That does a little bit of path-splitting and simple logic, calling on src/core/ApplicationManager.js to get things going.
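
For orientation, the entry point has roughly this shape - a minimal sketch, not the actual run.js; the option names and the hand-off are illustrative assumptions:

// A sketch of the run.js shape (not the actual code) -
// yargs parses the optstring, then hands the positionals on.
import yargs from 'yargs'
import { hideBin } from 'yargs/helpers'

const argv = yargs(hideBin(process.argv))
    .usage('$0 <application> [target]')
    .option('verbose', { alias: 'v', type: 'boolean', describe: 'noisy logging' })
    .demandCommand(1)
    .parse()

// something like CommandUtils would take over from here,
// splitting paths and calling on ApplicationManager
const [application, target = './'] = argv._
console.log(`app: ${application}, target: ${target}`)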

I remember now - I left a bit of refactoring around src/core/ApplicationManager.js in-progress. That's still not a high priority, I'll get to it when I next need to do things around there.

Hmm. The options do look very broken. But shouldn't take much fixing.

Well, you learn something every day. I wanted to check what was considered best practice for privates in JS. I asked Claude, confirmed with Perplexity :

Private Fields in JavaScript

Private fields, denoted by the # prefix, were introduced as part of the ECMAScript 2022 (ES13) specification. This feature provides true privacy encapsulation within classes, preventing access to these fields from outside the class.

class Person {
  #secret;
  constructor(secret) {
    this.#secret = secret;
  }
  getSecret() {
    return this.#secret;
  }
}

const person = new Person("my secret");
console.log(person.getSecret()); // "my secret"
// console.log(person.#secret); // SyntaxError: Private field '#secret' must be declared in an enclosing class

Supported in:

  • Node.js 12.0.0+
  • Chrome 74+
  • Firefox 90+
  • Safari 14.1+
  • Edge 79+

Closures for Encapsulation

Closures have been a traditional way to achieve encapsulation in JavaScript before the introduction of private class fields.

function createCounter() {
  let count = 0;
  return {
    increment: function() {
      count++;
      console.log(count);
    }
  };
}

const counter = createCounter();
counter.increment(); // 1
console.log(counter.count); // undefined

Heh, not sure I've ever used that kind of closure in JS. I can easily imagine my confusion over that syntax. I wouldn't call the private field # syntax pretty, but it's a lot more suitable for this bear of very little brain.

#

Semem Fix & Deploy

I made a todo list here yesterday without any explanation. Didn't publish. An invisible todo list is absolutely not what I want on my blog front page! My top priority this week is doc organisation, so I'll let myself off this time. I've changed the date on the file; this is it, and I'll now fill it out...

A little job for this morning is an explore of Semem.

It appears to (mostly) work, here's the console log of a run.

But, what is it doing?

What I want right now, as a bit of scaffolding for what I want to explore, is a nodejs version of memoripy that uses a (remote) SPARQL store as (part of) its knowledgebase.

I only tried running it after many hours implementing it in a Claude project, then had a couple of hours manually tweaking to get it to (mostly) run. It's definitely interacting with Ollama for embeddings & chat and a SPARQL endpoint for persistence. I don't see the extracted concepts in the SPARQL store.
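
For orientation, the persistence step boils down to something like this - a minimal sketch, not Semem's actual code; the endpoint and graph URIs are the ones from the run log, and saveInteraction is a hypothetical helper:

// Persist one interaction as an mcp:Interaction via SPARQL UPDATE.
// A sketch only - endpoint & graph from the run log, the rest assumed.
const ENDPOINT = 'http://localhost:4030/test/update'
const GRAPH = 'http://example.org/mcp/memory'

async function saveInteraction({ id, prompt, output }) {
    const update = `
        PREFIX mcp: <http://purl.org/stuff/mcp/>
        INSERT DATA { GRAPH <${GRAPH}> {
            <urn:mcp:${id}> a mcp:Interaction ;
                mcp:id "${id}" ;
                mcp:prompt """${prompt}""" ;
                mcp:output """${output}""" ;
                mcp:timestamp "${new Date().toISOString()}" .
        } }`
    const res = await fetch(ENDPOINT, {
        method: 'POST',
        headers: { 'Content-Type': 'application/sparql-update' },
        body: update
    })
    if (!res.ok) throw new Error(`UPDATE failed: ${res.status} ${res.statusText}`)
}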

Last night I had a short session with Claude, aiming to document what we had so far. I've put the artifacts over here : https://github.com/danja/semem/tree/main/docs/description_2025-01-01.md
It sounds marvellous, but how much is true?

Banebane

I had a chat with ClaudioB yesterday morning, pointed him to my notes so far on #:semem. He asked me if I had a live instance he could play with. A very good question! Not for the first time I'll quote -

Connolly's Bane

"The bane of my existence is doing things that I know the computer could do for me" \– Dan Connolly

Which reminded me I've not been in touch with Graham Klyne for many years. A blog post of his drew my attention to that quote. Good-o, he's on Mastodon : @gklyne@indieweb.social, I'll ping him there.

ClaudioB's question is the kind of message I want my #:hyperdata systems to be able to deal with automatically.

Here's how this human agent is handling it :


[prompt]

Is there a live instance of Semem?

[me]

  • immediate response, "not yet"
  • reformulate question in terms of concrete requirements ->
  1. (functioning, good-enough) Semem
  2. hosting
  3. deployment infrastructure

For 1 : all I really need to do right away is minor bugfixing and whatever's needed to use existing online LLMs.
For 2 : I've got a live Ubuntu server (which you are looking at now) and a domain name registered for a related plan, some nginx proxying already in place.
For 3 : I've got a Docker container in-progress for this stuff, tbox.

  • meta-reasoning - evaluate this against existing dev paths, consider priorities, time needed etc

On this point there's a green flag to go ahead, it's near-enough in line with what I was planning to do already, slight rearrangement of immediate priorities.


I acted in several roles there. Humans are good at this. I want a community of independent autonomous intelligent agents (on #:kia).

Oops. Azzurra just reminded me that I've an appointment with her up at the comune at 15:00. It's now 13:12. Clever Azzurra has me sussed. I'd completely forgotten. But she's given me just the right amount of time to pop this online, take Claudio out for a short walk, find some clean clothes & have a wash.

Soon...

Somewhere around my vocabs I need terms for this particular kind of input. I may already have something like Feature Request around the project management-oriented bits, but it totally ties in with agent-oriented messaging.

  • #:todo fix #:semem failing on UPDATE
  • #:todo tease next steps from semem/docs/description_2024-12-30/

I sorted out the UPDATE.

  • #:todo deploy to #:tbox

  • #:todo #:spike for function/action calling a la ReAct

  • #:todo some FOAF

Installed 'Markdown Preview Mermaid Support' into VSCode

  • semem - query execution times, store metrics, total embeddings size
  • support Groq, Mistral, Claude, OpenAI
  • refactor
  • comments
  • install the willison-mentioned UI tool - has useful bits for APIs?
  • ditto LibreChat
  • LOGGER - see Perplexity

#

node src/SPARQLExample.js

[INFO] Initialized FAISS index with dimension 1536
[INFO] Loaded 1 short-term and 0 long-term memories from store http://localhost:4030/test/query graph <http://example.org/mcp/memory>
[INFO] Adding interaction: 'What's the current state of the Semantic Web?'
[INFO] Memory initialized with 1 short-term and 0 long-term memories
[INFO] Extracting concepts...
[INFO] Extracted concepts: [ 'Semantic Web technologies', 'AI' ]
[INFO] Retrieving relevant interactions...
Response: Semantic Web technologies and Artificial Intelligence (AI) complement each other well, as they are both powerful tools for representing complex data in structured form, facilitating knowledge discovery and enhancing human-computer interaction. Here are some ways in which semantic Web technologies can be integrated with AI:

1. **Knowledge Extraction from Semantically-Structured Data**: Semantic Web technologies provide a standardized way of defining and storing the structured information necessary to build semantic web applications. The use of these tools allows for the extraction of knowledge about large amounts of data, making it easier for AI systems to process this information.

2. **Artificial Intelligence as a Knowledge Broker**: AI can be used to curate and organize vast amounts of data in semantic Web formats. This includes tasks such as categorizing documents, recognizing patterns and relationships within structured data sets, and integrating heterogeneous data sources into a single knowledge base.

3. **Interactive Experience Enhancement through AI-Driven Semantic Web Applications**: AI-driven applications can provide a more personalized user experience by using techniques like recommendation systems to suggest content based on the user's interests or previous behavior. These systems also use semantic Web technologies to ensure that the information presented is accurate and relevant to the user, enhancing their overall satisfaction with the application.

4. **Machine Learning for Enhanced AI Capabilities**: AI can be trained with data from large volumes of structured web documents, which can then be used to improve AI models’ understanding of context, language, and relationships within web content. This integration allows machines to learn complex patterns in unstructured or semi-structured data, further enhancing their ability to perform tasks that require human-like reasoning.

5. **Integration with Natural Language Processing (NLP) Systems**: Semantic Web technologies can be used with AI NLP systems to extract meaning from unstructured text and improve the understanding of natural language queries by machines. This integration allows for a more effective interaction between humans and computers, as the data is presented in a format that is easier for AI systems to interpret.

6. **Knowledge Management for Autonomous Agents**: Semantic Web technologies can be utilized with artificial intelligence-driven agents to store large volumes of structured information. These agents are capable of processing complex queries, making decisions, and performing tasks based on this knowledge. By integrating semantic web technologies into AI systems, we can create more intelligent autonomous agents that are better equipped to handle complex scenarios.

These integration strategies illustrate how the capabilities of both Semantic Web technologies and Artificial Intelligence can be combined to enhance our ability to process large volumes of information and solve complex problems efficiently.
[INFO] Extracting concepts...
[INFO] Extracted concepts: [
  'Semantic Web technologies',
  'Artificial Intelligence (AI)',
  'Knowledge Extraction from Semantically-Structured Data',
  'Artificial Intelligence as a Knowledge Broker',
  'Interactive Experience Enhancement through AI-Driven Semantic Web Applications',
  'Machine Learning for Enhanced AI Capabilities',
  'Integration with Natural Language Processing (NLP) Systems',
  'Knowledge Management for Autonomous Agents'
]
[INFO] Adding interaction: 'How can Semantic Web technologies be used with AI?'
[INFO] Saved memory to SPARQL store http://localhost:4030/test/update graph <http://example.org/mcp/memory>. Stats: 2 short-term, 0 long-term memories

This query gives one result, containing the prompt & response from the above :

PREFIX mcp: <http://purl.org/stuff/mcp/>
SELECT ?graph ?id ?prompt ?output ?timestamp
WHERE {
  GRAPH ?graph {
    ?s a mcp:Interaction ;
       mcp:id ?id ;
       mcp:prompt ?prompt ;
       mcp:output ?output ;
       mcp:timestamp ?timestamp .
    FILTER regex(str(?prompt), "Semantic")
  }
}

This gives 2 results :

SELECT *
WHERE {
  GRAPH ?graph {
    ?s ?p ?o .
    FILTER regex(str(?o), "Semantic")
  }
}

One is the triple with the prompt, the other the triple with the LLM output.

#

Repo Boo

Very annoying. I've been using Claude projects heavily for recent dev, made possible by repomix, which essentially does an annotated concat of a repo's files. Tonight my thing decided to say:

danny@danny-desktop:~/github-danny/transmissions$ npm run rp

> transmissions@1.0.0 rp
> repomix -c repomix.config-small.json . && repomix -c repomix.config-large.json .

(node:61405) ExperimentalWarning: Importing JSON modules is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)

📦 Repomix v0.2.12

Error: Config file not found at repomix.config-small.json
For more help, please visit: https://github.com/yamadashy/repomix/issues
(node:61421) ExperimentalWarning: Importing JSON modules is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)

I've no idea what I've done that would impact there. Can't remember if this is running a global or local install. Ok, first thing: clear caches and try again. "Hey Claude, how do I...?"

rm -rf node_modules
rm package-lock.json
npm cache clean --force
npm install

While that's running, there is something else I want to do nearby: squat the npm registry with a couple of things, #:transmissions & #:semem at least; there might be a couple of others. None will be out-of-the-box usable as they should be (as I would interpret proper etiquette), but I think I'm at a point where I can legitimately grab those names. I've put loads of work in and am committed to making them useful for other parties.

It might even encourage me to get the GitHub CI/CD bits in place (for the badges, innit).
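
A placeholder publish needs next to nothing - a minimal package.json sketch, where the version, description and license fields are my assumptions:

{
  "name": "transmissions",
  "version": "0.0.1",
  "description": "placeholder, real release to follow",
  "repository": "github:danja/transmissions",
  "license": "MIT"
}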

Golly :

npm warn using --force Recommended protections disabled.
npm warn deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
npm warn deprecated sourcemap-codec@1.4.8: Please use @jridgewell/sourcemap-codec instead
npm warn deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported

added 1141 packages in 9m

237 packages are looking for funding
  run `npm fund` for details

Aw, npm install squirt works (whatever that is), I'd have squatted that too. Ok, no biggy.

npm install squirty
npm error code E404
npm error 404 Not Found - GET https://registry.yarnpkg.com/squirty - Not found

For that I only need to do some minor renaming. #:transmissions would have been a proper pain. I did it once already - the project had the working label "treadmill" initially, but I decided something more accurately descriptive would be better. Horrible search & replace - not too bad with code, but the docs got messy.