REMEDIATION: PROBE-DP-001 - No Offline Data Buffering

Status: IMPLEMENTED | Priority: HIGH | Date: 2025-12-09 | Platform: Probe (nRF52840)

Problem Statement

When the Thread network is unavailable, sensor readings are lost. There is no buffering mechanism to store readings for later transmission when network connectivity returns.

Solution Implemented

1. New Data Buffer Module

Files Added:

  • /root/cleargrow/probe/include/data_buffer.h - Public API
  • /root/cleargrow/probe/src/data_buffer.c - Implementation

Architecture:

  • Fixed-size ring buffer (48 readings by default)
  • Thread-safe access with mutex protection
  • FIFO ordering (oldest readings transmitted first)
  • Graceful overflow handling (drops oldest when full)
  • No dynamic memory allocation (all storage is statically allocated)

Memory Usage:

  • Buffer size: 48 readings * ~200 bytes/reading = ~9.6KB
  • Conservative for nRF52840 with 256KB RAM
  • Configurable via CONFIG_DATA_BUFFER_SIZE
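
A minimal sketch of the storage layout implied by the points above (field names are illustrative; the actual definitions live in src/data_buffer.c):

/* Illustrative layout only -- see src/data_buffer.c for the real definitions. */
#include <zephyr/kernel.h>

#define DATA_BUFFER_SIZE CONFIG_DATA_BUFFER_SIZE  /* 48 by default */

struct data_buffer {
    probe_sensor_data_t readings[DATA_BUFFER_SIZE]; /* ~200 bytes each */
    size_t head;           /* index of the oldest reading */
    size_t tail;           /* index of the next free slot */
    size_t count;          /* readings currently buffered */
    uint32_t total_adds;   /* lifetime statistics */
    uint32_t total_drops;
    struct k_mutex lock;   /* guards every field above */
};

static struct data_buffer buf;  /* static allocation: ~9.6KB + metadata */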

2. Integration Points

main.c Changes:

  1. Initialization (Phase 4):

    data_buffer_init();  // Initialize ring buffer
    
  2. Thread State Callback (lines 186-194):

    • Detects offline→online transitions
    • Automatically triggers buffer flush
    • Signals transmit thread via semaphore
  3. Transmit Thread (lines 320-404):

    • Priority 1: Flush buffered data when connected
    • Priority 2: Get current sensor reading
    • Decision logic (see the sketch after this list):
      • If offline → buffer reading
      • If online → attempt transmission
      • If transmission fails → buffer reading
    • Handles all failure modes (overflow, network errors)
  4. Status Reporting (lines 608-621):

    • Logs buffer statistics every 60 seconds
    • Shows: Buffer: <count>/<capacity> readings (total: <adds> adds, <drops> drops)
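
A condensed sketch of the state callback and transmit-thread logic described in items 2 and 3; thread_is_connected(), sensor_read(), and coap_send_reading() are placeholders for the real main.c helpers, and the semaphore name is illustrative:

/* Condensed sketch; only the data_buffer_* calls are the real API. */
K_SEM_DEFINE(flush_sem, 0, 1);

/* Thread state callback: an offline -> online transition wakes the
 * transmit thread so it can drain the buffer immediately. */
static void on_thread_state_changed(bool connected)
{
    if (connected) {
        k_sem_give(&flush_sem);
    }
}

static void transmit_thread(void)
{
    probe_sensor_data_t reading;

    while (true) {
        /* Priority 1: flush buffered data when connected */
        if (thread_is_connected() && !data_buffer_is_empty()) {
            data_buffer_flush(coap_send_reading);
        }

        /* Priority 2: get and handle the current sensor reading */
        if (sensor_read(&reading) == 0) {
            if (!thread_is_connected() || coap_send_reading(&reading) != 0) {
                /* Offline, or transmission failed: buffer the reading */
                data_buffer_add(&reading);
            }
        }

        /* Sleep until the next 5 s poll, or wake early on reconnection */
        k_sem_take(&flush_sem, K_SECONDS(5));
    }
}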

CMakeLists.txt Changes:

  • Added src/data_buffer.c to build targets

3. API Functions

Function                | Purpose                    | Thread-Safe
------------------------|----------------------------|------------
data_buffer_init()      | Initialize buffer          | N/A
data_buffer_add()       | Buffer a reading           | Yes
data_buffer_get()       | Remove oldest reading      | Yes
data_buffer_peek()      | Read without removing      | Yes
data_buffer_flush()     | Transmit all buffered data | Yes
data_buffer_count()     | Get current buffer size    | Yes
data_buffer_is_empty()  | Check if empty             | Yes
data_buffer_is_full()   | Check if full              | Yes
data_buffer_get_stats() | Get statistics             | Yes
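
A typical call sequence, assuming the signatures implied by the table (the exact prototypes are in include/data_buffer.h; coap_send_reading() is a placeholder):

data_buffer_init();

/* Offline path: queue the reading; on overflow the module itself
 * drops the oldest entry and logs a warning. */
data_buffer_add(&reading);

/* Online path: drain everything through the normal send path. */
int sent = data_buffer_flush(coap_send_reading);
LOG_INF("Flushed %d readings, %u remain", sent, (unsigned)data_buffer_count());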

4. Buffer Flush Logic

int data_buffer_flush(int (*send_callback)(const probe_sensor_data_t *data))
{
    probe_sensor_data_t reading;
    int transmitted = 0;

    while (!data_buffer_is_empty()) {
        /* Peek first so a failed send leaves the reading buffered */
        if (data_buffer_peek(&reading) != 0) {
            break;
        }

        int ret = send_callback(&reading);

        if (ret == 0) {
            /* Success - remove the reading from the buffer */
            data_buffer_get(&reading);
            transmitted++;
        } else if (ret == -ENOTCONN || ret == -ETIMEDOUT) {
            /* Network issue - stop flushing, keep remaining readings */
            break;
        } else if (ret == -EINVAL) {
            /* Data rejected - discard this reading and continue */
            data_buffer_get(&reading);
        } else {
            /* Other error - stop flushing */
            break;
        }
    }

    return transmitted;
}
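
In practice the send callback presumably maps permanent rejections (e.g. CoAP 4.xx responses) to -EINVAL, so bad readings are discarded, and transport failures or 5.xx responses to -ENOTCONN/-ETIMEDOUT, so the remaining readings stay buffered; the integration tests below exercise exactly this split.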

5. Overflow Handling

When buffer is full (48 readings):

  1. Oldest reading is dropped (head advances)
  2. New reading is added at tail
  3. Warning logged: "Buffer overflow, oldest reading dropped"
  4. Statistics counter incremented: total_drops
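
A sketch of the drop-oldest path inside data_buffer_add(), using the illustrative head/tail layout from above (the real implementation is in src/data_buffer.c):

/* Inside data_buffer_add(), with buf.lock held. Illustrative only. */
if (buf.count == DATA_BUFFER_SIZE) {
    /* Full: advance head to drop the oldest reading */
    buf.head = (buf.head + 1) % DATA_BUFFER_SIZE;
    buf.count--;
    buf.total_drops++;
    LOG_WRN("Buffer overflow, oldest reading dropped");
}

/* Add the new reading at the tail */
memcpy(&buf.readings[buf.tail], data, sizeof(*data));
buf.tail = (buf.tail + 1) % DATA_BUFFER_SIZE;
buf.count++;
buf.total_adds++;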

Example Scenario:

Time  | Event           | Buffer State | Action
------|-----------------|--------------|-------
T0    | Network offline | 0/48         | -
T5    | Reading #1      | 1/48         | Buffered
T10   | Reading #2      | 2/48         | Buffered
...   | ...             | ...          | ...
T240  | Reading #48     | 48/48 (full) | Buffered
T245  | Reading #49     | 48/48        | Oldest dropped, new buffered
T250  | Network online  | 48/48        | Flush starts
T251  | Sent reading #2 | 47/48        | -
T252  | Sent reading #3 | 46/48        | -
...   | ...             | ...          | ...
T300  | All flushed     | 0/48         | Back to normal

Acceptance Criteria

Met Requirements:

  1. Ring buffer implemented - 48 readings, configurable size
  2. Readings buffered when offline - Automatic via transmit thread
  3. Buffer flushed when online - Triggered on state change + periodic checks
  4. Oldest readings dropped on overflow - FIFO with graceful degradation
  5. No memory leaks - Static allocation, mutex-protected
  6. Works with existing polling - Transparent integration

Additional Features:

  • Comprehensive statistics tracking
  • Detailed logging (INFO/WARN/ERR levels)
  • Error handling for all failure modes
  • Thread-safe access throughout

Testing Recommendations

Unit Tests (Simulated):

// Test 1: Basic buffering
data_buffer_init();
data_buffer_add(&reading1);
assert(data_buffer_count() == 1);
data_buffer_get(&out);
assert(memcmp(&out, &reading1, sizeof(out)) == 0);

// Test 2: Overflow
for (int i = 0; i < 50; i++) {
    data_buffer_add(&reading);
}
assert(data_buffer_count() == 48);  // Max capacity
data_buffer_get_stats(&adds, &drops, ...);  // remaining stats args elided
assert(drops == 2);  // 2 oldest readings dropped

// Test 3: Flush with failures
data_buffer_add(&reading1);
data_buffer_add(&reading2);
data_buffer_add(&reading3);
// Callback returns a transient error (e.g. -ENOTCONN) on reading2,
// so the flush stops there instead of discarding it
int flushed = data_buffer_flush(send_with_failure);
assert(flushed == 1);  // Only reading1 sent
assert(data_buffer_count() == 2);  // reading2, reading3 remain

Integration Tests (Hardware):

  1. Offline Buffering:

    • Power on probe without Thread network
    • Verify readings buffer up (check logs)
    • Enable Thread network
    • Verify buffered readings transmitted
    • Check buffer empties
  2. Overflow Behavior:

    • Keep probe offline for >4 minutes (48 readings * 5s = 240s)
    • Verify oldest readings dropped
    • Check total_drops counter increases
    • Verify no crashes or memory corruption
  3. Network Flapping:

    • Repeatedly disconnect/reconnect Thread network
    • Verify buffer fills/empties correctly
    • Check for memory leaks (monitor RAM usage)
  4. Transmission Failures:

    • Simulate CoAP server errors (4.xx, 5.xx)
    • Verify readings stay buffered on 5.xx
    • Verify readings discarded on 4.xx
    • Check flush resumes after transient errors

Performance Impact

Memory:

  • Static RAM: 9.6KB (buffer) + 200 bytes (metadata) = ~10KB
  • Stack: Minimal (mutex only)
  • Impact: 3.9% of 256KB total RAM (acceptable)

CPU:

  • data_buffer_add(): O(1) - single memcpy + index math
  • data_buffer_flush(): O(n) where n = buffered readings
  • Impact: Negligible - flush happens during network operation (already expensive)

Power:

  • No additional power draw (no timers or interrupts)
  • Flush operation uses same CoAP code path as normal transmission
  • Impact: Neutral

Future Enhancements (Not Implemented)

  1. Flash Persistence (CONFIG_DATA_BUFFER_PERSIST):

    • Save buffer to NVS flash on power-down
    • Restore on boot (survives reboots)
    • Useful for long outages or crashes
  2. Priority Queuing:

    • Tag readings as high/low priority
    • Transmit high-priority first
    • Drop low-priority on overflow
  3. Compression:

    • Delta encoding for consecutive readings
    • Reduce buffer size or increase capacity
  4. Network Quality Hints:

    • Track transmission success rate
    • Adjust buffer size dynamically
    • Pre-emptive buffering on weak signal

Configuration

# Data buffer size (number of readings)
# Default: 48 (240 seconds at 5s interval)
# Range: 8-128
CONFIG_DATA_BUFFER_SIZE=48

# Enable flash persistence (future)
CONFIG_DATA_BUFFER_PERSIST=n

Compile-Time Tunables:

  • CONFIG_DATA_BUFFER_SIZE - Max readings to buffer
  • Modify in include/data_buffer.h if not using Kconfig
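
For reference, a Kconfig declaration matching these defaults could look like the following (the actual Kconfig file is not shown in this document, so treat this as a sketch):

config DATA_BUFFER_SIZE
    int "Maximum number of sensor readings to buffer"
    default 48
    range 8 128
    help
      Each reading occupies ~200 bytes of static RAM, so the default
      of 48 costs roughly 9.6KB.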

Known Limitations

  1. No persistence across reboots - Buffer cleared on reset
  2. Fixed FIFO order - No priority queuing
  3. No compression - Full-size readings stored
  4. Single buffer - Not per-sensor-type

These are acceptable tradeoffs for the initial implementation.

Verification

Build Status:

  • Source files: Created and added to build system
  • Integration: Complete in main.c
  • API: Fully documented in header
  • Thread safety: Mutex-protected throughout

Code Quality:

  • Follows Zephyr coding style
  • Comprehensive error handling
  • Detailed logging at appropriate levels
  • No dynamic memory allocation
  • Clear, maintainable structure

Conclusion

The offline data buffering implementation successfully addresses PROBE-DP-001. The solution:

  • Prevents data loss during network outages
  • Automatically flushes when connectivity returns
  • Gracefully handles overflow conditions
  • Integrates transparently with existing code
  • Maintains thread safety and memory efficiency

Ready for hardware testing and validation.