Hive Scale Code

Hive Scale Code Evaluation

Date: March 11, 2026
Firmware: 2026.03.11.1
Dataset: Mar 8–11 2026 (Database Export)


1. Catastrophic Instability (Primary Problem)

Data Timeline

Period | Weight | Temp (°F) | Humidity
Mar 8, 15:00–23:00 | 0 → ~21 lbs (setup) | 75 → 64 | 65–88%
Mar 8, 23:00 – Mar 9, 05:00 | 21 → ~10 lbs (gradual drift) | 64 → ~40 | 85–93%
Mar 9, 05:00–20:00 | Wild swings: +10 to −243 lbs | 40–55 | 85–95%
Mar 9, ~20:00 | Jump to ~23 lbs (re-tare?) | ~70 | ~70%
Mar 9, 20:00 – Mar 11, 03:00 | Stable, 22–28 lbs | 65–80 | 60–92%

The instability from March 9 05:00–20:00 is not a temperature compensation problem — the raw ADC values themselves are erratic. During that period the corrected weight bounces between +8 and −243 lbs within 10-minute intervals.

Likely Causes

A. Moisture / condensation on the load cell.
At 40 °F with 90%+ humidity, condensation forms on strain gauge surfaces, creating intermittent leakage paths across the Wheatstone bridge. This produces exactly the kind of large random jumps seen here: not gradual drift, but sudden wild swings. This is a common failure mode for outdoor load cells without IP-rated enclosures. Note that the load cells here are sealed, and an additional silicone fill has been applied.

B. ADS1220 register corruption.
There is no periodic verification that the ADS1220 configuration registers still hold the correct values. An EMI spike or power glitch can flip a bit in the gain or mux register, causing the ADC to read a completely different signal. At Gain = 128, flipping even one bit in REG0 could change the channel, gain, or PGA bypass.
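To make the REG0 risk concrete, the sketch below decodes the three REG0 fields as laid out in the ADS1220 register map (MUX in bits 7:4, GAIN in bits 3:1, PGA_BYPASS in bit 0). The expected value 0x0E (AIN0/AIN1, gain 128, PGA enabled) is an assumption for illustration; substitute whatever initADS1220() actually writes.

```cpp
#include <cstdint>

// ADS1220 configuration register 0 (REG0) layout:
//   bits 7:4  MUX[3:0]    input multiplexer (0000 = AIN0/AIN1)
//   bits 3:1  GAIN[2:0]   PGA gain = 2^GAIN (000 = 1 ... 111 = 128)
//   bit  0    PGA_BYPASS
struct Reg0Fields {
    uint8_t mux;
    unsigned gain;
    bool pgaBypass;
};

Reg0Fields decodeReg0(uint8_t reg0) {
    Reg0Fields f;
    f.mux = static_cast<uint8_t>((reg0 >> 4) & 0x0F);
    f.gain = 1u << ((reg0 >> 1) & 0x07);
    f.pgaBypass = (reg0 & 0x01) != 0;
    return f;
}
```

With the assumed 0x0E, flipping bit 1 (0x0E ^ 0x02) decodes as gain 64 instead of 128, and flipping bit 4 (0x0E ^ 0x10) keeps gain 128 but selects a different input pair (MUX = 0001). Either would produce plausible-looking but wrong counts.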

C. The outlier filter is too permissive and self-defeating.
The 50 lb sanity limit is enormous for a hive scale. If a bad value sneaks through (say −36 lbs), it becomes part of currentLbs via the moving average, and then the next bad value only needs to be within 50 lbs of that — so −86 lbs passes. The reference keeps chasing the bad data downward. This ratcheting effect explains how the system eventually reaches −243 lbs.
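The ratchet can be reproduced with a simplified model in which the reference is simply the last accepted value. In the firmware the reference is smoothed by the moving average, which changes how fast it walks but not whether it can. The sample sequence in the usage note is illustrative, not taken from the export.

```cpp
#include <cmath>
#include <vector>

// Simplified model of the current outlier filter: a sample is accepted if it
// lies within limitLbs of the last accepted value, and the accepted sample
// becomes the new reference. Returns every accepted value in order.
std::vector<double> relativeFilter(const std::vector<double>& samples, double limitLbs) {
    std::vector<double> accepted;
    for (double s : samples) {
        if (accepted.empty() || std::fabs(s - accepted.back()) <= limitLbs) {
            accepted.push_back(s);  // reference moves to the accepted sample
        }
    }
    return accepted;
}
```

For the sequence 25, −20, −65, −110, −155, −200, −243, every step is under 50 lbs, so all seven values pass and the filter ends at −243 lbs. With a 5 lb limit only the first value is accepted. A tight relative limit on its own would also reject a genuine step larger than the limit (for example a 5–8 lb swarm), so it needs a rule for accepting a sustained new level.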

D. SPI bus interference from OneWire timing.
Every loop iteration calls readTempF() twice, and each call blocks for up to ~750 ms while the DS18B20 completes a 12-bit conversion, right before the ADS1220 is read over SPI. The OneWire library bit-bangs the bus and disables interrupts during each timing-critical bit slot (tens of microseconds per bit, not the whole conversion time), which may disturb SPI or other interrupt-driven activity that overlaps it.


2. Weight Calculation vs. Industrial Standard

The Standard Two-Parameter Thermal Compensation Model

Following OIML R76, NIST Handbook 44, and load cell datasheets, which treat temperature effects on zero and on span separately:

Z(T) = z₀ + z₁ · (T - T₀)        — temperature-dependent zero offset
G(T) = g₀ + g₁ · (T - T₀)        — temperature-dependent gain / span

Ŵ(t) = [ y(t) - Z(T(t)) ] / G(T(t))

       y(t) - [ z₀ + z₁ · (T(t) - T₀) ]
Ŵ(t) = ─────────────────────────────────────
           g₀ + g₁ · (T(t) - T₀)

Where:

  • y(t) = raw ADC reading
  • z₀ = zero offset at calibration temperature T₀
  • z₁ = zero temperature coefficient (raw counts per °F)
  • g₀ = gain at T₀ (raw counts per lb)
  • g₁ = gain temperature coefficient (change in counts/lb per °F)

What the Code Currently Implements

rawLbs = (avgRaw - zeroOffset) / countsPerLb;
corrected = rawLbs + tempCoeff * (currentTempHive1 - calTempF);

Which expands to (writing c₁ for tempCoeff, in lbs/°F):

Ŵ(t) = (y(t) - z₀) / g₀  +  c₁ · (T - T₀)
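Setting g₁ = 0 in the two-parameter model and splitting the fraction shows that the current code is its zero-drift-only special case, with tempCoeff (c₁ in the expansion above) playing the role of −z₁/g₀:

```latex
\hat{W}(t) = \frac{y(t) - z_0 - z_1\,(T - T_0)}{g_0}
           = \frac{y(t) - z_0}{g_0} - \frac{z_1}{g_0}\,(T - T_0)
\quad\Longrightarrow\quad c_1 = -\frac{z_1}{g_0}
```

Nothing in this form corresponds to g₁, the term that makes the correction grow with load.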

Gaps

Industrial Model | Current Code | Problem
z₁ (zero temp coeff) | Not modeled (z₁ = 0) | Zero drift with temperature is uncompensated in the raw domain
g₁ (gain temp coeff) | Not modeled (g₁ = 0) | Span drift is uncompensated; countsPerLb is constant
Correction scales with load | Flat additive tempCoeff × ΔT | Adds the same lbs/°F whether the hive weighs 10 lbs or 100 lbs
Separate zero & span terms | Single merged coefficient | The 0.660 lbs/°F is only accurate at the load at which it was fitted (~24 lbs)

Proper Implementation

float deltaT       = currentTempHive1 - calTempF;          // ΔT (°F)
float zeroAdj      = (float)zeroOffset + z1 * deltaT;      // z₀ + z₁·ΔT (raw counts)
float gainAdj      = countsPerLb + g1 * deltaT;            // g₀ + g₁·ΔT (counts per lb)
float correctedLbs = ((float)avgRaw - zeroAdj) / gainAdj;  // no (long) cast: keeps the fractional part of zeroAdj

Determining z₁ and g₁ requires multi-temperature calibration:

  • z₁: Tare at two different temperatures → z₁ = (zero_hot − zero_cold) / (T_hot − T_cold)
  • g₁: Load known weight at two different temperatures → g₁ = (cpl_hot − cpl_cold) / (T_hot − T_cold)

3. Software Architecture Issues

A. DS18B20 Blocking Reads Starve All Other Tasks

Each requestTemperatures() blocks for ~750 ms at 12-bit resolution. Two sensors = ~1.5 s per loop where OTA, the web server, and the watchdog get no service. Switch to non-blocking reads:

tempHive1.setWaitForConversion(false);
tempHive1_1.setWaitForConversion(false);
// Issue requestTemperatures(), read result on the NEXT loop pass
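One way to structure the next-pass read is a small poller called from loop() with millis(). This is a hardware-independent sketch: requestFn and readFn are hypothetical stand-ins for the DallasTemperature calls requestTemperatures() and getTempFByIndex(0), which return immediately once setWaitForConversion(false) is set. The 750 ms default is the worst-case 12-bit conversion time.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Starts a conversion, returns at once, and collects the result on a later
// pass once the conversion time has elapsed.
class NonBlockingTemp {
public:
    NonBlockingTemp(std::function<void()> requestFn,
                    std::function<float()> readFn,
                    uint32_t conversionMs = 750)
        : request_(std::move(requestFn)), read_(std::move(readFn)),
          convMs_(conversionMs) {}

    // Call once per loop() pass with millis(). Returns true when a fresh
    // reading has been stored and is available from lastTempF().
    bool poll(uint32_t nowMs) {
        if (!pending_) {
            request_();          // start conversion; does not block
            startMs_ = nowMs;
            pending_ = true;
            return false;
        }
        // Unsigned subtraction stays correct across millis() roll-over.
        if (static_cast<uint32_t>(nowMs - startMs_) < convMs_) {
            return false;
        }
        lastTempF_ = read_();
        pending_ = false;
        return true;
    }

    float lastTempF() const { return lastTempF_; }

private:
    std::function<void()> request_;
    std::function<float()> read_;
    uint32_t convMs_;
    uint32_t startMs_ = 0;
    bool pending_ = false;
    float lastTempF_ = 0.0f;
};
```

One poller per sensor object (tempHive1, tempHive1_1) lets both conversions run while OTA, the web server, and the watchdog keep being serviced.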

B. Moving Average Buffer Initialized to Zero

The buffer starts filled with zeros. The first real reading (say 25 lbs) gets averaged with nine zeros, producing ~2.5 lbs. After reboot mid-operation this severely biases readings until the buffer fills. Seed with the first valid reading instead.

C. Dead-Band Filter Ratchets with Bad Data

The dead-band compares smoothedLbs (moving average) to currentLbs (last accepted). Bad values entering the moving average gradually shift smoothedLbs. Once it crosses the 0.05 lb threshold, the corrupted value is accepted. The next corrupted value only needs to shift 0.05 lbs further. This creates unlimited drift accumulation.

D. DRDY Timeout Returns 0 — a Valid Reading

readADS1220Raw() returns 0 on timeout, and the caller skips the sample with if (raw == 0) continue;. But 0 is a valid ADC output near the zero point, and non-zero values corrupted by partial SPI reads pass through unchecked. Use INT32_MIN or a separate status flag instead.
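A sketch of the sentinel approach, assuming the three data bytes have already been clocked out of the ADS1220 (MSB first, 24-bit two's complement) and that a hypothetical drdyTimedOut flag reports the DRDY wait. INT32_MIN can never be produced by a 24-bit conversion, so it cannot be confused with a real reading. The plausibility window bounds are also hypothetical and would have to come from the installed cell's real zero and full-scale counts.

```cpp
#include <cstdint>
#include <limits>

constexpr int32_t kAdcInvalid = std::numeric_limits<int32_t>::min();  // INT32_MIN

// Convert three ADS1220 data bytes (MSB first) to a signed count.
// Returns kAdcInvalid if the DRDY wait timed out.
int32_t decodeAds1220(bool drdyTimedOut, uint8_t msb, uint8_t mid, uint8_t lsb) {
    if (drdyTimedOut) return kAdcInvalid;
    int32_t v = (static_cast<int32_t>(msb) << 16) |
                (static_cast<int32_t>(mid) << 8) |
                static_cast<int32_t>(lsb);       // 0 .. 0xFFFFFF
    if (v & 0x800000) v -= 0x1000000;            // sign-extend bit 23
    return v;
}

// Rejects the sentinel and anything outside an expected raw-count window.
bool plausibleRaw(int32_t raw, int32_t minRaw, int32_t maxRaw) {
    return raw != kAdcInvalid && raw >= minRaw && raw <= maxRaw;
}
```

A window check like this would also catch corrupted non-zero values, such as counts far outside the normal operating range.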

E. No ADS1220 Health Monitoring

Config registers are written once during initADS1220() and never verified again. A single corrupted register can silently produce garbage for hours.
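A hardware-independent sketch of periodic read-back follows. readReg and reinit are hypothetical callbacks standing in for the firmware's SPI register read (RREG) and initADS1220(), and the expected bytes are whatever initADS1220() writes. Only the compare-and-recover logic is shown.

```cpp
#include <array>
#include <cstdint>
#include <functional>

// Compares the four ADS1220 config registers with the values written at init.
// Returns true if all match; on the first mismatch calls reinit() and returns false.
bool verifyAds1220(const std::array<uint8_t, 4>& expected,
                   const std::function<uint8_t(uint8_t)>& readReg,
                   const std::function<void()>& reinit) {
    for (uint8_t reg = 0; reg < 4; ++reg) {
        if (readReg(reg) != expected[reg]) {
            reinit();  // restore a known configuration
            return false;
        }
    }
    return true;
}

// Runs verifyAds1220() on every Nth reading.
class Ads1220Watchdog {
public:
    Ads1220Watchdog(const std::array<uint8_t, 4>& expected, uint32_t everyN)
        : expected_(expected), everyN_(everyN) {}

    // Call once per ADC reading. Returns false only when a check ran and
    // found (and repaired) a mismatch.
    bool onReading(const std::function<uint8_t(uint8_t)>& readReg,
                   const std::function<void()>& reinit) {
        if (++count_ < everyN_) return true;
        count_ = 0;
        return verifyAds1220(expected_, readReg, reinit);
    }

private:
    std::array<uint8_t, 4> expected_;
    uint32_t everyN_;
    uint32_t count_ = 0;
};
```

When onReading() returns false, readings taken since the previous successful check are suspect and could be discarded rather than logged.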

F. NVS Flash Wear

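calTempF is saved to NVS on every tare/calibrate. The wear risk is likely small: Preferences wraps the ESP-IDF NVS store, which is log-structured, so rewriting one key appends a new entry instead of erasing the same sector each time, and tare/calibrate events are infrequent. Skipping the write when the value has not changed is still a cheap way to avoid needless flash cycles over years of deployment.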


4. Recommendations (Priority Order)

  1. Weatherproof the load cell and wiring. Conformal coat connections, use IP65+ junction boxes, and consider potted cable terminations. The March 9 failure signature (wild random jumps at low temp + high humidity) strongly indicates moisture ingress.

  2. Add ADS1220 register verification. Every N readings, read back all 4 config registers and compare against expected values. Re-initialize on mismatch.

  3. Implement proper two-parameter thermal model. Determine z₁ and g₁ from multi-temperature calibration to separate zero drift from span drift.

  4. Switch to non-blocking DS18B20 reads. Issue requestTemperatures() and read 750 ms later rather than blocking.

  5. Replace mean averaging with median filtering. A 5-sample median rejects up to 2 outliers automatically without threshold tuning.

  6. Use a better sentinel for DRDY timeout. Return INT32_MIN instead of 0.

  7. Seed the moving average buffer with the first valid reading rather than zeros.

  8. Add a rate-of-change limiter. If smoothed weight changes by more than ~2 lbs between consecutive iterations (physically impossible for a beehive over 3 seconds), flag as suspicious.
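Recommendation 8 can be sketched as a check between consecutive smoothed values. The 2 lb threshold comes from the text, and the class name is hypothetical. The sketch only flags values and leaves the response to the caller.

```cpp
#include <cmath>

// Flags a smoothed weight that moved more than maxStepLbs from the last
// trusted value. Hypothetical helper: it reports, it does not reject.
class RateLimiter {
public:
    explicit RateLimiter(float maxStepLbs) : maxStep_(maxStepLbs) {}

    // Returns true if smoothedLbs is suspicious relative to the last trusted value.
    bool suspicious(float smoothedLbs) {
        const bool bad = havePrev_ && std::fabs(smoothedLbs - prev_) > maxStep_;
        if (!bad) {                  // only trusted values advance the reference
            prev_ = smoothedLbs;
            havePrev_ = true;
        }
        return bad;
    }

    // Call after tare/calibration, when a large jump is expected.
    void reset() { havePrev_ = false; }

private:
    float maxStep_;
    float prev_ = 0.0f;
    bool havePrev_ = false;
};
```

Because only unflagged values advance the reference, a burst of garbage cannot walk it away (the ratchet described in Section 1C). A genuine step such as a swarm would stay flagged until reset() is called, so a real implementation would need a rule for accepting a sustained new level.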


5. Summary

The code is well-organized with many good features (OTA, web interface, NVS persistence, ring-buffer logging). The primary issues are:

  • The weight calculation uses a simplified additive thermal correction instead of the industrial model with separate zero and span temperature coefficients
  • The catastrophic March 9 instability is almost certainly a hardware/environmental issue (moisture) that software outlier rejection fails to contain because the filter reference drifts with the bad data
  • DS18B20 blocking reads create 1.5 s dead zones per loop
  • There is no ADS1220 register verification, so a corrupted register silently produces garbage


Part 2: Calibration Data Review (hive_thermal_analysis_v2)

Source: hive_thermal_analysis_v2.html + analysis_data.json
Analysis by: Claude (prior session), reviewed against Part 1 findings
Data: 360 points, Mar 8–10 2026, SB210-150K load cell, 24.55 lb cal weight


6. What the Calibration Analysis Gets Right

  • SPI failure diagnosis is solid. Raw ADC values of −1.0M to −1.8M counts are unambiguously a hardware/communication failure, not software. Both evaluations agree.
  • Bogus-flagging is correct. The ~70 flagged points were properly excluded from the regression.
  • Regression methodology is sound. Using two clean windows (A: Mar 9 19:00–Mar 10 07:30, B: Mar 10 19:00–23:30) was the right approach given the available data.
  • NVS persistence of T₀ and posting both corrected/uncorrected weights is good practice.

7. Critical Finding: The Correction Has the Wrong Sign

The uncorrected data (from before thermal compensation was deployed) shows:

Temp | Uncorrected Weight | True Weight | Error
74.8°F | 24.66 lbs | 24.55 lbs | +0.11 lbs (essentially correct)
68.5°F | 27.17 lbs | 24.55 lbs | +2.62 lbs (reads too high)
57.3°F | 20.24 lbs | 24.55 lbs | −4.31 lbs (reads too low)

Higher temperature produces higher uncorrected readings. The regression correctly captures this: slope = +0.660 lbs/°F. The HTML text states correctly: "Below [74°F] the scale under-reads; above it over-reads."

But the recommended code applies the correction in the wrong direction:

correctedLbs = rawLbs + tempCoeff * (currentTempHive1 - calTempF);
// tempCoeff = +0.660

At 57.3°F (where the scale already reads too low):

20.24 + 0.660 × (57.3 − 74) = 20.24 − 11.02 = 9.22 lbs

The correction subtracts from an already-too-low reading, making it dramatically worse. The true weight is 24.55 lbs.

Why the Sign Is Inverted

The regression models: displayed_weight = true_weight + slope × (T − T₀)

To invert and recover the true weight: true_weight = displayed_weight − slope × (T − T₀)

The firmware formula should be:

correctedLbs = rawLbs - tempCoeff * (currentTempHive1 - calTempF);
//                    ^ MINUS, not plus

Or equivalently, keep the + and use tempCoeff = -0.660.

Impact on Current Firmware

At any temperature below 74°F (which is most of the time outdoors), the correction subtracts weight from an already-low reading. At 50°F:

correction = +0.660 × (50 − 74) = −15.84 lbs

A hive whose uncorrected reading is 60 lbs would display ~44 lbs (60 − 15.84 = 44.16).


8. Critical Finding: The 0.660 Coefficient Is Inflated by Creep

Even with the sign fixed, 0.660 lbs/°F is roughly 4–13× too large relative to the ~0.05–0.15 lbs/°F estimated below. The regression is contaminated by load cell creep (time-dependent mechanical settling under constant load).

Proof: Windows A and B Disagree at the Same Temperature

At ~64°F:

  • Window A (~14 hours after loading): 25.5–25.7 lbs
  • Window B (~28 hours after loading): 26.6–26.7 lbs

That's a ~1.0 lb difference at the same temperature. The only variable is time. This is creep.

Proof: Window B Drifts at Constant Temperature

At a flat 65.8°F, consecutive readings in Window B:

Time | Temp | Weight (lbs) | Raw ADC
20:25 | 65.8°F | 27.02 | 226,862
20:35 | 65.8°F | 26.97 | 226,033
20:45 | 65.8°F | 26.94 | 225,456
20:56 | 65.8°F | 26.91 | 225,052
21:06 | 65.8°F | 26.87 | 224,710
21:16 | 65.8°F | 26.84 | 224,167

Temperature is flat. Weight drops 0.18 lbs in 50 minutes — pure creep at ~0.22 lbs/hour.
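An ordinary least-squares fit over all six rows, rather than just the two endpoints, gives a similar rate. A minimal sketch, with time measured in minutes from 20:25:

```cpp
#include <cstddef>
#include <vector>

// Ordinary least-squares slope of y on x.
double olsSlope(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    return sxy / sxx;
}
```

With minutes {0, 10, 20, 31, 41, 51} and the weights from the table, the slope times 60 is about −0.21 lbs/hour, consistent with the ~0.22 lbs/hour endpoint estimate.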

Over the full 4 hours of Window B, creep accounts for ~0.9 lbs. The total weight change across the 4.8°F temperature swing was only ~0.7 lbs. Creep exceeds the apparent thermal signal.

How Creep Inflates the Regression

The combined regression fits a line through:

  • Window A: lower temps + lower weights (earlier creep state)
  • Window B: higher temps + higher weights (later creep state)

The creep offset between windows gets attributed to temperature, inflating the slope from a true ~0.05–0.15 lbs/°F to 0.660. The R² = 0.87 is misleading — creep creates a time-correlated signal that looks like temperature correlation.

Estimated True Thermal Coefficient

Isolating just the first few points of Window B (where creep has minimal time to act):

Temp Change | Weight Change | Time Span
68.5 → 66.6°F (−1.9°F) | 27.17 → 27.17 lbs (0.00 lbs) | 30 min

The weight does not change at all over a 1.9°F temperature drop in the first 30 minutes, whereas a 0.660 lbs/°F coefficient predicts a change of ~1.25 lbs (0.660 × 1.9). The true thermal coefficient at this load is likely 0.05–0.15 lbs/°F, possibly even smaller.


9. Revised Stance on z₁/g₁ Separation

The HTML analysis argues: "you don't need to separate [z₁/g₁] for firmware, one coefficient handles both."

Updated position: Agree. A single combined coefficient IS fine for a beehive scale. The full two-parameter industrial model (from Part 1, Section 2) is overkill until accuracy requirements tighten beyond ±0.5 lbs. The Part 1 recommendation for z₁/g₁ separation was too aggressive for the current use case.

However, the single-coefficient approach only works when:

  1. The coefficient is derived from creep-settled data (which this dataset is not)
  2. The sign is correct (which it currently is not)
  3. The operating load range doesn't change dramatically (reasonable for a hive)

10. Updated Recommendations

Immediate (today)

  1. Set tempCoeff = 0.0 to disable thermal correction. The current correction makes readings worse, not better. No correction is better than a wrong-sign, over-magnitude correction.

Short-term (this week)

  1. Fix SPI reliability. Add ADS1220 register read-back verification every N readings. Re-initialize on mismatch.
  2. Tighten the outlier filter. Reduce the sanity limit from 50 lbs to 5–10 lbs. Replace the 5-sample mean with a 5-sample median.
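A minimal sketch of the 5-sample median, assuming the five samples are gathered per reading the same way the current mean is. std::nth_element places the middle value without a full sort; the function name is hypothetical.

```cpp
#include <algorithm>
#include <array>

// Median of five samples: tolerates up to two arbitrary outliers.
// Takes the array by value so the caller's buffer is left untouched.
float median5(std::array<float, 5> s) {
    std::nth_element(s.begin(), s.begin() + 2, s.end());
    return s[2];
}
```

For the illustrative samples {24.9, 25.0, −243, 25.1, 800} the median is 25.0 lbs, whereas the mean would be 126.4 lbs.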

Next Calibration Run

  1. Load the 24.55 lb weight and wait 48+ hours before starting data collection. Creep settles logarithmically — most happens in the first 24h. The HTML analysis acknowledges this ("do a dedicated empty run after the hive has been sitting loaded for 48+ hours") but didn't follow its own advice for the loaded-run regression.
  2. Let it run for 2+ full day/night temperature cycles after creep settles.
  3. Derive the thermal coefficient from that clean data. Expect a value in the 0.05–0.20 lbs/°F range.
  4. Apply with the correct sign: rawLbs - tempCoeff * (T - T₀).

11. Stance Change Summary

Topic | Part 1 Evaluation | After Calibration Data Review
SPI failures | Moisture/condensation | Could be connector OR moisture; remains #1 priority
z₁/g₁ separation | Needed for industrial standard | Overkill for now; single coefficient is fine
tempCoeff = 0.660 | Didn't question the value | Inflated ~4–13× by creep AND applied with wrong sign
R² = 0.87 | Seemed decent | Misleading: creep creates a false time-correlated signal
Correction formula | rawLbs + coeff × ΔT | Sign is inverted; should be rawLbs − coeff × ΔT
Priority order | Hardware → thermal model → filtering | Hardware → disable broken correction → clean cal run → filtering

Part 3 — Firmware Changes Implemented (v2026.03.11.3)

12. Ambient Temperature Source Fix

Problem: The original code used currentTempHive1 (DS18B20 inside the hive) for both the thermal compensation formula and the calibration temperature snapshot. Hive internal temperature is 90–100 °F and nearly constant — it does not track the outdoor/ambient temperature that actually drives load cell drift.

Fix: Added a manually entered ambientTempF variable (web UI + serial + NVS-persistent). The thermal compensation formula and all tare/calibrate T₀ snapshots now prefer ambient temp over hive internal temp.

  • Web page: new Thermal Compensation section with input fields for Ambient °F and TempCoeff lbs/°F.
  • Routes: POST /setambient, POST /settcoeff
  • Serial: a command to set ambient temp interactively.
  • NVS key: "ambtemp"

13. Outlier Filter Bug After Calibration

Problem: After calibrating, currentLbs retained the stale pre-calibration value (e.g. −92 lbs). The outlier filter then rejected valid post-calibration readings (e.g. 24.9 lbs) because |24.9 − (−92)| = 116.9 > 50 lb sanity limit.

Fix: Both calibrateWithKnownWeight() and calibrate() now reset currentLbs = 0.0 and currentLbsUncorrected = 0.0 after saving, matching the existing behavior in tare().

14. Sign Correction Applied

The thermal formula was changed from rawLbs + coeff × (T − T₀) to rawLbs − coeff × (T − T₀) in v2026.03.11.2. This is preserved and now operates on ambientTempF instead of currentTempHive1.

15. Industrial Standard Compliance — z₁ Only vs. z₁ + g₁

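The full industrial model, linearized to first order in ΔT and expressed in lbs, has two terms:

corrected = rawLbs − z₁×(T−T₀) − g₁×(T−T₀) × rawLbs

  • z₁ (zero drift): flat offset shift with temperature, independent of load (here in lbs/°F, i.e. the Section 2 z₁ divided by g₀).
  • g₁ (span/gain drift): sensitivity change with temperature, proportional to load (here a fraction of the reading per °F, i.e. the Section 2 g₁ divided by g₀).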
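The lbs-domain form follows from the Section 2 model by dividing numerator and denominator by g₀ and dropping terms of order ΔT², with ΔT = T − T₀ and rawLbs = (y − z₀)/g₀ (the "raw" of the lbs-domain formula). The lbs-domain z₁ and g₁ therefore correspond to z₁/g₀ and g₁/g₀ in Section 2's raw-count units:

```latex
\hat{W} = \frac{y - z_0 - z_1\,\Delta T}{g_0 + g_1\,\Delta T}
        = \frac{\mathrm{rawLbs} - \frac{z_1}{g_0}\,\Delta T}{1 + \frac{g_1}{g_0}\,\Delta T}
        \approx \mathrm{rawLbs} - \frac{z_1}{g_0}\,\Delta T - \frac{g_1}{g_0}\,\Delta T\cdot\mathrm{rawLbs}
```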

Our implementation uses z₁ only: corrected = rawLbs − tempCoeff × (ambientTempF − calTempF).

Why this is sufficient for beehive monitoring:

The g₁ term scales with both load and temperature change. For a ~50 lb hive with a 30 °F daily swing, the g₁ error on a typical load cell is 0.01–0.05 lbs — well below the signals that matter for beekeeping decisions:

Event | Typical magnitude
Nectar flow | 1–3 lbs/day gain
Swarm | 5–8 lb sudden drop
Robbing | Steady decline over hours
Winter consumption | ~0.5 lb/week

Isolating g₁ from z₁ requires calibrating with multiple known weights at multiple temperatures (lab-grade procedure). A single combined z₁ coefficient derived from a creep-settled 48h+ run is practical and accurate enough for this application.

16. Next Steps — 48-Hour Calibration Run

  1. Leave a constant known weight on the scale for at least 48 hours so creep can settle, then for at least 2 more full day/night temperature cycles (do not tare or recalibrate at any point).
  2. At the end of the run, query InfluxDB for hive2_weight (or hive2_weight_uncorr) alongside tempf (ambient temp already in the database).
  3. Scatter plot weight vs. tempf — discard the first 48 hours of data (creep contamination).
  4. Linear regression slope = tempCoeff (expected range: 0.05–0.20 lbs/°F).
  5. Enter the coefficient via the web page Thermal Compensation section. It persists across reboots via NVS.

Subscribe to Smoke House Apiaries
