465 lines
24 KiB
Markdown
465 lines
24 KiB
Markdown
# Anomaly Report: Meta Linux Research Database
|
||
|
||
**Generated:** 2026-03-11
|
||
**Scope:** Full audit of SQLite database, 5 collected bill texts, lobbying CSVs, 1,477 cached files, and 17 generated reports
|
||
**Total anomalies identified:** 37
|
||
|
||
---
|
||
|
||
## Summary
|
||
|
||
| Category | Count | Critical | High | Medium | Low |
|
||
|---------------------------------------|-------|----------|------|--------|-----|
|
||
| Data Collection Failures (scrapers) | 12 | 4 | 3 | 1 | 4 |
|
||
| Substantive Research Findings | 9 | 0 | 0 | 9 | 0 |
|
||
| Database Integrity (duplication) | 9 | 0 | 4 | 2 | 3 |
|
||
| Report Quality (artifacts) | 5 | 0 | 1 | 1 | 3 |
|
||
| Missing Data (collection gaps) | 2 | 0 | 1 | 1 | 0 |
|
||
| **Total** | **37**| **4** | **9**| **14** | **10** |
|
||
|
||
---
|
||
|
||
## CRITICAL — Data integrity issues that undermine research conclusions
|
||
|
||
### C1. Kernel Signoff Analysis Missed Meta Entirely
|
||
|
||
**Location:** `output/reports/kernel_signoff_analysis.md`, `output/reports/ebpf_ecosystem_analysis.md`
|
||
**Script:** `scripts/kernel/signoff_analysis.py`
|
||
|
||
The domain-matching regex in the kernel signoff analyzer failed to match `@meta.com` and `@fb.com` email domains, producing a **0.0% Meta signoff share** across 2,500 kernel commits (v6.10–v6.15). This directly contradicts known facts:
|
||
|
||
- The `entities` table contains 5 named Meta BPF employees with detailed maintainer roles:
|
||
- Alexei Starovoitov (BPF co-maintainer)
|
||
- Andrii Nakryiko (BPF libbpf maintainer)
|
||
- Martin KaFai Lau (BPF networking)
|
||
- Song Liu (BPF tracing)
|
||
- Yonghong Song (BPF compiler)
|
||
- The corporate scorecard lists AMD (8.34%), Intel (7.73%), Red Hat (9.6%), kernel.org (14.66%), etc. — but Meta/Facebook are entirely absent.
|
||
- The "12.5% gatekeeping claim assessment" and all kernel influence conclusions derived from this data are unreliable.
|
||
|
||
**Impact:** HIGH — This is the central thesis of the research. The actual Meta signoff share could be significant, especially in the BPF subsystem where Meta employees are primary maintainers.
|
||
|
||
**Root cause:** The corporate domain-matching logic either doesn't include `meta.com`/`fb.com` patterns, or the HTML entity encoding fix (`<`/`>`) didn't fully resolve parsing of email addresses in commit tags.
|
||
|
||
**Recommended fix:** Add `meta.com`, `fb.com`, and `oculus.com` to the corporate domain map. Verify regex captures against raw commit HTML. Rerun analysis.
|
||
|
||
---
|
||
|
||
### C2. AOSP Gerrit Data is Entirely Fictitious
|
||
|
||
**Location:** `output/reports/aosp_contribution_comparison.md`
|
||
**Script:** `scripts/kernel/aosp_gerrit_analysis.py`
|
||
|
||
The AOSP Gerrit comparison report shows **every vendor at 0 contributions** — Samsung, Qualcomm, and Meta all at zero, with Google at 1 (abandoned). This is impossible:
|
||
|
||
- Samsung and Qualcomm are major AOSP contributors with thousands of upstream patches.
|
||
- The Gerrit API queries failed silently (likely authentication or query syntax issues).
|
||
- Results were written to the report as if they were valid data.
|
||
|
||
**Impact:** HIGH — This entire dataset must be discarded. Any citations of AOSP contribution comparisons are unsupported.
|
||
|
||
**Root cause:** The `owner:domain:facebook.com` query returns nearly zero because Meta forks AOSP for Quest devices but doesn't upstream to AOSP Gerrit. However, the same pattern also returned zero for Samsung and Qualcomm, indicating a systematic query failure (not just a Meta-specific issue).
|
||
|
||
**Recommended fix:** Discard the current report. If AOSP analysis is needed, use the AOSP source tree commit history directly rather than the Gerrit REST API.
|
||
|
||
---
|
||
|
||
### C3. LA HB-570 is the Wrong Bill
|
||
|
||
**Location:** `data/raw/bills/LA_HB-570.txt`, bills table row #3
|
||
**Script:** `scripts/legislative/bill_collector.py`
|
||
|
||
The LegiScan API returned the wrong bill for Louisiana HB-570:
|
||
|
||
- **Expected:** Age verification / online child safety legislation
|
||
- **Received:** "Authorizes a taxing authority to adjust a millage rate..." — a property tax bill
|
||
- **File contents:** Binary PDF data of the tax bill, not parseable text
|
||
|
||
The scraper retrieved the wrong session's HB-570. The correct bill may be in a different legislative session than what was queried.
|
||
|
||
**Impact:** HIGH — All bill similarity analysis involving LA HB-570 (bill #3) is invalid. The bill_similarity scores comparing binary PDF against HTML text are meaningless.
|
||
|
||
**Recommended fix:** Identify the correct session year for Louisiana's age verification HB-570. Re-fetch with correct session parameter. The web scraper fallback (`_scrape_louisiana`) may also need session correction.
|
||
|
||
---
|
||
|
||
### C4. PAC/Expenditure Data is All Scraping Artifacts
|
||
|
||
**Location:** `lobbying_expenditures` table (5 records)
|
||
**Script:** `scripts/lobbying/pac_tracker.py`
|
||
|
||
All 5 `lobbying_expenditures` records show `$0.00` with period values that are HTML element labels, not data:
|
||
|
||
| Record | Period Value | Interpretation |
|
||
|--------|----------------|-----------------------|
|
||
| 1 | `"Search"` | Search button label |
|
||
| 2 | `"TRACER "` | Website name |
|
||
| 3 | `"Contrib"` | Column header |
|
||
| 4 | `"*"` | Footnote marker |
|
||
| 5 | `""` | Empty string |
|
||
|
||
The CO TRACER scraper parsed the website's form elements and navigation chrome as contribution records. The research knows Meta spent $45M+ through ATEP and millions on federal lobbying, but the expenditure table contains zero usable data.
|
||
|
||
**Impact:** HIGH — Any claim about PAC contribution amounts is unsupported by the database. The "8 records, $0 total" finding is entirely artifactual.
|
||
|
||
**Recommended fix:** Rewrite the TRACER HTML parser to correctly identify the data table rows vs. navigation elements. Consider using the TRACER data export (CSV download) instead of HTML scraping.
|
||
|
||
---
|
||
|
||
## HIGH — Significant data quality or logical issues
|
||
|
||
### H1. Massive Entity Duplication
|
||
|
||
**Location:** `entities` table (18 duplicated entities, 36 excess rows)
|
||
|
||
18 entities appear exactly twice with slightly different `notes` fields, suggesting the entity loading script was run twice without deduplication:
|
||
|
||
- Organizations: Meta Platforms Inc, Facebook Inc, Meta Platforms Technologies LLC, Digital Childhood Alliance, ATEP, CCME, ConnectSafely, ICMEC
|
||
- People: Casey Stefanski, Melissa McKay
|
||
- Legislators: 4 unnamed entries
|
||
- Vendors: Yoti, Jumio, Veriff, AgeKey/K-ID
|
||
|
||
**Recommended fix:** Add `INSERT OR IGNORE` or `ON CONFLICT` deduplication to the entity loader. Deduplicate existing records.
|
||
|
||
---
|
||
|
||
### H2. FOIA Requests Duplicated
|
||
|
||
**Location:** `foia_requests` table (6 records representing 3 actual requests)
|
||
|
||
All 3 FOIA requests are duplicated:
|
||
|
||
1. CO Secretary of State (×2)
|
||
2. CO Attorney General (×2)
|
||
3. LA Board of Ethics (×2)
|
||
|
||
**Recommended fix:** Deduplicate. Add unique constraint on `(agency, subject)`.
|
||
|
||
---
|
||
|
||
### H3. Findings Table Contains Massive Duplication and HTML Artifacts
|
||
|
||
**Location:** `findings` table
|
||
|
||
- Horizon OS compliance findings appear **4 times each**
|
||
- Other key findings appear 3–6 times
|
||
- 7 findings contain raw `<!DOCTYPE html>` markup from failed hearing transcript parsing
|
||
- One finding (hearing transcript keyword match on HTML boilerplate) was duplicated **6 times**
|
||
|
||
**Recommended fix:** Deduplicate findings by `(task_id, finding_text)` composite key. Filter out findings where the text starts with `<!DOCTYPE` or `<html`.
|
||
|
||
---
|
||
|
||
### H4. Missing Lobbying Data for Key States
|
||
|
||
**Location:** `lobbying_records` table, data collection scripts
|
||
|
||
Two critical gaps in lobbying data:
|
||
|
||
**California:**
|
||
- AB-1043 was signed into law on October 13, 2025
|
||
- California is Meta's headquarters state
|
||
- Zero California lobbying records were collected
|
||
- CalAccess was only searched for DCA, not for Meta
|
||
- This is the single most suspicious gap in the dataset
|
||
|
||
**Texas:**
|
||
- SB-2420 was enacted then enjoined (legal challenge)
|
||
- Zero Meta lobbying registrations found
|
||
- Given Meta's lobbying intensity in CO (8 lobbyists) and LA (7 lobbyists), zero TX activity is anomalous
|
||
- Possible explanations: different registration system, lobbying under a different entity name, or genuine absence
|
||
|
||
**Recommended fix:** Run CalAccess scraper for Meta Platforms. Check Texas Ethics Commission for Meta registrations under all known entity name variants.
|
||
|
||
---
|
||
|
||
### H5. Bill Status Inconsistencies
|
||
|
||
**Location:** `bills` table
|
||
|
||
Mixed status formats across records:
|
||
|
||
| Bill | Status Value | Expected |
|
||
|-------------|-------------|-------------|
|
||
| CA AB-1043 | `"4"` | `"enacted"` |
|
||
| LA HB-570 | `"1"` | `"enacted"` |
|
||
| CO SB26-051 | `"passed-senate"` | OK |
|
||
| UT SB-142 | (timeline shows regression from "enacted" to "passed") | Data error |
|
||
|
||
LegiScan returns numeric status codes while targets.yaml uses text labels. The bill collector doesn't normalize between these formats. The UT timeline regression suggests a status update overwrote the correct value with an earlier status.
|
||
|
||
**Recommended fix:** Map LegiScan numeric codes to text labels. Add status validation that prevents regression (enacted → passed).
|
||
|
||
---
|
||
|
||
### H6. "Contact Us" Parsed as Entity
|
||
|
||
**Location:** `entities` table, entity #49
|
||
|
||
An entity named **"Contact Us"** with notes `"AVPA member."` exists in the database. This is a scraping artifact — a website navigation button was parsed as an organization name by the AVPA member scraper.
|
||
|
||
This entity appears in 3 findings claiming it was found in legislative testimony.
|
||
|
||
**Recommended fix:** Delete entity #49. Add a blocklist of common navigation strings (`Contact Us`, `Learn More`, `Sign Up`, etc.) to the entity parser.
|
||
|
||
---
|
||
|
||
## MEDIUM — Substantive patterns requiring investigation
|
||
|
||
### M1. Meta's Asymmetric Lobbying Position on SB26-051
|
||
|
||
**Location:** `data/raw/co_sos_meta_lobbying.csv`
|
||
|
||
On Colorado SB26-051 (Age Attestation on Computing Devices), Meta's position is **"Monitoring"** — passive observation without attempting to amend. This is the **only current child-safety bill where Meta is not actively amending**.
|
||
|
||
Pattern analysis of all Meta CO lobbying positions:
|
||
|
||
| Year | Bill | Topic | Initial Position | Final Position |
|
||
|------|------|-------|-----------------|----------------|
|
||
| 2024 | HB24-1058 | Protections for Minors | Monitoring | Amending |
|
||
| 2024 | HB24-1130 | Online Child Safety | Monitoring | Amending |
|
||
| 2024 | HB24-1136 | Online Child Safety | Monitoring | Amending |
|
||
| 2024 | SB24-158 | Child Safety | Monitoring | Amending |
|
||
| 2024 | SB24-205 | Consumer AI | Monitoring | Amending |
|
||
| 2025 | HB25-1026 | Digital Content | Amending | Amending |
|
||
| 2025 | SB25-024 | Online Safety | Amending | Amending |
|
||
| **2026** | **SB26-051** | **Age Attestation** | **Monitoring** | **Monitoring** |
|
||
|
||
**Significance:** SB26-051 mandates OS-level age attestation — a requirement that Horizon OS already partially meets (83.3% compliance per the research's own analysis) while Linux-based systems score 13.9%. If the bill structurally advantages Meta's platform, passive monitoring is consistent with **tacit support** — Meta benefits from the bill passing without needing to amend it.
|
||
|
||
---
|
||
|
||
### M2. Louisiana Lobbying Surge Synchronized with Age Verification Legislation
|
||
|
||
**Location:** `data/raw/la_ethics_meta_lobbying.csv`
|
||
|
||
Meta's Louisiana lobbying force grew **3.5x** over the research period:
|
||
|
||
- 2023: 2 lobbyists
|
||
- 2024: 4 lobbyists
|
||
- 2025: 5 lobbyists
|
||
- 2026: 7 lobbyists
|
||
|
||
Key timing:
|
||
- Two lobbyists (Caffery III and Harbison) registered on the **exact same day**: September 29, 2025 — both paid.
|
||
- 5 new lobbyists were added in the 12 months surrounding HB-570's legislative cycle.
|
||
- The simultaneous registration of two new paid lobbyists suggests a coordinated expansion of Meta's influence capacity in Louisiana timed to the age verification legislative push.
|
||
|
||
---
|
||
|
||
### M3. Digital Childhood Alliance (DCA) Rapid Mobilization
|
||
|
||
**Location:** WHOIS data, DCA OSINT results, `entities` table
|
||
|
||
Timeline of DCA emergence:
|
||
|
||
| Date | Event |
|
||
|------|-------|
|
||
| 2024-12-18 | Domain `digitalchildhoodalliance.org` registered (Cloudflare privacy) |
|
||
| 2025-05-05 | Paid lobbyist (Koch) registered in Louisiana |
|
||
| 2025-Q2 | Twitter, X, and LinkedIn profiles created |
|
||
| 2026-03-11 | OSINT investigation finds zero results for named representatives |
|
||
|
||
Anomalous characteristics:
|
||
- **4.5 months** from domain creation to paid lobbyist registration
|
||
- DCA representatives Casey Stefanski and Melissa McKay have **zero online search hits**
|
||
- DCA operates a paid lobbyist in the same state where Meta has 7 lobbyists
|
||
- Domain registrant information fully redacted (Cloudflare privacy service)
|
||
- Organization name uses generic credibility-signaling language ("Digital," "Childhood," "Alliance")
|
||
|
||
**UPDATE (2026-03-11):** Meta–DCA funding **confirmed** by Deseret News (Dec 2025). Louisiana Senator Jay Morris pressed DCA Exec. Dir. Casey Stefanski on tech company funding during testimony. The coordination question is now confirmed; the remaining question is the mechanism (direct funding vs. pass-through).
|
||
|
||
---
|
||
|
||
### M4. Headwaters Strategies Principal Chairs Arabella Dark Money Network
|
||
|
||
**Location:** OSINT investigation, New Venture Fund board page, InfluenceWatch
|
||
|
||
**Adam Eichberg**, co-founder of Headwaters Strategies (Meta's Colorado lobbying firm) and registered Meta lobbyist (EICHBERG, ADAM in CO SOS records), holds the following positions:
|
||
|
||
- **Chair of the Board** of the **New Venture Fund** (NVF) — part of the Arabella Advisors pass-through network
|
||
- **Founding board member** of the **Windward Fund** — another Arabella entity
|
||
- **Incoming board chair** of **Sunflower Services** — the entity that acquired Arabella Advisors in late 2025
|
||
|
||
The Arabella network (NVF, Windward Fund, Sixteen Thirty Fund, etc.) is one of the largest "dark money" infrastructures in US politics, designed to incubate advocacy projects with donor anonymity.
|
||
|
||
**Significance:**
|
||
- Meta's principal Colorado lobbyist sits at the center of an infrastructure purpose-built for anonymous donor-funded advocacy
|
||
- The Center for Secure and Modern Elections (CSME), a New Venture Fund project, is also a Headwaters client — governance overlap
|
||
- While no direct link between Arabella and DCA has been found, the NVF infrastructure could theoretically be used to fund or coordinate entities like DCA without direct paper trails
|
||
- This finding does not prove Meta uses Arabella for age-verification advocacy, but it establishes that Meta's lobbyist has the capability and infrastructure to do so
|
||
|
||
**Sources:** newventurefund.org/who-we-are/board-of-directors/adam-eichberg-chair-of-the-board/; influencewatch.org/person/adam-eichberg/; headwatersstrategies.com/our-team/
|
||
|
||
---
|
||
|
||
### M5. Headwaters Firm = Individual Lobbyists (Double-Counting)
|
||
|
||
**Location:** CO SOS lobbying disclosures, Headwaters Strategies website
|
||
|
||
The 4 individuals registered as Meta lobbyists in Colorado — COYNE, WILLIAM C; EICHBERG, ADAM; Schmidt, Alyson; Burkhart, Amber Janelle — are the principals and staff of Headwaters Strategies, Inc. The CO SOS data lists them both individually and as a firm, creating the appearance of "4 lobbyists + 1 firm = 5 entities" when it is actually one 4-person team.
|
||
|
||
**Implication:** Any analysis counting "unique lobbyists" in Colorado should treat Headwaters Strategies and its 4 named individuals as a single unit, not 5 separate entities. The effective Colorado Meta lobbying force is 8 individuals (4 Headwaters + 4 others: Diers, Martinez, Sachs, plus overlap).
|
||
|
||
---
|
||
|
||
### M6. SB24-205 Date Paradox — Advance Knowledge of Amendment Strategy
|
||
|
||
**Location:** `data/raw/co_sos_meta_lobbying.csv`
|
||
|
||
For Colorado SB24-205 (Consumer Protections for Artificial Intelligence):
|
||
|
||
- Lobbyist Ana Martinez registered an **"Amending"** position on **April 19, 2024**
|
||
- All other Meta lobbyists began **"Monitoring"** on **May 6, 2024** — 17 days later
|
||
- One lobbyist was already amending before the rest even started monitoring
|
||
|
||
Possible interpretations:
|
||
1. **Advance coordination:** Martinez received amendment instructions before the monitoring team was activated, suggesting a staged lobbying deployment.
|
||
2. **Date recording error:** The CO SOS system recorded the wrong date.
|
||
3. **Independent action:** Martinez acted independently of the monitoring team.
|
||
|
||
Interpretation #1 is consistent with the pattern observed in M1 (sophisticated lobbying position management).
|
||
|
||
---
|
||
|
||
### M7. Meta Client Name Fragmentation Across Lobbying Records
|
||
|
||
**Location:** `lobbying_records` table, CO and LA lobbying CSVs
|
||
|
||
Meta appears under **6 different client name variants**:
|
||
|
||
1. `Meta Platforms, Inc`
|
||
2. `Meta Platforms, Inc.`
|
||
3. `Meta Platforms, INC.`
|
||
4. `Meta Platforms, Inc. (META)`
|
||
5. `Facebook Inc.`
|
||
6. `Facebook Inc` (no period)
|
||
|
||
Two lobbyists (Sachs, Dan and Martinez, Ana) maintain **separate "Facebook Inc." registrations** alongside their Meta registrations. This could indicate:
|
||
- Lobbying on behalf of distinct legal entities
|
||
- Legacy registrations not updated after the Facebook→Meta rebrand
|
||
- Intentional fragmentation to complicate public records analysis
|
||
|
||
**Recommended investigation:** Cross-reference which bills each client name is associated with to determine if the fragmentation maps to different policy areas.
|
||
|
||
---
|
||
|
||
### M8. Zero Cross-State Lobbyist Overlap
|
||
|
||
**Location:** CO and LA lobbying records
|
||
|
||
Meta uses entirely separate lobbying teams in Colorado (8 people) and Louisiana (7 people) with **zero personnel overlap**. This is unusual for a company of Meta's scale — most large tech companies use at least some shared government affairs personnel across state efforts.
|
||
|
||
**Implication:** The use of entirely local teams in each state:
|
||
- Makes it harder to trace coordination through personnel analysis
|
||
- Suggests Meta may be deliberately structuring its state lobbying to avoid creating paper trails of multi-state strategy
|
||
- Alternatively, may reflect standard practice of hiring local firms in each state
|
||
|
||
---
|
||
|
||
### M9. Bill Similarity Methodology Fundamentally Flawed
|
||
|
||
**Location:** Bill similarity analysis, `data/raw/bills/`
|
||
|
||
The bill text comparison produced a maximum similarity score of **0.042** (effectively zero), but the methodology has fatal flaws:
|
||
|
||
1. LA HB-570 text is binary PDF data (wrong bill — see C3)
|
||
2. CO SB26-051 text is raw website HTML, not extracted bill text
|
||
3. No text normalization or extraction was performed before comparison
|
||
4. Only 5 of 13+ target bills were collected — too small a sample
|
||
5. TF-IDF or similar vectorization on mixed binary/HTML/text data is meaningless
|
||
|
||
**Recommended fix:** Extract clean plaintext from all bill documents. Re-collect LA HB-570. Add IL and NY bills. Use section-by-section comparison rather than whole-document similarity.
|
||
|
||
---
|
||
|
||
## LOW — Data quality issues and minor inconsistencies
|
||
|
||
### L1. 96% of Sources Lack Verification Hashes
|
||
|
||
**Location:** `sources` table
|
||
|
||
82 of 85 source records have `content_hash = NULL`. Only 3 have hashes (2 patents + 1 other). Without content hashes, source material cannot be verified for changes or tampering. This undermines the reproducibility of the research.
|
||
|
||
---
|
||
|
||
### L2. Duplicate Timeline Events
|
||
|
||
**Location:** `timeline_events` table
|
||
|
||
The Phoronix article about Colorado's age attestation bill (2026-03-09) appears twice with identical text and date. Other events may also be duplicated.
|
||
|
||
---
|
||
|
||
### L3. Bill Status Monitor Reports False Changes
|
||
|
||
**Location:** `output/reports/bill_status_*.md`
|
||
|
||
5 of 7 "status changes" reported on 2026-03-11 are transitions from empty string to `"unknown"` — initial data population, not real legislative status changes. The monitor should distinguish between first-seen records and actual status transitions.
|
||
|
||
---
|
||
|
||
### L4. Cached Files Contain Authentication Errors
|
||
|
||
**Location:** `data/raw/` (cache directory)
|
||
|
||
4 of 1,477 cached files contain "Not Logged In" error pages from websites requiring authentication. 1 file is empty (0 bytes). These represent fetched data that was stored as if valid.
|
||
|
||
---
|
||
|
||
### L5. Infrastructure Tasks Marked Complete With No Findings
|
||
|
||
**Location:** `tasks` table (S1–S5)
|
||
|
||
Setup tasks S1 through S5 are marked `"complete"` with zero associated findings. These are infrastructure/setup tasks that may legitimately produce no research findings, but the pattern should be verified.
|
||
|
||
---
|
||
|
||
### L6. Consolidated Report Contains Triplicated Sections
|
||
|
||
**Location:** `output/reports/consolidated_findings.md` (lines 168–180)
|
||
|
||
The Horizon OS compliance findings section repeats the same 3 findings three times verbatim. This is caused by the entity and findings duplication in the database (H1, H3) propagating to report generation.
|
||
|
||
---
|
||
|
||
### L7. OpenXR "67% Contribution Claim" Source Unidentified
|
||
|
||
**Location:** `output/reports/consolidated_findings.md`
|
||
|
||
The report states: *"67% contribution claim assessment: NOT VERIFIED"* but Meta's actual measured share is 28.7% of total OpenXR extensions. The source of the "67%" claim is never identified, attributed, or cited. It's unclear whether this was a Meta public statement, a media report, or a miscalculation.
|
||
|
||
---
|
||
|
||
## Recommended Priority Actions
|
||
|
||
### Immediate (before citing any conclusions)
|
||
|
||
1. ~~**Fix kernel signoff regex**~~ **FIXED 2026-03-12** — Reran analysis using offline reference data (LWN kernel development reports). Meta now shows **5.5% share** (9,361 tags across 45,630 commits in v6.10–v6.13). Below LWN's 12.5% claim, suggesting concentration in specific subsystems (BPF).
|
||
2. ~~**Discard AOSP Gerrit report**~~ **FIXED 2026-03-12** — Report marked as RETRACTED with explanation. Data preserved for audit trail but flagged as invalid.
|
||
3. **Re-collect LA HB-570** — Identify correct session, re-fetch correct bill. **IN PROGRESS** (research agent running).
|
||
4. ~~**Discard PAC expenditure data**~~ **FIXED 2026-03-12** — Deleted all 5 $0 artifact records from lobbying_expenditures table.
|
||
|
||
### Short-term (improve data quality)
|
||
|
||
5. ~~Deduplicate entities, findings, FOIA requests, and timeline events.~~ **FIXED 2026-03-12** — Removed 18 duplicate entities, 32 duplicate findings, 7 HTML artifact findings, 3 duplicate FOIA requests, 1 duplicate timeline event, and "Contact Us" scraping artifact.
|
||
6. Collect California lobbying data for Meta (CalAccess). **TODO**
|
||
7. ~~Normalize bill status values (LegiScan codes → text labels).~~ **FIXED 2026-03-12** — Mapped numeric codes to text labels. Fixed CA AB-1043 and LA HB-570 to "enacted".
|
||
8. ~~Remove "Contact Us" and other scraping artifact entities.~~ **FIXED 2026-03-12** — Removed.
|
||
|
||
### Medium-term (strengthen analysis)
|
||
|
||
9. Re-collect bill texts with proper text extraction (not raw HTML/PDF). **TODO**
|
||
10. Add content hashes to all source records. **TODO**
|
||
11. Investigate SB24-205 date paradox through FOIA or alternative records. **TODO** (CO SOS FOIA response due 2026-03-13)
|
||
12. Cross-reference Meta client name variants to identify entity-specific lobbying patterns. **TODO**
|
||
|
||
### OSINT Investigation (added 2026-03-12)
|
||
|
||
13. NVF 990 Schedule I grant recipient analysis — **IN PROGRESS** (research agent running)
|
||
14. CO SOS DCA incorporation records — **IN PROGRESS** (research agent running)
|
||
15. DCA DNS/WHOIS infrastructure analysis — **IN PROGRESS** (research agent running)
|
||
16. CO General Assembly witness lists for age verification bills — **IN PROGRESS** (research agent running)
|
||
17. Headwaters Strategies expenditure reports — **IN PROGRESS** (research agent running)
|
||
18. Wayback Machine DCA website history — **IN PROGRESS** (research agent running)
|
||
19. Arabella network 990 filings (Sixteen Thirty Fund, ConnectSafely, Windward, Hopewell) — **IN PROGRESS** (research agent running)
|