Scraping & Product Data
Switchyard scrapes product data from 7 retailers daily. This guide explains the data flow, architecture, and how to use the Scraped Products dashboard.
Supported Retailers
| Retailer | Scraper | Store ID Format |
|---|---|---|
| HEB | heb_scraper.py | heb-{store_number} |
| Walmart | walmart_scraper.py | walmart-{store_number} |
| Target | target_scraper.py | target-{store_number} |
| Costco | costco_scraper.py | costco-{warehouse_number} |
| Whole Foods | whole_foods_scraper.py | whole_foods-{store_id} |
| Central Market | central_market_scraper.py | central_market-{store_id} |
| Trader Joe’s | trader_joes_scraper.py | trader_joes-{store_id} |
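As a minimal sketch of the Store ID Format column, the helper below joins a retailer key and a store number into a store ID. The function name `build_store_id` and the validation step are assumptions for illustration; the scrapers themselves may build IDs differently.

```python
# Retailer keys are taken from the "Store ID Format" column above.
# build_store_id() is a hypothetical helper, not part of the documented scrapers.
RETAILER_KEYS = {
    "heb",
    "walmart",
    "target",
    "costco",
    "whole_foods",
    "central_market",
    "trader_joes",
}


def build_store_id(retailer: str, location: object) -> str:
    """Return a store ID such as 'heb-101' for a known retailer key."""
    if retailer not in RETAILER_KEYS:
        raise ValueError(f"unknown retailer: {retailer!r}")
    return f"{retailer}-{location}"
```

For example, `build_store_id("heb", 101)` yields `heb-101`, and an unrecognized retailer key raises `ValueError` instead of producing an ID that matches no scraper.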
Data Flow
What Scrapers Collect
Product Data (source_products)
On first discovery of a new barcode, scrapers create a product record:
- name - Product name from retailer
- barcode - UPC/EAN (canonical identifier)
- brand - Brand name
- image_url - Primary product image
- size - Package size (e.g., “16 oz”)
- size_uom - Unit of measure
- category_id - Mapped to Goods taxonomy
- description - Product description
Availability Data (goods_retailer_mapping)
Updated on every scrape run:
- store_name - Which retailer (heb, walmart, etc.)
- retailer_location_id - Specific store ID
- store_item_id - Retailer’s internal SKU
- store_location - Aisle location (e.g., “A12”, “Aisle 5”)
- is_active - Currently available at this store
- last_seen_at - Timestamp of last successful scrape
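The per-run availability update can be sketched as a pure function over one goods_retailer_mapping row. This is only an illustration: the function name `apply_scrape_result` is hypothetical, and the rule that an item missing from a run is marked inactive while keeping its old last_seen_at is an assumption, not documented behavior.

```python
from datetime import datetime, timezone


def apply_scrape_result(mapping, seen, store_location=None, now=None):
    """Return an updated copy of one goods_retailer_mapping row.

    Assumption: an item not seen in this run becomes inactive and keeps
    its previous last_seen_at and store_location.
    """
    now = now or datetime.now(timezone.utc)
    updated = dict(mapping)  # copy so the caller's row is not mutated
    updated["is_active"] = seen
    if seen:
        updated["last_seen_at"] = now
        if store_location is not None:
            updated["store_location"] = store_location
    return updated
```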
Pricing Data (goods_retailer_pricing)
New record inserted when price changes:
- price - Effective price
- list_price - Regular price
- sale_price - Sale price (if on sale)
- is_on_sale - Currently on sale flag
- price_per_unit - Unit pricing (e.g., $0.25/oz)
- effective_from - When this price started
- effective_to - When this price ended (null = current)
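The insert-on-change rule can be sketched as a comparison between the current price row and the freshly scraped values. The helper name `new_price_record` and the choice of which fields count as a "price change" are assumptions. How the previous row's effective_to gets closed is not described in this guide, so the sketch leaves it out.

```python
from datetime import datetime, timezone

# Assumption: a change in any of these fields counts as a price change.
PRICE_FIELDS = ("price", "list_price", "sale_price", "is_on_sale")


def new_price_record(current, scraped, now=None):
    """Return a row to insert if the scraped price differs, else None.

    `current` is the row whose effective_to is null (or None if the
    product has no price at this store yet); `scraped` holds new values.
    """
    if current is not None and all(
        current.get(field) == scraped.get(field) for field in PRICE_FIELDS
    ):
        return None  # unchanged price: nothing to insert
    now = now or datetime.now(timezone.utc)
    return {**scraped, "effective_from": now, "effective_to": None}
```

Returning `None` for an unchanged price keeps the table append-only and avoids writing one row per scrape run.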
Write Separation
Scrapers have limited write access to protect data integrity:
| Table | Scraper Permissions |
|---|---|
| source_products | Create new products only. Cannot update existing core attributes. |
| goods_retailer_mapping | Full control - update availability and location on every run |
| goods_retailer_pricing | Insert only - new price records, never update/delete |
| goods_product_attributes | No access - admin-only curation decisions |
| inventory | No access - admin-only RFC stock management |
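One way to picture the permission table is as a lookup that a write layer consults before touching a table. The operation names and the `scraper_may` helper are hypothetical, and treating "full control" as insert, update, and delete is an assumption. The real enforcement may live in database grants or elsewhere.

```python
# Permission matrix transcribed from the table above.
# Operation names and scraper_may() are hypothetical, for illustration only.
SCRAPER_PERMISSIONS = {
    "source_products": {"insert"},
    "goods_retailer_mapping": {"insert", "update", "delete"},
    "goods_retailer_pricing": {"insert"},
    "goods_product_attributes": set(),
    "inventory": set(),
}


def scraper_may(table: str, operation: str) -> bool:
    """Return True if a scraper is allowed to run `operation` on `table`."""
    # Unknown tables default to no access.
    return operation in SCRAPER_PERMISSIONS.get(table, set())
```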
Scraped Products Dashboard
The admin dashboard includes a Scraped Products page for monitoring all scraped data.
List View
Displays all 114K+ products with:
- Product image and name
- Barcode (UPC)
- Category
- Number of retailers carrying the product
- Lowest current price across retailers
- Last seen timestamp (data freshness)
- Whether product is already sellable
Detail View
Shows a single product with:
Retailer Comparison Table:
| Retailer | Price | Sale | Unit Price | Aisle | Available | Last Seen |
|---|---|---|---|---|---|---|
| HEB | $4.99 | - | $0.31/oz | A12 | Yes | 2 hrs ago |
| Walmart | $4.47 | $3.97 | $0.25/oz | A5 | Yes | 1 hr ago |
| Target | $5.29 | - | $0.33/oz | B23 | No | 3 days ago |
Running Scrapers
Scrapers can be run from the admin dashboard or via CLI.
From Admin Dashboard
- Navigate to Scrapers in the sidebar
- Select a retailer and store
- Click Run
From CLI
Data Freshness
Scrapers update last_seen_at on every successful run. The dashboard shows freshness indicators:
| Status | Meaning |
|---|---|
| Fresh | Seen within 24 hours |
| Recent | Seen within 7 days |
| Stale | Not seen for 7+ days |
Discontinued products are recorded with is_active = false and deactivation_reason = 'DISCONTINUED'.
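The thresholds in the freshness table can be expressed as a small classifier over last_seen_at. The function name `freshness_status` is hypothetical. The exact boundary handling (just under 24 hours is Fresh, exactly 7 days is Stale) is an assumption read from the "7+ days" wording.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_seen_at, now=None):
    """Classify a timezone-aware last_seen_at as Fresh, Recent, or Stale."""
    now = now or datetime.now(timezone.utc)
    age = now - last_seen_at
    if age < timedelta(hours=24):
        return "Fresh"
    if age < timedelta(days=7):
        return "Recent"
    return "Stale"  # not seen for 7+ days
```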
Category Mapping
Scrapers map retailer-specific categories to the Goods taxonomy. The category_mapping.py module handles this translation for each retailer.
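A translation of this kind could look roughly like the lookup below. Every entry, the category names, and the `map_category` function are hypothetical placeholders. The actual contents and interface of category_mapping.py are not shown in this guide.

```python
# Hypothetical entries: real retailer categories and Goods category IDs
# will differ from these placeholders.
CATEGORY_MAP = {
    ("heb", "Dairy & Eggs"): "dairy",
    ("walmart", "Dairy, Eggs & Cheese"): "dairy",
    ("target", "Snacks"): "snacks",
}


def map_category(retailer, retailer_category):
    """Return the Goods category for a retailer category, or None if unmapped."""
    key = (retailer.lower(), retailer_category.strip())
    return CATEGORY_MAP.get(key)
```

Returning `None` for an unmapped category lets the caller decide whether to skip the product or queue it for review.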
Troubleshooting
Product not appearing
- Check if barcode is valid (6+ characters)
- Verify scraper completed successfully
- Check goods_retailer_mapping for the product/store combo
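The barcode rule from the first check can be tested in isolation. This sketch applies only the stated 6+ character rule. The function name `is_valid_barcode` and the handling of missing or blank values are assumptions, and the scrapers may apply stricter validation.

```python
def is_valid_barcode(barcode):
    """Apply the 6+ character rule; None or blank values are rejected."""
    return barcode is not None and len(str(barcode).strip()) >= 6
```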
Price not updating
- New prices are inserted, not updated
- Check the effective_to IS NULL filter for current prices
- Verify the scraper is writing to the correct store ID
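To see why the effective_to IS NULL filter matters, the sketch below builds a throwaway in-memory SQLite table and reads back only the current price. The `barcode` and `retailer_location_id` key columns, the sample rows, and the `current_price` helper are assumptions for illustration. The real goods_retailer_pricing schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: only the columns needed for the filter.
conn.execute(
    """CREATE TABLE goods_retailer_pricing (
           barcode TEXT,
           retailer_location_id TEXT,
           price REAL,
           effective_from TEXT,
           effective_to TEXT
       )"""
)
conn.executemany(
    "INSERT INTO goods_retailer_pricing VALUES (?, ?, ?, ?, ?)",
    [
        # Historical price: effective_to is set, so it is not current.
        ("012345678905", "heb-101", 4.99, "2024-01-01", "2024-02-01"),
        # Current price: effective_to is NULL.
        ("012345678905", "heb-101", 4.49, "2024-02-01", None),
    ],
)


def current_price(conn, barcode, store_id):
    """Return the current price for a product/store pair, or None."""
    row = conn.execute(
        "SELECT price FROM goods_retailer_pricing "
        "WHERE barcode = ? AND retailer_location_id = ? "
        "AND effective_to IS NULL "
        "ORDER BY effective_from DESC LIMIT 1",
        (barcode, store_id),
    ).fetchone()
    return row[0] if row else None
```

Without the `effective_to IS NULL` condition the query could return a historical row, which looks like a price that never updates.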
Missing aisle location
Not all retailers provide aisle information. Check the store_location field:
- HEB: Usually available
- Walmart: Available via store-specific API
- Target: Available via store-specific API
- Others: May be limited or unavailable