Configuration Guide
This guide walks you through configuring the Health Misinformation Detection Platform for your research environment.
Environment Setup
1. Create Environment File
Copy the example environment file and customize it:
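A minimal sketch of this step in Python. It assumes the example file is named `.env.example` and the target is `.env` in the project root; the example file's name is not stated in this guide, so adjust it to match your checkout. The helper `create_env_file` is hypothetical and not part of the platform.

```python
import shutil
from pathlib import Path


def create_env_file(example: str = ".env.example", target: str = ".env") -> bool:
    """Copy the example environment file to the target path.

    Returns True if a new file was written and False if the target already
    existed (an existing .env is never overwritten).
    NOTE: the default file names are assumptions, not taken from the project.
    """
    src, dst = Path(example), Path(target)
    if dst.exists():
        return False
    if not src.is_file():
        raise FileNotFoundError(f"example environment file not found: {src}")
    shutil.copyfile(src, dst)
    return True


if __name__ == "__main__":
    created = create_env_file()
    print("created .env" if created else ".env already exists, left unchanged")
```

Refusing to overwrite an existing file keeps credentials you have already entered from being replaced by placeholders.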
2. Reddit API Configuration
Reddit API Setup
You'll need to create a Reddit application to get API credentials.
- Go to the Reddit Apps page (https://www.reddit.com/prefs/apps)
- Click "Create App" or "Create Another App"
- Choose "script" as the application type
- Fill in your application details
Add your credentials to .env:
# Reddit API Configuration
REDDIT_CLIENT_ID=your_reddit_client_id
REDDIT_CLIENT_SECRET=your_reddit_client_secret
REDDIT_USER_AGENT="MisinformationResearch/1.0 by YourUsername"
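Before starting a collection run, it can help to confirm that all three values are present and that the placeholders have been replaced. The sketch below reads them from the process environment using only the standard library. It assumes the variables from .env have already been loaded into the environment, and the helper `load_reddit_config` is hypothetical, not part of the platform.

```python
import os

REQUIRED_REDDIT_KEYS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
)
# Placeholder values copied from the example above; they must be replaced.
PLACEHOLDERS = {"your_reddit_client_id", "your_reddit_client_secret"}


def load_reddit_config(env=None) -> dict:
    """Return the Reddit settings, raising ValueError if any are unusable."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED_REDDIT_KEYS if not env.get(k, "").strip()]
    if missing:
        raise ValueError("missing Reddit settings: " + ", ".join(missing))
    unedited = [k for k in REQUIRED_REDDIT_KEYS if env[k].strip() in PLACEHOLDERS]
    if unedited:
        raise ValueError("placeholder values not replaced: " + ", ".join(unedited))
    return {k: env[k].strip() for k in REQUIRED_REDDIT_KEYS}


if __name__ == "__main__":
    try:
        config = load_reddit_config()
        print("Reddit settings found for user agent:", config["REDDIT_USER_AGENT"])
    except ValueError as exc:
        print("Configuration problem:", exc)
```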
3. Database Configuration
Database Required
PostgreSQL with the pgvector extension is required for full functionality.
# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/misinformation_db
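A malformed connection string is a common source of startup errors, so it can be worth checking its parts before connecting. The sketch below splits a `DATABASE_URL` with the standard library's `urllib.parse`. The helper `parse_database_url` is hypothetical, and falling back to port 5432 when none is given is an assumption based on PostgreSQL's default port.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its components and check the basics."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("DATABASE_URL must include a host and a database name")
    return {
        # Credentials containing special characters must be percent-encoded
        # in the URL; unquote() decodes them again here.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumed default when no port is given
        "database": database,
    }


if __name__ == "__main__":
    example = "postgresql://username:password@localhost:5432/misinformation_db"
    info = parse_database_url(example)
    print(info["host"], info["port"], info["database"])
```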
4. Translation Services (Optional)
For multilingual analysis, configure Google Translate:
5. Application Settings
Customize application behavior:
# Gradio Configuration
GRADIO_SHARE=False
GRADIO_PORT=7860
# Data Collection Settings
MAX_POSTS_PER_SUBREDDIT=1000
DATA_COLLECTION_INTERVAL_HOURS=24
# Analysis Settings
MIN_COMMENT_LENGTH=10
MAX_NETWORK_NODES=5000
# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/misinformation_analysis.log
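Environment values are always strings, so settings such as `GRADIO_SHARE=False` and `GRADIO_PORT=7860` need explicit conversion; in Python, `bool("False")` evaluates to `True`. The sketch below shows one way to read these settings with types. The helper `load_app_settings` is hypothetical, and using the values shown above as fallbacks is an assumption, since this guide does not say what the platform's built-in defaults are.

```python
import os

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_app_settings(env=None) -> dict:
    """Read the application settings with explicit type conversion."""
    env = os.environ if env is None else env
    settings = {
        # Fallback values mirror the example above (assumed, not confirmed).
        "gradio_share": _as_bool(env.get("GRADIO_SHARE", "False")),
        "gradio_port": int(env.get("GRADIO_PORT", "7860")),
        "max_posts_per_subreddit": int(env.get("MAX_POSTS_PER_SUBREDDIT", "1000")),
        "data_collection_interval_hours": int(
            env.get("DATA_COLLECTION_INTERVAL_HOURS", "24")
        ),
        "min_comment_length": int(env.get("MIN_COMMENT_LENGTH", "10")),
        "max_network_nodes": int(env.get("MAX_NETWORK_NODES", "5000")),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        "log_file": env.get("LOG_FILE", "logs/misinformation_analysis.log"),
    }
    if not 1 <= settings["gradio_port"] <= 65535:
        raise ValueError("GRADIO_PORT must be between 1 and 65535")
    if settings["log_level"] not in VALID_LOG_LEVELS:
        raise ValueError(f"unknown LOG_LEVEL: {settings['log_level']}")
    return settings


if __name__ == "__main__":
    for key, value in load_app_settings().items():
        print(f"{key} = {value!r}")
```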
6. Vector Database Settings
# pgvector database settings
PGVECTOR_DIMENSION=384
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Semantic analysis settings
SIMILARITY_THRESHOLD=0.7
CLUSTERING_MIN_POSTS=5
PROPAGATION_TIME_WINDOW_HOURS=72
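`PGVECTOR_DIMENSION` should match the output size of the embedding model; `all-MiniLM-L6-v2` produces 384-dimensional vectors, which is why the two values above go together. The sketch below illustrates how a threshold such as `SIMILARITY_THRESHOLD=0.7` could be applied. It assumes the similarity measure is cosine similarity, which this guide does not state, and the helpers `cosine_similarity` and `is_similar` are hypothetical.

```python
import math

PGVECTOR_DIMENSION = 384
SIMILARITY_THRESHOLD = 0.7


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(a, b, threshold=SIMILARITY_THRESHOLD, dimension=PGVECTOR_DIMENSION):
    """True if both embeddings have the configured size and meet the threshold."""
    if len(a) != dimension or len(b) != dimension:
        raise ValueError(f"expected {dimension}-dimensional embeddings")
    # Assumption: similarity is measured with cosine similarity.
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    first = [1.0] * PGVECTOR_DIMENSION
    second = [1.0] * (PGVECTOR_DIMENSION // 2) + [0.0] * (PGVECTOR_DIMENSION // 2)
    print(round(cosine_similarity(first, second), 3), is_similar(first, second))
```

Raising the threshold groups only near-duplicate posts, while lowering it admits looser paraphrases, so changing the embedding model usually means revisiting both the dimension and the threshold.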
Database Setup
PostgreSQL Installation
Database Initialization
- Create Database:
- Run Migrations:
- Verify Setup:
Configuration Validation
Test Your Setup
# Test Reddit API connection
python -c "from src.reddit_scraper import RedditScraper; RedditScraper().test_connection()"
# Test database connection
python -c "from src.data_persistence import DataPersistenceManager; DataPersistenceManager().test_connection()"
# Test translation service (if configured)
python -c "from src.translation_service import TranslationService; TranslationService().test_connection()"
Configuration Troubleshooting
Common Issues
- Reddit API Rate Limits: Use a descriptive, unique user agent (as in REDDIT_USER_AGENT above); generic or default user agents are more likely to be throttled
- Database Connection: Verify PostgreSQL service is running and credentials are correct
- Translation API: Check Google Cloud credentials and API key validity
- Permissions: Ensure the application has write access to data/ and logs/ directories
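The permissions item in the list above can be checked with a short script. This sketch tries to create a temporary file in each directory and reports the result. It assumes `data/` and `logs/` are relative to the directory you run it from, and it creates them if they do not exist; the helper `check_writable` is hypothetical.

```python
import tempfile
from pathlib import Path


def check_writable(directories=("data", "logs"), base=".") -> dict:
    """Map each directory name to True if a file can be created inside it."""
    results = {}
    for name in directories:
        path = Path(base) / name
        try:
            # Side effect: missing directories are created.
            path.mkdir(parents=True, exist_ok=True)
            # The temporary file is removed again when the block exits.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_writable().items():
        print(f"{name}/: {'writable' if ok else 'NOT writable'}")
```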
Next Steps
With your configuration complete, you can: