I frequently think up ideas for applications and I’m going to start entering them here. I like brainstorming even if I don’t decide to pursue development.
Problem: I’ve used Craigslist RSS feeds for quite some time to search for things I’m interested in: apartments, computer parts, etc. What drives me nuts is over-posters, the users who delete and repost their listings as often as every single day. Recently I ran into a case where the user attached huge high-resolution photos that were unnecessary for the junky old switch they were selling. If I’m not interested in an item, I generally don’t want to see it again.
Solution: RSS De-duplicator
Features:
- Multi-user application
- Users create custom RSS proxy feeds with de-duping filters applied
- Feeds can be an aggregate of multiple external feeds
- New RSS feeds have configurable de-duping
- Configurable time frame that determines how long to look for duplicates of each new post
- Configurable threshold that assigns match status (e.g. 90% words and/or word order match)
- Ability for users to review which posts have been marked as duplicates
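To make the time frame and threshold features a bit more concrete, here is a rough Python sketch of how the matching could work. Everything in it is an assumption on my part: the `Post` fields, the `split_duplicates` name, the 7-day default window, and the choice of `difflib.SequenceMatcher` over word lists as a stand-in for "words and/or word order match". A real version would probably need smarter text normalization.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher


@dataclass
class Post:
    # Hypothetical minimal shape of a feed item.
    title: str
    body: str
    posted_at: datetime


def words(post):
    """Lowercased word list for a post's title and body."""
    return f"{post.title} {post.body}".lower().split()


def similarity(a, b):
    """Score in [0, 1] that rewards shared words appearing in the same order."""
    return SequenceMatcher(None, words(a), words(b), autojunk=False).ratio()


def split_duplicates(posts, window=timedelta(days=7), threshold=0.9):
    """Return (kept, duplicates).

    A post is marked duplicate when it scores at or above `threshold`
    against an earlier kept post that falls within `window` of it.
    """
    kept, duplicates = [], []
    for post in sorted(posts, key=lambda p: p.posted_at):
        is_dup = any(
            post.posted_at - earlier.posted_at <= window
            and similarity(post, earlier) >= threshold
            for earlier in kept
        )
        (duplicates if is_dup else kept).append(post)
    return kept, duplicates
```

Returning the duplicates as their own list, rather than dropping them, is what would let users review what got filtered out.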
Implementation thoughts:
- A website would provide the control panel and RSS feed proxy
- To avoid scraping sites too often, the proxy would not connect to external feeds, only to the local database cache
- Service application scrapes configured sites on a schedule and stores feed information in database
- Proxy RSS feeds generate filtered results from local database
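The split between the scraping service and the proxy could look something like the Python sketch below. It is only an illustration under my own assumptions: SQLite as the cache, a made-up `items` table, and a `fetch` callable standing in for whatever actually downloads and parses an external feed. The point is that `render_proxy_feed` only ever reads from the local database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_cache(path=":memory:"):
    """Open the local cache. The table layout here is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " link TEXT PRIMARY KEY, feed_url TEXT, title TEXT,"
        " is_duplicate INTEGER DEFAULT 0)"
    )
    return db


def scrape(db, feed_url, fetch):
    """Service side, run on a schedule.

    `fetch` is a placeholder callable returning (link, title) pairs
    for one external feed. Items already in the cache are skipped.
    """
    for link, title in fetch(feed_url):
        db.execute(
            "INSERT OR IGNORE INTO items (link, feed_url, title) VALUES (?, ?, ?)",
            (link, feed_url, title),
        )
    db.commit()


def render_proxy_feed(db, feed_urls, title="De-duplicated feed"):
    """Proxy side: build RSS from the cache only, never from the external site.

    Simplified output: a complete RSS 2.0 channel would also carry
    link and description elements.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    marks = ",".join("?" for _ in feed_urls)
    rows = db.execute(
        "SELECT link, title FROM items"
        f" WHERE is_duplicate = 0 AND feed_url IN ({marks}) ORDER BY rowid",
        list(feed_urls),
    )
    for link, item_title in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

Passing several URLs in `feed_urls` covers the aggregate-feed case, and the `is_duplicate` flag is where the de-duping pass would record its decisions.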