App Idea: RSS De-duplicator

I frequently think up ideas for applications, and I’m going to start posting them here. I like brainstorming even if I don’t decide to pursue development.

Problem: For quite some time I’ve used Craigslist RSS feeds to search for things I’m interested in: apartments, computer parts, etc. What drives me nuts is over-posters, the users who delete and repost their listing as often as every single day. Recently I ran into a case where the poster used huge, high-resolution photos, which were unnecessary for the junky old switch they were selling. If I’m not interested in an item, I generally don’t want to see it again.

Solution: RSS De-duplicator
Features:

  • Multi-user application
  • Users create custom RSS proxy feeds with de-duping filters applied
  • Feeds can be an aggregate of multiple external feeds
  • New RSS feeds have configurable de-duping
    • Configurable time frame that determines how long to look for duplicates of each new post
    • Configurable threshold that assigns match status (e.g., 90% of words and/or word order match)
  • Ability for users to review which posts have been marked as duplicates
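The matching rules above could be prototyped roughly as below. This is a minimal sketch under my own assumptions: "words match" is approximated by the Jaccard overlap of the two word sets, "word order match" by the ratio from Python's `difflib.SequenceMatcher` over the word sequences, and the names `Post`, `is_duplicate`, `find_original`, and `require_both` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher


@dataclass
class Post:
    # Hypothetical minimal post record; a real feed item carries more fields.
    title: str
    body: str
    published: datetime


def words(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric runs so punctuation doesn't matter.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_overlap(a: list[str], b: list[str]) -> float:
    # "Words match": Jaccard similarity of the two word sets, 0.0 to 1.0.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def word_order(a: list[str], b: list[str]) -> float:
    # "Word order match": similarity of the word sequences, 0.0 to 1.0.
    # autojunk=False stops difflib from ignoring frequent words in long posts.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_duplicate(new: Post, old: Post, window: timedelta,
                 threshold: float = 0.9, require_both: bool = False) -> bool:
    # Only look back as far as the configured time frame.
    age = new.published - old.published
    if not (timedelta(0) <= age <= window):
        return False
    a = words(f"{new.title} {new.body}")
    b = words(f"{old.title} {old.body}")
    overlap_ok = word_overlap(a, b) >= threshold
    order_ok = word_order(a, b) >= threshold
    # require_both=True means "and"; False means "or".
    return (overlap_ok and order_ok) if require_both else (overlap_ok or order_ok)


def find_original(new: Post, history: list[Post], window: timedelta,
                  threshold: float = 0.9, require_both: bool = False) -> Post | None:
    # Return the first earlier post that the new one duplicates, if any.
    for old in history:
        if is_duplicate(new, old, window, threshold, require_both):
            return old
    return None
```

Reversing a post's words keeps the word overlap at 1.0 but drops the order score to 0.25, so in this sketch `require_both` is the knob that separates "same words" from "same text".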

Implementation thoughts:

  • A website would provide the control panel and RSS feed proxy
  • To avoid scraping sites too often, the proxy would not connect to external feeds, only to the local database cache
  • A service application scrapes configured sites on a schedule and stores feed information in the database
  • Proxy RSS feeds generate filtered results from the local database
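The split between the scraping service and the proxy could look roughly like this. It is a sketch under assumptions of my own: SQLite as the local database, RSS 2.0 output built with Python's standard `xml.etree.ElementTree`, and a made-up schema and helper names (`items`, `feed_sources`, `duplicates`, `store_item`, `mark_duplicate`, `render_feed`). The `feed_sources` table is one way to let a proxy feed aggregate several external feeds. The scraper would call `store_item` on its schedule and `mark_duplicate` whenever the matcher flags a post; the proxy only calls `render_feed`, so serving a feed never touches the external site.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema for the local cache shared by the scraper and the proxy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    guid       TEXT PRIMARY KEY,  -- item id from the external feed
    source_url TEXT NOT NULL,     -- external feed the item came from
    title      TEXT NOT NULL,
    link       TEXT NOT NULL,
    published  TEXT NOT NULL      -- ISO-8601 timestamp, sorts correctly as text
);
CREATE TABLE IF NOT EXISTS feed_sources (
    feed_id    INTEGER NOT NULL,  -- a user's proxy feed
    source_url TEXT NOT NULL,     -- one of the external feeds it aggregates
    PRIMARY KEY (feed_id, source_url)
);
CREATE TABLE IF NOT EXISTS duplicates (
    feed_id      INTEGER NOT NULL,  -- per feed, since thresholds differ per feed
    guid         TEXT NOT NULL,
    duplicate_of TEXT NOT NULL,     -- guid of the earlier matching post
    PRIMARY KEY (feed_id, guid)
);
"""


def store_item(db, source_url, guid, title, link, published):
    # Scraper side: cache a scraped item; re-scraping the same guid is a no-op.
    db.execute("INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?)",
               (guid, source_url, title, link, published))


def mark_duplicate(db, feed_id, guid, duplicate_of):
    # Scraper side: record the matcher's verdict so users can review it later.
    db.execute("INSERT OR REPLACE INTO duplicates VALUES (?, ?, ?)",
               (feed_id, guid, duplicate_of))


def render_feed(db, feed_id, title, link):
    # Proxy side: build RSS 2.0 from the local cache only, never the external site.
    rows = db.execute(
        """SELECT i.title, i.link, i.guid
           FROM items i
           JOIN feed_sources s ON s.source_url = i.source_url
           WHERE s.feed_id = ?
             AND NOT EXISTS (SELECT 1 FROM duplicates d
                             WHERE d.feed_id = s.feed_id AND d.guid = i.guid)
           ORDER BY i.published DESC""",
        (feed_id,),
    ).fetchall()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"De-duplicated feed {feed_id}"
    for item_title, item_link, guid in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "guid", isPermaLink="false").text = guid
    return ET.tostring(rss, encoding="unicode")
```

Keeping duplicate verdicts in their own table, keyed by feed, rather than deleting the posts, is what would make the "review what has been marked" feature cheap: it becomes a single query on `duplicates`.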
This entry was posted in General.
