Website with data extraction, automation and data showcase

  • Job Duration: 01 to 03 months
  • Project Level: Expensive
  • Project deadline: Expired

Project detail

Objective: A database-driven website that automatically downloads data, recalculates it, updates the available data every 6 months, and displays it on the public site. It is a benchmark that compares self-employed persons based on scraped data.

Main elements of the system:
• Visitor / Admin environment
• Database
• Data acquisition (a total of 10 domains via API and scraping)
• User registration and management
o After registration, users can view more data
o Pair the user’s profile with the benchmarked profile of a specific ID number (after verification, the user’s ID number is paired with the ID number held in the system)
o Users add data to the automation by uploading PDFs (max. 3 types of forms) and answering a questionnaire (several questions)
• Data automation – calculations
o Retrieval from APIs and extraction (scraping and PDFs uploaded by users)
o Update every 6 months (API and Scraping)
o Data presentation (total summary + 1 classification on benchmark page, individually in profiles according to ID number)
• Language versions – the visitor-facing part must be translatable into English, German and Slovak (main language).
• The site must be responsive and the admin environment secured

Procedure of automated system work:
– The scraper downloads data from websites and stores it in a database (updated every 6 months)
– Data recalculations will take place for evaluation (update every 6 months, previous data are kept in the database)
– The overall ranking will be compiled, the ranking by category will be updated, the data for profiles will be updated (according to ID number)
– Profile is a page for a specific ID number with data (partially accessible without registration, extended after registration)
o Overall position
o By position according to SK NACE
o Basic data (name, ID number, address)
o Extended data (after visitor registration and user login)
– A visitor gains access to more detailed information on the profiles after registering as a user
– Benchmark is the page where data are presented (always per half-yearly update)
o overall benchmark
o benchmark by category (SK NACE)
o via URL links, the user can access the more detailed profile of each ID number
– Blog – a place to publish articles
– About the project – information about the project, evaluation procedure, other information needed for the public
– Contact – contact details of the project implementer
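A minimal sketch of the ranking step in the procedure above, assuming each ID number already has one overall score per half-year period. The `Score` fields, the `"2024-H1"` period label, the tie-break by ID number and the sample values are illustrative assumptions, not part of the brief.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    id_number: str  # ID number of the self-employed person
    sk_nace: str    # SK NACE category code
    period: str     # hypothetical half-year label, e.g. "2024-H1"
    total: float    # hypothetical overall score; higher is better


def rank(scores):
    """Return {id_number: position}; position 1 is the best score."""
    # Ties are broken by ID number so the order is deterministic (assumption).
    ordered = sorted(scores, key=lambda s: (-s.total, s.id_number))
    return {s.id_number: pos for pos, s in enumerate(ordered, start=1)}


def build_rankings(scores, period):
    """Compile the overall and per-category rankings for one period."""
    current = [s for s in scores if s.period == period]
    by_category = defaultdict(list)
    for s in current:
        by_category[s.sk_nace].append(s)
    return {
        "overall": rank(current),
        "by_category": {cat: rank(items) for cat, items in by_category.items()},
    }


scores = [
    Score("111", "62.01", "2024-H1", 71.5),
    Score("222", "62.01", "2024-H1", 88.0),
    Score("333", "43.21", "2024-H1", 64.0),
    Score("111", "62.01", "2023-H2", 90.0),  # earlier period: kept, not ranked here
]
rankings = build_rankings(scores, "2024-H1")
```

Because every score carries its period, rows from earlier half-years stay in the data set and each profile page can look up its overall and category position by ID number.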

ADMIN
– Access to export – the complete benchmark for all evaluation periods (to Excel, each evaluation period on its own sheet)
– User management
o Confirm requests to pair an ID profile with a user profile
o Ban user access
o Full user profile (with the option to edit it as administrator)
– Blogging (add/update blogs)
– Language versions (translation management)
– System information
o Data extraction error
o Other error messages
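The pairing confirmation and ban actions listed under user management could be modelled roughly as follows. This is only a sketch: the `User` fields, the status names and the function names are hypothetical, and a real build would keep this state in the database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PairingStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"


@dataclass
class User:
    email: str
    requested_id_number: Optional[str] = None
    paired_id_number: Optional[str] = None
    pairing_status: Optional[PairingStatus] = None
    banned: bool = False


def request_pairing(user, id_number):
    """User side: ask to be paired with a benchmarked ID profile."""
    user.requested_id_number = id_number
    user.pairing_status = PairingStatus.PENDING


def confirm_pairing(user):
    """Admin side: confirm a pending request after verification."""
    if user.pairing_status is not PairingStatus.PENDING:
        raise ValueError("no pending pairing request")
    user.paired_id_number = user.requested_id_number
    user.pairing_status = PairingStatus.CONFIRMED


def ban(user):
    """Admin side: block the user's access."""
    user.banned = True


user = User(email="user@example.com")
request_pairing(user, "12345678")
paired_before_confirmation = user.paired_id_number  # still None at this point
confirm_pairing(user)
```

The profile is only linked once an administrator confirms the request, which matches the "after verification" wording in the list of main elements.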

Benchmark
1. Data level – comprehensive overall evaluation computed from the second data level, shown both overall (without filter) and by SK NACE (filter)
2. Data level – evaluation in groups of indicators, cumulative recalculations of interrelated indicators
3. Data level – indicators: data obtained by the scraper and from APIs, or from forms and documents uploaded by users
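The three data levels can be read as a bottom-up aggregation: indicators feed group scores, and group scores feed the overall evaluation. The sketch below assumes a plain average at each step; the brief does not define the actual recalculation formulas, and the group and indicator names are invented for illustration.

```python
from statistics import mean

# Hypothetical mapping of level-3 indicators to level-2 groups.
GROUPS = {
    "finance": ["revenue_score", "profit_score"],
    "online_presence": ["website_score", "reviews_score"],
}


def group_scores(indicators):
    """Level 2: average the indicators that are available in each group."""
    result = {}
    for group, names in GROUPS.items():
        values = [indicators[n] for n in names if n in indicators]
        if values:  # a group with no data at all is left out
            result[group] = mean(values)
    return result


def overall_score(indicators):
    """Level 1: average of the level-2 group scores, or None without data."""
    groups = group_scores(indicators)
    return mean(list(groups.values())) if groups else None


# Level 3: indicator values from the scraper, APIs or user uploads (sample data).
indicators = {"revenue_score": 80.0, "profit_score": 60.0, "website_score": 50.0}
level_2 = group_scores(indicators)
level_1 = overall_score(indicators)
```

Missing indicators are skipped here rather than counted as zero; whether that is the right treatment depends on the evaluation procedure the project owner defines.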

Data from users (in user profile and during registration)
– Registration (basic data, email verification)
– Questionnaire (data such as a website URL, plus additional information answered as Yes / No / number)
– PDF Documents
o Financial statements – must be uploaded again every year; data must be read from them specifically for the benchmark
o Self-employed ID card / photo (one-time activity)
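Since questionnaire answers are limited to Yes / No / number, they could be normalised before being stored as indicators. This is a sketch under assumptions: the function name is hypothetical, and accepting a decimal comma (common in Slovak input) is a guess about the expected data.

```python
import math


def parse_answer(raw):
    """Normalise a questionnaire answer to bool (Yes/No) or float (number)."""
    text = raw.strip().lower()
    if text in ("yes", "no"):
        return text == "yes"
    try:
        # Assumption: a comma is a decimal separator, so "3,5" means 3.5.
        value = float(text.replace(",", "."))
    except ValueError:
        raise ValueError(f"expected Yes, No or a number, got {raw!r}") from None
    if not math.isfinite(value):
        raise ValueError(f"expected a finite number, got {raw!r}")
    return value
```

For example, `parse_answer("Yes")` returns `True` and `parse_answer("3,5")` returns `3.5`, while free text such as `"maybe"` raises `ValueError`. Thousands separators are not handled by this sketch.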

The solution itself can be built as:
• A CMS website built with Python / Django (RECOMMENDED)
• WordPress with custom-built plugins
