OLX Scrapy in PyCharm
Project details
The project is a Scrapy scraper, built in PyCharm, with fully working settings (proxy and the other parts configured in the Scrapy settings).
Testing is required before final approval for production.
Project description:
I need all data from this site:
https://www.olx.ba/pretraga?id=18
That means all the data given for the cars and for the persons who created the posts. The full data for each car is on the second level, the detail page opened from the search results. Here is an example:
https://www.olx.ba/r/44743851/413389
The code needs to be able to export the data in Excel, JSON or CSV format, and it needs to be written in Scrapy, in PyCharm.
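For the export requirement, a minimal sketch of a settings.py fragment using Scrapy's FEEDS setting is shown here. The output paths and the field names are placeholders, not the final field list. Scrapy has no built-in Excel exporter, so this sketch assumes the CSV file is opened in or imported into Excel.

```python
# settings.py fragment (sketch). Paths and field names are assumptions.
EXPORT_FIELDS = ["listing_id", "title", "price", "seller_name"]  # hypothetical names

FEEDS = {
    # One JSON file with all scraped items.
    "output/cars.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,
    },
    # One CSV file with a fixed column order, suitable for Excel or SQL import.
    "output/cars.csv": {
        "format": "csv",
        "fields": EXPORT_FIELDS,
        "overwrite": True,
    },
}
```

With "overwrite" set to True, each run replaces the previous files, which matches the requirement for current data with no history.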
All fields need to be created, and if a value or a field does not exist, "N/A" needs to be inserted. This is because the data will be inserted into an SQL database and exported to Excel. That means the JSON or CSV output needs to be structured as a table, with the field names as the keys.
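The "N/A" rule can be sketched as a small helper that forces every record into the same fixed set of columns. The field names below are hypothetical; the real list would come from the example file. Only the Python standard library is used, so the same function could be called from a Scrapy item pipeline.

```python
import csv
import io

# Hypothetical field list; the real one comes from the example file.
FIELDS = ["listing_id", "title", "price", "year", "seller_name"]


def normalize(raw, fields=FIELDS, missing="N/A"):
    """Return a row with exactly `fields` as keys; empty or absent values become `missing`."""
    row = {}
    for name in fields:
        value = raw.get(name)
        if value is None or str(value).strip() == "":
            row[name] = missing
        else:
            row[name] = str(value).strip()
    return row


def to_csv(rows, fields=FIELDS):
    """Render rows as CSV text with a header line and a fixed column order."""
    buffer = io.StringIO(newline="")  # newline="" keeps the csv module's own line endings
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for raw in rows:
        writer.writerow(normalize(raw, fields))
    return buffer.getvalue()
```

For example, normalize({"title": " Golf ", "price": ""}) returns a row where "title" is "Golf" and every other field is "N/A". When writing to a real file on Windows, opening it with newline="" avoids blank lines between rows.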
After you finish coding, I will test the code and pay for your work.
The project is Scrapy code in PyCharm with full functionality, and I need to test it before we agree that it is functional and ready for production.
I need all car data from this site.
It is a sales platform, like Amazon, and I need to track car sales with all details of the cars and of the persons who created the sale posts.
I will send an attachment as an example.
After you send me the code, I will run it in PyCharm and then check all fields and the data quality for approx. 75,000 cars with approx. 20 fields each.
I already have other parts in my PyCharm project, so this is only one piece of the puzzle. For that reason, the final project code needs to have the appropriate settings and all parts needed for my testing and for running in production on Windows systems.
As I already said, the website is a sales platform like Amazon, and I need to get the data as it is at the moment I run the code, with no history.
A bot needs to be created so that the telecom provider or the website does not block the scraping: define an appropriate USER_AGENT in the Scrapy settings and use a proxy.
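As a starting point, the settings side of this could look like the sketch below. The setting names are standard Scrapy settings, but every value (the browser string, delays, retry count) is an assumption that would need tuning during testing.

```python
# settings.py fragment (sketch). All values are assumptions to be tuned.

# Identify as a regular desktop browser instead of the default Scrapy agent.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Slow the crawl down so it looks less like a burst of automated traffic.
DOWNLOAD_DELAY = 1.0
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Retry failed requests a few times before giving up on a page.
RETRY_ENABLED = True
RETRY_TIMES = 3
```

For a single fixed proxy, Scrapy's built-in HttpProxyMiddleware also picks up the http_proxy and https_proxy environment variables.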
Examples of the fields can be downloaded in the project.
Note for you:
To avoid IP blocking, you need to integrate proxies and a user agent with the scraper.
Without that you can't extract data from OLX.
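One possible way to integrate both is a small downloader middleware that picks a proxy and a user agent per request. This is only a sketch: the class name and the PROXY_LIST and USER_AGENT_LIST settings are hypothetical names introduced here, and the proxy addresses must be supplied by whoever runs the project. It relies on Scrapy's built-in HttpProxyMiddleware, which reads the proxy from request.meta["proxy"].

```python
import random


class RotatingProxyUserAgentMiddleware:
    """Hypothetical downloader middleware: random proxy and user agent per request."""

    def __init__(self, proxies, user_agents):
        self.proxies = list(proxies)
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST and USER_AGENT_LIST are custom settings assumed by this sketch.
        return cls(
            proxies=crawler.settings.getlist("PROXY_LIST"),
            user_agents=crawler.settings.getlist("USER_AGENT_LIST"),
        )

    def process_request(self, request, spider=None):
        if self.proxies:
            # The built-in HttpProxyMiddleware uses this key to route the request.
            request.meta["proxy"] = random.choice(self.proxies)
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # continue normal processing of the request


# Enabling it in settings.py (module path and addresses are placeholders):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RotatingProxyUserAgentMiddleware": 400,
# }
# PROXY_LIST = ["http://proxy1.example.com:8000", "http://proxy2.example.com:8000"]
# USER_AGENT_LIST = ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."]
```

The priority 400 is chosen so that this middleware runs before Scrapy's default user-agent and proxy middlewares, which then leave the already set values in place.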