
A Job Scraper

Searching for a job

While I was searching for a job, I got tired of the different job platforms that were available.
I was also not satisfied with the filtering options the websites provided.

Instead of searching for a job I could...!

So I began to write a little scraper to scrape the important data from different sites and put it into a sqlite3 database.

I implemented a little config file where I can enter the URLs of the different job sites, which turned out to be a great idea. I am by no means a professional, but I implemented it in a way that lets me add new sites quickly.
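To give an idea of what such a config file can look like, here is a minimal sketch using Python's standard configparser module. The INI format, the site names, the URLs and the keys (url, item_selector) are assumptions made up for illustration, not the actual file from this project.

```python
import configparser

# Hypothetical config: one section per job site. The site names, URLs and
# keys below are invented for illustration.
EXAMPLE_CONFIG = """
[example-jobs]
url = https://jobs.example.com/search?q=python
item_selector = div.job

[another-board]
url = https://board.example.org/listings
item_selector = li.listing
"""


def load_sites(config_text):
    """Return a list of dicts, one per configured site."""
    # interpolation=None so that '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    sites = []
    for name in parser.sections():
        section = parser[name]
        sites.append({
            "name": name,
            "url": section["url"],
            "item_selector": section["item_selector"],
        })
    return sites


if __name__ == "__main__":
    for site in load_sites(EXAMPLE_CONFIG):
        print(site["name"], "->", site["url"])
```

With a layout like this, adding a new site means adding one more section to the file, as long as the site fits the same scraping pattern.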

It was my first time using sqlite3 and I was amazed how easy it was to integrate with Python code. It was also my first time using Python intensively. I chose Python because of the fast development time and the availability of a very good HTML parser called beautifulsoup.
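As a rough illustration of how little code the sqlite3 side needs, here is a small sketch. The table layout, the column names and the idea of skipping duplicates via a UNIQUE constraint on the URL are my assumptions for this example and not necessarily how the project's database is structured.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id      INTEGER PRIMARY KEY,
            site    TEXT NOT NULL,
            title   TEXT NOT NULL,
            region  TEXT,
            url     TEXT NOT NULL UNIQUE
        )
        """
    )
    return conn


def save_jobs(conn, jobs):
    """Insert scraped jobs; rows whose URL is already stored are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO jobs (site, title, region, url) "
            "VALUES (:site, :title, :region, :url)",
            jobs,
        )


def jobs_in_region(conn, region):
    """Example of a simple filter query by region."""
    rows = conn.execute(
        "SELECT title, url FROM jobs WHERE region = ? ORDER BY title",
        (region,),
    )
    return rows.fetchall()
```

Each scraped job is passed in as a plain dict with the keys site, title, region and url, so the parser output can go into the database without any extra mapping layer.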

I wrote the whole scraper as a command-line tool and used sqlitebrowser for browsing the data. For me it is important to be able to use software without a GUI. But I wanted more.

I love the command line but ... I need a GUI for this job

So I read up on pyside/qt, because it gets fast results. It is easy to integrate a webview and easy to integrate a sqlite database. So to me it felt more like gluing together the different libraries than coding. But it was a nice learning curve.

I ended up with a GUI for the database, some preconfigured filters, such as a regional filter, and a webview on the right side.

I wanted to trigger the scraper from the GUI and had to read up on QThread, which took some time to understand but worked out great in the end.

Struggles

Some sites require a login (because of premium content), so I had to ask the user for the password. The script detects whether it is run from the GUI or from the command line. On the command line this was no problem: just do an x = input("PW").
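The command-line branch can be sketched roughly like this. How the script really detects the GUI is not shown here, so the gui_prompt callback below is a hypothetical stand-in for that check. I also swapped input() for getpass.getpass() from the standard library in this sketch, because it does not echo the password to the terminal.

```python
import getpass


def ask_password(site, gui_prompt=None):
    """Ask for the password of `site`.

    `gui_prompt` is a hypothetical callback supplied by the GUI. When it is
    None we assume the script runs on the command line.
    """
    message = f"Password for {site}: "
    if gui_prompt is not None:
        return gui_prompt(message)
    # getpass behaves like input() but does not echo the typed characters.
    return getpass.getpass(message)
```

The scraping code only ever calls ask_password() and does not need to know which front end is answering.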

For the GUI version I ran a QDialog which prompts for the password. This worked until I implemented threading. It turns out only the main thread should create and run GUI elements, so I had to rewrite it. But it turned out to work quite well with signalling back and forth between the main thread and the worker thread. In the end I can run the scraper in the background without stalling the GUI.
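The back-and-forth idea can be shown without Qt. The following is a standard-library analog using threading and queue.Queue, not the PySide code from this project: the requests queue plays the role of a signal from the worker to the main thread, and the replies queue plays the role of the signal going back. All function and variable names are hypothetical.

```python
import queue
import threading


def scrape_worker(requests, replies, results):
    """Runs in the background thread; never touches the GUI itself."""
    # Ask the main thread for the password instead of opening a dialog here.
    requests.put("example-site")
    password = replies.get()  # waits in the worker thread for the answer
    # Placeholder for the real login and scraping.
    results.put(f"logged in with a {len(password)}-character password")
    requests.put(None)  # tell the main thread we are done


def run_scraper(prompt):
    """Main-thread side: answer password requests until the worker is done.

    `prompt` stands in for the dialog that only the main thread may show.
    """
    requests, replies, results = queue.Queue(), queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=scrape_worker, args=(requests, replies, results)
    )
    worker.start()
    while True:
        site = requests.get()
        if site is None:
            break
        replies.put(prompt(site))
    worker.join()
    return results.get()
```

In a real GUI the event loop takes the place of the while loop, so the window keeps reacting while the worker waits for its answer.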

Conclusion

It was an awesome journey; I learned a lot about Python, sqlite3, Qt and HTML. Of course I could have done a lot of things differently and better, but I'm quite happy with the result. I will put the results on my gitea.

For another project I would use a different language, because:
  • a. I don't like the indentation-based style
  • b. my code got confusing very fast, which I tried to fix by splitting and reworking the .py files a lot.