Italian Laws Unigram Viewer on the Edge With Cloudflare Pages

I didn’t know much about search engines or information retrieval. So, as usual, my journey began with a Google search.

That’s where I found the well-known Apache Lucene. I started digging and learning all about it. I discovered that serving a search endpoint requires many other optimizations and preprocessing steps, and that there is a project called Solr that handles them for me.

Solr is a search engine built on top of Apache Lucene, and it comes with sane defaults and a handy HTTP API. The first step is indexing and processing the laws: I used the pysolr client to loop through each law and add it to the index. Each document contains only the text (not stored) and the publication date (stored), which is all I need to reconstruct the term usage plot.
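
The indexing loop can be sketched roughly as follows. This is a minimal illustration, not the exact script I ran: the field names `text` and `publication_date`, the `(id, text, date)` tuple layout, the batch size, and the core name `laws` are all assumptions made for the example. Whether a field is stored or only indexed is decided in the Solr schema, not in this code.

```python
from datetime import date
from itertools import islice
from typing import Iterable, Iterator


def law_to_doc(law_id: str, text: str, published: date) -> dict:
    """Build one Solr document. Field names are hypothetical."""
    return {
        "id": law_id,
        "text": text,
        # Solr date fields expect ISO 8601 timestamps in UTC.
        "publication_date": f"{published.isoformat()}T00:00:00Z",
    }


def batched(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` documents."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def index_laws(solr, laws: Iterable[tuple], batch_size: int = 500) -> int:
    """Send laws to Solr in batches; `solr` is any client with an add() method."""
    total = 0
    for batch in batched((law_to_doc(*law) for law in laws), batch_size):
        solr.add(batch)
        total += len(batch)
    return total


# Usage with pysolr (assumes a local Solr core named "laws"):
#   import pysolr
#   solr = pysolr.Solr("http://localhost:8983/solr/laws", always_commit=True)
#   index_laws(solr, [("law-1", "testo della legge", date(1990, 1, 2))])
```

Sending documents in batches instead of one request per law keeps the number of HTTP round trips low, which matters when the corpus is large.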

Website
