Tag: Datasets

The Ultimate List of SQL Datasets for Data Scientists

SQL (Structured Query Language) is a foundational tool for data scientists, enabling them to interact with and analyze vast datasets efficiently. In this article, we’ve compiled the ultimate list of SQL datasets that every data scientist should know about. Whether you’re honing your SQL ...

Removing Unequal Data Distribution Bias from Datasets for Binomial Classification

In the realm of machine learning, achieving accurate and reliable results often hinges on the quality of the dataset being used. One common challenge that arises in binary classification tasks is unequal data distribution bias. When one class significantly outnumbers the other, the model tends to fa...

EasyPortrait: Face Parsing and Portrait Segmentation Dataset

Apps with video calls have grown in popularity in recent years. Many use them daily for work, school, or to keep in touch with friends and family. As a result, video conferencing software has rapidly gained functionality, adding new features based on Machine Learning models to their eco...

How to Boost Pandas Speed And Process 10M-row Datasets in Milliseconds

“Great… another article on how to make Pandas n times faster.” I think I have said that countless times in the three years I have been using Pandas. The most recent one I saw said, “make Pandas 71,803 times faster”. But I won’t give you that...

Mastering Imbalanced NLP Datasets

Natural Language Processing (NLP) has found applications in various domains, including sentiment analysis, chatbots, and content moderation. One common challenge in NLP projects is dealing with imbalanced datasets, where one class of data significantly outnumbers the other. In this blog, we’ll...

Hashing in Spark/Databricks: A Faster Way to Find New Records in Large Datasets

“Hey Bob, how’s it going with comparing those two gigantic datasets?” Mike yells across the cubicles. “Still at it. The row-by-row comparison is killing me, and my coffee supply,” Bob replies, visibly stressed. Mike chuckles, “Well, have you ever tried MD5 hashing?...

Recognizing the Standard Discrimination in Datasets

Much of the public discourse about tech originally focused on how the outcomes of tech tools, e.g., software, affected people. Then, the conversation turned to algorithms and how they are designed to be exclusionary. Now, we’re having more conscious discussions about the data, both input and o...

15 Best Open-Source Autonomous Driving Datasets

In recent years, more and more companies and research institutions have made their autonomous driving datasets open to the public. However, the best datasets are not always easy to find, and scouring the internet for them takes time. To help, we at SiaSearch have put together a list of ...

3 Self-Driving Car Datasets for Deep Learning Research

Are you interested in autonomous driving and want to study 2D/3D object detection & tracking, lane/drivable area segmentation, semantic/instance segmentation, self-localization and scene flow estimation for self-driving cars, but don’t know where to start? Then continue reading, and you ...