Datawaza

Datawaza is a collection of tools for data exploration, visualization, data cleaning, pipeline creation, model iteration, and evaluation. It builds upon core libraries like pandas, matplotlib, seaborn, and scikit-learn.

Modules

Explore

Quickly explore and visualize your data.

Clean

Clean your data and engineer features.

Model

Create pipelines, iterate and evaluate models.

Tools

Additional utilities and helper functions.
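The Model module's description (create pipelines, iterate and evaluate models) can be illustrated with plain scikit-learn, one of the libraries Datawaza builds on. The sketch below is not Datawaza's own API. The helper `iterate_models`, the bundled diabetes dataset, and the two candidate regressors are hypothetical choices made only for this example.

```python
# Generic scikit-learn sketch of "create pipelines, iterate and evaluate models".
# NOT the Datawaza API: iterate_models is a hypothetical helper for illustration.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def iterate_models(models, X_train, X_test, y_train, y_test):
    """Fit one scaler+model pipeline per candidate and return its test-set R^2."""
    results = {}
    for name, model in models.items():
        pipe = Pipeline([("scaler", StandardScaler()), ("model", model)])
        pipe.fit(X_train, y_train)
        # Pipeline.score applies the fitted scaler, then calls the final
        # estimator's score method (R^2 for scikit-learn regressors).
        results[name] = pipe.score(X_test, y_test)
    return results


# The diabetes dataset ships with scikit-learn, so no download is needed.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scores = iterate_models(
    {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)},
    X_train, X_test, y_train, y_test,
)
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Wrapping the scaler and the estimator in one Pipeline means the scaler is fitted only on the training data, so test-set statistics do not leak into the model.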

Installation

The latest releases are available on PyPI. Install Datawaza with pip:

pip install datawaza

See the Change Log for a history of changes.
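To confirm the install, or to compare your version against the Change Log, you can query the installed version with the standard library's `importlib.metadata` (Python 3.8+). This is a generic sketch, not a Datawaza feature, and the helper name `installed_version` is hypothetical.

```python
# Report the installed version of a distribution using only the standard library.
# installed_version is a hypothetical helper, not part of Datawaza.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version("datawaza")
print(f"datawaza {v}" if v else "datawaza is not installed; run: pip install datawaza")
```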

User Guide

The User Guide is a Jupyter notebook that walks through how to use the Datawaza functions. It’s probably the best place to start; after that, you can consult the function specifications, organized by module above.

Source Code

You can find the Datawaza repo on GitHub. Please submit any issues there. It’s distributed under the GNU General Public License. Contributions are welcome!

What is Waza?

Waza (技) means “technique” in Japanese. In martial arts such as Aikido, it is combined with other words, as in “suwari-waza” (sitting techniques) or “kaeshi-waza” (reversal techniques). So we’ve paired it with “data” to represent data science techniques: データ技 “data-waza”.

Origin Story

Most of these functions were created while I was pursuing a Professional Certificate in Machine Learning & Artificial Intelligence <https://em-executive.berkeley.edu/professional-certificate-machine-learning-artificial-intelligence> from U.C. Berkeley. With every assignment, I tried to simplify repetitive tasks and streamline my workflow. The functions served me well, so I’m publishing this library in the hope that it may help others.

Reference