Data Science today :: Where do you or your skills fit in?

Data Science and developers.

In today’s data science mania, here is a quick view of how data moves from its raw stage to a useful report. As a developer, you can find where you or your skills fit in this flow.

Figure: dataviz flow, showing how data gets transformed from raw input to a stunning report.

Image courtesy: O'Reilly

1. Source / Scrape : Today's data source could be anything, from wiki pages to a Splunk query, a server log, or any device that produces data in any format (.csv, .log, .json, .xml, .xls, etc.).

2. Data Cleansing : Pandas is a good option, but it's not the only one. The Community Edition of Pentaho PDI can help with data cleansing. SSIS (authored in SSDT, formerly BIDS) has transformations that help with data cleansing. MySQL with regular expressions can also help clean data.

2a. Database : The database is key. Any RDBMS (such as Microsoft SQL Server, Postgres, MySQL, or Oracle) will do, but my vote is for MySQL 8, as its document store adds NoSQL-style access through an API (the X DevAPI).

3. Explore : Jupyter / Anaconda / IPython + Pandas + Matplotlib is a good combination. Spark with Zeppelin will also work, but setting up the cluster is a lot of work.

4. Deliver : A REST API is one option. Accessing the data through the database, either with an ORM tool such as SQLAlchemy or with direct SQL, is the time-saving option.

5. Transform : D3.js is a powerful tool if you are a JavaScript expert. For others, Pentaho Community Edition is a good alternative. If you have access to Tableau (a paid tool), nothing beats that.
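To make the flow concrete, here is a minimal sketch of steps 1 to 4 in Python. It is only an illustration under a few assumptions: the sample CSV, the column names (host, status, latency_ms) and the function names are all made up, and the standard-library csv and sqlite3 modules stand in for Pandas and MySQL so the snippet runs without any installation. Step 5 (the visual layer) is left out.

```python
import csv
import io
import json
import re
import sqlite3

# Hypothetical raw export; in practice this could be a server log or a .csv file.
RAW_CSV = """host,status,latency_ms
web-1, 200 ,120
web-2,500,n/a
WEB-1,200,80
web-2,200,100
"""


def source(text):
    """Step 1: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Step 2: trim whitespace, lower-case hosts, drop rows with a non-numeric latency."""
    clean = []
    for row in rows:
        latency = row["latency_ms"].strip()
        if not re.fullmatch(r"\d+", latency):
            continue  # skip values such as "n/a"
        clean.append((row["host"].strip().lower(), int(row["status"].strip()), int(latency)))
    return clean


def store(rows):
    """Step 2a: load the cleaned rows into a database (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (host TEXT, status INTEGER, latency_ms INTEGER)")
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def deliver(conn):
    """Steps 3-4: explore with an aggregate query and deliver the result via direct SQL."""
    cursor = conn.execute(
        "SELECT host, COUNT(*) AS hits, AVG(latency_ms) AS avg_latency "
        "FROM requests GROUP BY host ORDER BY host"
    )
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, r)) for r in cursor.fetchall()]


conn = store(cleanse(source(RAW_CSV)))
report = deliver(conn)
print(json.dumps(report, indent=2))
```

The list of dictionaries returned by deliver() is the kind of payload a REST endpoint could serve, or a D3.js or Tableau front end could consume. Swapping the stand-ins for Pandas and a real RDBMS should not change the shape of the flow.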
