Seminar «Data Analysis in Python» from May 03 to May 04, 2017
After this course you will be able to process, summarize and visualize tabular data efficiently using the pandas library.
When | May 03, 2017 09:00 AM to May 04, 2017 05:00 PM |
---|---|
Where | Veit Schiele Communications GmbH, Mansteinstr. 7, D-10783 Berlin |
Contact Name | Veit Schiele |
Contact Phone | +49 30 8185667-1 |

Target Audience
Analysts, researchers and engineers who would like to handle larger data sets more efficiently.
Prerequisites
Basic knowledge of Python
Course Description
The pandas Python library is a practical everyday tool for the analysis of tabular data. This course improves your skills in working with datasets ranging from a few dozen to several million entries in Python. The course uses hands-on examples to cover exploratory data analysis, extracting relevant summaries and creating attractive diagrams. The integration of pandas with interactive environments like IPython and Jupyter will allow you to answer many questions quickly and back up the answers with data.
Course Duration
2 days
Course Outline
Day 1 | Day 2 |
---|---|
Introduction to pandas | Aggregation |
Data Wrangling | Analyzing Time Series |
Summarizing Data | Geographical Data |
Data Visualization | pandas Best Practices |
Day 1
Introduction to pandas
- your environment for interactive data analysis
- overview of the pandas library
- Series
- DataFrames
- Improvements in Python 3
- Jupyter Notebooks
Data Wrangling
- reading CSV and Excel files into pandas
- sorting data
- transposing tables
- selecting rows and columns
- saving pandas tables
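To give an impression of these topics, here is a minimal sketch. It is not taken from the course material; the sample data and the column names `station`, `year` and `reading` are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data; real CSV or Excel files would be read the same way
# (pd.read_csv takes a path or a file-like object, pd.read_excel reads Excel files).
csv_text = """station,year,reading
alpha,2016,12.5
beta,2016,30.0
gamma,2016,21.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sorting: largest reading first
by_reading = df.sort_values("reading", ascending=False)

# Selecting rows with a boolean mask and columns by label
selected = df.loc[df["reading"] > 20, ["station", "reading"]]

# Transposing: rows become columns and vice versa
transposed = df.T

# Saving: without a path, to_csv returns the CSV text instead of writing a file
csv_out = by_reading.to_csv(index=False)
print(csv_out)
```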
Summarizing Data
- extracting statistical metrics
- merging tables
- hierarchical indexing
- cross-tabulations
- pivot tables
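The summarizing topics above might look roughly like the following sketch. The two small tables and all names in them (`sales`, `managers`, `region`, `product`, `units`) are invented for this example and are not part of the course material.

```python
import pandas as pd

# Invented example tables
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["a", "b", "a", "b"],
    "units": [10, 20, 30, 40],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Kim", "Lee"],
})

# Statistical metrics: count, mean, std, min, quartiles, max
stats = sales["units"].describe()

# Merging two tables on a shared column
merged = sales.merge(managers, on="region", how="left")

# Hierarchical indexing: two index levels, addressed with a tuple
indexed = sales.set_index(["region", "product"])
north_b = indexed.loc[("north", "b"), "units"]

# Cross-tabulation: how often each combination occurs
counts = pd.crosstab(sales["region"], sales["product"])

# Pivot table: one row per region, one column per product
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")
print(pivot)
```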
Data Visualization
- creating diagrams with matplotlib
- using matplotlib from within pandas
- visualizing data in Jupyter notebooks
- heatmaps
- multi-panel diagrams
- creating high-quality figures
- other libraries for visualizing data
Day 2
Aggregation
- iterating over rows and columns
- grouping
- aggregation functions
- transformation functions
- applying your own functions
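As a rough illustration of the aggregation topics, here is a short sketch under assumptions: the data, the column names `group` and `value`, and the helper `value_range` are hypothetical and only serve as an example.

```python
import pandas as pd

# Hypothetical data with two groups
df = pd.DataFrame({
    "group": ["x", "x", "y", "y"],
    "value": [1.0, 3.0, 10.0, 30.0],
})

# Iterating over rows (possible, but usually slower than vectorized operations)
total = sum(row.value for row in df.itertuples(index=False))

# Grouping and built-in aggregation functions: one result row per group
grouped = df.groupby("group")["value"]
summary = grouped.agg(["mean", "max"])

# Transformation: the result keeps the shape of the original column
centered = df["value"] - grouped.transform("mean")


def value_range(values):
    """Hypothetical helper: spread between largest and smallest value."""
    return values.max() - values.min()


# Applying your own function to each group
ranges = grouped.apply(value_range)
print(summary)
print(ranges)
```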
Analyzing Time Series
- series of timestamps
- rescaling time series
- changing timezones
- handling data with gaps
- rolling means
- simple predictions
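Several of the time series topics can be sketched in a few lines. The values below are invented, and the choice of linear interpolation, a 60-minute resampling interval and a window of three points is an assumption made for this example only.

```python
import numpy as np
import pandas as pd

# Invented series of timestamps at 30-minute intervals with one missing value
index = pd.date_range("2017-05-03 09:00", periods=6, freq="30min")
ts = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=index)

# Handling gaps: linear interpolation between neighbouring values
filled = ts.interpolate()

# Changing the time resolution: mean per 60-minute interval
hourly = filled.resample("60min").mean()

# Rolling mean over three consecutive points
rolling = filled.rolling(window=3).mean()

# Time zones: attach a zone to naive timestamps, then convert
berlin = filled.tz_localize("Europe/Berlin")
utc = berlin.tz_convert("UTC")
print(hourly)
```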
Geographical Data
- storing coordinates in pandas
- drawing maps with Basemap
pandas Best Practices
- myths and facts
- NumPy
- machine learning models in scikit-learn
- alternative libraries and modeling strategies
- handling huge datasets
- do's and don'ts