Basic introduction to tabular data analysis with Python or R
Before starting to work with single-cell data, we think it is useful to try our hand at a simpler example dataset. The goal of this exercise is to enable you to
- set up a workspace for data analysis
- load and inspect tabular datasets
- perform basic data transformations like standardization
- filter and subset data based on conditions
- create basic visualizations
The dataset we will use is the penguins dataset, a common example dataset for data analysis and ML methods.
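To make the goals above concrete, here is a minimal Python sketch of loading, inspecting, filtering and standardizing a small table. It is an illustration under stated assumptions, not the course's solution: the five rows are made-up values standing in for the penguins data, the column names are assumed, and the helpers `load_rows` and `standardize` are hypothetical names. It uses only the standard library so that it runs without extra installs; data-frame libraries such as pandas provide the same operations more conveniently.

```python
import csv
import io
import statistics

# Hypothetical mini-table standing in for the penguins data (values are made up).
RAW = """species,bill_length_mm,body_mass_g
Adelie,39.0,3700
Adelie,38.0,3500
Gentoo,47.0,5200
Gentoo,49.0,5400
Chinstrap,,3800
"""

NUMERIC_COLUMNS = ("bill_length_mm", "body_mass_g")


def load_rows(text):
    """Parse CSV text into a list of dicts; numeric columns become float, or None if empty."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col]) if row[col] else None
        rows.append(row)
    return rows


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Load and inspect: how many rows, which columns?
rows = load_rows(RAW)
print(len(rows), "rows; columns:", list(rows[0].keys()))

# Filter: keep only rows where the bill length was recorded.
complete = [r for r in rows if r["bill_length_mm"] is not None]

# Transform: standardize body mass so it has mean 0 and standard deviation 1.
body_mass_z = standardize([r["body_mass_g"] for r in complete])

# Subset on a condition: Gentoo penguins heavier than 5000 g.
heavy_gentoo = [
    r for r in complete
    if r["species"] == "Gentoo" and r["body_mass_g"] > 5000
]
```

Dropping incomplete rows before standardizing is one possible choice here; whether missing values should be removed or handled differently depends on the analysis.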
Yes, you can (and will) use AI chatbots, but please consider:
AI chatbots make it very easy to write a lot of code fast, but progressing faster than you can understand is dangerous and not sustainable.
Therefore, we recommend the following practices throughout the course (and beyond):
- Think before you consult AI - what are you trying to accomplish?
- Use AI as a coding tutor rather than a coder, e.g. ask “I want to normalize my count data, what do I need to consider?” rather than just asking for the code.
- Type out the code rather than just copy-pasting it.
  - This forces you to slow down and focus on each command. It should also help you remember commands better, so that over time you can write code more freely and quickly.
- Don’t run code you do not understand.
  - You must be able to take full responsibility for your results, especially as a scientist.
  - In the worst case, running copied code without checking it can expose you to malicious attacks.
- Try “old school” web searches from time to time to find answers - if available, use the official package documentation to solve problems.
Exercises
We have prepared a set of exercises, which are available as a Jupyter notebook for Python users and as an R Markdown file for R users. Please download them and write your answers in the prepared spaces.