# Initialize Otter
import otter
grader = otter.Notebook("lab0-gettingstarted.ipynb")
Lab 0: Getting started
This lab is meant to help you familiarize yourself with using the LSIT server and Jupyter notebooks. Some light review of numpy arrays is also included.
Objectives
- Keyboard shortcuts, running cells, and viewing documentation in Jupyter Notebooks
- Review functions, lists, and loops
- Review NumPy arrays: indexing, attributes, and operations on arrays
Collaboration Policy
Data science is a collaborative activity. While you may talk with others about the course assignments, we ask that you write your solutions individually and do not copy them from others.
By submitting your work in this course, whether it is homework, a lab assignment, or a quiz/exam, you agree and acknowledge that this submission is your own work and that you have read the policies regarding Academic Integrity: https://studentconduct.sa.ucsb.edu/academic-integrity. The Office of Student Conduct has policies, tips, and resources for proper citation use, recognizing actions considered to be cheating or other forms of academic theft, and students’ responsibilities. You are required to read the policies and to abide by them.
If you collaborate with others, we ask that you indicate their names on your submission.
Jupyter notebooks
Jupyter notebooks are organized into ‘cells’ that can contain either text or code. For example, this is a text cell.
Technically, Jupyter is an application/interface that runs atop a kernel – a programming-language-specific independent environment in which code cells are executed. This basic organization allows for interactive computing with text integration.
Selecting a cell and pressing Enter will enter edit mode and allow you to edit the cell. From edit mode, pressing Esc will revert to command mode and allow you to navigate the notebook’s cells.
In edit mode, most of the keyboard is dedicated to typing into the cell’s editor. Thus, in edit mode there are relatively few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more. Here are a few useful ones:
- Ctrl + Return: evaluate the current cell
- Shift + Return: evaluate the current cell and move to the next
- Saving the notebook: s
- Basic navigation: up one cell k, down one cell j
- a: create a cell above
- b: create a cell below
- dd: delete a cell
- z: undo the last cell operation
- m: convert a cell to markdown
- y: convert a cell to code
Take a moment to find out what the following commands do:
- Cell editing: x, c, v, z
- Kernel operations: i, 0 (press twice)
# Practice the above commands on this cell
Running Cells and Displaying Output
Run the following cell.
print("Hello, World!")
In Jupyter notebooks, the output of every print statement is displayed below the cell. In addition, the value of the last line of the cell, and only the last line, is displayed when the cell is executed.
"Will this line be displayed?"
print("Hello" + ",", "world!")
5 + 3
Viewing Documentation
To output the documentation for a function, use the help() function.
help(print)
You can also use Jupyter to view function documentation inside your notebook. The function must already be defined in the kernel for this to work.
Below, click your mouse anywhere on print() and use Shift + Tab to view the function’s documentation.
print('Welcome to this course!')
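As a small illustration of where that documentation comes from: both help() and the Shift + Tab pop-up show a function's docstring. The sketch below uses a hypothetical function, rectangle_area, which is not part of this lab.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height

# help() prints the signature and docstring of any function defined in the kernel
help(rectangle_area)

# The same text is stored on the function's __doc__ attribute
print(rectangle_area.__doc__)
```

Writing a docstring for your own functions makes them discoverable in the same way as built-in functions such as print.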
Importing Libraries
In this course, we will be using common Python libraries to help us retrieve, manipulate, and perform operations on data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.
import pandas as pd
import numpy as np
Practice questions and numpy review
Most assignments for this class will be given as notebooks organized into explanation and prompts followed by response cells; you will complete assignments by filling in all of the response cells.
Many response cells are followed by a test cell that performs a few checks on your work. Please be aware that test cells don’t always confirm that your response is correct or incorrect. They are meant to give you some useful feedback, but it’s your responsibility to interpret the feedback – please be sure to read and think about test output if tests fail, and make your own assessment of whether you need to revise your response.
Below are a few practice questions for you to familiarize yourself with the process. These assume familiarity with basic python syntax and the numpy package.
Question 1
Write a function summation that evaluates the following summation for \(n \geq 1\):
\[\sum_{i=1}^{n} \left(i^3 + 5 i^3\right)\]
Hint: np.arange(1, 6).sum() will generate an array comprising \(1, 2, \dots, 5\) and then add up the elements of the array.
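To illustrate the pattern of building an index array and summing it, here is a minimal sketch for a different sum, \(\sum_{i=1}^{n} i^2\). The function name sum_of_squares is hypothetical and this is not the answer to Question 1. Note that np.arange(1, n + 1) is needed to get the integers 1 through n, because np.arange starts at 0 by default and excludes its stop value.

```python
import numpy as np

def sum_of_squares(n):
    """Compute the sum of i^2 for 1 <= i <= n (hypothetical example, not Question 1)."""
    i = np.arange(1, n + 1)   # the integers 1, 2, ..., n
    return (i ** 2).sum()     # square element-wise, then add up

print(sum_of_squares(3))  # 1 + 4 + 9 = 14
```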
def summation(n):
    """Compute the summation i^3 + 5 * i^3 for 1 <= i <= n."""
    ...
grader.check("q1")
Use your function to compute the sum for…
# n = 2
...
# n = 20
...
Question 2
The core of numpy is the array. Let’s use np.array to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets [ and ], such as [1, 5, 3]).
Below, create an array containing the values 1, 2, 3, 4, and 5 (in that order) and assign it the name my_array.
my_array = ...
grader.check("q2")
Numpy arrays are integer-indexed by position, with the first element indexed as position 0. Elements can be retrieved by enclosing the desired positions in brackets [].
my_array[3]
To retrieve consecutive positions, specify the starting index and the ending index separated by :, for instance, arr[from:to]. This syntax is non-inclusive of the right endpoint, meaning that the element at the ending index is not included in the output.
my_array[2:4]
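As a quick sketch of how slice endpoints behave, the example below uses a hypothetical array named letters (not part of this lab). A slice arr[from:to] includes the element at position from and stops just before position to, so it returns to - from elements.

```python
import numpy as np

letters = np.array(['a', 'b', 'c', 'd', 'e'])  # hypothetical example array

print(letters[1:4])   # positions 1, 2, 3 -> ['b' 'c' 'd']
print(letters[:2])    # omitted start defaults to 0 -> ['a' 'b']
print(letters[3:])    # omitted end runs through the last element -> ['d' 'e']
```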
In addition to the values in the array, we can access attributes, such as the array’s shape and data type, by name using syntax of the form array.attr. Some useful attributes are:
- .shape, a tuple with the length of each array dimension
- .size, the total number of elements in the array
- .dtype, the data type of the entries (float, integer, etc.)
A full list of attributes is here.
my_array.shape
my_array.size
my_array.dtype
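For a one-dimensional array, the size and the length of the first dimension coincide, so a two-dimensional example makes the attributes easier to tell apart. The sketch below uses a hypothetical 2-by-3 array named grid.

```python
import numpy as np

grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])  # hypothetical 2-by-3 example

print(grid.shape)  # (2, 3): 2 rows and 3 columns
print(grid.size)   # 6: total number of elements
print(grid.dtype)  # float64
print(len(grid))   # 2: length of the first dimension only
```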
Arrays, unlike Python lists, cannot store items of different data types.
# A regular Python list can store items of different data types
[1, '3']
# Arrays will convert everything to the same data type
np.array([1, '3'])
# Another example of array type conversion
np.array([5, 8.3])
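To see which common type NumPy chose in each case, one option is to inspect the dtype attribute. The variable names mixed and numbers below are hypothetical.

```python
import numpy as np

mixed = np.array([1, '3'])     # integer and string -> every entry becomes a string
numbers = np.array([5, 8.3])   # integer and float  -> every entry becomes a float

print(mixed.dtype.kind)   # 'U' denotes a Unicode string type
print(numbers.dtype)      # float64
print(numbers[0])         # 5.0: the integer 5 is stored as a float
```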
Arrays are also useful in performing vectorized operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.
For example, observe the following:
# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])
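Element-wise arithmetic is not limited to addition. The sketch below, using hypothetical arrays prices and quantities, shows multiplication of two arrays and of an array by a single number.

```python
import numpy as np

prices = np.array([2.0, 3.5, 1.25])   # hypothetical values
quantities = np.array([3, 2, 4])

totals = prices * quantities   # element-wise product: 6.0, 7.0, 5.0
doubled = prices * 2           # the scalar is applied to every element

print(totals)
print(doubled)
print(totals.sum())            # 18.0
```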
Arrays can be subsetted by index position, as shown above, or by a logical vector of the same length. For example:
example_arr = np.arange(4, 10)
example_arr
Suppose we want the last three elements. One option is to use index position:
example_arr[3:6]
Or a logical vector:
example_arr[np.array([False, False, False, True, True, True])]
The latter approach allows one to subset based on a condition defined by the values of the vector. For example, we can use the condition \(x \geq 7\) to obtain the logical vector used above.
example_arr >= 7
And then we can subset just as before:
example_arr[example_arr >= 7]
You’ll see this done frequently, and it’s sometimes referred to as filtering, because we’re selectively removing values.
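Conditions can also be combined before filtering. The sketch below is one way to do this, using a hypothetical array values with the same entries as example_arr; the & operator performs an element-wise "and", and each condition needs its own parentheses.

```python
import numpy as np

values = np.arange(4, 10)   # 4, 5, ..., 9

# Keep entries that are at least 5 AND odd
mask = (values >= 5) & (values % 2 == 1)

print(mask)           # one True/False entry per element of values
print(values[mask])   # the entries 5, 7, 9
print(mask.sum())     # 3, since each True counts as 1
```

Summing a logical vector is a convenient way to count how many entries satisfy a condition.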
Question 3
Given the array random_arr, create an array containing all values \(x\) such that \(2x^4 > 1\). Name the array valid_values.
# for reproducibility - setting the seed will result in the same random draw each time
np.random.seed(42)

# draw 60 uniformly random numbers between 0 and 1
random_arr = np.random.rand(60)

# solution here
valid_values = ...
grader.check("q3")
A note on np.arange and np.linspace
Usually we use np.arange to return an array that steps from a to b with a fixed step size s. While this is fine in some cases, we sometimes prefer to use np.linspace(a, b, N), which divides the interval [a, b] into N equally spaced points.
np.arange(start, stop, step) produces an array with all the numbers starting at start, incremented by step, stopping before stop is reached. For example, the value of np.arange(1, 6, 2) is an array with elements 1, 3, and 5: it starts at 1 and counts up by 2, then stops before 6. np.arange(4, 9, 1) is an array with elements 4, 5, 6, 7, and 8. (It doesn’t contain 9 because np.arange stops before the stop value is reached.)
np.linspace always includes both end points while np.arange will not include the second end point b. For this reason, especially when we are plotting ranges of values, we tend to prefer np.linspace.
Notice how the following two statements have different parameters but return the same result.
np.arange(-5, 6, 1.0)
np.linspace(-5, 5, 11)
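One way to confirm that the two calls agree is to compare them with np.allclose, which checks equality up to floating-point tolerance. The variable names by_step, by_count, and quarters below are hypothetical.

```python
import numpy as np

by_step = np.arange(-5, 6, 1.0)     # fixed step of 1.0; the stop value 6 is excluded
by_count = np.linspace(-5, 5, 11)   # 11 points; both end points are included

print(np.allclose(by_step, by_count))   # True
print(by_step[-1], by_count[-1])        # both arrays end at 5.0

# With np.linspace you choose the number of points rather than the step size
quarters = np.linspace(0, 1, 5)         # 0, 0.25, 0.5, 0.75, 1
print(quarters)
```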
Check your understanding. Will np.arange(1, 10) produce an array that contains 10? Add a cell below and check to confirm your answer.
Submission
- Save the notebook.
- Restart the kernel and run all cells. (CAUTION: if your notebook is not saved, you will lose your work.)
- Carefully look through your notebook and verify that all computations execute correctly. You should see no errors; if there are any errors, make sure to correct them before you submit the notebook.
- Download the notebook as an .ipynb file. This is your backup copy.
- Export the notebook as PDF and upload to Gradescope.
To double-check your work, the cell below will rerun all of the autograder tests.
grader.check_all()