Lab 0: Getting started

# Initialize Otter
import otter
grader = otter.Notebook("lab0-gettingstarted.ipynb")

This lab is meant to help you familiarize yourself with using the LSIT server and Jupyter notebooks. Some light review of numpy arrays is also included.

Objectives

Keyboard shortcuts, running cells, and viewing documentation in Jupyter Notebooks
Review functions, lists, and loops
Review NumPy arrays: indexing, attributes, and operations on arrays

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the course assignments, we ask that you write your solutions individually and do not copy them from others.

By submitting your work in this course, whether it is homework, a lab assignment, or a quiz/exam, you agree and acknowledge that this submission is your own work and that you have read the policies regarding Academic Integrity: https://studentconduct.sa.ucsb.edu/academic-integrity. The Office of Student Conduct has policies, tips, and resources for proper citation use, recognizing actions considered to be cheating or other forms of academic theft, and students’ responsibilities. You are required to read the policies and to abide by them.

If you collaborate with others, we ask that you indicate their names on your submission.

Jupyter notebooks

Jupyter notebooks are organized into ‘cells’ that can contain either text or code. For example, this is a text cell.

Technically, Jupyter is an application/interface that runs atop a kernel – a programming-language-specific independent environment in which code cells are executed. This basic organization allows for interactive computing with text integration.

Selecting a cell and pressing Enter will enter edit mode and allow you to edit the cell. From edit mode, pressing Esc will revert to command mode and allow you to navigate the notebook’s cells.

In edit mode, most of the keyboard is dedicated to typing into the cell’s editor. Thus, in edit mode there are relatively few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more. Here are a few useful ones:

Ctrl + Return : Evaluate the current cell
Shift + Return: Evaluate the current cell and move to the next
Saving the notebook: s
Basic navigation: up one cell k, down one cell j
a : create a cell above
b : create a cell below
dd : delete a cell
z : undo the last cell operation
m : convert a cell to markdown
y : convert a cell to code

Take a moment to find out what the following commands do:

Cell editing: x, c, v, z
Kernel operations: i, 0 (press twice)

# Practice the above commands on this cell

Running Cells and Displaying Output

Run the following cell.

print("Hello, World!")

In Jupyter notebooks, everything passed to print is displayed below the cell. In addition, the value of the last line of a cell is displayed after execution even without print; the values of earlier lines are not shown unless they are printed.

"Will this line be displayed?"

print("Hello" + ",", "world!")

5 + 3

Viewing Documentation

To output the documentation for a function, use the help() function.

help(print)

You can also use Jupyter to view function documentation inside your notebook. The function must already be defined in the kernel for this to work.

Below, click your mouse anywhere on print() and use Shift + Tab to view the function’s documentation.

print('Welcome to this course!')

Importing Libraries

In this course, we will be using common Python libraries to help us retrieve, manipulate, and perform operations on data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

import pandas as pd
import numpy as np

Practice questions and numpy review

Most assignments for this class will be given as notebooks organized into explanation and prompts followed by response cells; you will complete assignments by filling in all of the response cells.

Many response cells are followed by a test cell that performs a few checks on your work. Please be aware that test cells don’t always confirm that your response is correct or incorrect. They are meant to give you some useful feedback, but it’s your responsibility to interpret the feedback – please be sure to read and think about test output if tests fail, and make your own assessment of whether you need to revise your response.

Below are a few practice questions for you to familiarize yourself with the process. These assume familiarity with basic python syntax and the numpy package.

Question 1

Write a function summation that evaluates the following summation for \(n \geq 1\):

\[\sum_{i=1}^{n} \left(i^3 + 5 i^3\right)\]

Hint: np.arange(1, 6).sum() will generate an array comprising \(1, 2, \dots, 5\) and then add up the elements of the array. (Note that np.arange(5) would instead produce \(0, 1, \dots, 4\).)
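As a small illustration of the pattern in the hint, the sketch below evaluates a different sum, \(\sum_{i=1}^{3} i^2\), by building the index array first and then applying arithmetic to the whole array at once. The names i and total are chosen here for illustration only and are not part of the assignment.

```python
import numpy as np

# Integers 1, 2, 3: the stop value 4 is excluded
i = np.arange(1, 4)

# Square every element, then add up the results: 1 + 4 + 9
total = (i ** 2).sum()
print(total)  # 14
```

The same two steps (build the indices, then transform and sum) carry over to other summands.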

def summation(n):
    """Compute the summation i^3 + 5 * i^3 for 1 <= i <= n."""
    ...

grader.check("q1")

Use your function to compute the sum for…

# n = 2
...

# n = 20
...

Question 2

The core of numpy is the array. Let’s use np.array to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets [ and ], such as [1, 5, 3]).

Below, create an array containing the values 1, 2, 3, 4, and 5 (in that order) and assign it the name my_array.

my_array = ...

grader.check("q2")

Numpy arrays are integer-indexed by position, with the first element indexed as position 0. Elements can be retrieved by enclosing the desired positions in brackets [].

my_array[3]

To retrieve consecutive positions, specify the starting index and the ending index separated by :, for instance, arr[from:to]. This syntax is non-inclusive of the right endpoint, meaning that the element at the ending index is not included in the output.

my_array[2:4]
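To make the endpoint behavior concrete, here is a minimal sketch using a separate, hypothetical array named letters (not one of the lab's variables). The slice 1:4 returns positions 1, 2, and 3, so its length is the difference of the two indices.

```python
import numpy as np

# A hypothetical array used only for this illustration
letters = np.array(["a", "b", "c", "d", "e"])

# Positions 1, 2, and 3 are returned; position 4 ("e") is excluded
middle = letters[1:4]
print(middle)       # ['b' 'c' 'd']
print(len(middle))  # 3, which equals 4 - 1
```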

In addition to the values in an array, we can access attributes such as the array’s shape and data type. Attributes are retrieved by name using syntax of the form array.attr. Some useful attributes are:

.shape, a tuple with the length of each array dimension
.size, the total number of elements in the array
.dtype, the data type of the entries (float, integer, etc.)

A full list of attributes is given in the NumPy documentation for ndarray.

my_array.shape

my_array.size

my_array.dtype
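For a one-dimensional array, the shape and size carry the same information, so the difference between them is easier to see on a two-dimensional array. The sketch below uses a hypothetical 2-by-3 array named grid, which is not part of the lab.

```python
import numpy as np

# A hypothetical array with 2 rows and 3 columns of floats
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(grid.shape)  # (2, 3): the length of each dimension
print(grid.size)   # 6: the total number of elements
print(grid.dtype)  # float64
```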

Arrays, unlike Python lists, cannot store items of different data types.

# A regular Python list can store items of different data types
[1, '3']

# Arrays will convert everything to the same data type
np.array([1, '3'])

# Another example of array type conversion
np.array([5, 8.3])
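One way to see what conversion took place is to inspect the dtype of the result. The following sketch repeats the two arrays above under the illustrative names mixed_text and mixed_numbers: mixing an integer with a string yields an array of strings, and mixing an integer with a float yields an array of floats.

```python
import numpy as np

# Integer and string: every entry becomes a string
mixed_text = np.array([1, "3"])
print(mixed_text.dtype.kind)   # U, meaning a Unicode string type
print(mixed_text.tolist())     # ['1', '3']

# Integer and float: every entry becomes a float
mixed_numbers = np.array([5, 8.3])
print(mixed_numbers.dtype)     # float64
print(mixed_numbers.tolist())  # [5.0, 8.3]
```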

Arrays are also useful in performing vectorized operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.

For example, observe the following:

# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]

# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])
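Other arithmetic operators behave the same way. As a minimal sketch with made-up data (the names prices, quantities, and totals are illustrative only), multiplying two arrays pairs up elements by position, with no explicit loop required.

```python
import numpy as np

# Hypothetical unit prices and purchase quantities for three items
prices = np.array([2.0, 3.5, 1.25])
quantities = np.array([3, 2, 4])

# Element-wise product: 2.0 * 3, 3.5 * 2, 1.25 * 4
totals = prices * quantities
print(totals)        # [6. 7. 5.]
print(totals.sum())  # 18.0
```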

Arrays can be subsetted by index position, as shown above, or by a logical vector of the same length. For example:

example_arr = np.arange(4, 10)
example_arr

Suppose we want the last three elements. One option is to use index position:

example_arr[3:6]

Or a logical vector:

example_arr[np.array([False, False, False, True, True, True])]

The latter approach allows one to subset based on a condition defined by the values of the vector. For example, we can use the condition \(x \geq 7\) to obtain the logical vector used above.

example_arr >= 7

And then we can subset just as before:

example_arr[example_arr >= 7]

You’ll see this done frequently, and it’s sometimes referred to as filtering, because we’re selectively removing values.
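Because True counts as 1 and False as 0, the logical vector can also be summed to count how many values satisfy a condition. The sketch below uses a hypothetical array named temps to show both filtering and counting.

```python
import numpy as np

# Hypothetical temperature readings used only for illustration
temps = np.array([12, 25, 31, 18, 27])

# Logical vector with entries False, True, True, False, True
is_warm = temps > 20

# Keep only the values where the condition holds: 25, 31, 27
warm = temps[is_warm]

# Summing the logical vector counts the True entries
n_warm = is_warm.sum()
print(n_warm)  # 3
```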

Question 3

Given the array random_arr, create an array containing all values \(x\) such that \(2x^4 > 1\). Name the array valid_values.

# for reproducibility - setting the seed will result in the same random draw each time
np.random.seed(42)

# draw 60 uniformly random numbers (floats) between 0 and 1
random_arr = np.random.rand(60)

# solution here
valid_values = ...

grader.check("q3")

A note on `np.arange` and `np.linspace`

Usually we use np.arange to return an array that steps from a to b with a fixed step size s. While this is fine in some cases, we sometimes prefer to use np.linspace(a, b, N), which divides the interval [a, b] into N equally spaced points.

np.arange(start, stop, step) produces an array with all the numbers starting at start, incremented by step, stopping before stop is reached. For example, the value of np.arange(1, 6, 2) is an array with elements 1, 3, and 5 – it starts at 1 and counts up by 2, then stops before 6. np.arange(4, 9, 1) is an array with elements 4, 5, 6, 7, and 8. (It doesn’t contain 9 because np.arange stops before the stop value is reached.)

By default, np.linspace includes both endpoints, while np.arange does not include the second endpoint b. For this reason, especially when we are plotting ranges of values, we tend to prefer np.linspace.

Notice how the following two statements have different parameters but return the same result.

np.arange(-5, 6, 1.0)

np.linspace(-5, 5, 11)
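The difference at the right endpoint shows up when both functions are given the same interval. In the sketch below, which uses the illustrative names stepped and spaced, np.arange is called with a step size and np.linspace with a number of points, both on the interval from 0 to 1.

```python
import numpy as np

# Step size 0.25, stopping before 1: elements 0, 0.25, 0.5, 0.75
stepped = np.arange(0, 1, 0.25)

# Five equally spaced points including both ends: 0, 0.25, 0.5, 0.75, 1
spaced = np.linspace(0, 1, 5)

print(len(stepped))  # 4
print(len(spaced))   # 5
print(spaced[-1])    # 1.0
```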

Check your understanding. Will np.arange(1, 10) produce an array that contains 10? Add a cell below and check to confirm your answer.

Submission

Save the notebook.
Restart the kernel and run all cells. (CAUTION: if your notebook is not saved, you will lose your work.)
Carefully look through your notebook and verify that all computations execute correctly. You should see no errors; if there are any errors, make sure to correct them before you submit the notebook.
Download the notebook as an .ipynb file. This is your backup copy.
Export the notebook as PDF and upload to Gradescope.

To double-check your work, the cell below will rerun all of the autograder tests.

grader.check_all()
