[1]:
from pendsim import sim, controller, viz
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import HTML
Linear Quadratic Regulator (LQR) Notebook¶
LQR is a form of optimal control that allows us to stabilize a system while minimizing the “cost” of doing so. An important feature of LQR is that we can tune the weighting of variables in the cost function, which lets us prioritize certain variables over others (e.g., the amount of energy used vs. tracking performance).
Let’s consider a general quadratic cost function \(J\) of the form

\[J = \int_0^\infty \left( x^T Q x + u^T R u \right) dt,\]
where the vectors \(x\) and \(u\) represent the system’s state and control input, respectively. The objective of an LQR controller is to choose the control input \(u\) to minimize \(J\).
We use the matrices \(Q\) and \(R\) as weights to penalize the terms that depend on \(x\) and \(u\), respectively. A larger value of \(Q\) places a higher weight on our state \(x\), while a larger value of \(R\) places a higher weight on our control input \(u\).
Imagine that we want to drive the system’s state to zero as fast as possible without much regard for the energy required to get there. Then we would choose a large value for \(Q\) and a small value for \(R\). If we want to use less energy or actuation effort, we would choose a larger value of \(R\).
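As a concrete illustration, here is a minimal sketch (not part of pendsim; the function name and numbers are purely illustrative) of how the quadratic stage cost \(x^T Q x + u^T R u\) could be evaluated for a single state/input sample:

[ ]:
import numpy as np

# Minimal sketch: evaluate the quadratic stage cost x^T Q x + u^T R u
# for one state/input sample. Names and values are illustrative only.
def stage_cost(x, u, Q, R):
    x = np.atleast_1d(x)  # state vector, shape (n,)
    u = np.atleast_1d(u)  # control input, shape (m,) (m = 1 here)
    R = np.atleast_2d(R)  # allow a scalar R
    return float(x @ Q @ x + u @ R @ u)

# A heavier Q penalizes state deviation; a heavier R penalizes control effort.
Q = np.diag([0.0, 0.0, 1000.0, 100.0])
R = 1.0
print(stage_cost([0.0, 0.0, 0.1, 0.0], 2.0, Q, R))  # 10.0 + 4.0 = 14.0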
To begin our LQR controller design, let’s create a simulation of the inverted pendulum on a cart that experiences an external horizontal force to try to knock over the pendulum:
[ ]:
dt, t_final = 0.01, 20
pend = sim.Pendulum(
    2.0,  # Large mass, 2.0 kg
    1.0,  # Small mass, 1.0 kg
    2.0,  # length of arm, 2.0 meters
    initial_state=np.array([0.0, 0.0, 0.0, 0.0]),
)

# External disturbance: a brief Gaussian-shaped push of up to 10 N centered at t = 2.0 s
def force_func(t):
    return 10 * np.exp(-(((t - 2.0) / 0.1) ** 2))
simu = sim.Simulation(dt, t_final, force_func)
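Before adding a controller, it can help to see the disturbance itself. The sketch below simply evaluates force_func over the simulation horizon, assuming the dt and t_final defined above:

[ ]:
# Sketch: visualize the external disturbance, a short Gaussian-shaped push
# of up to 10 N centered at t = 2.0 s.
ts = np.arange(0, t_final, dt)
plt.plot(ts, [force_func(t) for t in ts])
plt.xlabel("time (s)")
plt.ylabel("external force (N)")
plt.show()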
Our choice of \(Q\) and \(R\) is how we “tune” our LQR controller to fit our needs.
For a state vector \(x\) with length \(n\), the \(Q\) matrix should be size \(n \times n\).
The state vector \(x\) is \([p, \dot p, \theta, \dot\theta]\), where \(p\) is the cart’s position, \(\dot p\) is the cart’s velocity, \(\theta\) is the pendulum angle measured from the upright position, and \(\dot\theta\) is the pendulum’s angular velocity.
Then the \(Q\) matrix is a 4 x 4 matrix that we choose to be diagonal as:

\[Q = \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & d \end{bmatrix},\]
where \(a\), \(b\), \(c\), and \(d\) are weighting factors for the states \(p\), \(\dot p\), \(\theta\), and \(\dot \theta\), respectively.
Since \(u\) is a scalar in this case, \(R\) is also a scalar.
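In other words, with this diagonal \(Q\) and scalar \(R\), the cost reduces to a weighted sum of squares:

\[J = \int_0^\infty \left( a\,p^2 + b\,\dot p^2 + c\,\theta^2 + d\,\dot\theta^2 + R\,u^2 \right) dt.\]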
Let’s begin with an LQR controller that places a weight of 100,000 on \(\theta\), 1,000 on \(\dot\theta\), 1 on the input \(u\), and zero weighting on \(p\) and \(\dot p\).
[ ]:
# weights on [p, p_dot, theta, theta_dot]: penalize only theta and theta_dot
Q = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 100000, 0], [0, 0, 0, 1000]])
R = 1
window = 10 # used in internal calculations
lqr_controller = controller.LQR(pend, dt, Q, R, window)
results = simu.simulate(pend, lqr_controller)
Now to plot the results:
[ ]:
fig, ax = plt.subplots(nrows=2, sharex=True)
# the ("state", "t") column holds the pendulum angle theta
ax[0].plot(results[("state", "t")], label=r"$\theta$")
ax[0].set_ylabel(r"$\theta$")
ax[1].plot(results[("control action", "control action")], label="Control Action")
ax[1].set_ylabel("Control Action")
ax[1].set_xlabel("Time Step")
plt.show()
We see that the LQR controller effectively stabilizes the pendulum in the upright position after the external force is applied.
Let’s look at a visualization of our LQR simulation. Note: External force is red and control force is blue.
[ ]:
visu = viz.Visualizer(results, pend, dt=dt, speed=1)
ani = visu.animate()
HTML(ani.to_html5_video())
Tuning R¶
Let’s see what happens as we gradually increase \(R\), meaning we increase the “penalty” on the control input.
[ ]:
Q = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 100000, 0], [0, 0, 0, 1000]])
R = 0
window = 10  # used in internal calculations
increase_by = 0.5
n = 12
conts = []
pends = [pend] * n
R_values = []
for _ in range(n):
    # build a controller with the current R, then increase R for the next one
    conts.append(controller.LQR(pend, dt, Q, R, window))
    R_values.append(R)
    R += increase_by
# simulate each controller
all_results = simu.simulate_multiple(pends, conts)
nrows, ncols = 4, 3
fig1, ax1 = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=False, figsize=(15, 12))
axn, ax_idxs = 0, {}
for i in range(nrows):
    for j in range(ncols):
        ax_idxs[axn] = (i, j)
        axn += 1
for g, (idx, res), (axi, axj) in zip(R_values, all_results.groupby(level=0), ax_idxs.values()):
    res.index = res.index.droplevel(0)
    ax1[axi, axj].plot(res[('state', 't')])
    ax1[axi, axj].set_title(f'R={g:.2f}')
# label plots
for i in range(nrows):
    ax1[i, 0].set_ylabel('theta (rad)')
for j in range(ncols):
    ax1[-1, j].set_xlabel('time (s)')
print('\n\n\n\n \033[1m THETA VS TIME \033[0m')
plt.show()
print('\n\n\n\n \033[1m U VS TIME \033[0m')
# Repeat for U vs time
fig2, ax2 = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=False, figsize=(15, 12))
axn, ax_idxs = 0, {}
for i in range(nrows):
    for j in range(ncols):
        ax_idxs[axn] = (i, j)
        axn += 1
for g, (idx, res), (axi, axj) in zip(R_values, all_results.groupby(level=0), ax_idxs.values()):
    res.index = res.index.droplevel(0)
    ax2[axi, axj].plot(res[('control action', 'control action')])
    ax2[axi, axj].set_title(f'R={g:.2f}')
# label plots (note: ax2, the control-action figure)
for i in range(nrows):
    ax2[i, 0].set_ylabel('Control Action')
for j in range(ncols):
    ax2[-1, j].set_xlabel('time (s)')
plt.show()
As we increase \(R\), the controller becomes more conservative with its actions (i.e., it takes smaller actions) because those control inputs are penalized more harshly.
Increasing \(R\) can be useful if we want a more conservative controller, e.g., we want the system to consume less energy.
However, increasing \(R\) too much can cause the controller to be unable to take large enough actions to stabilize the system. We see that the pendulum falls over with larger values of \(R\).
In general, a large \(R\) will cause the system to take smaller actions and react “slower.”
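One rough way to see this in the data is to compare the peak control action across runs. This sketch assumes the all_results DataFrame produced above keeps one group per run at the first index level, as the plotting code relies on:

[ ]:
# Sketch: peak absolute control action per run; larger R should yield
# smaller peak actions (until the controller can no longer stabilize).
for R_val, (_, res) in zip(R_values, all_results.groupby(level=0)):
    peak_u = res[('control action', 'control action')].abs().max()
    print(f"R = {R_val:.2f}: peak |u| = {peak_u:.2f}")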
Tuning Q¶
With \(R\) fixed at 5, let’s see what happens as we increase \(Q\), which raises the penalty on the state. Here we will penalize only the pendulum angle and angular velocity, not the cart’s position.
[ ]:
Q = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
R = 5
window = 10  # used in internal calculations
increase_c = 7e4
increase_d = 3e4
n = 12
conts = []
pends = [pend] * n
Q_values = []
for _ in range(n):
    # build a controller with the current Q, record it, then increase the
    # theta and theta-dot weights for the next one
    conts.append(controller.LQR(pend, dt, Q, R, window))
    Q_values.append(Q)
    Q = Q + np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, increase_c, 0], [0, 0, 0, increase_d]])
# simulate each controller
all_results = simu.simulate_multiple(pends, conts)
nrows, ncols = 4, 3
fig1, ax1 = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=False, figsize=(15, 12))
axn, ax_idxs = 0, {}
for i in range(nrows):
    for j in range(ncols):
        ax_idxs[axn] = (i, j)
        axn += 1
for g, (idx, res), (axi, axj) in zip(Q_values, all_results.groupby(level=0), ax_idxs.values()):
    res.index = res.index.droplevel(0)
    ax1[axi, axj].plot(res[('state', 't')])
    ax1[axi, axj].set_title(f'C={g[2,2]:.2f} and D={g[3,3]:.2f}')
# label plots
for i in range(nrows):
    ax1[i, 0].set_ylabel('theta (rad)')
for j in range(ncols):
    ax1[-1, j].set_xlabel('time (s)')
print('\n\n\n\n \033[1m THETA VS TIME \033[0m')
plt.show()
print('\n\n\n\n \033[1m U VS TIME \033[0m')
# Repeat for U vs time
fig2, ax2 = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=False, figsize=(15, 12))
axn, ax_idxs = 0, {}
for i in range(nrows):
    for j in range(ncols):
        ax_idxs[axn] = (i, j)
        axn += 1
for g, (idx, res), (axi, axj) in zip(Q_values, all_results.groupby(level=0), ax_idxs.values()):
    res.index = res.index.droplevel(0)
    ax2[axi, axj].plot(res[('control action', 'control action')])
    ax2[axi, axj].set_title(f'C={g[2,2]:.2f} and D={g[3,3]:.2f}')
# label plots
for i in range(nrows):
    ax2[i, 0].set_ylabel('Control Action')
for j in range(ncols):
    ax2[-1, j].set_xlabel('time (s)')
plt.show()
Initially, we see that with \(Q\) equal to zero, the system cannot be stabilized, and the pendulum falls over.
As we increase \(Q\), the system stabilizes faster. This is because \(Q\) penalizes deviation from zero angle and zero angular velocity, that is, the upright position with zero velocity, and the controller wants to minimize that penalty.
Using a large \(Q\) can be useful if we want our system to respond as quickly as possible, without regard for the energy cost of the resulting action.
Using a \(Q\) that’s too small may cause the controller to be unresponsive, which may lead to poor performance or instability. Using a \(Q\) that’s too large may cause the controller to command excessively large actions, which can lead to actuator saturation and excessive energy consumption.
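If actuator limits matter, one quick check is how often the commanded action exceeds a limit for a given tuning. This is only a sketch: u_max below is a hypothetical limit chosen for illustration, not something pendsim enforces, and it reuses Q_values and all_results from the cell above.

[ ]:
# Sketch: fraction of time steps where the commanded action exceeds a
# hypothetical actuator limit u_max. A very large Q tends to push this up.
u_max = 50.0  # hypothetical limit, for illustration only
for Qm, (_, res) in zip(Q_values, all_results.groupby(level=0)):
    u = res[('control action', 'control action')]
    frac = (u.abs() > u_max).mean()
    print(f"C = {Qm[2, 2]:.0f}, D = {Qm[3, 3]:.0f}: {100 * frac:.1f}% of steps exceed |u| = {u_max} N")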