
Quantum Optimal Control with Reinforcement Learning

In this notebook, we demonstrate how to use the _RL module to solve a quantum optimal control problem with reinforcement learning (RL). The goal is to realize a CNOT gate on two qubits. The gate acts on a control qubit and a target qubit: if the control qubit is in the state |0⟩, the target qubit remains unchanged; if the control qubit is in the state |1⟩, the gate flips the state of the target qubit.
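
As a quick illustration of the gate described above, the following minimal NumPy sketch applies the CNOT matrix to the four computational basis states. It assumes the first qubit is the control and the second is the target; the helper name apply_cnot is hypothetical and used only for this example.

```python
import numpy as np

# CNOT in the computational basis |00>, |01>, |10>, |11>,
# assuming the first qubit is the control and the second is the target.
CNOT = np.array(
    [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0],
    ]
)

labels = ["00", "01", "10", "11"]


def apply_cnot(label):
    """Return the basis label that CNOT maps the given basis label to."""
    state = np.zeros(4, dtype=int)
    state[labels.index(label)] = 1
    out = CNOT @ state
    return labels[int(np.argmax(out))]


truth_table = {label: apply_cnot(label) for label in labels}
for inp, out in truth_table.items():
    print(f"|{inp}> -> |{out}>")
```

Only the inputs with the control qubit in |1⟩ change, which is the behavior the optimized pulses should reproduce.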

Setup and Import Required Libraries

# If you are running this in an environment where some packages are missing,
# use this cell to install them:
# !pip install qutip qutip-qoc stable-baselines3 gymnasium

import matplotlib.pyplot as plt
import numpy as np
import qutip as qt
from qutip_qoc import Objective, optimize_pulses

Define the Quantum Control Problem

The optimization starts from the identity operation on two qubits, and the target is the CNOT gate. Because the system is open, both are converted to superoperators. Control operators built from the Pauli matrices act on each qubit individually. A drift Hamiltonian describes the qubit energy levels and the interaction between the two qubits, and a collapse operator adds noise, so that the dynamics are those of an open quantum system.
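
To make the superoperator picture concrete, here is a minimal NumPy sketch of what a map of the form ρ → U ρ U† looks like as a matrix acting on a vectorized density matrix. It assumes the column-stacking convention for vectorization; the random test state and the helper vec are illustrative and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unitary to represent as a superoperator: the CNOT gate
U = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)

# Random density matrix (positive semidefinite, unit trace) used as a test input
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = A @ A.conj().T
rho /= np.trace(rho)


def vec(m):
    """Column-stacking vectorization (assumed convention)."""
    return m.reshape(-1, order="F")


# Under column stacking, vec(A X B) = (B^T kron A) vec(X).
# With A = U and B = U^dagger this gives conj(U) kron U.
S = np.kron(U.conj(), U)

lhs = S @ vec(rho)
rhs = vec(U @ rho @ U.conj().T)
print(np.allclose(lhs, rhs))
```

The 16 × 16 matrix S plays the role that the superoperator versions of initial and target play in the open-system problem.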

# Define the initial and target states
initial = qt.tensor(qt.qeye(2), qt.qeye(2))
target = qt.gates.cnot()

# convert to superoperator (for open system)
initial = qt.sprepost(initial, initial.dag())
target = qt.sprepost(target, target.dag())

# single-qubit Pauli operators
sx, sy, sz = qt.sigmax(), qt.sigmay(), qt.sigmaz()
identity = qt.qeye(2)

# single-qubit operators embedded in the two-qubit space
# (i_* acts on the first qubit, *_i on the second)
i_sx, sx_i = qt.tensor(sx, identity), qt.tensor(identity, sx)
i_sy, sy_i = qt.tensor(sy, identity), qt.tensor(identity, sy)
i_sz, sz_i = qt.tensor(sz, identity), qt.tensor(identity, sz)

# Define the control Hamiltonians
Hc = [i_sx, i_sy, i_sz, sx_i, sy_i, sz_i]
# Hc = [i_sx, i_sy, sx_i, sy_i]
Hc = [qt.liouvillian(H) for H in Hc]

# drift and noise term for a two-qubit system
omega, delta, gamma = 0.1, 1.0, 0.1
i_sm, sm_i = qt.tensor(qt.sigmam(), identity), qt.tensor(identity, qt.sigmam())

# energy levels and interaction
Hd = omega * (i_sz + sz_i) + delta * i_sz * sz_i
Hd = qt.liouvillian(H=Hd, c_ops=[np.sqrt(gamma) * (i_sm + sm_i)])

# combined operator list: drift first, then the controls
H = [Hd] + Hc

# Define the objective
objectives = [Objective(initial, H, target)]

# Define the control parameters with bounds
control_parameters = {"p": {"bounds": [(-15, 15)]}}

# Define the time interval
tlist = np.linspace(0, np.pi, 100)

# Define algorithm-specific settings
algorithm_kwargs = {
    "fid_err_targ": 0.01,
    "alg": "RL",
    "max_iter": 1000,
    "shorter_pulses": False,
}
optimizer_kwargs = {}

Note that max_iter sets the maximum number of episodes, while the 100 points in tlist set the maximum number of steps per episode.
If shorter_pulses is True, training takes longer: in addition to checking whether the target infidelity is reached, the algorithm tries to reach it in as few steps as possible. If it is False, training is faster and stops as soon as an episode reaches an infidelity less than or equal to the target infidelity (fid_err_targ).
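
To give the stopping criterion a concrete meaning, the sketch below computes a gate infidelity and compares it with the threshold. It uses one common, phase-insensitive definition, 1 − |Tr(U_target† U)| / d. This is an assumption for illustration and not necessarily the measure that qutip_qoc applies internally, in particular for superoperators. The function name gate_infidelity is hypothetical.

```python
import numpy as np


def gate_infidelity(U, U_target):
    """Phase-insensitive infidelity 1 - |Tr(U_target^dagger U)| / d (assumed measure)."""
    d = U_target.shape[0]
    overlap = np.trace(U_target.conj().T @ U) / d
    return 1.0 - abs(overlap)


CNOT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)
identity = np.eye(4, dtype=complex)

fid_err_targ = 0.01  # same threshold as in algorithm_kwargs

# Before any control is applied, the propagator is the identity
infid_start = gate_infidelity(identity, CNOT)
# A propagator equal to CNOT up to a global phase counts as a perfect result
infid_phase = gate_infidelity(1j * CNOT, CNOT)

print(infid_start, infid_start <= fid_err_targ)
print(infid_phase, infid_phase <= fid_err_targ)
```

With this definition, an episode would count as successful once the infidelity of its final propagator drops to fid_err_targ or below.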

Initialize and Train the RL Environment

Now we call the optimize_pulses() function, passing it the control problem defined above. It creates an instance of the _RL class, which sets up the reinforcement learning environment and starts training. Finally, it returns the optimization results as an object of the Result class.

# Initialize the RL environment and start training
rl_result = optimize_pulses(
    objectives, control_parameters, tlist, algorithm_kwargs, optimizer_kwargs
)
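
To give a sense of what a reinforcement learning environment for pulse optimization involves, here is a toy single-qubit sketch with a Gymnasium-like interface (reset and step, with step returning observation, reward, terminated, truncated, and info). It is not the _RL class: the class name ToyPulseEnv, the observation, the reward, and the fixed-amplitude "policy" are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
TARGET = SX  # toy target: a single-qubit X gate


class ToyPulseEnv:
    """Hypothetical single-qubit environment with a Gymnasium-like interface."""

    def __init__(self, max_steps=20, total_time=np.pi, fid_err_targ=0.01):
        self.max_steps = max_steps
        self.dt = total_time / max_steps
        self.fid_err_targ = fid_err_targ

    def _infidelity(self):
        return float(1.0 - abs(np.trace(TARGET.conj().T @ self.U)) / 2)

    def _obs(self):
        # Observation: real and imaginary parts of the current propagator
        return np.concatenate([self.U.real.ravel(), self.U.imag.ravel()])

    def reset(self):
        self.U = np.eye(2, dtype=complex)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Action: control amplitude u for H = u * sigma_x during one time slot.
        # Since sigma_x^2 = 1, exp(-i u dt sigma_x) = cos(u dt) 1 - i sin(u dt) sigma_x
        theta = float(action) * self.dt
        step_U = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * SX
        self.U = step_U @ self.U
        self.steps += 1

        infid = self._infidelity()
        reward = -infid  # lower infidelity gives higher reward
        terminated = infid <= self.fid_err_targ  # target reached
        truncated = self.steps >= self.max_steps  # step budget exhausted
        return self._obs(), reward, terminated, truncated, {"infidelity": infid}


env = ToyPulseEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    # A constant amplitude of 0.5 stands in for the agent's policy
    obs, reward, terminated, truncated, info = env.step(0.5)

print(env.steps, terminated, info["infidelity"])
```

In this toy setting the episode ends early because the infidelity threshold is met before the step budget runs out, which mirrors the role of fid_err_targ and the tlist length in the real problem.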

Analyze the Results

After training is complete, we can analyze the results obtained by the RL agent. In the training log printed above, you should ideally see the mean number of steps per episode (ep_len_mean) decrease and the mean episode reward (ep_rew_mean) increase as training progresses.
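
The sketch below shows how such logged means behave, assuming they are running averages over the most recent episodes. The episode records and the window size are made up for illustration and do not come from an actual training run.

```python
from collections import deque
from statistics import mean

# Hypothetical (length, reward) records for finished episodes:
# episodes get shorter and rewards improve as training progresses.
episodes = [(100 - i, -1.0 + 0.01 * i) for i in range(60)]

window = 20  # assumed window size for the running averages
recent = deque(maxlen=window)

history = []
for length, reward in episodes:
    recent.append((length, reward))
    ep_len_mean = mean(l for l, _ in recent)
    ep_rew_mean = mean(r for _, r in recent)
    history.append((ep_len_mean, ep_rew_mean))

first_len, first_rew = history[0]
last_len, last_rew = history[-1]
print(f"ep_len_mean: {first_len:.1f} -> {last_len:.1f}")
print(f"ep_rew_mean: {first_rew:.2f} -> {last_rew:.2f}")
```

Because the values are averaged over a window, they change smoothly and lag behind the most recent episodes.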

We can now inspect the fields of the Result object, which include the final infidelity, the optimized control parameters, and more.

print(rl_result)
# We can show the Hinton diagrams of the initial, final, and target maps

fig, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(15, 5))
ax0.set_title("Initial")
ax1.set_title("Final")
ax2.set_title("Target")

qt.hinton(initial, ax=ax0)
qt.hinton(rl_result.final_states[0], ax=ax1)
qt.hinton(target, ax=ax2)

Without dissipation and decoherence

# Define the initial and target operations
initial = qt.tensor(
    qt.qeye(2), qt.qeye(2)
)  # Initial operation: identity (unitary, closed-system version)
target = qt.gates.cnot()  # Target operation: CNOT gate

# Single qubit control operators
sx, sy, sz = qt.sigmax(), qt.sigmay(), qt.sigmaz()
identity = qt.qeye(2)

# Single-qubit operators embedded in the two-qubit space
# (i_* acts on the first qubit, *_i on the second)
i_sx, sx_i = qt.tensor(sx, identity), qt.tensor(identity, sx)
i_sy, sy_i = qt.tensor(sy, identity), qt.tensor(identity, sy)
i_sz, sz_i = qt.tensor(sz, identity), qt.tensor(identity, sz)

# Define the control Hamiltonians (unitary dynamics only)
Hc = [i_sx, i_sy, sx_i, sy_i]

# Define the drift Hamiltonian (unitary dynamics only)
omega, delta = 0.1, 1.0
Hd = (
    omega * (i_sz + sz_i) + delta * i_sz * sz_i
)  # Energy levels and interaction (no decoherence)

# Combined operator list
H = [Hd] + Hc  # Drift Hamiltonian + control Hamiltonians

# Define the objective (initial operation, drift + control Hamiltonians, target operation)
objectives = [Objective(initial=initial, H=H, target=target)]

# Define the control parameters with bounds
control_parameters = {"p": {"bounds": [(-6, 6)]}}

# Define the time interval
tlist = np.linspace(0, 2 * np.pi, 100)

# Define algorithm-specific settings
algorithm_kwargs = {
    "fid_err_targ": 0.01,
    "alg": "RL",
    "max_iter": 3000,  # Maximum number of episodes
    "shorter_pulses": False,
}
optimizer_kwargs = {}

# Initialize the RL environment and start training
rl_result = optimize_pulses(
    objectives, control_parameters, tlist, algorithm_kwargs, optimizer_kwargs
)
print(rl_result)

# Hinton plot

fig, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(15, 5))
ax0.set_title("Initial")
ax1.set_title("Final")
ax2.set_title("Target")

qt.hinton(initial, ax=ax0)
qt.hinton(rl_result.final_states[0], ax=ax1)
qt.hinton(target, ax=ax2)
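
For the closed-system case, the effect of a pulse sequence can be sketched directly with NumPy: with piecewise-constant amplitudes, the total propagator is the ordered product of the propagators of the individual time slots. The sketch below rebuilds the drift and control Hamiltonians from this section and propagates random amplitudes within the bounds used above. The number of time slots (100, equally spaced over the interval) and the helper names step_propagator and propagate are assumptions for illustration, not the internals of the _RL class.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Same drift and controls as in the closed-system example
omega, delta = 0.1, 1.0
sz_1, sz_2 = np.kron(sz, I2), np.kron(I2, sz)
Hd = omega * (sz_1 + sz_2) + delta * sz_1 @ sz_2
Hc = [np.kron(sx, I2), np.kron(sy, I2), np.kron(I2, sx), np.kron(I2, sy)]


def step_propagator(H, dt):
    """exp(-i H dt) for Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T


def propagate(amplitudes, dt):
    """Total propagator for piecewise-constant amplitudes of shape (n_steps, n_ctrl)."""
    U = np.eye(4, dtype=complex)
    for u in amplitudes:
        H = Hd + sum(a * h for a, h in zip(u, Hc))
        U = step_propagator(H, dt) @ U  # later slots multiply from the left
    return U


n_steps = 100  # assumed number of equal time slots
dt = 2 * np.pi / n_steps
rng = np.random.default_rng(1)
amplitudes = rng.uniform(-6, 6, size=(n_steps, len(Hc)))  # within the bounds above

U = propagate(amplitudes, dt)
print(np.allclose(U.conj().T @ U, np.eye(4)))  # the result is unitary
```

Random amplitudes will generally not produce a CNOT gate; the role of the RL agent is to choose the amplitudes so that the resulting propagator approaches the target.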