
How to read the source code of scikit-learn's Perceptron.fit(), and does the Perceptron update the weight ωi of zero-valued features?

Posted at 2025-03-03

Overview

The Perceptron is implemented in Python plus Cython (.pyx) and C, and you cannot understand it without reading all of them. ChatGPT, meanwhile, was completely useless here: it just kept making things up.

How to build

Cython sources such as _sgd_fast.pyx have to be generated; they are nowhere to be found on the net as-is. Why not?

The build procedure itself is also hard to find, so I improvised roughly as follows:

git clone https://github.com/scikit-learn/scikit-learn.git
pip install .
pip install meson-python
pip install ninja
pip install cython
make dev

This generates files such as build/cp310/sklearn/linear_model/_sgd_fast.pyx.

Tracing the code of Perceptron.fit()

Below is the chain of functions starting from Perceptron.fit(), with each one's role and its main inputs and outputs. ChatGPT was no help at all, so I tracked almost all of this down myself.

Perceptron.fit()

An alias for BaseSGDClassifier.fit().
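
As a quick sanity check (my own snippet, not part of the traced sources): the scikit-learn docs state that Perceptron() is equivalent to SGDClassifier(loss="perceptron", learning_rate="constant", eta0=1.0, penalty=None), and the MRO confirms that fit() comes from BaseSGDClassifier:

from sklearn.linear_model import Perceptron

# Perceptron defines no fit() of its own; it inherits BaseSGDClassifier.fit().
print([cls.__name__ for cls in Perceptron.__mro__][:2])  # ['Perceptron', 'BaseSGDClassifier']
print(Perceptron().loss)  # 'perceptron' -> mapped to Hinge(threshold=0.0) internally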

sklearn/linear_model/_stochastic_gradient.py: BaseSGDClassifier.fit()

BaseSGDClassifier._fit()

BaseSGDClassifier._partial_fit()

BaseSGDClassifier._fit_binary()

fit_binary() (a module-level function in _stochastic_gradient.py, not a method)

build/cp310/sklearn/linear_model/_sgd_fast.pyx: _plain_sgd64()

  • Role: trains the model with the SGD algorithm implemented in Cython; the weight updates happen here, so this is essentially the core of the learning.
  • Main inputs: weights, intercept, average_weights, average_intercept, loss, penalty_type, alpha, C, l1_ratio, dataset, validation_mask, early_stopping, validation_score_cb, n_iter_no_change, max_iter, tol, fit_intercept, verbose, shuffle, seed, weight_pos, weight_neg, learning_rate, eta0, power_t, one_class, t, intercept_decay, average
  • Main outputs: weights, intercept, average_weights, average_intercept, n_iter_
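
With penalties, averaging, and early stopping stripped away, the Perceptron configuration of this loop (loss = Hinge(threshold=0.0), learning_rate = constant, eta0 = 1.0, no penalty) reduces to roughly the following plain-Python paraphrase of mine; a sketch of the control flow, not the real code:

import numpy as np

def plain_perceptron_sgd(X, y, max_iter=1000, eta=1.0):
    # y is expected in {-1, +1}, as assumed by the Cython loss functions.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for epoch in range(max_iter):
        for i in range(n_samples):
            p = w @ X[i] + b          # p = w.dot(x_data_ptr, x_ind_ptr, xnnz) + intercept
            if p * y[i] <= 0.0:       # Hinge(0.0).cy_gradient returns -y on a mistake, else 0
                update = eta * y[i]   # update = -eta * dloss
                w += update * X[i]    # w.add(x_data_ptr, x_ind_ptr, xnnz, update)
                b += update           # intercept += intercept_update * intercept_decay
    return w, b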

_seq_dataset.pyx: SequentialDataset64.next()

sklearn/utils/_seq_dataset.pyx.tp: _sample()

On the Cython side, _seq_dataset.pyx only declares this as pass; the actual body appears to live on the template (.tp) side.
It is a very simple routine that just advances a data-position offset to point at the next sample.
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_seq_dataset.pyx.tp#L336
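
For dense input, the generated ArrayDataset64._sample() (reproduced further down in this article) boils down to offset arithmetic like the following rough Python rendering of mine:

import numpy as np

def dense_sample(X_flat, Y, sample_weights, index, current_index, n_features):
    # X_flat is the C-contiguous buffer behind X; one row is one slice of length n_features.
    sample_idx = index[current_index]
    offset = sample_idx * n_features            # X_stride for a C-contiguous matrix
    x_data = X_flat[offset:offset + n_features]
    x_ind = np.arange(n_features)               # feature_indices: every column index
    nnz = n_features                            # on the dense route, "nnz" is just n_features
    return x_data, x_ind, nnz, Y[sample_idx], sample_weights[sample_idx]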

_weight_vector.pyx: WeightVector64.add()
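
Role (my summary of the listing below): add c * x into the weight vector, walking only the xnnz entries indexed by x_ind_ptr and keeping sq_norm in sync. The core loop, rendered in plain Python:

def weight_vector_add(w, wscale, x_data, x_ind, xnnz, c):
    # The stored array represents wscale * w, so the update is divided by wscale.
    for j in range(xnnz):
        idx = x_ind[j]
        w[idx] += x_data[j] * (c / wscale)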

Does the Perceptron update the weight ωi of zero-valued features?

Going through the source, it looks like the weight ωi of zero-valued features does get pushed through the update.
The xnnz argument of WeightVector64.add() is documented as "The number of non-zero features of x", but along this path I could not find any code that filters X down to nonzero features. In practice all of X is passed in (for dense input, ArrayDataset64._sample() simply sets nnz = n_features, whereas CSRDataset64._sample() sets it to the row's true number of non-zeros), so I concluded that xnnz is a misleading name on this route, and that it presumably arrives genuinely filtered only via the sparse route.
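
A small check consistent with this reading (my own snippet): with dense input, a feature that is zero in every sample still flows through add(), but its contribution val * c is 0, so its weight stays at exactly 0:

import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
X[:, 2] = 0.0                        # feature 2 is zero in every sample
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

clf = Perceptron(max_iter=20).fit(X, y)
print(clf.coef_)                     # coef_[0, 2] remains exactly 0.0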

The Cython sources

Since I went to the trouble of building them, I record them here.

_sgd_fast.pyx

"""SGD implementation"""

import numpy as np
from time import time

from cython cimport floating
from libc.math cimport exp, fabs, isfinite, log, pow, INFINITY

from .._loss._loss cimport CyLossFunction
from ..utils._typedefs cimport uint32_t, uint8_t
from ..utils._weight_vector cimport WeightVector32, WeightVector64
from ..utils._seq_dataset cimport SequentialDataset32, SequentialDataset64


cdef extern from *:
    """
    /* Penalty constants */
    #define NO_PENALTY 0
    #define L1 1
    #define L2 2
    #define ELASTICNET 3

    /* Learning rate constants */
    #define CONSTANT 1
    #define OPTIMAL 2
    #define INVSCALING 3
    #define ADAPTIVE 4
    #define PA1 5
    #define PA2 6
    """
    int NO_PENALTY = 0
    int L1 = 1
    int L2 = 2
    int ELASTICNET = 3

    int CONSTANT = 1
    int OPTIMAL = 2
    int INVSCALING = 3
    int ADAPTIVE = 4
    int PA1 = 5
    int PA2 = 6


# ----------------------------------------
# Extension Types for Loss Functions
# ----------------------------------------

cdef class Regression(CyLossFunction):
    """Base class for loss functions for regression"""

    def py_loss(self, double p, double y):
        """Python version of `loss` for testing only.

        Pytest needs a python function and can't use cdef functions.

        Parameters
        ----------
        p : double
            The prediction, `p = w^T x + intercept`.
        y : double
            The true value (aka target).

        Returns
        -------
        double
            The loss evaluated at `p` and `y`.
        """
        return self.cy_loss(y, p)

    def py_dloss(self, double p, double y):
        """Python version of `dloss` for testing only.

        Pytest needs a python function and can't use cdef functions.

        Parameters
        ----------
        p : double
            The prediction, `p = w^T x`.
        y : double
            The true value (aka target).

        Returns
        -------
        double
            The derivative of the loss function with regards to `p`.
        """
        return self.cy_gradient(y, p)


cdef class Classification(CyLossFunction):
    """Base class for loss functions for classification"""

    def py_loss(self, double p, double y):
        """Python version of `loss` for testing only."""
        return self.cy_loss(y, p)

    def py_dloss(self, double p, double y):
        """Python version of `dloss` for testing only."""
        return self.cy_gradient(y, p)


cdef class ModifiedHuber(Classification):
    """Modified Huber loss for binary classification with y in {-1, 1}

    This is equivalent to quadratically smoothed SVM with gamma = 2.

    See T. Zhang 'Solving Large Scale Linear Prediction Problems Using
    Stochastic Gradient Descent', ICML'04.
    """
    cdef double cy_loss(self, double y, double p) noexcept nogil:
        cdef double z = p * y
        if z >= 1.0:
            return 0.0
        elif z >= -1.0:
            return (1.0 - z) * (1.0 - z)
        else:
            return -4.0 * z

    cdef double cy_gradient(self, double y, double p) noexcept nogil:
        cdef double z = p * y
        if z >= 1.0:
            return 0.0
        elif z >= -1.0:
            return 2.0 * (1.0 - z) * -y
        else:
            return -4.0 * y

    def __reduce__(self):
        return ModifiedHuber, ()


cdef class Hinge(Classification):
    """Hinge loss for binary classification tasks with y in {-1,1}

    Parameters
    ----------

    threshold : float > 0.0
        Margin threshold. When threshold=1.0, one gets the loss used by SVM.
        When threshold=0.0, one gets the loss used by the Perceptron.
    """

    cdef double threshold

    def __init__(self, double threshold=1.0):
        self.threshold = threshold

    cdef double cy_loss(self, double y, double p) noexcept nogil:
        cdef double z = p * y
        if z <= self.threshold:
            return self.threshold - z
        return 0.0

    cdef double cy_gradient(self, double y, double p) noexcept nogil:
        cdef double z = p * y
        if z <= self.threshold:
            return -y
        return 0.0

    def __reduce__(self):
        return Hinge, (self.threshold,)


cdef class SquaredHinge(Classification):
    """Squared Hinge loss for binary classification tasks with y in {-1,1}

    Parameters
    ----------

    threshold : float > 0.0
        Margin threshold. When threshold=1.0, one gets the loss used by
        (quadratically penalized) SVM.
    """

    cdef double threshold

    def __init__(self, double threshold=1.0):
        self.threshold = threshold

    cdef double cy_loss(self, double y, double p) noexcept nogil:
        cdef double z = self.threshold - p * y
        if z > 0:
            return z * z
        return 0.0

    cdef double cy_gradient(self, double y, double p) noexcept nogil:
        cdef double z = self.threshold - p * y
        if z > 0:
            return -2 * y * z
        return 0.0

    def __reduce__(self):
        return SquaredHinge, (self.threshold,)


cdef class EpsilonInsensitive(Regression):
    """Epsilon-Insensitive loss (used by SVR).

    loss = max(0, |y - p| - epsilon)
    """

    cdef double epsilon

    def __init__(self, double epsilon):
        self.epsilon = epsilon

    cdef double cy_loss(self, double y, double p) noexcept nogil:
        cdef double ret = fabs(y - p) - self.epsilon
        return ret if ret > 0 else 0

    cdef double cy_gradient(self, double y, double p) noexcept nogil:
        if y - p > self.epsilon:
            return -1
        elif p - y > self.epsilon:
            return 1
        else:
            return 0

    def __reduce__(self):
        return EpsilonInsensitive, (self.epsilon,)


cdef class SquaredEpsilonInsensitive(Regression):
    """Epsilon-Insensitive loss.

    loss = max(0, |y - p| - epsilon)^2
    """

    cdef double epsilon

    def __init__(self, double epsilon):
        self.epsilon = epsilon

    cdef double cy_loss(self, double y, double p) noexcept nogil:
        cdef double ret = fabs(y - p) - self.epsilon
        return ret * ret if ret > 0 else 0

    cdef double cy_gradient(self, double y, double p) noexcept nogil:
        cdef double z
        z = y - p
        if z > self.epsilon:
            return -2 * (z - self.epsilon)
        elif z < -self.epsilon:
            return 2 * (-z - self.epsilon)
        else:
            return 0

    def __reduce__(self):
        return SquaredEpsilonInsensitive, (self.epsilon,)

def _plain_sgd64(
    const double[::1] weights,
    double intercept,
    const double[::1] average_weights,
    double average_intercept,
    CyLossFunction loss,
    int penalty_type,
    double alpha,
    double C,
    double l1_ratio,
    SequentialDataset64 dataset,
    const uint8_t[::1] validation_mask,
    bint early_stopping,
    validation_score_cb,
    int n_iter_no_change,
    unsigned int max_iter,
    double tol,
    int fit_intercept,
    int verbose,
    bint shuffle,
    uint32_t seed,
    double weight_pos,
    double weight_neg,
    int learning_rate,
    double eta0,
    double power_t,
    bint one_class,
    double t=1.0,
    double intercept_decay=1.0,
    int average=0,
):
    """SGD for generic loss functions and penalties with optional averaging

    Parameters
    ----------
    weights : ndarray[double, ndim=1]
        The allocated vector of weights.
    intercept : double
        The initial intercept.
    average_weights : ndarray[double, ndim=1]
        The average weights as computed for ASGD. Should be None if average
        is 0.
    average_intercept : double
        The average intercept for ASGD. Should be 0 if average is 0.
    loss : CyLossFunction
        A concrete ``CyLossFunction`` object.
    penalty_type : int
        The penalty 2 for L2, 1 for L1, and 3 for Elastic-Net.
    alpha : float
        The regularization parameter.
    C : float
        Maximum step size for passive aggressive.
    l1_ratio : float
        The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1.
        l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
    dataset : SequentialDataset
        A concrete ``SequentialDataset`` object.
    validation_mask : ndarray[uint8_t, ndim=1]
        Equal to True on the validation set.
    early_stopping : boolean
        Whether to use a stopping criterion based on the validation set.
    validation_score_cb : callable
        A callable to compute a validation score given the current
        coefficients and intercept values.
        Used only if early_stopping is True.
    n_iter_no_change : int
        Number of iterations with no improvement to wait before stopping.
    max_iter : int
        The maximum number of iterations (epochs).
    tol : double
        The tolerance for the stopping criterion.
    fit_intercept : int
        Whether or not to fit the intercept (1 or 0).
    verbose : int
        Print verbose output; 0 for quiet.
    shuffle : boolean
        Whether to shuffle the training data before each epoch.
    weight_pos : float
        The weight of the positive class.
    weight_neg : float
        The weight of the negative class.
    seed : uint32_t
        Seed of the pseudorandom number generator used to shuffle the data.
    learning_rate : int
        The learning rate:
        (1) constant, eta = eta0
        (2) optimal, eta = 1.0/(alpha * t).
        (3) inverse scaling, eta = eta0 / pow(t, power_t)
        (4) adaptive decrease
        (5) Passive Aggressive-I, eta = min(alpha, loss/norm(x))
        (6) Passive Aggressive-II, eta = 1.0 / (norm(x) + 0.5*alpha)
    eta0 : double
        The initial learning rate.
    power_t : double
        The exponent for inverse scaling learning rate.
    one_class : boolean
        Whether to solve the One-Class SVM optimization problem.
    t : double
        Initial state of the learning rate. This value is equal to the
        iteration count except when the learning rate is set to `optimal`.
        Default: 1.0.
    average : int
        The number of iterations before averaging starts. average=1 is
        equivalent to averaging for all iterations.


    Returns
    -------
    weights : array, shape=[n_features]
        The fitted weight vector.
    intercept : float
        The fitted intercept term.
    average_weights : array shape=[n_features]
        The averaged weights across iterations. Values are valid only if
        average > 0.
    average_intercept : float
        The averaged intercept across iterations.
        Values are valid only if average > 0.
    n_iter_ : int
        The actual number of iterations (epochs).
    """

    # get the data information into easy vars
    cdef Py_ssize_t n_samples = dataset.n_samples
    cdef Py_ssize_t n_features = weights.shape[0]

    cdef WeightVector64 w = WeightVector64(weights, average_weights)
    cdef double *x_data_ptr = NULL
    cdef int *x_ind_ptr = NULL

    # helper variables
    cdef int no_improvement_count = 0
    cdef bint infinity = False
    cdef int xnnz
    cdef double eta = 0.0
    cdef double p = 0.0
    cdef double update = 0.0
    cdef double intercept_update = 0.0
    cdef double sumloss = 0.0
    cdef double score = 0.0
    cdef double best_loss = INFINITY
    cdef double best_score = -INFINITY
    cdef double y = 0.0
    cdef double sample_weight
    cdef double class_weight = 1.0
    cdef unsigned int count = 0
    cdef unsigned int train_count = n_samples - np.sum(validation_mask)
    cdef unsigned int epoch = 0
    cdef unsigned int i = 0
    cdef int is_hinge = isinstance(loss, Hinge)
    cdef double optimal_init = 0.0
    cdef double dloss = 0.0
    cdef double MAX_DLOSS = 1e12

    cdef long long sample_index

    # q vector is only used for L1 regularization
    cdef double[::1] q = None
    cdef double * q_data_ptr = NULL
    if penalty_type == L1 or penalty_type == ELASTICNET:
        q = np.zeros((n_features,), dtype=np.float64, order="c")
        q_data_ptr = &q[0]
    cdef double u = 0.0

    if penalty_type == L2:
        l1_ratio = 0.0
    elif penalty_type == L1:
        l1_ratio = 1.0

    eta = eta0

    if learning_rate == OPTIMAL:
        typw = np.sqrt(1.0 / np.sqrt(alpha))
        # computing eta0, the initial learning rate
        initial_eta0 = typw / max(1.0, loss.cy_gradient(1.0, -typw))
        # initialize t such that eta at first sample equals eta0
        optimal_init = 1.0 / (initial_eta0 * alpha)

    t_start = time()
    with nogil:
        for epoch in range(max_iter):
            sumloss = 0
            if verbose > 0:
                with gil:
                    print("-- Epoch %d" % (epoch + 1))
            if shuffle:
                dataset.shuffle(seed)
            for i in range(n_samples):
                dataset.next(&x_data_ptr, &x_ind_ptr, &xnnz,
                             &y, &sample_weight)

                sample_index = dataset.index_data_ptr[dataset.current_index]
                if validation_mask[sample_index]:
                    # do not learn on the validation set
                    continue

                p = w.dot(x_data_ptr, x_ind_ptr, xnnz) + intercept
                if learning_rate == OPTIMAL:
                    eta = 1.0 / (alpha * (optimal_init + t - 1))
                elif learning_rate == INVSCALING:
                    eta = eta0 / pow(t, power_t)

                if verbose or not early_stopping:
                    sumloss += loss.cy_loss(y, p)

                if y > 0.0:
                    class_weight = weight_pos
                else:
                    class_weight = weight_neg

                if learning_rate == PA1:
                    update = sqnorm(x_data_ptr, x_ind_ptr, xnnz)
                    if update == 0:
                        continue
                    update = min(C, loss.cy_loss(y, p) / update)
                elif learning_rate == PA2:
                    update = sqnorm(x_data_ptr, x_ind_ptr, xnnz)
                    update = loss.cy_loss(y, p) / (update + 0.5 / C)
                else:
                    dloss = loss.cy_gradient(y, p)
                    # clip dloss with large values to avoid numerical
                    # instabilities
                    if dloss < -MAX_DLOSS:
                        dloss = -MAX_DLOSS
                    elif dloss > MAX_DLOSS:
                        dloss = MAX_DLOSS
                    update = -eta * dloss

                if learning_rate >= PA1:
                    if is_hinge:
                        # classification
                        update *= y
                    elif y - p < 0:
                        # regression
                        update *= -1

                update *= class_weight * sample_weight

                if penalty_type >= L2:
                    # do not scale to negative values when eta or alpha are too
                    # big: instead set the weights to zero
                    w.scale(max(0, 1.0 - ((1.0 - l1_ratio) * eta * alpha)))

                if update != 0.0:
                    w.add(x_data_ptr, x_ind_ptr, xnnz, update)
                if fit_intercept == 1:
                    intercept_update = update
                    if one_class:  # specific for One-Class SVM
                        intercept_update -= 2. * eta * alpha
                    if intercept_update != 0:
                        intercept += intercept_update * intercept_decay

                if 0 < average <= t:
                    # compute the average for the intercept and update the
                    # average weights, this is done regardless as to whether
                    # the update is 0

                    w.add_average(x_data_ptr, x_ind_ptr, xnnz,
                                  update, (t - average + 1))
                    average_intercept += ((intercept - average_intercept) /
                                          (t - average + 1))

                if penalty_type == L1 or penalty_type == ELASTICNET:
                    u += (l1_ratio * eta * alpha)
                    l1penalty64(w, q_data_ptr, x_ind_ptr, xnnz, u)

                t += 1
                count += 1

            # report epoch information
            if verbose > 0:
                with gil:
                    print("Norm: %.2f, NNZs: %d, Bias: %.6f, T: %d, "
                          "Avg. loss: %f"
                          % (w.norm(), np.nonzero(weights)[0].shape[0],
                             intercept, count, sumloss / train_count))
                    print("Total training time: %.2f seconds."
                          % (time() - t_start))

            # floating-point under-/overflow check.
            if (not isfinite(intercept) or any_nonfinite(weights)):
                infinity = True
                break

            # evaluate the score on the validation set
            if early_stopping:
                with gil:
                    score = validation_score_cb(weights.base, intercept)
                if tol > -INFINITY and score < best_score + tol:
                    no_improvement_count += 1
                else:
                    no_improvement_count = 0
                if score > best_score:
                    best_score = score
            # or evaluate the loss on the training set
            else:
                if tol > -INFINITY and sumloss > best_loss - tol * train_count:
                    no_improvement_count += 1
                else:
                    no_improvement_count = 0
                if sumloss < best_loss:
                    best_loss = sumloss

            # if there is no improvement several times in a row
            if no_improvement_count >= n_iter_no_change:
                if learning_rate == ADAPTIVE and eta > 1e-6:
                    eta = eta / 5
                    no_improvement_count = 0
                else:
                    if verbose:
                        with gil:
                            print("Convergence after %d epochs took %.2f "
                                  "seconds" % (epoch + 1, time() - t_start))
                    break

    if infinity:
        raise ValueError(("Floating-point under-/overflow occurred at epoch"
                          " #%d. Scaling input data with StandardScaler or"
                          " MinMaxScaler might help.") % (epoch + 1))

    w.reset_wscale()

    return (
        weights.base,
        intercept,
        None if average_weights is None else average_weights.base,
        average_intercept,
        epoch + 1
    )

def _plain_sgd32(
    const float[::1] weights,
    double intercept,
    const float[::1] average_weights,
    double average_intercept,
    CyLossFunction loss,
    int penalty_type,
    double alpha,
    double C,
    double l1_ratio,
    SequentialDataset32 dataset,
    const uint8_t[::1] validation_mask,
    bint early_stopping,
    validation_score_cb,
    int n_iter_no_change,
    unsigned int max_iter,
    double tol,
    int fit_intercept,
    int verbose,
    bint shuffle,
    uint32_t seed,
    double weight_pos,
    double weight_neg,
    int learning_rate,
    double eta0,
    double power_t,
    bint one_class,
    double t=1.0,
    double intercept_decay=1.0,
    int average=0,
):
    """SGD for generic loss functions and penalties with optional averaging

    Parameters
    ----------
    weights : ndarray[float, ndim=1]
        The allocated vector of weights.
    intercept : double
        The initial intercept.
    average_weights : ndarray[float, ndim=1]
        The average weights as computed for ASGD. Should be None if average
        is 0.
    average_intercept : double
        The average intercept for ASGD. Should be 0 if average is 0.
    loss : CyLossFunction
        A concrete ``CyLossFunction`` object.
    penalty_type : int
        The penalty 2 for L2, 1 for L1, and 3 for Elastic-Net.
    alpha : float
        The regularization parameter.
    C : float
        Maximum step size for passive aggressive.
    l1_ratio : float
        The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1.
        l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
    dataset : SequentialDataset
        A concrete ``SequentialDataset`` object.
    validation_mask : ndarray[uint8_t, ndim=1]
        Equal to True on the validation set.
    early_stopping : boolean
        Whether to use a stopping criterion based on the validation set.
    validation_score_cb : callable
        A callable to compute a validation score given the current
        coefficients and intercept values.
        Used only if early_stopping is True.
    n_iter_no_change : int
        Number of iterations with no improvement to wait before stopping.
    max_iter : int
        The maximum number of iterations (epochs).
    tol : double
        The tolerance for the stopping criterion.
    fit_intercept : int
        Whether or not to fit the intercept (1 or 0).
    verbose : int
        Print verbose output; 0 for quiet.
    shuffle : boolean
        Whether to shuffle the training data before each epoch.
    weight_pos : float
        The weight of the positive class.
    weight_neg : float
        The weight of the negative class.
    seed : uint32_t
        Seed of the pseudorandom number generator used to shuffle the data.
    learning_rate : int
        The learning rate:
        (1) constant, eta = eta0
        (2) optimal, eta = 1.0/(alpha * t).
        (3) inverse scaling, eta = eta0 / pow(t, power_t)
        (4) adaptive decrease
        (5) Passive Aggressive-I, eta = min(alpha, loss/norm(x))
        (6) Passive Aggressive-II, eta = 1.0 / (norm(x) + 0.5*alpha)
    eta0 : double
        The initial learning rate.
    power_t : double
        The exponent for inverse scaling learning rate.
    one_class : boolean
        Whether to solve the One-Class SVM optimization problem.
    t : double
        Initial state of the learning rate. This value is equal to the
        iteration count except when the learning rate is set to `optimal`.
        Default: 1.0.
    average : int
        The number of iterations before averaging starts. average=1 is
        equivalent to averaging for all iterations.


    Returns
    -------
    weights : array, shape=[n_features]
        The fitted weight vector.
    intercept : float
        The fitted intercept term.
    average_weights : array shape=[n_features]
        The averaged weights across iterations. Values are valid only if
        average > 0.
    average_intercept : float
        The averaged intercept across iterations.
        Values are valid only if average > 0.
    n_iter_ : int
        The actual number of iterations (epochs).
    """

    # get the data information into easy vars
    cdef Py_ssize_t n_samples = dataset.n_samples
    cdef Py_ssize_t n_features = weights.shape[0]

    cdef WeightVector32 w = WeightVector32(weights, average_weights)
    cdef float *x_data_ptr = NULL
    cdef int *x_ind_ptr = NULL

    # helper variables
    cdef int no_improvement_count = 0
    cdef bint infinity = False
    cdef int xnnz
    cdef double eta = 0.0
    cdef double p = 0.0
    cdef double update = 0.0
    cdef double intercept_update = 0.0
    cdef double sumloss = 0.0
    cdef double score = 0.0
    cdef double best_loss = INFINITY
    cdef double best_score = -INFINITY
    cdef float y = 0.0
    cdef float sample_weight
    cdef float class_weight = 1.0
    cdef unsigned int count = 0
    cdef unsigned int train_count = n_samples - np.sum(validation_mask)
    cdef unsigned int epoch = 0
    cdef unsigned int i = 0
    cdef int is_hinge = isinstance(loss, Hinge)
    cdef double optimal_init = 0.0
    cdef double dloss = 0.0
    cdef double MAX_DLOSS = 1e12

    cdef long long sample_index

    # q vector is only used for L1 regularization
    cdef float[::1] q = None
    cdef float * q_data_ptr = NULL
    if penalty_type == L1 or penalty_type == ELASTICNET:
        q = np.zeros((n_features,), dtype=np.float32, order="c")
        q_data_ptr = &q[0]
    cdef double u = 0.0

    if penalty_type == L2:
        l1_ratio = 0.0
    elif penalty_type == L1:
        l1_ratio = 1.0

    eta = eta0

    if learning_rate == OPTIMAL:
        typw = np.sqrt(1.0 / np.sqrt(alpha))
        # computing eta0, the initial learning rate
        initial_eta0 = typw / max(1.0, loss.cy_gradient(1.0, -typw))
        # initialize t such that eta at first sample equals eta0
        optimal_init = 1.0 / (initial_eta0 * alpha)

    t_start = time()
    with nogil:
        for epoch in range(max_iter):
            sumloss = 0
            if verbose > 0:
                with gil:
                    print("-- Epoch %d" % (epoch + 1))
            if shuffle:
                dataset.shuffle(seed)
            for i in range(n_samples):
                dataset.next(&x_data_ptr, &x_ind_ptr, &xnnz,
                             &y, &sample_weight)

                sample_index = dataset.index_data_ptr[dataset.current_index]
                if validation_mask[sample_index]:
                    # do not learn on the validation set
                    continue

                p = w.dot(x_data_ptr, x_ind_ptr, xnnz) + intercept
                if learning_rate == OPTIMAL:
                    eta = 1.0 / (alpha * (optimal_init + t - 1))
                elif learning_rate == INVSCALING:
                    eta = eta0 / pow(t, power_t)

                if verbose or not early_stopping:
                    sumloss += loss.cy_loss(y, p)

                if y > 0.0:
                    class_weight = weight_pos
                else:
                    class_weight = weight_neg

                if learning_rate == PA1:
                    update = sqnorm(x_data_ptr, x_ind_ptr, xnnz)
                    if update == 0:
                        continue
                    update = min(C, loss.cy_loss(y, p) / update)
                elif learning_rate == PA2:
                    update = sqnorm(x_data_ptr, x_ind_ptr, xnnz)
                    update = loss.cy_loss(y, p) / (update + 0.5 / C)
                else:
                    dloss = loss.cy_gradient(y, p)
                    # clip dloss with large values to avoid numerical
                    # instabilities
                    if dloss < -MAX_DLOSS:
                        dloss = -MAX_DLOSS
                    elif dloss > MAX_DLOSS:
                        dloss = MAX_DLOSS
                    update = -eta * dloss

                if learning_rate >= PA1:
                    if is_hinge:
                        # classification
                        update *= y
                    elif y - p < 0:
                        # regression
                        update *= -1

                update *= class_weight * sample_weight

                if penalty_type >= L2:
                    # do not scale to negative values when eta or alpha are too
                    # big: instead set the weights to zero
                    w.scale(max(0, 1.0 - ((1.0 - l1_ratio) * eta * alpha)))

                if update != 0.0:
                    w.add(x_data_ptr, x_ind_ptr, xnnz, update)
                if fit_intercept == 1:
                    intercept_update = update
                    if one_class:  # specific for One-Class SVM
                        intercept_update -= 2. * eta * alpha
                    if intercept_update != 0:
                        intercept += intercept_update * intercept_decay

                if 0 < average <= t:
                    # compute the average for the intercept and update the
                    # average weights, this is done regardless as to whether
                    # the update is 0

                    w.add_average(x_data_ptr, x_ind_ptr, xnnz,
                                  update, (t - average + 1))
                    average_intercept += ((intercept - average_intercept) /
                                          (t - average + 1))

                if penalty_type == L1 or penalty_type == ELASTICNET:
                    u += (l1_ratio * eta * alpha)
                    l1penalty32(w, q_data_ptr, x_ind_ptr, xnnz, u)

                t += 1
                count += 1

            # report epoch information
            if verbose > 0:
                with gil:
                    print("Norm: %.2f, NNZs: %d, Bias: %.6f, T: %d, "
                          "Avg. loss: %f"
                          % (w.norm(), np.nonzero(weights)[0].shape[0],
                             intercept, count, sumloss / train_count))
                    print("Total training time: %.2f seconds."
                          % (time() - t_start))

            # floating-point under-/overflow check.
            if (not isfinite(intercept) or any_nonfinite(weights)):
                infinity = True
                break

            # evaluate the score on the validation set
            if early_stopping:
                with gil:
                    score = validation_score_cb(weights.base, intercept)
                if tol > -INFINITY and score < best_score + tol:
                    no_improvement_count += 1
                else:
                    no_improvement_count = 0
                if score > best_score:
                    best_score = score
            # or evaluate the loss on the training set
            else:
                if tol > -INFINITY and sumloss > best_loss - tol * train_count:
                    no_improvement_count += 1
                else:
                    no_improvement_count = 0
                if sumloss < best_loss:
                    best_loss = sumloss

            # if there is no improvement several times in a row
            if no_improvement_count >= n_iter_no_change:
                if learning_rate == ADAPTIVE and eta > 1e-6:
                    eta = eta / 5
                    no_improvement_count = 0
                else:
                    if verbose:
                        with gil:
                            print("Convergence after %d epochs took %.2f "
                                  "seconds" % (epoch + 1, time() - t_start))
                    break

    if infinity:
        raise ValueError(("Floating-point under-/overflow occurred at epoch"
                          " #%d. Scaling input data with StandardScaler or"
                          " MinMaxScaler might help.") % (epoch + 1))

    w.reset_wscale()

    return (
        weights.base,
        intercept,
        None if average_weights is None else average_weights.base,
        average_intercept,
        epoch + 1
    )


cdef inline bint any_nonfinite(const floating[::1] w) noexcept nogil:
    for i in range(w.shape[0]):
        if not isfinite(w[i]):
            return True
    return False


cdef inline double sqnorm(
    floating * x_data_ptr,
    int * x_ind_ptr,
    int xnnz,
) noexcept nogil:
    cdef double x_norm = 0.0
    cdef int j
    cdef double z
    for j in range(xnnz):
        z = x_data_ptr[j]
        x_norm += z * z
    return x_norm


cdef void l1penalty64(
    WeightVector64 w,
    double * q_data_ptr,
    int *x_ind_ptr,
    int xnnz,
    double u,
) noexcept nogil:
    """Apply the L1 penalty to each updated feature

    This implements the truncated gradient approach by
    [Tsuruoka, Y., Tsujii, J., and Ananiadou, S., 2009].
    """
    cdef double z = 0.0
    cdef int j = 0
    cdef int idx = 0
    cdef double wscale = w.wscale
    cdef double *w_data_ptr = w.w_data_ptr
    for j in range(xnnz):
        idx = x_ind_ptr[j]
        z = w_data_ptr[idx]
        if wscale * z > 0.0:
            w_data_ptr[idx] = max(
                0.0, w_data_ptr[idx] - ((u + q_data_ptr[idx]) / wscale))

        elif wscale * z < 0.0:
            w_data_ptr[idx] = min(
                0.0, w_data_ptr[idx] + ((u - q_data_ptr[idx]) / wscale))

        q_data_ptr[idx] += wscale * (w_data_ptr[idx] - z)

cdef void l1penalty32(
    WeightVector32 w,
    float * q_data_ptr,
    int *x_ind_ptr,
    int xnnz,
    double u,
) noexcept nogil:
    """Apply the L1 penalty to each updated feature

    This implements the truncated gradient approach by
    [Tsuruoka, Y., Tsujii, J., and Ananiadou, S., 2009].
    """
    cdef double z = 0.0
    cdef int j = 0
    cdef int idx = 0
    cdef double wscale = w.wscale
    cdef float *w_data_ptr = w.w_data_ptr
    for j in range(xnnz):
        idx = x_ind_ptr[j]
        z = w_data_ptr[idx]
        if wscale * z > 0.0:
            w_data_ptr[idx] = max(
                0.0, w_data_ptr[idx] - ((u + q_data_ptr[idx]) / wscale))

        elif wscale * z < 0.0:
            w_data_ptr[idx] = min(
                0.0, w_data_ptr[idx] + ((u - q_data_ptr[idx]) / wscale))

        q_data_ptr[idx] += wscale * (w_data_ptr[idx] - z)

_weight_vector.pyx


# cython: binding=False
#
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

cimport cython
from libc.limits cimport INT_MAX
from libc.math cimport sqrt

from ._cython_blas cimport _dot, _scal, _axpy

cdef class WeightVector64(object):
    """Dense vector represented by a scalar and a numpy array.

    The class provides methods to ``add`` a sparse vector
    and scale the vector.
    Representing a vector explicitly as a scalar times a
    vector allows for efficient scaling operations.

    Attributes
    ----------
    w : ndarray, dtype=double, order='C'
        The numpy array which backs the weight vector.
    aw : ndarray, dtype=double, order='C'
        The numpy array which backs the average_weight vector.
    w_data_ptr : double*
        A pointer to the data of the numpy array.
    wscale : double
        The scale of the vector.
    n_features : int
        The number of features (= dimensionality of ``w``).
    sq_norm : double
        The squared norm of ``w``.
    """

    def __cinit__(self,
                  double[::1] w,
                  double[::1] aw):

        if w.shape[0] > INT_MAX:
            raise ValueError("More than %d features not supported; got %d."
                             % (INT_MAX, w.shape[0]))
        self.w = w
        self.w_data_ptr = &w[0]
        self.wscale = 1.0
        self.n_features = w.shape[0]
        self.sq_norm = _dot(self.n_features, self.w_data_ptr, 1, self.w_data_ptr, 1)

        self.aw = aw
        if self.aw is not None:
            self.aw_data_ptr = &aw[0]
            self.average_a = 0.0
            self.average_b = 1.0

    cdef void add(self, double *x_data_ptr, int *x_ind_ptr, int xnnz,
                  double c) noexcept nogil:
        """Scales sample x by constant c and adds it to the weight vector.

        This operation updates ``sq_norm``.

        Parameters
        ----------
        x_data_ptr : double*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x``.
        c : double
            The scaling constant for the example.
        """
        cdef int j
        cdef int idx
        cdef double val
        cdef double innerprod = 0.0
        cdef double xsqnorm = 0.0

        # the next two lines save a factor of 2!
        cdef double wscale = self.wscale
        cdef double* w_data_ptr = self.w_data_ptr

        for j in range(xnnz):
            idx = x_ind_ptr[j]
            val = x_data_ptr[j]
            innerprod += (w_data_ptr[idx] * val)
            xsqnorm += (val * val)
            w_data_ptr[idx] += val * (c / wscale)

        self.sq_norm += (xsqnorm * c * c) + (2.0 * innerprod * wscale * c)

    # Update the average weights according to the sparse trick defined
    # here: https://research.microsoft.com/pubs/192769/tricks-2012.pdf
    # by Leon Bottou
    cdef void add_average(self, double *x_data_ptr, int *x_ind_ptr, int xnnz,
                          double c, double num_iter) noexcept nogil:
        """Updates the average weight vector.

        Parameters
        ----------
        x_data_ptr : double*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x``.
        c : double
            The scaling constant for the example.
        num_iter : double
            The total number of iterations.
        """
        cdef int j
        cdef int idx
        cdef double val
        cdef double mu = 1.0 / num_iter
        cdef double average_a = self.average_a
        cdef double wscale = self.wscale
        cdef double* aw_data_ptr = self.aw_data_ptr

        for j in range(xnnz):
            idx = x_ind_ptr[j]
            val = x_data_ptr[j]
            aw_data_ptr[idx] += (self.average_a * val * (-c / wscale))

        # Once the sample has been processed
        # update the average_a and average_b
        if num_iter > 1:
            self.average_b /= (1.0 - mu)
        self.average_a += mu * self.average_b * wscale

    cdef double dot(self, double *x_data_ptr, int *x_ind_ptr,
                    int xnnz) noexcept nogil:
        """Computes the dot product of a sample x and the weight vector.

        Parameters
        ----------
        x_data_ptr : double*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x`` (length of x_ind_ptr).

        Returns
        -------
        innerprod : double
            The inner product of ``x`` and ``w``.
        """
        cdef int j
        cdef int idx
        cdef double innerprod = 0.0
        cdef double* w_data_ptr = self.w_data_ptr
        for j in range(xnnz):
            idx = x_ind_ptr[j]
            innerprod += w_data_ptr[idx] * x_data_ptr[j]
        innerprod *= self.wscale
        return innerprod

    cdef void scale(self, double c) noexcept nogil:
        """Scales the weight vector by a constant ``c``.

        It updates ``wscale`` and ``sq_norm``. If ``wscale`` gets too
        small we call ``reset_wscale``."""
        self.wscale *= c
        self.sq_norm *= (c * c)

        if self.wscale < 1e-09:
            self.reset_wscale()

    cdef void reset_wscale(self) noexcept nogil:
        """Scales each coef of ``w`` by ``wscale`` and resets it to 1. """
        if self.aw_data_ptr != NULL:
            _axpy(self.n_features, self.average_a,
                  self.w_data_ptr, 1, self.aw_data_ptr, 1)
            _scal(self.n_features, 1.0 / self.average_b, self.aw_data_ptr, 1)
            self.average_a = 0.0
            self.average_b = 1.0

        _scal(self.n_features, self.wscale, self.w_data_ptr, 1)
        self.wscale = 1.0

    cdef double norm(self) noexcept nogil:
        """The L2 norm of the weight vector. """
        return sqrt(self.sq_norm)

cdef class WeightVector32(object):
    """Dense vector represented by a scalar and a numpy array.

    The class provides methods to ``add`` a sparse vector
    and scale the vector.
    Representing a vector explicitly as a scalar times a
    vector allows for efficient scaling operations.

    Attributes
    ----------
    w : ndarray, dtype=float, order='C'
        The numpy array which backs the weight vector.
    aw : ndarray, dtype=float, order='C'
        The numpy array which backs the average_weight vector.
    w_data_ptr : float*
        A pointer to the data of the numpy array.
    wscale : float
        The scale of the vector.
    n_features : int
        The number of features (= dimensionality of ``w``).
    sq_norm : float
        The squared norm of ``w``.
    """

    def __cinit__(self,
                  float[::1] w,
                  float[::1] aw):

        if w.shape[0] > INT_MAX:
            raise ValueError("More than %d features not supported; got %d."
                             % (INT_MAX, w.shape[0]))
        self.w = w
        self.w_data_ptr = &w[0]
        self.wscale = 1.0
        self.n_features = w.shape[0]
        self.sq_norm = _dot(self.n_features, self.w_data_ptr, 1, self.w_data_ptr, 1)

        self.aw = aw
        if self.aw is not None:
            self.aw_data_ptr = &aw[0]
            self.average_a = 0.0
            self.average_b = 1.0

    cdef void add(self, float *x_data_ptr, int *x_ind_ptr, int xnnz,
                  float c) noexcept nogil:
        """Scales sample x by constant c and adds it to the weight vector.

        This operation updates ``sq_norm``.

        Parameters
        ----------
        x_data_ptr : float*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x``.
        c : float
            The scaling constant for the example.
        """
        cdef int j
        cdef int idx
        cdef double val
        cdef double innerprod = 0.0
        cdef double xsqnorm = 0.0

        # the next two lines save a factor of 2!
        cdef float wscale = self.wscale
        cdef float* w_data_ptr = self.w_data_ptr

        for j in range(xnnz):
            idx = x_ind_ptr[j]
            val = x_data_ptr[j]
            innerprod += (w_data_ptr[idx] * val)
            xsqnorm += (val * val)
            w_data_ptr[idx] += val * (c / wscale)

        self.sq_norm += (xsqnorm * c * c) + (2.0 * innerprod * wscale * c)

    # Update the average weights according to the sparse trick defined
    # here: https://research.microsoft.com/pubs/192769/tricks-2012.pdf
    # by Leon Bottou
    cdef void add_average(self, float *x_data_ptr, int *x_ind_ptr, int xnnz,
                          float c, float num_iter) noexcept nogil:
        """Updates the average weight vector.

        Parameters
        ----------
        x_data_ptr : float*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x``.
        c : float
            The scaling constant for the example.
        num_iter : float
            The total number of iterations.
        """
        cdef int j
        cdef int idx
        cdef double val
        cdef double mu = 1.0 / num_iter
        cdef double average_a = self.average_a
        cdef double wscale = self.wscale
        cdef float* aw_data_ptr = self.aw_data_ptr

        for j in range(xnnz):
            idx = x_ind_ptr[j]
            val = x_data_ptr[j]
            aw_data_ptr[idx] += (self.average_a * val * (-c / wscale))

        # Once the sample has been processed
        # update the average_a and average_b
        if num_iter > 1:
            self.average_b /= (1.0 - mu)
        self.average_a += mu * self.average_b * wscale

    cdef float dot(self, float *x_data_ptr, int *x_ind_ptr,
                    int xnnz) noexcept nogil:
        """Computes the dot product of a sample x and the weight vector.

        Parameters
        ----------
        x_data_ptr : float*
            The array which holds the feature values of ``x``.
        x_ind_ptr : np.intc*
            The array which holds the feature indices of ``x``.
        xnnz : int
            The number of non-zero features of ``x`` (length of x_ind_ptr).

        Returns
        -------
        innerprod : float
            The inner product of ``x`` and ``w``.
        """
        cdef int j
        cdef int idx
        cdef double innerprod = 0.0
        cdef float* w_data_ptr = self.w_data_ptr
        for j in range(xnnz):
            idx = x_ind_ptr[j]
            innerprod += w_data_ptr[idx] * x_data_ptr[j]
        innerprod *= self.wscale
        return innerprod

    cdef void scale(self, float c) noexcept nogil:
        """Scales the weight vector by a constant ``c``.

        It updates ``wscale`` and ``sq_norm``. If ``wscale`` gets too
        small we call ``reset_wscale``."""
        self.wscale *= c
        self.sq_norm *= (c * c)

        if self.wscale < 1e-06:
            self.reset_wscale()

    cdef void reset_wscale(self) noexcept nogil:
        """Scales each coef of ``w`` by ``wscale`` and resets it to 1. """
        if self.aw_data_ptr != NULL:
            _axpy(self.n_features, self.average_a,
                  self.w_data_ptr, 1, self.aw_data_ptr, 1)
            _scal(self.n_features, 1.0 / self.average_b, self.aw_data_ptr, 1)
            self.average_a = 0.0
            self.average_b = 1.0

        _scal(self.n_features, self.wscale, self.w_data_ptr, 1)
        self.wscale = 1.0

    cdef float norm(self) noexcept nogil:
        """The L2 norm of the weight vector. """
        return sqrt(self.sq_norm)

_seq_dataset.pyx

"""Dataset abstractions for sequential data access."""

import numpy as np

cimport cython
from libc.limits cimport INT_MAX

from ._random cimport our_rand_r
from ._typedefs cimport float32_t, float64_t, uint32_t

#------------------------------------------------------------------------------

cdef class SequentialDataset64:
    """Base class for datasets with sequential data access.

    SequentialDataset is used to iterate over the rows of a matrix X and
    corresponding target values y, i.e. to iterate over samples.
    There are two methods to get the next sample:
        - next : Iterate sequentially (optionally randomized)
        - random : Iterate randomly (with replacement)

    Attributes
    ----------
    index : np.ndarray
        Index array for fast shuffling.

    index_data_ptr : int
        Pointer to the index array.

    current_index : int
        Index of current sample in ``index``.
        The index of current sample in the data is given by
        index_data_ptr[current_index].

    n_samples : Py_ssize_t
        Number of samples in the dataset.

    seed : uint32_t
        Seed used for random sampling. This attribute is modified at each call to the
        `random` method.
    """

    cdef void next(self, float64_t **x_data_ptr, int **x_ind_ptr,
                   int *nnz, float64_t *y, float64_t *sample_weight) noexcept nogil:
        """Get the next example ``x`` from the dataset.

        This method gets the next sample looping sequentially over all samples.
        The order can be shuffled with the method ``shuffle``.
        Shuffling once before iterating over all samples corresponds to a
        random draw without replacement. It is used for instance in SGD solver.

        Parameters
        ----------
        x_data_ptr : float64_t**
            A pointer to the float64_t array which holds the feature
            values of the next example.

        x_ind_ptr : np.intc**
            A pointer to the int array which holds the feature
            indices of the next example.

        nnz : int*
            A pointer to an int holding the number of non-zero
            values of the next example.

        y : float64_t*
            The target value of the next example.

        sample_weight : float64_t*
            The weight of the next example.
        """
        cdef int current_index = self._get_next_index()
        self._sample(x_data_ptr, x_ind_ptr, nnz, y, sample_weight,
                     current_index)

    cdef int random(self, float64_t **x_data_ptr, int **x_ind_ptr,
                    int *nnz, float64_t *y, float64_t *sample_weight) noexcept nogil:
        """Get a random example ``x`` from the dataset.

        This method gets next sample chosen randomly over a uniform
        distribution. It corresponds to a random draw with replacement.
        It is used for instance in SAG solver.

        Parameters
        ----------
        x_data_ptr : float64_t**
            A pointer to the float64_t array which holds the feature
            values of the next example.

        x_ind_ptr : np.intc**
            A pointer to the int array which holds the feature
            indices of the next example.

        nnz : int*
            A pointer to an int holding the number of non-zero
            values of the next example.

        y : float64_t*
            The target value of the next example.

        sample_weight : float64_t*
            The weight of the next example.

        Returns
        -------
        current_index : int
            Index of current sample.
        """
        cdef int current_index = self._get_random_index()
        self._sample(x_data_ptr, x_ind_ptr, nnz, y, sample_weight,
                     current_index)
        return current_index

    cdef void shuffle(self, uint32_t seed) noexcept nogil:
        """Permutes the ordering of examples."""
        # Fisher-Yates shuffle
        cdef int *ind = self.index_data_ptr
        cdef int n = self.n_samples
        cdef unsigned i, j
        for i in range(n - 1):
            j = i + our_rand_r(&seed) % (n - i)
            ind[i], ind[j] = ind[j], ind[i]

    cdef int _get_next_index(self) noexcept nogil:
        cdef int current_index = self.current_index
        if current_index >= (self.n_samples - 1):
            current_index = -1

        current_index += 1
        self.current_index = current_index
        return self.current_index

    cdef int _get_random_index(self) noexcept nogil:
        cdef int n = self.n_samples
        cdef int current_index = our_rand_r(&self.seed) % n
        self.current_index = current_index
        return current_index

    cdef void _sample(self, float64_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float64_t *y, float64_t *sample_weight,
                      int current_index) noexcept nogil:
        pass

    def _shuffle_py(self, uint32_t seed):
        """python function used for easy testing"""
        self.shuffle(seed)

    def _next_py(self):
        """python function used for easy testing"""
        cdef int current_index = self._get_next_index()
        return self._sample_py(current_index)

    def _random_py(self):
        """python function used for easy testing"""
        cdef int current_index = self._get_random_index()
        return self._sample_py(current_index)

    def _sample_py(self, int current_index):
        """python function used for easy testing"""
        cdef float64_t* x_data_ptr
        cdef int* x_indices_ptr
        cdef int nnz, j
        cdef float64_t y, sample_weight

        # call _sample in cython
        self._sample(&x_data_ptr, &x_indices_ptr, &nnz, &y, &sample_weight,
                     current_index)

        # transform the pointed data in numpy CSR array
        cdef float64_t[:] x_data = np.empty(nnz, dtype=np.float64)
        cdef int[:] x_indices = np.empty(nnz, dtype=np.int32)
        cdef int[:] x_indptr = np.asarray([0, nnz], dtype=np.int32)

        for j in range(nnz):
            x_data[j] = x_data_ptr[j]
            x_indices[j] = x_indices_ptr[j]

        cdef int sample_idx = self.index_data_ptr[current_index]

        return (
            (np.asarray(x_data), np.asarray(x_indices), np.asarray(x_indptr)),
            y,
            sample_weight,
            sample_idx,
        )


cdef class ArrayDataset64(SequentialDataset64):
    """Dataset backed by a two-dimensional numpy array.

    The dtype of the numpy array is expected to be ``np.float64`` (float64_t)
    and C-style memory layout.
    """

    def __cinit__(
        self,
        const float64_t[:, ::1] X,
        const float64_t[::1] Y,
        const float64_t[::1] sample_weights,
        uint32_t seed=1,
    ):
        """A ``SequentialDataset`` backed by a two-dimensional numpy array.

        Parameters
        ----------
        X : ndarray, dtype=float64_t, ndim=2, mode='c'
            The sample array, of shape(n_samples, n_features)

        Y : ndarray, dtype=float64_t, ndim=1, mode='c'
            The target array, of shape(n_samples, )

        sample_weights : ndarray, dtype=float64_t, ndim=1, mode='c'
            The weight of each sample, of shape(n_samples,)
        """
        if X.shape[0] > INT_MAX or X.shape[1] > INT_MAX:
            raise ValueError("More than %d samples or features not supported;"
                             " got (%d, %d)."
                             % (INT_MAX, X.shape[0], X.shape[1]))

        # keep a reference to the data to prevent garbage collection
        self.X = X
        self.Y = Y
        self.sample_weights = sample_weights

        self.n_samples = X.shape[0]
        self.n_features = X.shape[1]

        self.feature_indices = np.arange(0, self.n_features, dtype=np.intc)
        self.feature_indices_ptr = <int *> &self.feature_indices[0]

        self.current_index = -1
        self.X_stride = X.strides[0] // X.itemsize
        self.X_data_ptr = <float64_t *> &X[0, 0]
        self.Y_data_ptr = <float64_t *> &Y[0]
        self.sample_weight_data = <float64_t *> &sample_weights[0]

        # Use index array for fast shuffling
        self.index = np.arange(0, self.n_samples, dtype=np.intc)
        self.index_data_ptr = <int *> &self.index[0]
        # seed should not be 0 for our_rand_r
        self.seed = max(seed, 1)

    cdef void _sample(self, float64_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float64_t *y, float64_t *sample_weight,
                      int current_index) noexcept nogil:
        cdef long long sample_idx = self.index_data_ptr[current_index]
        cdef long long offset = sample_idx * self.X_stride

        y[0] = self.Y_data_ptr[sample_idx]
        x_data_ptr[0] = self.X_data_ptr + offset
        x_ind_ptr[0] = self.feature_indices_ptr
        nnz[0] = self.n_features
        sample_weight[0] = self.sample_weight_data[sample_idx]


cdef class CSRDataset64(SequentialDataset64):
    """A ``SequentialDataset`` backed by a scipy sparse CSR matrix. """

    def __cinit__(
        self,
        const float64_t[::1] X_data,
        const int[::1] X_indptr,
        const int[::1] X_indices,
        const float64_t[::1] Y,
        const float64_t[::1] sample_weights,
        uint32_t seed=1,
    ):
        """Dataset backed by a scipy sparse CSR matrix.

        The feature indices of ``x`` are given by x_ind_ptr[0:nnz].
        The corresponding feature values are given by
        x_data_ptr[0:nnz].

        Parameters
        ----------
        X_data : ndarray, dtype=float64_t, ndim=1, mode='c'
            The data array of the CSR features matrix.

        X_indptr : ndarray, dtype=np.intc, ndim=1, mode='c'
            The index pointer array of the CSR features matrix.

        X_indices : ndarray, dtype=np.intc, ndim=1, mode='c'
            The column indices array of the CSR features matrix.

        Y : ndarray, dtype=float64_t, ndim=1, mode='c'
            The target values.

        sample_weights : ndarray, dtype=float64_t, ndim=1, mode='c'
            The weight of each sample.
        """
        # keep a reference to the data to prevent garbage collection
        self.X_data = X_data
        self.X_indptr = X_indptr
        self.X_indices = X_indices
        self.Y = Y
        self.sample_weights = sample_weights

        self.n_samples = Y.shape[0]
        self.current_index = -1
        self.X_data_ptr = <float64_t *> &X_data[0]
        self.X_indptr_ptr = <int *> &X_indptr[0]
        self.X_indices_ptr = <int *> &X_indices[0]

        self.Y_data_ptr = <float64_t *> &Y[0]
        self.sample_weight_data = <float64_t *> &sample_weights[0]

        # Use index array for fast shuffling
        self.index = np.arange(self.n_samples, dtype=np.intc)
        self.index_data_ptr = <int *> &self.index[0]
        # seed should not be 0 for our_rand_r
        self.seed = max(seed, 1)

    cdef void _sample(self, float64_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float64_t *y, float64_t *sample_weight,
                      int current_index) noexcept nogil:
        cdef long long sample_idx = self.index_data_ptr[current_index]
        cdef long long offset = self.X_indptr_ptr[sample_idx]
        y[0] = self.Y_data_ptr[sample_idx]
        x_data_ptr[0] = self.X_data_ptr + offset
        x_ind_ptr[0] = self.X_indices_ptr + offset
        nnz[0] = self.X_indptr_ptr[sample_idx + 1] - offset
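        # NOTE (editor): here nnz is the true per-row non-zero count derived
        # from indptr, so the sparse CSR path really does skip zero features.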
        sample_weight[0] = self.sample_weight_data[sample_idx]


#------------------------------------------------------------------------------

cdef class SequentialDataset32:
    """Base class for datasets with sequential data access.

    SequentialDataset is used to iterate over the rows of a matrix X and
    corresponding target values y, i.e. to iterate over samples.
    There are two methods to get the next sample:
        - next : Iterate sequentially (optionally randomized)
        - random : Iterate randomly (with replacement)

    Attributes
    ----------
    index : np.ndarray
        Index array for fast shuffling.

    index_data_ptr : int
        Pointer to the index array.

    current_index : int
        Index of current sample in ``index``.
        The index of current sample in the data is given by
        index_data_ptr[current_index].

    n_samples : Py_ssize_t
        Number of samples in the dataset.

    seed : uint32_t
        Seed used for random sampling. This attribute is modified at each call to the
        `random` method.
    """

    cdef void next(self, float32_t **x_data_ptr, int **x_ind_ptr,
                   int *nnz, float32_t *y, float32_t *sample_weight) noexcept nogil:
        """Get the next example ``x`` from the dataset.

        This method gets the next sample looping sequentially over all samples.
        The order can be shuffled with the method ``shuffle``.
        Shuffling once before iterating over all samples corresponds to a
        random draw without replacement. It is used for instance in SGD solver.

        Parameters
        ----------
        x_data_ptr : float32_t**
            A pointer to the float32_t array which holds the feature
            values of the next example.

        x_ind_ptr : np.intc**
            A pointer to the int array which holds the feature
            indices of the next example.

        nnz : int*
            A pointer to an int holding the number of non-zero
            values of the next example.

        y : float32_t*
            The target value of the next example.

        sample_weight : float32_t*
            The weight of the next example.
        """
        cdef int current_index = self._get_next_index()
        self._sample(x_data_ptr, x_ind_ptr, nnz, y, sample_weight,
                     current_index)

    cdef int random(self, float32_t **x_data_ptr, int **x_ind_ptr,
                    int *nnz, float32_t *y, float32_t *sample_weight) noexcept nogil:
        """Get a random example ``x`` from the dataset.

        This method gets next sample chosen randomly over a uniform
        distribution. It corresponds to a random draw with replacement.
        It is used for instance in SAG solver.

        Parameters
        ----------
        x_data_ptr : float32_t**
            A pointer to the float32_t array which holds the feature
            values of the next example.

        x_ind_ptr : np.intc**
            A pointer to the int array which holds the feature
            indices of the next example.

        nnz : int*
            A pointer to an int holding the number of non-zero
            values of the next example.

        y : float32_t*
            The target value of the next example.

        sample_weight : float32_t*
            The weight of the next example.

        Returns
        -------
        current_index : int
            Index of current sample.
        """
        cdef int current_index = self._get_random_index()
        self._sample(x_data_ptr, x_ind_ptr, nnz, y, sample_weight,
                     current_index)
        return current_index

    cdef void shuffle(self, uint32_t seed) noexcept nogil:
        """Permutes the ordering of examples."""
        # Fisher-Yates shuffle
        cdef int *ind = self.index_data_ptr
        cdef int n = self.n_samples
        cdef unsigned i, j
        for i in range(n - 1):
            j = i + our_rand_r(&seed) % (n - i)
            ind[i], ind[j] = ind[j], ind[i]

    cdef int _get_next_index(self) noexcept nogil:
        cdef int current_index = self.current_index
        if current_index >= (self.n_samples - 1):
            current_index = -1

        current_index += 1
        self.current_index = current_index
        return self.current_index

    cdef int _get_random_index(self) noexcept nogil:
        cdef int n = self.n_samples
        cdef int current_index = our_rand_r(&self.seed) % n
        self.current_index = current_index
        return current_index

    cdef void _sample(self, float32_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float32_t *y, float32_t *sample_weight,
                      int current_index) noexcept nogil:
        pass

    def _shuffle_py(self, uint32_t seed):
        """python function used for easy testing"""
        self.shuffle(seed)

    def _next_py(self):
        """python function used for easy testing"""
        cdef int current_index = self._get_next_index()
        return self._sample_py(current_index)

    def _random_py(self):
        """python function used for easy testing"""
        cdef int current_index = self._get_random_index()
        return self._sample_py(current_index)

    def _sample_py(self, int current_index):
        """python function used for easy testing"""
        cdef float32_t* x_data_ptr
        cdef int* x_indices_ptr
        cdef int nnz, j
        cdef float32_t y, sample_weight

        # call _sample in cython
        self._sample(&x_data_ptr, &x_indices_ptr, &nnz, &y, &sample_weight,
                     current_index)

        # transform the pointed data in numpy CSR array
        cdef float32_t[:] x_data = np.empty(nnz, dtype=np.float32)
        cdef int[:] x_indices = np.empty(nnz, dtype=np.int32)
        cdef int[:] x_indptr = np.asarray([0, nnz], dtype=np.int32)

        for j in range(nnz):
            x_data[j] = x_data_ptr[j]
            x_indices[j] = x_indices_ptr[j]

        cdef int sample_idx = self.index_data_ptr[current_index]

        return (
            (np.asarray(x_data), np.asarray(x_indices), np.asarray(x_indptr)),
            y,
            sample_weight,
            sample_idx,
        )


cdef class ArrayDataset32(SequentialDataset32):
    """Dataset backed by a two-dimensional numpy array.

    The dtype of the numpy array is expected to be ``np.float32`` (float32_t)
    and C-style memory layout.
    """

    def __cinit__(
        self,
        const float32_t[:, ::1] X,
        const float32_t[::1] Y,
        const float32_t[::1] sample_weights,
        uint32_t seed=1,
    ):
        """A ``SequentialDataset`` backed by a two-dimensional numpy array.

        Parameters
        ----------
        X : ndarray, dtype=float32_t, ndim=2, mode='c'
            The sample array, of shape(n_samples, n_features)

        Y : ndarray, dtype=float32_t, ndim=1, mode='c'
            The target array, of shape(n_samples, )

        sample_weights : ndarray, dtype=float32_t, ndim=1, mode='c'
            The weight of each sample, of shape(n_samples,)
        """
        if X.shape[0] > INT_MAX or X.shape[1] > INT_MAX:
            raise ValueError("More than %d samples or features not supported;"
                             " got (%d, %d)."
                             % (INT_MAX, X.shape[0], X.shape[1]))

        # keep a reference to the data to prevent garbage collection
        self.X = X
        self.Y = Y
        self.sample_weights = sample_weights

        self.n_samples = X.shape[0]
        self.n_features = X.shape[1]

        self.feature_indices = np.arange(0, self.n_features, dtype=np.intc)
        self.feature_indices_ptr = <int *> &self.feature_indices[0]

        self.current_index = -1
        self.X_stride = X.strides[0] // X.itemsize
        self.X_data_ptr = <float32_t *> &X[0, 0]
        self.Y_data_ptr = <float32_t *> &Y[0]
        self.sample_weight_data = <float32_t *> &sample_weights[0]

        # Use index array for fast shuffling
        self.index = np.arange(0, self.n_samples, dtype=np.intc)
        self.index_data_ptr = <int *> &self.index[0]
        # seed should not be 0 for our_rand_r
        self.seed = max(seed, 1)

    cdef void _sample(self, float32_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float32_t *y, float32_t *sample_weight,
                      int current_index) noexcept nogil:
        cdef long long sample_idx = self.index_data_ptr[current_index]
        cdef long long offset = sample_idx * self.X_stride

        y[0] = self.Y_data_ptr[sample_idx]
        x_data_ptr[0] = self.X_data_ptr + offset
        x_ind_ptr[0] = self.feature_indices_ptr
        nnz[0] = self.n_features
        sample_weight[0] = self.sample_weight_data[sample_idx]


cdef class CSRDataset32(SequentialDataset32):
    """A ``SequentialDataset`` backed by a scipy sparse CSR matrix. """

    def __cinit__(
        self,
        const float32_t[::1] X_data,
        const int[::1] X_indptr,
        const int[::1] X_indices,
        const float32_t[::1] Y,
        const float32_t[::1] sample_weights,
        uint32_t seed=1,
    ):
        """Dataset backed by a scipy sparse CSR matrix.

        The feature indices of ``x`` are given by x_ind_ptr[0:nnz].
        The corresponding feature values are given by
        x_data_ptr[0:nnz].

        Parameters
        ----------
        X_data : ndarray, dtype=float32_t, ndim=1, mode='c'
            The data array of the CSR features matrix.

        X_indptr : ndarray, dtype=np.intc, ndim=1, mode='c'
            The index pointer array of the CSR features matrix.

        X_indices : ndarray, dtype=np.intc, ndim=1, mode='c'
            The column indices array of the CSR features matrix.

        Y : ndarray, dtype=float32_t, ndim=1, mode='c'
            The target values.

        sample_weights : ndarray, dtype=float32_t, ndim=1, mode='c'
            The weight of each sample.
        """
        # keep a reference to the data to prevent garbage collection
        self.X_data = X_data
        self.X_indptr = X_indptr
        self.X_indices = X_indices
        self.Y = Y
        self.sample_weights = sample_weights

        self.n_samples = Y.shape[0]
        self.current_index = -1
        self.X_data_ptr = <float32_t *> &X_data[0]
        self.X_indptr_ptr = <int *> &X_indptr[0]
        self.X_indices_ptr = <int *> &X_indices[0]

        self.Y_data_ptr = <float32_t *> &Y[0]
        self.sample_weight_data = <float32_t *> &sample_weights[0]

        # Use index array for fast shuffling
        self.index = np.arange(self.n_samples, dtype=np.intc)
        self.index_data_ptr = <int *> &self.index[0]
        # seed should not be 0 for our_rand_r
        self.seed = max(seed, 1)

    cdef void _sample(self, float32_t **x_data_ptr, int **x_ind_ptr,
                      int *nnz, float32_t *y, float32_t *sample_weight,
                      int current_index) noexcept nogil:
        cdef long long sample_idx = self.index_data_ptr[current_index]
        cdef long long offset = self.X_indptr_ptr[sample_idx]
        y[0] = self.Y_data_ptr[sample_idx]
        x_data_ptr[0] = self.X_data_ptr + offset
        x_ind_ptr[0] = self.X_indices_ptr + offset
        nnz[0] = self.X_indptr_ptr[sample_idx + 1] - offset
        sample_weight[0] = self.sample_weight_data[sample_idx]
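
Since `_sample()` is where `nnz` gets set, the dense-vs-sparse difference can be checked directly from Python. Below is a minimal sketch using the private `sklearn.utils._seq_dataset` module and its testing helper `_next_py()`; these are internal APIs, so the example may break between scikit-learn versions.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils._seq_dataset import ArrayDataset64, CSRDataset64

# One sample with two zero-valued features.
X = np.array([[0.0, 1.0, 0.0, 2.0]], dtype=np.float64)
y = np.array([1.0], dtype=np.float64)
w = np.ones_like(y)

# Dense path: nnz == n_features, zeros included.
dense = ArrayDataset64(X, y, w, seed=1)
(x_data, x_ind, x_indptr), _, _, _ = dense._next_py()
print(len(x_data))  # 4

# Sparse path: nnz == true non-zero count of the row.
X_csr = sp.csr_matrix(X)
sparse = CSRDataset64(
    X_csr.data,
    X_csr.indptr.astype(np.int32),
    X_csr.indices.astype(np.int32),
    y,
    w,
    seed=1,
)
(x_data, x_ind, x_indptr), _, _, _ = sparse._next_py()
print(len(x_data))  # 2
```

This matches the reading above: with a dense ndarray input, `xnnz` in `WeightVector64.add()` is simply `n_features`, so zero-valued features are included in the update loop too; only the sparse CSR route actually filters them out.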

