
Reordering the columns of the Breast Cancer Dataset

Last updated at 2021-12-10, Posted at 2021-12-10

Download the data from https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/


> wget https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data


StatusCode        : 200
StatusDescription : OK
Content           : {56, 52, 50, 51...}
RawContent        : HTTP/1.1 200 OK
                    Accept-Ranges: bytes
                    Content-Length: 124103
                    Content-Type: application/x-httpd-php
                    Date: Fri, 10 Dec 2021 09:53:46 GMT
                    ETag: "1e4c7-2ed02fd9c0780"
                    Last-Modified: Mon, 05 Feb 1996 ...
Headers           : {[Accept-Ranges, bytes], [Content-Length, 124103], [Content-Type, application/x-httpd-php], [Date,
                    Fri, 10 Dec 2021 09:53:46 GMT]...}
RawContentLength  : 124103

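In an Anaconda environment, wget sometimes fails to fetch the file. In that case, download it from a browser or see the related references. Also note that in Windows PowerShell, wget is an alias for Invoke-WebRequest, which only prints a response summary like the one above; add -OutFile wdbc.data to actually save the file.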
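If wget is not usable, the file can also be fetched with Python's standard library, which an Anaconda environment already includes. Below is a minimal sketch using urllib.request.urlretrieve; the helper download and the constant WDBC_URL are names made up for illustration, not part of the original article.

```python
import urllib.request
from pathlib import Path

# URL from this article (hypothetical constant name).
WDBC_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "breast-cancer-wisconsin/wdbc.data"
)


def download(url: str, dest: str) -> Path:
    """Hypothetical helper: save the resource at `url` to `dest` and return its path."""
    # urlretrieve returns (local_filename, headers); we only need the file name.
    filename, _headers = urllib.request.urlretrieve(url, dest)
    return Path(filename)
```

Calling download(WDBC_URL, "wdbc.data") would save the file into the current directory, assuming the URL above is still reachable.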

The data is laid out as shown below, so you may want to move the diagnosis in the second column (M for malignant, B for benign) to the last column.


> more .\wdbc.data
842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
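Note that wdbc.data has no header row: the first line is already a record. With its default header=0, pandas.read_csv would turn that record into column names and drop it from the data. Here is a minimal sketch of reading it with header=None, using the one record shown above as sample input. The column names (id, diagnosis, feature_1 to feature_30) and the helper load_wdbc are hypothetical placeholders, not the official feature names.

```python
import io

import pandas as pd

# The first record of wdbc.data, as printed by `more` above.
SAMPLE = (
    "842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,"
    "0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,"
    "0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,"
    "0.4601,0.1189\n"
)

# Hypothetical column names: ID, diagnosis, then 30 numeric features.
NAMES = ["id", "diagnosis"] + [f"feature_{i}" for i in range(1, 31)]


def load_wdbc(source) -> pd.DataFrame:
    """Hypothetical helper: read header-less WDBC data with the ID as the index."""
    # header=None: every line is data; names= supplies our placeholder labels.
    return pd.read_csv(source, header=None, names=NAMES, index_col="id")


df = load_wdbc(io.StringIO(SAMPLE))
print(df.shape)                     # (1, 31)
print(df.loc[842302, "diagnosis"])  # M
```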

Write a program like this:


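import pandas as pd

# wdbc.data has no header row: header=None keeps the first record as data,
# and index_col=0 uses the ID column (e.g. 842302) as the index.
df = pd.read_csv('wdbc.data', header=None, index_col=0)

print(len(df))          # number of records (rows)
print(len(df.columns))  # number of columns, not counting the ID index

# iloc position 0 is now the diagnosis (M/B) column; list it last.
dataExchange = df.iloc[:, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,0]]
print(dataExchange)

# header=False writes tmp.csv without a header row, matching the input format.
dataExchange.to_csv('tmp.csv', header=False)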

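The following line moves the M (or B) in the second column to the last column. Because index_col=0 turns the ID column into the index, iloc position 0 refers to the diagnosis column, so listing positions 1 through 30 followed by 0 puts it at the end: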


dataExchange = df.iloc[:, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,0]]
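The position list above is hard-coded for exactly 31 columns. A small sketch that builds the same list from the column count instead; move_first_column_last is a hypothetical helper, and the toy frame uses made-up values rather than real WDBC records.

```python
import pandas as pd


def move_first_column_last(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move the column at iloc position 0 to the end.

    For the 31 data columns of wdbc.data this builds the same list
    [1, 2, ..., 30, 0] that is written out by hand above.
    """
    order = list(range(1, len(df.columns))) + [0]
    return df.iloc[:, order]


# Toy frame (made-up values) standing in for the loaded data.
toy = pd.DataFrame(
    {"diagnosis": ["M", "B"], "f1": [1.0, 2.0], "f2": [3.0, 4.0]},
    index=pd.Index([1, 2], name="id"),
)
print(move_first_column_last(toy).columns.tolist())  # ['f1', 'f2', 'diagnosis']
```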

Result:


> more .\tmp.csv
842302,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,M
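Moving the column by label rather than by position avoids counting columns at all. Below is a sketch using DataFrame.pop, assuming the file is read with header=None and index_col=0 (so pandas labels the diagnosis column 1). move_column_to_end is a hypothetical helper, and the input is the single record shown earlier rather than the full file.

```python
import io

import pandas as pd


def move_column_to_end(df: pd.DataFrame, column) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with `column` moved to the end."""
    out = df.copy()
    values = out.pop(column)  # remove the column, keeping its values
    out[column] = values      # assigning a new label appends it as the last column
    return out


# One header-less WDBC record (the first line of wdbc.data shown above).
SAMPLE = (
    "842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,"
    "0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,"
    "0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,"
    "0.4601,0.1189\n"
)

# header=None: pandas labels the columns 0..31.
# index_col=0: the ID becomes the index, so diagnosis keeps the label 1.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, index_col=0)
moved = move_column_to_end(df, 1)

# header=False keeps the output header-less, like the input file.
buf = io.StringIO()
moved.to_csv(buf, header=False)
print(buf.getvalue())
```

pop removes the column and returns it, and assigning it back under the same label appends it as the last column, so the code does not depend on how many feature columns there are.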

