Obtain the data from
https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
> wget https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data
StatusCode : 200
StatusDescription : OK
Content : {56, 52, 50, 51...}
RawContent : HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 124103
Content-Type: application/x-httpd-php
Date: Fri, 10 Dec 2021 09:53:46 GMT
ETag: "1e4c7-2ed02fd9c0780"
Last-Modified: Mon, 05 Feb 1996 ...
Headers : {[Accept-Ranges, bytes], [Content-Length, 124103], [Content-Type, application/x-httpd-php], [Date,
Fri, 10 Dec 2021 09:53:46 GMT]...}
RawContentLength : 124103
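Under Anaconda, wget sometimes fails to fetch the file; in that case, download it through the GUI (a browser) or consult the linked reference. The output above appears to come from PowerShell, where wget is an alias for Invoke-WebRequest: it prints the response and only saves a file when -OutFile wdbc.data is added.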
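If wget is unavailable, the file could also be fetched from Python itself. The following is a minimal sketch using only the standard library; the helper name `download` and the default destination `wdbc.data` are assumptions made for illustration, and the URL is the one from the wget command above.

```python
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
WDBC_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "breast-cancer-wisconsin/wdbc.data"
)


def download(url: str, dest: str = "wdbc.data") -> Path:
    """Fetch url and save it to dest (hypothetical helper for this sketch)."""
    # urlretrieve writes the response body to dest and returns (path, headers).
    path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(path)
```

Calling `download(WDBC_URL)` would save the file as wdbc.data in the current directory, assuming the server still serves it at that address.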
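↓ The data is laid out as shown below: the first column is an ID and the second is the diagnosis, such as M (malignant). You may want to move that second column to the very end.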
> more .\wdbc.data
842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
Write a program like this:
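import pandas as pd

# wdbc.data has no header row, so pass header=None; otherwise the first
# record is consumed as column names. Column 0 (the ID) becomes the index.
df = pd.read_csv('wdbc.data', header=None, index_col=0)
print(len(df))          # number of rows
print(len(df.columns))  # number of columns, not counting the index
# Position 0 is now the diagnosis (M or B); move it behind the 30 features.
dataExchange = df.iloc[:, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,0]]
print(dataExchange)
# header=False keeps the output in the same headerless layout as the input.
dataExchange.to_csv('tmp.csv', header=False)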
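↓ This line moves the M (or B) from the second column of the file to the last column. Because the ID is used as the index, the diagnosis sits at position 0 of the DataFrame.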
dataExchange = df.iloc[:, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,0]]
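The explicit index list can also be built programmatically. The sketch below illustrates the same reordering on a single inline sample row, shortened to three feature values so that it runs without the data file. The variable names `sample`, `order`, and `reordered` are assumptions for illustration.

```python
import io
import pandas as pd

# One row in the same layout as wdbc.data: ID, diagnosis, then features
# (shortened to three features for illustration).
sample = io.StringIO("842302,M,17.99,10.38,122.8\n")

# No header row in the data; the ID column becomes the index.
df = pd.read_csv(sample, header=None, index_col=0)

# Positions 1..n-1 (the features) first, then position 0 (the diagnosis).
order = list(range(1, df.shape[1])) + [0]
reordered = df.iloc[:, order]

print(reordered.to_csv(header=False), end="")
```

Building the list from `df.shape[1]` avoids hard-coding the number of feature columns, at the cost of being less explicit than the literal list.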
Result
> more .\tmp.csv
842302,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,M
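As a quick sanity check, one could read the output back and confirm that the last column holds only diagnosis labels. This is a sketch under assumptions: it uses an inline stand-in for tmp.csv (the result row above, shortened to three feature values) and takes B to mean benign, the counterpart of M (malignant).

```python
import io
import pandas as pd

# Stand-in for tmp.csv: the result row shown above, shortened to three
# feature values for illustration.
tmp_csv = io.StringIO("842302,17.99,10.38,122.8,M\n")

out = pd.read_csv(tmp_csv, header=None, index_col=0)
labels = out.iloc[:, -1]  # the last column should now be the diagnosis

# True only if every label is M (malignant) or B (benign).
labels_ok = bool(labels.isin(["M", "B"]).all())
print(labels_ok)
```

With the real file, replacing `tmp_csv` with the path 'tmp.csv' would run the same check over every row.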
References