[Help wanted] How do I handle paths containing dame-moji (characters ending in byte 0x5C) in PHP on Windows?

More than 3 years have passed since last update.

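What I want to achieve

  • Use simplexml_load_file() on a path containing dame-moji in PHP on Windows (this is the important one). Resolved: parsing works with $xml = new SimpleXMLElement(@file_get_contents($addslash_pass));
  • Use file_exists() and similar functions on a path containing dame-moji in PHP on Windows. Resolved: @fopen($addslash_pass, 'r') works as a substitute.
  • Use is_dir() and similar functions on a path containing dame-moji in PHP on Windows

* $addslash_pass = addslashes(mb_convert_encoding($xmlPath, "cp932", "utf-8"))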
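The two workarounds above could be wrapped in small helpers, roughly as sketched below. The function names `to_cp932_path`, `path_is_readable`, and `load_xml_from_path` are hypothetical and do not appear in the test code. The sketch only mirrors the addslashes + mb_convert_encoding combination that happened to work in the environment described in this post; it is not a general fix. It also uses simplexml_load_string() instead of new SimpleXMLElement() so that a failure returns false rather than throwing.

```php
<?php
// Hypothetical helper: UTF-8 path -> CP932 bytes with addslashes() applied,
// i.e. the same expression as $addslash_pass above.
function to_cp932_path($utf8Path)
{
    return addslashes(mb_convert_encoding($utf8Path, 'CP932', 'UTF-8'));
}

// Substitute for file_exists(): true if the file can be opened for reading.
function path_is_readable($utf8Path)
{
    $fp = @fopen(to_cp932_path($utf8Path), 'r');
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}

// Substitute for simplexml_load_file(): read the bytes first, then parse.
// Returns a SimpleXMLElement, or false if reading or parsing fails.
function load_xml_from_path($utf8Path)
{
    $xml = @file_get_contents(to_cp932_path($utf8Path));
    if ($xml === false) {
        return false;
    }
    return @simplexml_load_string($xml);
}
```

Note that path_is_readable() only answers "can this be opened for reading", which is narrower than what file_exists() reports.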

Environment

PHP 5.6.3
Windows 7, Windows Embedded Standard

I am using XAMPP (xampp-win32-5.6.3-0-VC11-installer).

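Current status

Here is what I have investigated so far.

  1. Pass the path without any encoding conversion
  2. Apply mb_convert_encoding()
  3. Apply mb_convert_encoding() and then addslashes()

I ran all three of the tests above, but the results for 1 and 2 look as expected, so I omit their tables here. For the detailed procedure, see the test code below.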
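For reference, the three variants listed above can be produced by one small function. This is only a sketch: the name `path_variants` and the array keys are hypothetical, while the expressions themselves are the ones used in the test code.

```php
<?php
// Hypothetical helper: build the three path variants compared in this post
// from a UTF-8 encoded path.
function path_variants($utf8Path)
{
    $sjis = mb_convert_encoding($utf8Path, 'CP932', 'UTF-8');
    return array(
        'raw'              => $utf8Path,          // 1. no conversion
        'cp932'            => $sjis,              // 2. mb_convert_encoding()
        'cp932_addslashes' => addslashes($sjis),  // 3. plus addslashes()
    );
}

// Dump each variant as hex so byte-level differences are visible.
foreach (path_variants('sample.xml') as $label => $candidate) {
    echo $label, ': ', bin2hex($candidate), PHP_EOL;
}
```

For a pure ASCII name such as sample.xml all three variants are byte-for-byte identical, which fits the fact that sample.xml passes every test.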

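The results follow. The points I am stuck on are marked with asterisked notes.
I am tripping over the so-called 5C problem (dame-moji): characters whose Shift-JIS encoding ends in the byte 0x5C, the same byte as the backslash.
I cannot tell whether my code is wrong, whether there is an idiomatic way to write this, or whether PHP and Shift-JIS paths simply do not get along. Please help.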
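To make the 5C problem concrete, the short check below prints the CP932 bytes of 貼, the character used in the test file names. It relies only on mb_convert_encoding(), bin2hex(), and addslashes(); the variable names are illustrative, and the source file is assumed to be saved as UTF-8.

```php
<?php
// The character used in the dame-moji test file names (UTF-8 source).
$char  = '貼';
$cp932 = mb_convert_encoding($char, 'CP932', 'UTF-8');

// Hex dump of the two CP932 bytes.
echo bin2hex($cp932), PHP_EOL;

// Is the trail byte the same byte value as an ASCII backslash?
var_dump(substr($cp932, -1) === '\\');

// addslashes() works byte by byte, so it escapes that trail byte too.
echo bin2hex(addslashes($cp932)), PHP_EOL;
```

The first line prints 935c, the comparison is true, and the last line prints 935c5c: any byte-oriented code that treats 0x5C as a path separator or escape character will see one in the middle of this character.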

File handling (addslashes + mb_convert_encoding)

| File | is_file() | file_exists() | simplexml_load_file() | copy() | fopen() |
|---|---|---|---|---|---|
| sample.xml | T | T | T | T | T |
| ユニコード文字.xml | T | T | T | T*2 | T |
| ダメ文字(貼).xml | F*1 | F*1 | F*1 | T*2, *3 | T*3 |

Directory handling (addslashes + mb_convert_encoding)

| Directory | dir() | is_dir() | scandir() |
|---|---|---|---|
| sample_folder | T | T | T |
| ユニコード文字_folder | T | T | T |
| ダメ文字(貼)_folder | F*4 | F*4 | F*4 |

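*1, *4: These returning false is the problem.
*2: The file is copied under a garbled file name, so strictly speaking this is not true.
*3: I do not understand why this returns true.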
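To tell in advance which paths will run into the failures marked *1 and *4, it may help to detect dame-moji up front. The following is a minimal sketch, assuming the path is UTF-8 and that every affected character is a two-byte CP932 character whose second byte is 0x5C. `contains_dame_moji` is a hypothetical name.

```php
<?php
// Hypothetical helper: true if any character of the UTF-8 string becomes a
// two-byte CP932 sequence ending in 0x5C (the backslash byte).
function contains_dame_moji($utf8)
{
    // Split into characters in a way that also works on PHP 5.6.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // input was not valid UTF-8
    }
    foreach ($chars as $ch) {
        $bytes = mb_convert_encoding($ch, 'CP932', 'UTF-8');
        // A plain ASCII backslash is one byte long, so it is not counted.
        if (strlen($bytes) === 2 && $bytes[1] === '\\') {
            return true;
        }
    }
    return false;
}

var_dump(contains_dame_moji('sample.xml'));
var_dump(contains_dame_moji('ユニコード文字.xml'));
var_dump(contains_dame_moji('ダメ文字(貼).xml'));
```

Only the last file name is reported as containing dame-moji, which matches the rows of the tables where the failures occur. Ordinary directory separators are ignored because they are single-byte backslashes.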

Test code

<?php

header( 'Content-Type: text/html; charset=utf-8' );

if ( setlocale( LC_ALL, '' ) === false )
{
    echo "setlocale failed.";
    echo "<br/>";
}

echo "OS info: ", php_uname();
echo "<br/>";
echo "PHP version: ", phpversion();
echo "<br/>";
echo "locale: ", setlocale( LC_ALL, 0 );
echo "<br/>";
echo "mb_get_info: ", var_dump( mb_get_info() );
echo "<br/>";
echo "<br/>";


echo "<h2>Path validity when reading files (is_file, file_exists, simplexml_load_file)</h2>";

$paths = array(
    "sample.xml",
    "ユニコード文字.xml",
    "ダメ文字(貼).xml",
);

foreach( $paths as $path )
{
    echo "<h3>Reading '$path'</h3>";

    echo '1. No encoding conversion';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> is_file: ", var_dump( is_file( $path ) );
    echo "<br/>";
    echo $path, " -> file_exists: ", var_dump( file_exists( $path ) );
    echo "<br/>";
    echo $path, " -> simplexml_load_file: ", var_dump( @simplexml_load_file( $path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $sjis_path = mb_convert_encoding( $path, "cp932", "utf-8" );

    echo '2. Apply mb_convert_encoding( $path )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> is_file: ", var_dump( is_file( $sjis_path ) );
    echo "<br/>";
    echo $path, " -> file_exists: ", var_dump( file_exists( $sjis_path ) );
    echo "<br/>";
    echo $path, " -> simplexml_load_file: ", var_dump( @simplexml_load_file( $sjis_path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $addslashes_path = addslashes( mb_convert_encoding( $path, "cp932", "utf-8" ) );

    echo '3. Apply addslashes( mb_convert_encoding( $path ) )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> is_file: ", var_dump( is_file( $addslashes_path ) );
    echo "<br/>";
    echo $path, " -> file_exists: ", var_dump( file_exists( $addslashes_path ) );
    echo "<br/>";
    echo $path, " -> simplexml_load_file: ", var_dump( @simplexml_load_file( $addslashes_path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";
}


echo "<h2>Path validity when reading files (copy, fopen)</h2>";

foreach( $paths as $path )
{
    echo "<h3>Reading '$path'</h3>";

    echo '1. No encoding conversion';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> copy: ", var_dump( @copy( $path, "copied_".$path."1" ) );
    echo "<br/>";
    echo $path, " -> fopen: ", var_dump( $fp = @fopen( $path, 'r' ) ); if ( $fp ) { fclose($fp); }
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $sjis_path = mb_convert_encoding( $path, "cp932", "utf-8" );

    echo '2. Apply mb_convert_encoding( $path )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> copy: ", var_dump( @copy( $sjis_path, "copied_".$path."2" ) );
    echo "<br/>";
    echo $path, " -> fopen: ", var_dump( $fp = @fopen( $sjis_path, 'r' ) ); if ( $fp ) { fclose($fp); }
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $addslashes_path = addslashes( mb_convert_encoding( $path, "cp932", "utf-8" ) );

    echo '3. Apply addslashes( mb_convert_encoding( $path ) )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> copy: ", var_dump( copy( $addslashes_path, "copied_".$path."3" ) );
    echo "<br/>";
    echo $path, " -> fopen: ", var_dump( $fp = fopen( $addslashes_path, 'r' ) ); if ( $fp ) { fclose($fp); }
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";
}


echo "<h2>Path validity when reading directories (dir, is_dir, scandir)</h2>";

$paths = array(
    "C:\\xampp\\htdocs\\multibyte_5c_path\\sample_folder",
    "C:\\xampp\\htdocs\\multibyte_5c_path\\ユニコード文字_folder",
    "C:\\xampp\\htdocs\\multibyte_5c_path\\ダメ文字(貼)_folder",
);

foreach( $paths as $path )
{
    echo "<h3>Reading '$path'</h3>";

    echo '1. No encoding conversion';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> dir: ", var_dump( @dir( $path ) );
    echo "<br/>";
    echo $path, " -> is_dir: ", var_dump( @is_dir( $path ) );
    echo "<br/>";
    echo $path, " -> scandir: ", var_dump( @scandir( $path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $sjis_path = mb_convert_encoding( $path, "cp932", "utf-8" );

    echo '2. Apply mb_convert_encoding( $path )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> dir: ", var_dump( @dir( $sjis_path ) );
    echo "<br/>";
    echo $path, " -> is_dir: ", var_dump( @is_dir( $sjis_path ) );
    echo "<br/>";
    echo $path, " -> scandir: ", var_dump( @scandir( $sjis_path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";

    $addslashes_path = addslashes( mb_convert_encoding( $path, "cp932", "utf-8" ) );

    echo '3. Apply addslashes( mb_convert_encoding( $path ) )';
    echo "<br/>";
    echo "<br/>";
    echo $path, " -> dir: ", var_dump( @dir( $addslashes_path ) );
    echo "<br/>";
    echo $path, " -> is_dir: ", var_dump( @is_dir( $addslashes_path ) );
    echo "<br/>";
    echo $path, " -> scandir: ", var_dump( @scandir( $addslashes_path ) );
    echo "<br/>";
    echo "<br/>";
    echo "<br/>";
}

?>
OS info: Windows NT ADMIN-PC 6.1 build 7601 (Windows 7 Business Edition Service Pack 1) i586
PHP version: 5.6.3
locale: Japanese_Japan.932
mb_get_info: array(14) { ["internal_encoding"]=> string(5) "UTF-8" ["http_output"]=> string(5) "UTF-8" ["http_output_conv_mimetypes"]=> string(31) "^(text/|application/xhtml\+xml)" ["func_overload"]=> int(0) ["func_overload_list"]=> string(11) "no overload" ["mail_charset"]=> string(5) "UTF-8" ["mail_header_encoding"]=> string(6) "BASE64" ["mail_body_encoding"]=> string(6) "BASE64" ["illegal_chars"]=> int(0) ["encoding_translation"]=> string(3) "Off" ["language"]=> string(7) "neutral" ["detect_order"]=> array(2) { [0]=> string(5) "ASCII" [1]=> string(5) "UTF-8" } ["substitute_character"]=> int(63) ["strict_detection"]=> string(3) "Off" }

...
...
...
(remainder of the output omitted)
...
...
...