1
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

素のPHPでTwitterの文字数カウントを再現

Last updated at Posted at 2020-11-03

##経緯
一括ツイートサービスの制作に際して「全角:1/半角と改行:0.5/URL:11.5」という、公式の文字数カウントに近いPHPコードを模索したので共有させて頂きます。

##コード

  • 全角を1文字/半角を0.5文字とカウント
count.php
function twiCount( $text ){
  $wholeLen = strlen(mb_convert_encoding($text, "SJIS", "ASCII,JIS,UTF-8,EUC-JP,SJIS")) / 2;
  return $wholeLen;
}
  • 改行を0.5文字とカウント
count.php
function twiCount( $text ){
  $minus_lf = 0;
  $tempWholeLen = strlen(mb_convert_encoding($text, "SJIS", "ASCII,JIS,UTF-8,EUC-JP,SJIS")) / 2;
  //改行の数を検出
  preg_match_all("\n|\r\n|\r", $text, $matches, PREG_PATTERN_ORDER);
  $lfCount = count($matches[0]);
  $minus_lf += ($lfCount/2);//改行1つを0.5文字として扱う
  $modifiedWholeLen = $tempWholeLen - $minus_lf;//減算
  return $modifiedWholeLen;
}
  • リンクを11.5文字とカウント
count.php
function twiCount( $text ){
  $minus_lf = 0;
  $tempWholeLen = strlen(mb_convert_encoding($text, "SJIS", "ASCII,JIS,UTF-8,EUC-JP,SJIS")) / 2;
  preg_match_all("\n|\r\n|\r", $text, $matches, PREG_PATTERN_ORDER);
  $lfCount = count($matches[0]);
  $minus_lf += ($lfCount/2);
  //リンクの数を検出
  $regex = "/https?:\/\/[\w!\?\/\+\-_~=;\.,\*&@#\$%\(\)'\[\]]+/";
  preg_match_all($regex, $text, $matches2, PREG_PATTERN_ORDER);
  $urls = $matches2[0];//値だけの配列
  if( !empty($urls) ){
    foreach($urls as $key => $url){//検出したURLを11.5文字としてカウント
      $urlLen = strlen( mb_convert_encoding($url, "SJIS", "ASCII,JIS,UTF-8,EUC-JP,SJIS" ) ) / 2;
      $minus_url += $urlLen - 11.5;//余分にカウントしていた分をマイナス
    }
  }
  //まとめて引く
  $modifiedWholeLen = $tempWholeLen - ($minus_lf + $minus_url);
  return $modifiedWholeLen;
}

##参考記事
実際のURL検出には以下の記事の$regexを使わせて頂きました。

サンプル

$regex = 
    '`\bhttps?+:(?://(?:(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]' .
    '|[!$&-,:;=])*+@)?+(?:\[(?:(?:[0-9a-f]{1,4}:){6}(?:' .
    '[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\d{2}|2' .
    '[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25' .
    '[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?' .
    ':\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))|::(?:[0-9a-f' .
    ']{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1' .
    '-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{' .
    '2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\\' .
    'd|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])' .
    ')|(?:[0-9a-f]{1,4})?+::(?:[0-9a-f]{1,4}:){4}(?:[0-' .
    '9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\d{2}|2[0-' .
    '4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-' .
    '5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d' .
    '|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))|(?:(?:[0-9a-f]{' .
    '1,4}:)?+[0-9a-f]{1,4})?+::(?:[0-9a-f]{1,4}:){3}(?:' .
    '[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\d{2}|2' .
    '[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25' .
    '[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?' .
    ':\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))|(?:(?:[0-9a-' .
    'f]{1,4}:){0,2}[0-9a-f]{1,4})?+::(?:[0-9a-f]{1,4}:)' .
    '{2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\\' .
    'd{2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4' .
    ']\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5' .
    '])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))|(?:(?:' .
    '[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?+::[0-9a-f]{1,4' .
    '}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\d' .
    '{2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]' .
    '\d|25[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]' .
    ')\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))|(?:(?:[' .
    '0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?+::(?:[0-9a-f]{1' .
    ',4}:[0-9a-f]{1,4}|(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25' .
    '[0-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?' .
    ':\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\\' .
    'd|1\d{2}|2[0-4]\d|25[0-5]))|(?:(?:[0-9a-f]{1,4}:){' .
    '0,5}[0-9a-f]{1,4})?+::[0-9a-f]{1,4}|(?:(?:[0-9a-f]' .
    '{1,4}:){0,6}[0-9a-f]{1,4})?+::|v[0-9a-f]++\.[!$&-.' .
    '0-;=_a-z~]++)\]|(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0' .
    '-5])\.(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?:\\' .
    'd|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d|[1-9]\d|' .
    '1\d{2}|2[0-4]\d|25[0-5])|(?:[-.0-9_a-z~]|%[0-9a-f]' .
    '[0-9a-f]|[!$&-,;=])*+)(?::\d*+)?+(?:/(?:[-.0-9_a-z' .
    '~]|%[0-9a-f][0-9a-f]|[!$&-,:;=@])*+)*+|/(?:(?:[-.0' .
    '-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,:;=@])++(?:/(?:[-' .
    '.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,:;=@])*+)*+)?+|' .
    '(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,:;=@])++(?' .
    ':/(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,:;=@])*+' .
    ')*+)?+(?:\?+(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&' .
    '-,/:;=?+@])*+)?+(?:#(?:[-.0-9_a-z~]|%[0-9a-f][0-9a' .
    '-f]|[!$&-,/:;=?+@])*+)?+`i'
;

##公式との齟齬
・プロトコル省略URLの非検出(例:hoge.jp)
・日本語URLの非検出(例:https://ほげ.jp)
・存在しないドメインのリンクを検出(例:https://hoge.jphoge
・一部の全角文字を公式は0.5文字と判定(例:「―」)


1
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?