Introduction 🤗
There is a module called 🤗 Transformers that is commonly used for trying out natural language processing in Python.
I have used PEGASUS from this library, a pretrained model for text-summarization tasks, and I got curious about what else it can do, so I looked into the official site.
When I used PEGASUS, I built a text-summarization browser tool.
My goal this time is to find out what can be done with transformers. Just looking at the module, it's hard to tell at first what it can do, so having an official guide is much appreciated.
Let's start with the Quick tour!
Quick tour
Let's start with the features of the Transformers library. The library downloads pretrained models that perform natural language processing tasks such as sentiment analysis on text, completing a prompt, and text generation such as translation.
First, we will look at how to leverage the pipeline API to use pretrained models at inference time. After that, we will dig a little deeper and see how the library gives us access to those models and how it helps with preprocessing our data.
Tasks of the pipeline
The easiest way to use a pretrained model on a given task is the pipeline() function. 🤗 Transformers provides the following ready-to-use tasks:
- Sentiment analysis: is this text positive or negative?
- Text generation (in English): give the model a prompt and it generates the text that follows.
- Named entity recognition: in a given sentence, label each word with what it represents.
- Question answering: give the model some context and a question, and it extracts the answer from the context.
- Filling masked text: given a text with some words masked out (replaced by [MASK]), fill in the blanks.
- Summarization: generate a summary of a long text.
- Translation: translate between languages.
- Feature extraction: return a tensor representation of the text.
Let's take a look at how pipeline() works for sentiment analysis, for example.
>>> from transformers import pipeline
>>> classifier = pipeline('sentiment-analysis')
The first time you run this command, the pretrained model and its tokenizer are downloaded and cached. The tokenizer's job is to preprocess the text into a form that the model, which generates the predictions, can accept; pipeline() then collects those predictions and postprocesses them into a human-readable form.
For example:
>>> classifier('We are very happy to show you the 🤗 Transformers library.')
[{'label': 'POSITIVE', 'score': 0.9997795224189758}]
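To picture what the tokenizer's preprocessing step does, here is a toy sketch with a made-up vocabulary (the words and IDs below are invented for illustration; the real tokenizer also handles subwords and special tokens):

```python
# Toy sketch of tokenization: split the text into tokens and map each token
# to an integer ID via a vocabulary. Hypothetical vocabulary, not DistilBERT's.
vocab = {"we": 1, "are": 2, "very": 3, "happy": 4, ".": 5, "[UNK]": 0}

def toy_encode(text):
    tokens = text.lower().replace(".", " .").split()
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

print(toy_encode("We are very happy."))  # [1, 2, 3, 4, 5]
```

The model consumes these IDs; any word outside the vocabulary falls back to the [UNK] ID, which is one reason real tokenizers split rare words into subwords instead.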
How reassuring! pipeline() can also be applied to a list of sentences. The sentences are preprocessed, fed to the model as a batch, and at the end the results come back as a list of dictionaries like this:
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.",
... "We hope you don't hate it."])
>>> for result in results:
... print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
The second sentence is classified as NEGATIVE, but its score is fairly close to the middle, so the model is not confident.
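That score is the softmax probability of the predicted class, computed from the model's output logits, so a value near 0.5 means the two classes were almost tied. A plain-Python sketch (the logits below are made up for illustration):

```python
import math

def softmax(logits):
    # Turn raw logits into probabilities that sum to 1.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A confident prediction vs. a near-tie (illustrative logits only).
print([round(p, 4) for p in softmax([4.0, -4.0])])   # [0.9997, 0.0003]
print([round(p, 4) for p in softmax([0.1, -0.02])])  # a near-tie around 0.5
```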
By default, the model downloaded by this pipeline() is called distilbert-base-uncased-finetuned-sst-2-english. It uses the DistilBERT architecture and has been fine-tuned for sentiment analysis on a dataset called SST-2.
Let's try a different model, say, one trained on French data. From the pretrained models registered on the model hub, we pick nlptown/bert-base-multilingual-uncased-sentiment.
You can pass its name directly to pipeline():
>>> classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")
This classifier can now handle not only English and French but also Dutch, German, Italian, and Spanish! The model name can also be replaced with a path to a pretrained model saved on your local machine.
You can also pass a model object and its associated tokenizer. Two classes are needed for this. The first is AutoTokenizer, which is used to download and use the tokenizer associated with the model we chose. The second is AutoModelForSequenceClassification (or TFAutoModelForSequenceClassification if you are using TensorFlow), which downloads and uses the model itself.
To download and use a model and tokenizer, use the from_pretrained() method:
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
If you can't find a model pretrained on data similar to your own, you will need to fine-tune a model on your own data.
Under the hood
The write-up on what pretrained models do under the hood is here.
Summary of the tasks
This section covers the most frequent use cases of the library. The models can be used with many additional configurations and have very broad applications; here we introduce the simplest ones, such as question answering, sequence classification, and named entity recognition.
The examples here use auto models. An auto model is a class that automatically selects the appropriate model architecture for a given checkpoint and instantiates the model. It is provided as the AutoModel class.
For a model to perform well on a task, it must be loaded from a checkpoint corresponding to that task. Such a checkpoint is usually created by pretraining on a huge corpus of data and fine-tuning on a concrete task. This means the following:
- Not every model has been fine-tuned for every task; when you want a model fine-tuned on a specific task, you may have to do it yourself.
- Fine-tuned models were fine-tuned on a specific dataset, which may or may not match your use case; if necessary, you may have to write your own script to adapt them.
Also, the library makes several mechanisms available for running inference on a task:
- Pipelines: highly abstracted and very easy to use; sometimes two lines of code are enough.
- Direct model use: less abstracted, but you get direct access to the tokenizer, plus more flexibility and power.
Sequence Classification
Sequence classification is the task of classifying sequences into a given number of classes. The example here uses a dataset from GLUE. If you want to fine-tune a model on a GLUE sequence classification task, you can prepare a script like run_glue.py.
Here is a program that performs sentiment analysis as an example of sequence classification: it judges whether a given sentence is positive or negative. It uses a model fine-tuned on SST-2, a dataset from the GLUE tasks, and returns "POSITIVE" or "NEGATIVE" together with a score.
>>> from transformers import pipeline
>>> nlp = pipeline("sentiment-analysis")
>>> result = nlp("I hate you")[0]
>>> print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: NEGATIVE, with score: 0.9991
>>> result = nlp("I love you")[0]
>>> print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
Extractive Question Answering
Extractive question answering is the task of extracting an answer from a given question and context. An example of a question answering dataset is the SQuAD dataset, which is entirely based on the question answering task. If you want to fine-tune a model on a SQuAD task, you need to prepare a script like run_tf_squad.py.
Here is a sample that performs question answering using a model fine-tuned on SQuAD:
>>> from transformers import pipeline
>>> nlp = pipeline("question-answering")
>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
... """
>>> result = nlp(question="What is extractive question answering?", context=context)
>>> print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
Answer: 'the task of extracting an answer from a text given a question.', score: 0.6226, start: 34, end: 96
>>> result = nlp(question="What is a good example of a question answering dataset?", context=context)
>>> print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
Answer: 'SQuAD dataset,', score: 0.5053, start: 147, end: 161
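The start and end fields are character offsets into the context string (counting the leading newline of the triple-quoted literal), which we can verify in plain Python by rebuilding the same string:

```python
# Same context as above, rebuilt with explicit newlines.
context = (
    "\nExtractive Question Answering is the task of extracting an answer from a text given a question. An example of a"
    "\nquestion answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune"
    "\na model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.\n"
)

# Slicing with the reported offsets recovers the answers verbatim.
print(context[34:96])    # the task of extracting an answer from a text given a question.
print(context[147:161])  # SQuAD dataset,
```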
Language Modeling
Language modeling is the task of fitting a model to a corpus. All the popular transformer-based models are built using some form of language model: BERT uses masked language modeling, GPT-2 uses causal language modeling, and so on.
Masked Language Modeling
Masked language modeling feeds the model a sentence in which some words have been replaced by a masking token and asks it to fill in the blanks appropriately. The model gets to look at both the tokens to the right of the mask and the context to its left.
Here is a sample of filling in a mask from context:
>>> from transformers import pipeline
>>> nlp = pipeline("fill-mask")
>>> from pprint import pprint
>>> pprint(nlp(f"HuggingFace is creating a {nlp.tokenizer.mask_token} that the community uses to solve NLP tasks."))
[{'score': 0.1792745739221573,
'sequence': '<s>HuggingFace is creating a tool that the community uses to '
'solve NLP tasks.</s>',
'token': 3944,
'token_str': 'Ġtool'},
{'score': 0.11349421739578247,
'sequence': '<s>HuggingFace is creating a framework that the community uses '
'to solve NLP tasks.</s>',
'token': 7208,
'token_str': 'Ġframework'},
{'score': 0.05243554711341858,
'sequence': '<s>HuggingFace is creating a library that the community uses to '
'solve NLP tasks.</s>',
'token': 5560,
'token_str': 'Ġlibrary'},
{'score': 0.03493533283472061,
'sequence': '<s>HuggingFace is creating a database that the community uses '
'to solve NLP tasks.</s>',
'token': 8503,
'token_str': 'Ġdatabase'},
{'score': 0.02860250137746334,
'sequence': '<s>HuggingFace is creating a prototype that the community uses '
'to solve NLP tasks.</s>',
'token': 17715,
'token_str': 'Ġprototype'}]
Causal Language Modeling
Causal language modeling is the task of predicting the words that continue a given text. In this task, the model only looks at the context to the left of the blank. Basically, the continuation is predicted from the model's last hidden state.
Here is a sample that uses a model and tokenizer with the top_k_top_p_filtering() method to predict the word that follows an input text:
>>> from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
>>> import torch
>>> from torch.nn import functional as F
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelWithLMHead.from_pretrained("gpt2")
>>> sequence = "Hugging Face is based in DUMBO, New York City, and "
>>> input_ids = tokenizer.encode(sequence, return_tensors="pt")
>>> # get logits of last hidden state
>>> next_token_logits = model(input_ids).logits[:, -1, :]
>>> # filter
>>> filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
>>> # sample
>>> probs = F.softmax(filtered_next_token_logits, dim=-1)
>>> next_token = torch.multinomial(probs, num_samples=1)
>>> generated = torch.cat([input_ids, next_token], dim=-1)
>>> resulting_string = tokenizer.decode(generated.tolist()[0])
>>> print(resulting_string)
Hugging Face is based in DUMBO, New York City, and has
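The top_k part of top_k_top_p_filtering() can be pictured in plain Python: keep the k largest logits and push the rest down to -inf so that softmax gives them zero probability (a toy re-implementation for illustration, not the library's actual function):

```python
import math

def toy_top_k_filter(logits, k):
    # Keep the k largest logits; set all others to -inf so that softmax
    # assigns them probability zero and they can never be sampled.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

print(toy_top_k_filter([2.0, 1.0, 0.5, -1.0], k=2))  # [2.0, 1.0, -inf, -inf]
```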
Text Generation
The goal of text generation is to produce a coherent continuation of a given context. The example below shows how GPT-2 can be used to generate text. By default, all models apply Top-K sampling when used in pipelines, as set in their respective configurations.
>>> from transformers import pipeline
>>> text_generator = pipeline("text-generation")
>>> print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))
[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
Text generation is currently possible with GPT-2, OpenAI GPT, CTRL, XLNet, Transfo-XL, and Reformer in PyTorch, and with most models in TensorFlow as well.
Named Entity Recognition
Named entity recognition classifies words according to a set of classes, labeling a word as a person, as an organization, as a location, and so on. The NER sample uses the CoNLL-2003 dataset. If you want to fine-tune a model, prepare a script like run_pl_ner.py.
A NER sample is shown below. This named entity recognition task tries to assign one of nine classes:
- O: outside of a named entity
- B-MIS: beginning of a miscellaneous entity right after another miscellaneous entity
- I-MIS: miscellaneous entity
- B-PER: beginning of a person's name right after another person's name
- I-PER: person's name
- B-ORG: beginning of an organisation right after another organisation
- I-ORG: organisation
- B-LOC: beginning of a location right after another location
- I-LOC: location
>>> from transformers import pipeline
>>> nlp = pipeline("ner")
>>> sequence = ("Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very "
... "close to the Manhattan Bridge which is visible from the window.")
>>> print(nlp(sequence))
[
{'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
{'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
{'word': 'Face', 'score': 0.9982671737670898, 'entity': 'I-ORG'},
{'word': 'Inc', 'score': 0.9994403719902039, 'entity': 'I-ORG'},
{'word': 'New', 'score': 0.9994346499443054, 'entity': 'I-LOC'},
{'word': 'York', 'score': 0.9993270635604858, 'entity': 'I-LOC'},
{'word': 'City', 'score': 0.9993864893913269, 'entity': 'I-LOC'},
{'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
{'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
{'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
{'word': 'Manhattan', 'score': 0.9758241176605225, 'entity': 'I-LOC'},
{'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]
"Hugging Face"ãšããèšèãçµç¹ãšããŠåé¡ããã"New York City"ã"DUMBO"ã"Manhattan Bridge"ãšããèšèããã¡ããšå ŽæãšããŠèªèãããŠããŸãã
Summarization
Summarization is the task of condensing a document or article into a shorter text. The summarization sample uses the CNN / Daily Mail dataset, which is made up of long news articles.
>>> from transformers import pipeline
>>> summarizer = pipeline("summarization")
>>> ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
... A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
... Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
... In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
... Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
... 2010 marriage license application, according to court documents.
... Prosecutors said the marriages were part of an immigration scam.
... On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
... After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
... Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
... All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
... Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
... Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
... The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
... Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
... Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
... If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
... """
>>> print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]
Because the summarization pipeline() relies on the PreTrainedModel.generate() method, we can override generate()'s defaults directly in pipeline() by specifying the max_length and min_length arguments, as shown above.
Translation
Translation is the task of translating text written in one language into another. The translation sample uses the WMT English-to-German dataset, which contains English sentences as input and the corresponding German sentences.
>>> from transformers import pipeline
>>> translator = pipeline("translation_en_to_de")
>>> print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
Because the translation pipeline() also relies on the PreTrainedModel.generate() method, we can override generate()'s defaults directly in pipeline() by specifying arguments such as max_length, as shown above.
Conclusion 🤗
I hope this gives you a rough picture of what you can do with Transformers.
Personally, I'm most interested in translation, summarization, and question answering.
So far I've only touched on the kinds of tasks available and their basic usage, so when motivation strikes I'll keep digging deeper!
Thank you for reading.