Merge pull request #43 from jdepoix/bugfix/cli-language-default
fixed bug in cli where no transcript could be retrieved if no language was specified
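A minimal sketch of the behaviour this fix targets, using only `argparse` from the standard library. The `build_parser` helper and the shape of the `video_ids` argument are assumptions made for illustration; only the `--languages` argument (`nargs='*'`, `type=str`, and the old and new `default`) is taken from the diff below. It shows that omitting `--languages` used to yield an empty priority list, which presumably left the CLI with no language to look up, and now falls back to `['en']`.

```python
import argparse


def build_parser(default_languages):
    # Hypothetical helper, not part of the library: it builds a parser
    # shaped like the one touched by this commit.
    parser = argparse.ArgumentParser()
    # Assumed positional argument; the diff does not show its definition.
    parser.add_argument('video_ids', nargs='+', type=str)
    # Taken from the diff: only the default differs between old and new code.
    parser.add_argument('--languages', nargs='*', default=default_languages, type=str)
    return parser


# Old default: no --languages flag gives an empty list of language codes.
before = build_parser([]).parse_args(['v1', 'v2'])

# New default: no --languages flag falls back to English.
after = build_parser(['en']).parse_args(['v1', 'v2'])

# An explicit flag overrides the default in both versions.
explicit = build_parser(['en']).parse_args(['v1', 'v2', '--languages', 'de', 'en'])

print(before.languages, after.languages, explicit.languages)
```

The updated assertions in the test hunks at the end of this diff check the same thing: the parsed `languages` value is `['en']` whenever the flag is left out.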
commit bfecd64b85
191
README.md
|
@@ -1,74 +1,69 @@
|
||||||
|
|
||||||
# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)
|
# YouTube Transcript/Subtitle API (including automatically generated subtitles and subtitle translations)
|
||||||
|
|
||||||
[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)
|
[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url) [](https://travis-ci.org/jdepoix/youtube-transcript-api) [](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master) [](http://opensource.org/licenses/MIT) [](https://pypi.org/project/youtube-transcript-api/) [](https://pypi.org/project/youtube-transcript-api/)
|
||||||
[](https://travis-ci.org/jdepoix/youtube-transcript-api)
|
|
||||||
[](https://coveralls.io/github/jdepoix/youtube-transcript-api?branch=master)
|
This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do!
|
||||||
[](http://opensource.org/licenses/MIT)
|
|
||||||
[](https://pypi.org/project/youtube-transcript-api/)
|
## Install
|
||||||
[](https://pypi.org/project/youtube-transcript-api/)
|
|
||||||
|
It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):
|
||||||
This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do!
|
|
||||||
|
```
|
||||||
## Install
|
pip install youtube_transcript_api
|
||||||
|
```
|
||||||
It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/):
|
|
||||||
|
If you want to use it from source, you'll have to install the dependencies manually:
|
||||||
```
|
|
||||||
pip install youtube_transcript_api
|
```
|
||||||
```
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
If you want to use it from source, you'll have to install the dependencies manually:
|
|
||||||
|
|
||||||
```
|
|
||||||
pip install -r requirements.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
You can either integrate this module [into an existing application](#api), or just use it via an [CLI](#cli).
|
You can either integrate this module [into an existing application](#api), or just use it via an [CLI](#cli).
|
||||||
|
|
||||||
## API
|
## API
|
||||||
|
|
||||||
The easiest way to get a transcript for a given video is to execute:
|
The easiest way to get a transcript for a given video is to execute:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from youtube_transcript_api import YouTubeTranscriptApi
|
from youtube_transcript_api import YouTubeTranscriptApi
|
||||||
|
|
||||||
YouTubeTranscriptApi.get_transcript(video_id)
|
YouTubeTranscriptApi.get_transcript(video_id)
|
||||||
```
|
```
|
||||||
|
|
||||||
This will return a list of dictionaries looking somewhat like this:
|
This will return a list of dictionaries looking somewhat like this:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
[
|
[
|
||||||
{
|
{
|
||||||
'text': 'Hey there',
|
'text': 'Hey there',
|
||||||
'start': 7.58,
|
'start': 7.58,
|
||||||
'duration': 6.13
|
'duration': 6.13
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'text': 'how are you',
|
'text': 'how are you',
|
||||||
'start': 14.08,
|
'start': 14.08,
|
||||||
'duration': 7.58
|
'duration': 7.58
|
||||||
},
|
},
|
||||||
# ...
|
# ...
|
||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to english).
|
You can also add the `languages` param if you want to make sure the transcripts are retrieved in your desired language (it defaults to english).
|
||||||
|
|
||||||
```python
|
```python
|
||||||
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
|
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
|
||||||
```
|
```
|
||||||
|
|
||||||
It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts)
|
It's a list of language codes in a descending priority. In this example it will first try to fetch the german transcript (`'de'`) and then fetch the english transcript (`'en'`) if it fails to do so. If you want to find out which languages are available first, [have a look at `list_transcripts()`](#list-available-transcripts)
|
||||||
|
|
||||||
To get transcripts for a list of video ids you can call:
|
To get transcripts for a list of video ids you can call:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
|
YouTubeTranscriptApi.get_transcripts(video_ids, languages=['de', 'en'])
|
||||||
```
|
```
|
||||||
|
|
||||||
`languages` also is optional here.
|
`languages` also is optional here.
|
||||||
|
|
||||||
### List available transcripts
|
### List available transcripts
|
||||||
|
|
||||||
|
@@ -81,16 +76,16 @@ transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, languages=['de
|
||||||
This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:
|
This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for specific languages and types, like:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
transcript = transcript_list.find_transcript(['de', 'en'])
|
transcript = transcript_list.find_transcript(['de', 'en'])
|
||||||
```
|
```
|
||||||
|
|
||||||
By default this module always picks manually created transcripts over automatically created ones, if a transcript in the requested language is available both manually created and generated. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:
|
By default this module always picks manually created transcripts over automatically created ones, if a transcript in the requested language is available both manually created and generated. The `TranscriptList` allows you to bypass this default behaviour by searching for specific transcript types:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# filter for manually created transcripts
|
# filter for manually created transcripts
|
||||||
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
|
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
|
||||||
|
|
||||||
# or automatically generated ones
|
# or automatically generated ones
|
||||||
transcript = transcript_list.find_generated_transcript(['de', 'en'])
|
transcript = transcript_list.find_generated_transcript(['de', 'en'])
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@@ -98,15 +93,15 @@ The methods `find_generated_transcript`, `find_manually_created_transcript`, `fi
|
||||||
|
|
||||||
```python
|
```python
|
||||||
print(
|
print(
|
||||||
transcript.video_id,
|
transcript.video_id,
|
||||||
transcript.language,
|
transcript.language,
|
||||||
transcript.language_code,
|
transcript.language_code,
|
||||||
# whether it has been manually created or generated by YouTube
|
# whether it has been manually created or generated by YouTube
|
||||||
transcript.is_generated,
|
transcript.is_generated,
|
||||||
# whether this transcript can be translated or not
|
# whether this transcript can be translated or not
|
||||||
transcript.is_translatable,
|
transcript.is_translatable,
|
||||||
# a list of languages the transcript can be translated to
|
# a list of languages the transcript can be translated to
|
||||||
transcript.translation_languages,
|
transcript.translation_languages,
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@@ -116,42 +111,42 @@ and provide the method, which allows you to fetch the actual transcript data:
|
||||||
transcript.fetch()
|
transcript.fetch()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Translate transcript
|
### Translate transcript
|
||||||
|
|
||||||
YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:
|
YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated `Transcript` object:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
transcript = transcript_list.find_transcript(['en'])
|
transcript = transcript_list.find_transcript(['en'])
|
||||||
translated_transcript = transcript.translate('de')
|
translated_transcript = transcript.translate('de')
|
||||||
print(translated_transcript.fetch())
|
print(translated_transcript.fetch())
|
||||||
```
|
```
|
||||||
|
|
||||||
### By example
|
### By example
|
||||||
```python
|
```python
|
||||||
# retrieve the available transcripts
|
# retrieve the available transcripts
|
||||||
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
|
transcript_list = YouTubeTranscriptApi.list_transcripts('video_id')
|
||||||
|
|
||||||
# iterate over all available transcripts
|
# iterate over all available transcripts
|
||||||
for transcript in transcript_list:
|
for transcript in transcript_list:
|
||||||
|
|
||||||
# the Transcript object provides metadata properties
|
# the Transcript object provides metadata properties
|
||||||
print(
|
print(
|
||||||
transcript.video_id,
|
transcript.video_id,
|
||||||
transcript.language,
|
transcript.language,
|
||||||
transcript.language_code,
|
transcript.language_code,
|
||||||
# whether it has been manually created or generated by YouTube
|
# whether it has been manually created or generated by YouTube
|
||||||
transcript.is_generated,
|
transcript.is_generated,
|
||||||
# whether this transcript can be translated or not
|
# whether this transcript can be translated or not
|
||||||
transcript.is_translatable,
|
transcript.is_translatable,
|
||||||
# a list of languages the transcript can be translated to
|
# a list of languages the transcript can be translated to
|
||||||
transcript.translation_languages,
|
transcript.translation_languages,
|
||||||
)
|
)
|
||||||
|
|
||||||
# fetch the actual transcript data
|
# fetch the actual transcript data
|
||||||
print(transcript.fetch())
|
print(transcript.fetch())
|
||||||
|
|
||||||
# translating the transcript will return another transcript object
|
# translating the transcript will return another transcript object
|
||||||
print(transcript.translate('en').fetch())
|
print(transcript.translate('en').fetch())
|
||||||
|
|
||||||
# you can also directly filter for the language you are looking for, using the transcript list
|
# you can also directly filter for the language you are looking for, using the transcript list
|
||||||
transcript = transcript_list.find_transcript(['de', 'en'])
|
transcript = transcript_list.find_transcript(['de', 'en'])
|
||||||
|
|
|
@@ -72,7 +72,7 @@ class YouTubeTranscriptCli():
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'--languages',
|
'--languages',
|
||||||
nargs='*',
|
nargs='*',
|
||||||
default=[],
|
default=['en',],
|
||||||
type=str,
|
type=str,
|
||||||
help=(
|
help=(
|
||||||
'A list of language codes in a descending priority. For example, if this is set to "de en" it will '
|
'A list of language codes in a descending priority. For example, if this is set to "de en" it will '
|
||||||
|
|
|
@@ -77,7 +77,7 @@ class TestYouTubeTranscriptCli(TestCase):
|
||||||
parsed_args = YouTubeTranscriptCli('v1 v2'.split())._parse_args()
|
parsed_args = YouTubeTranscriptCli('v1 v2'.split())._parse_args()
|
||||||
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
||||||
self.assertEqual(parsed_args.json, False)
|
self.assertEqual(parsed_args.json, False)
|
||||||
self.assertEqual(parsed_args.languages, [])
|
self.assertEqual(parsed_args.languages, ['en'])
|
||||||
|
|
||||||
def test_argument_parsing__fail_without_video_ids(self):
|
def test_argument_parsing__fail_without_video_ids(self):
|
||||||
with self.assertRaises(SystemExit):
|
with self.assertRaises(SystemExit):
|
||||||
|
@@ -87,12 +87,12 @@ class TestYouTubeTranscriptCli(TestCase):
|
||||||
parsed_args = YouTubeTranscriptCli('v1 v2 --json'.split())._parse_args()
|
parsed_args = YouTubeTranscriptCli('v1 v2 --json'.split())._parse_args()
|
||||||
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
||||||
self.assertEqual(parsed_args.json, True)
|
self.assertEqual(parsed_args.json, True)
|
||||||
self.assertEqual(parsed_args.languages, [])
|
self.assertEqual(parsed_args.languages, ['en'])
|
||||||
|
|
||||||
parsed_args = YouTubeTranscriptCli('--json v1 v2'.split())._parse_args()
|
parsed_args = YouTubeTranscriptCli('--json v1 v2'.split())._parse_args()
|
||||||
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
self.assertEqual(parsed_args.video_ids, ['v1', 'v2'])
|
||||||
self.assertEqual(parsed_args.json, True)
|
self.assertEqual(parsed_args.json, True)
|
||||||
self.assertEqual(parsed_args.languages, [])
|
self.assertEqual(parsed_args.languages, ['en'])
|
||||||
|
|
||||||
def test_argument_parsing__languages(self):
|
def test_argument_parsing__languages(self):
|
||||||
parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en'.split())._parse_args()
|
parsed_args = YouTubeTranscriptCli('v1 v2 --languages de en'.split())._parse_args()
|
||||||
|
|