Add docs for formatters
parent 36f29c3058
commit d3cd5d1254
65
README.md
@@ -158,6 +158,71 @@ transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

```python
transcript = transcript_list.find_generated_transcript(['de', 'en'])
```

### Using Formatters

Formatters add a layer of processing on top of the transcript you pass in. The goal is to convert the transcript from its Python data type into a consistent string in a given format, such as plain text (`.txt`) or a format with a defined specification such as JSON (`.json`), WebVTT (`.vtt`), or comma-separated values (`.csv`).

The `formatters` submodule provides a few basic formatters that wrap your transcript data, for cases where you want to output a specific format and then write it to a file, for example to back it up and run another script against it later.

The following formatter subclasses are provided:

- JSONFormatter
- TextFormatter
- WebVTTFormatter (a basic implementation)

Here is how to import from the `formatters` module:

```python
# The base class to inherit from when creating your own formatter.
from youtube_transcript_api.formatters import Formatter

# Some provided subclasses; each outputs a different string format.
from youtube_transcript_api.formatters import JSONFormatter
from youtube_transcript_api.formatters import TextFormatter
from youtube_transcript_api.formatters import WebVTTFormatter
```
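
To make the "Python data type to string" idea concrete, here is a minimal sketch that does not use the library at all. It assumes a transcript is a list of dictionaries with `text`, `start`, and `duration` keys; the sample data and the `to_plain_text` helper are hypothetical, and the provided formatters may well produce different output.

```python
# Hypothetical sample data, assumed to mirror the shape of a fetched transcript.
sample_transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]


def to_plain_text(transcript):
    """Join the text of every snippet into one newline-separated string."""
    return "\n".join(snippet["text"] for snippet in transcript)


plain_text = to_plain_text(sample_transcript)
print(plain_text)
```

The result is an ordinary string, so it can be written to a file or passed to another tool without any further conversion.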

### Provided Formatter Example

Let's say we want to retrieve a transcript and write it to a JSON file in the same format the API returned it in. That would look something like this:

```python
# your_custom_script.py

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter

# Placeholder: replace with the ID of the video you want.
video_id = 'YOUR_VIDEO_ID'

# Must be a single transcript.
transcript = YouTubeTranscriptApi.get_transcript(video_id)

# .format() turns the transcript into a JSON string.
json_formatted = JSONFormatter(transcript).format()

# Now we can write it out to a file.
with open('your_filename.json', 'w', encoding='utf-8') as json_file:
    json_file.write(json_formatted)

# You should now have a new JSON file that you can easily read back into Python.
```
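
Reading the file back is a matter of calling `json.load()`. The sketch below is self-contained: instead of calling the API, it writes hypothetical sample data (assumed to have the same list-of-dictionaries shape as a transcript) to a temporary directory using `json.dumps()` directly, then loads it again.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for a fetched transcript.
sample_transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "your_filename.json"

    # Write the JSON string out, as in the example above.
    json_path.write_text(json.dumps(sample_transcript), encoding="utf-8")

    # Read it back into regular Python lists and dictionaries.
    with open(json_path, encoding="utf-8") as json_file:
        loaded_transcript = json.load(json_file)

print(loaded_transcript == sample_transcript)
```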

**Passing extra keyword arguments**

Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments through `.format()`. For example, forwarding `indent=2` makes the file output easier to read.

```python
json_formatted = JSONFormatter(transcript).format(indent=2)
```
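
The forwarding itself can be sketched without the library. `SketchJSONFormatter` below is a hypothetical stand-in, not the library's implementation; it only assumes, as described above, that the transcript is passed to the constructor and that `.format()` hands its keyword arguments to `json.dumps()`.

```python
import json


class SketchJSONFormatter:
    """Hypothetical stand-in used only to illustrate keyword forwarding."""

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        # Every keyword argument is passed straight through to json.dumps().
        return json.dumps(self._transcript, **kwargs)


# Hypothetical sample data standing in for a fetched transcript.
sample_transcript = [{"text": "Hey there", "start": 0.0, "duration": 1.54}]

compact = SketchJSONFormatter(sample_transcript).format()
pretty = SketchJSONFormatter(sample_transcript).format(indent=2)

print(compact)
print(pretty)
```

Under this assumption, any other `json.dumps()` option, such as `sort_keys=True`, would be forwarded the same way.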

### Custom Formatter Example

You can implement your own formatter class. Inherit from the `Formatter` base class and implement the `format(self, **kwargs)` method, which should ultimately return a string when called on your formatter instance.

```python
from youtube_transcript_api.formatters import Formatter


class MyCustomFormatter(Formatter):
    def format(self, **kwargs):
        # Do your custom work in here, but return a string.
        return 'your processed output data as a string.'
```
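
As a slightly fuller sketch, here is a hypothetical CSV formatter. To keep the example runnable without the library installed, it defines its own minimal `Formatter` stand-in; this assumes the real base class stores the transcript passed to its constructor, which is not spelled out above. The `_transcript` attribute name, the `CSVFormatter` class, and the sample data are likewise assumptions.

```python
import csv
import io


class Formatter:
    """Minimal stand-in for the library's base class (an assumption)."""

    def __init__(self, transcript):
        self._transcript = transcript

    def format(self, **kwargs):
        raise NotImplementedError("Subclasses must implement format().")


class CSVFormatter(Formatter):
    """Hypothetical formatter that renders a transcript as CSV text."""

    def format(self, **kwargs):
        buffer = io.StringIO()
        # Extra keyword arguments (e.g. delimiter=';') go to csv.DictWriter.
        writer = csv.DictWriter(
            buffer, fieldnames=["text", "start", "duration"], **kwargs
        )
        writer.writeheader()
        writer.writerows(self._transcript)
        return buffer.getvalue()


# Hypothetical sample data standing in for a fetched transcript.
sample_transcript = [
    {"text": "Hey there", "start": 0.0, "duration": 1.54},
    {"text": "how are you", "start": 1.54, "duration": 4.16},
]

csv_formatted = CSVFormatter(sample_transcript).format()
print(csv_formatted)
```

Building the output in an in-memory buffer keeps the method's contract intact: whatever work happens inside, `format()` still returns a single string.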

## CLI

Execute the CLI script using the video ids as parameters and the results will be printed out to the command line: