2022. 10. 17. 20:31ㆍPython/- Tensorflow
1. Tensorflow_datasets (in detail)
2. Background and goal
You can use images and XML files directly from a local directory after downloading the data.
But I want to use the voc/2007 and voc/2012 datasets from TFDS via tfds.load(), and I want my custom_dataset to go through the same pipeline.
3. Prepare the dataset
You can prepare your data in one of these layouts; I will use the second one.

data
└ train
  └ images
  └ xmls
└ test
  └ images
  └ xmls

data
└ train_images
└ train_xmls
└ test_images
└ test_xmls

data
└ train
└ train.csv
└ test
└ test.csv

Once the dataset is prepared, zip it (data.zip).
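For example, the archive can be created from Python with the standard library alone (the directory names here match the second layout above; adjust the paths for your own data):

```python
import shutil

def zip_dataset(data_dir: str, out_base: str = "data") -> str:
    """Zip the prepared dataset directory into data.zip.

    data_dir is assumed to contain train_images/, train_xmls/,
    test_images/ and test_xmls/ as laid out above.
    """
    # shutil.make_archive appends the ".zip" extension itself
    # and returns the full path of the created archive.
    return shutil.make_archive(out_base, "zip", root_dir=data_dir)
```

The archive root then contains the four split directories directly, which is what the `_split_generators` method later expects after extraction.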
4. Generate custom dataset
4-1. Install tfds module and generate custom_dataset project
4-2. Modify custom_dataset.py
4-3. Build custom_dataset.py
4-1. Install the tfds module and generate the custom_dataset project
1) (env) C:\...> pip install tensorflow_datasets
2) (env) C:\...> cd workspace
3) (env) C:\...> tfds new custom_dataset
Now we only need to modify custom_dataset.py to generate the custom dataset.
4-2. Modify custom_dataset.py
We can get the data from either the web or the local filesystem. I will show how to get it from local.
custom_dataset.py has three methods:
_info: defines the dataset's features
_split_generators: reads data.zip and splits the data ("train", "val", "test")
_generate_examples: yields feature examples
Default code
class CustomDataset(tfds.core.GeneratorBasedBuilder):
    """DatasetBuilder for custom_dataset dataset."""

    VERSION = tfds.core.Version('1.0.0')
    RELEASE_NOTES = {
        '1.0.0': 'Initial release.',
    }

    def _info(self) -> tfds.core.DatasetInfo:
        """Returns the dataset metadata."""
        # TODO(custom_dataset): Specifies the tfds.core.DatasetInfo object
        return tfds.core.DatasetInfo(
            builder=self,
            description=_DESCRIPTION,
            features=tfds.features.FeaturesDict({
                # These are the features of your dataset like images, labels ...
                'image': tfds.features.Image(shape=(None, None, 3)),
                'label': tfds.features.ClassLabel(names=['no', 'yes']),
            }),
            # If there's a common (input, target) tuple from the
            # features, specify them here. They'll be used if
            # `as_supervised=True` in `builder.as_dataset`.
            supervised_keys=('image', 'label'),  # Set to `None` to disable
            homepage='https://dataset-homepage/',
            citation=_CITATION,
        )

    def _split_generators(self, dl_manager: tfds.download.DownloadManager):
        """Returns SplitGenerators."""
        # TODO(custom_dataset): Downloads the data and defines the splits
        path = dl_manager.download_and_extract('https://todo-data-url')

        # TODO(custom_dataset): Returns the Dict[split names, Iterator[Key, Example]]
        return {
            'train': self._generate_examples(path / 'train_imgs'),
        }

    def _generate_examples(self, path):
        """Yields examples."""
        # TODO(custom_dataset): Yields (key, example) tuples from the dataset
        for f in path.glob('*.jpeg'):
            yield 'key', {
                'image': f,
                'label': 'yes',
            }
Modified code
"""custom_dataset dataset."""
import tensorflow_datasets as tfds
from tensorflow_datasets.core.features import BBoxFeature
import xmltodict
from PIL import Image
import numpy as np
import tensorflow as tf
# TODO(custom_dataset): Markdown description that will appear on the catalog page.
_DESCRIPTION = """
Description is **formatted** as markdown.
It should also contain any processing which has been applied (if any),
(e.g. corrupted example skipped, images cropped,...):
"""
# TODO(custom_dataset): BibTeX citation
_CITATION = """
"""
class CustomDataset(tfds.core.GeneratorBasedBuilder):
MANUAL_DOWNLOAD_INSTRUCTIONS = """
data.zip files should be located at /root/tensorflow_dataset/downloads/manual
""" # modifyed
VERSION = tfds.core.Version('1.0.0')
RELEASE_NOTES = {
}
def _info(self) -> tfds.core.DatasetInfo:
"""Returns the dataset metadata."""
# TODO(custom_dataset): Specifies the tfds.core.DatasetInfo object
return tfds.core.DatasetInfo(
builder=self,
description=_DESCRIPTION,
features=tfds.features.FeaturesDict({
# These are the features of your dataset like images, labels ...
'image': tfds.features.Image(shape=(None, None, 3)), # modifyed
'objects': tfds.features.Sequence({
'bbox': tfds.features.BBoxFeature(),
'label': tfds.features.ClassLabel(names=['Choi Woo-shik',
'Kim Da-mi',
'Kim Seong-cheol',
'Kim Tae-ri',
'Nam Joo-hyuk',
'Yoo Jae-suk']), # modifyed
})
}),
# If there's a common (input, target) tuple from the
# features, specify them here. They'll be used if
# `as_supervised=True` in `builder.as_dataset`.
supervised_keys=('image', 'objects'), # Set to `None` to disable # modifyed
homepage='https://dataset-homepage/',
citation=_CITATION,
)
def _split_generators(self, dl_manager: tfds.download.DownloadManager):
"""Returns SplitGenerators."""
# TODO(custom_dataset): Downloads the data and defines the splits
archive_path = dl_manager.manual_dir / 'data.zip' # modifyed
extracted_path = dl_manager.extract(archive_path) # modifyed
# TODO(custom_dataset): Returns the Dict[split names, Iterator[Key, Example]]
return {
'train': self._generate_examples(img_path=extracted_path / 'train_images',
xml_path=extracted_path / 'train_xmls'), # modifyed
'test': self._generate_examples(img_path=extracted_path / 'test_images',
xml_path=extracted_path / 'test_xmls'), # modifyed
}
def _generate_examples(self, img_path, xml_path):
"""Yields examples."""
# TODO(custom_dataset): Yields (key, example) tuples from the dataset
for i, (img, xml) in enumerate(zip(img_path.glob('*.jpg'), xml_path.glob('*.xml'))):
yield i,{
'image': img, # modifyed
'objects': self._get_objects(xml) # modifyed
}
def _get_objects(self, xml): # custom method
data=dict()
f=open(xml)
xml_file=xmltodict.parse(f.read())
bbox=[]
label=[]
height, width = xml_file['annotation']['size']['height'], xml_file['annotation']['size']['width']
for obj in xml_file['annotation']['object']:
if type(obj)==type(dict()):
label.append(obj['name'])
x1=obj['bndbox']['xmin']
y1=obj['bndbox']['ymin']
x2=obj['bndbox']['xmax']
y2=obj['bndbox']['ymax']
y1, y2 = float(y1)/float(height), float(y2)/float(height)
x1, x2 = float(x1)/float(width), float(x2)/float(width)
bbox.append(tfds.features.BBox(ymin=y1, xmin=x1, ymax=y2, xmax=x2))
else:
if obj=='name':
label.append(xml_file['annotation']['object'][obj])
elif obj=='bndbox':
x1 = xml_file['annotation']['object'][obj]['xmin']
y1 = xml_file['annotation']['object'][obj]['ymin']
x2 = xml_file['annotation']['object'][obj]['xmax']
y2 = xml_file['annotation']['object'][obj]['ymax']
y1, y2 = float(y1)/float(height), float(y2)/float(height)
x1, x2 = float(x1)/float(width), float(x2)/float(width)
bbox.append(tfds.features.BBox(ymin=y1, xmin=x1, ymax=y2, xmax=x2))
f.close()
data['bbox']=bbox
data['label']=label
return data
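The awkward part of `_get_objects` is that xmltodict parses a single `<object>` element to a dict but several of them to a list, which forces a type check. As an aside, the standard library's ElementTree avoids that ambiguity because `findall` always returns a list; a minimal sketch (an alternative for illustration, not what the builder above uses):

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_string):
    """Parse a VOC-style annotation into normalized boxes and labels."""
    root = ET.fromstring(xml_string)
    width = float(root.findtext('size/width'))
    height = float(root.findtext('size/height'))
    boxes, labels = [], []
    # findall returns a list even when there is only one <object>.
    for obj in root.findall('object'):
        labels.append(obj.findtext('name'))
        bb = obj.find('bndbox')
        boxes.append((
            float(bb.findtext('ymin')) / height,  # ymin
            float(bb.findtext('xmin')) / width,   # xmin
            float(bb.findtext('ymax')) / height,  # ymax
            float(bb.findtext('xmax')) / width,   # xmax
        ))
    return boxes, labels
```

The (ymin, xmin, ymax, xmax) order matches what tfds.features.BBox expects.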
For more detail and examples, see the official TFDS guide on writing custom datasets.
1) Define MANUAL_DOWNLOAD_INSTRUCTIONS
2) Modify _info to define the features
3) Modify _split_generators
You have two options for manual_dir (detailed in the Build step):
1. leave the data in the default directory
2. leave the data in a directory of your choice
archive_path = dl_manager.manual_dir / 'data.zip' : joins manual_dir and 'data.zip'
extracted_path = dl_manager.extract(archive_path) : path of the extracted data.zip
You define the splits with a dict -> "train", "test", "val"...
4) Modify _generate_examples
Here you set the feature values: images, bboxes, classes...
TFDS requires an independent key for each example, so you must yield one; enumerate is the simplest way to do it.
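Stripped of TFDS itself, the key and pairing logic of `_generate_examples` can be sketched like this (directory names are illustrative):

```python
from pathlib import Path

def generate_examples(img_dir: Path, xml_dir: Path):
    """Yield (key, example) pairs the way a TFDS builder does."""
    # Sorting both globs keeps each image aligned with its own XML file.
    imgs = sorted(img_dir.glob('*.jpg'))
    xmls = sorted(xml_dir.glob('*.xml'))
    for i, (img, xml) in enumerate(zip(imgs, xmls)):
        # enumerate supplies the unique key TFDS requires per example.
        yield i, {'image': img, 'objects': xml}
```

Any hashable, stable value works as the key; the index from enumerate is simply the easiest choice when the iteration order is deterministic.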
4-3. Build custom_dataset.py
cd .../custom_dataset
tfds build # use default manual_dir
# = "/.../tensorflow_datasets/downloads/manual"
# if you don't have manual, mkdir manual
# or
tfds build --manual_dir "C:\...\..." # use custom manual_dir
Either way, you can find the generated dataset under /.../tensorflow_datasets/custom_dataset/...
5. load custom_dataset
dataset, info = tfds.load("voc/2007", split="train", data_dir="~/tensorflow_datasets", with_info=True)
dataset, info = tfds.load("custom_dataset", split="train", data_dir=".../tensorflow_datasets", with_info=True)
The VOC dataset can be loaded with data_dir "~/tensorflow_datasets", but this did not work for custom_dataset.
So custom_dataset needs an explicit dataset directory; you can also locate it with tf.io.gfile.glob.
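Once loaded, each example's objects carry bboxes stored by tfds.features.BBoxFeature as (ymin, xmin, ymax, xmax) normalized to [0, 1], so to draw or crop them you scale back to pixel coordinates:

```python
def bbox_to_pixels(bbox, img_height, img_width):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel coords."""
    ymin, xmin, ymax, xmax = bbox
    return (int(ymin * img_height), int(xmin * img_width),
            int(ymax * img_height), int(xmax * img_width))
```

For example, the box (0.1, 0.1, 0.5, 0.5) on a 200x100 image maps to pixel corners (20, 10) and (100, 50).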
Now you can build your own custom dataset.
Reference
https://www.tensorflow.org/datasets/api_docs/python/tfds/all_symbols