first commit
@@ -0,0 +1,4 @@
venv
__pycache__
*.pyc
.git
@@ -0,0 +1,59 @@
# OS / editors
.DS_Store
Thumbs.db
.idea/
.vscode/

# Python
__pycache__/
*.py[cod]
*$py.class
.python-version
.mypy_cache/
.pytest_cache/
.ruff_cache/
.pyre/
.tox/
.nox/
.coverage
.coverage.*
htmlcov/

# Virtualenvs
.venv/
venv/
env/
ENV/

# Logs
*.log
logs/
coursedoc/
.codex-venv-docx/
data/
postgres.env

# Secrets / env files
.env
.env.local
.env.*.local
bot/.env
menu_scraper/.env
rag_api/.env

# Runtime data
data/chroma/
data/menu/*.json
data/knowledge/

# Local databases / temp
*.sqlite3
*.db
*.tmp
tmp/

# Build artifacts
build/
dist/
*.egg-info/
@@ -0,0 +1,199 @@
# GorychBot

A set of services for the Telegram bot of the "Горыч" (Gorych) shawarma shop, with a separate menu scraper and a separate RAG API.

The project is split into independent parts:

- `tgbot`: the main Telegram bot, built on `aiogram`.
- `menu_scraper`: a service that scrapes the menu from `https://gorych34.ru/`.
- `rag_api`: a FastAPI service with RAG, local `sergeyzh/rubert-mini-frida` embeddings, and function calling for picking dishes.
- `redisdb`: Redis for the bot.
- `postgredb`: PostgreSQL for the bot.

## What the project does

1. `menu_scraper` fetches the menu from the Gorych website and saves normalized JSON to `data/menu/gorich_menu.json`.
2. `rag_api` indexes:
   - knowledge about the venue: description, contacts, opening hours, delivery, social media;
   - the menu: names, descriptions, ingredients, prices, sizes, photos.
3. `rag_api` can:
   - answer questions about the venue through RAG;
   - call the `find_menu_items` tool to pick dishes by budget, category, and ingredients.
4. `tgbot` runs separately from the RAG API and uses PostgreSQL + Redis.
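
Step 3 relies on function calling. The following is a minimal sketch of how a `find_menu_items` tool could be declared in the OpenAI-compatible `tools` format that OpenRouter accepts. It assumes the tool's parameters mirror the `GET /menu/search` query parameters listed under `rag_api`. The real schema in `rag_api` may differ, and `parse_tool_arguments` is a hypothetical helper that decodes the arguments string the model returns.

```python
import json

# Hypothetical declaration of the find_menu_items tool in the OpenAI-compatible
# "tools" format. Parameter names mirror GET /menu/search (an assumption).
FIND_MENU_ITEMS_TOOL = {
    "type": "function",
    "function": {
        "name": "find_menu_items",
        "description": "Pick dishes from the menu by budget, category and ingredients.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What the guest is looking for."},
                "max_price": {"type": "number", "description": "Price ceiling in rubles."},
                "category": {"type": "string", "description": "Menu category."},
                "must_include": {"type": "array", "items": {"type": "string"}},
                "must_not_include": {"type": "array", "items": {"type": "string"}},
                "limit": {"type": "integer", "minimum": 1},
            },
            "required": ["query"],
        },
    },
}


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode the JSON string a model returns as tool-call arguments,
    dropping any keys that the declaration above does not define."""
    args = json.loads(raw_arguments)
    allowed = FIND_MENU_ITEMS_TOOL["function"]["parameters"]["properties"]
    return {key: value for key, value in args.items() if key in allowed}
```

Only `query` is declared as required, so the model can leave out filters the guest never mentioned. The filtered dictionary can then be passed as keyword arguments to whatever search function the service uses.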

## Services

### `tgbot`

Purpose:
- Telegram bot built on `aiogram`.
- Stores data in PostgreSQL.
- Uses Redis for FSM storage.

Env file:
- `bot/.env`

Depends on:
- `redisdb`
- `postgredb`

Start command:
```bash
python aiogram_run.py
```

### `menu_scraper`

Purpose:
- Scrapes the menu from the Gorych website.
- Reads the product catalog embedded as JSON in the `gorych34.ru` page.
- Writes the result to `data/menu/gorich_menu.json`.

Env file:
- `menu_scraper/.env`

Port:
- `8010`

Main endpoints:
- `GET /health`
- `POST /scrape`
- `GET /items`
- `GET /items/{item_id}`

Response shape:
- a single menu snapshot with `total_items` and an `items` array
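
A consumer of the scraper output can sanity-check a snapshot before using it. The following is a minimal sketch. It assumes the snapshot is a JSON object with `total_items` and an `items` list whose entries carry the `item_id` field listed under the data layout. `index_snapshot` is a hypothetical helper, not part of `menu_scraper`.

```python
import json


def index_snapshot(raw: str) -> dict[str, dict]:
    """Parse a menu snapshot and return its items keyed by item_id.

    Raises ValueError if total_items disagrees with the number of items
    or if the same item_id appears twice."""
    snapshot = json.loads(raw)
    items = snapshot.get("items", [])
    if snapshot.get("total_items") != len(items):
        raise ValueError(
            f"total_items={snapshot.get('total_items')} but got {len(items)} items"
        )
    by_id: dict[str, dict] = {}
    for item in items:
        item_id = str(item["item_id"])  # normalise numeric ids to strings
        if item_id in by_id:
            raise ValueError(f"duplicate item_id: {item_id}")
        by_id[item_id] = item
    return by_id
```

Failing loudly on a count mismatch catches a half-written or truncated file before it reaches the index.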

### `rag_api`

Purpose:
- A separate API for questions about the venue.
- RAG over the website, delivery, contacts, and social media.
- Local embeddings via `sergeyzh/rubert-mini-frida` on CPU.
- Function calling through OpenRouter to pick dishes from the menu.

Env file:
- `rag_api/.env`

Port:
- `8001` on the host
- `8000` inside the container

Main endpoints:
- `GET /health`
- `POST /chat`
- `POST /admin/reindex`
- `GET /menu/search`

`POST /chat` accepts a JSON body like this (the sample message asks, in Russian, for something spicy among the pizzas under 400 rubles):
```json
{
  "message": "Посоветуй что-нибудь острое из пиццы до 400 рублей",
  "history": []
}
```

`GET /menu/search` supports these query parameters:
- `query`
- `max_price`
- `category`
- `must_include`
- `must_not_include`
- `limit`

Example:
```bash
curl "http://localhost:8001/menu/search?query=острая%20пицца&max_price=450&category=пицца&limit=3"
```

## Data layout

### `data/menu/gorich_menu.json`

Each menu item contains:
- `item_id`
- `name`
- `category`
- `description`
- `ingredients`
- `price`
- `price_label`
- `size`
- `photo_url`
- `source_url`
- `scraped_at`

### `data/chroma/`

Local ChromaDB store for RAG:
- a collection of knowledge about the venue;
- a collection of menu documents.

## Configuration

The project uses a separate env file for each service:

- `bot/.env`
- `menu_scraper/.env`
- `rag_api/.env`

Templates:

- `.env.example`
- `bot/.env.example`
- `menu_scraper/.env.example`
- `rag_api/.env.example`

Minimum setup to run:

1. Fill in `bot/.env`.
2. Fill in `rag_api/.env`.
3. Adjust `menu_scraper/.env` if needed.

Important:
- `OPENROUTER_API_KEY` is needed only by `rag_api`.
- On OpenRouter, pick a model that handles tool calling reliably and is available in your region. The config currently uses `mistralai/mistral-medium-3-5`.
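
A missing key usually surfaces only as a crash inside a container after `docker compose up`. The following is a minimal pre-flight sketch that parses simple `KEY=value` env files and reports missing or empty keys. The required-key lists are assumptions taken from `bot/.env.example` and the note above about `OPENROUTER_API_KEY`. `parse_env` and `missing_keys` are hypothetical helpers, and they do not handle multi-line values or `export` prefixes.

```python
# Required keys per env file (assumed; taken from bot/.env.example and the
# OPENROUTER_API_KEY note). Optional keys such as BOT_PROXY are left out.
REQUIRED_KEYS = {
    "bot/.env": [
        "TOKEN", "BASE_ADMIN", "REDIS_URL",
        "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD",
        "POSTGRES_HOST", "POSTGRES_PORT", "TIMEZONE",
    ],
    "rag_api/.env": ["OPENROUTER_API_KEY"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # strip surrounding single or double quotes from the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

To check every file, loop over `REQUIRED_KEYS` and call `missing_keys(Path(path).read_text(encoding="utf-8"), keys)` for each entry.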

## Running

Start everything:

```bash
docker compose up -d --build
```

Check the services:

```bash
curl http://localhost:8010/health
curl http://localhost:8001/health
```

Rebuild the RAG index manually:

```bash
curl -X POST http://localhost:8001/admin/reindex
```

Re-scrape the menu manually:

```bash
curl -X POST http://localhost:8010/scrape
```
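
`rag_api` can take a while to come up on first start, so a script that re-scrapes and re-indexes right after `docker compose up` should first wait for both `/health` endpoints. The following is a minimal sketch that uses only the endpoints listed above. The HTTP calls are injected as plain functions so the ordering can be tested without a network. `wait_until_healthy` and `refresh_menu_and_index` are hypothetical helpers. Scraping before reindexing is an assumed order, based on the reading that the index is built from `data/menu/gorich_menu.json`.

```python
import time
from typing import Callable

SCRAPER_URL = "http://localhost:8010"
RAG_API_URL = "http://localhost:8001"


def wait_until_healthy(
    get_status: Callable[[str], int],
    url: str,
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll GET {url}/health until it returns 200 or the timeout expires."""
    waited = 0.0
    while True:
        try:
            if get_status(f"{url}/health") == 200:
                return
        except OSError:
            pass  # connection refused while the container is still starting
        if waited >= timeout:
            raise TimeoutError(f"{url} is not healthy after {timeout} s")
        sleep(interval)
        waited += interval


def refresh_menu_and_index(
    get_status: Callable[[str], int],
    post: Callable[[str], int],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait for both services, re-scrape the menu, then rebuild the RAG index."""
    wait_until_healthy(get_status, SCRAPER_URL, sleep=sleep)
    wait_until_healthy(get_status, RAG_API_URL, sleep=sleep)
    for endpoint in (f"{SCRAPER_URL}/scrape", f"{RAG_API_URL}/admin/reindex"):
        status = post(endpoint)
        if status >= 400:
            raise RuntimeError(f"POST {endpoint} failed with HTTP {status}")
```

With the standard library, `get_status` can be `lambda u: urllib.request.urlopen(u, timeout=5).status`. Its `HTTPError` and `URLError` exceptions subclass `OSError`, so the polling loop treats them as "not ready yet".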

## What is already implemented

- a standalone menu scraper;
- saving the menu to JSON;
- a standalone RAG API;
- ChromaDB;
- local `rubert-mini-frida` embeddings;
- function calling for picking dishes;
- dish matching that combines semantic similarity with lexical features of the menu;
- Docker Compose for all services.
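
The list above says dish matching combines semantic similarity with lexical features of the menu, but it does not say how. One common way to do that is shown below purely as an illustration, not as `rag_api`'s actual formula. It blends the cosine similarity of embedding vectors with a simple token-overlap score, weighted by a hypothetical `alpha`. The vectors in the usage are toy values, not `rubert-mini-frida` output.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def token_overlap(query: str, text: str) -> float:
    """Share of query tokens that also occur in the document text."""
    q_tokens = set(query.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & set(text.lower().split())) / len(q_tokens)


def hybrid_rank(
    query: str,
    query_vec: list[float],
    docs: dict[str, tuple[str, list[float]]],
    alpha: float = 0.7,
    k: int = 3,
) -> list[str]:
    """Rank doc ids by alpha * semantic score + (1 - alpha) * lexical score."""
    scored = []
    for doc_id, (text, vec) in docs.items():
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * token_overlap(query, text)
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

The lexical term rewards exact dish and ingredient words that an embedding may blur together. A higher `alpha` favours paraphrases and looser phrasing.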

## Things to know

- `rag_api` is not currently wired directly into `tgbot`; it is a separate service with an HTTP API.
- On first start, `rag_api` may take longer to come up because it has to load the embedding model.
- `data/` holds runtime data and must not be committed to git.
@@ -0,0 +1,48 @@
# ========================
# Telegram
# ========================
# Telegram bot token from @BotFather
TOKEN=your_telegram_bot_token

# Telegram user_id of the main administrator
BASE_ADMIN=123456789

# ========================
# Redis
# ========================
# For a local run you can keep localhost.
# In Docker Compose this value is overridden with redis://redisdb:6379/0
REDIS_URL=redis://127.0.0.1:6379/0

# ========================
# PostgreSQL
# ========================
# For a local run you can keep localhost.
# In Docker Compose POSTGRES_HOST is overridden with postgredb
POSTGRES_DB=gorychbot
POSTGRES_USER=postgres
POSTGRES_PASSWORD=change_me
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# ========================
# App
# ========================
# Timezone used for times and dates inside the bot
TIMEZONE=Europe/Moscow

# Proxy for the Telegram Bot API.
# Format:
# - socks5:ip:port
# - http:ip:port
# - socks5:ip:port:user:pass
# - http:ip:port:user:pass
BOT_PROXY=

# URL of the answering service (RAG API).
# For a local run you can keep localhost.
# In Docker Compose this value is overridden with http://rag_api:8000
RAG_API_URL=http://127.0.0.1:8001

# RAG API request timeout in seconds
RAG_API_TIMEOUT_SECONDS=60
@@ -0,0 +1,14 @@
# Base Python image
FROM python:3.13-alpine

# Working directory
WORKDIR /app

# Dependency list
COPY bot/requirements.txt /app/

# Install dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

COPY ./bot /app
@@ -0,0 +1,65 @@
# Aiogram
from aiogram.types.bot_command_scope_all_private_chats import (
    BotCommandScopeAllPrivateChats,
)

# Bot
from create_bot import bot, dp, start_command, orm

# Entry
from handlers.start import start_router
from handlers.admin.main import admin_main_router

# Client handlers
from handlers.client import client_main_router

# Admin handlers
from handlers.admin.list_of_users import list_of_users_router
from handlers.admin.statistic import admin_statistic_router
from handlers.admin.management import admin_management_router
from handlers.admin.mailer import admin_mailer_router
from handlers.admin.settings import admin_settings_router
from handlers.admin.blacklist import admin_blacklist_router

# middlewares
from middlewares.users_control import BlacklistMiddleware, AntiFloodMiddleware
from middlewares.album import AlbumMiddleware

# Another
from decouple import config
from uvloop import run


async def main():

    # create tables, register /start in private chats, ensure the base admin exists
    await orm.proceed_schemas()
    await bot.set_my_commands(start_command, scope=BotCommandScopeAllPrivateChats())
    await orm.create_admin(int(config("BASE_ADMIN")), "base_admin", "base_admin")

    dp.message.middleware(BlacklistMiddleware())
    dp.callback_query.middleware(BlacklistMiddleware())
    dp.message.middleware(AntiFloodMiddleware())
    dp.message.middleware(AlbumMiddleware())

    # ENTRY POINTS
    dp.include_routers(start_router, admin_main_router)

    # CLIENT
    dp.include_routers(client_main_router)

    # ADMIN
    dp.include_routers(
        list_of_users_router,
        admin_statistic_router,
        admin_management_router,
        admin_mailer_router,
        admin_settings_router,
        admin_blacklist_router,
    )

    # await bot.delete_webhook(drop_pending_updates = True)
    await dp.start_polling(bot)


if __name__ == "__main__":
    run(main())
@@ -0,0 +1,41 @@
# aiogram
from aiogram import Bot, Dispatcher
from aiogram.client.default import DefaultBotProperties
from aiogram.enums import ParseMode
from aiogram.fsm.storage.redis import RedisStorage, DefaultKeyBuilder, StorageKey
from aiogram.types import BotCommand

# cfg
from decouple import config

# db
from database.orm import ORM

# utils
from utils.proxy import build_bot_session

# another
import logging, pytz


logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)
redis_url = config("REDIS_URL")
bot_session = build_bot_session()
bot = Bot(
    token=config("TOKEN"),
    session=bot_session,
    default=DefaultBotProperties(
        parse_mode=ParseMode.HTML,
        link_preview_is_disabled=True,
    ),
)
storage = RedisStorage.from_url(redis_url)
storage.key_builder = DefaultKeyBuilder(with_bot_id=True)
dp = Dispatcher(storage=storage)

start_command = [BotCommand(command="/start", description="🔄 Перезапустить бота")]
tz = pytz.timezone(config("TIMEZONE"))
orm = ORM()
@@ -0,0 +1,57 @@
# sqlalchemy
from sqlalchemy.orm import declarative_base
from sqlalchemy import (
    Column,
    Integer,
    String,
    BIGINT,
    VARCHAR,
    Boolean,
    DateTime,
    SmallInteger,
    ARRAY,
    DOUBLE_PRECISION,
    Enum,
)
from sqlalchemy.dialects.postgresql import JSONB

# types
from database.db_types import *


# init baseModel
BaseModel = declarative_base()


class User(BaseModel):

    __tablename__ = "users"

    user_id = Column(BIGINT, primary_key=True)
    username = Column(VARCHAR(33), nullable=True)
    fullname = Column(VARCHAR(128), nullable=False)
    register_date = Column(DateTime(timezone=True), nullable=False)


class Admin(BaseModel):

    __tablename__ = "admins"

    user_id = Column(BIGINT, primary_key=True)
    username = Column(VARCHAR(33), nullable=True)
    fullname = Column(VARCHAR(128), nullable=False)


class Blacklist(BaseModel):

    __tablename__ = "blacklist"

    user_id = Column(BIGINT, primary_key=True)


class Setting(BaseModel):

    __tablename__ = "settings"

    name = Column(String, primary_key=True)
    value = Column(JSONB, nullable=True)
@@ -0,0 +1,16 @@
import enum


# class Type(enum.Enum):
#     FIELD1 = "field1"
#     FIELD2 = "field2"

#     @classmethod
#     def from_string(cls, value: str):
#         for item in cls:
#             if item.value == value:
#                 return item
#         raise ValueError(f"{value} is not a valid Type")

#     def __str__(self):
#         return self.value
@@ -0,0 +1,18 @@
# sqlalchemy imports
from sqlalchemy.engine import URL
from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, async_sessionmaker
from sqlalchemy.ext.asyncio import create_async_engine as _create_async_engine

# another
from typing import Union


def create_async_engine(url: Union[URL, str]) -> AsyncEngine:
    # pool_pre_ping discards dead connections; pool_recycle renews them every hour
    return _create_async_engine(url=url, pool_pre_ping=True, pool_recycle=3600)


def get_session_maker(engine: AsyncEngine) -> async_sessionmaker[AsyncSession]:
    # expire_on_commit=False keeps loaded attributes usable after the session closes
    return async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
@@ -0,0 +1,229 @@
# sqlalchemy import
from sqlalchemy import update, select, delete, func
from sqlalchemy.engine import URL

# Database engine
from database.engine import create_async_engine, get_session_maker

# DB Models
from database.db_models import *

# Config
from decouple import config

# Another
from datetime import datetime
from typing import Any


class ORM:

    def __init__(self):
        # URL.create escapes special characters in the password,
        # which a hand-built f-string DSN does not
        self.async_engine = create_async_engine(
            url=URL.create(
                drivername="postgresql+asyncpg",
                username=config("POSTGRES_USER"),
                password=config("POSTGRES_PASSWORD"),
                host=config("POSTGRES_HOST"),
                port=config("POSTGRES_PORT", cast=int),
                database=config("POSTGRES_DB"),
            )
        )
        self.session_maker = get_session_maker(self.async_engine)

    async def proceed_schemas(self) -> None:
        # create missing tables; existing tables are not altered
        async with self.async_engine.begin() as conn:
            await conn.run_sync(BaseModel.metadata.create_all)

    # *############################
    # *# USERS #
    # *############################

    async def is_user_exists(self, user_id: int) -> bool:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.execute(
                    select(User.user_id).where(User.user_id == user_id)
                )

                return query.one_or_none() is not None

    async def create_user(
        self, user_id: int, username: str, fullname: str, register_date: datetime
    ) -> int | None:
        """Insert a user and return its id, or None if the user already exists."""
        async with self.session_maker() as session:
            async with session.begin():
                # check inside the same transaction instead of opening a second session
                if await session.get(User, user_id) is not None:
                    return None

                user = User(
                    user_id=user_id,
                    username=username,
                    fullname=fullname,
                    register_date=register_date,
                )

                session.add(user)
                await session.flush()
                return user.user_id

    async def set_users_field(
        self, user_id: int, field: str, value: int | str | bool
    ) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                await session.execute(
                    update(User)
                    .where(User.user_id == user_id)
                    .values({getattr(User, field): value})
                )

    async def get_user(self, user_id: int) -> User | None:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(
                    select(User).where(User.user_id == user_id)
                )

                return query.one_or_none()

    async def get_all_users(self) -> list[User]:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(select(User))

                return query.all()

    async def get_users_count(self) -> int:
        async with self.session_maker() as session:
            async with session.begin():
                # COUNT(*) always returns exactly one row
                return await session.scalar(select(func.count()).select_from(User))

    async def get_all_user_ids(self) -> list[int]:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(select(User.user_id))

                return query.all()

    # *############################
    # *# ADMINS #
    # *############################

    async def is_admin_exists(self, user_id: int) -> bool:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.execute(
                    select(Admin.user_id).where(Admin.user_id == user_id)
                )

                return query.one_or_none() is not None

    async def create_admin(self, user_id: int, username: str, fullname: str) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                admin = Admin(user_id=user_id, username=username, fullname=fullname)

                # merge inserts a new row or updates the existing one
                await session.merge(admin)

    async def get_admin(self, user_id: int) -> Admin | None:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(
                    select(Admin).where(Admin.user_id == user_id)
                )

                return query.one_or_none()

    async def get_all_admins(self) -> list[Admin]:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(select(Admin))

                return query.all()

    async def delete_admin(self, user_id: int) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                await session.execute(delete(Admin).where(Admin.user_id == user_id))

    async def set_admin_field(
        self, user_id: int, field: str, value: int | str | bool
    ) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                await session.execute(
                    update(Admin)
                    .where(Admin.user_id == user_id)
                    .values({getattr(Admin, field): value})
                )

    # *############################
    # *# SETTINGS #
    # *############################

    async def is_setting_exists(self, name: str) -> bool:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.execute(
                    select(Setting).where(Setting.name == name)
                )

                return query.one_or_none() is not None

    async def create_setting(self, name: str, value: Any) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                setting = Setting(name=name, value=value)

                await session.merge(setting)

    async def init_settings(self) -> None: ...

    async def get_setting_value(self, name: str) -> Any:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(
                    select(Setting.value).where(Setting.name == name)
                )

                return query.one_or_none()

    async def update_setting_value(self, name: str, value: dict | list) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                await session.execute(
                    update(Setting).where(Setting.name == name).values(value=value)
                )

    # *############################
    # *# BLACKLIST #
    # *############################

    async def is_blacklisted(self, user_id: int) -> bool:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.execute(
                    select(Blacklist).where(Blacklist.user_id == user_id)
                )

                return query.one_or_none() is not None

    async def create_blacklist(self, user_id: int) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                blacklist = Blacklist(user_id=user_id)

                await session.merge(blacklist)

    async def get_all_blacklist(self) -> list[int]:
        async with self.session_maker() as session:
            async with session.begin():
                query = await session.scalars(
                    select(Blacklist.user_id).order_by(Blacklist.user_id)
                )

                return query.all()

    async def delete_blacklist(self, user_id: int) -> None:
        async with self.session_maker() as session:
            async with session.begin():
                await session.execute(
                    delete(Blacklist).where(Blacklist.user_id == user_id)
                )
@@ -0,0 +1,221 @@
# Aiogram
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F
from aiogram.exceptions import TelegramBadRequest

# Const
from create_bot import orm

# Keyboards
from keyboards.admin.main_kbs import *

# States
from states.admin_states import AdminStates, AdminBlacklistStates

# Another
from contextlib import suppress


# Init
admin_blacklist_router = Router()


@admin_blacklist_router.message(
    F.text == "🚫 Черный список", StateFilter(AdminStates.main)
)
@admin_blacklist_router.message(F.text == "↩️ Назад", StateFilter(AdminBlacklistStates))
async def cmd_blacklist(message: types.Message, state: FSMContext):

    msg_text = "🚫 Выберите действие:"

    await message.answer(text=msg_text, reply_markup=get_blacklist_kb())

    await state.set_state(AdminBlacklistStates.main)


# *############################
# *# ADD #
# *############################


@admin_blacklist_router.message(
    F.text == "➕ Добавить", StateFilter(AdminBlacklistStates.main)
)
async def cmd_blacklist_add(message: types.Message, state: FSMContext):

    msg_text = "➕ Введите User ID:"

    await message.answer(text=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminBlacklistStates.add_blacklist)


@admin_blacklist_router.message(F.text, StateFilter(AdminBlacklistStates.add_blacklist))
async def cmd_blacklist_add_finish(message: types.Message, state: FSMContext):

    # validation
    if not message.text.isdigit():
        await message.answer(
            text="⛔️ Только цифры! Повторите попытку:", reply_markup=get_back_kb()
        )
        return

    user_id = int(message.text)

    if not await orm.is_user_exists(user_id):
        await message.answer(
            text="⛔️ Пользователь не существует в БД! Повторите попытку:",
            reply_markup=get_back_kb(),
        )
        return

    await orm.create_blacklist(user_id=user_id)

    await message.answer(text="✅ Черный список обновлен!")
    await cmd_blacklist(message, state)


# *############################
# *# DEL #
# *############################


@admin_blacklist_router.message(
    F.text == "➖ Удалить", StateFilter(AdminBlacklistStates.main)
)
async def cmd_blacklist_delete(message: types.Message, state: FSMContext):

    msg_text = "➖ Введите User ID:"

    await message.answer(text=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminBlacklistStates.del_blacklist)


@admin_blacklist_router.message(F.text, StateFilter(AdminBlacklistStates.del_blacklist))
async def cmd_blacklist_delete_finish(message: types.Message, state: FSMContext):

    # validation
    if not message.text.isdigit():
        await message.answer(
            text="⛔️ Только цифры! Повторите попытку:", reply_markup=get_back_kb()
        )
        return

    user_id = int(message.text)

    if not await orm.is_blacklisted(user_id):
        await message.answer(
            text="⛔️ Пользователь не найден в ЧС! Повторите попытку:",
            reply_markup=get_back_kb(),
        )
        return

    await orm.delete_blacklist(user_id=user_id)

    await message.answer(text="✅ Черный список обновлен!")

    await cmd_blacklist(message, state)


# *############################
# *# LIST #
# *############################


@admin_blacklist_router.message(
    F.text == "👁 Открыть список", StateFilter(AdminBlacklistStates.main)
)
async def cmd_blacklist_list(message: types.Message, state: FSMContext):

    await state.update_data(blacklist_offset=0)
    items = await orm.get_all_blacklist()

    if not items:
        await message.answer(text="💭 Список пуст.")
        return

    offset = 0
    # number of pages with 10 entries each (ceiling division)
    max_offset = len(items) // 10 + (1 if len(items) % 10 != 0 else 0)

    msg_text = f"<b>🚫 Черный список {offset + 1}/{max_offset}</b>\n\n"

    for item in items[offset * 10 : (offset + 1) * 10]:
        msg_text += f"✦ <code>{item}</code>\n"

    await message.answer(
        text=msg_text,
        reply_markup=get_bookList_ikb(
            prefix="admin_blacklist",
            offset=0,
            max_offset=max_offset,
            items=[],
            element_col=10,
        ),
    )


async def cmd_blacklist_list_query(query: types.CallbackQuery, state: FSMContext):

    data = await state.get_data()
    # fall back to the first page if the offset is missing from FSM data
    offset = data.get("blacklist_offset", 0)
    items = await orm.get_all_blacklist()

    if not items:
        await query.answer(text="💭 Список пуст.")
        return

    max_offset = len(items) // 10 + (1 if len(items) % 10 != 0 else 0)

    # wrap around when paging past either end of the list
    if offset < 0:
        offset = max_offset - 1
        await state.update_data(blacklist_offset=offset)
    elif offset >= max_offset:
        offset = 0
        await state.update_data(blacklist_offset=offset)

    msg_text = f"<b>🚫 Черный список {offset + 1}/{max_offset}</b>\n\n"

    for item in items[offset * 10 : (offset + 1) * 10]:
        msg_text += f"✦ <code>{item}</code>\n"

    # editing with unchanged text raises TelegramBadRequest ("message is not modified")
    with suppress(TelegramBadRequest):
        await query.message.edit_text(
            text=msg_text,
            reply_markup=get_bookList_ikb(
                prefix="admin_blacklist",
                offset=offset,
                max_offset=max_offset,
                items=[],
                element_col=10,
            ),
        )

    await query.answer()


@admin_blacklist_router.callback_query(
    F.data == "admin_blacklist_next", StateFilter(AdminBlacklistStates.main)
)
@admin_blacklist_router.callback_query(
    F.data == "admin_blacklist_prev", StateFilter(AdminBlacklistStates.main)
)
@admin_blacklist_router.callback_query(
    F.data == "admin_blacklist_status", StateFilter(AdminBlacklistStates.main)
)
async def cmd_blacklist_list_actions(query: types.CallbackQuery, state: FSMContext):

    state_data = await state.get_data()

    if query.data.endswith("next"):
        await state.update_data(
            blacklist_offset=state_data.get("blacklist_offset", 0) + 1
        )
    elif query.data.endswith("prev"):
        await state.update_data(
            blacklist_offset=state_data.get("blacklist_offset", 0) - 1
        )

    await cmd_blacklist_list_query(query, state)
@@ -0,0 +1,53 @@
# Aiogram
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F

# Const
from create_bot import tz, orm

# States
from states.admin_states import AdminStates

# Another
import shutil, os
from openpyxl import load_workbook


# Init
list_of_users_router = Router()


@list_of_users_router.message(
    F.text == "📑 Список пользователей", StateFilter(AdminStates.main)
)
async def cmd_list_of_users(message: types.Message, state: FSMContext):

    # copy the template to a per-admin file so parallel exports do not collide
    table_path = shutil.copy(
        src="templates/users.xlsx",
        dst=f"templates/users_list_{message.from_user.id}.xlsx",
    )

    try:
        # load table
        book = load_workbook(filename=table_path)
        sheet = book["users"]

        all_clients = await orm.get_all_users()

        # data rows start at row 2, below the template header row
        for row, user in enumerate(all_clients, 2):
            sheet.cell(row=row, column=1, value=user.user_id)
            sheet.cell(row=row, column=2, value=user.username)
            sheet.cell(row=row, column=3, value=user.fullname)
            sheet.cell(
                row=row,
                column=4,
                value=user.register_date.astimezone(tz).strftime(r"%d-%m-%y %H:%M %Z"),
            )

        book.save(table_path)

        await message.answer_document(
            document=types.FSInputFile(table_path, filename="users_list.xlsx")
        )
    finally:
        # remove the temporary file even if building or sending it failed
        if os.path.exists(table_path):
            os.remove(table_path)
@@ -0,0 +1,129 @@
# Aiogram imports
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F
from aiogram.exceptions import TelegramRetryAfter

# Const
from create_bot import bot, orm

# Keyboards
from keyboards.admin.mailer_kbs import *

# Utils
from utils.text_tools import parse_links_to_inline_markup

# States
from states.admin_states import AdminStates, AdminMailerStates

# Funcs
from handlers.admin.main import show_admin_menu

# Another
import asyncio


admin_mailer_router = Router()


@admin_mailer_router.message(F.text == "✉️ Рассылка", StateFilter(AdminStates.main))
@admin_mailer_router.message(F.text == "↩️ Назад", StateFilter(AdminMailerStates))
async def process_mailer_post(message: types.Message, state: FSMContext):

    msg_text = "✉️ Отправьте пост одним сообщением:"

    await message.answer(text=msg_text, reply_markup=get_back_to_main_kb())

    await state.set_state(AdminMailerStates.post)


@admin_mailer_router.message(StateFilter(AdminMailerStates.post))
async def process_mailer_ikb(message: types.Message, state: FSMContext):

    # remember only the message id; the post is copied from the admin's chat later
    await state.update_data(admin_mailer_post=message.message_id)

    msg_text = """✉️ Введите кнопки:

<blockquote>Отправьте ссылку(и) в формате:
[Текст кнопки + ссылка]
Пример:
[Переводчик + https://t.me/TransioBot]

Чтобы добавить несколько кнопок в один ряд, пишите ссылки рядом с предыдущими.
Формат:
[Первый текст + первая ссылка][Второй текст + вторая ссылка]

Чтобы добавить несколько кнопок в столбик, пишите новые ссылки с новой строки.
Формат:
[Первый текст + первая ссылка]
[Второй текст + вторая ссылка]</blockquote>"""

    await message.answer(
        text=msg_text, reply_markup=get_skip_kb(), disable_web_page_preview=True
    )

    await state.set_state(AdminMailerStates.ikb)


@admin_mailer_router.message(F.text, StateFilter(AdminMailerStates.ikb))
async def process_mailer_preview(message: types.Message, state: FSMContext):

    ikb = (
        parse_links_to_inline_markup(message.text)
        if message.text != "↪️ Пропустить"
        else None
    )
    await state.update_data(admin_mailer_ikb=ikb)

    state_data = await state.get_data()
    post = state_data.get("admin_mailer_post")

    await message.answer(text="✉️ Предпросмотр:", reply_markup=get_mailer_finish_kb())

    try:
        await bot.copy_message(
            chat_id=message.from_user.id,
            from_chat_id=message.from_user.id,
            message_id=post,
            reply_markup=get_mailer_btn_ikb(buttons_preset=ikb),
        )
    except Exception:
        # e.g. an invalid button URL: report it and start over from the post step
        await message.answer(text="🔴 Ошибка!")
        await process_mailer_post(message, state)
        return

    await state.set_state(AdminMailerStates.preview)


@admin_mailer_router.message(
    F.text == "🟢 Начать рассылку", StateFilter(AdminMailerStates.preview)
)
async def process_mailer_finish(message: types.Message, state: FSMContext):

    state_data = await state.get_data()
    ikb = state_data.get("admin_mailer_ikb")
    post = state_data.get("admin_mailer_post")

    all_users = await orm.get_all_user_ids()

    # info
    await message.answer(text="▶️✉️ Рассылка запущена...")

    await state.clear()

    # back to main menu
    await show_admin_menu(message, state)

    counter = 0
    for user_id in all_users:
        for _attempt in range(2):
            try:
                await bot.copy_message(
                    chat_id=user_id,
                    from_chat_id=message.from_user.id,
                    message_id=post,
                    reply_markup=get_mailer_btn_ikb(buttons_preset=ikb),
                )
                counter += 1
                break
            except TelegramRetryAfter as e:
                # flood control: wait as long as Telegram asks, then retry once
                await asyncio.sleep(e.retry_after)
            except Exception:
                # bot blocked, chat not found, etc.: skip this user
                break
        # stay below Telegram's limit of about 30 messages per second
        await asyncio.sleep(0.05)

    await message.answer(
        text=f"✅ Рассылка завершена! Сообщение отправлено {counter}/{len(all_users)}."
    )
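The broadcast loop sends one copy per user. Telegram's bot FAQ asks bulk senders to stay around 30 messages per second, and when a bot exceeds its limits the API answers with HTTP 429 and a `retry_after` value (aiogram raises it as `TelegramRetryAfter`). A stdlib-only sketch of a throttled, retrying loop; `send` and the `RetryAfter` exception are hypothetical stand-ins for `bot.copy_message` and `TelegramRetryAfter`, so the logic can run without a bot:

```python
import asyncio
from typing import Awaitable, Callable, Iterable


class RetryAfter(Exception):
    """Hypothetical stand-in for aiogram's TelegramRetryAfter."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"flood control, retry after {retry_after} s")
        self.retry_after = retry_after


async def broadcast(
    user_ids: Iterable[int],
    send: Callable[[int], Awaitable[None]],
    delay: float = 0.05,
    max_attempts: int = 2,
) -> int:
    """Send to each user in turn and return how many sends succeeded."""
    delivered = 0
    for user_id in user_ids:
        for _ in range(max_attempts):
            try:
                await send(user_id)
                delivered += 1
                break
            except RetryAfter as exc:
                # flood control: wait as long as the server asks, then try again
                await asyncio.sleep(exc.retry_after)
            except Exception:
                # e.g. the user blocked the bot: skip this recipient
                break
        # pause between recipients (0.05 s is roughly 20 messages per second)
        await asyncio.sleep(delay)
    return delivered
```

Sending sequentially keeps the rate predictable; a concurrent version would need a shared rate limiter instead of a fixed pause.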
@@ -0,0 +1,67 @@
# Aiogram
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import Command, StateFilter
from aiogram import Router, F

# Const
from create_bot import orm

# Keyboards
from keyboards.admin.main_kbs import *

# States
from states.admin_states import (
    AdminStates,
    AdminMailerStates,
    AdminManagementStates,
    AdminSettingsStates,
    AdminBlacklistStates,
)

# Funcs
from handlers.start import cmd_start


# Init
admin_main_router = Router()


@admin_main_router.message(Command("admin"), StateFilter("*"))
async def cmd_login_as_admin(message: types.Message, state: FSMContext):

    if message.chat.type != "private":
        return

    is_admin_exists = await orm.is_admin_exists(user_id=message.from_user.id)

    if is_admin_exists:
        await show_admin_menu(message, state)
    else:
        await message.answer(text="🤨")


@admin_main_router.message(F.text == "🔚 Выйти", StateFilter(AdminStates.main))
async def cmd_admin_exit(message: types.Message, state: FSMContext):

    await message.answer(text="🚪⠀", reply_markup=types.ReplyKeyboardRemove())

    await cmd_start(message, state)


@admin_main_router.message(
    F.text == "↩️ Вернуться в меню",
    StateFilter(
        AdminManagementStates.main,
        AdminMailerStates.post,
        AdminSettingsStates.main,
        AdminBlacklistStates.main,
    ),
)
async def show_admin_menu(message: types.Message, state: FSMContext):

    msg_text = "👮♂️ Вы находитесь в админ-панели"

    await message.answer(text=msg_text, reply_markup=get_main_menu_kb())

    await state.set_state(AdminStates.main)
@@ -0,0 +1,142 @@
# Aiogram imports
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F
from aiogram.exceptions import TelegramBadRequest, TelegramForbiddenError

# Const
from create_bot import bot, storage, StorageKey, orm

# Keyboards
from keyboards.admin.main_kbs import *

# States
from states.admin_states import AdminStates, AdminManagementStates

# Config
from decouple import config

# Another
from contextlib import suppress


# Init
admin_management_router = Router()


@admin_management_router.message(
    F.text == "👮♂️ Управление админами", StateFilter(AdminStates.main)
)
@admin_management_router.message(
    F.text == "↩️ Назад", StateFilter(AdminManagementStates)
)
async def cmd_management(message: types.Message, state: FSMContext):

    admins = await orm.get_all_admins()

    msg_text = "<i>👮♂️ Действующие администраторы</i>\n"

    for admin in admins:
        msg_text += f"✦ [<code>{admin.user_id}</code>]: {admin.username or admin.fullname}\n"

    msg_text += "\n<b>🔽 Выберите действие:</b>"

    await message.answer(text=msg_text, reply_markup=get_add_admins_kb())

    await state.set_state(AdminManagementStates.main)


# *############################
# *# ADD #
# *############################


@admin_management_router.message(
    F.text == "➕ Добавить", StateFilter(AdminManagementStates.main)
)
async def cmd_management_add_id(message: types.Message, state: FSMContext):

    msg_text = "➕ Введите User ID нового админа:"

    await message.answer(text=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminManagementStates.add_admin)


@admin_management_router.message(F.text, StateFilter(AdminManagementStates.add_admin))
async def cmd_management_add_finish(message: types.Message, state: FSMContext):

    # validation: isdecimal() rejects characters like "²" that pass isdigit() but break int()
    if not message.text.isdecimal():
        await message.answer(
            text="⛔️ Только цифры! Повторите попытку:", reply_markup=get_back_kb()
        )
        return

    user_id = int(message.text)

    if not await orm.is_user_exists(user_id):
        await message.answer(
            text="⛔️ Пользователь не существует в БД! Повторите попытку:",
            reply_markup=get_back_kb(),
        )
        return

    user = await orm.get_user(user_id)
    await orm.create_admin(user.user_id, user.username, user.fullname)
    await message.answer("✅ Успешно!")
    await cmd_management(message, state)


# *############################
# *# DELETE #
# *############################


@admin_management_router.message(
    F.text == "➖ Удалить", StateFilter(AdminManagementStates.main)
)
async def cmd_management_delete(message: types.Message, state: FSMContext):

    msg_text = "➖ Введите ID админа для удаления:"

    await message.answer(text=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminManagementStates.del_admin)


@admin_management_router.message(F.text, StateFilter(AdminManagementStates.del_admin))
async def cmd_management_delete_finish(message: types.Message, state: FSMContext):

    # validation
    if not message.text.isdecimal():
        await message.answer(text="⛔️ Только цифры! Повторите попытку:")
        return

    user_id = int(message.text)

    # the base admin from the environment can never be removed
    if user_id == int(config("BASE_ADMIN")):
        await message.answer(
            text="⛔️ Отказано! Повторите попытку:", reply_markup=get_back_kb()
        )
        return

    if not await orm.is_admin_exists(user_id):
        await message.answer(text="⛔️ Админ не найден! Повторите попытку:")
        return

    # notify the removed admin; ignore users who blocked the bot or never started it
    with suppress(TelegramBadRequest, TelegramForbiddenError):
        await bot.send_message(
            chat_id=user_id,
            text="☹️ Вы больше не являетесь админом!",
            reply_markup=types.ReplyKeyboardRemove(),
        )

    # reset the removed admin's FSM state so they leave the admin panel
    await storage.set_state(
        key=StorageKey(bot_id=bot.id, chat_id=user_id, user_id=user_id), state=None
    )
    await orm.delete_admin(user_id)
    await message.answer("✅ Успешно!")
    await cmd_management(message, state)
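`str.isdigit()` also accepts characters such as the superscript `²` that `int()` cannot parse, so a digit check alone is a weak guard before `int(message.text)`. A stricter stand-alone check, written as a hypothetical `parse_user_id` helper (not part of the repository), accepts only ASCII digits and positive values:

```python
from typing import Optional


def parse_user_id(text: str) -> Optional[int]:
    """Return a positive Telegram user ID parsed from `text`, or None if it is not one."""
    candidate = text.strip()
    # isascii() rules out superscripts and other Unicode digits
    if not (candidate.isascii() and candidate.isdigit()):
        return None
    value = int(candidate)
    return value if value > 0 else None
```

A handler would then answer with the "⛔️ Только цифры!" message whenever the helper returns `None`.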
@@ -0,0 +1,85 @@
# Aiogram imports
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F

# Const
from create_bot import orm

# Keyboards
from keyboards.admin.main_kbs import *

# States
from states.admin_states import AdminStates, AdminSettingsStates


# Init
admin_settings_router = Router()


@admin_settings_router.message(F.text == "↩️ Назад", StateFilter(AdminSettingsStates))
@admin_settings_router.message(F.text == "⚙️ Настройки", StateFilter(AdminStates.main))
async def cmd_settings(message: types.Message, state: FSMContext):

    msg_text = "⚙️ Выберите, что хотите изменить:"

    await message.answer(text=msg_text, reply_markup=get_settings_kb())

    await state.set_state(AdminSettingsStates.main)


# *############################
# *# EDIT PHOTO #
# *############################


@admin_settings_router.message(
    F.text.in_({"🖼 ..."}), StateFilter(AdminSettingsStates.main)
)
async def cmd_edit_photo(message: types.Message, state: FSMContext):

    # button label -> settings key in the DB
    photo_setting_keys = {"🖼 ...": "..."}

    setting_key = photo_setting_keys.get(message.text)
    await state.update_data(setting_key=setting_key)
    photo = await orm.get_setting_value(setting_key)

    msg_text = f"""<b>Текущее значение:</b>
<blockquote>{photo}</blockquote>

⌨️ Отправьте фото для изменения:"""

    if photo:
        await message.answer_photo(
            photo=photo, caption=msg_text, reply_markup=get_back_kb()
        )
    else:
        await message.answer(text=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminSettingsStates.edit_photo)


@admin_settings_router.message(F.photo, StateFilter(AdminSettingsStates.edit_photo))
async def cmd_edit_photo_setup(message: types.Message, state: FSMContext):

    # by convention the last PhotoSize is the largest resolution
    photo = message.photo[-1].file_id

    state_data = await state.get_data()
    setting_key = state_data.get("setting_key")

    await orm.update_setting_value(setting_key, photo)

    msg_text = f"""<b>Текущее значение:</b>
<blockquote>{photo}</blockquote>

⌨️ Отправьте фото для изменения:"""

    # a just-received photo always has a file_id, so echo it back as a photo
    await message.answer_photo(photo=photo, caption=msg_text, reply_markup=get_back_kb())

    await state.set_state(AdminSettingsStates.edit_photo)
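`message.photo` lists the same picture at several resolutions, and the settings handler stores `message.photo[-1].file_id`, relying on the last size being the largest. If you prefer not to depend on that ordering, a sketch that picks by pixel area; `PhotoSizeStub` is a hypothetical stand-in carrying only the fields used here (`file_id`, `width` and `height` do exist on aiogram's `PhotoSize`):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PhotoSizeStub:
    """Hypothetical stand-in for aiogram's PhotoSize with only the fields used here."""

    file_id: str
    width: int
    height: int


def largest_file_id(sizes: Sequence[PhotoSizeStub]) -> str:
    """Return the file_id of the size with the most pixels."""
    if not sizes:
        raise ValueError("message has no photo sizes")
    return max(sizes, key=lambda s: s.width * s.height).file_id
```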
@@ -0,0 +1,28 @@
# Aiogram
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import StateFilter
from aiogram import Router, F

# Const
from create_bot import orm

# States
from states.admin_states import AdminStates

# Init
admin_statistic_router = Router()


@admin_statistic_router.message(
    F.text == "📊 Статистика", StateFilter(AdminStates.main)
)
async def cmd_statistic(message: types.Message, state: FSMContext):

    users_count = await orm.get_users_count()

    msg_text = f"""<i>📊 Статистика</i>

🔹 Кол-во пользователей в боте: {users_count:,} чел."""

    await message.answer(text=msg_text)
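`{users_count:,}` groups thousands with commas (`12,345`), while Russian typography usually separates digit groups with a space. If that is wanted, a hypothetical `format_count` helper can swap in a non-breaking space so the number never wraps across lines:

```python
def format_count(value: int) -> str:
    """Group digits in threes with non-breaking spaces, e.g. 12345 -> '12 345'."""
    return f"{value:,}".replace(",", "\u00a0")
```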
@@ -0,0 +1 @@
from .main import client_main_router
@@ -0,0 +1,154 @@
# Aiogram
import aiogram.types as types
from aiogram import Router, F
from aiogram.filters import StateFilter
from aiogram.fsm.context import FSMContext
from aiogram.exceptions import TelegramBadRequest
from aiogram.types import BufferedInputFile

import os
from typing import Any
from urllib.parse import urlparse

import aiohttp

# Keyboards
from keyboards.reply_keyboards import get_client_main_kb

# Services
from services.rag_api import ask_rag_api, RagApiError

# States
from states.client_states import MainStates

# Utils
from utils.text_tools import format_telegram_html


client_main_router = Router()

# keyboard label (see POPULAR_QUESTIONS) -> question text actually sent to the RAG API
POPULAR_QUESTION_MAP = {
    "🕒 До скольки вы работаете?": "До скольки вы работаете?",
    "🚚 Как работает доставка?": "Есть ли доставка и как заказать?",
    "🌯 Что посоветуете из шаурмы?": "Что посоветуете из шаурмы?",
    "🍕 Подобрать пиццу до 400 ₽": "Подбери вкусную пиццу до 400 рублей",
    "🧀 Что у вас есть с сыром?": "Что у вас есть с сыром?",
    "🔥 Что у вас есть острое?": "Что у вас есть острое?",
}

MAX_HISTORY_MESSAGES = 8  # the last 4 user/assistant exchanges
MAX_MENU_CARDS = 3
PHOTO_DOWNLOAD_TIMEOUT_SECONDS = 20


def trim_history(history: list[dict[str, str]]) -> list[dict[str, str]]:
    return history[-MAX_HISTORY_MESSAGES:]


def shorten_text(text: str, limit: int = 240) -> str:
    # collapse whitespace runs, then cut to `limit` characters including the ellipsis
    cleaned = " ".join(str(text).split())
    if len(cleaned) <= limit:
        return cleaned
    return cleaned[: limit - 1].rstrip() + "…"


def build_menu_item_caption(item: dict[str, Any]) -> str:
    name = format_telegram_html(item.get("name", "Позиция из меню"))
    raw_price = item.get("price_label") or "Цена уточняется"
    # hide missing prices and "бесплатно" labels behind a neutral placeholder
    if item.get("price") is None or "бесплат" in str(raw_price).lower():
        raw_price = "Цена уточняется"
    price = format_telegram_html(raw_price)
    description = format_telegram_html(shorten_text(item.get("description", "")))
    size = format_telegram_html(item.get("size") or "")
    category = format_telegram_html(item.get("category") or "")

    caption_parts = [f"<b>{name}</b>", f"💸 {price}"]
    if category or size:
        meta = " • ".join(part for part in [category, size] if part)
        caption_parts.append(meta)
    if description:
        caption_parts.append(description)
    return "\n".join(caption_parts)


async def send_menu_cards(message: types.Message, items: list[dict[str, Any]]) -> None:
    for item in items[:MAX_MENU_CARDS]:
        caption = build_menu_item_caption(item)
        photo_url = item.get("photo_url")

        if photo_url:
            try:
                # first try: download the image and upload the bytes
                photo = await download_menu_photo(str(photo_url), str(item.get("item_id") or "menu"))
                await message.answer_photo(photo=photo, caption=caption)
                continue
            except TelegramBadRequest:
                pass
            except Exception:
                # download failed: let Telegram fetch the URL itself
                try:
                    await message.answer_photo(photo=photo_url, caption=caption)
                    continue
                except TelegramBadRequest:
                    pass

        # no photo, or every photo attempt was rejected: send the caption as text
        await message.answer(caption)


async def download_menu_photo(photo_url: str, item_id: str) -> BufferedInputFile:
    timeout = aiohttp.ClientTimeout(total=PHOTO_DOWNLOAD_TIMEOUT_SECONDS)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(photo_url) as response:
            response.raise_for_status()
            content = await response.read()

    # take the extension from the URL path only: the host name always contains a dot,
    # so splitting the whole URL on "." could yield e.g. "ru/img/123"
    extension = os.path.splitext(urlparse(photo_url).path)[1].lstrip(".").lower() or "jpg"
    filename = f"menu_{item_id}.{extension}"
    return BufferedInputFile(content, filename=filename)


@client_main_router.message(F.text == "🧹 Очистить диалог", StateFilter(MainStates.main))
async def clear_dialog(message: types.Message, state: FSMContext):
    await state.update_data(rag_history=[])
    await message.answer(
        "🧼 Диалог очищен. Можете задать новый вопрос о меню, доставке или заведении.",
        reply_markup=get_client_main_kb(),
    )


@client_main_router.message(F.text, StateFilter(MainStates.main))
async def handle_client_message(message: types.Message, state: FSMContext):
    if message.chat.type != "private":
        return

    if not message.text:
        return

    # popular-question buttons are expanded to their full question text
    user_text = POPULAR_QUESTION_MAP.get(message.text, message.text)
    state_data = await state.get_data()
    history = state_data.get("rag_history", [])

    waiting_message = await message.answer("🤖 Думаю над ответом...")

    try:
        response = await ask_rag_api(message=user_text, history=history)
    except RagApiError:
        await waiting_message.edit_text(
            "⚠️ Не получилось обратиться к сервису ответов. Попробуйте ещё раз через минуту."
        )
        return
    except Exception:
        await waiting_message.edit_text(
            "⚠️ Что-то пошло не так. Попробуйте отправить вопрос ещё раз."
        )
        return

    raw_answer = response.get("answer") or "⚠️ Не удалось получить ответ."
    answer = format_telegram_html(raw_answer)
    # store the model's own text in the history, not its Telegram-HTML rendering
    updated_history = trim_history(
        [
            *history,
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": raw_answer},
        ]
    )

    await state.update_data(rag_history=updated_history)
    await waiting_message.edit_text(answer)

    # menu items picked by the RAG API's function calling, shown as cards
    tool_results = response.get("tool_results") or []
    if tool_results:
        await send_menu_cards(message, tool_results)
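`download_menu_photo` needs a file extension for the uploaded file name. A tempting shortcut is `photo_url.rsplit(".", 1)[-1]`, but it breaks whenever the path has no dot, because the host name always contains one. A sketch that looks only at the URL path; `naive_extension` and `photo_extension` are hypothetical helpers, and the URLs in the test are made-up examples:

```python
import os
from urllib.parse import urlparse


def naive_extension(photo_url: str) -> str:
    # naive approach: split the whole URL on its last dot
    return photo_url.rsplit(".", 1)[-1].split("?", 1)[0] if "." in photo_url else "jpg"


def photo_extension(photo_url: str, default: str = "jpg") -> str:
    """Extension from the URL path only; the host name and query string are ignored."""
    path = urlparse(photo_url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext or default
```

`os.path.splitext` only inspects the last path component, so a dot in a directory name (`/img.v2/photo`) does not produce a bogus extension either.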
@@ -0,0 +1,58 @@
# Aiogram
import aiogram.types as types
from aiogram.fsm.context import FSMContext
from aiogram.filters import CommandStart, StateFilter
from aiogram import Router, F

# Utils
from utils.text_tools import to_html

# Const
from create_bot import orm

# Keyboards
from keyboards.reply_keyboards import get_client_main_kb

# States
from states.client_states import MainStates

# Another
from datetime import datetime, timezone


# Init
start_router = Router()


@start_router.message(CommandStart(), StateFilter("*"))
async def cmd_start(message: types.Message, state: FSMContext):

    if message.chat.type != "private":
        return

    user_id = message.from_user.id
    username = (
        "@" + message.from_user.username
        if message.from_user.username is not None
        else None
    )
    fullname = to_html(message.from_user.full_name)

    await orm.create_user(
        user_id=user_id,
        username=username,
        fullname=fullname,
        register_date=datetime.now(timezone.utc),
    )

    msg_text = (
        f"👋 Привет, {fullname}!\n\n"
        "Я бот шаурмечной <b>Горыч</b>.\n"
        "Подскажу по меню, доставке, режиму работы и помогу подобрать блюдо.\n\n"
        "✨ Выберите популярный вопрос ниже или просто напишите свой."
    )

    await message.answer(text=msg_text, reply_markup=get_client_main_kb())

    # every /start begins a fresh RAG conversation
    await state.update_data(rag_history=[])
    await state.set_state(MainStates.main)
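`cmd_start` embeds the user's full name in an HTML-formatted greeting, so the name goes through `to_html` first (its body lives in `utils.text_tools` and is not shown here). Assuming it performs ordinary HTML escaping, the standard-library equivalent is `html.escape`; the hypothetical helpers below sketch that step and the `@username` handling:

```python
import html
from typing import Optional


def build_display_name(full_name: str) -> str:
    """Escape <, > and & (and quotes) so a name cannot break Telegram's HTML parse mode."""
    return html.escape(full_name)


def build_username(username: Optional[str]) -> Optional[str]:
    """'@handle' when the user has a public username, otherwise None."""
    return "@" + username if username is not None else None
```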
@@ -0,0 +1,55 @@
# Aiogram imports
from aiogram.utils.keyboard import ReplyKeyboardBuilder, InlineKeyboardBuilder
from aiogram.types import InlineKeyboardButton, KeyboardButton


def get_back_to_main_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="↩️ Вернуться в меню"))

    return builder.as_markup(resize_keyboard=True)


def get_back_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="↩️ Назад"))

    return builder.as_markup(resize_keyboard=True)


def get_skip_kb():

    builder = ReplyKeyboardBuilder()

    builder.add(KeyboardButton(text="↪️ Пропустить"), KeyboardButton(text="↩️ Назад"))
    builder.adjust(1)

    return builder.as_markup(resize_keyboard=True)


def get_mailer_finish_kb():

    builder = ReplyKeyboardBuilder()

    builder.add(
        KeyboardButton(text="🟢 Начать рассылку"), KeyboardButton(text="↩️ Назад")
    )
    builder.adjust(1)

    return builder.as_markup(resize_keyboard=True, is_persistent=True)


def get_mailer_btn_ikb(buttons_preset: list[list[tuple[str, str]]] | None):
    """Inline keyboard for a post: one keyboard row per preset row of (text, url) pairs."""

    builder = InlineKeyboardBuilder()

    if buttons_preset:
        for row in buttons_preset:
            # keep the buttons of one preset row side by side
            builder.row(
                *(InlineKeyboardButton(text=btn_name, url=btn_url) for btn_name, btn_url in row)
            )

    return builder.as_markup()
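Judging by the `for btn_name, btn_url in row` unpacking in `get_mailer_btn_ikb`, `parse_links_to_inline_markup` (in `utils.text_tools`, not shown here) returns rows of `(text, url)` pairs, matching the syntax in the mailer prompt: buttons written side by side share a row, and a new line starts a new row. A hypothetical parser for that syntax, as a sketch of the expected shape rather than the project's actual implementation:

```python
import re

# one "[text + url]" token: the text may not contain brackets, the URL must be http(s)
BUTTON_RE = re.compile(r"\[\s*([^\[\]]+?)\s*\+\s*(https?://[^\s\]]+)\s*\]")


def parse_button_rows(text: str) -> list[list[tuple[str, str]]]:
    """Hypothetical parser: one keyboard row per input line, one button per [text + url]."""
    rows = []
    for line in text.splitlines():
        # findall with two groups yields (label, url) tuples in order of appearance
        buttons = [(label, url) for label, url in BUTTON_RE.findall(line)]
        if buttons:
            rows.append(buttons)
    return rows
```

Lines without a valid token are skipped, so stray text between button lines does not produce empty rows.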
@@ -0,0 +1,95 @@
# Aiogram imports
from aiogram.utils.keyboard import ReplyKeyboardBuilder, InlineKeyboardBuilder
from aiogram.types import (
    ReplyKeyboardMarkup,
    InlineKeyboardMarkup,
    InlineKeyboardButton,
    KeyboardButton,
)


def get_main_menu_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="📊 Статистика"), KeyboardButton(text="✉️ Рассылка"))

    builder.row(
        KeyboardButton(text="🚫 Черный список"), KeyboardButton(text="⚙️ Настройки")
    )

    builder.row(
        KeyboardButton(text="📑 Список пользователей"),
        KeyboardButton(text="👮♂️ Управление админами"),
    )

    builder.row(KeyboardButton(text="🔚 Выйти"))

    return builder.as_markup(resize_keyboard=True, is_persistent=True)


def get_add_admins_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="➕ Добавить"), KeyboardButton(text="➖ Удалить"))

    builder.row(KeyboardButton(text="↩️ Вернуться в меню"))

    return builder.as_markup(resize_keyboard=True, is_persistent=True)


def get_back_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="↩️ Назад"))

    return builder.as_markup(resize_keyboard=True)


def get_settings_kb() -> ReplyKeyboardMarkup:

    builder = ReplyKeyboardBuilder()

    builder.add(KeyboardButton(text="↩️ Вернуться в меню"))
    builder.adjust(2)

    return builder.as_markup(resize_keyboard=True, is_persistent=True)


def get_blacklist_kb():

    builder = ReplyKeyboardBuilder()

    builder.row(KeyboardButton(text="👁 Открыть список"))

    builder.row(KeyboardButton(text="➕ Добавить"), KeyboardButton(text="➖ Удалить"))

    builder.row(KeyboardButton(text="↩️ Вернуться в меню"))

    return builder.as_markup(resize_keyboard=True, is_persistent=True)


def get_bookList_ikb(
    prefix: str, offset: int, max_offset: int, items: list[tuple], element_col: int = 10
) -> InlineKeyboardMarkup:

    builder = InlineKeyboardBuilder()

    # one button per item of the current page (`element_col` items per page)
    for item_id, item_name in items[offset * element_col : (offset + 1) * element_col]:
        builder.row(
            InlineKeyboardButton(
                text=f"{item_name}", callback_data=f"{prefix}_pick_{item_id}"
            )
        )

    # page navigation; max_offset is accepted but not used here
    builder.row(
        InlineKeyboardButton(text="⬅️", callback_data=f"{prefix}_prev"),
        InlineKeyboardButton(text="➡️", callback_data=f"{prefix}_next"),
    )

    return builder.as_markup()
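`get_bookList_ikb` slices `items[offset * element_col : (offset + 1) * element_col]` and receives `max_offset`, but the ⬅️/➡️ handlers that change `offset` are not part of this section. A sketch of the arithmetic those handlers would need, with hypothetical helper names; wrapping around at both ends is one design choice, clamping at the first and last page works equally well:

```python
import math


def max_page_offset(total_items: int, per_page: int = 10) -> int:
    """Index of the last page (0-based); an empty list still has page 0."""
    return max(math.ceil(total_items / per_page) - 1, 0)


def step_offset(offset: int, step: int, max_offset: int) -> int:
    """Move one page back (step=-1) or forward (step=1), wrapping around at the ends."""
    return (offset + step) % (max_offset + 1)


def page_items(items: list, offset: int, per_page: int = 10) -> list:
    """The same slice get_bookList_ikb uses for the current page."""
    return items[offset * per_page : (offset + 1) * per_page]
```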
@@ -0,0 +1,3 @@
# Aiogram imports
from aiogram.utils.keyboard import InlineKeyboardBuilder
from aiogram.types import InlineKeyboardMarkup, InlineKeyboardButton
@@ -0,0 +1,28 @@
# Aiogram imports
from aiogram.utils.keyboard import ReplyKeyboardBuilder
from aiogram.types import ReplyKeyboardMarkup, KeyboardButton


# Button labels; the client handler looks each one up in POPULAR_QUESTION_MAP,
# and a label missing from the map is sent to the RAG API verbatim
POPULAR_QUESTIONS = [
    "🕒 До скольки вы работаете?",
    "🚚 Как работает доставка?",
    "🌯 Что посоветуете из шаурмы?",
    "🍕 Подобрать пиццу до 400 ₽",
    "🧀 Что у вас есть с сыром?",
    "🔥 Что у вас есть острое?",
]


def get_client_main_kb() -> ReplyKeyboardMarkup:
    builder = ReplyKeyboardBuilder()

    for question in POPULAR_QUESTIONS:
        builder.add(KeyboardButton(text=question))

    builder.add(KeyboardButton(text="🧹 Очистить диалог"))

    # six questions in three rows of two, then the clear button on its own row
    builder.adjust(2, 2, 2, 1)
    return builder.as_markup(
        resize_keyboard=True,
        input_field_placeholder="Спросите про меню, доставку или режим работы",
    )
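`builder.adjust(2, 2, 2, 1)` turns the seven buttons into three rows of two and a final row of one. aiogram documents that when the given sizes run out, the last size is reused for the remaining buttons, and that `repeat=True` cycles through all sizes instead. The pure-Python model below reproduces that documented behaviour to preview a layout; it is a sketch, not aiogram's code, and its `cycle_sizes` flag plays the role of aiogram's `repeat`:

```python
from itertools import chain, cycle, repeat
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def adjust_rows(buttons: Sequence[T], *sizes: int, cycle_sizes: bool = False) -> List[List[T]]:
    """Split buttons into rows of the given sizes, reusing the last size (or cycling all)."""
    if not sizes or min(sizes) < 1:
        raise ValueError("row sizes must be positive")
    # either s1, s2, ..., sN, sN, sN, ... or s1, ..., sN, s1, ..., sN, ...
    size_iter = cycle(sizes) if cycle_sizes else chain(sizes, repeat(sizes[-1]))
    rows: List[List[T]] = []
    start = 0
    while start < len(buttons):
        size = next(size_iter)
        rows.append(list(buttons[start : start + size]))
        start += size
    return rows
```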
@@ -0,0 +1,60 @@
import asyncio
from typing import Any, Awaitable, Callable, Dict, Union

from aiogram import BaseMiddleware
from aiogram.types import Message


class AlbumMiddleware(BaseMiddleware):
    """Collect all messages of one media group and pass them on as data["album"]."""

    def __init__(self, latency: Union[int, float] = 0.19):
        super().__init__()
        # how long to wait for further album parts before handling the group
        self.latency = latency
        # media_group_id -> {"messages": [Message, ...]}
        self.album_data = {}

    def collect_album_messages(self, event: Message):
        """
        Collect messages of the same media group.
        """
        # create an entry for a new media group
        if event.media_group_id not in self.album_data:
            self.album_data[event.media_group_id] = {"messages": []}

        # append the new message to its media group
        self.album_data[event.media_group_id]["messages"].append(event)

        # return the number of messages collected so far for this group
        return len(self.album_data[event.media_group_id]["messages"])

    async def __call__(
        self,
        handler: Callable[[Message, Dict[str, Any]], Awaitable[Any]],
        event: Message,
        data: Dict[str, Any],
    ) -> Any:
        """
        Main middleware logic.
        """
        # messages outside a media group go straight to the handler
        if not event.media_group_id:
            return await handler(event, data)

        # collect messages of the same media group
        total_before = self.collect_album_messages(event)

        # wait for the remaining album parts to arrive
        await asyncio.sleep(self.latency)

        # if more parts arrived meanwhile, the newest one will handle the album
        total_after = len(self.album_data[event.media_group_id]["messages"])
        if total_before != total_after:
            return

        # sort the album messages by message_id and expose them to the handler
        album_messages = self.album_data[event.media_group_id]["messages"]
        album_messages.sort(key=lambda x: x.message_id)
        data["album"] = album_messages

        # stop tracking the media group to free memory
        del self.album_data[event.media_group_id]

        # call the original event handler
        return await handler(event, data)
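`AlbumMiddleware` is a debounce: every part of a media group is stored, each part waits `latency` seconds, and only the part that sees no newer arrivals passes the whole sorted album on. The same idea without aiogram, as a sketch with a hypothetical `FakeMessage` and a callback-style handler so it can run on its own:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for aiogram's Message with only the fields used here."""

    message_id: int
    media_group_id: Optional[str] = None


Handler = Callable[[List[FakeMessage]], Awaitable[None]]


class AlbumCollector:
    """Debounce album parts: the last part to arrive hands the whole sorted group on."""

    def __init__(self, latency: float = 0.2) -> None:
        self.latency = latency
        self.groups: Dict[str, List[FakeMessage]] = {}

    async def feed(self, msg: FakeMessage, handler: Handler) -> None:
        if msg.media_group_id is None:
            await handler([msg])  # single message: no waiting
            return
        bucket = self.groups.setdefault(msg.media_group_id, [])
        bucket.append(msg)
        seen = len(bucket)
        await asyncio.sleep(self.latency)
        if len(bucket) != seen:
            return  # a newer part arrived; it will flush the group
        del self.groups[msg.media_group_id]
        await handler(sorted(bucket, key=lambda m: m.message_id))
```

The latency is a trade-off: too short and a slow album splits into several batches, too long and every album reply is delayed by that amount.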
@@ -0,0 +1,93 @@
import asyncio
from collections import deque
from datetime import datetime, timedelta, timezone

from aiogram import BaseMiddleware
from aiogram import types

# Const
from create_bot import orm


class AntiFloodMiddleware(BaseMiddleware):

    def __init__(
        self, max_messages: int = 5, interval: float = 2, block_time: float = 60.0
    ):
        """
        Initialize AntiFloodMiddleware.

        :param max_messages: Maximum number of messages allowed within `interval`.
        :param interval: Sliding time window (in seconds) in which messages are counted.
        :param block_time: How long (in seconds) a flooding user stays blocked.
        """
        super().__init__()
        self.max_messages = max_messages
        self.interval = interval
        self.block_time = block_time
        self.user_messages = {}  # user_id: deque of message timestamps
        self.blocked_users = {}  # user_id: unblock_time
        self.lock = asyncio.Lock()  # serializes access to both dicts across concurrent updates

    async def __call__(self, handler, event: types.Message, data):
        user = getattr(event, "from_user", None)
        if user is None:
            # e.g. channel posts: there is no sender to rate-limit
            return await handler(event, data)

        user_id = user.id
        current_time = datetime.now(timezone.utc)
        just_blocked = False

        async with self.lock:
            # check whether the user is currently blocked
            if user_id in self.blocked_users:
                if current_time < self.blocked_users[user_id]:
                    # still blocked: drop the update silently
                    return
                # the block has expired
                del self.blocked_users[user_id]

            # callback queries are never counted, only messages
            if not isinstance(event, types.CallbackQuery):
                user_queue = self.user_messages.setdefault(user_id, deque())
                user_queue.append(current_time)

                # drop timestamps older than the interval
                while (
                    user_queue
                    and (current_time - user_queue[0]).total_seconds() > self.interval
                ):
                    user_queue.popleft()

                # more than max_messages inside the window: block the user
                if len(user_queue) > self.max_messages:
                    self.blocked_users[user_id] = current_time + timedelta(
                        seconds=self.block_time
                    )
                    del self.user_messages[user_id]
                    just_blocked = True

        # network calls happen outside the lock so one slow request cannot stall other users
        if just_blocked:
            await event.answer(text="🧊 Вы заморожены на 1 минуту за флуд!")
            return

        return await handler(event, data)


class BlacklistMiddleware(BaseMiddleware):
    def __init__(self):
        super().__init__()

    async def __call__(self, handler, event: types.TelegramObject, data: dict):
        user_id = self.get_user_id(event, data)
        if user_id and await orm.is_blacklisted(user_id):
            # ignore every update from blacklisted users
            return

        return await handler(event, data)

    def get_user_id(self, event: types.TelegramObject, data: dict):
        # aiogram's UserContextMiddleware stores the sender as data["event_from_user"];
        # Message and CallbackQuery also carry it as event.from_user, a raw Update does not
        user = data.get("event_from_user") or getattr(event, "from_user", None)
        return user.id if user is not None else None
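The anti-flood check is a sliding-window counter: timestamps older than `interval` are dropped, and a user who still has more than `max_messages` timestamps in the window is blocked for `block_time` seconds. A synchronous sketch of that bookkeeping with an injected clock (plain seconds instead of `datetime`), so it can be tested without Telegram; `SlidingWindowLimiter` is a hypothetical name:

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical sync model of AntiFloodMiddleware's bookkeeping (times in seconds)."""

    def __init__(self, max_events: int = 5, interval: float = 2.0, block_time: float = 60.0) -> None:
        self.max_events = max_events
        self.interval = interval
        self.block_time = block_time
        self.events: Dict[int, Deque[float]] = {}
        self.blocked_until: Dict[int, float] = {}

    def allow(self, user_id: int, now: float) -> bool:
        """Record one message at time `now`; return False if it must be dropped."""
        until = self.blocked_until.get(user_id)
        if until is not None:
            if now < until:
                return False  # still frozen
            del self.blocked_until[user_id]  # block expired

        window = self.events.setdefault(user_id, deque())
        window.append(now)
        # forget timestamps that fell out of the sliding window
        while window and now - window[0] > self.interval:
            window.popleft()

        if len(window) > self.max_events:
            self.blocked_until[user_id] = now + self.block_time
            del self.events[user_id]
            return False
        return True
```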
@@ -0,0 +1,39 @@
aiofiles==24.1.0
aiogram==3.17.0
aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiohttp-socks==0.10.1
aiosignal==1.3.2
annotated-types==0.7.0
asyncpg==0.30.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
dotenv-cli==3.4.1
et_xmlfile==2.0.0
frozenlist==1.5.0
greenlet==3.1.1
idna==3.10
magic-filter==1.0.12
markdown-it-py==3.0.0
mdurl==0.1.2
multidict==6.1.0
openpyxl==3.1.5
propcache==0.2.1
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
python-decouple==3.8
redis==5.2.1
requests==2.32.3
rich==13.9.4
simplejson==3.20.1
SQLAlchemy==2.0.38
typing_extensions==4.12.2
urllib3==2.3.0
uvloop==0.21.0
yarl==1.18.3
fastapi
uvicorn
pytz
@@ -0,0 +1 @@
@@ -0,0 +1,31 @@
from __future__ import annotations

from typing import Any

import aiohttp
from decouple import config


RAG_API_URL = config("RAG_API_URL", default="http://127.0.0.1:8001")
RAG_API_TIMEOUT_SECONDS = float(config("RAG_API_TIMEOUT_SECONDS", default="60"))


class RagApiError(Exception):
    pass


async def ask_rag_api(message: str, history: list[dict[str, str]]) -> dict[str, Any]:
    timeout = aiohttp.ClientTimeout(total=RAG_API_TIMEOUT_SECONDS)
    payload = {
        "message": message,
        "history": history,
    }

    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(f"{RAG_API_URL}/chat", json=payload) as response:
            if response.status != 200:
                text = await response.text()
                raise RagApiError(f"RAG API returned {response.status}: {text}")

            return await response.json()
@@ -0,0 +1,36 @@
# Aiogram imports
from aiogram.fsm.state import State, StatesGroup


class AdminStates(StatesGroup):

    main = State()


class AdminMailerStates(StatesGroup):

    post = State()
    ikb = State()
    preview = State()


class AdminManagementStates(StatesGroup):

    main = State()

    add_admin = State()
    del_admin = State()


class AdminSettingsStates(StatesGroup):

    main = State()
    edit_photo = State()


class AdminBlacklistStates(StatesGroup):

    main = State()

    add_blacklist = State()
    del_blacklist = State()
@@ -0,0 +1,7 @@
# Aiogram imports
from aiogram.fsm.state import State, StatesGroup


class MainStates(StatesGroup):

    main = State()
Binary file not shown.
@@ -0,0 +1,17 @@
import simplejson as json

# init
CFG_PATH = "cfg/config.json"


# load cfg and return it
def load_config(cfg_path=CFG_PATH):

    with open(cfg_path, "r", encoding="utf-8") as config_fp:
        return json.load(config_fp)


def rewrite_config(obj, cfg_path=CFG_PATH):

    with open(cfg_path, "w", encoding="utf-8") as config_fp:
        json.dump(obj, config_fp, indent=4)
@@ -0,0 +1,44 @@
from __future__ import annotations

from aiohttp import BasicAuth
from aiogram.client.session.aiohttp import AiohttpSession
from decouple import config


SUPPORTED_PROXY_PROTOCOLS = {"http", "socks5"}


def build_bot_session() -> AiohttpSession | None:
    proxy_raw = config("BOT_PROXY", default="").strip()
    if not proxy_raw:
        return None

    proxy = parse_proxy_value(proxy_raw)
    return AiohttpSession(proxy=proxy)


def parse_proxy_value(proxy_raw: str) -> str | tuple[str, BasicAuth]:
    if "://" in proxy_raw:
        return proxy_raw

    parts = proxy_raw.split(":")
    if len(parts) not in {3, 5}:
        raise ValueError(
            "BOT_PROXY must be in format protocol:ip:port or protocol:ip:port:user:pass"
        )

    protocol, host, port = parts[:3]
    protocol = protocol.lower()

    if protocol not in SUPPORTED_PROXY_PROTOCOLS:
        raise ValueError(
            f"Unsupported proxy protocol '{protocol}'. Supported: http, socks5"
        )

    proxy_url = f"{protocol}://{host}:{port}"

    if len(parts) == 3:
        return proxy_url

    username, password = parts[3], parts[4]
    return proxy_url, BasicAuth(login=username, password=password)
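`parse_proxy_value` accepts either a full proxy URL or the colon-separated `protocol:ip:port[:user:pass]` form used in `BOT_PROXY`. The dependency-free sketch below mirrors that logic so the accepted inputs can be checked without aiohttp; `Credentials` is a hypothetical stand-in for `aiohttp.BasicAuth`.

```python
from __future__ import annotations

from typing import NamedTuple

SUPPORTED_PROXY_PROTOCOLS = {"http", "socks5"}


class Credentials(NamedTuple):
    # Hypothetical stand-in for aiohttp.BasicAuth(login, password)
    login: str
    password: str


def parse_proxy_value(proxy_raw: str) -> str | tuple[str, Credentials]:
    # Values that already look like URLs are passed through untouched
    if "://" in proxy_raw:
        return proxy_raw

    parts = proxy_raw.split(":")
    if len(parts) not in {3, 5}:
        raise ValueError("expected protocol:ip:port or protocol:ip:port:user:pass")

    protocol, host, port = parts[:3]
    protocol = protocol.lower()
    if protocol not in SUPPORTED_PROXY_PROTOCOLS:
        raise ValueError(f"unsupported proxy protocol {protocol!r}")

    proxy_url = f"{protocol}://{host}:{port}"
    if len(parts) == 3:
        return proxy_url
    return proxy_url, Credentials(parts[3], parts[4])
```

Because the value is split on every `:`, a password that itself contains a colon yields six parts and is rejected; such credentials would have to go through the URL form instead.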
@@ -0,0 +1,119 @@
import html
import re
from html.parser import HTMLParser


def to_html(obj):
    # Escape &, < and > so arbitrary text is safe in Telegram's HTML parse mode
    return html.escape(str(obj), quote=False)


class TelegramHTMLSanitizer(HTMLParser):
    ALLOWED_TAGS = {"b", "i", "u", "s", "code", "pre", "a"}
    TAG_ALIASES = {"strong": "b", "em": "i"}
    ALLOWED_HREF_PREFIXES = ("http://", "https://", "tg://", "mailto:")

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.parts: list[str] = []
        self.tag_stack: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        normalized_tag = self.TAG_ALIASES.get(tag, tag)
        if normalized_tag not in self.ALLOWED_TAGS:
            return

        if normalized_tag == "a":
            href = next((value for key, value in attrs if key == "href" and value), None)
            if not href or not href.startswith(self.ALLOWED_HREF_PREFIXES):
                return
            safe_href = html.escape(href, quote=True)
            self.parts.append(f'<a href="{safe_href}">')
            self.tag_stack.append(normalized_tag)
            return

        self.parts.append(f"<{normalized_tag}>")
        self.tag_stack.append(normalized_tag)

    def handle_endtag(self, tag: str) -> None:
        normalized_tag = self.TAG_ALIASES.get(tag, tag)
        if normalized_tag not in self.ALLOWED_TAGS:
            return

        for index in range(len(self.tag_stack) - 1, -1, -1):
            if self.tag_stack[index] == normalized_tag:
                del self.tag_stack[index]
                self.parts.append(f"</{normalized_tag}>")
                break

    def handle_data(self, data: str) -> None:
        self.parts.append(html.escape(data, quote=False))

    def handle_entityref(self, name: str) -> None:
        self.parts.append(f"&{name};")

    def handle_charref(self, name: str) -> None:
        self.parts.append(f"&#{name};")

    def get_html(self) -> str:
        while self.tag_stack:
            self.parts.append(f"</{self.tag_stack.pop()}>")
        return "".join(self.parts)


def markdown_to_telegram_html(text: str) -> str:
    prepared = text.replace("\r\n", "\n").strip()
    prepared = re.sub(
        r"\[([^\]]+)\]\((https?://[^\s)]+)\)",
        r'<a href="\2">\1</a>',
        prepared,
    )
    prepared = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", prepared, flags=re.DOTALL)
    prepared = re.sub(r"__(.+?)__", r"<b>\1</b>", prepared, flags=re.DOTALL)
    prepared = re.sub(r"(?m)^[ \t]*[*-]\s+", "• ", prepared)
    return prepared


def format_telegram_html(text: str) -> str:
    prepared = markdown_to_telegram_html(str(text))
    sanitizer = TelegramHTMLSanitizer()
    sanitizer.feed(prepared)
    sanitizer.close()
    return sanitizer.get_html()


def parse_links_to_inline_markup(message: str) -> list:
    """
    Parse a message containing formatted links and return a list of button rows.

    Input format:
    - [Button text + Link] for a single button.
    - [Button1 + Link1][Button2 + Link2] for several buttons in one row.
    - Each line becomes a separate row of buttons.

    Example:
    [Button1 + https://example.com]
    [Button2 + https://example.org][Button3 + https://example.net]

    :param message: String with formatted links.
    :return: List of button rows, where each row is a list of (text, url) tuples.
    """
    # Regular expression matching [Text + Link]
    pattern = re.compile(r"\[([^\[\]+]+)\s*\+\s*(https?://[^\[\]]+)\]")

    # List of button rows
    keyboard_rows = []

    # Split the message into lines
    lines = message.strip().split("\n")

    for line in lines:
        # Find all matches in the line
        matches = pattern.findall(line)
        if matches:
            row = []
            for text, url in matches:
                button = (text.strip(), url.strip())
                row.append(button)
            keyboard_rows.append(row)

    return keyboard_rows
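`markdown_to_telegram_html` and `parse_links_to_inline_markup` are both regex passes over plain text, so their behaviour is easy to check in isolation. The sketch below copies those regexes into stand-alone functions (`parse_button_rows` is a hypothetical name) and omits the `TelegramHTMLSanitizer` step, which in the real `format_telegram_html` runs afterwards to drop disallowed tags and close unbalanced ones.

```python
import re

# Stand-alone copies of the regexes used in the module above
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")
BUTTON_RE = re.compile(r"\[([^\[\]+]+)\s*\+\s*(https?://[^\[\]]+)\]")


def markdown_to_telegram_html(text: str) -> str:
    prepared = text.replace("\r\n", "\n").strip()
    # [label](url) -> <a href="url">label</a>
    prepared = LINK_RE.sub(r'<a href="\2">\1</a>', prepared)
    # **bold** and __bold__ -> <b>bold</b>
    prepared = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", prepared, flags=re.DOTALL)
    prepared = re.sub(r"__(.+?)__", r"<b>\1</b>", prepared, flags=re.DOTALL)
    # "- item" / "* item" at the start of a line -> "• item"
    return re.sub(r"(?m)^[ \t]*[*-]\s+", "• ", prepared)


def parse_button_rows(message: str) -> list[list[tuple[str, str]]]:
    # One keyboard row per input line; lines without [Text + URL] are skipped
    rows = []
    for line in message.strip().split("\n"):
        matches = BUTTON_RE.findall(line)
        if matches:
            rows.append([(text.strip(), url.strip()) for text, url in matches])
    return rows
```

Rows in this shape map onto aiogram's `InlineKeyboardMarkup(inline_keyboard=...)` once each `(text, url)` tuple becomes an `InlineKeyboardButton(text=text, url=url)`.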
@@ -0,0 +1,16 @@
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello, this is the test webhook endpoint!"}


@app.post("/webhook")
async def webhook(request: Request):
    data = await request.json()

    return JSONResponse(content={"status": "ok", "data": data})
@@ -0,0 +1,88 @@
name: gorych-bot

x-default-logging: &default-logging
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "3"

services:
  bot:
    <<: *default-logging
    build:
      context: .
      dockerfile: bot/Dockerfile
    restart: unless-stopped
    env_file:
      - ./bot/.env
      - ./postgres.env
    environment:
      - REDIS_URL=redis://redisdb:6379/0
      - RAG_API_URL=http://rag_api:8000
    depends_on:
      redisdb:
        condition: service_healthy
      postgredb:
        condition: service_healthy
      rag_api:
        condition: service_started
    command: python aiogram_run.py

  menu_scraper:
    <<: *default-logging
    build:
      context: .
      dockerfile: menu_scraper/Dockerfile
    restart: unless-stopped
    env_file:
      - ./menu_scraper/.env
      - ./postgres.env
    volumes:
      - ./data:/data
    ports:
      - "8010:8010"

  rag_api:
    <<: *default-logging
    build:
      context: .
      dockerfile: rag_api/Dockerfile
    restart: unless-stopped
    env_file:
      - ./rag_api/.env
      - ./postgres.env
    environment:
      - ANONYMIZED_TELEMETRY=false
      - HF_HOME=/data/huggingface
      - TRANSFORMERS_CACHE=/data/huggingface
      - HUGGINGFACE_HUB_CACHE=/data/huggingface
      - HUGGINGFACE_CACHE_DIR=/data/huggingface
    depends_on:
      - menu_scraper
    volumes:
      - ./data:/data
    ports:
      - "8001:8000"

  redisdb:
    <<: *default-logging
    image: redis:6-alpine
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

  postgredb:
    <<: *default-logging
    image: postgres:16-alpine
    restart: unless-stopped
    env_file:
      - ./postgres.env
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ruby -d postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
@@ -0,0 +1,4 @@
GORICH_SITE_URL=https://gorych34.ru/
MENU_OUTPUT_PATH=/data/menu/gorich_menu.json
REQUEST_TIMEOUT_SECONDS=20
SCRAPE_ON_STARTUP=true
@@ -0,0 +1,13 @@
FROM python:3.12-slim

WORKDIR /app

COPY menu_scraper/requirements.txt /app/requirements.txt

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

COPY menu_scraper/app /app/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8010"]
@@ -0,0 +1 @@
@@ -0,0 +1,15 @@
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(slots=True)
class Settings:
    site_url: str = os.getenv("GORICH_SITE_URL", "https://gorych34.ru/")
    output_path: str = os.getenv("MENU_OUTPUT_PATH", "/data/menu/gorich_menu.json")
    request_timeout: float = float(os.getenv("REQUEST_TIMEOUT_SECONDS", "20"))
    scrape_on_startup: bool = os.getenv("SCRAPE_ON_STARTUP", "true").lower() == "true"


settings = Settings()
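`Settings` reads its defaults with `os.getenv` inside the class body, so the environment is consulted once, when `app.config` is first imported; changing a variable later does not affect `settings`. The boolean flags are also stricter than they may look: only the string `true`, in any letter case, turns them on. A small sketch of that rule, with `env_flag` as a hypothetical helper name:

```python
import os


def env_flag(name: str, default: str = "true") -> bool:
    # Same rule as Settings.scrape_on_startup: only "true" (any letter case)
    # enables the flag; values such as "1", "yes" or "on" disable it
    return os.getenv(name, default).lower() == "true"
```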
@@ -0,0 +1,63 @@
from __future__ import annotations

import json
from contextlib import asynccontextmanager
from pathlib import Path

from fastapi import FastAPI, HTTPException

from .config import settings
from .models import MenuSnapshot
from .scraper import GorichMenuScraper


scraper = GorichMenuScraper()
output_path = Path(settings.output_path)


@asynccontextmanager
async def lifespan(_: FastAPI):
    if settings.scrape_on_startup:
        await scraper.scrape_and_save()
    yield


app = FastAPI(
    title="Gorich Menu Scraper",
    version="1.0.0",
    lifespan=lifespan,
)


def load_snapshot_from_disk() -> MenuSnapshot:
    if not output_path.exists():
        raise HTTPException(status_code=404, detail="Menu snapshot not found")

    data = json.loads(output_path.read_text(encoding="utf-8"))
    return MenuSnapshot.model_validate(data)


@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}


@app.post("/scrape", response_model=MenuSnapshot)
async def scrape_menu() -> MenuSnapshot:
    return await scraper.scrape_and_save()


@app.get("/items", response_model=MenuSnapshot)
async def get_items() -> MenuSnapshot:
    return load_snapshot_from_disk()


@app.get("/items/{item_id}")
async def get_item(item_id: str) -> dict[str, object]:
    snapshot = load_snapshot_from_disk()
    for item in snapshot.items:
        if item.item_id == item_id:
            return item.model_dump(mode="json")

    raise HTTPException(status_code=404, detail="Menu item not found")
@@ -0,0 +1,29 @@
from __future__ import annotations

from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field


class MenuItem(BaseModel):
    item_id: str
    name: str
    category: str
    description: str
    ingredients: list[str]
    price: int | None = None
    price_label: str
    size: str | None = None
    photo_url: str
    source_url: str
    scraped_at: datetime
    metadata: dict[str, Any] = Field(default_factory=dict)


class MenuSnapshot(BaseModel):
    source_url: str
    scraped_at: datetime
    total_items: int
    items: list[MenuItem]
@@ -0,0 +1,309 @@
from __future__ import annotations

import json
import re
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup

from .config import settings
from .models import MenuItem, MenuSnapshot


SHOP_PAYLOAD_MARKERS = (
    "MsJsShop.init(",
    "MsJsPublishedManager.addJsData(",
)
SIZE_PATTERN = re.compile(
    r"(\d+\s*(?:см|г|мл)(?:\s*/\s*\d+\s*(?:см|г|мл))*)",
    re.IGNORECASE,
)


def normalize_spaces(value: str) -> str:
    return " ".join(value.replace("\xa0", " ").split())


def compact_text(value: str) -> str:
    return re.sub(r"\s+", "", value.replace("\xa0", " ")).lower()


def parse_price(price_label: str) -> int | None:
    cleaned = normalize_spaces(price_label).lower()
    if "бесплатно" in cleaned:
        return None

    digits = re.sub(r"[^\d]", "", cleaned)
    return int(digits) if digits else None


def parse_ingredients(description: str) -> list[str]:
    cleaned = normalize_spaces(description)
    if not cleaned:
        return []

    lower_cleaned = cleaned.lower()
    if lower_cleaned.startswith("состав:"):
        cleaned = cleaned.split(":", 1)[1].strip()

    return [part.strip() for part in cleaned.split(",") if part.strip()]


def extract_size(*values: str) -> str | None:
    for value in values:
        match = SIZE_PATTERN.search(value)
        if match:
            return match.group(1).replace(" ", "")
    return None


def is_size_only_line(value: str) -> bool:
    size = extract_size(value)
    return size is not None and compact_text(value) == compact_text(size)


def extract_first_json_object(html: str, marker: str) -> dict[str, object]:
    marker_index = html.find(marker)
    if marker_index == -1:
        raise ValueError(f"{marker} payload not found in page")

    object_start = html.find("{", marker_index)
    if object_start == -1:
        raise ValueError("Shop payload start not found")

    depth = 0
    in_string = False
    escaped = False
    object_end = None

    for index in range(object_start, len(html)):
        char = html[index]

        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
            continue

        if char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                object_end = index + 1
                break

    if object_end is None:
        raise ValueError("Shop payload end not found")

    return json.loads(html[object_start:object_end])


def find_shop_container(payload: object) -> dict[str, object] | None:
    if isinstance(payload, dict):
        shop = payload.get("shop")
        if isinstance(shop, dict) and isinstance(shop.get("products"), list):
            return payload

        ds_shop = payload.get("dsShop")
        if isinstance(ds_shop, dict) and isinstance(ds_shop.get("data"), list):
            return {
                "shop": {
                    "products": ds_shop.get("data", []),
                    "settings": ds_shop.get("settings", {}),
                }
            }

        for value in payload.values():
            found = find_shop_container(value)
            if found:
                return found

    if isinstance(payload, list):
        for value in payload:
            found = find_shop_container(value)
            if found:
                return found

    return None


def extract_shop_payload(html: str) -> dict[str, object]:
    errors: list[str] = []
    for marker in SHOP_PAYLOAD_MARKERS:
        try:
            payload = extract_first_json_object(html, marker)
        except ValueError as exc:
            errors.append(str(exc))
            continue

        shop_container = find_shop_container(payload)
        if shop_container is not None:
            return shop_container

        errors.append(f"{marker} found, but shop container is missing")

    raise ValueError("; ".join(errors) or "Shop payload not found in page")


def html_fragment_to_lines(fragment: str) -> list[str]:
    if not fragment:
        return []

    soup = BeautifulSoup(fragment, "html.parser")
    return [
        normalize_spaces(line)
        for line in soup.get_text("\n", strip=True).splitlines()
        if normalize_spaces(line)
    ]


class GorichMenuScraper:
    def __init__(self) -> None:
        self.site_url = settings.site_url
        self.output_path = Path(settings.output_path)
        self.timeout = settings.request_timeout

    async def fetch_html(self) -> str:
        async with self._build_client() as client:
            response = await client.get(self.site_url)
            response.raise_for_status()
            return response.text

    def _build_client(self) -> httpx.AsyncClient:
        return httpx.AsyncClient(
            headers={"User-Agent": "Mozilla/5.0"},
            follow_redirects=True,
            timeout=self.timeout,
        )

    def parse_menu(self, html: str) -> MenuSnapshot:
        payload = extract_shop_payload(html)
        shop = payload.get("shop") or {}
        if not isinstance(shop, dict):
            raise ValueError("Shop payload has unexpected format")

        shop_settings = shop.get("settings") or {}
        categories = shop_settings.get("categories") or []
        products = shop.get("products") or []
        if not isinstance(categories, list) or not isinstance(products, list):
            raise ValueError("Shop categories or products have unexpected format")

        category_by_id: dict[int, dict[str, object]] = {}
        for category in categories:
            if not isinstance(category, dict):
                continue
            category_id = category.get("id")
            if isinstance(category_id, int):
                category_by_id[category_id] = category

        scraped_at = datetime.now(timezone.utc)
        items: list[MenuItem] = []

        for product in products:
            if not isinstance(product, dict):
                continue
            if not product.get("is_visible", True):
                continue

            product_id = product.get("id")
            name = normalize_spaces(str(product.get("name", "")))
            if not product_id or not name:
                continue

            raw_description = str(product.get("short_description", "") or "")
            description_lines = html_fragment_to_lines(raw_description)
            size = extract_size(name, *description_lines)
            description_parts = [line for line in description_lines if not is_size_only_line(line)]
            description = " ".join(description_parts).strip()
            if not description and description_lines:
                description = " ".join(description_lines).strip()

            raw_category_ids = [
                category_id
                for category_id in product.get("category_list", [])
                if isinstance(category_id, int)
            ]
            sorted_category_ids = sorted(
                raw_category_ids,
                key=lambda category_id: int(category_by_id.get(category_id, {}).get("pos", 10_000)),
            )
            category_name = "прочее"
            primary_category_id: int | None = None
            if sorted_category_ids:
                primary_category_id = sorted_category_ids[0]
                category_name = normalize_spaces(
                    str(category_by_id.get(primary_category_id, {}).get("name", "прочее"))
                ).lower()

            image_url = ""
            image_list = product.get("image_list", [])
            if isinstance(image_list, list):
                for image in image_list:
                    if not isinstance(image, dict):
                        continue
                    raw_url = str(image.get("url", "") or "")
                    if raw_url:
                        image_url = urljoin(self.site_url, raw_url)
                        break

            price = product.get("price")
            numeric_price = int(price) if isinstance(price, int) else None
            currency = normalize_spaces(str(product.get("currency", "руб.") or "руб."))
            price_label = (
                f"{numeric_price} {currency}" if numeric_price is not None else "Цена не указана"
            )

            description_url = str(product.get("description_url", "") or "")
            source_url = urljoin(self.site_url, description_url) if description_url else self.site_url

            items.append(
                MenuItem(
                    item_id=str(product_id),
                    name=name,
                    category=category_name,
                    description=description,
                    ingredients=parse_ingredients(description),
                    price=parse_price(price_label),
                    price_label=price_label,
                    size=size,
                    photo_url=image_url,
                    source_url=source_url,
                    scraped_at=scraped_at,
                    metadata={
                        "category_id": primary_category_id,
                        "category_ids": sorted_category_ids,
                        "raw_short_description": raw_description,
                        "amount": product.get("amount"),
                        "sku": product.get("sku"),
                    },
                )
            )

        return MenuSnapshot(
            source_url=self.site_url,
            scraped_at=scraped_at,
            total_items=len(items),
            items=items,
        )

    def save_snapshot(self, snapshot: MenuSnapshot) -> None:
        self.output_path.parent.mkdir(parents=True, exist_ok=True)
        self.output_path.write_text(
            json.dumps(snapshot.model_dump(mode="json"), ensure_ascii=False, indent=2),
            encoding="utf-8",
        )

    async def scrape_and_save(self) -> MenuSnapshot:
        html = await self.fetch_html()
        snapshot = self.parse_menu(html)
        self.save_snapshot(snapshot)
        return snapshot
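The page embeds the shop data inside a JavaScript call such as `MsJsShop.init({...})`, so `extract_first_json_object` scans forward from the marker and counts braces, ignoring any that appear inside JSON string literals (including escaped quotes). The trimmed sketch below reproduces that scan together with `SIZE_PATTERN`/`extract_size`; the sample HTML in the test is made up for illustration and is not taken from gorych34.ru.

```python
from __future__ import annotations

import json
import re

SIZE_PATTERN = re.compile(
    r"(\d+\s*(?:см|г|мл)(?:\s*/\s*\d+\s*(?:см|г|мл))*)",
    re.IGNORECASE,
)


def extract_size(*values: str) -> str | None:
    # First "300 г" / "300 г / 30 см"-style match wins; spaces are removed
    for value in values:
        match = SIZE_PATTERN.search(value)
        if match:
            return match.group(1).replace(" ", "")
    return None


def extract_first_json_object(html: str, marker: str) -> dict:
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"{marker} payload not found in page")
    start = html.find("{", start)
    if start == -1:
        raise ValueError("payload start not found")

    depth, in_string, escaped = 0, False, False
    for index in range(start, len(html)):
        char = html[index]
        if in_string:
            # Inside a string, braces are data and \" does not close the string
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
            continue
        if char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                # Balanced object found: hand exactly that slice to the JSON parser
                return json.loads(html[start : index + 1])
    raise ValueError("payload end not found")
```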
@@ -0,0 +1,6 @@
beautifulsoup4==4.12.3
fastapi==0.115.12
httpx==0.28.1
pydantic==2.11.4
uvicorn==0.34.2
@@ -0,0 +1,5 @@
POSTGRES_DB=gorych_bot_db
POSTGRES_USER=''
POSTGRES_PASSWORD=''
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
@@ -0,0 +1,30 @@
GORICH_SITE_URL=https://gorych34.ru/

# ChromaDB
CHROMA_PATH=/data/chroma
HUGGINGFACE_CACHE_DIR=/data/huggingface
KNOWLEDGE_COLLECTION=gorich_knowledge
MENU_COLLECTION=gorich_menu
MENU_SNAPSHOT_PATH=/data/menu/gorich_menu.json
ANONYMIZED_TELEMETRY=false

# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_MODEL=mistralai/mistral-medium-3-5
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Public app metadata
PUBLIC_APP_URL=http://localhost:8001
PUBLIC_APP_NAME=Gorich Bot RAG

# Embeddings
EMBEDDING_MODEL=sergeyzh/rubert-mini-frida
EMBEDDING_QUERY_PREFIX="search_query: "
EMBEDDING_DOCUMENT_PREFIX="search_document: "
EMBEDDING_MAX_LENGTH=512
EMBEDDING_BATCH_SIZE=32

# RAG
REQUEST_TIMEOUT_SECONDS=60
RAG_TOP_K=5
INDEX_ON_STARTUP=true
@@ -0,0 +1,13 @@
FROM python:3.11-slim

WORKDIR /app

COPY rag_api/requirements.txt /app/requirements.txt

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir --index-url https://download.pytorch.org/whl/cpu torch==2.7.0 && \
    pip install --no-cache-dir -r requirements.txt

COPY rag_api/app /app/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -0,0 +1 @@
@@ -0,0 +1,37 @@
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(slots=True)
class Settings:
    app_name: str = "Gorich RAG API"
    site_url: str = os.getenv("GORICH_SITE_URL", "https://gorych34.ru/")
    chroma_path: str = os.getenv("CHROMA_PATH", "/data/chroma")
    huggingface_cache_dir: str = os.getenv("HUGGINGFACE_CACHE_DIR", "/data/huggingface")
    knowledge_collection: str = os.getenv("KNOWLEDGE_COLLECTION", "gorich_knowledge")
    menu_collection: str = os.getenv("MENU_COLLECTION", "gorich_menu")
    menu_snapshot_path: str = os.getenv("MENU_SNAPSHOT_PATH", "/data/menu/gorich_menu.json")
    openrouter_api_key: str = os.getenv("OPENROUTER_API_KEY", "")
    openrouter_model: str = os.getenv("OPENROUTER_MODEL", "mistralai/mistral-medium-3-5")
    openrouter_base_url: str = os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")
    public_app_url: str = os.getenv("PUBLIC_APP_URL", "http://localhost:8000")
    public_app_name: str = os.getenv("PUBLIC_APP_NAME", "Gorich Bot RAG")
    embedding_model: str = os.getenv(
        "EMBEDDING_MODEL",
        "sergeyzh/rubert-mini-frida",
    )
    embedding_query_prefix: str = os.getenv("EMBEDDING_QUERY_PREFIX", "search_query: ")
    embedding_document_prefix: str = os.getenv(
        "EMBEDDING_DOCUMENT_PREFIX",
        "search_document: ",
    )
    embedding_max_length: int = int(os.getenv("EMBEDDING_MAX_LENGTH", "512"))
    embedding_batch_size: int = int(os.getenv("EMBEDDING_BATCH_SIZE", "32"))
    request_timeout: float = float(os.getenv("REQUEST_TIMEOUT_SECONDS", "60"))
    top_k: int = int(os.getenv("RAG_TOP_K", "5"))
    index_on_startup: bool = os.getenv("INDEX_ON_STARTUP", "true").lower() == "true"


settings = Settings()
@@ -0,0 +1,65 @@
from __future__ import annotations

from pathlib import Path
from typing import Iterable

import torch
import torch.nn.functional as functional
from transformers import AutoModel, AutoTokenizer

from .config import settings


class RuBertMiniFridaEmbedder:
    def __init__(self) -> None:
        torch.set_grad_enabled(False)
        self.device = "cpu"
        self.max_length = settings.embedding_max_length
        self.batch_size = settings.embedding_batch_size
        self.cache_dir = Path(settings.huggingface_cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.tokenizer = AutoTokenizer.from_pretrained(
            settings.embedding_model,
            cache_dir=str(self.cache_dir),
        )
        self.model = AutoModel.from_pretrained(
            settings.embedding_model,
            cache_dir=str(self.cache_dir),
        )
        self.model.to(self.device)
        self.model.eval()

    @staticmethod
    def mean_pool(hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        masked_state = hidden_state * attention_mask.unsqueeze(-1).float()
        summed = torch.sum(masked_state, dim=1)
        counts = attention_mask.sum(dim=1, keepdim=True).float()
        return summed / counts

    # Grad mode set via torch.set_grad_enabled() is thread-local, so autograd is
    # also disabled here in case encoding runs in a different worker thread
    @torch.inference_mode()
    def _encode(self, texts: Iterable[str], prompt: str) -> list[list[float]]:
        prepared_texts = [f"{prompt}{text}" for text in texts]
        if not prepared_texts:
            return []

        embeddings: list[list[float]] = []
        for start in range(0, len(prepared_texts), self.batch_size):
            batch = prepared_texts[start : start + self.batch_size]
            encoded = self.tokenizer(
                batch,
                max_length=self.max_length,
                padding=True,
                truncation=True,
                return_tensors="pt",
            )
            encoded = {key: value.to(self.device) for key, value in encoded.items()}
            outputs = self.model(**encoded)
            pooled = self.mean_pool(outputs.last_hidden_state, encoded["attention_mask"])
            normalized = functional.normalize(pooled, p=2, dim=1)
            embeddings.extend(normalized.cpu().tolist())
        return embeddings

    def embed_documents(self, texts: Iterable[str]) -> list[list[float]]:
        return self._encode(texts, prompt=settings.embedding_document_prefix)

    def embed_queries(self, texts: Iterable[str]) -> list[list[float]]:
        return self._encode(texts, prompt=settings.embedding_query_prefix)
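`mean_pool` averages the token vectors of each sequence while skipping padded positions (attention mask 0), and `_encode` then L2-normalizes the result, so the cosine similarity between two embeddings reduces to a plain dot product. The pure-Python sketch below shows the same arithmetic on tiny hand-made tensors; it is an illustration rather than a replacement for the torch code, and the `max(count, 1)` guard against an all-zero mask is an addition not present in the original.

```python
import math


def mean_pool(hidden_state: list[list[list[float]]], attention_mask: list[list[int]]) -> list[list[float]]:
    # hidden_state: [batch][tokens][dim]; attention_mask: [batch][tokens]
    pooled = []
    for tokens, mask in zip(hidden_state, attention_mask):
        summed = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:  # padded positions (mask == 0) do not contribute
                summed = [s + v for s, v in zip(summed, vector)]
                count += 1
        pooled.append([s / max(count, 1) for s in summed])
    return pooled


def l2_normalize(vectors: list[list[float]]) -> list[list[float]]:
    # Scale every vector to unit length (zero vectors are left unchanged)
    normalized = []
    for vector in vectors:
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        normalized.append([v / norm for v in vector])
    return normalized


def dot(a: list[float], b: list[float]) -> float:
    # For unit-length vectors this equals cosine similarity
    return sum(x * y for x, y in zip(a, b))
```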
@@ -0,0 +1,64 @@
from __future__ import annotations

from contextlib import asynccontextmanager

from fastapi import FastAPI

from .config import settings
from .menu_catalog import MenuCatalog
from .models import ChatRequest, ChatResponse, IndexResponse
from .service import RagService


rag_service = RagService()
menu_catalog = MenuCatalog()


@asynccontextmanager
async def lifespan(_: FastAPI):
    if settings.index_on_startup:
        await rag_service.reindex()
    yield


app = FastAPI(title=settings.app_name, version="1.0.0", lifespan=lifespan)


@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest) -> ChatResponse:
    return await rag_service.chat(request)


@app.post("/admin/reindex", response_model=IndexResponse)
async def reindex() -> IndexResponse:
    return await rag_service.reindex()


@app.get("/menu/search")
async def search_menu(
    query: str = "",
    max_price: int | None = None,
    category: str | None = None,
    must_include: str | None = None,
    must_not_include: str | None = None,
    limit: int = 5,
) -> dict[str, object]:
    # List filters arrive as comma-separated strings. Empty fragments (e.g. from a
    # trailing comma) are dropped: "" is a substring of every item text, so it would
    # match every item in must_include and exclude every item in must_not_include.
    return {
        "items": rag_service.search_menu(
            query=query,
            max_price=max_price,
            category=category,
            must_include=[value.strip() for value in must_include.split(",") if value.strip()]
            if must_include
            else None,
            must_not_include=[value.strip() for value in must_not_include.split(",") if value.strip()]
            if must_not_include
            else None,
            limit=limit,
        )
    }
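`/menu/search` accepts `must_include` and `must_not_include` as single comma-separated query parameters and splits them before calling `search_menu`. The sketch below covers both sides of that convention:

- a client-side builder based on `urllib.parse.urlencode`;
- a parser that splits, strips and skips empty fragments.

Both helper names are hypothetical. The example values are invented queries, not real menu data.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode


def build_search_query(
    query: str = "",
    max_price: int | None = None,
    must_include: list[str] | None = None,
    must_not_include: list[str] | None = None,
) -> str:
    """Encode /menu/search parameters; list filters are joined with commas."""
    params: dict[str, str | int] = {"query": query}
    if max_price is not None:
        params["max_price"] = max_price
    if must_include:
        params["must_include"] = ",".join(must_include)
    if must_not_include:
        params["must_not_include"] = ",".join(must_not_include)
    return urlencode(params)  # Cyrillic is percent-encoded as UTF-8


def parse_list_param(raw: str | None) -> list[str] | None:
    """Split a comma-separated filter, trimming spaces and skipping empty parts."""
    if not raw:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]


qs = build_search_query("острое", max_price=400, must_not_include=["лук", "бекон"])
raw = parse_qs(qs)["must_not_include"][0]  # "лук,бекон" after decoding
print(parse_list_param(raw))  # ['лук', 'бекон']
```

A client would append `qs` to the `/menu/search` URL. `parse_qs` here only stands in for FastAPI's own query parsing.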
@@ -0,0 +1,214 @@
from __future__ import annotations

import json
import re
from pathlib import Path

from .config import settings
from .models import MenuItem, MenuSnapshot


def tokenize(value: str) -> list[str]:
    # "ё" is listed explicitly because it lies outside the а-я range; without it
    # words such as "копчёный" would be split into two tokens.
    raw_tokens = re.findall(r"[a-zA-Zа-яА-ЯёЁ0-9]+", value.lower())
    return [
        token
        for token in raw_tokens
        if token not in QUERY_STOPWORDS and (len(token) > 2 or token.isdigit())
    ]


QUERY_STOPWORDS = {
    "что",
    "у",
    "вас",
    "есть",
    "из",
    "как",
    "ли",
    "мне",
    "могу",
    "хочу",
    "надо",
    "для",
    "под",
    "про",
    "или",
    "это",
    "эта",
    "этот",
    "какой",
    "какая",
    "какие",
    "посоветуй",
    "посоветуйте",
    "подбери",
    "подобрать",
    "вкусную",
    "вкусный",
    "вкусное",
}


QUERY_HINTS = {
    "шаурма": ["шаурма", "классика"],
    "шаурмы": ["шаурма", "классика"],
    "шаверма": ["шаурма", "классика"],
    "шавуха": ["шаурма", "классика"],
    "острый": ["халапеньо", "шрирача", "том", "ям"],
    "острая": ["халапеньо", "шрирача", "том", "ям"],
    "острое": ["халапеньо", "шрирача", "том", "ям"],
    "острого": ["халапеньо", "шрирача", "том", "ям"],
    "пикантный": ["халапеньо", "шрирача", "том", "ям"],
    "сыр": ["сыр", "моцарелла", "пармезан", "крем", "чиз"],
    "сыром": ["сыр", "моцарелла", "пармезан", "крем", "чиз"],
    "сыра": ["сыр", "моцарелла", "пармезан", "крем", "чиз"],
    "сырный": ["сыр", "моцарелла", "пармезан", "крем", "чиз"],
    "сырная": ["сыр", "моцарелла", "пармезан", "крем", "чиз"],
    "рыбный": ["лосось"],
    "рыбная": ["лосось"],
    "мясной": ["свинина", "курица", "ростбиф", "колбаски", "пепперони"],
    "мясная": ["свинина", "курица", "ростбиф", "колбаски", "пепперони"],
}

CATEGORY_ALIASES = {
    "шаурмы": "шаурма",
    "шаверма": "шаурма",
    "шавуха": "шаурма",
}


class MenuCatalog:
    def __init__(self) -> None:
        self.snapshot_path = Path(settings.menu_snapshot_path)

    def exists(self) -> bool:
        return self.snapshot_path.exists()

    def load_snapshot(self) -> MenuSnapshot:
        data = json.loads(self.snapshot_path.read_text(encoding="utf-8"))
        return MenuSnapshot.model_validate(data)

    def menu_documents(self) -> list[tuple[MenuItem, str]]:
        if not self.exists():
            return []

        snapshot = self.load_snapshot()
        documents: list[tuple[MenuItem, str]] = []
        for item in snapshot.items:
            text = " | ".join(
                [
                    item.name,
                    item.category,
                    item.description,
                    ", ".join(item.ingredients),
                    item.size or "",
                    item.price_label,
                ]
            )
            documents.append((item, text))
        return documents

    def items_map(self) -> dict[str, MenuItem]:
        if not self.exists():
            return {}

        snapshot = self.load_snapshot()
        return {item.item_id: item for item in snapshot.items}

    def search(
        self,
        query: str = "",
        max_price: int | None = None,
        category: str | None = None,
        must_include: list[str] | None = None,
        must_not_include: list[str] | None = None,
        limit: int = 5,
        candidate_ids: list[str] | None = None,
        semantic_ranks: dict[str, int] | None = None,
    ) -> list[dict[str, object]]:
        if not self.exists():
            return []

        must_include = [value.lower() for value in (must_include or [])]
        must_not_include = [value.lower() for value in (must_not_include or [])]
        query_tokens = tokenize(query)
        normalized_category = category.lower() if category else None
        if normalized_category in CATEGORY_ALIASES:
            normalized_category = CATEGORY_ALIASES[normalized_category]
        hint_tokens = []
        for token in query_tokens:
            hint_tokens.extend(QUERY_HINTS.get(token, []))
        # An empty candidate set means no semantic pre-filter was applied.
        candidate_set = set(candidate_ids or [])
        semantic_ranks = semantic_ranks or {}

        scored_items: list[tuple[int, MenuItem]] = []
        for item, text in self.menu_documents():
            if candidate_set and item.item_id not in candidate_set:
                continue

            lowered = text.lower()

            # Hard filters: category, budget (items without a parsed price are
            # skipped when a budget is set), required and excluded terms.
            if normalized_category and item.category.lower() != normalized_category:
                continue
            if max_price is not None and item.price is not None and item.price > max_price:
                continue
            if max_price is not None and item.price is None:
                continue
            if any(value not in lowered for value in must_include):
                continue
            if any(value in lowered for value in must_not_include):
                continue

            score = 0
            for token in query_tokens:
                if token in lowered:
                    score += 3
                if token in item.name.lower():
                    score += 5

            for token in hint_tokens:
                if token in lowered:
                    score += 6
                if token == item.category.lower():
                    score += 8

            for token in must_include:
                if token in lowered:
                    score += 4

            # Semantic rank 1 adds 19 points; ranks of 20 or more add nothing.
            if item.item_id in semantic_ranks:
                score += max(0, 20 - semantic_ranks[item.item_id])

            if not query_tokens and not must_include and category:
                score += 1

            scored_items.append((score, item))

        # Highest score first; ties go to the cheaper item (a missing price
        # counts as 0), then by name in reverse alphabetical order.
        scored_items.sort(
            key=lambda row: (
                row[0],
                -(row[1].price or 0),
                row[1].name,
            ),
            reverse=True,
        )

        results: list[dict[str, object]] = []
        for score, item in scored_items[:limit]:
            results.append(
                {
                    "item_id": item.item_id,
                    "name": item.name,
                    "category": item.category,
                    "description": item.description,
                    "ingredients": item.ingredients,
                    "price": item.price,
                    "price_label": item.price_label,
                    "size": item.size,
                    "photo_url": item.photo_url,
                    "source_url": item.source_url,
                    "score": score,
                }
            )

        return results
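`MenuCatalog.search` is a hand-weighted keyword ranker applied on top of the semantic shortlist. The sketch below reproduces only its per-item scoring weights, as read from the code above:

- +3 per query token found anywhere in the item text, and +5 more if the token is also in the name;
- +6 per hint token in the text, and +8 if a hint equals the category;
- +4 per `must_include` term;
- up to +19 from the semantic rank.

This makes each signal easy to check in isolation. `score_item` and its flat arguments are a hypothetical simplification: filtering, stopwords and category aliases are left out. The menu text is invented for illustration.

```python
from __future__ import annotations


def score_item(
    text: str,
    name: str,
    category: str,
    query_tokens: list[str],
    hint_tokens: list[str],
    must_include: list[str],
    semantic_rank: int | None = None,
) -> int:
    lowered = text.lower()
    score = 0
    for token in query_tokens:
        if token in lowered:
            score += 3  # token appears anywhere in the indexed text
        if token in name.lower():
            score += 5  # extra weight for a match in the dish name
    for token in hint_tokens:
        if token in lowered:
            score += 6  # expanded hint, e.g. "халапеньо" for "острая"
        if token == category.lower():
            score += 8  # the hint names the category itself
    for token in must_include:
        if token in lowered:
            score += 4
    if semantic_rank is not None:
        score += max(0, 20 - semantic_rank)  # rank 1 -> +19, rank >= 20 -> 0
    return score


text = "Шаурма острая | шаурма | Курица, халапеньо, соус шрирача | 390 ₽"
hints = ["халапеньо", "шрирача", "том", "ям", "шаурма", "классика"]  # QUERY_HINTS for "острая" + "шаурма"
print(score_item(text, "Шаурма острая", "шаурма", ["острая", "шаурма"], hints, [], semantic_rank=2))  # 60
```

In this example the keyword signals (42 points) outweigh the semantic bonus (18 points). A strong literal match can therefore overtake a higher-ranked embedding neighbour.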
@@ -0,0 +1,72 @@
from __future__ import annotations

from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field


class ChatMessage(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    message: str
    history: list[ChatMessage] = Field(default_factory=list)


class SourceDocument(BaseModel):
    source_id: str
    source_type: str
    title: str
    source_url: str
    snippet: str
    published_at: datetime | None = None
    score: float | None = None


class ChatResponse(BaseModel):
    answer: str
    model: str
    sources: list[SourceDocument]
    tool_results: list[dict[str, Any]] = Field(default_factory=list)


class IndexResponse(BaseModel):
    indexed_knowledge_documents: int
    indexed_menu_documents: int
    menu_items_loaded: int


class KnowledgeDocument(BaseModel):
    doc_id: str
    title: str
    text: str
    source_type: str
    source_url: str
    published_at: datetime | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)


class MenuItem(BaseModel):
    item_id: str
    name: str
    category: str
    description: str
    ingredients: list[str]
    price: int | None = None
    price_label: str
    size: str | None = None
    photo_url: str
    source_url: str
    scraped_at: datetime
    metadata: dict[str, Any] = Field(default_factory=dict)


class MenuSnapshot(BaseModel):
    source_url: str
    scraped_at: datetime
    total_items: int
    items: list[MenuItem]
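`MenuCatalog.load_snapshot` validates the scraper's JSON against `MenuSnapshot`, so every item must carry the `MenuItem` fields declared without a default. The stdlib-only sketch below lists those required keys, as read off the model above, and reports what an item dict is missing before it reaches `model_validate`. The helper name and the sample values are invented for illustration; they are not real scraper output.

```python
from __future__ import annotations

import json

# MenuItem fields declared without a default (price, size and metadata are optional).
REQUIRED_ITEM_FIELDS = {
    "item_id",
    "name",
    "category",
    "description",
    "ingredients",
    "price_label",
    "photo_url",
    "source_url",
    "scraped_at",
}


def missing_item_fields(item: dict) -> set[str]:
    """Return the required MenuItem keys absent from a raw item dict."""
    return REQUIRED_ITEM_FIELDS - set(item)


sample_item = {
    "item_id": "example-1",
    "name": "Шаурма классика",
    "category": "шаурма",
    "description": "Пример описания",
    "ingredients": ["курица", "соус"],
    "price": 300,
    "price_label": "300 ₽",
    "photo_url": "",
    "source_url": "https://gorych34.ru/",
    "scraped_at": "2024-01-01T12:00:00+00:00",
}
snapshot = {
    "source_url": "https://gorych34.ru/",
    "scraped_at": sample_item["scraped_at"],
    "total_items": 1,
    "items": [sample_item],
}
print(json.dumps(snapshot, ensure_ascii=False, indent=2))
print(missing_item_fields({"item_id": "x", "name": "y"}))  # the seven other required keys
```

Timestamps are written as ISO 8601 strings, which pydantic parses into `datetime` during validation.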
@@ -0,0 +1,10 @@
from __future__ import annotations

from chromadb.telemetry.product import ProductTelemetryClient, ProductTelemetryEvent
from overrides import override


class NoOpProductTelemetry(ProductTelemetryClient):
    @override
    def capture(self, event: ProductTelemetryEvent) -> None:
        return None
@@ -0,0 +1,51 @@
from __future__ import annotations

from typing import Any

import httpx

from .config import settings


class OpenRouterClient:
    def __init__(self) -> None:
        self.base_url = settings.openrouter_base_url.rstrip("/")
        self.api_key = settings.openrouter_api_key
        self.model = settings.openrouter_model
        self.timeout = settings.request_timeout

    async def chat_completion(
        self,
        messages: list[dict[str, Any]],
        tools: list[dict[str, Any]] | None = None,
        tool_choice: str | dict[str, Any] | None = None,
        temperature: float = 0.2,
    ) -> dict[str, Any]:
        if not self.api_key:
            raise RuntimeError("OPENROUTER_API_KEY is not configured")

        payload: dict[str, Any] = {
            "model": self.model,
            "messages": messages,
            "temperature": temperature,
        }
        if tools:
            payload["tools"] = tools
            payload["tool_choice"] = tool_choice or "auto"

        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "HTTP-Referer": settings.public_app_url,
            "X-Title": settings.public_app_name,
        }

        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.post(
                f"{self.base_url}/chat/completions",
                headers=headers,
                json=payload,
            )
            response.raise_for_status()
            return response.json()
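`chat_completion` adds `tools` and `tool_choice` to the request body only when tools are supplied, and `tool_choice` then defaults to `"auto"`. Writing the same branching as a pure function shows the two request shapes clearly and lets them be unit-tested without a network call. `build_payload` is a hypothetical helper, not part of the client above, and `"example/model"` is a placeholder model slug.

```python
from __future__ import annotations

from typing import Any


def build_payload(
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    temperature: float = 0.2,
) -> dict[str, Any]:
    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    if tools:
        # tool_choice is only meaningful together with a tool list
        payload["tools"] = tools
        payload["tool_choice"] = tool_choice or "auto"
    return payload


messages = [{"role": "user", "content": "Во сколько вы открываетесь?"}]
tool = {
    "type": "function",
    "function": {"name": "find_menu_items", "parameters": {"type": "object", "properties": {}}},
}
plain = build_payload("example/model", messages)  # no tools, no tool_choice
with_tools = build_payload("example/model", messages, tools=[tool])  # tool_choice == "auto"
```

The second completion in `RagService.chat` is sent without tools, so it produces the first request shape.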
@@ -0,0 +1,306 @@
from __future__ import annotations

import json
from typing import Any

from .config import settings
from .embeddings import RuBertMiniFridaEmbedder
from .menu_catalog import MenuCatalog
from .models import ChatRequest, ChatResponse, IndexResponse, KnowledgeDocument, SourceDocument
from .openrouter_client import OpenRouterClient
from .site_scraper import SiteKnowledgeScraper
from .vector_store import VectorStore


class RagService:
    def __init__(self) -> None:
        self.vector_store = VectorStore()
        self.embedder = RuBertMiniFridaEmbedder()
        self.site_scraper = SiteKnowledgeScraper()
        self.menu_catalog = MenuCatalog()
        self.openrouter = OpenRouterClient()
        self.knowledge_collection = self.vector_store.get_collection(
            settings.knowledge_collection
        )
        self.menu_collection = self.vector_store.get_collection(settings.menu_collection)

    @staticmethod
    def clear_collection(collection: Any) -> None:
        ids = collection.get(include=[])["ids"]
        if ids:
            collection.delete(ids=ids)

    async def reindex(self) -> IndexResponse:
        knowledge_documents = await self.site_scraper.scrape()
        self.clear_collection(self.knowledge_collection)
        self.clear_collection(self.menu_collection)

        if knowledge_documents:
            knowledge_texts = [doc.text for doc in knowledge_documents]
            self.knowledge_collection.add(
                ids=[doc.doc_id for doc in knowledge_documents],
                documents=knowledge_texts,
                embeddings=self.embedder.embed_documents(knowledge_texts),
                metadatas=[
                    {
                        "title": doc.title,
                        "source_type": doc.source_type,
                        "source_url": doc.source_url,
                        "published_at": doc.published_at.isoformat()
                        if doc.published_at
                        else "",
                        **doc.metadata,
                    }
                    for doc in knowledge_documents
                ],
            )

        menu_documents = self.menu_catalog.menu_documents()
        if menu_documents:
            menu_texts = [document for _, document in menu_documents]
            self.menu_collection.add(
                ids=[item.item_id for item, _ in menu_documents],
                documents=menu_texts,
                embeddings=self.embedder.embed_documents(menu_texts),
                metadatas=[
                    {
                        "name": item.name,
                        "category": item.category,
                        "price": item.price if item.price is not None else -1,
                        "price_label": item.price_label,
                        "source_url": item.source_url,
                        "photo_url": item.photo_url,
                    }
                    for item, _ in menu_documents
                ],
            )

        return IndexResponse(
            indexed_knowledge_documents=len(knowledge_documents),
            indexed_menu_documents=len(menu_documents),
            menu_items_loaded=len(menu_documents),
        )

    def retrieve_knowledge(self, query: str) -> list[SourceDocument]:
        if self.knowledge_collection.count() == 0:
            return []

        query_embedding = self.embedder.embed_queries([query])[0]
        result = self.knowledge_collection.query(
            query_embeddings=[query_embedding],
            n_results=settings.top_k,
        )
        documents = result.get("documents", [[]])[0]
        metadatas = result.get("metadatas", [[]])[0]
        distances = result.get("distances", [[]])[0]
        ids = result.get("ids", [[]])[0]

        sources: list[SourceDocument] = []
        for index, document in enumerate(documents):
            metadata = metadatas[index]
            published_at = metadata.get("published_at") or None
            sources.append(
                SourceDocument(
                    source_id=ids[index],
                    source_type=str(metadata.get("source_type", "unknown")),
                    title=str(metadata.get("title", ids[index])),
                    source_url=str(metadata.get("source_url", settings.site_url)),
                    snippet=document[:400],
                    published_at=published_at,
                    # Chroma distance: smaller means closer to the query.
                    score=distances[index] if index < len(distances) else None,
                )
            )
        return sources

    def build_system_prompt(self, sources: list[SourceDocument]) -> str:
        context_parts = []
        for source in sources:
            published_label = (
                f" | дата: {source.published_at.isoformat()}"
                if source.published_at
                else ""
            )
            context_parts.append(
                f"[{source.source_type}] {source.title}{published_label}\n"
                f"Источник: {source.source_url}\n"
                f"{source.snippet}"
            )

        context_block = "\n\n".join(context_parts) if context_parts else "Нет найденного контекста."
        return (
            "Ты помощник шаурмечной Горыч из Волгограда.\n"
            "Отвечай по-русски, дружелюбно, естественно и клиентоориентированно.\n"
            "Не начинай каждый ответ с нового приветствия.\n"
            "Отвечай только на текущий вопрос пользователя и не повторяй без необходимости уже сказанное ранее.\n"
            "Не перечисляй ассортимент без запроса. Если человек не просил список позиций, не превращай ответ в каталог.\n"
            "Для рекомендаций предлагай максимум 3 конкретные позиции.\n"
            "Не выдумывай факты. Если данные расходятся, прямо скажи об этом и укажи источник.\n"
            "Если вопрос про режим работы, доставку, контакты, адрес, соцсети, способы заказа или общую информацию о заведении, отвечай по контексту и не используй tool меню.\n"
            "Используй tool find_menu_items только когда пользователь явно просит подобрать, перечислить, сравнить или найти блюда из меню:\n"
            "- что есть в меню;\n"
            "- что посоветуете из конкретной категории;\n"
            "- что есть с определённым ингредиентом;\n"
            "- что можно до определённого бюджета;\n"
            "- что острое, сырное, мясное и так далее, если нужен именно подбор позиций.\n"
            "Если вопрос общий и консультативный, например про вкус, выбор мяса или что лучше взять в целом, сначала ответь по-человечески и не вызывай tool, пока пользователь не попросит конкретные позиции.\n"
            "Если всё же используешь tool и он вернул позиции, назови их по именам и по возможности укажи цену.\n"
            "Если tool ничего не нашёл, честно скажи об этом и предложи уточнить запрос.\n"
            "Если в контексте есть даты, ориентируйся на более свежие данные.\n\n"
            "Формат ответа:\n"
            "- Используй только HTML-теги, подходящие для Telegram/aiogram: <b>, <i>, <u>, <s>, <code>, <pre>, <a href=\"...\">.\n"
            "- Не используй Markdown со звёздочками, подчёркиваниями или решётками.\n"
            "- Не пиши служебные фразы вроде 'выберите вопрос ниже'.\n\n"
            f"Контекст RAG:\n{context_block}"
        )

    def build_tools(self) -> list[dict[str, Any]]:
        return [
            {
                "type": "function",
                "function": {
                    "name": "find_menu_items",
                    "description": "Подбирает блюда из меню Горыча по описанию, бюджету, категории и ингредиентам. Использовать только для явных запросов о меню и конкретных позициях, не использовать для вопросов о режиме работы, доставке, контактах и общей информации о заведении.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "Свободное описание того, что хочет пользователь.",
                            },
                            "max_price": {
                                "type": "integer",
                                "description": "Максимальная цена в рублях.",
                            },
                            "category": {
                                "type": "string",
                                "description": "Категория блюда, например: пицца, донар, шаурма.",
                            },
                            "must_include": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "Ингредиенты или слова, которые желательно включить.",
                            },
                            "must_not_include": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "Ингредиенты или слова, которых нужно избегать.",
                            },
                            "limit": {
                                "type": "integer",
                                "description": "Максимум позиций в выдаче.",
                                "default": 5,
                            },
                        },
                        "required": [],
                    },
                },
            }
        ]

    def run_tool(self, name: str, arguments: dict[str, Any]) -> list[dict[str, Any]]:
        if name != "find_menu_items":
            return []

        # The model may send explicit nulls; fall back to the same defaults so that
        # tokenize() and the `limit * 4` arithmetic never receive None.
        return self.search_menu(
            query=arguments.get("query") or "",
            max_price=arguments.get("max_price"),
            category=arguments.get("category"),
            must_include=arguments.get("must_include"),
            must_not_include=arguments.get("must_not_include"),
            limit=arguments.get("limit") or 5,
        )

    def search_menu(
        self,
        query: str = "",
        max_price: int | None = None,
        category: str | None = None,
        must_include: list[str] | None = None,
        must_not_include: list[str] | None = None,
        limit: int = 5,
    ) -> list[dict[str, Any]]:
        candidate_ids: list[str] | None = None
        semantic_ranks: dict[str, int] | None = None

        if query and self.menu_collection.count() > 0:
            query_embedding = self.embedder.embed_queries([query])[0]
            semantic_result = self.menu_collection.query(
                query_embeddings=[query_embedding],
                n_results=min(max(limit * 4, 10), self.menu_collection.count()),
            )
            candidate_ids = semantic_result.get("ids", [[]])[0]
            semantic_ranks = {
                item_id: rank for rank, item_id in enumerate(candidate_ids, start=1)
            }

        return self.menu_catalog.search(
            query=query,
            max_price=max_price,
            category=category,
            must_include=must_include,
            must_not_include=must_not_include,
            limit=limit,
            candidate_ids=candidate_ids,
            semantic_ranks=semantic_ranks,
        )

    async def chat(self, request: ChatRequest) -> ChatResponse:
        sources = self.retrieve_knowledge(request.message)
        messages: list[dict[str, Any]] = [
            {"role": "system", "content": self.build_system_prompt(sources)}
        ]
        for message in request.history:
            messages.append({"role": message.role, "content": message.content})
        messages.append({"role": "user", "content": request.message})

        tools = self.build_tools()
        first_response = await self.openrouter.chat_completion(
            messages=messages,
            tools=tools,
        )
        choice_message = first_response["choices"][0]["message"]
        tool_calls = choice_message.get("tool_calls", [])
        tool_results: list[dict[str, Any]] = []
        model = first_response.get("model", settings.openrouter_model)

        if tool_calls:
            messages.append(
                {
                    "role": "assistant",
                    "content": choice_message.get("content", ""),
                    "tool_calls": tool_calls,
                }
            )

            for tool_call in tool_calls:
                function_name = tool_call["function"]["name"]
                arguments = json.loads(tool_call["function"]["arguments"] or "{}")
                result = self.run_tool(function_name, arguments)
                tool_results.extend(result)
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call["id"],
                        "name": function_name,
                        "content": json.dumps(result, ensure_ascii=False),
                    }
                )

            final_response = await self.openrouter.chat_completion(messages=messages)
            # "content" may be null in the API response; ChatResponse.answer must be a str.
            final_message = final_response["choices"][0]["message"].get("content") or ""
            model = final_response.get("model", settings.openrouter_model)

            return ChatResponse(
                answer=final_message,
                model=model,
                sources=sources,
                tool_results=tool_results,
            )

        answer = choice_message.get("content") or ""
        return ChatResponse(
            answer=answer,
            model=model,
            sources=sources,
            tool_results=tool_results,
        )
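`RagService.chat` follows the usual OpenAI-style function-calling round trip:

1. If the first completion returns `tool_calls`, the assistant message is echoed back.
2. Each call is executed locally.
3. Each result is appended as a `"tool"` message keyed by `tool_call_id`.
4. A second completion is requested.

The self-contained sketch below covers only that message bookkeeping, with a stub in place of `search_menu`. `append_tool_results` and `fake_find_menu_items` are hypothetical names. The assistant message is hand-written, not real API output.

```python
from __future__ import annotations

import json
from typing import Any, Callable

ToolRunner = Callable[[str, dict[str, Any]], list[dict[str, Any]]]


def append_tool_results(
    messages: list[dict[str, Any]], assistant_message: dict[str, Any], run_tool: ToolRunner
) -> list[dict[str, Any]]:
    """Run every tool call, extend `messages` in place, and return all result rows."""
    tool_calls = assistant_message.get("tool_calls") or []
    collected: list[dict[str, Any]] = []
    if not tool_calls:
        return collected
    messages.append(
        {"role": "assistant", "content": assistant_message.get("content") or "", "tool_calls": tool_calls}
    )
    for call in tool_calls:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"] or "{}")  # arguments arrive as a JSON string
        result = run_tool(name, arguments)
        collected.extend(result)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # links the result to the request
                "name": name,
                "content": json.dumps(result, ensure_ascii=False),
            }
        )
    return collected


def fake_find_menu_items(name: str, arguments: dict[str, Any]) -> list[dict[str, Any]]:
    # Stand-in for RagService.run_tool; returns one invented menu row within budget.
    if name != "find_menu_items" or (arguments.get("max_price") or 0) < 390:
        return []
    return [{"name": "Шаурма острая", "price": 390}]


messages = [{"role": "user", "content": "Что острое до 400 рублей?"}]
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "find_menu_items", "arguments": '{"query": "острое", "max_price": 400}'},
        }
    ],
}
rows = append_tool_results(messages, assistant, fake_find_menu_items)
# messages now ends with the assistant echo and one "tool" message for call_1
```

Because every result is keyed by `tool_call_id`, the model can match results to requests even when it issues several calls in one turn.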
@@ -0,0 +1,226 @@
from __future__ import annotations

import re
from datetime import datetime, timezone

import httpx
from bs4 import BeautifulSoup

from .config import settings
from .models import KnowledgeDocument


MAP_PATTERN = re.compile(
    r"yandexMaps\.addMap\('([^']+)'\s*,\s*'([^']+)'\s*,\s*'([^']+)'\)"
)


def normalize_spaces(value: str) -> str:
    return " ".join(value.replace("\xa0", " ").split())


def deduplicate_preserving_order(values: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for value in values:
        if value and value not in seen:
            seen.add(value)
            result.append(value)
    return result


def is_meaningful_value(value: str) -> bool:
    return any(char.isalnum() for char in value)


class SiteKnowledgeScraper:
    ABOUT_MARKER = "ТЕРРИТОРИЯ БЫСТРОГО ПИТАНИЯ В ВОЛГОГРАДЕ"
    MENU_MARKER = "МЕНЮ"
    DELIVERY_MARKER = "ДОСТАВКА"
    CONTACT_MARKER = "КОНТАКТЫ"
    CONTACT_END_MARKERS = ("Закрыть", "OK")

    def __init__(self) -> None:
        self.site_url = settings.site_url
        self.timeout = settings.request_timeout

    async def fetch_homepage(self) -> str:
        async with httpx.AsyncClient(
            headers={"User-Agent": "Mozilla/5.0"},
            follow_redirects=True,
            timeout=self.timeout,
        ) as client:
            response = await client.get(self.site_url)
            response.raise_for_status()
            return response.text

    def visible_strings(self, soup: BeautifulSoup) -> list[str]:
        return [
            normalized
            for text in soup.stripped_strings
            for normalized in [normalize_spaces(text)]
            if normalized and is_meaningful_value(normalized)
        ]

    def find_marker(self, values: list[str], marker: str, start: int = 0) -> int | None:
        for index in range(start, len(values)):
            if values[index] == marker:
                return index
        return None

    def find_last_marker(self, values: list[str], marker: str, start: int = 0) -> int | None:
        for index in range(len(values) - 1, start - 1, -1):
            if values[index] == marker:
                return index
        return None

    def slice_between_markers(
        self,
        values: list[str],
        start_marker: str,
        end_markers: tuple[str, ...],
        start_at: int = 0,
    ) -> list[str]:
        start_index = self.find_marker(values, start_marker, start_at)
        if start_index is None:
            return []

        end_index = len(values)
        for marker in end_markers:
            marker_index = self.find_marker(values, marker, start_index + 1)
            if marker_index is not None:
                end_index = min(end_index, marker_index)

        return values[start_index:end_index]

    def extract_social_links(self, soup: BeautifulSoup) -> list[str]:
        links: list[str] = []
        for node in soup.select("[data-page-link]"):
            href = node.get("data-page-link")
            label = normalize_spaces(node.get_text(" ", strip=True))
            if not href:
                continue
            if label:
                links.append(f"{label}: {href}")
            else:
                links.append(str(href))
        return deduplicate_preserving_order(links)

    def extract_map_coordinates(self, html: str) -> str | None:
        match = MAP_PATTERN.search(html)
        if not match:
            return None
        latitude = normalize_spaces(match.group(2))
        longitude = normalize_spaces(match.group(3))
        return f"{latitude}, {longitude}"

    def parse_homepage(self, html: str) -> list[KnowledgeDocument]:
        soup = BeautifulSoup(html, "html.parser")
        strings = self.visible_strings(soup)
        documents: list[KnowledgeDocument] = []
        scraped_at = datetime.now(timezone.utc)

        meta_description = soup.select_one('meta[name="description"]')
        if meta_description and meta_description.get("content"):
            documents.append(
                KnowledgeDocument(
                    doc_id="site-meta-description",
                    title="Краткое описание заведения",
                    text=normalize_spaces(meta_description["content"]),
                    source_type="about",
                    source_url=self.site_url,
                    metadata={"scraped_at": scraped_at.isoformat()},
                )
            )

        about_section = self.slice_between_markers(
            strings,
            self.ABOUT_MARKER,
            (self.MENU_MARKER,),
        )
        if about_section:
            documents.append(
                KnowledgeDocument(
                    doc_id="site-about",
                    title=about_section[0],
                    text="\n".join(deduplicate_preserving_order(about_section[1:])),
                    source_type="about",
                    source_url=self.site_url,
                    metadata={"scraped_at": scraped_at.isoformat()},
                )
            )

        social_links = self.extract_social_links(soup)
        if social_links:
            documents.append(
                KnowledgeDocument(
                    doc_id="site-links",
                    title="Соцсети и внешние площадки",
                    text="\n".join(social_links),
                    source_type="links",
                    source_url=self.site_url,
                    metadata={"scraped_at": scraped_at.isoformat()},
                )
            )

        menu_index = self.find_marker(strings, self.MENU_MARKER)
        delivery_start = self.find_last_marker(
            strings,
            self.DELIVERY_MARKER,
            start=(menu_index + 1) if menu_index is not None else 0,
        )
        contact_start = self.find_last_marker(
            strings,
            self.CONTACT_MARKER,
            start=(delivery_start + 1) if delivery_start is not None else 0,
        )
        delivery_section = (
            strings[delivery_start:contact_start]
            if delivery_start is not None and contact_start is not None and contact_start > delivery_start
            else []
        )
        if delivery_section:
            documents.append(
                KnowledgeDocument(
                    doc_id="site-delivery",
                    title=delivery_section[0],
                    text="\n".join(deduplicate_preserving_order(delivery_section[1:])),
                    source_type="delivery",
                    source_url=self.site_url,
                    metadata={"scraped_at": scraped_at.isoformat()},
                )
            )

        auth_index = len(strings)
        if contact_start is not None:
            for marker in self.CONTACT_END_MARKERS:
                marker_index = self.find_marker(strings, marker, contact_start + 1)
                if marker_index is not None:
                    auth_index = min(auth_index, marker_index)
        contact_section = (
            strings[contact_start:auth_index]
            if contact_start is not None and auth_index > contact_start
            else []
        )
        if contact_section:
            metadata = {"scraped_at": scraped_at.isoformat()}
            coordinates = self.extract_map_coordinates(html)
            if coordinates:
                metadata["map_coordinates"] = coordinates

            documents.append(
                KnowledgeDocument(
                    doc_id="site-contact",
                    title=contact_section[0],
                    text="\n".join(deduplicate_preserving_order(contact_section[1:])),
                    source_type="contact",
                    source_url=self.site_url,
                    metadata=metadata,
                )
            )

        return documents

    async def scrape(self) -> list[KnowledgeDocument]:
        html = await self.fetch_homepage()
        return self.parse_homepage(html)
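`SiteKnowledgeScraper` does not rely on CSS structure for the about, delivery and contact blocks. Instead, it works in two steps:

- it flattens the page into visible strings and cuts sections between exact-match heading markers;
- it extracts the map coordinates from an inline `yandexMaps.addMap(...)` call with a regex.

The sketch below applies both ideas to a hand-made list of strings and a made-up script snippet; the address, phone number and coordinates are invented, not taken from the real homepage. `slice_between` is a hypothetical standalone version of `slice_between_markers`, and the pattern is copied from `MAP_PATTERN`.

```python
from __future__ import annotations

import re

MAP_PATTERN = re.compile(
    r"yandexMaps\.addMap\('([^']+)'\s*,\s*'([^']+)'\s*,\s*'([^']+)'\)"
)


def slice_between(values: list[str], start_marker: str, end_markers: tuple[str, ...]) -> list[str]:
    """Return values from the first start_marker up to (not including) the nearest end marker."""
    if start_marker not in values:
        return []
    start = values.index(start_marker)
    end = len(values)
    for marker in end_markers:
        if marker in values[start + 1 :]:
            end = min(end, values.index(marker, start + 1))
    return values[start:end]


strings = ["КОНТАКТЫ", "ул. Примерная, 1", "+7 000 000-00-00", "Закрыть", "OK"]
contact = slice_between(strings, "КОНТАКТЫ", ("Закрыть", "OK"))
# contact == ["КОНТАКТЫ", "ул. Примерная, 1", "+7 000 000-00-00"]

html = "<script>yandexMaps.addMap('map-1', '48.70', '44.51')</script>"
match = MAP_PATTERN.search(html)
coordinates = f"{match.group(2)}, {match.group(3)}" if match else None
# coordinates == "48.70, 44.51"
```

Markers must match a whole visible string exactly. A small wording change on the site, such as a different heading case, therefore makes a section silently disappear instead of raising an error.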
@@ -0,0 +1,23 @@
from __future__ import annotations

from chromadb import PersistentClient
from chromadb.api.models.Collection import Collection
from chromadb.config import Settings as ChromaSettings

from .config import settings


class VectorStore:
    def __init__(self) -> None:
        chroma_settings = ChromaSettings(
            anonymized_telemetry=False,
            chroma_product_telemetry_impl="app.noop_telemetry.NoOpProductTelemetry",
            chroma_telemetry_impl="app.noop_telemetry.NoOpProductTelemetry",
        )
        self.client = PersistentClient(
            path=settings.chroma_path,
            settings=chroma_settings,
        )

    def get_collection(self, name: str) -> Collection:
        return self.client.get_or_create_collection(name=name)
@@ -0,0 +1,7 @@
beautifulsoup4==4.12.3
chromadb==1.0.8
fastapi==0.115.9
httpx==0.28.1
pydantic==2.11.4
transformers==4.57.1
uvicorn==0.34.2