I2pRfc/I2pRfc1002I2pSter


RFC 1002 - I2P Napster/DirectConnect like file sharing application

Status: INCOMPLETE DRAFT

Abstract

I2pSter is a searchable file sharing network for I2P. It is similar in topology to DirectConnect. It should allow swarmable downloads à la BitTorrent.

There is no formal name yet, so this document refers to the application as I2pSter.

Definitions

I2pSter is a file sharing network for I2P.
'peer' refers to a node running the I2pSter client software.
'server' refers to a centralized node that collects mappings from (file hash, filename, metadata) to the peers that offer the file.
'file hash' refers to a SHA-1 (or similar) hash of the file to uniquely identify it.
'block hash' refers to a SHA-1 (or similar) hash of an individual block in a file to uniquely identify it.
'block hash table' refers to a table of block hashes that either make up a file in its entirety or are the blocks that a peer currently has of an uncompleted download of a file.
'file metadata' refers to the data a peer stores regarding files. It can contain the filename, filesize, file hash, description, and optional fields.
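
As an illustration of the 'file hash' and 'block hash table' definitions, here is a minimal sketch in Python. The function names and the block size are assumptions made for the example; the draft does not fix $blocksize or an API.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # assumed value; the draft leaves $blocksize undefined


def file_hash(path):
    """Return the SHA-1 hex digest of the whole file."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in modest chunks so large files are not loaded into memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def block_hash_table(path, block_size=BLOCK_SIZE):
    """Return the SHA-1 hex digests of consecutive fixed-size blocks."""
    table = []
    with open(path, "rb") as handle:
        # The last block may be shorter than block_size.
        for block in iter(lambda: handle.read(block_size), b""):
            table.append(hashlib.sha1(block).hexdigest())
    return table
```

A peer would store both results in its local database and hand the table to other peers on request.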

Content

Warning, what follows is an unorganized and somewhat incoherent dump of various ideas and/or features for I2pSter. These will be parsed into something useful soon.

Brain Dump

each client generates a file hash for each file it shares. These are sent to the server at login along with the file name, size, and metadata. The server collects this as the list of files available on the network.
the data above is given a ttl by the server and the client is expected to 'renew' this data before the ttl expires.
the 'renew' procedure will be similar to a DHCP lease scheme: the client renews at 50% of the ttl; if that fails, it retries at 50% of the remaining time, and so on ad nauseam.
clients generate a block hash table for each file. This consists of hashing each $blocksize block of the file and storing the hashes in a local database. The tables will be provided to other clients that request the file.
client sends search terms to server; server sends back file hashes that match search terms and the destinations that can provide those file hashes.
client connects to each destination and requests full block hash table for file (optimize this in the future).
client parses the hash table and requests the $blocksize blocks by hash from the destinations.
client requests multiple blocks from multiple destinations at once (swarming)
even if a client has not finished downloading a file, as long as it has at least one block of that file it reports to the server that it has the file hash. This allows swarmable downloading from peers that have not yet finished downloading, à la BitTorrent.
clients also listen for requests and fulfill them.
the client that requests blocks verifies them against the block hash table and collects them into the target file.
at the end of the download, the file hash is verified to ensure no corruption.
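
The verification steps at the end of the list above could look roughly like the following sketch. It keeps blocks in memory and identifies them by their position in the block hash table; both are simplifications assumed for the example, since the draft requests blocks by hash and writes to a target file.

```python
import hashlib


def verify_block(data, expected_hash):
    """True if the SHA-1 of the received block matches the table entry."""
    return hashlib.sha1(data).hexdigest() == expected_hash


def assemble(blocks, table, expected_file_hash):
    """Join verified blocks in table order and check the whole-file hash.

    `blocks` maps block index -> bytes received from some peer.
    Raises ValueError if a block is missing or corrupt, or if the
    assembled content does not match the file hash.
    """
    parts = []
    for index, expected in enumerate(table):
        data = blocks.get(index)
        if data is None or not verify_block(data, expected):
            raise ValueError(f"block {index} missing or corrupt")
        parts.append(data)
    content = b"".join(parts)
    if hashlib.sha1(content).hexdigest() != expected_file_hash:
        raise ValueError("file hash mismatch")
    return content
```

Because each block is checked on its own, a corrupt block can be requested again from a different destination without discarding the rest of the download.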

Peers

Peer tasks:

At launch, and periodically while running, the peer will scan its 'shared', 'downloaded', and 'temp' directories for files. These files will be updated in the database if new or changed since the last scan. File and block hashes will be recalculated as needed and stored in the database.
At launch, and periodically (as defined in the metadata ttl description *future*) while running, the peer will update the server with its list of files and their file metadata.
The peer will open its destination in order to receive requests or data.
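
The periodic server update follows the DHCP-like renewal rule from the brain dump (renew at 50% of the ttl, then at 50% of whatever time remains). A small sketch of that schedule follows; the `min_interval` cut-off is an assumption added so the sequence terminates, as the draft does not say when a peer should give up.

```python
def renewal_times(ttl, min_interval=1.0):
    """Offsets (seconds after the ttl was granted) at which to try renewing.

    The first attempt is at 50% of the ttl. Each later attempt, needed only
    if the previous one failed, is at 50% of the time then remaining.
    Attempts stop once the next wait would be shorter than min_interval.
    """
    times = []
    elapsed = 0.0
    remaining = float(ttl)
    while remaining / 2 >= min_interval:
        elapsed += remaining / 2
        remaining /= 2
        times.append(elapsed)
    return times
```

For a ttl of 8 seconds and the default cut-off, the attempts fall at 4, 6, and 7 seconds. A successful renewal would reset the schedule with the new ttl.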

Server

Server tasks:

Open up destination for requests
Handle metadata updates (expound)
Handle search requests (expound)
Expire data in database according to metadata ttl description (*future*)
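
To make the server tasks more concrete, here is a rough in-memory sketch of a metadata index with ttl expiry and a naive filename search. The class name, method names, default ttl, and substring matching are all assumptions for illustration; the draft has not yet specified how updates or searches work.

```python
import time


class FileIndex:
    """In-memory map of file hash -> metadata and the peers offering it."""

    def __init__(self, ttl=3600.0):
        self.ttl = ttl        # assumed default; the draft sets no value
        self.metadata = {}    # file hash -> metadata dict
        self.peers = {}       # file hash -> {destination: expiry time}

    def update(self, destination, file_hash, meta, now=None):
        """Register or renew one peer's claim to a file."""
        now = time.time() if now is None else now
        self.metadata.setdefault(file_hash, meta)
        self.peers.setdefault(file_hash, {})[destination] = now + self.ttl

    def expire(self, now=None):
        """Drop claims whose ttl has passed, and files with no peers left."""
        now = time.time() if now is None else now
        for file_hash in list(self.peers):
            live = {d: t for d, t in self.peers[file_hash].items() if t > now}
            if live:
                self.peers[file_hash] = live
            else:
                del self.peers[file_hash]
                del self.metadata[file_hash]

    def search(self, term, now=None):
        """Return {file hash: [destinations]} for filenames containing term."""
        self.expire(now)
        term = term.lower()
        return {
            h: sorted(self.peers[h])
            for h, meta in self.metadata.items()
            if term in meta.get("filename", "").lower()
        }
```

Expiring lazily inside `search` keeps the sketch short; a real server would more likely run expiry on a timer and persist the index in a database.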

Protocol

An XML-RPC based protocol is under consideration. It remains to be seen whether Python's xmlrpclib can easily be overridden to use the SAM libraries for socket connections.
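
Independent of how the transport question is settled, the XML-RPC payloads themselves can be built and parsed without any socket. The sketch below uses `xmlrpc.client`, the Python 3 name of xmlrpclib; the `search` method name, its parameter, and the response shape are hypothetical, since the draft defines no RPC methods yet.

```python
import xmlrpc.client

# Hypothetical method name and parameter; not defined by the draft.
request = xmlrpc.client.dumps(("example terms",), methodname="search")

# On the receiving side the same module decodes the call.
params, method = xmlrpc.client.loads(request)

# A possible response: file hash -> list of destinations (placeholder values).
response = xmlrpc.client.dumps(
    ({"abc123": ["destination-placeholder"]},), methodresponse=True
)
(result,), _ = xmlrpc.client.loads(response)
```

The resulting strings could then be carried over whatever stream a SAM-based connection provides, which is the part that still needs investigation.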

File Metadata

Contains file name, search terms, description, file hash, file size, protocol (I2pSter or BT)
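
One possible in-memory shape for this record is sketched below. The field names, types, and defaults are assumptions chosen for the example and are not part of the draft.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class FileMetadata:
    file_name: str
    file_hash: str
    file_size: int
    description: str = ""
    search_terms: list = field(default_factory=list)
    protocol: str = "I2pSter"  # or "BT"


# Placeholder values only.
meta = FileMetadata("example.iso", "0" * 40, 1024, search_terms=["example"])
record = asdict(meta)  # plain dict, convenient for a dict-based wire format
```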

Questions

How to handle when two peers claim to have the same file (via a file hash) however return differing block hash tables?

Flaws

Centralized-ish design is prone to attack on the central server.
These attacks may include a traffic DoS attack, a database poisoning attack, etc.

Current TODO

Organize brain dump into something useful.
Everything, really.

Roadmap

v0.0.1

Client App: something
Server App: something

v0.0.2

Client App: something
Server App: something

v0.1

Feature complete
Client App: something
Server App: something

References

none at this time

Author Contacts

UserJdot

Comments

10:58 < jrandom2p> what are your thoughts wrt chat along side that - will they want to use irc or will the server allow (persistent?) chat through it? 
10:58 < jrandom2p> also, the ability to query a peer you're already talking to for what files they have wuold be neat (and would encourage social interactions)
11:02 < jdot> chat was not part of the initial plan, but i'm hoping the arch and protocol design will allow for that extensibility 
11:02 < jdot> wrt to your second comment, i had not thought of that.  that is definately a needed feature 
11:04 < jrandom2p> with the swarming, would the server be operating as a tracker, or more of a meta-tracker, where each peer tracks their files?  or is it just swarming w/out the bt economics stuff? 
11:08 < jdot> jrandom2p: initially, no economics.  once peers have exhausted their known destinations for needed blocks, they would re-query the server for more peers to talk to and also re-query exiting peers to see if they have more blocks

I fail to see any advantages over BitTorrent, except perhaps the ease of inserting new files and a centralized search server. Would it really be worth the trouble of implementing a new client and server? I have always considered the DirectConnect protocol a failure, even though they have large hubs and communities. In my opinion, I2P could rather use a fully distributed P2P network, if it needs a new P2P protocol at all. -- UghaBugha

My 2 cents go to kademlia decentralized source finding, as in emule. As long as it's prototyping phase, Kad is practically as easy to implement as anything (I have a java/binaryprotocol prototype :D). -- stoxx

How to handle when two peers claim to have the same file (via a file hash) however return differing block hash tables? Hash tree. The hash of a file is the hash of its parts' hashes. When one partial hash changes, the filehash changes as well. -- stoxx
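
A minimal sketch of the hash-tree idea in the comment above, in its simplest one-level form: the file hash is computed over the block hashes rather than over the raw file. The function names are hypothetical, and this changes the definition of 'file hash' used in the draft, so it is only one way the open question might be resolved.

```python
import hashlib


def tree_file_hash(block_hashes):
    """SHA-1 over the concatenated raw block hashes (one-level hash tree)."""
    digest = hashlib.sha1()
    for hex_hash in block_hashes:
        digest.update(bytes.fromhex(hex_hash))
    return digest.hexdigest()


def table_matches(file_hash, block_hashes):
    """True if a peer's block hash table is consistent with the file hash."""
    return tree_file_hash(block_hashes) == file_hash
```

With this scheme a downloader can reject a differing block hash table before fetching any block, because a table that does not hash to the advertised file hash cannot belong to that file.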
