#archiveteam-bs 2016-04-11,Mon


Time Nickname Message
00:06 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:09 🔗 dashcloud has joined #archiveteam-bs
00:15 🔗 FalconK has quit IRC (Ping timeout: 260 seconds)
00:16 🔗 FalconK has joined #archiveteam-bs
00:50 🔗 sigkell has quit IRC (Ping timeout: 260 seconds)
00:58 🔗 Frogging am 19, can confirm that disk problems sound worrisome
00:59 🔗 Stiletto has quit IRC (Read error: Operation timed out)
01:00 🔗 sigkell has joined #archiveteam-bs
01:00 🔗 JesseW snicker
01:14 🔗 yipdw_ oh nice, https://github.com/ptal/expected is being proposed with Haskell do-ish notation for C++
01:17 🔗 yipdw_ so you can write x <- failure_or_result(); y <- failure_or_result(); return op(x, y); and the compiler will generate appropriate things to propagate errors
01:17 🔗 Stiletto has joined #archiveteam-bs
01:17 🔗 yipdw_ this has code concision benefits, but mostly I like it because it means I can use more PragmataPro ligatures
01:45 🔗 Atros has quit IRC (Read error: Operation timed out)
01:45 🔗 atrocity has joined #archiveteam-bs
01:59 🔗 atrocity and as usual, i have a lot more upload bandwidth than download
01:59 🔗 atrocity i'll never understand this
02:01 🔗 atrocity 62 down, 68 up, but i'm rated at 50/50
02:01 🔗 atrocity not complaining, just weird that up is so much faster
02:04 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:05 🔗 SN4T14 has quit IRC (Read error: Connection reset by peer)
02:08 🔗 dashcloud has joined #archiveteam-bs
03:16 🔗 SketchCow By the way, for the record, I've been downloading from that site everyone got banned from (bootlegs) for 4 days now, no ban.
03:16 🔗 SketchCow Why? Because I didn't attack a long-standing site with 10 parallel hits, that's why
03:17 🔗 SketchCow Also, on FOS, I've now initiated 6 separate uploading processes. (Creating the MegaWARCs, uploading)
03:18 🔗 SketchCow Eventually, I'm making a new uploader script that is centralized, puts the megawarc completions on a site with reverse viewing, and that notifies IA of the work, etc.
03:18 🔗 SketchCow I just need to sit down to do all that.
03:18 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:19 🔗 bwn has quit IRC (Ping timeout: 492 seconds)
03:20 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
03:22 🔗 dashcloud has joined #archiveteam-bs
03:22 🔗 BnA-Rob1n has joined #archiveteam-bs
03:22 🔗 Simpbrai_ has quit IRC (Ping timeout: 244 seconds)
03:22 🔗 Simpbrai_ has joined #archiveteam-bs
03:40 🔗 SketchCow Spent the weekend planning out the Japan trip. Japan trip will be nuts. I'm essentially off the grid.
03:41 🔗 SketchCow If I come back and there's a coup, I'm getting on my horse with a samurai sword and you should all inform your relatives you should be considered dead.
03:41 🔗 SketchCow Other than that, it'll be a good time. (End of May to end of June)
03:42 🔗 BlueMaxim Why are you going to Japan, out of curiosity?
03:42 🔗 phuzion Nice! Have fun in Japan, SketchCow
03:43 🔗 phuzion (Yes, I realize it's quite a ways away still)
04:00 🔗 JesseW http://archiveteam.org/index.php?title=Internet_Archive/Collections -- list of all the IA collections that contain other collections (at least, based on the recheck of the identifiers provided back in March 2015)
04:02 🔗 yipdw_ it occurred to me that I have not seen a chibi Jason Scott
04:04 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:12 🔗 RedType kawaii~
04:12 🔗 bwn has joined #archiveteam-bs
04:13 🔗 Sk1d has joined #archiveteam-bs
04:43 🔗 metalcamp has joined #archiveteam-bs
04:48 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
04:56 🔗 wyatt8740 the latest XKCD has me rolling on the floor by the end of it. it all accelerated so smoothly.
05:05 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
05:10 🔗 Stilett0 has joined #archiveteam-bs
05:11 🔗 beardicus has quit IRC (Read error: Operation timed out)
05:12 🔗 Stiletto has quit IRC (Read error: Operation timed out)
05:12 🔗 schbirid has joined #archiveteam-bs
05:13 🔗 Jonimus has quit IRC (Read error: Operation timed out)
05:16 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
05:17 🔗 Honno has joined #archiveteam-bs
05:35 🔗 phuzion wyatt8740: Haha wow, I need to catch up on xkcd apparently
05:37 🔗 JesseW https://archive.org/stream/the-patch-19xx/the_patch.19xx#page/n0/mode/1up <- THE PATCH! "Wouldn't you like to access YOUR entire typeset?"
05:44 🔗 beardicus has joined #archiveteam-bs
05:44 🔗 Mayonaise has joined #archiveteam-bs
06:32 🔗 metalcamp has joined #archiveteam-bs
06:44 🔗 Jonimus has joined #archiveteam-bs
06:44 🔗 swebb sets mode: +o Jonimus
06:53 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
07:20 🔗 Honno has quit IRC (Read error: Operation timed out)
07:31 🔗 bwn has quit IRC (Read error: Operation timed out)
07:59 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:05 🔗 metalcamp has joined #archiveteam-bs
08:07 🔗 zenguy has quit IRC (Read error: Operation timed out)
08:11 🔗 bwn has joined #archiveteam-bs
08:46 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:49 🔗 dashcloud has joined #archiveteam-bs
09:33 🔗 lbft has quit IRC (Quit: Bye)
09:34 🔗 lbft has joined #archiveteam-bs
09:43 🔗 lbft has quit IRC (Quit: Bye)
09:47 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
10:55 🔗 atrocity SketchCow: isn't going to japan and being off the grid like going to a LEGO convention and playing with Megablox?
11:51 🔗 metalcamp has joined #archiveteam-bs
12:37 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
13:04 🔗 SN4T14 has joined #archiveteam-bs
13:27 🔗 GLaDOS has quit IRC (Ping timeout: 260 seconds)
13:30 🔗 GLaDOS has joined #archiveteam-bs
13:33 🔗 VADemon has joined #archiveteam-bs
13:49 🔗 Stilett0 is now known as Stiletto
13:56 🔗 Honno has joined #archiveteam-bs
13:59 🔗 Start has quit IRC (Quit: Disconnected.)
14:29 🔗 Yoshimura has joined #archiveteam-bs
14:31 🔗 Kaz has joined #archiveteam-bs
14:32 🔗 kurt has quit IRC (Quit: leaving)
14:48 🔗 SketchCow http://www.storybench.org/to-scrape-or-not-to-scrape-the-technical-and-ethical-challenges-of-collecting-data-off-the-web/
14:49 🔗 SketchCow EVERYONE NOT FOLLOW THAT PLEASE
14:49 🔗 SketchCow I suspect I will check in, in Japan, but my time will equally be spent walking, going to events, and generally not being able to react to anything online.
14:54 🔗 Start has joined #archiveteam-bs
14:58 🔗 Yoshimura I don't get why one wouldn't scrape data when they have an API
14:58 🔗 Yoshimura Or whether that advice excludes the API
15:21 🔗 JesseW has joined #archiveteam-bs
15:57 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
16:05 🔗 Stiletto has quit IRC (Ping timeout: 260 seconds)
16:06 🔗 Start has quit IRC (Quit: Disconnected.)
16:43 🔗 arkiver SketchCow: from when to when are you not available?
16:43 🔗 arkiver err nevermind
16:43 🔗 arkiver "End of May to end of June"
16:49 🔗 Atluxity we got time to plan the coup...
16:52 🔗 * arkiver muhahaha
16:59 🔗 Honno never thought I'd say this but I'm so happy my 2TB hard drive came with free one-day delivery
16:59 🔗 Honno have to get rid of my optical drive to use it tho
17:00 🔗 Honno fortunately, who uses disks anymore
17:02 🔗 Stiletto has joined #archiveteam-bs
17:04 🔗 Yoshimura I do use discs. The only externally connected thing is an optical drive.
17:04 🔗 Yoshimura Optical media are kind of superior. Also if you are archiving old stuff, you need to create images.
17:04 🔗 Honno probably shouldn't have said that in an archivist irc haha
17:05 🔗 * HCross2 grabs hammer of justice
17:05 🔗 Honno oh do you guys find that, huh
17:07 🔗 * yipdw_ has had optical media fail in cold storage and these days just keeps stuff powered on disks
17:07 🔗 yipdw_ but I fear I have incited a nerd riot by stating that, so I will just pass that off as anecdotal and exit
17:08 🔗 wyatt8740 JesseW: "THE PATCH (tm)"
17:08 🔗 xmc eh, i keep things on spinning rust too
17:08 🔗 wyatt8740 I only really use optical media for playing sega CD software
17:08 🔗 wyatt8740 of which there's barely any good stuff
17:09 🔗 wyatt8740 I have all my stuff on spinning rust
17:09 🔗 wyatt8740 :\
17:09 🔗 yipdw_ damnit I should have followed the topic advice
17:09 🔗 xmc :P
17:10 🔗 wyatt8740 another sign that optical media is dying http://i.imgur.com/rCROKbJ.jpg
17:10 🔗 wyatt8740 also Taiyo Yuden's partnership with JVC for CD-Rs stopped this January
17:29 🔗 Honno has quit IRC (Read error: Operation timed out)
17:37 🔗 Yoshimura I wonder how Archive.org stores the data. Anyone know?
17:38 🔗 Yoshimura yipdw_: Both ways of storage have their merits
17:38 🔗 yipdw_ I fucking knew it
17:38 🔗 * yipdw_ mute
17:38 🔗 zino unmute
17:41 🔗 joepie91 Yoshimura: live HDDs for digital data
17:41 🔗 joepie91 also physical books etc
17:41 🔗 Yoshimura joepie91: No, I meant the actual on-disk format.
17:42 🔗 Yoshimura If they compress blocks, or if they compress each page separately, or if they use large warcs and parse them.
17:42 🔗 xmc IT'S WARC
17:42 🔗 Yoshimura Warc is just a format; their on-disk storage might be different.
17:43 🔗 xmc nope
17:43 🔗 xmc listen to me
17:43 🔗 xmc it's warc
17:43 🔗 Yoshimura So they store gzipped warc files?
17:43 🔗 xmc did i say something confusing?
17:43 🔗 Yoshimura How large? And if one wants one page, they decompress the whole large warc?
17:44 🔗 xmc how large depends on how big the file from the crawler is
17:44 🔗 xmc they use a different gzip stream per file with a cdx to tell them where to seek and start decompressing
17:44 🔗 Yoshimura So they just use index file for web and then decompress huge files to get one page?
17:44 🔗 xmc they use a different gzip stream per file with a cdx to tell them where to seek and start decompressing
17:45 🔗 Yoshimura They did talk about special, very efficient storage, and this sounds like a dumb idea.
17:45 🔗 yipdw_ .warc.gz is a concatenation of gzipped warc records; you don't need to decompress everything
17:45 🔗 xmc ok
17:45 🔗 yipdw_ you seek to the offset and decompress just that
17:45 🔗 Yoshimura What you said does not make sense to me, but ok.
17:45 🔗 xmc that's ok it doesn't have to make sense
17:45 🔗 xmc it works
17:46 🔗 Yoshimura It does.
17:46 🔗 Yoshimura What would make sense is that it's a gzip file, with per-warc-record blocks.
17:46 🔗 xmc yep, and we're glad that it does work :)
17:46 🔗 Yoshimura I meant it does have to make sense xD
17:46 🔗 yipdw_ it does, and I suspect we're talking about the same thing
17:46 🔗 Yoshimura No, gzip has headers.
17:46 🔗 yipdw_ please analyze the structure of these warc.gz files generated by Archive Team
17:47 🔗 Yoshimura So concatenation would mean each record has a gzip file header.
17:47 🔗 yipdw_ yeah
17:47 🔗 yipdw_ they do
17:47 🔗 Yoshimura And gzip can also decompress it as just one file?
17:47 🔗 yipdw_ yes, or as individual records
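[A minimal sketch of the structure being described here, in Python with only the standard library; the record bytes and offsets are made up for illustration. Each record is written as its own gzip member, so the concatenation is both one valid gzip file and individually seekable:]

    import gzip, io, zlib

    buf = io.BytesIO()
    offsets = []
    for record in [b"record one\r\n", b"record two\r\n"]:
        offsets.append(buf.tell())         # per-member byte offset, what a CDX stores
        buf.write(gzip.compress(record))   # one independent gzip member per record
    data = buf.getvalue()

    # The whole stream decompresses as a single file...
    assert gzip.decompress(data) == b"record one\r\nrecord two\r\n"

    # ...or seek straight to one member and decompress only that record.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)    # 16+MAX_WBITS: expect a gzip wrapper
    assert d.decompress(data[offsets[1]:]) == b"record two\r\n"   # stops at member end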
17:47 🔗 Yoshimura I have to then take a second look at gzip format.
17:48 🔗 Yoshimura Ok, thanks, that helped a lot. Still don't see what the special, very efficient storage is in this.
17:48 🔗 yipdw_ I don't know who told you that it is special or very efficient
17:48 🔗 Yoshimura Compressing together multiple files, or even using xdelta for historical versions, or a common dictionary would make it work much much better.
17:48 🔗 yipdw_ it's not hyper-efficient but it's good enough for now
17:49 🔗 yipdw_ yes, and it'd cause complications elsewhere
17:49 🔗 Yoshimura I read that on archive.org pages few months back.
17:49 🔗 yipdw_ please submit your ideas to the Internet Archive, their engineers will be happy to solicit feedback from the Internet
17:49 🔗 Yoshimura No complications if you use deflate with a pre-seeded dictionary.
17:49 🔗 yipdw_ the Petabox hardware is (was?) novel, if that's what you mean
17:50 🔗 Start has joined #archiveteam-bs
17:50 🔗 yipdw_ but that's still an assemblage of off-the-shelf parts, arranged to reduce administrator overhead
17:50 🔗 xmc sure, any competent engineer could squeeze out another 10%. but when you put the same data in less space you add fragility
17:50 🔗 xmc fragility is not a goal of the internet archive
17:50 🔗 Yoshimura xmc: I am talking about a lot more than 10% while introducing next to no fragility, if you are concerned with that.
17:51 🔗 xmc why are you seeking to argue with me over this?
17:51 🔗 xmc you asked how it works, i explained how it works
17:51 🔗 xmc neither of us are in a position to change anything
17:51 🔗 Yoshimura xmc: I am not, sorry if it looks like that.
17:51 🔗 xmc re fucking lax
17:52 🔗 yipdw_ well at least nobody mentioned blockchains
17:52 🔗 xmc the world is not up to your standards and you're just going to have to be ok with it
17:52 🔗 xmc if you want change you should work at the archive and push it through
17:52 🔗 xmc i'm sure they'd be happy to have more competent engineers
17:52 🔗 bwn has quit IRC (Ping timeout: 246 seconds)
17:53 🔗 Yoshimura It does make sense if you have enough space to store it in a simple format, yes. Still, deflate with a preseeded dictionary would make it a lot more efficient (for html, or even image headers...) while adding next to no change, since zlib, the standard implementation behind gzip, supports it, and deflate is just gzip without the header.
17:53 🔗 Yoshimura I am from another continent, so working there could be a problem; otherwise I would gladly do that.
17:54 🔗 Yoshimura yakfish: Maybe it was written in a way that confused the soft and hard parts. Not sure
17:55 🔗 yipdw_ a preseed dictionary might improve compression ratio, but it would also complicate access
17:55 🔗 yipdw_ if you lose the dictionary -- whether it be a separate file, or a separate part of the stream -- you're in trouble
17:55 🔗 yipdw_ one advantage of concatenated gzipped warc records is that you have many ways to access that stream, and you can recover parts of it without much hassle
17:56 🔗 yipdw_ so you need to balance that in too. accessibility is a major concern alongside storage efficiency
17:57 🔗 Yoshimura yipdw_: Well, I meant the same way, per-record deflate, of course.
17:58 🔗 yipdw_ if you can benchmark this and demonstrate significant gains on typical datasets that would be interesting
17:58 🔗 yipdw_ bonus points if existing replay tools can use it with no changes
17:58 🔗 Yoshimura ok, good idea.
17:58 🔗 Yoshimura IDK what replay tools are.
17:58 🔗 yipdw_ the other half of the equation that makes everything we do actually useful
17:58 🔗 yipdw_ Wayback Machine, pywb, webarchiveplayer are some examples
17:59 🔗 Yoshimura And the dictionary would be a standard thing that should not be lost. If the warc were compressed in a way where only the body of the webpage is gzipped in the final format, so you could just stream that chunk to the browser, it would make sense.
17:59 🔗 yipdw_ please do investigate
18:00 🔗 Yoshimura But if it's the whole warc and you do not replay the whole headers, that means you still need to decompress, and that would make other (potentially time-expensive but very efficient (xdelta)) compression much better.
18:00 🔗 Yoshimura But if you still want to keep speed plus almost total compatibility (same library, different calls), then deflate would be the way. I will.
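[A sketch of the per-record deflate-with-preset-dictionary idea, using Python zlib's zdict support; the dictionary below is a made-up stand-in for the carefully curated one discussed here, and losing it makes the data unreadable, which is exactly the accessibility trade-off raised above:]

    import zlib

    # Hypothetical shared dictionary: boilerplate common across the corpus.
    ZDICT = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>'

    record = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>hi</title></head></html>'

    # Raw deflate (negative wbits) with a preset dictionary, one record at a time.
    c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS, zdict=ZDICT)
    packed = c.compress(record) + c.flush()

    d = zlib.decompressobj(-zlib.MAX_WBITS, zdict=ZDICT)
    assert d.decompress(packed) == record   # round-trips only while ZDICT is intact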
18:01 🔗 yipdw_ total compatibility
18:01 🔗 Yoshimura Total is not doable.
18:01 🔗 yipdw_ almost will make for an interesting experiment, but it's a barrier to accessibility
18:01 🔗 Yoshimura Not much. Needs a few lines of code change.
18:02 🔗 yipdw_ please check out http://archiveteam.org/index.php?title=The_WARC_Ecosystem
18:03 🔗 yipdw_ maintaining compatibility amongst these tools is a lot more important than saving a dozen gigabytes in a 50 GB file
18:03 🔗 Yoshimura Building a dictionary is tough work. Needs either software that can analyze it well (an LCS tree with statistical outputs built alongside the tree, or similar) or a lot of manual labor.
18:03 🔗 Yoshimura Okay. Will check it, but if someone wants 100% compatibility and there is no other way, then my investigation would be worthless. (Not to me though, so I might do it anyway)
18:04 🔗 Yoshimura Wget dedup is broken btw.
18:05 🔗 Honno has joined #archiveteam-bs
18:05 🔗 DFJustin https://www.opposuits.com/pac-man-suit.html
18:05 🔗 xmc hawt
18:07 🔗 Yoshimura https://github.com/ArchiveTeam/archiveteam-megawarc-factory#2-the-packer
18:08 🔗 Yoshimura Packing involves lots of gzipping and takes some time.
18:08 🔗 Yoshimura If they are just concatenated, why does it involve a lot of gzipping?
18:09 🔗 xmc all your questions can be answered by reading this https://github.com/ArchiveTeam/megawarc/blob/f77638dbf7d0c4a7dd301217ee04fbc6a3c3ebbf/megawarc
18:09 🔗 Start has quit IRC (Quit: Disconnected.)
18:14 🔗 Yoshimura All I found is that it tests the gzip files and uses gzip for that.
18:14 🔗 Yoshimura Which is honestly retarded.
18:15 🔗 Yoshimura As it has to spawn a lot of processes, and the warc files are tiny, I can see most time spent on spawning the processes rather than on the actual tests.
18:17 🔗 Yoshimura All the other gzip operations are handled in Python via zlib, and they are only used for json metadata. So the README.md could be making a false statement or be misleading.
18:17 🔗 Honno_ has joined #archiveteam-bs
18:18 🔗 Yoshimura Not sure about the capability of Python, but in the best case it would take only a single process and would reuse the handle to the library for each warc record.
18:18 🔗 Yoshimura In the worst case, it would re-init the library handles.
18:20 🔗 Yoshimura And also, while I dislike Python personally, one has to ask why the whole megawarc-factory is not scripted as one program, perhaps in Python, using workers with thread-safe queues.
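[A sketch of the in-process check being proposed here: verify a gzip member with zlib instead of spawning a gzip process per warc. This is illustrative, not the actual megawarc test_gz code, and a real patch would feed data in chunks rather than slicing the whole file into memory:]

    import zlib

    def member_ok(data, offset):
        """Return (valid, offset of the next member) for the gzip member at offset."""
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)   # expect a gzip-wrapped stream
        try:
            d.decompress(data[offset:])
        except zlib.error:
            return False, None
        if not d.eof:                       # stream ended mid-member: truncated
            return False, None
        return True, len(data) - len(d.unused_data)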
18:21 🔗 Honno has quit IRC (Read error: Operation timed out)
18:22 🔗 Start has joined #archiveteam-bs
18:25 🔗 bwn has joined #archiveteam-bs
18:29 🔗 Honno has joined #archiveteam-bs
18:36 🔗 ndizzle has quit IRC (Read error: Connection reset by peer)
18:38 🔗 Honno_ has quit IRC (Read error: Operation timed out)
18:38 🔗 ndiddy has joined #archiveteam-bs
18:45 🔗 Yoshimura So after going through the info, I conclude that gzip is nice, while deflate with a dictionary and new tools (one, in different languages) would add little complexity for potentially great benefits (to be determined experimentally with a crappy amateur-with-insight dictionary). Those tools would be mere wrappers for zlib, just like gzip, which only has
18:45 🔗 Yoshimura an additional header. In this case plus a dictionary, which would be elaborately selected over time by a group of experts with input from amateurs around the world. So in the context of time, there could be only one universal dictionary for the course of many, many years. Or optionally language-specific ones, or framework-relative ones. ... Basically this
18:45 🔗 Yoshimura process would be the same as HTTP/2, which does use a predefined dictionary to compress HTTP headers and both saves BW and improves latency at the same time.
18:46 🔗 RedType has left
18:48 🔗 zino External gzip is nice. It can be shadowed by pigz for multithreaded goodness.
18:49 🔗 Yoshimura zino: You fail to understand.
18:49 🔗 Yoshimura It does iterate over warc records, unless I misunderstood.
18:49 🔗 Yoshimura So pigz has no place there; also, it's made to test the file for errors. So I would only use gzip or zlib.
18:50 🔗 Yoshimura If space were a concern over CPU, xdelta or per-line diffs could be incorporated, having per-site-specific dictionaries. Which could be made compatible with the HTTP/1.1 SDCH extension. ... This could be a great thing with great potential for personal or small-scale archiving. But if someone does not care much about space or money, deflate would be enough.
18:50 🔗 Yoshimura Over time, though, CPU power is getting cheaper, while storage prices go down only marginally.
18:53 🔗 zino Oh, I missed the upper part of your wall of text. I missed that you were talking academic advantages, not current engineering optimizations.
18:54 🔗 Yoshimura arkiver: Asking you here xD
18:54 🔗 arkiver https://github.com/ArchiveTeam/fotolog-grab/blob/master/fotolog.lua#L122-L126
18:54 🔗 arkiver just some little check
18:54 🔗 arkiver but it works
18:54 🔗 Yoshimura zino: Actually I was talking in the context of the megawarc tool, which in my opinion is not bad though not great. The megawarc factory is another story though.
18:55 🔗 xmc do you want to improve megawarc? then make it better!
18:55 🔗 xmc patches are gladly accepted
18:55 🔗 Yoshimura xmc: I would rewrite it xD
18:55 🔗 xmc ok
18:55 🔗 xmc do it
18:55 🔗 Yoshimura Also not python, which might be... against some people.
18:56 🔗 xmc i promise if it works just as well & is faster, then it'll be used
18:56 🔗 xmc also if it doesn't require six hours of sysadmin work to put into place
18:56 🔗 Yoshimura But I might try to make a patch for the gzip madness. Which is what I've talked about all day today: replacing the gzip binary with zlib's gzip to test the files.
18:57 🔗 Yoshimura I guess I have to download some megawarc to test it then xD
18:58 🔗 Yoshimura The only way I am ok with Python is its widespread use, other people knowing it, and that it does the job. I do not dislike Python, just its syntax. xmc: you are in charge / can I get input from you on how long it takes to process one megawarc (packing)?
18:59 🔗 xmc if you're not going to use python then what do you have in mind?
18:59 🔗 Yoshimura Anything that works. C, C++, Ruby, other compiled language. But if current tools work, though ugly, I have other better things to work on honestly.
19:00 🔗 xmc use python if you can
19:00 🔗 xmc for consistency
19:01 🔗 xmc ruby isn't compiled
19:01 🔗 Yoshimura From a functionality perspective, just a patch to use zlib via bindings rather than the binary for testing the warc records would solve the major culprit. No sysadmin (yep) is keen to see tons of processes.
19:01 🔗 Yoshimura Yes, I am aware it's not compiled. xmc: can you tell me, or get from someone, how long it takes to pack one megawarc?
19:02 🔗 * xmc shrugs
19:02 🔗 xmc a while
19:02 🔗 JW_work has joined #archiveteam-bs
19:02 🔗 xmc i don't run the machine where that happens
19:03 🔗 xmc but i'd guess about half an hour for a 50G megawarc
19:03 🔗 Yoshimura Yeah, a day or two wait is fine.
19:03 🔗 Yoshimura Does the half hour exclude upload?
19:03 🔗 Yoshimura If yes, it's insane.
19:03 🔗 JW_work xmc: yes, but do you want to "be in the machine where it happens" (hums the song)
19:04 🔗 JW_work Yoshimura: You might be interested in IPFS, if you haven't heard of it. They do some of the creative storage ideas you have been mentioning, IIRC.
19:05 🔗 Yoshimura Holy molly, looks nice.
19:05 🔗 JW_work yeah IPFS is a pretty impressive piece of work
19:05 🔗 yipdw_ packing one megawarc depends primarily on the I/O bandwidth you can throw at it; other factors like compressibility can also influence time
19:06 🔗 yipdw_ e.g. records containing video can take more time
19:06 🔗 yipdw_ all of which is a way of saying "it's variable and I don't care because I just start the process and let it go"
19:06 🔗 Yoshimura I will be honest now. Although I've seen a lot of stuff already, I had the same or (much) greater ideas than what's implemented today, but 8 years ago.
19:07 🔗 JW_work that's nice — are they ready for us to drop in and use them? If not, thanks but no thanks.
19:07 🔗 Yoshimura Have to look more into this before I can talk. But one great example of a storage system that conforms to or approaches the quality of my designs (I am a lone person, never studied CS) is Cassandra
19:08 🔗 yipdw_ some people around here use Cassandra as a data-store for their projects, yes
19:08 🔗 yipdw_ we don't require it because we value the ability to get stuff up and running on some random Linux machine
19:08 🔗 yipdw_ and Cassandra, despite its merits, incurs operational complexity
19:09 🔗 Yoshimura yipdw_: even I/O could be leveraged. Look, if you run more than one worker on an HDD platter you are screwing I/O.
19:09 🔗 xmc archiveteam values: simplicity, reproducibility, completeness, portability
19:09 🔗 yipdw_ Yoshimura: yeah we know this, that's why some packers use SSDs
19:09 🔗 yipdw_ Yoshimura: can I please ask you to trust that we aren't idiots
19:09 🔗 xmc no
19:09 🔗 xmc you can't
19:09 🔗 Yoshimura That to me sounds like "we know it's shit so we throw more hardware at it"
19:09 🔗 xmc we are apparently incompetent and crazy
19:10 🔗 Yoshimura Sounds like "we want to kill you" to sysadmins
19:10 🔗 yipdw_ not really; the system we use was loaned to us by a sysadmin who had SSDs laying around
19:10 🔗 Yoshimura I do trust you more than most people I've met on the net. I think (or try to) objectively, which is how I get to a lot of my designs.
19:11 🔗 yipdw_ what I want to communicate is that if you want to rewrite things, that's fine. however these tools are in the state that they are primarily because they work, they have known behaviors, and they aren't really that broken
19:11 🔗 yipdw_ in cases we do consider removing bits when it's clear that we have hit their limits
19:11 🔗 xmc it's not the most efficient but it's sturdy
19:11 🔗 yipdw_ this is one reason why wpull exists, for example
19:12 🔗 Yoshimura But whoever wrote the test_gz routine was not that competent.
19:12 🔗 yipdw_ ok fine
19:12 🔗 yipdw_ please patch it
19:12 🔗 JW_work Yoshimura: just so you know, you are wearing out the patience of your audience. Actual running code, especially with stats showing it works as well or better than alternatives — yes, please. Random claims about your designs, or the competence of other people — no, thanks.
19:13 🔗 Yoshimura I meant that with absolute respect; no one is competent enough in everything. And the random claim was an offtopic note about your patience, sorry.
19:13 🔗 Yoshimura I took it as an educational discussion, not a complaint or rant.
19:14 🔗 Yoshimura This is my (major?) problem, a different way of understanding things; sorry about that, and please expect it in my case.
19:15 🔗 xmc well
19:15 🔗 xmc stop insulting people
19:15 🔗 yipdw_ with specific regard to why megawarc does what it does, I don't know. if you'd like to remove the gzip step and make it more akin to cat, then please do so
19:15 🔗 Yoshimura The problem is, some stuff does not sound like an insult to me.
19:15 🔗 yipdw_ if it generates valid output then it would probably give us a nice speed boost
19:15 🔗 SketchCow Borp
19:15 🔗 xmc Yoshimura: accept it & move on
19:15 🔗 SketchCow What the hell is going on in here
19:15 🔗 Yoshimura I do accept it, I moved on.
19:16 🔗 xmc lol
19:16 🔗 SketchCow So am I seeing another great, classic case of "YO HO EVERYBODY CALM DOWN, I AM HERE, I HAVE A NEW PARADIGM AND YOU PLEBES ARE GONNA LEARN JUST STEP BACK"
19:16 🔗 SketchCow Or are we seeing helpful advice from a brave new warrior.
19:17 🔗 yipdw_ no I think what we're seeing is just a language thing that we'll all work out soon
19:17 🔗 SketchCow Got it.
19:17 🔗 SketchCow My least favorite thing is when someone wants to rewrite X in Y
19:17 🔗 SketchCow Which is good or I'd make all you shits rewrite everything in bash
19:17 🔗 SketchCow #!/bin/sh forever
19:18 🔗 xmc let's rewrite wpull in haskell
19:18 🔗 yipdw_ it seems simultaneously appropriate and inappropriate to point out that /bin/sh isn't bash
19:18 🔗 yipdw_ but enough of that
19:18 🔗 * xmc laughs
19:18 🔗 JW_work bash? Bash?? What about ed?
19:18 🔗 * SketchCow rewrites yipdw_ in bash
19:18 🔗 schbirid baahaaahaaa
19:18 🔗 * yipdw_ is shellshocked
19:19 🔗 HCross nah, rewrite it all in FORTRAN
19:19 🔗 * JW_work heard of a terrible idea to rewrite a whole system purely in a custom SQL dialect once…
19:19 🔗 schbirid you know FORTRAN kinda means "runaway" in German
19:21 🔗 Yoshimura SketchCow: The part about "trying to show you, not only write it, how it works badly and better" is indirectly true. Yes, I am new to the warrior, have known about archiving for a while, and have already done work on my own. Sorry to sound like an annoying know-it-all. Having to work alone does wear away social comm skills; the technical ones remain.
19:21 🔗 zino Stop with the FORTRAN! It's enough that I have to deal with it on paid time.
19:22 🔗 yipdw_ Yoshimura: no problem. honestly if you can speed up megawarc I would be happy to try it out
19:22 🔗 yipdw_ to be honest, the bottleneck isn't really compression
19:22 🔗 Yoshimura Can you propose a download link to an HTML file to test on?
19:22 🔗 yipdw_ it's upload to IA
19:23 🔗 yipdw_ however if the speed up is due to reduced I/O then that can be nice alsoo
19:23 🔗 Yoshimura Yeah, I am aware. That's why I would like input or a test of my own before I spend time rewriting it whole.
19:23 🔗 Yoshimura The test_gz is done fine. The problem is handling a lot of files in a non-sequential manner.
19:24 🔗 SketchCow I'll point out gamefront files are 15gb apiece up on archive
19:24 🔗 yipdw_ yes, we have many megawarcs in the archiveteam collection, e.g.
19:24 🔗 yipdw_ https://archive.org/download/archiveteam_hyves_20131120141647
19:24 🔗 SketchCow That's your smallest.
19:24 🔗 yipdw_ gamefront is good too
19:24 🔗 yipdw_ e.g. https://archive.org/download/archiveteam_gamefront_20151112045523
19:25 🔗 yipdw_ to demonstrate speed increases, I suggest unpacking the megawarc into its components (individual WARCs and extra files); the megawarc program can do this
19:25 🔗 yipdw_ then repacking them with your proposed algorithm
19:25 🔗 Yoshimura Yes, that is my plan. Will try, will do, and will come back with it, or shut up about this single problem.
19:26 🔗 yipdw_ I'd also be interested in whether I/O load goes down
19:27 🔗 SketchCow So
19:27 🔗 SketchCow http://fos.textfiles.com/ARCHIVETEAM/
19:27 🔗 Yoshimura I have to analyze the script more thoroughly, then I can tell more. I/O might not go down in bandwidth, but it might get converted to more sequential access at the cost of re-reading a file, which on an HDD platter should be faster.
19:28 🔗 Yoshimura Yay! cheers
19:28 🔗 SketchCow Also, shout out to the three of you guys who messaged me WAIT NO DO NOT OBLITERATE THE NEWBIE
19:29 🔗 Yoshimura Thanks ^^
19:29 🔗 SketchCow As long as you realize there's 5 years of good decisions behind the current setup.
19:30 🔗 SketchCow With smart people making good passes, and divesting themselves of choices for reasons between efficiency and logic.
19:30 🔗 Yoshimura That's why I care. Stuff that is done poorly usually has people that do not accept constructive improvements.
19:30 🔗 SketchCow That's why I care that when you use phrases like "poorly" and "competent", you're being Linus without the cachet
19:31 🔗 SketchCow Or ...
19:31 🔗 * SketchCow looks around
19:31 🔗 SketchCow ...Theo
19:31 🔗 * zino hides
19:31 🔗 * SketchCow hears thunder
19:31 🔗 xmc theondr
19:32 🔗 yipdw_ obliterating the newbie is something I'm trying to purge from me
19:33 🔗 yipdw_ that and shitting on software
19:33 🔗 yipdw_ it's hard
19:33 🔗 yipdw_ or shitting on the author I guess
19:33 🔗 SketchCow So http://fos.textfiles.com/ARCHIVETEAM/ is the vanguard.
19:33 🔗 yipdw_ oh cool
19:33 🔗 SketchCow Now I have one script that packs up the items and then hands it to a general script.
19:33 🔗 SketchCow It used to be three, one in each folder.
19:33 🔗 Yoshimura Some social stuff is hard for different people. I grew up with tech, nature, not people.
19:34 🔗 SketchCow #!/bin/sh
19:34 🔗 SketchCow # SPLATTER - Pack up the content and place it up. Needs Upchuck to work.
19:34 🔗 SketchCow # EDIT THE SETTINGS TO MAKE SURE YOU GO TO THE RIGHT PLACE.
19:34 🔗 SketchCow TITLE="FTP Site Download"
19:34 🔗 SketchCow CLASSIFIER="ftp"
19:34 🔗 SketchCow COLLECTION="archiveteam_ftp"
19:34 🔗 SketchCow echo "We're putting this into $COLLECTION."
19:34 🔗 SketchCow each=$1
19:34 🔗 SketchCow ITEMNAME=${CLASSIFIER}_$each
19:34 🔗 SketchCow echo "Going down the collection linearly."
19:34 🔗 SketchCow mkdir DONE
19:34 🔗 SketchCow for each in 2*; do
19:34 🔗 SketchCow ionice -c 3 -n 0 python /0/SCRIPTCITY/megawarc --verbose pack ${CLASSIFIER}_$each $each
19:34 🔗 SketchCow mv $each DONE
19:34 🔗 SketchCow bash /0/SCRIPTCITY/upchuck "$COLLECTION" "$each" "${TITLE}"
19:34 🔗 SketchCow mv "${CLASSIFIER}_${each}."* DONE
19:34 🔗 SketchCow done
19:34 🔗 SketchCow So, that's the whole script. Note the TITLE/CLASSIFIER/COLLECTION settings. You set everything there.
19:34 🔗 SketchCow The rest does the work without me editing (chances for problems)
19:44 🔗 Start has quit IRC (Quit: Disconnected.)
19:47 🔗 SketchCow It now forces me to make a collection for each set.
19:47 🔗 SketchCow Which I should have been doing.
19:56 🔗 schbirid has quit IRC (Quit: Leaving)
20:04 🔗 SketchCow I want to eventually find out what zino is up to.
20:04 🔗 arkiver http://eldrimner.lysator.liu.se:8080/archiveteam.txt
20:04 🔗 arkiver We're in the process of uploading
20:48 🔗 SimpBrain has quit IRC (Quit: Leaving)
20:51 🔗 joepie91 Yoshimura: worth noting that IA / ArchiveTeam do not necessarily have the same requirements as $randomStartupThat'sGoingToGoBankruptInAYear
20:51 🔗 atrocity lol
20:51 🔗 joepie91 Yoshimura: the most important aspect of storage, for example, isn't that it's efficient or fast or elegant or whatever. the most important aspect is that it's robust
20:51 🔗 joepie91 that when you point somebody at it in 50 years
20:52 🔗 joepie91 they can still trivially decode/read/whatever it
20:52 🔗 Yoshimura Good joke.
20:52 🔗 alfie I consider "paper + nerd with a scanner" a perfectly acceptable form of long-term storage, fwiw :P
20:52 🔗 joepie91 Yoshimura: making light of a serious explanation is not a great way to encourage discussion.
20:52 🔗 Yoshimura But I agree. 50 years... well, not sure what formats will be used then.
20:53 🔗 joepie91 my point here is that something that may look 'suboptimal' isn't necessarily suboptimal, and even if it is, the benefits of 'fixing' it do not necessarily outweigh the drawbacks
20:53 🔗 Yoshimura Paper degrades; nerds and scanners do too.
20:53 🔗 alfie Yoshimura: yes, but we still have paper docs from a very long time ago.
20:54 🔗 joepie91 papers degrade quite a bit slower than basically every other automatable form of storage we currently have available to us
20:54 🔗 xmc ^
20:54 🔗 JW_work but it's still worth *investigating* possible fixes — we can find out why they are a bad idea *after* they are available to be tried :-)
20:54 🔗 alfie and as much as paper degrades, the pile of dead WD Blues I currently have my feet resting on would like to have a word with you ;)
20:54 🔗 joepie91 JW_work: sure, it's more the attitude I'm concerned with than the suggestions
20:54 🔗 Yoshimura What I've gained from my life is that the only way to ensure longevity over time is changing the format to one currently supported in that time period
20:54 🔗 alfie Yoshimura: s/supported/accessible.
20:54 🔗 joepie91 JW_work: I'm all for suggestions for improvement, but people have to stay realistic, and understand why there isn't a channel immediately jumping on a shiny-sounding idea to 'do everything better'
20:55 🔗 Yoshimura So warc itself is fine; the way to store warcs does and will change.
20:55 🔗 joepie91 and why insisting on how much 'better' it is is just going to be counterproductive, as compared to writing out a proof of concept
20:55 🔗 joepie91 and a well-reasoned analysis of the benefits and drawbacks
20:55 🔗 alfie archival isn't anything like any other kinds of data storage you'll come across
20:56 🔗 JW_work joepie91: no argument from me
20:56 🔗 Yoshimura I stopped insisting.
20:56 🔗 joepie91 JW_work: right. this is what I've been trying to explain to Yoshimura ;)
20:56 🔗 Yoshimura And proofs are not written in chat, so if someone keeps the conversation in a loop, like you do now, I just reply with arguments if you oppose.
20:57 🔗 joepie91 Yoshimura: it's perfectly acceptable to say "I'll get back to this later, let me write it out first"
20:57 🔗 Yoshimura That's logical. So it's not about "only throwing hands" but about not ignoring the conversation.
20:57 🔗 joepie91 but once you start making claims, don't be surprised when people push back against them
20:57 🔗 joepie91 especially if said claims are made with an air of superiority
20:57 🔗 Yoshimura Yeah, that I agree.
20:58 🔗 joepie91 if you have an idea for improvement, then it'd be greatly appreciated if you could write it out in detail, analyzing the benefits and drawbacks, metrics, etc.
20:58 🔗 joepie91 there are definitely areas that could use improvement
20:58 🔗 alfie Yoshimura: my advice is that if you have a POC, show us, or if you think you can have a POC then say "hey i'm gonna go and hack on this till it works vaguely" and then come back and show us
20:58 🔗 joepie91 I'm just saying that an ad-hoc discussion here in channel like this is probably not the best medium for that :)
20:58 🔗 Yoshimura I do, but not for warc; it also does not use warc files
20:58 🔗 alfie #archiveteam-bs-bs? ;)
20:59 🔗 SketchCow This is the place for a discussion.
21:00 🔗 Yoshimura I did deflate with a dictionary in 2008; it used flat files, one for pages, one for response headers
21:00 🔗 Yoshimura So basically DIY warc.
21:01 🔗 joepie91 Yoshimura: have you read the WARC (draft) spec?
21:01 🔗 Yoshimura Why so?
21:01 🔗 arkiver Never ever create custom WARCs
21:01 🔗 joepie91 Yoshimura: no hidden point, just a general question :P
21:01 🔗 JW_work what's a "custom WARC"?
21:01 🔗 Yoshimura I did read stuff, not the whole spec.
21:02 🔗 arkiver JW_work: I mean like stuffing some headers and data you found somewhere in a WARC
21:02 🔗 joepie91 Yoshimura: right. so one thing to take into consideration, aside from recommending that you read the full spec, is that not all responses are necessarily HTTP
21:02 🔗 joepie91 for example, Heritrix also writes DNS requests and responses to the WARC
21:02 🔗 joepie91 WARC is more-or-less agnostic to what exact kind of data it contains
21:03 🔗 joepie91 Yoshimura: there's a few subtleties like this, plus the unusual requirements of long-term archival, that make it somewhat difficult to design new/compatible implementations
21:03 🔗 joepie91 Yoshimura: also, I -think- the index file implementation that IA uses is not part of the WARC spec
21:04 🔗 joepie91 not sure what limitations that introduces
21:04 🔗 joepie91 docs-wise, there's https://archive.org/web/researcher/cdx_file_format.php and https://archive.org/web/researcher/cdx_legend.php and some stuff that can be inferred from code like https://github.com/joepie91/node-cdx
21:05 🔗 Yoshimura I did not say anything about new implementation.
21:05 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:05 🔗 joepie91 just dumping all info here :P
21:05 🔗 Yoshimura Just a better way to store the warc encoded data
21:05 🔗 JW_work arkiver: ah, hm. I noticed jake from IA converted a set of static HTML (from the BBC's website as of 1995) into WARCs using Python's simpleHTTPserver module: https://archive.org/details/bbcnc.org.uk-19950301
21:06 🔗 joepie91 Yoshimura: point is, .warc.gz is pretty much a fixed format in and of itself, in part due to the cdx files
21:06 🔗 joepie91 Yoshimura: for example, the cdx contains offsets of the compressed data
21:06 🔗 joepie91 using which you can read out specific records
21:07 🔗 joepie91 even over a HTTP range request
21:07 🔗 joepie91 un-gzipping the extracted range in the process
21:07 🔗 joepie91 these are all things to take into account
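[A sketch of that access pattern: with a record's compressed offset and length taken from a CDX line (the numbers below are hypothetical), a Range request pulls just that gzip member, which then un-gzips on its own:]

    import urllib.request, zlib

    def fetch_record(warc_url, offset, length):
        rng = "bytes=%d-%d" % (offset, offset + length - 1)
        req = urllib.request.Request(warc_url, headers={"Range": rng})
        with urllib.request.urlopen(req) as resp:
            member = resp.read()                      # one compressed gzip member
        return zlib.decompress(member, 16 + zlib.MAX_WBITS)

    # e.g. fetch_record("https://archive.org/download/some_item/file.warc.gz", 1234, 567)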
21:08 🔗 Yoshimura My stuff was per record.
21:09 🔗 dashcloud has joined #archiveteam-bs
21:09 🔗 Yoshimura And the conversation also.
21:12 🔗 Yoshimura Un-gzipping over an HTTP range is honorable, but it does include the warc headers, so it's pretty useless. Also, gzip is an HTTP capability, not a needed part.
21:17 🔗 joepie91 Yoshimura: not sure what you mean
21:18 🔗 Yoshimura Gzip is used for the whole record, including the warc header.
21:18 🔗 Yoshimura But if you want to serve it as a file you need to process it first.
21:18 🔗 Yoshimura So gzip over http is half useless.
21:19 🔗 joepie91 Yoshimura: it's not. it's how I can search for and extract records from an archive.org-hosted WARC file without downloading terabytes of data.
21:19 🔗 joepie91 the gzip is a technical property, not a feature
21:20 🔗 Frogging I'm often concerned about how much data is duplicated in the WARCs on IA. There's probably quite a lot of stuff that is in there twice in separate WARCs
21:21 🔗 joepie91 Frogging: 'duplicated' by what definition?
21:21 🔗 Yoshimura joepie91: Are you dreaming or ignoring?
21:21 🔗 Frogging well, say I download a site and put the WARCs on IA. Then someone else does the same, and there's now two WARCs with mostly the same data
21:22 🔗 Yoshimura Yoshimura
21:22 🔗 Yoshimura My stuff was per record.
21:22 🔗 Yoshimura And the conversation also.
21:22 🔗 Yoshimura and deflate IS per record.
21:22 🔗 yipdw_ I remember xmc (maybe) posting some study that showed a 50% improvement from deduplicating some WARC dataset
21:22 🔗 yipdw_ so it's big, but not like fatal
21:23 🔗 yipdw_ trying to find that link
21:23 🔗 Frogging 50% sounds rather huge
21:23 🔗 yipdw_ it's one dataset and I don't remember the details
21:24 🔗 Frogging k
21:24 🔗 yipdw_ I also regret even citing that figure because now people will extrapolate it to literally everything
21:24 🔗 Yoshimura joepie91: While your input is indeed very constructive, the fact that it ignores that I did not propose compressing more files at once is not.
21:24 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
21:24 🔗 * SketchCow breaks pool cue in half
21:24 🔗 SketchCow Back in an hour
21:26 🔗 Frogging yipdw_: Heh. Well, I'd be interested in running my own tests
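[One shape such a test could take: walk a .warc.gz member by member, hash each decompressed record, and report how many repeat. Hashing whole records will understate duplication, since WARC headers carry unique record IDs and dates; a real test would hash only response payloads, e.g. with a WARC parser. A sketch under those caveats:]

    import hashlib, zlib

    def duplicate_ratio(path):
        data = open(path, "rb").read()     # fine for a test; stream for real sizes
        seen, total, dupes, pos = set(), 0, 0, 0
        while pos < len(data):
            d = zlib.decompressobj(16 + zlib.MAX_WBITS)
            record = d.decompress(data[pos:])          # one gzip member per record
            digest = hashlib.sha1(record).digest()
            dupes += digest in seen
            seen.add(digest)
            total += 1
            pos = len(data) - len(d.unused_data)       # jump to the next member
        return dupes / float(total) if total else 0.0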
21:26 🔗 SketchCow http://fos.textfiles.com/ARCHIVETEAM/ works
21:26 🔗 Frogging lmao. that gif
21:26 🔗 Yoshimura What I found strange (or there is some problem with regular wget) is that recompressing a 169MB warc.gz as one file makes 150MB
21:27 🔗 joepie91 Yoshimura: I'm not making any technical claims, just pointing out usecases to consider
21:27 🔗 yipdw_ that sounds like a reasonable gain, depending on the number of records in the warc
21:28 🔗 yipdw_ the reason why we don't do that is random access becomes more difficult
21:28 🔗 Yoshimura HTML pages only, not sure how many. Not that reasonable to me, as I remember that using deflate saved much more.
21:29 🔗 Yoshimura yipdw_: It was just a test, as deflate with a dictionary and gzip together should work kind of the same way; the dictionary is merely a preconditioning of the sliding window
21:30 🔗 Yoshimura If it's 20% I do not expect people to care. I just learned to handle stuff by care factor, not by real impact.
21:31 🔗 godane SketchCow: i'm up to 2015-10-01 with kpfa
21:31 🔗 godane we are very close to being complete
21:31 🔗 Yoshimura And if it's a better format that saves a fuck ton but is more complex in terms of tools and needs changes, the usual argument is that it's too much work.
21:31 🔗 Yoshimura So either way, stuff does not happen.
21:33 🔗 SketchCow Thank youuuu
21:33 🔗 yipdw_ operations, migration strategies, backwards compatibility are all big deals yeah
21:34 🔗 Frogging Why is SketchCow destroying pool cues
21:34 🔗 SketchCow Frogging: https://www.youtube.com/watch?v=VCXbib9MahE
21:34 🔗 Yoshimura yipdw_: Resources, and hence money, are a big deal also. But it seems most people are more keen on throwing more money at it or achieving less.
21:35 🔗 JW_work I suspect in a couple of decades, if costs/gb fail to decline fast enough, IA may be a lot more interested in de-duplication; but at this point I think computing power is not sufficiently cheap (as compared with storage) to make it worth it.
21:35 🔗 Frogging lol I see :p
21:35 🔗 JW_work Still worth looking into though.
21:36 🔗 yipdw_ I feel like I just got told to go fuck myself
21:36 🔗 yipdw_ that's coo
21:38 🔗 yipdw_ apologies, that wasn't necessary
21:43 🔗 SketchCow http://fos.textfiles.com/ARCHIVETEAM/ seems to be working. Adding "size" now.
21:51 🔗 zino SketchCow: Anything in particular you wanted to know or just why the uploads are going so slow?
21:53 🔗 SketchCow No, no.
21:54 🔗 SketchCow It's that you're uploading and using methods and I'm uploading and adding methods.
21:54 🔗 SketchCow Are you being given new stuff or just doing a backfill?
21:54 🔗 zino SketchCow: I don't think I'm being fed anything since last Thursday, so only archiving and uploading now.
21:58 🔗 zino Spent half an hour reading up on long-haul TCP yesterday to see if I could push a bit more against IA's S3 bucket. Haven't had to do big data transfers over anything more than 1000km for over four years, so much of that has dropped out of my brain.
22:02 🔗 tomwsmf-a has joined #archiveteam-bs
22:03 🔗 SketchCow Yeah, you're doing a version of what I do, so we'll have a little weirdness for keeping track of things for a while.
22:04 🔗 SketchCow There's no reason/victory. You're our emergency generator, no need to line you out with needless accounting.
22:05 🔗 zino Minimal accounting sounds good. :-)
22:07 🔗 SketchCow Your contribution is appreciated.
22:08 🔗 Yoshimura I could contribute CPU but not storage.
22:08 🔗 Yoshimura There should be some social movement to make a BOINC-like network using tiny VMs.
22:10 🔗 Yoshimura zino: What did you read on the TCP? Is there anything on how you can improve it?
22:10 🔗 Yoshimura If it were me I would switch to UDP
22:11 🔗 SketchCow OK, dude
22:11 🔗 SketchCow Hold up.
22:11 🔗 SketchCow BRAKES.
22:11 🔗 SketchCow People discussing anything in this channel and you coming in rushing in with 'solutions'
22:11 🔗 SketchCow Stop.
22:12 🔗 SketchCow And if you go, "Huh, ANOTHER community unwilling to hear new ideas."
22:12 🔗 SketchCow Start thinking.
22:12 🔗 SketchCow Maybe it's YOU.
22:12 🔗 Yoshimura Is that aimed to me?
22:12 🔗 xmc YES
22:12 🔗 zino Heh.
22:13 🔗 Yoshimura I just asked about how I can improve TCP, and provided a _personal_ ("if it was me") _opinion_, not 'solutions'
22:13 🔗 SketchCow
22:13 🔗 SketchCow YYYYYYY YYYYYYYEEEEEEEEEEEEEEEEEEEEEE SSSSSSSSSSSSSSS
22:13 🔗 SketchCow Y:::::Y Y:::::YE::::::::::::::::::::E SS:::::::::::::::S
22:13 🔗 SketchCow Y:::::Y Y:::::YE::::::::::::::::::::ES:::::SSSSSS::::::S
22:13 🔗 SketchCow Y::::::Y Y::::::YEE::::::EEEEEEEEE::::ES:::::S SSSSSSS
22:13 🔗 SketchCow YYY:::::Y Y:::::YYY E:::::E EEEEEES:::::S
22:13 🔗 SketchCow Y:::::Y Y:::::Y E:::::E S:::::S
22:13 🔗 SketchCow Y:::::Y:::::Y E::::::EEEEEEEEEE S::::SSSS
22:13 🔗 SketchCow Y:::::::::Y E:::::::::::::::E SS::::::SSSSS
22:13 🔗 SketchCow Y:::::::Y E:::::::::::::::E SSS::::::::SS
22:13 🔗 SketchCow Y:::::Y E::::::EEEEEEEEEE SSSSSS::::S
22:13 🔗 SketchCow Y:::::Y E:::::E S:::::S
22:13 🔗 SketchCow Y:::::Y E:::::E EEEEEE S:::::S
22:13 🔗 SketchCow Y:::::Y EE::::::EEEEEEEE:::::ESSSSSSS S:::::S
22:13 🔗 SketchCow YYYY:::::YYYY E::::::::::::::::::::ES::::::SSSSSS:::::S
22:13 🔗 SketchCow Y:::::::::::Y E::::::::::::::::::::ES:::::::::::::::SS
22:13 🔗 SketchCow YYYYYYYYYYYYY EEEEEEEEEEEEEEEEEEEEEE SSSSSSSSSSSSSSS
22:13 🔗 SketchCow
22:13 🔗 SketchCow
22:13 🔗 SketchCow
22:13 🔗 zino Yoshimura: No biggie. And if Amazon had GridFTP or similar set up, sure I could switch to that. Alas... :-P
22:14 🔗 Start has joined #archiveteam-bs
22:14 🔗 Yoshimura Apparently SketchCow tripped over his brain. So tell that to him.
22:14 🔗 Yoshimura zino: Is there any link to what you read about TCP improvements?
22:14 🔗 SketchCow Do it.
22:14 🔗 SketchCow Take me up.
22:14 🔗 SketchCow Take me on.
22:16 🔗 Yoshimura I wanted to learn about the TCP part; the UDP was a _personal opinion_ side note. No reason to start fires over that.
22:16 🔗 zino Yoshimura: Those tabs are closed, but it's easily googlable. You need to be able to negotiate a TCP window big enough to not stall over your latency (ping). And give the kernels some more memory and bigger queues to work with.
22:16 🔗 zino And the same needs to be turned on on the receiver side as well.
22:17 🔗 Yoshimura Oh, great, yeah, that makes sense. Thank you ;)
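[The arithmetic behind that rule of thumb: the TCP window must cover the bandwidth-delay product or the sender sits idle waiting for ACKs. Example figures, not measurements from this transfer:]

    # e.g. a 1 Gbit/s path with ~150 ms round-trip time between continents
    bandwidth_bps = 1e9                 # bits per second
    rtt = 0.150                         # seconds
    window_bytes = bandwidth_bps / 8 * rtt
    print("window needed: %.1f MB" % (window_bytes / 1e6))   # ~18.8 MB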
22:18 🔗 VADemon has quit IRC (Quit: left4dead)
22:19 🔗 chfoo- has quit IRC (Read error: Operation timed out)
22:19 🔗 chfoo- has joined #archiveteam-bs
22:19 🔗 xmc Yoshimura: you're welcome to stay here but you are not allowed to say anything for the next 12 hours
22:20 🔗 xmc this applies to all archiveteam channels
22:22 🔗 wacky_ has quit IRC (Ping timeout: 244 seconds)
22:23 🔗 wacky has joined #archiveteam-bs
22:31 🔗 SketchCow http://fos.textfiles.com/ARCHIVETEAM/ now shows sizes!
22:40 🔗 Fletcher_ has quit IRC (Ping timeout: 250 seconds)
22:40 🔗 koon has quit IRC (Ping timeout: 250 seconds)
22:40 🔗 espes__ has quit IRC (Ping timeout: 250 seconds)
22:40 🔗 espes__ has joined #archiveteam-bs
22:43 🔗 SketchCow And (in theory) I just added Archivebot.
22:52 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
22:57 🔗 arkiver Looks good SketchCow
22:57 🔗 arkiver Live progress on uploads :D
23:03 🔗 SketchCow It won't show zino but that's OK.
23:03 🔗 SketchCow Just another thing for people to notice.
23:03 🔗 SketchCow Soon, I'm going to make it much more automatic (but also be linear.)
23:04 🔗 SketchCow So only one pack-and-ship is happening at once.
23:04 🔗 SketchCow With theory being it will just go relentlessly and not wait on me any further (and archivebot will go on its own.)
23:05 🔗 arkiver It would be nice if the page automatically refreshes when it's updated
23:05 🔗 arkiver So we can just leave it open in a monitor and watch it do stuff
23:06 🔗 Frogging What's zino?
23:06 🔗 SketchCow If someone wants to write that code and hand it to me, I'll shove it in.
23:07 🔗 SketchCow http://first.archival.club/ by the way
23:25 🔗 SketchCow Also, now I am making one subdirectory on FOS that does nothing but these pipes.
23:25 🔗 SketchCow This is an important step for the future for 1. Knowing what projects there are pipelines for (maybe run a script often to show it) and 2. Be able to write a script to say "run everything in here."
23:37 🔗 koon has joined #archiveteam-bs
23:54 🔗 tomwsmf-a has joined #archiveteam-bs
23:58 🔗 Stiletto has quit IRC (Read error: Operation timed out)
