#archiveteam-bs 2017-06-21,Wed


Who | What | When
***j08nY has quit IRC (Quit: Leaving) [00:00]
..... (idle for 22mn)
arkiverMrRadar2: what URL? [00:22]
***sheaf has quit IRC (Quit: sheaf) [00:25]
............ (idle for 58mn)
bitBaron has joined #archiveteam-bs [01:23]
crusherit does seem like something's up with imzy
i'm using the warrior script, i haven't seen any of my threads actually upload any data
my urlte.am and eroshare threads are archiving just fine though...
[01:35]
................... (idle for 1h31mn)
***th1x has joined #archiveteam-bs [03:11]
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam-bs
[03:17]
.................. (idle for 1h25mn)
Sk1d has quit IRC (Ping timeout: 194 seconds) [04:43]
Sk1d has joined #archiveteam-bs [04:48]
crusher has quit IRC (Ping timeout: 268 seconds) [04:57]
Aranje has quit IRC (Quit: Three sheets to the wind) [05:10]
Aranje has joined #archiveteam-bs [05:19]
th1x has quit IRC (Read error: Operation timed out) [05:25]
..... (idle for 21mn)
Aranje has quit IRC (Three sheets to the wind) [05:46]
..... (idle for 23mn)
bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…) [06:09]
schbirid has joined #archiveteam-bs [06:22]
.... (idle for 18mn)
kyounko has joined #archiveteam-bs
voidsta has quit IRC (Remote host closed the connection)
voidsta- has joined #archiveteam-bs
voidsta- has quit IRC (Client Quit)
voidsta- has joined #archiveteam-bs
[06:40]
voidsta- is now known as voidsta
Jonison has joined #archiveteam-bs
[06:49]
...... (idle for 25mn)
j08nY has joined #archiveteam-bs [07:17]
...... (idle for 27mn)
SHODAN_UI has joined #archiveteam-bs [07:44]
.... (idle for 15mn)
jtn2 has joined #archiveteam-bs [07:59]
.... (idle for 15mn)
jrwr has quit IRC (Read error: Operation timed out)
robogoat has quit IRC (Read error: Operation timed out)
robogoat has joined #archiveteam-bs
[08:14]
..... (idle for 24mn)
j08nY has quit IRC (Quit: Leaving) [08:39]
....... (idle for 31mn)
SHODAN_UI has quit IRC (Remote host closed the connection) [09:10]
BlueMaxim has quit IRC (Read error: Operation timed out) [09:19]
.............. (idle for 1h9mn)
logchfoo2 starts logging #archiveteam-bs at Wed Jun 21 10:28:19 2017
logchfoo2 has joined #archiveteam-bs
[10:28]
......... (idle for 42mn)
SHODAN_UI has joined #archiveteam-bs [11:10]
........ (idle for 38mn)
victorbje has joined #archiveteam-bs [11:48]
........ (idle for 39mn)
C4K3_ has joined #archiveteam-bs
C4K3 has quit IRC (Ping timeout: 260 seconds)
[12:27]
icedice has joined #archiveteam-bs
th1x has joined #archiveteam-bs
[12:36]
.... (idle for 19mn)
MrRadararkiver: For example https://www.imzy.com/api/accounts/profiles/daylen?check=true
I can send you one of the partial WARCs if that would help
They all have that ?check=true parameter
[12:56]
***vbdc has joined #archiveteam-bs [13:10]
vbdcgetting rate-limited when doing the upload, 120 connections seems like a small amount. Anything I can do to help work around this bottleneck? [13:11]
..... (idle for 23mn)
timmcMrRadar: If I'm logged in and view my profile, the ?check=true API call comes back with empty 200 OK; for someone else's profile it is an empty 401. I suspect this is an authenticated API call and isn't suitable for archiving.
The 206 when unauthenticated is weird, though. I can check with weffey...
[13:34]
***vbdc has quit IRC (Ping timeout: 268 seconds) [13:36]
timmcarkiver, MrRadar: weffey says any ?check call should be skipped -- it's an auth'd call to see if an object exists (and whether the user has permissions to it) without getting the full payload. [13:42]
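The skip rule weffey describes could be sketched as a simple URL predicate; this is a hypothetical helper, not the actual project script, and the parameter name is taken from the URLs discussed above:

```python
from urllib.parse import urlparse, parse_qs

def should_skip(url):
    """Skip authenticated ?check existence probes, per weffey's guidance.

    Any Imzy API call carrying a 'check' query parameter only tests
    whether an object exists (and whether the caller may see it), so
    it returns no archivable payload.
    """
    query = parse_qs(urlparse(url).query)
    return "check" in query
```

A crawler would call this on each discovered URL and drop the ones that match before queueing them.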
***crusher_ has joined #archiveteam-bs [13:47]
MrRadarvbdc: That's just the limit for FOS
We've tried raising it in the past but it actually slows down due to the server's disk IO getting saturated
We could possibly add another rsync target
If someone wants to volunteer one
[13:48]
.... (idle for 15mn)
victorbjeMrRadar: what are the requirements for adding a rsync target? Might be able to host one [14:04]
.... (idle for 17mn)
***icedice has quit IRC (Read error: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac)
icedice has joined #archiveteam-bs
[14:21]
MrRadarvictorbje: IIRC at least 500 Mbit Internet and several TB storage. arkiver can give you more details [14:28]
crusher_i wonder how hard it would be to redirect the path the warrior uses to cache the scraped files based on file size
My biggest bottleneck right now is my lack of RAID disks and / or that it's a pretty slow drive
so i was thinking of using a ramdisks to cache all the small files and selectively throw large ones to disk
[14:35]
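crusher_'s idea amounts to a size-based routing rule; a minimal sketch, where the threshold and both paths are made-up values, not anything the warrior actually uses:

```python
# Hypothetical cutoff: files at or below this size go to the ramdisk.
RAMDISK_THRESHOLD = 1 * 1024 * 1024  # 1 MiB

def choose_cache_dir(file_size,
                     ramdisk="/mnt/ramdisk",
                     disk="/var/cache/warrior"):
    """Route small files to a ramdisk and large ones to the slow disk."""
    return ramdisk if file_size <= RAMDISK_THRESHOLD else disk
```

The hard part in practice (as the discussion below notes) is that item sizes vary wildly, so the ramdisk must be provisioned for outliers, not the average.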
JAASo according to the Tilt API, the US and Australia are states, Canada and the UK are provinces, and Ireland's a county (no, not "country"). ¯\_(ツ)_/¯ [14:39]
crusher_lol [14:41]
MrRadarcrusher_: if you're referring to FOS, that actually wouldn't help too much. FOS runs the "megawarc factory" which combines individual items together into "megawarcs", so when there are projects with tons of small files it uses pretty much all the I/O resources it can [14:41]
crusher_I see.
on average how big can those file-balls get?
[14:41]
MrRadar40 GB is the usual size
Some projects with extra-large items use 80 GB
[14:42]
crusher_*spits cereal* that's what a warrior spits out per thread?
that doesn't sound right...
[14:42]
yipdwno [14:42]
MrRadarNo, that's the size of a megawarc
Individual items can range from the KB to a dozen or so GB depending on the project
[14:43]
***bitBaron has joined #archiveteam-bs [14:43]
crusher_that makes more sense [14:44]
MrRadarYou can get a sense of it from here: http://fos.textfiles.com/pipeline.html
The "inbox" is the items waiting to be megawarc'ed
The outbox are megawarcs waiting to be uploaded to IA
Another useful page here lists the items as they are uploaded: http://fos.textfiles.com/ARCHIVETEAM/
[14:44]
crusher_interesting
so on the client side, a ramdisk could be useful for small files provided the connection isn't saturated, correct?
[14:47]
JAANice. I've never seen that before. [14:48]
MrRadarFrom what I understand, the data is saved to a temporary file, gzipped, and then concatenated on to the end of the result WARC file
So I'm not sure how much a ramdisk would help, especially since Linux in general has a very good disk cache
(As long as it has enough free memory)
[14:49]
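MrRadar's description matches how gzipped WARCs work: each record is compressed as an independent gzip member, so appending a record is plain byte concatenation. A toy illustration of that property (not the megawarc tooling itself):

```python
import gzip
import io

def gzip_member(data: bytes) -> bytes:
    """Compress one record as a standalone gzip member."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(data)
    return buf.getvalue()

# Concatenating members yields a stream gzip can still read end to end,
# which is why records can be appended to a WARC without recompressing it.
combined = gzip_member(b"record one\n") + gzip_member(b"record two\n")
```

Decompressing `combined` yields both records back to back, since gzip readers consume members sequentially.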
crusher_so in essence, i should allocate more ram to the warriors and let them do their thing
whatever it takes to give the poor HDD some breathing room
[14:51]
MrRadarYes, it's worth a try. I think it does flush the file out to disk when it's done downloading it but when it gets read back it should be reading from the cache [14:51]
crusher_right now it's getting hammered to 100% with non-sequential I/O
probably because it's running 10 warriors but....
(shhh details)
[14:52]
***odemg has quit IRC (Read error: Operation timed out) [14:55]
crusher_looking at the ram usage, i can see of the 400MB allocated, it's only using between 60-82 Megs
(400 each)
[15:07]
MrRadarCheck out what top says inside the VM [15:08]
***bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
odemg has joined #archiveteam-bs
[15:12]
JAAcrusher_: You can also try running the scripts directly to reduce the overhead. [15:18]
MrRadarYeah, having 1 kernel schedule the I/O for everything would probably do a better job than 10 kernels that aren't aware of what the others are doing [15:18]
crusher_yeah... [15:20]
Kazscrolling, catching up [15:21]
***odemg has quit IRC (Read error: Operation timed out) [15:22]
Kazcrusher_: ram disk won't really provide any benefit
I've seen projects where we've maxed out multiple gigabit links constantly; you wouldn't be able to get it into ram, megawarc it, then offload quickly enough
[15:22]
***odemg has joined #archiveteam-bs [15:22]
crusher_i'm talking client (warrior) side [15:23]
Kazah, warrior side I've run stuff in ram before
as long as you run under capacity - knowing that some items will obviously be a long way outside the average, and cause issues
[15:23]
MrRadarFor example, items for Eroshare have *huge* variation in size. From a few MB to 15+ GB [15:24]
crusher_that's what i'm currently using most of the warriors on [15:25]
MrRadarBut even for projects like Yahoo Answers which is mostly in the neighborhood of 100 MB per item I've had a few GB-sized items [15:25]
crusher_it seems to be the only current project that isn't 100% saturated or done
i'd help with newsgrabber, but the warrior seems to freeze and do nothing on that one...
[15:25]
Kazhmm
what does the webui show when it's frozen?
[15:28]
crusher_current project screen is blank [15:28]
Kazoh hm [15:28]
crusher_available shows it is working on newsgrabber [15:28]
Kaznewsgrabber has a ton of requirements, possible that the warrior install script doesn't get it all [15:29]
crusher_hmm. [15:29]
so if i was to run them in the host OS, how difficult of a process would that be [15:34]
Kaznot too much work, all the setup instructions are in the git repo [15:35]
***LastNinja has quit IRC (Read error: Operation timed out) [15:38]
crusher_dumb question, but there are 12 pages of projects... Which one should i be looking for? [15:44]
JAAFor people using OpenVPN: https://guidovranken.wordpress.com/2017/06/21/the-openvpn-post-audit-bug-bonanza/
crusher_: I'd start from the wiki homepage, where the currently active projects are listed.
[15:44]
crusher_ah, so there's no way to run them warrior-like in the host
it's all manual?
[15:45]
JAANot sure what you mean by "all", but yes, more things are manual than in the warrior. The code doesn't update automatically, for example, and there is no "ArchiveTeam's Choice" equivalent for scripts. [15:46]
crusher_right [15:46]
JAAFor most projects, you clone the corresponding git repository and execute something like run-pipeline pipeline.py --concurrent N NICK
Once you have the dependencies installed, that is.
You may also want to use --disable-web-server, depending on your setup.
[15:47]
crusher_if you had a spare i5 with 8Gigs of ram and a 300 / 300 internet connection, what would you run? [15:48]
MrRadarIf you run multiple pipelines and you *do* want to run the web UI make sure you assign each pipeline a different port
I'd run URLTeam, Yahoo Answers, Eroshare, and maybe Imzy (though I suspect that needs a script update)
[15:48]
crusher_how many concurrent runs for each?
this machine is 100% available
[15:50]
JAAI'm running 10 on URLTeam and 3 on Yahoo Answers. [15:51]
Kazgithub.com/archiveteam/newsgrabber-warrior [15:51]
JAAYahoo bans pretty quickly if you go too fast. [15:51]
crusher_something i noticed with the urlteam on warrior was that it was constantly running out of tasks to do [15:52]
JAAImzy was running fine with 6 concurrent threads before, but I haven't tried again since the latest updates. [15:52]
Kaznewsgrabber will never run out of jobs, that's part of the fun [15:53]
victorbjeis the "time left" the time until the service shuts down or until all items are done at current speed?
in the web ui, top right
[15:59]
***bitBaron has joined #archiveteam-bs [16:00]
MrRadarUntil the service shuts down
Though they sometimes stay up for hours or even days past their official shutdown time
[16:01]
crusher_or shut down sooner than they said they would [16:02]
MrRadarOr occasionally they whitelist us to let us access the service after the official shutdown [16:02]
victorbjeall right, thanks [16:07]
..... (idle for 20mn)
timmcSpeaking of which, I heard from weffey that Imzy will probably go dark at *around* 06:00 UTC on 2017-06-23, depending on other scheduling constraints. [16:27]
crusher_well, either the script broke or there's nothing left to grab [16:28]
MrRadarcrusher_: The Imzy script needs an update to ignore the ?check=true URLs and then a requeue
I'm sure arkiver will get around to it when he has time
[16:28]
............ (idle for 55mn)
JAAhttp://sarahcandersen.com/post/162085779429 ;-) [17:24]
MrRadarAn interesting sci-fi story on that subject from the author of the story Arrival was based on: http://subterraneanpress.com/magazine/fall_2013/the_truth_of_fact_the_truth_of_feeling_by_ted_chiang
It's food for thought
[17:30]
..... (idle for 22mn)
***jrwr has joined #archiveteam-bs [17:53]
kisspunchIs there a tool to download wordpress sites? (other than wget) [17:58]
Froggingwhat would such a tool do that wget doesn't do? [18:05]
schbiridwpull :P [18:09]
kisspunchsame sort of thing as tumblr tools I've seen. a big one is dealing with pagination changing between crawls and tags. any kind of parsing (here are the comments, here is the post, here are the tags) would be extra. i'd also be happy being pointed at a good wget config, though.
so, if I visit a site again after 1 blog post, I'd rather not download the ENTIRE blog worth of index, which is what will happen right now
[18:09]
Frogginghmm, good point [18:13]
kisspunchThis is actually a problem with lots of stuff, not just wordpress :( [18:15]
...... (idle for 29mn)
***SHODAN_UI has quit IRC (Remote host closed the connection) [18:44]
jrwrWas watching the slides on how Jason got sued for two billion dollars, found this on the IA https://archive.org/details/ModeleskiCompOrder
Pretty much he has been marked insane by the courts
[18:48]
RedTypeFounder Paul Andrew Mitchell, an advanced systems development consultant for 35 years, has spent the past sixteen years since 1990 A.D. doing a detailed investigation of the United States Constitution, federal statute laws, and the important court cases.
AD
[18:49]
MrRadarLOL [18:49]
jrwrYep
I'm reading the whole thing now
holy shit
the top of page 3 is fucking gold
GOLD
[18:50]
MrRadarAnother "entertaining" "sued by an insane idiot" story is the time "game studio" Digital Homocide sued Jim Sterling for $10M+ for trashing one of their garbage shovelware games: https://www.youtube.com/watch?v=qS-LXvhy1Do [18:52]
jrwrSketchCo1: This guy is a hoot [18:52]
timmc"Defendant Mitchell shall undertake formal competency restoration procedures at a qualified federal medical center" <-- what does *that* mean? [18:54]
***kisspunch has quit IRC (Quit: ZNC - http://znc.in) [18:54]
MrRadarIn hindsight, I'm sure it was frustrating for him to deal with this baloney :/
When the case was ongoing
[18:54]
jrwrRight [18:54]
***kisspunch has joined #archiveteam-bs [18:54]
timmcI feel bad for both parties but for different reasons. [18:55]
jrwrhe was deemed a "Mass Mailer" by the courts as well
https://www.plainsite.org/dockets/1z7yzelvr/washington-western-district-court/usa-v-modeleski/
my god
there is so much content
[18:55]
***powerKitt has joined #archiveteam-bs [18:55]
jrwrwait that whole site links back to the IA
thats interesting
[18:55]
powerKittIA spat out an rsync error trying to transfer the files for something I uploaded via torrent, and I stupidly deleted the files off my hard drive since I thought it was done. Is there anyway they can be recovered from a backup on IA's end?
https://catalogd.archive.org/log_show.php?task_id=682448038&full=1
[18:56]
kisspunchok wait re: the topic we found a way to scrape arbitrary dominos orders by enumerating urls... [19:03]
jrwrwat [19:03]
xmclol [19:04]
MrRadarLOL [19:04]
kisspunchyeah we were trying to automate ordering pizza like sensible programmers and typo-d something, and got someone else's order? [19:04]
jrwrmake a warrior
it will be good data for the future
[19:04]
crusher_find the pizza order and time that gets it to you the fastest [19:05]
kisspunchHmm so re: valhalla, I don't feel like the approach will work, because it ultimately needs you to run some weird VM and it's a pain. Would anyone object if I wrote a (compatible) Windows program? [19:11]
jrwrfor news grabber? [19:11]
kisspunchjrwr: for IA.BAK
It's definitely trading off total space available and reliability of that space
But I feel like increasing redundancy can compensate for worse reliability? It's not clear, transfers aren't free if anyone has numbers to plug in
Also there are decent arguments against writing a 'compatible' program
I'm thinking here of the success of things like @Home, which is something like "double click to install, press OK" and then it runs forever across reboots by default
[19:11]
jrwrnot a bad idea kisspunch
spreading the love is key
I though it really just used git + some magic
[19:17]
kisspunchI thought it was using git-annex [19:18]
jrwrYa [19:18]
kisspunchAnyway yes step 2 is writing the program [19:18]
jrwrYep [19:18]
kisspunchI wanted to sound out peeps for whether they will object even once my program works though :) [19:18]
jrwrNo, We always love anything new
just don't expect much support besides the basics
[19:18]
kisspunchThat's totally fine [19:19]
jrwrI approve, but I do suggest making it cross plat as well [19:19]
kisspunchI generally like that sort of thing, but any particular reason? [19:19]
***icedice has quit IRC (Ping timeout: 260 seconds) [19:20]
.... (idle for 18mn)
schbiridi am a geo guy, if you want a map
of those pizzas
[19:38]
.... (idle for 19mn)
***ruunyan has quit IRC (Read error: Operation timed out)
ruunyan has joined #archiveteam-bs
[19:59]
.... (idle for 15mn)
crusher_how many machines does mundus2018 have... [20:15]
***powerKitt has quit IRC (Quit: Page closed)
schbirid has quit IRC (Quit: Leaving)
kisspunch has quit IRC (Quit: ZNC - http://znc.in)
kisspunch has joined #archiveteam-bs
Jonison2 has joined #archiveteam-bs
Jonison has quit IRC (Ping timeout: 260 seconds)
[20:25]
..... (idle for 21mn)
Jonison2 has quit IRC (Quit: Leaving)
SHODAN_UI has joined #archiveteam-bs
[20:52]
.... (idle for 19mn)
_Crusher_ has joined #archiveteam-bs
crusher_ has quit IRC (Quit: Page closed)
_Crusher_ is now known as crusher
[21:12]
Jonison has joined #archiveteam-bs [21:18]
MrRadarCould someone with a Japanese IP address help me grab a file? It's geoip filtered for some reason.
File is here: http://dambo.mydns.jp/uploader/giga/file/GigaPp8347.wav.html
"Password" is YM1980BD
[21:20]
***crusher2 has joined #archiveteam-bs [21:21]
crusherCan you repost that link again [21:21]
MrRadarhttp://dambo.mydns.jp/uploader/giga/file/GigaPp8347.wav.html
There's a copy on YouTube but I'd prefer to get the original uncompressed version if possible
[21:21]
crusherI'll have it in.... About half an hour [21:24]
MrRadarThanks!
"Password" is YM1980BD in case you missed that too
[21:24]
crusherI saw that, just was hoping to avoid typing the url into the browser :P [21:25]
...... (idle for 27mn)
***crusher has quit IRC (Ping timeout: 492 seconds) [21:52]
Crusher has joined #archiveteam-bs [21:59]
SHODAN_UI has quit IRC (Remote host closed the connection)
Crusher_ has joined #archiveteam-bs
Crusher has quit IRC (Read error: Connection reset by peer)
[22:08]
Jonison has quit IRC (Ping timeout: 260 seconds) [22:19]
..... (idle for 22mn)
FroggingImgur has started redirecting direct links on the desktop [22:41]
MrRadarE.g. from i.imgur.com/blah to imgur.com/blah ? [22:47]
***sheaf has joined #archiveteam-bs [22:55]
Froggingsort of... I think it's more of a server-side rewrite http://www.fastquake.com/images/screen-imgurredir-20170621-183414.png [22:57]
***sheaf has quit IRC (Remote host closed the connection) [22:58]
timmcRedirecting from direct image view to image embedded in page? [22:59]
Froggingyes [22:59]
timmcYeah, they've been trying to get away from being hotlink-friendly. [22:59]
Froggingit's concerning to me [23:00]
timmcRunning an image host is basically a sucker's game. I'm surprised they've lasted this long, honestly. [23:00]
Froggingsame
they might be on the way down
[23:00]
Crusher_Any idea why the urlte.am warrior likes to report "no items available"
It seems like that would be something you'd expect to have loads available
[23:03]
Froggingjust wait, you'll get items
the tracker doesn't generate items as fast as people take them
[23:06]
crusher2i do, it's just that my machine is outpacing it
oh i see
so in other words, pick a different project, this one's covered
right?
[23:06]
FroggingI'm not sure [23:10]
crusher2Is the vine one still shut down? [23:10]
Frogginglooks like it http://tracker.archiveteam.org/vine/ [23:11]
arkiverI requeued imzy
and also queued posts
[23:11]
FroggingI've been told though that doing URLTeam is useful. I don't know the details of how the tracker works or where the bottleneck really is
I just know that with high concurrency it often can't get items
but it still runs most of the time
[23:11]
crusher2arkiver: i'm still getting the same Server returned 0 error [23:13]
MrRadararkiver: Did you see the comments from earlier about ignoring ?check=true URLs? [23:14]
crusher2how would i go about doing that?
and no not really
[23:15]
arkiverMrRadar: we're already skipping those if a 206 is received
I tested it and it really should work
[23:15]
MrRadarOK. I'll make sure my scripts are fully updated [23:15]
arkiveryeah, server is being a little hammered right now
I'm making an update though to skip some URLs
[23:15]
crusher2all my threads are sleeping from the error
aside from a couple that say they are being limited
[23:19]
arkiveror maybe there's something wrong with your connection
mine do connect
[23:20]
crusher2hmm.
i can do eroshare and urlte.am just fine
[23:20]
arkiverI have updated imzy [23:21]
crusher2i see it
am i the only one with a bunch of batch scripts to do basic control over the warriors? xD
odd...
arkiver: Nope. Still the same post reboot
[23:22]
MrRadarDamn, I just came up with the perfect Imzy channel name: #thelasyimzy (reference to the 2007 film The Last Mimzy)
*#thelastimzy
[23:28]
Frogginghaha that's good [23:28]
MrRadarOf course the project is nearly done at this point [23:28]
crusher2well, i can still *connect* to their site, so i'm not blocked or anything
the only errors im getting are a pair of 422 on their splash page for some gifs
i'm loading up wireshark to see if that tells me anything
it must be something on my end, there are other warriors getting through...
[23:29]
MrRadarMan, the huge items from Eroshare are really blocking up FOS's rsync connections. [23:33]
crusher2i've still got two that are going to take another hour [23:34]
***lucysun has joined #archiveteam-bs [23:40]
lucysuncan someone help me find archives of aol forums or chat rooms from 1995 and before - does this even exist? [23:41]
DFJustinlucysun: archive team downloaded a bunch of the file collections from aol groups, some of them have logs I think https://archive.org/search.php?query=subject%3A%22aol+files%22&page=2
er https://archive.org/search.php?query=subject%3A%22aol+files%22
[23:45]
crusher2arkiver: i still have to narrow it down to see if these packets are for the imzy warriors or not,
but i'm getting loads of FCS errors from an ip that points right at archive.org
specifically a map telling me how many books were scanned in the last 12 hours.
would you like a short packet capture?
[23:52]
***antomatic has quit IRC (Read error: Operation timed out) [23:57]
