#archiveteam-bs 2017-09-05,Tue

↑back Search

Time	Nickname	Message
01:06 ^🔗	hook54321	kisspunch: There are at least a couple IA staff here, I'm not sure if Somebody2 works there, but oftentimes someone will know the answer to a question even if they don't work at IA, or they'll redirect you to someone who would be more likely to know the answer.
01:27 ^🔗		Stilett0 is now known as Stiletto
01:50 ^🔗		espes__ has quit IRC (Ping timeout: 250 seconds)
02:00 ^🔗		drumstick has joined #archiveteam-bs
02:00 ^🔗	Somebody2	kisspunch: I do not, but I hang around with people who do...
02:07 ^🔗		espes__ has joined #archiveteam-bs
03:10 ^🔗	dashcloud	kisspunch: I guess to answer that question, you have to first answer another one: what do you want the person downloading the files to be able to do? just see the content? track changes between time frames? recreate the exact experience a person would've had at a point in time? something else?
03:12 ^🔗	dashcloud	kisspunch: I saw your earlier thing talking about what kind of thing you have- since it's code, this talk is probably along the lines of what you want: https://www.youtube.com/watch?v=Xx6Bb2sY4zo
03:13 ^🔗	dashcloud	it's basically an archive of everything on GitHub that has 10 stars or more, without using endless space
03:37 ^🔗		Stilett0 has joined #archiveteam-bs
03:42 ^🔗		Stiletto has quit IRC (Read error: Operation timed out)
03:47 ^🔗		marvinw is now known as ivan
03:49 ^🔗	ivan	dashcloud: nice
03:51 ^🔗		drumstick has quit IRC (Read error: Operation timed out)
04:01 ^🔗	SketchCow	I've moved to a new apartment.
04:01 ^🔗	SketchCow	Massive connection, and actual heat, air conditioning and working bathroom. And drinkable water!
04:01 ^🔗	SketchCow	Will be more productive
04:02 ^🔗	astrid	MORE productive :O
04:02 ^🔗	SketchCow	Lot to do
04:02 ^🔗	SketchCow	Lot to make up
04:04 ^🔗	ivan	gigabit?
04:12 ^🔗	SketchCow	Let's not go crazy.
04:12 ^🔗	SketchCow	300mb. Quite good.
04:17 ^🔗		kyounko has joined #archiveteam-bs
04:21 ^🔗	ivan	cool
04:23 ^🔗	ivan	"How is internet in your area? I pay $27 for this crap. Supposed to be 500mbit. In Kyiv you can have 1gbit for 6 euro." Russians complaining about their 380/500
04:23 ^🔗	astrid	fuckin
04:26 ^🔗	pikhq	Right now I'd be jealous for having breathable air.
04:32 ^🔗		Stilett0 is now known as Stiletto
04:34 ^🔗	Asparagir	China? or Burbank?
04:40 ^🔗	pikhq	Colorado Springs, actually.
04:46 ^🔗	*	Asparagir google
04:46 ^🔗	*	Asparagir googles
04:46 ^🔗	*	Asparagir can't spell
04:46 ^🔗	Asparagir	oh
05:00 ^🔗		Sk1d has quit IRC (Ping timeout: 250 seconds)
05:07 ^🔗		Sk1d has joined #archiveteam-bs
05:22 ^🔗		Asparagir has quit IRC (Asparagir)
05:35 ^🔗		BlueMaxim has joined #archiveteam-bs
06:27 ^🔗		drumstick has joined #archiveteam-bs
07:30 ^🔗		HCross2 has joined #archiveteam-bs
07:45 ^🔗	hook54321	Question: If North Korea fixed this issue, then why do some domains still work? https://github.com/mandatoryprogrammer/NorthKoreaDNSLeak
08:07 ^🔗		drumstick has quit IRC (Remote host closed the connection)
08:08 ^🔗		drumstick has joined #archiveteam-bs
08:38 ^🔗		drumstick has quit IRC (Read error: Operation timed out)
08:50 ^🔗		Honno has joined #archiveteam-bs
08:51 ^🔗		drumstick has joined #archiveteam-bs
09:20 ^🔗		BlueMaxim has quit IRC (Quit: Leaving)
09:44 ^🔗		kristian_ has joined #archiveteam-bs
10:11 ^🔗	JAA	hook54321: They fixed the leak, i.e. you can't get a list of domains through AXFR anymore.
11:36 ^🔗		drumstick has quit IRC (Ping timeout: 600 seconds)
12:26 ^🔗		odemg has quit IRC (Read error: Operation timed out)
12:28 ^🔗		Kalroth has quit IRC (Ping timeout: 250 seconds)
12:32 ^🔗		Mateon1 has quit IRC (Ping timeout: 250 seconds)
12:33 ^🔗		Mateon1 has joined #archiveteam-bs
12:35 ^🔗		kristian_ has quit IRC (Quit: Leaving)
12:38 ^🔗		Kalroth has joined #archiveteam-bs
12:40 ^🔗		slackpi has joined #archiveteam-bs
12:41 ^🔗	slackpi	hey everyone
12:41 ^🔗	slackpi	i'm on my slackware rpi distro i just build this morning
12:41 ^🔗	slackpi	turned out part of my problem was glibc-solibs script
12:42 ^🔗	slackpi	it was not making the links
12:43 ^🔗	slackpi	that was what was crashing berryboot kernel
12:49 ^🔗		odemg has joined #archiveteam-bs
12:50 ^🔗	slackpi	hey odemg
12:52 ^🔗	odemg	hey
12:52 ^🔗	slackpi	i got slackware arm working
13:17 ^🔗	slackpi	odemg: my plan is to make a librarybox+kiwix hybrid on slackware arm
13:27 ^🔗		Honno has quit IRC (Read error: Operation timed out)
14:44 ^🔗	Jon	hey the other day I asked about scanning/submitting some old UK SF magazines (Interzone) and was advised to do 600dpi/TIFF. No problem. Any other advice or tips or URLs to read on scanning projects in general? Should I chop up the resulting TIFFs into sub-pages (each side is two separate pages from the publication)
14:44 ^🔗	Jon	etc
14:56 ^🔗		Honno has joined #archiveteam-bs
16:17 ^🔗		ld1 has quit IRC (Ping timeout: 260 seconds)
16:17 ^🔗		ld1 has joined #archiveteam-bs
16:25 ^🔗		schbirid has joined #archiveteam-bs
16:51 ^🔗		ld1 has quit IRC (Ping timeout: 260 seconds)
16:53 ^🔗		ld1 has joined #archiveteam-bs
16:56 ^🔗	odemg	slackpi, are you confusing me with someone else, this is the first I'm hearing of it?
16:59 ^🔗	astrid	http://radio.garden/ - a nice distraction, at least
17:03 ^🔗	slackpi	i have talked about it before on archiveteam-bs
17:03 ^🔗	slackpi	at least i think i talked about it here
17:06 ^🔗	astrid	slackpi: o wait are you godane
17:06 ^🔗	slackpi	yes
17:06 ^🔗	astrid	kool
17:06 ^🔗	slackpi	i'm on my raspberry pi 2
17:06 ^🔗	slackpi	same room
17:07 ^🔗	astrid	yeah, i seem to recall having seen you mention radio.garden a while ago
17:07 ^🔗	slackpi	thats the one with radio stations around the world
17:08 ^🔗	astrid	yeup
17:22 ^🔗	namibj1	Jon: you might be interested in writing to the internet archive directly, as they routinely do book scanning, or just look at how they handle recently scanned books that are handled internally by them.
17:25 ^🔗	godane	i'm now back on my main system
17:25 ^🔗	godane	for the moment :P
17:26 ^🔗	godane	i'm at 3231 items for this month so far
17:27 ^🔗	godane	i'm getting close to half of the items i had last month
17:30 ^🔗	astrid	Jon: yes each tiff should be a left or a right hand, not both
17:31 ^🔗	astrid	name them 0001.tif 0002.tif etc, and put them in an archive named (whatever)_images.tar
17:31 ^🔗	astrid	you don't have to name them anything in particular so long as they sort correctly
17:44 ^🔗		MartinThe has joined #archiveteam-bs
17:45 ^🔗	MartinThe	Hey guys, I have a question about archiving something
17:45 ^🔗	Frogging	ask away
17:45 ^🔗	MartinThe	I want to archive a couple of Diney website games, but they seem to be some sort of horrible multi-part / multi-file SWF files
17:45 ^🔗	joepie91_	godane: http://www.oldradioworld.com/media/ (via /r/opendirectories)
17:46 ^🔗	MartinThe	So not sure how to proceed
17:46 ^🔗	joepie91_	MartinThe: ah, the type that loads new files on demand as you click through the game?
17:46 ^🔗	MartinThe	I'm trying WarcMITMProxy, but last commit is from 4 years ago and it looks like dependencies broke big time. I'm running Ubuntu 16.04 LTS
17:46 ^🔗	MartinThe	joepie91_, Correct
17:46 ^🔗	joepie91_	ah yes, those are a pain, I don't think there's a bulletproof solution for those yet
17:46 ^🔗	astrid	MartinThe: link to this software?
17:47 ^🔗	MartinThe	https://github.com/odie5533/WarcMITMProxy
17:47 ^🔗	MartinThe	astrid, ^^ was linked to on the a-t.org wiki
17:47 ^🔗	joepie91_	MartinThe: afaik, your options are indeed either a warc proxy of some sort, or using a decompiler/converter that can take apart the SWFs and scripting your way around it
17:47 ^🔗	joepie91_	former being theoretically easiest
17:47 ^🔗	MartinThe	joepie91_, Augh, decompiling is something I'd rather not do. WARC proxy looks like the best option
17:47 ^🔗	hook54321	Would webrecorder work?
17:48 ^🔗	hook54321	https://webrecorder.io/
17:48 ^🔗	MartinThe	hook54321, Not sure, the new downloads are triggered from the running SWF
17:48 ^🔗	MartinThe	hook54321, I presume webrecorder is basically a wget-type deal?
17:48 ^🔗	MrRadar	Have you tried warcprox? https://github.com/internetarchive/warcprox
17:49 ^🔗	godane	joepie91: i'm going to be lazy and give it to archivebot
17:49 ^🔗	MartinThe	MrRadar, Looks cool, will check it out in a minute. Thanks a lot!
17:49 ^🔗	hook54321	MartinThe: You enter a starting URL and then you browse stuff manually and it puts it all into a WARC
17:51 ^🔗	joepie91_	godane: heh. just figured you might be interested in it given that you seem to do a lot of podcast/radio stuff :)
17:57 ^🔗	MartinThe	Oh heck, not just SWF. This thing's doing XML requests too. Whoa. Yup, WARC looks like the only way. MrRadar: Warcprox works fine
18:01 ^🔗	MrRadar	Glad I could help
18:02 ^🔗		cf has quit IRC (Read error: Operation timed out)
18:03 ^🔗		cf has joined #archiveteam-bs
19:01 ^🔗	hook54321	arkiver: Did the imgh.us person reply? Also, did you contact them through the email address listed in whois or through the form on their site?
19:06 ^🔗		MartinThe has quit IRC (Remote host closed the connection)
19:07 ^🔗		Sanqui has quit IRC (Ping timeout: 260 seconds)
19:15 ^🔗		Sanqui has joined #archiveteam-bs
20:26 ^🔗		Aranje has joined #archiveteam-bs
20:27 ^🔗	SketchCow	Anyone in here comfortable parsing XML?
20:39 ^🔗	namibj1	SketchCow: what for?
20:41 ^🔗	namibj1	I would get my skills in perl6 honed a bit, if the task seems like I could handle it. I think the motivation is enough to make me do it. Would take like at least 12h though...
20:41 ^🔗	SketchCow	I'm going to do it a stupid way
20:41 ^🔗	SketchCow	Hold my avocado
20:41 ^🔗	namibj1	Ha
20:41 ^🔗	namibj1	Ok, just thought you might have some time to get it done.
20:45 ^🔗	JAA	Parse it with regex! :-)
20:46 ^🔗		Aranje has quit IRC (Quit: Three sheets to the wind)
20:46 ^🔗	namibj1	I think he does, i guess that is the only stupid way.
20:58 ^🔗		sun_shine has joined #archiveteam-bs
20:58 ^🔗	sun_shine	I have a question about the wayback machine I'm not sure where else to pose
20:59 ^🔗	atluxity	sun_shine: shoot
20:59 ^🔗	sun_shine	An historically important website I need for research purposes has been maliciously excluded
20:59 ^🔗	sun_shine	the domain is now owned by spammers who aren't interested in selling it. I'm not sure that the creators of the site can be contacted
20:59 ^🔗	sun_shine	is there anything I can do?
21:14 ^🔗	schbirid	nope
21:14 ^🔗		schbirid has quit IRC (Quit: Leaving)
21:15 ^🔗	*	JAA is now listening to: Metallica - Sad But True
21:38 ^🔗	hook54321	sun_shine: Did you check if the creators of the site had an email listed in the whois for the domain?
21:39 ^🔗	sun_shine	this was back in 2009. is there anywhere i can look up historical whois stuff like that?
21:39 ^🔗	hook54321	what's the site?
21:40 ^🔗	sun_shine	isaccorp.org
21:41 ^🔗	hook54321	Seems to work fine for me. https://web.archive.org/web/*/http://isaccorp.com/
21:41 ^🔗	sun_shine	the site was at isaccorp.com until 2005, when it moved to isaccorp.org
21:41 ^🔗	hook54321	oh
21:42 ^🔗	sun_shine	The site had enemies. I can't say for certain that the original owners weren't the ones who asked for it to be excluded, but it would be out of character.
21:42 ^🔗	sun_shine	And it seems like it was manually excluded rather than by robots.txt
21:45 ^🔗	hook54321	There's a mirror of the wayback machine, it isn't up to date though. http://web.archive.bibalex.org/web/*/http://isaccorp.org
21:45 ^🔗	hook54321	I'm gonna try to find a way to contact the previous owners.
21:47 ^🔗	sun_shine	wait, so does this show the captures that exist but currently aren't available?
21:47 ^🔗	hook54321	Right now, someone in Ukraine named Andrey Ahiezer owns the domain.
21:47 ^🔗	sun_shine	Have you ever heard of a site being unexcluded? I know if the issue is robots.txt, then whoever controls the domain effectively controls its past availability as well
21:48 ^🔗	hook54321	It shows the captures that existed at the time they mirrored the wayback machine.
21:48 ^🔗	sun_shine	But since it was manually excluded I'm not sure that someone could override that even if, say, the present owner decided to
21:49 ^🔗	hook54321	Could someone from Ukraine or someone named Andrey Ahiezer have been an enemy of the site?
21:49 ^🔗	sun_shine	Really unlikely.
21:50 ^🔗	sun_shine	If the archive cuts off at 2007, though, that seems to suggest when the request for removal was sent
21:50 ^🔗	hook54321	Or when they last updated the mirror
21:50 ^🔗	hook54321	When a site is excluded manually they will still crawl it
21:51 ^🔗	sun_shine	oh, nevermind they last updated the mirror in 2007
21:51 ^🔗	sun_shine	http://web.archive.bibalex.org/web/*/http://example.org
22:02 ^🔗	hook54321	sun_shine: domaintools has whois history, it's not free though. https://whois.domaintools.com/isaccorp.org
22:03 ^🔗	sun_shine	you know, I have very rarely encountered 'domain excluded' errors when using wayback
22:03 ^🔗	sun_shine	and I'm a really heavy user
22:04 ^🔗	sun_shine	I just checked on two other defunct advocacy websites in the same area. Both excluded - and I know that the first one was purchased by the corporation it published exposes on after the owner died.
22:04 ^🔗	sun_shine	I think they bought the domains after the expired, had them excluded, and then dumped them
22:06 ^🔗	hook54321	What are the other two domains?
22:06 ^🔗	hook54321	and the corporation
22:07 ^🔗		drumstick has joined #archiveteam-bs
22:07 ^🔗		dashcloud has quit IRC (Read error: Operation timed out)
22:10 ^🔗		dashcloud has joined #archiveteam-bs
22:14 ^🔗	sun_shine	intrepidnetreporter.com and caica.org . The corporation that bought intrepidnetreporter is called WWASP and has a documented history of suing online critics. All three of these websites reported critically on them. https://en.wikipedia.org/wiki/World_Wide_Association_of_Specialty_Programs_and_Schools
22:15 ^🔗	sun_shine	I think I'm just going to write info@archive.org and ask nicely. I'm not sure there's any other option.
22:15 ^🔗	astrid	probably yeah
22:21 ^🔗		namibj_ has quit IRC (Ping timeout: 260 seconds)
22:33 ^🔗		namibj_ has joined #archiveteam-bs
22:34 ^🔗		dashcloud has quit IRC (Read error: Operation timed out)
22:34 ^🔗		dashcloud has joined #archiveteam-bs
22:35 ^🔗		Soni has quit IRC (Read error: Operation timed out)
22:36 ^🔗		Soni has joined #archiveteam-bs
23:19 ^🔗		Asparagir has joined #archiveteam-bs
23:41 ^🔗		Honno has quit IRC (Read error: Operation timed out)
23:56 ^🔗		slackpi has quit IRC (Read error: Connection reset by peer)

irclogger-viewer