#archiveteam-bs 2017-09-05,Tue

Time Nickname Message
01:06 🔗 hook54321 kisspunch: There are at least a couple IA staff here, I'm not sure if Somebody2 works there, but oftentimes someone will know the answer to a question even if they don't work at IA, or they'll redirect you to someone who would be more likely to know the answer.
01:27 🔗 Stilett0 is now known as Stiletto
01:50 🔗 espes__ has quit IRC (Ping timeout: 250 seconds)
02:00 🔗 drumstick has joined #archiveteam-bs
02:00 🔗 Somebody2 kisspunch: I do not, but I hang around with people who do...
02:07 🔗 espes__ has joined #archiveteam-bs
03:10 🔗 dashcloud kisspunch: I guess to answer that question, you have to first answer another one: what do you want the person downloading the files to be able to do? just see the content? track changes between time frames? recreate the exact experience a person would've had at a point in time? something else?
03:12 🔗 dashcloud kisspunch: I saw your earlier thing talking about what kind of thing you have- since it's code, this talk is probably along the lines of what you want: https://www.youtube.com/watch?v=Xx6Bb2sY4zo
03:13 🔗 dashcloud it's basically an archive of everything on GitHub that has 10 stars or more, without using endless space
03:37 🔗 Stilett0 has joined #archiveteam-bs
03:42 🔗 Stiletto has quit IRC (Read error: Operation timed out)
03:47 🔗 marvinw is now known as ivan
03:49 🔗 ivan dashcloud: nice
03:51 🔗 drumstick has quit IRC (Read error: Operation timed out)
04:01 🔗 SketchCow I've moved to a new apartment.
04:01 🔗 SketchCow Massive connection, and actual heat, air conditioning and working bathroom. And drinkable water!
04:01 🔗 SketchCow Will be more productive
04:02 🔗 astrid MORE productive :O
04:02 🔗 SketchCow Lot to do
04:02 🔗 SketchCow Lot to make up
04:04 🔗 ivan gigabit?
04:12 🔗 SketchCow Let's not go crazy.
04:12 🔗 SketchCow 300mb. Quite good.
04:17 🔗 kyounko has joined #archiveteam-bs
04:21 🔗 ivan cool
04:23 🔗 ivan "How is internet in your area? I pay $27 for this crap. Supposed to be 500mbit. In Kyiv you can have 1gbit for 6 euro." Russians complaining about their 380/500
04:23 🔗 astrid fuckin
04:26 🔗 pikhq Right now I'd be jealous for having breathable air.
04:32 🔗 Stilett0 is now known as Stiletto
04:34 🔗 Asparagir China? or Burbank?
04:40 🔗 pikhq Colorado Springs, actually.
04:46 🔗 * Asparagir google
04:46 🔗 * Asparagir googles
04:46 🔗 * Asparagir can't spell
04:46 🔗 Asparagir oh
05:00 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:07 🔗 Sk1d has joined #archiveteam-bs
05:22 🔗 Asparagir has quit IRC (Asparagir)
05:35 🔗 BlueMaxim has joined #archiveteam-bs
06:27 🔗 drumstick has joined #archiveteam-bs
07:30 🔗 HCross2 has joined #archiveteam-bs
07:45 🔗 hook54321 Question: If North Korea fixed this issue, then why do some domains still work? https://github.com/mandatoryprogrammer/NorthKoreaDNSLeak
08:07 🔗 drumstick has quit IRC (Remote host closed the connection)
08:08 🔗 drumstick has joined #archiveteam-bs
08:38 🔗 drumstick has quit IRC (Read error: Operation timed out)
08:50 🔗 Honno has joined #archiveteam-bs
08:51 🔗 drumstick has joined #archiveteam-bs
09:20 🔗 BlueMaxim has quit IRC (Quit: Leaving)
09:44 🔗 kristian_ has joined #archiveteam-bs
10:11 🔗 JAA hook54321: They fixed the leak, i.e. you can't get a list of domains through AXFR anymore.
11:36 🔗 drumstick has quit IRC (Ping timeout: 600 seconds)
12:26 🔗 odemg has quit IRC (Read error: Operation timed out)
12:28 🔗 Kalroth has quit IRC (Ping timeout: 250 seconds)
12:32 🔗 Mateon1 has quit IRC (Ping timeout: 250 seconds)
12:33 🔗 Mateon1 has joined #archiveteam-bs
12:35 🔗 kristian_ has quit IRC (Quit: Leaving)
12:38 🔗 Kalroth has joined #archiveteam-bs
12:40 🔗 slackpi has joined #archiveteam-bs
12:41 🔗 slackpi hey everyone
12:41 🔗 slackpi i'm on my slackware rpi distro i just built this morning
12:41 🔗 slackpi turned out part of my problem was glibc-solibs script
12:42 🔗 slackpi it was not making the links
12:43 🔗 slackpi that was what was crashing berryboot kernel
12:49 🔗 odemg has joined #archiveteam-bs
12:50 🔗 slackpi hey odemg
12:52 🔗 odemg hey
12:52 🔗 slackpi i got slackware arm working
13:17 🔗 slackpi odemg: my plan is to make a librarybox+kiwix hybrid on slackware arm
13:27 🔗 Honno has quit IRC (Read error: Operation timed out)
14:44 🔗 Jon hey the other day I asked about scanning/submitting some old UK SF magazines (Interzone) and was advised to do 600dpi/TIFF. No problem. Any other advice or tips or URLs to read on scanning projects in general? Should I chop up the resulting TIFFs into sub-pages (each side is two separate pages from the publication)
14:44 🔗 Jon etc
14:56 🔗 Honno has joined #archiveteam-bs
16:17 🔗 ld1 has quit IRC (Ping timeout: 260 seconds)
16:17 🔗 ld1 has joined #archiveteam-bs
16:25 🔗 schbirid has joined #archiveteam-bs
16:51 🔗 ld1 has quit IRC (Ping timeout: 260 seconds)
16:53 🔗 ld1 has joined #archiveteam-bs
16:56 🔗 odemg slackpi, are you confusing me with someone else, this is the first I'm hearing of it?
16:59 🔗 astrid http://radio.garden/ - a nice distraction, at least
17:03 🔗 slackpi i have talked about it before on archiveteam-bs
17:03 🔗 slackpi at least i think i talked about it here
17:06 🔗 astrid slackpi: o wait are you godane
17:06 🔗 slackpi yes
17:06 🔗 astrid kool
17:06 🔗 slackpi i'm on my raspberry pi 2
17:06 🔗 slackpi same room
17:07 🔗 astrid yeah, i seem to recall having seen you mention radio.garden a while ago
17:07 🔗 slackpi thats the one with radio stations around the world
17:08 🔗 astrid yeup
17:22 🔗 namibj1 Jon: you might be interested in writing to the internet archive directly, as they routinely do book scanning, or just look at how they handle recently scanned books that are handled internally by them.
17:25 🔗 godane i'm now back on my main system
17:25 🔗 godane for the moment :P
17:26 🔗 godane i'm at 3231 items for this month so far
17:27 🔗 godane i'm getting close to half of the items i had last month
17:30 🔗 astrid Jon: yes each tiff should be a left-hand or a right-hand page, not both
17:31 🔗 astrid name them 0001.tif 0002.tif etc, and put them in an archive named (whatever)_images.tar
17:31 🔗 astrid you don't have to name them anything in particular so long as they sort correctly
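The packaging astrid describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `package_scans`, the `identifier` argument, and the directory layout are hypothetical, and the scans are assumed to be already split into one page per TIFF with filenames that sort in page order.

```python
import tarfile
from pathlib import Path


def package_scans(src_dir, identifier, out_dir="."):
    """Pack single-page TIFFs into <identifier>_images.tar as 0001.tif, 0002.tif, ...

    Hypothetical helper: assumes the files in src_dir already sort in page order.
    """
    pages = sorted(
        p for p in Path(src_dir).iterdir()
        if p.suffix.lower() in (".tif", ".tiff")
    )
    tar_path = Path(out_dir) / f"{identifier}_images.tar"
    with tarfile.open(tar_path, "w") as tar:  # plain, uncompressed tar
        for number, page in enumerate(pages, start=1):
            # Store each page under a zero-padded name so it sorts correctly.
            tar.add(page, arcname=f"{number:04d}.tif")
    return tar_path
```

Calling `package_scans("scans/issue_042", "interzone_042")` (both names made up here) would write `interzone_042_images.tar` to the current directory. As astrid says, the exact names do not matter as long as they sort correctly; the renumbering is just one way to guarantee that.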
17:44 🔗 MartinThe has joined #archiveteam-bs
17:45 🔗 MartinThe Hey guys, I have a question about archiving something
17:45 🔗 Frogging ask away
17:45 🔗 MartinThe I want to archive a couple of Disney website games, but they seem to be some sort of horrible multi-part / multi-file SWF files
17:45 🔗 joepie91_ godane: http://www.oldradioworld.com/media/ (via /r/opendirectories)
17:46 🔗 MartinThe So not sure how to proceed
17:46 🔗 joepie91_ MartinThe: ah, the type that loads new files on demand as you click through the game?
17:46 🔗 MartinThe I'm trying WarcMITMProxy, but last commit is from 4 years ago and it looks like dependencies broke big time. I'm running Ubuntu 16.04 LTS
17:46 🔗 MartinThe joepie91_, Correct
17:46 🔗 joepie91_ ah yes, those are a pain, I don't think there's a bulletproof solution for those yet
17:46 🔗 astrid MartinThe: link to this software?
17:47 🔗 MartinThe https://github.com/odie5533/WarcMITMProxy
17:47 🔗 MartinThe astrid, ^^ was linked to on the a-t.org wiki
17:47 🔗 joepie91_ MartinThe: afaik, your options are indeed either a warc proxy of some sort, or using a decompiler/converter that can take apart the SWFs and scripting your way around it
17:47 🔗 joepie91_ former being theoretically easiest
17:47 🔗 MartinThe joepie91_, Augh, decompiling is something I'd rather not do. WARC proxy looks like the best option
17:47 🔗 hook54321 Would webrecorder work?
17:48 🔗 hook54321 https://webrecorder.io/
17:48 🔗 MartinThe hook54321, Not sure, the new downloads are triggered from the running SWF
17:48 🔗 MartinThe hook54321, I presume webrecorder is basically a wget-type deal?
17:48 🔗 MrRadar Have you tried warcprox? https://github.com/internetarchive/warcprox
17:49 🔗 godane joepie91: i'm going to be lazy and give it to archivebot
17:49 🔗 MartinThe MrRadar, Looks cool, will check it out in a minute. Thanks a lot!
17:49 🔗 hook54321 MartinThe: You enter a starting URL and then you browse stuff manually and it puts it all into a WARC
17:51 🔗 joepie91_ godane: heh. just figured you might be interested in it given that you seem to do a lot of podcast/radio stuff :)
17:57 🔗 MartinThe Oh heck, not just SWF. This thing's doing XML requests too. Whoa. Yup, WARC looks like the only way. MrRadar: Warcprox works fine
18:01 🔗 MrRadar Glad I could help
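For requests that do not have to come from the running SWF (for example, re-fetching asset or XML URLs noted while playing), a script can be pointed at the same recording proxy. A minimal sketch, assuming a warcprox instance is already running and listening on `localhost:8000`; the address and the helper name `opener_via_proxy` are assumptions, and HTTPS capture would additionally need the proxy's CA certificate to be trusted, which is not shown.

```python
import urllib.request

# Assumed address of a locally running recording proxy; adjust to your setup.
PROXY = "http://localhost:8000"


def opener_via_proxy(proxy=PROXY):
    """Build a urllib opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, opener):
    """Fetch one URL through the opener; the proxy is what writes the WARC record."""
    with opener.open(url, timeout=60) as response:
        return response.read()
```

Something like `fetch(url, opener_via_proxy())` in a loop over a list of collected URLs would route each request through the proxy. The game itself still has to be played in a browser configured to use the proxy, since the on-demand downloads are triggered by the SWF.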
18:02 🔗 cf has quit IRC (Read error: Operation timed out)
18:03 🔗 cf has joined #archiveteam-bs
19:01 🔗 hook54321 arkiver: Did the imgh.us person reply? Also, did you contact them through the email address listed in whois or through the form on their site?
19:06 🔗 MartinThe has quit IRC (Remote host closed the connection)
19:07 🔗 Sanqui has quit IRC (Ping timeout: 260 seconds)
19:15 🔗 Sanqui has joined #archiveteam-bs
20:26 🔗 Aranje has joined #archiveteam-bs
20:27 🔗 SketchCow Anyone in here comfortable parsing XML?
20:39 🔗 namibj1 SketchCow: what for?
20:41 🔗 namibj1 I would get my skills in perl6 honed a bit, if the task seems like I could handle it. I think the motivation is enough to make me do it. Would take like at least 12h though...
20:41 🔗 SketchCow I'm going to do it a stupid way
20:41 🔗 SketchCow Hold my avocado
20:41 🔗 namibj1 Ha
20:41 🔗 namibj1 Ok, just thought you might have some time to get it done.
20:45 🔗 JAA Parse it with regex! :-)
20:46 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
20:46 🔗 namibj1 I think he does, i guess that is the only stupid way.
20:58 🔗 sun_shine has joined #archiveteam-bs
20:58 🔗 sun_shine I have a question about the wayback machine I'm not sure where else to pose
20:59 🔗 atluxity sun_shine: shoot
20:59 🔗 sun_shine An historically important website I need for research purposes has been maliciously excluded
20:59 🔗 sun_shine the domain is now owned by spammers who aren't interested in selling it. I'm not sure that the creators of the site can be contacted
20:59 🔗 sun_shine is there anything I can do?
21:14 🔗 schbirid nope
21:14 🔗 schbirid has quit IRC (Quit: Leaving)
21:15 🔗 * JAA is now listening to: Metallica - Sad But True
21:38 🔗 hook54321 sun_shine: Did you check if the creators of the site had an email listed in the whois for the domain?
21:39 🔗 sun_shine this was back in 2009. is there anywhere i can look up historical whois stuff like that?
21:39 🔗 hook54321 what's the site?
21:40 🔗 sun_shine isaccorp.org
21:41 🔗 hook54321 Seems to work fine for me. https://web.archive.org/web/*/http://isaccorp.com/
21:41 🔗 sun_shine the site was at isaccorp.com until 2005, when it moved to isaccorp.org
21:41 🔗 hook54321 oh
21:42 🔗 sun_shine The site had enemies. I can't say for certain that the original owners weren't the ones who asked for it to be excluded, but it would be out of character.
21:42 🔗 sun_shine And it seems like it was manually excluded rather than by robots.txt
21:45 🔗 hook54321 There's a mirror of the wayback machine, it isn't up to date though. http://web.archive.bibalex.org/web/*/http://isaccorp.org
21:45 🔗 hook54321 I'm gonna try to find a way to contact the previous owners.
21:47 🔗 sun_shine wait, so does this show the captures that exist but currently aren't available?
21:47 🔗 hook54321 Right now, someone in Ukraine named Andrey Ahiezer owns the domain.
21:47 🔗 sun_shine Have you ever heard of a site being unexcluded? I know if the issue is robots.txt, then whoever controls the domain effectively controls its past availability as well
21:48 🔗 hook54321 It shows the captures that existed at the time they mirrored the wayback machine.
21:48 🔗 sun_shine But since it was manually excluded I'm not sure that someone could override that even if, say, the present owner decided to
21:49 🔗 hook54321 Could someone from Ukraine or someone named Andrey Ahiezer have been an enemy of the site?
21:49 🔗 sun_shine Really unlikely.
21:50 🔗 sun_shine If the archive cuts off at 2007, though, that seems to suggest when the request for removal was sent
21:50 🔗 hook54321 Or when they last updated the mirror
21:50 🔗 hook54321 When a site is excluded manually they will still crawl it
21:51 🔗 sun_shine oh, nevermind they last updated the mirror in 2007
21:51 🔗 sun_shine http://web.archive.bibalex.org/web/*/http://example.org
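The links pasted above all follow the same `/web/*/<url>` pattern for listing a site's captures, on both web.archive.org and the bibalex.org mirror. A small sketch that builds such links for several domains; the function name `capture_index_url` is hypothetical, and the pattern is taken only from the URLs quoted in this log.

```python
MAIN = "https://web.archive.org"
MIRROR = "http://web.archive.bibalex.org"  # mirror mentioned above, not up to date


def capture_index_url(site, base=MAIN):
    """Return the 'all captures' listing URL for a site on the given host."""
    if "://" not in site:
        site = "http://" + site  # the links in the log use an http:// target
    return f"{base}/web/*/{site}"


if __name__ == "__main__":
    for domain in ("isaccorp.com", "isaccorp.org"):
        print(capture_index_url(domain))
        print(capture_index_url(domain, MIRROR))
```

This only constructs the links; whether a given host still shows captures for an excluded domain has to be checked by opening them.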
22:02 🔗 hook54321 sun_shine: domaintools has whois history, it's not free though. https://whois.domaintools.com/isaccorp.org
22:03 🔗 sun_shine you know, I have very rarely encountered 'domain excluded' errors when using wayback
22:03 🔗 sun_shine and I'm a really heavy user
22:04 🔗 sun_shine I just checked on two other defunct advocacy websites in the same area. Both excluded - and I know that the first one was purchased by the corporation it published exposes on after the owner died.
22:04 🔗 sun_shine I think they bought the domains after they expired, had them excluded, and then dumped them
22:06 🔗 hook54321 What are the other two domains?
22:06 🔗 hook54321 and the corporation
22:07 🔗 drumstick has joined #archiveteam-bs
22:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:10 🔗 dashcloud has joined #archiveteam-bs
22:14 🔗 sun_shine intrepidnetreporter.com and caica.org . The corporation that bought intrepidnetreporter is called WWASP and has a documented history of suing online critics. All three of these websites reported critically on them. https://en.wikipedia.org/wiki/World_Wide_Association_of_Specialty_Programs_and_Schools
22:15 🔗 sun_shine I think I'm just going to write info@archive.org and ask nicely. I'm not sure there's any other option.
22:15 🔗 astrid probably yeah
22:21 🔗 namibj_ has quit IRC (Ping timeout: 260 seconds)
22:33 🔗 namibj_ has joined #archiveteam-bs
22:34 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:34 🔗 dashcloud has joined #archiveteam-bs
22:35 🔗 Soni has quit IRC (Read error: Operation timed out)
22:36 🔗 Soni has joined #archiveteam-bs
23:19 🔗 Asparagir has joined #archiveteam-bs
23:41 🔗 Honno has quit IRC (Read error: Operation timed out)
23:56 🔗 slackpi has quit IRC (Read error: Connection reset by peer)
