#archiveteam-bs 2017-09-05,Tue


Who What When
hook54321kisspunch: There are at least a couple IA staff here, I'm not sure if Somebody2 works there, but oftentimes someone will know the answer to a question even if they don't work at IA, or they'll redirect you to someone who would be more likely to know the answer. [01:06]
..... (idle for 21mn)
***Stilett0 is now known as Stiletto [01:27]
..... (idle for 23mn)
espes__ has quit IRC (Ping timeout: 250 seconds) [01:50]
drumstick has joined #archiveteam-bs [02:00]
Somebody2kisspunch: I do not, but I hang around with people who do... [02:00]
***espes__ has joined #archiveteam-bs [02:07]
............. (idle for 1h3mn)
dashcloudkisspunch: I guess to answer that question, you have to first answer another one: what do you want the person downloading the files to be able to do? just see the content? track changes between time frames? recreate the exact experience a person would've had at a point in time? something else?
kisspunch: I saw your earlier thing talking about what kind of thing you have- since it's code, this talk is probably along the lines of what you want: https://www.youtube.com/watch?v=Xx6Bb2sY4zo
it's basically an archive of everything on GitHub that has 10 stars or more, without using endless space
[03:10]
..... (idle for 24mn)
***Stilett0 has joined #archiveteam-bs [03:37]
Stiletto has quit IRC (Read error: Operation timed out) [03:42]
marvinw is now known as ivan [03:47]
ivandashcloud: nice [03:49]
***drumstick has quit IRC (Read error: Operation timed out) [03:51]
SketchCowI've moved to a new apartment.
Massive connection, and actual heat, air conditioning and working bathroom. And drinkable water!
Will be more productive
[04:01]
astridMORE productive :O [04:02]
SketchCowLot to do
Lot to make up
[04:02]
ivangigabit? [04:04]
SketchCowLet's not go crazy.
300mb. Quite good.
[04:12]
***kyounko has joined #archiveteam-bs [04:17]
ivancool
"How is internet in your area? I pay $27 for this crap. Supposed to be 500mbit. In Kyiv you can have 1gbit for 6 euro." Russians complaining about their 380/500
[04:21]
astridfuckin [04:23]
pikhqRight now I'd be jealous for having breathable air. [04:26]
***Stilett0 is now known as Stiletto [04:32]
AsparagirChina? or Burbank? [04:34]
pikhqColorado Springs, actually. [04:40]
AsparagirAsparagir google
Asparagir googles
Asparagir can't spell
oh
[04:46]
***Sk1d has quit IRC (Ping timeout: 250 seconds) [05:00]
Sk1d has joined #archiveteam-bs [05:07]
.... (idle for 15mn)
Asparagir has quit IRC (Asparagir) [05:22]
BlueMaxim has joined #archiveteam-bs [05:35]
........... (idle for 52mn)
drumstick has joined #archiveteam-bs [06:27]
............. (idle for 1h3mn)
HCross2 has joined #archiveteam-bs [07:30]
.... (idle for 15mn)
hook54321Question: If North Korea fixed this issue, then why do some domains still work? https://github.com/mandatoryprogrammer/NorthKoreaDNSLeak [07:45]
..... (idle for 22mn)
***drumstick has quit IRC (Remote host closed the connection)
drumstick has joined #archiveteam-bs
[08:07]
....... (idle for 30mn)
drumstick has quit IRC (Read error: Operation timed out) [08:38]
Honno has joined #archiveteam-bs
drumstick has joined #archiveteam-bs
[08:50]
...... (idle for 29mn)
BlueMaxim has quit IRC (Quit: Leaving) [09:20]
..... (idle for 24mn)
kristian_ has joined #archiveteam-bs [09:44]
...... (idle for 27mn)
JAAhook54321: They fixed the leak, i.e. you can't get a list of domains through AXFR anymore. [10:11]
.................. (idle for 1h25mn)
***drumstick has quit IRC (Ping timeout: 600 seconds) [11:36]
........... (idle for 50mn)
odemg has quit IRC (Read error: Operation timed out)
Kalroth has quit IRC (Ping timeout: 250 seconds)
Mateon1 has quit IRC (Ping timeout: 250 seconds)
Mateon1 has joined #archiveteam-bs
kristian_ has quit IRC (Quit: Leaving)
Kalroth has joined #archiveteam-bs
slackpi has joined #archiveteam-bs
[12:26]
slackpihey everyone
i'm on my slackware rpi distro i just built this morning
turned out part of my problem was glibc-solibs script
it was not making the links
that was what was crashing berryboot kernel
[12:41]
***odemg has joined #archiveteam-bs [12:49]
slackpihey odemg [12:50]
odemghey [12:52]
slackpii got slackware arm working [12:52]
...... (idle for 25mn)
odemg: my plan is to make a librarybox+kiwix hybrid on slackware arm [13:17]
***Honno has quit IRC (Read error: Operation timed out) [13:27]
................ (idle for 1h17mn)
Jonhey the other day I asked about scanning/submitting some old UK SF magazines (Interzone) and was advised to do 600dpi/TIFF. No problem. Any other advice or tips or URLs to read on scanning projects in general? Should I chop up the resulting TIFFs into sub-pages (each side is two separate pages from the publication)
etc
[14:44]
***Honno has joined #archiveteam-bs [14:56]
................. (idle for 1h21mn)
ld1 has quit IRC (Ping timeout: 260 seconds)
ld1 has joined #archiveteam-bs
[16:17]
schbirid has joined #archiveteam-bs [16:25]
...... (idle for 26mn)
ld1 has quit IRC (Ping timeout: 260 seconds)
ld1 has joined #archiveteam-bs
[16:51]
odemgslackpi, are you confusing me with someone else, this is the first I'm hearing of it? [16:56]
astridhttp://radio.garden/ - a nice distraction, at least [16:59]
slackpii have talked about it before on archiveteam-bs
at least i think i talked about it here
[17:03]
astridslackpi: o wait are you godane [17:06]
slackpiyes [17:06]
astridkool [17:06]
slackpii'm on my raspberry pi 2
same room
[17:06]
astridyeah, i seem to recall having seen you mention radio.garden a while ago [17:07]
slackpithats the one with radio stations around the world [17:07]
astridyeup [17:08]
namibj1Jon: you might be interested in writing to the internet archive directly, as they routinely do book scanning, or just look at how they handle recently scanned books that are handled internally by them. [17:22]
godanei'm now back on my main system
for the moment :P
i'm at 3231 items for this month so far
i'm getting close to half of the items i had last month
[17:25]
astridJon: yes each tiff should be a left or a right hand, not both
name them 0001.tif 0002.tif etc, and put them in an archive named (whatever)_images.tar
you don't have to name them anything in particular so long as they sort correctly
[17:30]
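The naming and packing convention astrid describes can be sketched in a few lines of Python. This is only an illustration, assuming one scanned page per TIFF already sitting in a single directory; the function name `pack_scans` and the item name are hypothetical, and only the numbered `.tif` names and the `_images.tar` suffix come from the advice above.

```python
import tarfile
from pathlib import Path


def pack_scans(scan_dir: Path, item_name: str) -> Path:
    """Bundle page scans into <item_name>_images.tar, numbered in sorted order.

    Hypothetical helper: assumes every .tif/.tiff file in scan_dir is one page
    and that the existing file names already sort in reading order.
    """
    pages = sorted(
        p for p in scan_dir.iterdir() if p.suffix.lower() in {".tif", ".tiff"}
    )
    # Write the archive next to scan_dir so it is never picked up as a page.
    archive = scan_dir.parent / f"{item_name}_images.tar"
    with tarfile.open(archive, "w") as tar:  # "w" = plain, uncompressed tar
        for number, page in enumerate(pages, start=1):
            # Store each page under a zero-padded name: 0001.tif, 0002.tif, ...
            tar.add(page, arcname=f"{number:04d}.tif")
    return archive
```

Calling `pack_scans(Path("scans"), "interzone_001")` would write `interzone_001_images.tar` beside the `scans` directory. As astrid notes, the exact names matter less than the sort order, so renumbering inside the archive is optional.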
***MartinThe has joined #archiveteam-bs [17:44]
MartinTheHey guys, I have a question about archiving something [17:45]
Froggingask away [17:45]
MartinTheI want to archive a couple of Disney website games, but they seem to be some sort of horrible multi-part / multi-file SWF files [17:45]
joepie91_godane: http://www.oldradioworld.com/media/ (via /r/opendirectories) [17:45]
MartinTheSo not sure how to proceed [17:46]
joepie91_MartinThe: ah, the type that loads new files on demand as you click through the game? [17:46]
MartinTheI'm trying WarcMITMProxy, but last commit is from 4 years ago and it looks like dependencies broke big time. I'm running Ubuntu 16.04 LTS
joepie91_, Correct
[17:46]
joepie91_ah yes, those are a pain, I don't think there's a bulletproof solution for those yet [17:46]
astridMartinThe: link to this software? [17:46]
MartinThehttps://github.com/odie5533/WarcMITMProxy
astrid, ^^ was linked to on the a-t.org wiki
[17:47]
joepie91_MartinThe: afaik, your options are indeed either a warc proxy of some sort, or using a decompiler/converter that can take apart the SWFs and scripting your way around it
former being theoretically easiest
[17:47]
MartinThejoepie91_, Augh, decompiling is something I'd rather not do. WARC proxy looks like the best option [17:47]
hook54321Would webrecorder work?
https://webrecorder.io/
[17:47]
MartinThehook54321, Not sure, the new downloads are triggered from the running SWF
hook54321, I presume webrecorder is basically a wget-type deal?
[17:48]
MrRadarHave you tried warcprox? https://github.com/internetarchive/warcprox [17:48]
godanejoepie91: i'm going to be lazy and give it to archivebot [17:49]
MartinTheMrRadar, Looks cool, will check it out in a minute. Thanks a lot! [17:49]
hook54321MartinThe: You enter a starting URL and then you browse stuff manually and it puts it all into a WARC [17:49]
joepie91_godane: heh. just figured you might be interested in it given that you seem to do a lot of podcast/radio stuff :) [17:51]
MartinTheOh heck, not just SWF. This thing's doing XML requests too. Whoa. Yup, WARC looks like the only way. MrRadar: Warcprox works fine [17:57]
MrRadarGlad I could help [18:01]
***cf has quit IRC (Read error: Operation timed out)
cf has joined #archiveteam-bs
[18:02]
............ (idle for 58mn)
hook54321arkiver: Did the imgh.us person reply? Also, did you contact them through the email address listed in whois or through the form on their site? [19:01]
***MartinThe has quit IRC (Remote host closed the connection)
Sanqui has quit IRC (Ping timeout: 260 seconds)
[19:06]
Sanqui has joined #archiveteam-bs [19:15]
............... (idle for 1h11mn)
Aranje has joined #archiveteam-bs [20:26]
SketchCowAnyone in here comfortable parsing XML? [20:27]
namibj1SketchCow: what for?
I would get my skills in perl6 honed a bit, if the task seems like I could handle it. I think the motivation is enough to make me do it. Would take like at least 12h though...
[20:39]
SketchCowI'm going to do it a stupid way
Hold my avocado
[20:41]
namibj1Ha
Ok, just thought you might have some time to get it done.
[20:41]
JAAParse it with regex! :-) [20:45]
***Aranje has quit IRC (Quit: Three sheets to the wind) [20:46]
namibj1I think he does, i guess that is the only stupid way. [20:46]
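For contrast with the regex joke, here is a minimal sketch of the non-stupid way using Python's standard-library `xml.etree.ElementTree`. The log never says what XML SketchCow is parsing, so the sample document, the `item`/`title` tag names, and the `titles_by_id` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real XML is not shown in the log.
SAMPLE = """<items>
  <item id="a1"><title>First</title></item>
  <item id="b2"><title>Second</title></item>
</items>"""


def titles_by_id(xml_text: str) -> dict:
    """Map each <item>'s id attribute to the text of its <title> child."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id"): item.findtext("title", default="")
        for item in root.iter("item")
    }
```

A real parser handles nesting, attributes in any order, and escaped characters, which is where regex-based approaches usually fall over.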
***sun_shine has joined #archiveteam-bs [20:58]
sun_shineI have a question about the wayback machine I'm not sure where else to pose [20:58]
atluxitysun_shine: shoot [20:59]
sun_shineAn historically important website I need for research purposes has been maliciously excluded
the domain is now owned by spammers who aren't interested in selling it. I'm not sure that the creators of the site can be contacted
is there anything I can do?
[20:59]
.... (idle for 15mn)
schbiridnope [21:14]
***schbirid has quit IRC (Quit: Leaving) [21:14]
JAAJAA is now listening to: Metallica - Sad But True [21:15]
..... (idle for 23mn)
hook54321sun_shine: Did you check if the creators of the site had an email listed in the whois for the domain? [21:38]
sun_shinethis was back in 2009. is there anywhere i can look up historical whois stuff like that? [21:39]
hook54321what's the site? [21:39]
sun_shineisaccorp.org [21:40]
hook54321Seems to work fine for me. https://web.archive.org/web/*/http://isaccorp.com/ [21:41]
sun_shinethe site was at isaccorp.com until 2005, when it moved to isaccorp.org [21:41]
hook54321oh [21:41]
sun_shineThe site had enemies. I can't say for certain that the original owners weren't the ones who asked for it to be excluded, but it would be out of character.
And it seems like it was manually excluded rather than by robots.txt
[21:42]
hook54321There's a mirror of the wayback machine, it isn't up to date though. http://web.archive.bibalex.org/web/*/http://isaccorp.org
I'm gonna try to find a way to contact the previous owners.
[21:45]
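The two capture-index links above follow the same pattern: host, then `/web/*/`, then the site URL. A small sketch of building them, assuming only the two hosts mentioned in this log; the `capture_index_url` helper and the short host keys are made up for illustration.

```python
# Hosts taken from the links in this log; the short keys are hypothetical labels.
WAYBACK_HOSTS = {
    "ia": "https://web.archive.org",
    "bibalex": "http://web.archive.bibalex.org",
}


def capture_index_url(site: str, host: str = "ia") -> str:
    """Build the wildcard (*) capture-listing URL for a site on a given host."""
    return f"{WAYBACK_HOSTS[host]}/web/*/{site}"
```

For example, `capture_index_url("http://isaccorp.org", "bibalex")` reproduces the mirror link hook54321 posted.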
sun_shinewait, so does this show the captures that exist but currently aren't available? [21:47]
hook54321Right now, someone in Ukraine named Andrey Ahiezer owns the domain. [21:47]
sun_shineHave you ever heard of a site being unexcluded? I know if the issue is robots.txt, then whoever controls the domain effectively controls its past availability as well [21:47]
hook54321It shows the captures that existed at the time they mirrored the wayback machine. [21:48]
sun_shineBut since it was manually excluded I'm not sure that someone could override that even if, say, the present owner decided to [21:48]
hook54321Could someone from Ukraine or someone named Andrey Ahiezer have been an enemy of the site? [21:49]
sun_shineReally unlikely.
If the archive cuts off at 2007, though, that seems to suggest when the request for removal was sent
[21:49]
hook54321Or when they last updated the mirror
When a site is excluded manually they will still crawl it
[21:50]
sun_shineoh, nevermind they last updated the mirror in 2007
http://web.archive.bibalex.org/web/*/http://example.org
[21:51]
hook54321sun_shine: domaintools has whois history, it's not free though. https://whois.domaintools.com/isaccorp.org [22:02]
sun_shineyou know, I have very rarely encountered 'domain excluded' errors when using wayback
and I'm a really heavy user
I just checked on two other defunct advocacy websites in the same area. Both excluded - and I know that the first one was purchased by the corporation it published exposes on after the owner died.
I think they bought the domains after they expired, had them excluded, and then dumped them
[22:03]
hook54321What are the other two domains?
and the corporation
[22:06]
***drumstick has joined #archiveteam-bs
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[22:07]
sun_shineintrepidnetreporter.com and caica.org . The corporation that bought intrepidnetreporter is called WWASP and has a documented history of suing online critics. All three of these websites reported critically on them. https://en.wikipedia.org/wiki/World_Wide_Association_of_Specialty_Programs_and_Schools
I think I'm just going to write info@archive.org and ask nicely. I'm not sure there's any other option.
[22:14]
astridprobably yeah [22:15]
***namibj_ has quit IRC (Ping timeout: 260 seconds) [22:21]
namibj_ has joined #archiveteam-bs
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
Soni has quit IRC (Read error: Operation timed out)
Soni has joined #archiveteam-bs
[22:33]
......... (idle for 43mn)
Asparagir has joined #archiveteam-bs [23:19]
..... (idle for 22mn)
Honno has quit IRC (Read error: Operation timed out) [23:41]
.... (idle for 15mn)
slackpi has quit IRC (Read error: Connection reset by peer) [23:56]
