[00:07] *** REiN^ has quit IRC (Read error: Operation timed out)
[00:13] *** REiN^ has joined #archiveteam-bs
[00:30] Total size from those is 85.6 GiB, by the way.
[00:47] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[01:01] SketchCow: i'm now down to 3 tapes i need to digitize from your box
[01:02] this one tape is going to have a file called random-mtv-the-prisoner-incomplete-fall-1991 or something
[01:02] cause it goes very random with no complete blocks
[02:24] *** Soni has quit IRC (Ping timeout: 264 seconds)
[02:28] *** Atom-- has joined #archiveteam-bs
[02:33] *** Soni has joined #archiveteam-bs
[03:37] *** MrDignity has quit IRC (Read error: Operation timed out)
[03:37] *** MrDignity has joined #archiveteam-bs
[03:53] *** Valentine has quit IRC (Read error: Operation timed out)
[03:58] *** Valentine has joined #archiveteam-bs
[04:18] *** BlueMaxim has quit IRC (Leaving)
[04:44] *** qw3rty119 has joined #archiveteam-bs
[04:48] *** qw3rty118 has quit IRC (Read error: Operation timed out)
[05:33] *** BlueMaxim has joined #archiveteam-bs
[05:38] JAA: ok, I'll ask some people in a Catalan discord server if there's any reason to keep grabbing them, I haven't really been following up on what's going on with Catalonia and Spain, lol.
[05:41] *** Jonimus has quit IRC (Read error: Operation timed out)
[05:42] *** dashcloud has quit IRC (No Ping reply in 180 seconds.)
[05:43] *** Jonimus has joined #archiveteam-bs
[05:43] *** swebb sets mode: +o Jonimus
[05:44] !ig 10sb194o1dqht3i8nmqxeeg6g ^https?://www\.oswegofirst\.org/
[05:44] *** dashcloud has joined #archiveteam-bs
[05:45] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[05:46] *** BlueMaxim has joined #archiveteam-bs
[05:51] SketchCow: so i've done 37,055 items this month
[06:31] YES
[06:52] i have been very slow with my uploads over the past year
[06:53] i'm trying to get tons of dtic pdfs uploaded this year
[06:53] so i can be done
[08:31] *** Mateon1 has quit IRC (Read error: Operation timed out)
[08:32] *** Mateon1 has joined #archiveteam-bs
[08:34] SketchCow: btw the box you mailed me had audio tapes for some reason
[09:07] *** REiN^ has quit IRC (no.money.no.love)
[09:08] *** REiN^ has joined #archiveteam-bs
[09:11] *** BlueMaxim has quit IRC (Ping timeout: 252 seconds)
[09:12] *** BlueMaxim has joined #archiveteam-bs
[09:36] *** REiN^ has quit IRC (no.money.no.love)
[09:50] *** REiN^ has joined #archiveteam-bs
[09:57] *** schbirid has joined #archiveteam-bs
[11:22] should we make some strava map tile scraping a warrior project? i have no idea how to do that but the tiles are super easy to get and they throw proper errors if you get banned (seems to be based on volume or patterns, not sure)
[11:23] up to zoom level 11 (inclusive) was easy to do on a single connection
[11:23] but 12 is just too huge
[11:27] there are (2**12)**2 -> 16777216 tiles at that level (total filesize would be around 60G uncompressed)
[11:27] getting 4/5 per second has not banned me so far but eh
[11:34] *** ranav has quit IRC (Read error: Operation timed out)
[11:55] *** ranavalon has joined #archiveteam-bs
[12:03] *** pizzaiolo has joined #archiveteam-bs
[12:04] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[12:28] Wouldn't be too hard to do I guess, if you want to do it
[13:52] *** SilSte has joined #archiveteam-bs
[14:04] schbirid: sounds possible, take to #warrior and ask if anyone can pick it up?
[14:04] I'd try, but I have no clue xD
[14:04] or if you have lists, just get a few people to pick up a range of tiles each and you should get it done pretty quick
[14:29] *** antomatic has quit IRC (Read error: Connection reset by peer)
[14:30] *** antomatic has joined #archiveteam-bs
[14:30] *** swebb sets mode: +o antomatic
[15:08] *** zhongfu has joined #archiveteam-bs
[15:25] http://wilmunder.com/Arics_World/Games.html
[15:33] *** zhongfu has quit IRC (Remote host closed the connection)
[15:40] *** zhongfu has joined #archiveteam-bs
[15:41] *** MrDignity has quit IRC (Remote host closed the connection)
[15:50] *** zhongfu has quit IRC (Remote host closed the connection)
[15:52] *** zhongfu has joined #archiveteam-bs
[15:56] *** zhongfu has quit IRC (Remote host closed the connection)
[15:56] *** zhongfu has joined #archiveteam-bs
[16:01] SketchCow: i'm digitizing the last tape from that small box
[16:16] Thanks. So we should arrange the transfer back along with a note in the box saying "done", and then Mank can send you the next batch.
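Editorial aside on the tile count quoted at [11:27]: a minimal sketch of the arithmetic, assuming a standard square tile pyramid in which zoom level z has 2**z tiles along each axis. The function names are illustrative, and the rate of 4 tiles per second is simply the figure mentioned in the channel, not a measured limit.

```python
SECONDS_PER_DAY = 86_400


def tile_count(zoom: int) -> int:
    """Number of tiles at one zoom level of a square pyramid: (2**zoom)**2."""
    return (2 ** zoom) ** 2


def days_to_fetch(tiles: int, tiles_per_second: float) -> float:
    """Days needed to fetch `tiles` at a constant rate on a single connection."""
    return tiles / tiles_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    total = tile_count(12)
    print(total)                               # 16777216, matching the log
    print(round(days_to_fetch(total, 4.0), 1)) # 48.5 days on one connection
```

At that rate a single connection would need several weeks for zoom level 12, which is consistent with the suggestion to have several people each take a range of tiles.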
[16:17] *** zhongfu has quit IRC (Remote host closed the connection)
[16:52] *** zhongfu has joined #archiveteam-bs
[17:04] *** pizzaiolo has quit IRC (Read error: Operation timed out)
[17:08] *** pizzaiolo has joined #archiveteam-bs
[17:11] *** pizzaiolo has quit IRC (Client Quit)
[17:12] *** pizzaiolo has joined #archiveteam-bs
[17:15] *** Jusque_ has joined #archiveteam-bs
[17:16] *** Jusque has quit IRC (Read error: Operation timed out)
[17:16] *** Jusque_ is now known as Jusque
[17:49] *** MrDignity has joined #archiveteam-bs
[17:49] *** MrDignity has quit IRC (Remote host closed the connection)
[17:51] *** MrDignity has joined #archiveteam-bs
[17:51] *** MrDignity has quit IRC (Read error: Connection reset by peer)
[17:52] *** MrDignity has joined #archiveteam-bs
[18:22] if you want to help me archive the strava heatmap tiles partially: https://justpaste.it/1gir1
[18:22] single long-running wgets
[18:23] i am doing aa and ab
[18:29] schbirid: How long does one list take approximately?
[18:34] That strava data is super doomed, so yeah
[18:36] JAA: i get ~4 tiles per second so ~3 days
[18:37] i am also doing ac
[18:38] oh and ad. i have more servers than i remembered =)
[18:43] *** pizzaiolo has quit IRC (Remote host closed the connection)
[18:46] *** pizzaiolo has joined #archiveteam-bs
[18:48] *** pizzaiolo has quit IRC (Client Quit)
[18:50] *** pizzaiolo has joined #archiveteam-bs
[18:52] *** arbin has joined #archiveteam-bs
[18:53] *** dino has joined #archiveteam-bs
[18:53] :)
[18:53] * astrid waves
[18:53] Hi
[18:53] bs for everyone
[18:54] schbirid, arbin: Interesting, I get the same error with wget, but it works with curl.
[18:54] hm, just do --no-check-certificate then.
[18:54] maybe wget has a baked in crappy version of openssl
[18:54] JAA: what was ur full cmd
[18:54] I wouldn't be surprised.
[18:54] curl URL > FILE
[18:54] Nothing special
[18:55] dino: do you have an example url to one of these panoramios?
[18:55] i think schbirid intends the leecher to unpack the gz, and iterate over the list of links in it
[18:55] oops, time for a meeting
[18:55] so just doing curl URL prob wont work
[18:55] Maybe to say it again: I'm a photographer and have no idea what the archive process looks like. You're the pros here. I'm just wondering if it is possible to create a browsable version with the already saved images
[18:55] arbin: You need to download the list, gunzip it, and then run the wget command which reads that URL list.
[18:56] *** pizzaiolo has quit IRC (pizzaiolo)
[18:56] schbirid: u intend to download the gz list through a normal browser?
[18:56] i dont care
[18:56] those are just url lists
[18:56] you unpack them, then point wget to them
[18:58] ehm guys I don't know what you're talking about ;) Keeping the question simple: Is there a possibility to use/actually view and organize the archived files here:
[18:58] https://archive.org/details/archiveteam_panoramio?&sort=-downloads&page=2
[18:58] There's another discussion going on about something completely unrelated.
[18:58] schbirid:
[18:58] A browsable archive of Panoramio would be lovely. I have no idea how that site worked or what the archives look like though.
[18:58] E:\schbird>wget -nc -nv --no-host-directories -x -i strava_12_shuffled_ae
[18:58] wget: memory exhausted
[18:59] arbin: sorry, then this is not a project for you :)
[18:59] interesting that's an issue with 16GB RAM
[18:59] *** pizzaiolo has joined #archiveteam-bs
[18:59] I mean I'd love to see the images from Panoramio archived but they aren't really useful when you can't actually look at them
[18:59] no issues with anything on a 768MB vps myself
[18:59] bbl
[19:00] schbirid: why do you have to use wget? plenty of other ways to leech that much
[19:00] arbin: I don't think he cares. He just wants the data somehow, and wget is a simple way of achieving that.
[19:01] Uh, why are many of those Panoramio WARCs empty??
[19:02] Examples: https://archive.org/download/archiveteam_panoramio_20161105184626.upload https://archive.org/download/archiveteam_panoramio_20161106063218.upload https://archive.org/download/archiveteam_panoramio_20161027123253.upload https://archive.org/download/archiveteam_panoramio_20161105012626.upload
[19:02] None of these contain any data.
[19:03] Ah, further down the page.
[19:03] Guys I'm not into the archiving process in any way. (and I don't understand half of what you're talking about) I just would love to know what's the idea of this "Panoramio Archive", If nobody can actually see the images? https://archive.org/details/archiveteam_panoramio?&sort=-downloads&page=2
[19:03] https://archive.org/download/archiveteam_panoramio_20170220050222 is the last file in the collection with data in it, it seems.
[19:05] *** dino has quit IRC (Quit: Page closed)
[19:05] dino: You can look at the images, it's just really annoying to do so. For example, https://web.archive.org/web/20161013025252im_/http://mw2.google.com/mw-panoramio/photos/medium/11314446.jpg
[19:05] Welp
[19:06] *** dino has joined #archiveteam-bs
[19:06] dino: You can look at the images, it's just really annoying to do so. For example, https://web.archive.org/web/20161013025252im_/http://mw2.google.com/mw-panoramio/photos/medium/11314446.jpg
[19:06] ah okay
[19:07] Also stuff like this: https://web.archive.org/web/20161027205516/http://www.panoramio.com/userfeed/1412955
[19:07] I'm just discovering this through the files in the collection. I have no idea what the strategy was when it was grabbed (I wasn't here yet when that happened).
[19:08] ah alright - thanks - so I need to try to connect to the uploader.
[19:09] Maybe there is a way to make the library browsable again
[19:09] It appears that photo pages and user profiles were grabbed.
[19:09] https://web.archive.org/web/20161028204107/http://www.panoramio.com/photo/106502705
[19:09] https://web.archive.org/web/20161028204109/http://www.panoramio.com/user/1412955?with_photo_id=106502705
[19:10] But the content isn't easily discoverable because there is no search interface etc.
[19:10] Which is frequently the case, because that isn't really archiveable.
[19:11] Thanks for the info! So do you think there's no way to get the images back on a map again?
[19:12] Not without a lot of effort.
[19:13] Basically, you'd have to download the archives, extract the coordinates for each image, and build a database of the (image URL, coordinates) pairs.
[19:14] I guess the location data is in the .exif data of the images
[19:14] let me try to check that
[19:15] schbirid: ok, i'll grab ae
[19:15] Doesn't look like it, at least not always.
[19:15] *** dino has quit IRC (Quit: Page closed)
[19:17] *** dino has joined #archiveteam-bs
[19:17] The raw image of https://web.archive.org/web/20161029235055/http://www.panoramio.com/photo/91390820 is https://web.archive.org/web/20161029235228im_/http://static.panoramio.com/photos/original/91390820.jpg and contains a ton of EXIF data, but no coordinates as far as I can see.
[19:17] So it'd have to be extracted from the web page, I guess.
[19:17] *** pizzaiolo has quit IRC (pizzaiolo)
[19:18] yea I can't see any either
[19:19] Makes sense actually, since it looks like the location of a photo could be changed after upload at Panoramio ("Misplaced? Suggest new location").
[19:19] oh you're right
[19:20] crap, I'll contact Jason Scott (the creator of this archive) and ask him if he sees any option to show the images on a map
[19:20] Jason is here.
[19:20] SketchCow: ^
[19:20] I kind of doubt he knows more about it though.
[19:20] *** pizzaiolo has joined #archiveteam-bs
[19:21] Note also that at least in that example, the EXIF data was overwritten when the image was edited. It doesn't contain any information about the camera, exposure, etc., but the web page does mention those details (in the lower right corner).
[19:22] yea the data is definitely on the website somewhere and not in the .exif of the .jpg
[19:22] Yep
[19:27] arbin: thx!
[19:32] you say these are approx 5gb uncompressed?
[19:36] the set of tiles per url list will be about that
[19:37] *** pizzaiolo has quit IRC (Remote host closed the connection)
[19:39] neato, i can download 2 sets at once
[19:39] stupidly shared university community wifi ftw
[19:39] schbirid: grabbing aq as well
[19:39] cheers!
[19:40] *** pizzaiolo has joined #archiveteam-bs
[19:45] *** dino has quit IRC (Quit: Page closed)
[20:01] *** pizzaiolo has quit IRC (Remote host closed the connection)
[20:03] *** pizzaiolo has joined #archiveteam-bs
[20:54] *** MrDignity has quit IRC (Remote host closed the connection)
[20:55] *** MrDignity has joined #archiveteam-bs
[21:06] *** pizzaiolo has quit IRC (Ping timeout: 247 seconds)
[21:14] *** pizzaiolo has joined #archiveteam-bs
[21:14] *** jschwart has joined #archiveteam-bs
[21:15] For the people who cared about this, there's a show tonight about that 1-tb drive offline internet in Cuba, with some of the artists who help maintain it, in NYC, and I'll be there.
[21:16] I was watching that 34C3 talk right now
[21:16] JAA is quite right. I don't know about it. The project needs a caring, engaged person to go through the reams of data, derive value, and determine how to re-present it as best as possible.
[21:16] a 34C3 talk about that*
[21:16] WubTheCap: That talk is not incredibly great.
[21:17] It does have a few drabs of trivia about Euro BBSes I wasn't aware of, of course.
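Editorial aside on the [19:13] suggestion of a database of (image URL, coordinates) pairs: a minimal sketch of what such an index could look like, assuming the coordinates have already been extracted from the archived photo pages. The extraction step is not shown because the page markup is not described in the log. The table layout, function names, and bounding-box query are all hypothetical choices for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple

# One record per photo: (image URL, latitude, longitude).
PhotoRecord = Tuple[str, float, float]


def build_index(records: Iterable[PhotoRecord]) -> sqlite3.Connection:
    """Create an in-memory SQLite index of photo URLs and their coordinates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photos ("
        "url TEXT PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO photos VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def photos_in_bbox(
    conn: sqlite3.Connection,
    south: float,
    west: float,
    north: float,
    east: float,
) -> List[str]:
    """Return URLs of photos inside a bounding box (no antimeridian handling)."""
    rows = conn.execute(
        "SELECT url FROM photos "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? ORDER BY url",
        (south, north, west, east),
    )
    return [row[0] for row in rows]
```

A map front end could call something like `photos_in_bbox` for the visible viewport and link each result to its Wayback Machine capture. For a collection of this size a spatial index would likely be needed, which this sketch leaves out.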
[21:29] *** octothorp has quit IRC (Remote host closed the connection)
[21:35] *** Dimtree has quit IRC (Read error: Operation timed out)
[21:43] *** octothorp has joined #archiveteam-bs
[21:46] *** Dimtree has joined #archiveteam-bs
[21:50] *** pizzaiolo has quit IRC (pizzaiolo)
[21:51] SketchCow: the last of your tapes is digitized now
[22:04] *** schbirid has quit IRC (Quit: Leaving)
[22:17] *** Dimtree has quit IRC (Read error: Connection reset by peer)
[22:51] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:57] *** Dimtree has joined #archiveteam-bs
[23:06] SketchCow: so i got about 95gb of your tapes to upload
[23:07] i also got another 30gb from the set of boxes that was odd parts of tapes
[23:27] *** Dimtree has quit IRC (Read error: Operation timed out)
[23:27] *** MrDignity has quit IRC (Read error: Connection reset by peer)
[23:33] *** Dimtree has joined #archiveteam-bs
[23:35] *** Dimtree has quit IRC (Client Quit)
[23:49] *** Dimtree has joined #archiveteam-bs
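Editorial aside on the "wget: memory exhausted" error at [18:58]: the cause is not established in the log, and nobody in the channel tried the following. One possible workaround is to split a large URL list into smaller files and run the same wget command once per chunk. This is a minimal sketch under that assumption; the function name, chunk size, and `.partNNNN` naming are hypothetical.

```python
from itertools import islice
from pathlib import Path
from typing import List


def split_url_list(
    list_path: Path, out_dir: Path, lines_per_chunk: int = 50_000
) -> List[Path]:
    """Split a one-URL-per-line file into numbered chunk files.

    The source file is read lazily, so only one chunk is held in memory.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    with list_path.open("r", encoding="utf-8") as src:
        index = 0
        while True:
            batch = list(islice(src, lines_per_chunk))
            if not batch:
                break
            chunk_path = out_dir / f"{list_path.name}.part{index:04d}"
            chunk_path.write_text("".join(batch), encoding="utf-8")
            chunks.append(chunk_path)
            index += 1
    return chunks
```

Each resulting file could then be passed to the command from the log in place of the full list, for example `wget -nc -nv --no-host-directories -x -i strava_12_shuffled_ae.part0000`. Whether this actually avoids the error on the affected Windows build is untested.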