#archiveteam-bs 2017-05-25,Thu


Who What When
godane: so this month is sucking for me: https://archive.org/details/@chris85?and[]=addeddate:2017-05
to be fair i'm ripping old original broadcasts from vhs
[00:12]
so this person tons of old McDonalds stuff: http://myworld.ebay.com/uptowngirljeni/
including sets of old happy meal boxs
[00:26]
........... (idle for 52mn)
*** Sk1d has quit IRC (Ping timeout: 250 seconds)
j08nY has quit IRC (Quit: Leaving)
[01:18]
Sk1d has joined #archiveteam-bs [01:25]
username1 has joined #archiveteam-bs
schbirid2 has quit IRC (Read error: Operation timed out)
[01:32]
............. (idle for 1h2mn)
ndiddy has quit IRC () [02:37]
..... (idle for 20mn)
nyany has quit IRC (Leaving) [02:57]
.............. (idle for 1h8mn)
godane: https://www.imzy.com/imzy/post/imzy_is_shutting_down
we have about 1 month to get it
[04:05]
*** nyany has joined #archiveteam-bs [04:08]
........... (idle for 54mn)
Lord_Nigh: godane: greenie in here is imzy staff [05:02]
*** DopefishJ has joined #archiveteam-bs
swebb sets mode: +o DopefishJ
DFJustin has quit IRC (Ping timeout: 260 seconds)
[05:14]
.................. (idle for 1h26mn)
wabu_ has quit IRC (Read error: Operation timed out) [06:42]
wabu has joined #archiveteam-bs [06:52]
w0rp has quit IRC (Read error: Operation timed out)
w0rp has joined #archiveteam-bs
[06:58]
................ (idle for 1h16mn)
icedice has joined #archiveteam-bs [08:15]
.... (idle for 16mn)
user_ has quit IRC (Read error: Operation timed out)
bwn has quit IRC (Ping timeout: 260 seconds)
[08:31]
..... (idle for 20mn)
bwn has joined #archiveteam-bs [08:54]
.... (idle for 16mn)
j08nY has joined #archiveteam-bs
GE has joined #archiveteam-bs
[09:10]
............... (idle for 1h10mn)
j08nY has quit IRC (Read error: Operation timed out) [10:24]
davidar has joined #archiveteam-bs
GE has quit IRC (Remote host closed the connection)
[10:31]
........... (idle for 51mn)
BlueMaxim has quit IRC (Quit: Leaving) [11:25]
....... (idle for 34mn)
JAA: Speed: 0.9 B/s \m/ [11:59]
*** pizzaiolo has joined #archiveteam-bs [12:04]
.... (idle for 18mn)
GE has joined #archiveteam-bs [12:22]
........ (idle for 36mn)
vitzli has joined #archiveteam-bs
icedice2 has joined #archiveteam-bs
icedice2 has quit IRC (Client Quit)
GE is now known as SHODAN_UI
j08nY has joined #archiveteam-bs
[12:58]
.... (idle for 19mn)
pizzaiolo has quit IRC (Quit: pizzaiolo) [13:24]
vitzli_ has joined #archiveteam-bs
vitzli has quit IRC (Ping timeout: 250 seconds)
vitzli_ has quit IRC (Quit: Leaving)
[13:33]
.......... (idle for 45mn)
jtn2 has quit IRC (Ping timeout: 250 seconds)
godane has quit IRC (Quit: Leaving.)
schbirid2 has joined #archiveteam-bs
jtn2 has joined #archiveteam-bs
username1 has quit IRC (Read error: Operation timed out)
[14:23]
........... (idle for 52mn)
icedice has quit IRC (Quit: Leaving) [15:23]
Sanqui has quit IRC (Remote host closed the connection) [15:31]
Sanqui has joined #archiveteam-bs [15:37]
.......... (idle for 49mn)
DopefishJ is now known as DFJustin
schbirid2 has quit IRC (Quit: Leaving)
godane has joined #archiveteam-bs
[16:26]
.............. (idle for 1h8mn)
j08nY has quit IRC (Remote host closed the connection)
j08nY has joined #archiveteam-bs
wolfpld has quit IRC (Quit: WeeChat 1.6)
[17:36]
SHODAN_UI has quit IRC (Quit: zzz) [17:51]
........ (idle for 35mn)
namespace has joined #archiveteam-bs [18:26]
.... (idle for 18mn)
SHODAN_UI has joined #archiveteam-bs [18:44]
..... (idle for 21mn)
ndiddy has joined #archiveteam-bs [19:05]
.............. (idle for 1h8mn)
wacky_ has joined #archiveteam-bs [20:13]
........ (idle for 35mn)
greenie: yeah godane i think my backend engineer is emailing with alembic, we're not sure what format y'all want the dump of url slugs in
but we're gonna try to get that to yall. if folks have questions, i can try to help relay. we have some individual users asking for various data dumps but right now we're focusing on getting users their own data, and secondly working with archivers who are interested or need info from us
[20:48]
............ (idle for 56mn)
arkiver: do we need a warrior project for imzy?
alembic ^
greenie with url slugs you mean a list of URLs?
scripts will be at https://github.com/ArchiveTeam/imzy-grab when created
PurpleSym: do you have the dump you made of imzy somewhere?
let's create a channel for this also
looking forward to this, social media sites are always fun
[21:46]
greenie: arkiver: the email we got says "Because the posts are routed via slugs and not incremental IDs, it is currently very difficult to crawl and archive imsy. A list of these slugs would solve this issue." [21:55]
arkiver: I see
I'm not sure what information PurpleSym has exactly
but a list of groups like https://www.imzy.com/worldbuilding and users like https://www.imzy.com/@blues_sevenfold would already help a lot
[21:55]
greenie: just in like a txt file or something? [21:56]
timmc: arkiver: But a list of posts would be import too, right?
*important
[21:57]
arkiver: I think a txt file would be fine
json too
timmc: yes
but else we'll go over the groups and users an get all the posts and comments they have made
however, very large groups with tens or hundreds of thousands of posts would be problematical
I haven't had a very good look yet at the website, so I might be missing stuff
[21:57]
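The list format discussed above (a txt file, or JSON) can be read with a small loader. This is only a sketch: `parse_url_list` and `classify` are hypothetical helper names, the JSON variant is assumed to be a flat array of URL strings, and the URL shapes are inferred solely from the examples pasted in this log (`/worldbuilding` for a group, `/@blues_sevenfold` for a user, `/imzy/post/imzy_is_shutting_down` for a post).

```python
import json
from urllib.parse import urlparse


def parse_url_list(text):
    """Parse a URL list given as a JSON array of strings or as
    newline-delimited text. Blank lines and duplicates are dropped,
    and the original order is kept."""
    stripped = text.strip()
    if stripped.startswith("["):
        # Assumption: the JSON form is a flat array of URL strings.
        items = json.loads(stripped)
    else:
        items = stripped.splitlines()
    seen = set()
    urls = []
    for item in items:
        url = item.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def classify(url):
    """Guess the kind of page from the path shape seen in this log."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 1 and parts[0].startswith("@"):
        return "user"
    if len(parts) == 1:
        return "group"
    if len(parts) == 3 and parts[1] == "post":
        return "post"
    return "other"
```

Anything that falls into "other" (for example community subpages or static pages) would need its own handling once the real URL dump is available.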
timmc: The one thing I can think of that might not be easy to pick up would be posts that had a chat room instead of a comment section.
No idea how common those were, I never participated in one myself.
[22:01]
greenie: we dont have many of those, comparatively
it was a feature we were planning on killing and totally reworking in a different/better way, because it functioned poorly and wasn't used for the most part. While I'm not an archivist, I don't think theres important stuff in them, tbh
[22:02]
*** SHODAN_UI has quit IRC (Remote host closed the connection) [22:03]
greenie: okay so what we'll try to get you guys is three files, basically. 1. community URLs, 2. profile URLs, 3. post URLs. we'll only send you the URLs for communities/profiles that actually have content [22:04]
arkiver: Well, the more we can of the site the better I think
If there's any other special section on the site other than communities and profiles, please also try to get the URLs for those
or the names/IDs of them in the URLs
timmc: do you have an example of a chat room?
[22:04]
greenie: cool. weffey (our backend engineer) is gonna be afk over the weekend, but will try to get yall a "how does this look?" set of files on tuesday-ish [22:05]
arkiver: thanks!
I think I'll have the script ready by then
[22:05]
greenie: rad. weffey can be added to the repo if yall want, weffey@gmail.com and can try to help
i dont really know yalls style, so im sorta just functioning as a middleperson here
https://github.com/weffey
we have some weird ajax paging stuff, that wouldnt just be a simple wget
timmc has maybe figured some of this out already, im not sure
[22:07]
arkiver: we have custom scripts
for example
for flickr: https://github.com/ArchiveTeam/flickr-grab/
where the real custom stuff is done in https://github.com/ArchiveTeam/flickr-grab/blob/master/flickr.lua
[22:09]
greenie: (afk, heading home, will check it out in a bit) [22:10]
arkiver: thanks [22:10]
timmc: arkiver: Each community also has an /about, /rules, and /leaders page. Example: https://www.imzy.com/boston/rules
There are also some static pages linked from the footer (or the bottom of the sidebar on infinite-scroll pages).
[22:17]
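The per-community pages timmc lists can be derived from a community URL as follows. A minimal sketch: `community_urls` is a hypothetical helper, and the only subpages it knows about are the three named above (/about, /rules, /leaders); the static footer pages are not covered.

```python
# Subpages named in the log; any others would have to be added by hand.
COMMUNITY_SUBPAGES = ("about", "rules", "leaders")


def community_urls(community_url):
    """Return the community front page plus its known subpages."""
    base = community_url.rstrip("/")
    return [base] + [f"{base}/{page}" for page in COMMUNITY_SUBPAGES]
```

For `https://www.imzy.com/boston` this yields the front page followed by `/boston/about`, `/boston/rules`, and `/boston/leaders`.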
arkiver: thank you
will have a look at those
I see the IP is saved on the page
[22:21]
timmc: In the footer? [22:23]
arkiver: in the script imzy-state
remoteIp in window.__IMZY__.sessionStorageCache
[22:24]
timmc: Yeah, I noticed that before, and I *think* it's only the viewer's IP... but on some static pages, it might leak a previous viewer's IP. [22:25]
arkiver: uh yeah
so in the footer
[22:25]
timmc: greenie: ^ I had mentioned this before, not sure if it was investigated. [22:25]
arkiver: so this will be a warning for people running the project [22:26]
timmc: hmm, yeah
It also saves a session ID that I haven't been able to connect to any other session identifiers.
[22:26]
arkiver: yeah, but the sessions ID is not directly identifiable as coming from someone by non-imzy staff
Most forums we save have session IDs stored
[22:27]
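Someone running the grab could check a locally saved page for the `remoteIp` value discussed above before sharing it. This is a rough sketch under stated assumptions: the log only says that `remoteIp` lives in `window.__IMZY__.sessionStorageCache` inside the `imzy-state` script, so the exact serialization (a JSON-style `"remoteIp":"..."` pair) is assumed here, and both function names are hypothetical.

```python
import re

# Assumption: the value is serialized as a JSON-style string pair,
# e.g. "remoteIp":"203.0.113.7". The real markup may differ.
REMOTE_IP = re.compile(r'("remoteIp"\s*:\s*")([^"]*)(")')


def find_remote_ips(html):
    """Return every remoteIp value found in the page source."""
    return [m.group(2) for m in REMOTE_IP.finditer(html)]


def redact_remote_ips(html, placeholder="0.0.0.0"):
    """Return a copy of the page source with remoteIp values replaced."""
    return REMOTE_IP.sub(lambda m: m.group(1) + placeholder + m.group(3), html)
```

This only inspects or rewrites a local copy of the HTML; it says nothing about how the value ends up in the page or whose address it is.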
.... (idle for 16mn)
greenie: I'll run it past weff, timmc. I don't want to speak in absolutes about that particular sort of thing without being quite confident [22:43]
arkiver: yes of course [22:43]
*** Stiletto has quit IRC (Ping timeout: 246 seconds) [22:45]
timmc: (I *think* I'd reported this, at least!) [22:45]
greenie: yeah it sounds familiar but rather than me digging through my github stuff i'm just gonna be lazy and double check with weff [22:46]
timmc: greenie: You can probably search issues for "remoteIp" or "less-trafficked" if there's any chance you copied my email into github. [22:51]
*** Stilett0 has joined #archiveteam-bs [23:00]
..... (idle for 24mn)
BlueMaxim has joined #archiveteam-bs [23:24]
.... (idle for 18mn)
alembic: arkiver unsure if it needs a warrior, but we won't need to do link discovery [23:42]
arkiver: I think we'll do a warrior project [23:42]
alembic: Imsy engineer I'm speaking to would like to know what format is best
for the list of urls
I'm guessing just a \n delimited list?
[23:42]
arkiver: yes
we also talked with greenie about it a bit
see log
[23:46]
alembic: okidoki [23:46]
greenie: yeah ive been sorta relaying, sorry for the multiple means of communication [23:47]
alembic: np, I've held off emailing Lesley at Imsy because I've figured as much
greenie if it's ok with you guys, we'll just to stick to IRC going forward?
[23:47]
greenie: i should clarify, lesley = weffey that ive been referring to. The server I host my IRC client on is old and fussy and tends to kill my ssh connection, but yeah I can manage IRC if thats easier for you. However, lesley/weffey is the person who will actually be doing the technical end of things on our side, is our lead backend dev. So may end up being a bit of a game of telephone, unless they hop on IRC, which I'm not sure they do anymore
[23:52]
alembic: ahaha ok, we'll stick to our motley hybrid of IRC/email then :P [23:54]
greenie: hehe okay [23:54]
